The Implications of How We Tag Software Artifacts: Exploring Different Schemata and Metadata for Tags

Social tagging has been adopted by software developers in various contexts from source code to work items and build definitions. While the success of tagging is usually attributed to the simplicity of tags, the implementation details of tagging systems vary significantly in terms of metadata, schemata and semantics. In a position paper that Peggy and I recently wrote for Web2SE, we argue that academia and industry should be aware of these differences and that we should start to examine their implications.

The idea of analyzing different dimensions of tagging systems is not new. A very detailed taxonomy is given by Marlow et al. They identify the following seven dimensions in the design of a tagging system:

  • Tagging rights: Users can tag everybody’s resources vs. users can only tag their own resources.
  • Tagging support: Blind tagging (users cannot see each other’s tags) vs. viewable tagging (users can see each other’s tags) vs. suggestive tagging (the system suggests tags to users).
  • Aggregation: Bag model (allows duplicate tags per resource) vs. set model (no duplicates).
  • Type of object: Type of the resource to be tagged.
  • Source of material: Resource is supplied by the system vs. resource is supplied by the users.
  • Resource connectivity: Linked vs. grouped vs. none (possible connections between the resources).
  • Social connectivity: Linked vs. grouped vs. none (possible connections between the users).
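
To make the aggregation dimension concrete, here is a minimal sketch of the bag model and the set model. The `(user, resource, tag)` triples and the function names are hypothetical and chosen only for illustration; they are not taken from Marlow et al. or from any particular tagging system.

```python
from collections import Counter

# Hypothetical tag assignments as (user, resource, tag) triples.
assignments = [
    ("alice", "task-1", "linux"),
    ("bob", "task-1", "linux"),
    ("bob", "task-1", "testing"),
]


def bag_model(assignments, resource):
    """Keep duplicates: count how often each tag was applied to the resource."""
    return Counter(tag for _, res, tag in assignments if res == resource)


def set_model(assignments, resource):
    """Drop duplicates: record only which tags are present on the resource."""
    return {tag for _, res, tag in assignments if res == resource}


bag = bag_model(assignments, "task-1")
tags = set_model(assignments, "task-1")
```

In the bag model, the fact that two users applied the same tag is preserved as a count, whereas the set model only records that the tag is present.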

These dimensions also apply to tagging systems used by software developers. However, studying such systems, including ICICLE, TagSEA, IBM’s Jazz, BITKit, Google Code, ConcernMapper and Concern Graphs, reveals additional dimensions on top of the taxonomy by Marlow et al.

We identified the following additional dimensions:

  • Pre-defined vs. user-defined: Most current tagging systems are based on the concept of tags as “freely-chosen keywords or terms that are associated with or assigned to a piece of information”. However, in older tagging systems such as ICICLE, possible keywords were pre-defined, and software developers were not able to add new keywords to the system. In a dynamic environment such as software development, the just-in-time addition of new tags is the more promising approach.
  • Metadata: Different tagging systems store different amounts of metadata. For example, in the case of tagging work items in IBM’s Jazz, information such as the tag author and the time a tag was applied to a work item can only be identified by browsing the work item’s history. In other systems such as TagSEA, the author and time can be explicitly added to each tag instance, and tags can be searched by their authors and creation time. To keep tagging simple, tag authors should not be required to add metadata. However, all metadata that can be recorded automatically should be stored to provide additional context.
  • Semantics: While most tagging systems treat keywords simply as terms that are associated with artifacts, some systems go beyond that and add semantics to tags. An interesting approach is taken by labels in the issue tracker of Google Code, which goes beyond basic labels to support key-value labels. Key-value labels contain one or more dashes, and the part before the first dash is considered to be a field name while the part after that dash is considered to be the value. Studying the use of key-value labels in Google Code is part of our ongoing work.
  • Hierarchies: Some tagging systems explicitly support tag hierarchies, using a dot-notation (e.g., TagSEA). Keywords that have dots in them can be treated as hierarchical, and they can be displayed in tree-views. In other systems such as IBM’s Jazz, some developers use the dot-notation even though there is no explicit support for hierarchies. A flexible approach that offers additional views when needed is promising.
  • Single type of resource vs. multiple types: Software developers handle many different kinds of artifacts from source code and work items to build scripts. Nevertheless, many tagging systems for software developers only support tagging a single kind of artifact. One exception is TagSEA. It allows software developers to tag locations in source code — called waypoints — and artifacts such as files, and it shows different kinds of artifacts in a single view. This allows for grouping and relating different kinds of artifacts while keeping the simplicity of tags.
  • Integration: Another dimension is the extent to which the tagging mechanism is integrated with other tooling. Some systems support social tagging of source code, but require the user to post code fragments on public servers before tags can be applied to code fragments (e.g., DZone Snippets and ByteMycode). In other systems such as IBM’s Jazz or TagSEA, the tagging mechanism is part of the IDE. With the recent trend of moving the IDE into the browser, tagging artifacts online is a promising approach.
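
The hierarchy dimension above can be illustrated with a short sketch that turns dot-notation tags into a tree that a tree-view could display. This is not TagSEA's implementation; the function names and the example tags are assumptions made up for illustration.

```python
def build_tag_tree(tags):
    """Nest dot-separated tags into a dict of dicts, one level per segment."""
    tree = {}
    for tag in tags:
        node = tree
        for part in tag.split("."):
            node = node.setdefault(part, {})
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, with siblings sorted alphabetically."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


# Hypothetical tags, invented for illustration.
example_tags = ["ui.dialog", "ui.menu", "build", "ui.dialog.layout"]
tree = build_tag_tree(example_tags)
print("\n".join(render(tree)))
```

Because the hierarchy is derived from the tag text alone, the same approach could offer a tree view on demand in systems where developers use the dot-notation without explicit support.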

Update [June 6, 2010]: The paper is now available here (ACM Digital Library).

Bridging Lightweight and Heavyweight Task Organization: The Role of Tags in Adopting New Task Categories

Our submission to the NIER (New Ideas and Emerging Results) track at ICSE 2010 has been accepted!

In this paper, Peggy and I explore the idea that the use of tags in task management systems could lead to the introduction of new task categories. This research was prompted by the observation that the software development team in one of our Jazz case studies used tags such as linux and windows to indicate that a particular task was specific to an operating system. During the replication of our study a year later, we observed that the team didn’t use these tags anymore; instead, the category Operating System had been made a field in every task.

Looking at this phenomenon in more detail, we found that there were other instances: Tags such as testing and selfhosting had been replaced by a How Found category, and tags such as eclipse, firefox and ie had been replaced by a Client category.
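
The replacements we observed can be written down as a simple mapping from tags to categories. The sketch below is purely illustrative: the mapping contains only the tags and categories named above, while the function `promote_tags` and the idea of using the tag itself as the field value are assumptions, not a description of how the team or Jazz actually migrated the data.

```python
# Tags and categories as observed in the case study.
TAG_TO_CATEGORY = {
    "linux": "Operating System",
    "windows": "Operating System",
    "testing": "How Found",
    "selfhosting": "How Found",
    "eclipse": "Client",
    "firefox": "Client",
    "ie": "Client",
}


def promote_tags(tags):
    """Split a task's tags into category fields and remaining free-form tags.

    Hypothetical helper: the tag text is reused as the field value.
    """
    fields = {}
    remaining = []
    for tag in tags:
        category = TAG_TO_CATEGORY.get(tag)
        if category is None:
            remaining.append(tag)  # no matching category, keep as a tag
        else:
            fields.setdefault(category, []).append(tag)
    return fields, remaining


# "polish" is an invented tag that has no corresponding category.
fields, remaining = promote_tags(["linux", "firefox", "polish"])
```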

Based on these preliminary insights, we pose the following research questions:

RQ: How can tagging play a role in bridging lightweight task management and heavyweight task management?

  1. What role do tags play in the adoption of new task categories?
  2. How can data on the use of tags help determine the right balance between lightweight and heavyweight task organization?
  3. How is software developers’ use of tags for task organization different from the tag use of users outside of software development?

We have focused on the first sub-question for the NIER paper, but the positive comments from our reviewers encouraged us to keep working on this research. Looking beyond Jazz, a very interesting case is given by the way labels are implemented in Google Code (see: http://code.google.com/p/support/wiki/IssueTracker#Labels). Their concept of labels is similar to social tagging in other task management systems. However, the Google Code issue tracker goes beyond basic labels to support key-value labels. Key-value labels contain one or more dashes, and the part before the first dash is considered to be a field name while the part after that dash is considered to be the value. For each project, a list of predefined labels and their meaning can be specified. Studying how developers make use of this approach is at the top of our to-do list.
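
The splitting rule for key-value labels described above can be sketched in a few lines. This follows only the rule as stated here (split at the first dash); it is not Google Code's actual implementation, and the function name and example labels are hypothetical.

```python
def parse_label(label):
    """Split a label at its first dash into (field, value).

    Labels without a dash are treated as basic labels and yield (None, label).
    """
    field, dash, value = label.partition("-")
    if not dash:
        return (None, label)
    return (field, value)


# Hypothetical labels, invented for illustration.
for label in ["Priority-High", "OpSys-Windows-XP", "Usability"]:
    print(label, "->", parse_label(label))
```

Only the first dash separates field and value, so any further dashes remain part of the value.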

This is the preliminary abstract of our NIER paper:

In collaborative software development projects, tasks are often used as a mechanism to coordinate and track shared development work. Modern development environments provide explicit support for task management where tasks are typically organized and managed through predefined categories. Although there have been many studies that analyze data available from task management systems, there has been relatively little work on the design of task management tools. In this paper we explore how tagging with freely assigned keywords provides developers with a lightweight mechanism to further categorize and annotate development tasks. We investigate how tags that are frequently used over a long period of time reveal the need for additional predefined categories of keywords in task management tool support. Finally, we suggest future work to explore how integrated lightweight tool features in a development environment may improve software development practices.

Update [June 6, 2010]: The paper is now available here (ACM Digital Library).

The Role of Emergent Knowledge Structures in Collaborative Software Development

One step closer to the PhD candidacy: My submission to the doctoral symposium at ICSE 2010 has been accepted!

In the research abstract, I propose to study the role of emergent knowledge structures in software development. We define emergent knowledge structures as “tangible ad hoc artifacts that are created as part of informal individual or collaborative processes in software development but that are not part of the software product”. Examples include comments on source code and tasks, wikis, blogging, micro-blogging and tagging. Due to the recent addition of Web 2.0 style collaboration functionality such as social tagging and wikis to software development environments, more and more of these knowledge structures “emerge”. There are usually no prescribed processes attached to their creation.

We have two main research questions:

  1. Which software development processes do lightweight emergent structures support?
  2. How can tools for software developers leverage the knowledge from lightweight emergent structures?

We expect the following main contributions from this research:

  • Identifying the software development processes that are supported by lightweight emergent structures.
  • Characterizing the knowledge that lightweight emergent structures add to a software project.
  • Improving tool support for lightweight emergent structures.

Our recent papers on tagging, dashboards and feeds and the tool ConcernLines as well as our Web2SE workshop all fit into this research, and we are looking forward to conducting more exciting research on this topic.

This is the preliminary version of the research abstract:

The focus of software development tools has shifted towards team-aware tools that explicitly support communication and cooperation. Many of these collaboration features draw on lightweight technologies that are associated with Web 2.0 such as social tagging or wikis. We propose to study the role of emergent knowledge structures that are created through lightweight collaboration in software development. Using a mixed-methods approach, we aim to investigate which software development processes are supported by emergent knowledge structures and how tool support can leverage these structures. Our goal is to assist both managers and developers in their decisions about using lightweight collaboration tools in their software development projects.

Update [June 6, 2010]: The paper is now available here (ACM Digital Library).

Awareness 2.0: Staying Aware of Projects, Developers and Tasks using Dashboards and Feeds — ICSE 2010

Our paper on Awareness 2.0 in software development has been accepted at ICSE 2010!

In this paper, Peggy and I describe the results of an empirical study we conducted in the summer of 2009 on the role of dashboards and feeds in collaborative software development. The developers in our study used IBM’s Jazz. We were surprised to find that dashboards and feeds become particularly important in critical project phases, such as the last few weeks before a release. Software developers rely on them for task prioritization, for the identification of bottlenecks, and for staying aware of other projects, developers and their tasks. As with our previous study on the role of tags in software development, we collected both qualitative and quantitative data and focused on the how and why of developers’ tool use.

We’re looking forward to presenting this work in South Africa in May 2010.

This is the preliminary abstract:

Software development teams need to maintain awareness of various different aspects ranging from overall project status and process bottlenecks to current tasks and incoming artifacts. Currently there is a lack of theoretical foundations to guide tool selection and tool design to best support awareness tasks. In this paper, we explore how the combination of highly configurable project, team and contributor dashboards along with individual event feeds is used to accomplish extensive awareness. Our results stem from an empirical study of several large development teams, with a detailed study of a team of 150 developers and additional data from another four project teams. We present how dashboards become pivotal to task prioritization in critical project phases and how they stir competition while feeds are used for short term planning. Our findings indicate that the distinction between high-level and low-level awareness is often unclear and that integrated tooling could improve development practices.

Update [June 6, 2010]: The paper is now available here (ACM Digital Library).

Web2SE: First Workshop on Web 2.0 for Software Engineering at ICSE 2010

http://sites.google.com/site/web2se/

Web 2.0 technologies such as wikis, blogs, tags and feeds have been adopted and adapted by software engineers. With Web2SE, we aim to provide a venue for pertinent work by highlighting current state-of-the-art research, by identifying future research directions, and by discussing implications of Web 2.0 on software engineering.

Web2SE has been accepted as a workshop at ICSE 2010 and is organized by Margaret-Anne (Peggy) Storey, Arie van Deursen, Kate Ehrlich and myself.

We will accept research papers (max. 6 pages) and poster papers as well as position papers (max. 2 pages). The final version of the accepted papers will be published in the ICSE Companion and will also be made available during the workshop. The deadline for submission is January 31, 2010 (Call for papers). We look forward to reading your papers.

Here’s the abstract:

Social software is built around an “architecture of participation” where user data is aggregated as a side-effect of using Web 2.0 applications. Web 2.0 implies that processes and tools are socially open, and that content can be used in several different contexts. Web 2.0 tools and technologies support interactive information sharing, data interoperability and user centered design. For instance, wikis, blogs, tags and feeds help us organize, manage and categorize content in an informal and collaborative way. One goal of this workshop is to investigate how these technologies can improve software development practices. Some of these technologies have made their way into collaborative software development processes such as Agile and Scrum, and in development platforms such as Rational Team Concert which draw their inspiration from Web 2.0. These processes and environments are just scratching the surface of what can be done by incorporating Web 2.0 approaches and technologies into collaborative software development. This workshop aims to improve our understanding of how Web 2.0, manifested in technologies such as mashups or dashboards, can change the culture of collaborative software development.

Moving to WordPress

Welcome to my new website, now powered by WordPress!

I’ve decided to move to WordPress for three reasons: One, it gave me an opportunity to clean up the HTML of my website content, which had become messy from mixing HTML editing and visual editing. Two, by having this website in the form of a blog, I’ll be able to keep you all updated on my latest activities. Three, after recently purchasing a new smartphone, I appreciate the explicit support that WordPress provides for mobile browsing.