After focusing on wikis, blogs and microblogs in the last post, in this entry I’ll summarize the results of my (non-systematic) literature review on what we know about the use of tags, feeds, and social networks by software developers.
Because definitions of what exactly a tag is vary, we have defined the term in our recent TSE paper [»]: A tag is a freely-chosen keyword or term that is associated with or assigned to a piece of information. In the context of software development, tags are used to annotate resources such as source files or test cases in order to support the process of finding these resources. Multiple tags can be assigned to one resource.
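To make the definition concrete, here is a minimal sketch of a tag index in Python. It is purely illustrative and not taken from the paper or from any of the tools discussed below; the `TagIndex` class, its method names, and the file paths are hypothetical. It assumes that tags are case-insensitive free-form strings and that a resource is identified by its path.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index: many tags per resource, many resources per tag."""

    def __init__(self):
        self._resources_by_tag = defaultdict(set)
        self._tags_by_resource = defaultdict(set)

    def tag(self, resource, *tags):
        """Assign one or more freely chosen tags to a resource."""
        for t in tags:
            key = t.strip().lower()  # assumption: tags are case-insensitive
            self._resources_by_tag[key].add(resource)
            self._tags_by_resource[resource].add(key)

    def find(self, tag):
        """Return all resources carrying the given tag, sorted by name."""
        return sorted(self._resources_by_tag.get(tag.strip().lower(), set()))

    def tags_of(self, resource):
        """Return all tags assigned to a resource, sorted alphabetically."""
        return sorted(self._tags_by_resource.get(resource, set()))


index = TagIndex()
index.tag("src/Parser.java", "parser", "needs-review")
index.tag("test/ParserTest.java", "parser")

print(index.find("parser"))
print(index.tags_of("src/Parser.java"))
```

Keeping both directions of the mapping reflects the two things the definition asks for: finding resources through a tag, and assigning several tags to one resource.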
The concept of annotating resources is not new to software development. At IWPC 2004, Ahmed Hassan and Richard Holt presented an approach that recovers information from source control systems and attaches this information to the static dependency graph of a software system [»]. They refer to this attached information as source sticky notes and show that the sticky notes can help developers understand the architecture of large software systems. A complementary approach that employs annotations edited by humans rather than automatically generated annotations was presented by Andrea Brühlmann and colleagues at Models 2008 [»]. They propose a generic approach to capture informal human knowledge in the form of annotations during the reverse engineering process. Annotation types in their tool Metanool can be iteratively defined, refined and transformed, without requiring a fixed meta-model to be defined in advance. This strength of annotations, the ability to refine them iteratively, is also employed by the tool BITKit presented by Harold Ossher and colleagues [»]. In BITKit, tags are used to identify and organize concerns during pre-requirements analysis. The resulting tag structures can then be hardened into classifications to capture important concerns.
There is a considerable body of work on the use of annotations in source code. The use of task annotations in Java source code such as TODO, FIXME or HACK was studied by Margaret-Anne Storey and colleagues [»]. They describe how task management is negotiated between the more formal issue tracking systems and the informal annotations that developers write within their source code. They found that task annotations in source code have different meanings and are dependent on individual, team and community use. Based on this research, the tool TagSEA was developed [»]. TagSEA is a framework for tagging locations of interest within the Eclipse IDE, and it adds semantic information to annotations. The tool combines the concept of tagging and geographic navigation to make it easier to find information. TagSEA was evaluated in two longitudinal empirical studies that indicated that TagSEA was used to support reminding and refinding [»]. TagSEA was extended to include a Tours feature that allows programmers to give live technical presentations that combine static slides with dynamic content based on TagSEA annotations in the IDE [»].
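As a rough illustration of what task annotations look like in practice, the following Python sketch collects TODO, FIXME and HACK comments from a Java source string. It is not the method used in the study above and not how TagSEA works; the function name, the regular expression, and the sample source are assumptions made for this example, and the pattern only covers simple `//`, `/*` and `*` comment prefixes.

```python
import re

# Assumption: an annotation is a comment marker followed by one of three keywords.
TASK_PATTERN = re.compile(r"(?://|/\*|\*)\s*(TODO|FIXME|HACK)\b[:\s]*(.*)")


def find_task_annotations(source):
    """Return (line number, keyword, text) for each task annotation found."""
    results = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = TASK_PATTERN.search(line)
        if match:
            text = match.group(2).strip()
            if text.endswith("*/"):  # drop the closing marker of a block comment
                text = text[:-2].strip()
            results.append((line_no, match.group(1), text))
    return results


sample = """\
public class Parser {
    // TODO: handle empty input
    int parse(String s) {
        return s.length(); /* FIXME off by one? */
    }
}
"""

for line_no, keyword, text in find_task_annotations(sample):
    print(f"{line_no}: {keyword} - {text}")
```

A list like this is only the raw material: as the study points out, what a given TODO or HACK means depends on individual, team and community conventions.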
TagSEA has been applied by other researchers for more advanced use cases. Quentin Boucher and colleagues describe how they have used source code tagging to identify features in the source code [»]. These tags are then used to prune source code for a pragmatic approach to software product line management. In eMoose, presented by Uri Dekel and James Herbsleb [»], developers can associate annotations or tag directives within API documentation. eMoose then pushes these annotations to the context of the invoking code by decorating the calls and augmenting the hover mechanism. The tool was evaluated in a lab study that demonstrated the directive awareness problem in traditional documentation use and the potential benefits of the eMoose approach [»].
In our own work, we have studied the role of tags in task management systems. In our initial study, we examined how tagging had been adopted and adapted in the work item tracking system of IBM’s Rational Team Concert IDE [»]. We found that the tagging mechanism was eagerly adopted by the team in our study, and that it had become a significant part of many informal processes. We classified the tags into four different categories: lifecycle-related, component-specific, cross-cutting, and idiosyncratic. When we replicated the study a year later with a different team, we were able to refine some of the categories and we found that tags were used to support finding of tasks, articulation work and information exchange. Implicit and explicit mechanisms had evolved to manage the tag vocabulary [»]. Our study was also replicated by Fabio Calefato and colleagues [»]. They confirmed our initial findings, and they discovered two additional tag categories, namely IDE-specific tags and divorced tags, i.e. tags that could only be interpreted as part of a compound name.
Closely related to the concept of tagging is the idea of social bookmarking, a method for organizing, storing, managing and searching bookmarks online. Such bookmarks are often organized using tags. Anja Guzzi and colleagues present Pollicino, a tool for collective code bookmarking. Their implementation was based on requirements gathered through an online survey and interviews with software developers. A user study found that Pollicino can be effectively used to document developers’ findings so that other developers can use them. This research will be presented at ICPC this year [»].
Feeds in software development are used to achieve awareness in collaborative development settings. As we found in a study published at ICSE 2010, feeds allow the tracking of work at a small scale, i.e. on a per-bug or per-commit basis [»]. The feed reader then becomes a personal inbox for development tasks that helps developers plan their day. However, in these kinds of feeds, information overload quickly becomes a problem. Thomas Fritz proposed an approach that integrates dynamic and static information in a development environment to allow developers to continuously monitor the relevant information in the context of their work [»]. In a CHI paper from this year, Thomas Fritz and Gail Murphy give more details on their concept [»]: Through a series of interviews, they identified four factors that help developers determine the relevance of events in feeds: content, target of content, relation with the creator, and previous interactions.
Fabio Calefato and colleagues proposed a tool to embed social network information about distributed co-workers into IBM’s Jazz [»]. They used the FriendFeed aggregator to enhance the current task-focused feed implementation with additional awareness information to foster the building of organizational values, attitudes, and inter-personal relationships.
Research on social networking related to software development can be divided into two subgroups: understanding the network structure in existing projects, and transferring ideas from social networking sites such as Facebook into software development. Since the former isn’t really part of the adoption of Web 2.0 in software development, I’ll focus on the latter here. (For an overview of social networks mined from development projects, see the work of Christian Bird and colleagues from MSR 2006 [»] or from FSE 2008 [»] as well as Walt Scacchi’s FSE 2007 paper [»].)
Concepts from social networking sites such as Facebook have been brought into software development through the Codebook project by Andrew Begel and colleagues [»]. Codebook is a social networking service in which developers can be friends not only with other developers but also with work artifacts. These connections then allow developers to keep track of task dependencies and to discover and understand connections in source code. Codebook was inspired by a survey of developers that found that the most impactful problems concerned finding and keeping track of other developers [»]. Based on the Codebook framework, Andrew Begel and colleagues also introduced a newsfeed [»] and a search portal [»].