Effective Communication of Software Development Knowledge Through Community Portals

Effective management and exchange of knowledge is key in every software organization. Although various forms of documentation exist, software projects encounter many knowledge management challenges: How should knowledge be distributed? How should knowledge be kept up to date? How should feedback be solicited? How should knowledge be organized for easy access?

There is no roadmap for what kind of information is best presented in a given artifact, and new forms of documentation, such as wikis and blogs, have evolved. Unlike more formal mechanisms, wikis and blogs are easy to create and maintain. However, they do not offer the same authoritativeness as traditional documentation, and they can become outdated and less concise over time. While the informality of a wiki page or blog post is sometimes enough, users often expect reviewed technical articles.

One mechanism that brings various communication channels together is the web or community portal. Web portals are not only used in software communities; they are also essential to companies such as Amazon.com, eBay, or TripAdvisor, where they enable the development of communities around products. Similarly, many of today’s software projects wish to solicit feedback and input from a broader community of users, beta-testers, and stakeholders. Examples of community portals in software organizations include Microsoft’s MSDN and IBM’s Jazz.

In a paper that Peggy and I will present at ESEC/FSE 2011 in Szeged, Hungary, we report on an empirical study of the community portal for IBM’s Jazz: jazz.net. Using grounded theory, we developed a model that characterizes documentation artifacts in a community portal along the following eight dimensions:

  • Content: the type of content typically presented in the artifact.
  • Audience: the audience for which the artifact is intended.
  • Trigger: the motivation that triggers the creation of a new artifact.
  • Collaboration: the extent of collaboration during the creation of a new artifact.
  • Review: the extent to which new artifacts are reviewed before publication.
  • Feedback: the extent to which readers can give feedback.
  • Fanfare: the amount of fanfare with which a new artifact is released.
  • Time Sensitivity: the time sensitivity of information in the artifact.

In our study, we focused on four kinds of artifacts: the official product documentation accessible through the web and the product help menus, technical articles available in the jazz.net library, the Jazz team blog, and the team wiki.

These are some of our main findings:

  • Content in wiki pages is often stale. Therefore, readers will not look at the wiki for reliable information, but rather use it as a backup option if information is not available elsewhere. To communicate important information to the community, articles and blog posts are better suited.
  • The official product documentation is reviewed rigorously. With that in mind, it can serve as the most reliable way to communicate knowledge in a community portal.
  • When content is produced by a separate documentation team, updates may not be feasible. In such a case, information may become outdated quickly and can only be fixed with a new product release.
  • New blog posts create more “buzz” or fanfare than articles or wiki pages. Thus, if there is a need to make a project’s community aware of something, a blog post may be the best-suited medium.
  • Writing can be time-consuming, in particular for technical articles and blog posts. In addition, those media forms may need to undergo a review process. To get content out quickly, the wiki may be the best solution. However, readers may only find wiki pages if they are pointed to them explicitly.
  • To solicit feedback from readers, articles and blog posts typically offer more comment functionality than the official product documentation or the wiki.

One of the goals of our research is to provide advice to managers and developers on how to effectively use a community portal. Based on the findings from our study, we make the following recommendations:

  • Make content available, but clearly distinguish different media forms.
  • Move content into more formal media formats where appropriate.
  • Be aware of the implications of different media artifacts and channels.
  • Offer developers a medium with a low entry barrier for quick externalization of knowledge.
  • Involve the community in a project’s documentation.
  • Provide readers with an option to give feedback.

A pre-print of the paper is available here.
(© ACM, 2011. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in the Proceedings of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE) 2011.)

This is the abstract of the paper:

Knowledge management plays an important role in many software organizations. Knowledge can be captured and distributed using a variety of media, including traditional help files and manuals, videos, technical articles, wikis, and blogs. In recent years, web-based community portals have emerged as an important mechanism for combining various communication channels. However, there is little advice on how they can be effectively deployed in a software project.

In this paper, we present a first study of a community portal used by a closed source software project. Using grounded theory, we develop a model that characterizes documentation artifacts along several dimensions, such as content type, intended audience, feedback options, and review mechanisms. Our findings lead to actionable advice for industry by articulating the benefits and possible shortcomings of the various communication channels in a knowledge-sharing portal. We conclude by suggesting future research on the increasing adoption of community portals in software engineering projects.


What we know about Web 2.0 in Software Engineering — Part 2: Tags, feeds, and social networks

After focusing on wikis, blogs and microblogs in the last post, in this entry I’ll summarize the results of my (non-systematic) literature review on what we know about the use of tags, feeds, and social networks by software developers.

Tags

Because the term tag has been used inconsistently, we defined it in our recent TSE paper [»]: a tag is a freely chosen keyword or term that is associated with or assigned to a piece of information. In the context of software development, tags are used to annotate resources such as source files or test cases in order to support the process of finding these resources. Multiple tags can be assigned to one resource.
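As a toy illustration of this definition (the class and method names below are mine, not from the paper), a tag mechanism is essentially a many-to-many index between resources and freely chosen keywords:

```python
from collections import defaultdict

# Hypothetical sketch of the tagging model defined above: a resource
# can carry multiple freely chosen tags, and tags support finding
# resources. Names are illustrative only.
class TagIndex:
    def __init__(self):
        self._tags_of = defaultdict(set)       # resource -> tags
        self._resources_of = defaultdict(set)  # tag -> resources

    def tag(self, resource, *tags):
        """Assign one or more tags to a resource."""
        for t in tags:
            self._tags_of[resource].add(t)
            self._resources_of[t].add(resource)

    def find(self, tag):
        """Return all resources annotated with the given tag."""
        return set(self._resources_of.get(tag, ()))

index = TagIndex()
index.tag("src/Parser.java", "needs-review", "parser")
index.tag("test/ParserTest.java", "parser")
print(sorted(index.find("parser")))
# prints ['src/Parser.java', 'test/ParserTest.java']
```

The two mirrored dictionaries make both directions cheap: listing a resource’s tags and finding all resources that share a tag.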

The concept of annotating resources is not new to software development. At IWPC 2004, Ahmed Hassan and Richard Holt presented an approach that recovers information from source control systems and attaches this information to the static dependency graph of a software system [»]. They refer to this attached information as source sticky notes and show that the sticky notes can help developers understand the architecture of large software systems. A complementary approach, which employs annotations edited by humans rather than automatically generated annotations, was presented by Andrea Brühlmann and colleagues at Models 2008 [»]. They propose a generic approach to capture informal human knowledge in the form of annotations during the reverse engineering process. Annotation types in their tool Metanool can be iteratively defined, refined, and transformed, without requiring a fixed meta-model to be defined in advance. This strength of annotations, the ability to refine them iteratively, is also exploited by the tool BITKit presented by Harold Ossher and colleagues [»]. In BITKit, tags are used to identify and organize concerns during pre-requirements analysis. The resulting tag structures can then be hardened into classifications to capture important concerns.

There is a considerable body of work on the use of annotations in source code. The use of task annotations in Java source code such as TODO, FIXME or HACK was studied by Margaret-Anne Storey and colleagues [»]. They describe how task management is negotiated between the more formal issue tracking systems and the informal annotations that developers write within their source code. They found that task annotations in source code have different meanings and are dependent on individual, team and community use. Based on this research, the tool TagSEA was developed [»]. TagSEA is a framework for tagging locations of interest within the Eclipse IDE, and it adds semantic information to annotations. The tool combines the concept of tagging and geographic navigation to make it easier to find information. TagSEA was evaluated in two longitudinal empirical studies that indicated that TagSEA was used to support reminding and refinding [»]. TagSEA was extended to include a Tours feature that allows programmers to give live technical presentations that combine static slides with dynamic content based on TagSEA annotations in the IDE [»].
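Because task annotations such as TODO, FIXME, or HACK are plain comment conventions, they can be harvested with a simple scan. The following is a minimal, hypothetical sketch (not TagSEA or the tooling from these studies), limited to `//` line comments for simplicity:

```python
import re

# Matches task annotations such as TODO, FIXME, or HACK in `//` line
# comments, capturing the keyword and the trailing note. A simplified,
# hypothetical scanner -- not the TagSEA tooling described above.
ANNOTATION = re.compile(r"//\s*(TODO|FIXME|HACK)\b:?\s*(.*)")

def find_annotations(source):
    """Yield (line_number, keyword, note) for each task annotation."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            yield lineno, match.group(1), match.group(2).strip()

java = """\
int total = 0;  // TODO: handle overflow
// FIXME races with the writer thread
process(total);
"""
print(list(find_annotations(java)))
# prints [(1, 'TODO', 'handle overflow'), (2, 'FIXME', 'races with the writer thread')]
```

A real tool would also need to handle block comments and string literals, which is where IDE support such as TagSEA goes beyond a regular-expression pass.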

TagSEA has been applied by other researchers for more advanced use cases. Quentin Boucher and colleagues describe how they have used source code tagging to identify features in the source code [»]. These tags are then used to prune source code for a pragmatic approach to software product line management. In eMoose, presented by Uri Dekel and James Herbsleb [»], developers can associate annotations or tag directives within API documentation. eMoose then pushes these annotations to the context of the invoking code by decorating the calls and augmenting the hover mechanism. The tool was evaluated in a lab study that demonstrated the directive awareness problem in traditional documentation use and the potential benefits of the eMoose approach [»].

In our own work, we have studied the role of tags in task management systems. In our initial study, we examined how tagging had been adopted and adapted in the work item tracking system of IBM’s Rational Team Concert IDE [»]. We found that the tagging mechanism was eagerly adopted by the team in our study, and that it had become a significant part of many informal processes. We grouped the tags into four categories: lifecycle-related, component-specific, cross-cutting, and idiosyncratic. When we replicated the study a year later with a different team, we were able to refine some of the categories, and we found that tags were used to support the finding of tasks, articulation work, and information exchange. Implicit and explicit mechanisms had evolved to manage the tag vocabulary [»]. Our study was also replicated by Fabio Calefato and colleagues [»]. They confirmed our initial findings, and they discovered two additional tag categories, namely IDE-specific tags and divorced tags, i.e., tags that could only be interpreted as part of a compound name.

Closely related to the concept of tagging is the idea of social bookmarking, a method for organizing, storing, managing, and searching bookmarks online. Such bookmarks are often organized using tags. Anja Guzzi and colleagues present Pollicino, a tool for collective code bookmarking. Their implementation was based on requirements gathered through an online survey and interviews with software developers. A user study found that Pollicino can effectively document a developer’s findings so that other developers can use them. This research will be presented at ICPC this year [»].

Feeds

Feeds in software development are used to achieve awareness in collaborative development settings. As we found in a study published at ICSE 2010, feeds allow the tracking of work at a small scale, i.e., on a per-bug or per-commit basis [»]. The feed reader then becomes a personal inbox for development tasks that helps developers plan their day. However, in these kinds of feeds, information overload quickly becomes a problem. Thomas Fritz proposed an approach that integrates dynamic and static information in a development environment to allow developers to continuously monitor the relevant information in the context of their work [»]. In a CHI paper from this year, Thomas Fritz and Gail Murphy give more details on their concept [»]: through a series of interviews, they identified four factors that help developers determine the relevancy of events in feeds: content, target of content, relation with the creator, and previous interactions.
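Purely as a hypothetical sketch of how such factors might be combined (the weights and field names below are invented, not Fritz and Murphy’s model), a feed event could be scored for relevancy like this:

```python
# Entirely hypothetical relevancy score for a feed event, loosely
# following the four factors named above: content, target of the
# content, relation with the creator, and previous interactions.
def relevancy(event, my_keywords, my_artifacts, my_team, my_history):
    score = 0.0
    if any(k in event["text"] for k in my_keywords):      # content
        score += 1.0
    if event["artifact"] in my_artifacts:                 # target of content
        score += 1.0
    if event["author"] in my_team:                        # relation with creator
        score += 0.5
    score += 0.25 * my_history.get(event["artifact"], 0)  # previous interactions
    return score

event = {"text": "Fixed NPE in parser", "artifact": "bug-1234", "author": "alice"}
print(relevancy(event, {"parser"}, {"bug-1234"}, {"alice"}, {"bug-1234": 2}))
# prints 3.0
```

A feed reader could then sort or filter events by such a score to mitigate the information overload described above.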

Fabio Calefato and colleagues proposed a tool to embed social network information about distributed co-workers into IBM’s Jazz [»]. They used the FriendFeed aggregator to enhance the current task focused feed implementation with additional awareness information to foster the building of organizational values, attitudes, and inter-personal relationships.

Social networking

Research on social networking related to software development can be distinguished into two subgroups: Understanding the network structure in existing projects, and transferring ideas from social networking sites such as facebook into software development. Since the former isn’t really part of the adoption of Web 2.0 in software development, I’ll focus on the latter here. (For an overview of social networks mined from development projects, see the work of Christian Bird and colleagues from MSR 2006 [»] or from FSE 2008 [»] as well as Walt Scacchi’s FSE 2007 paper [»]).

Concepts from social networking sites such as facebook have been brought into software development through the Codebook project by Andrew Begel and colleagues [»]. Codebook is a social networking service in which developers can be friends not only with other developers but also with work artifacts. These connections then allow developers to keep track of task dependencies and to discover and understand connections in source code. Codebook was inspired by a survey with developers that found that the most impactful problems concerned finding and keeping track of other developers [»]. Based on the Codebook framework, Andrew Begel and colleagues also introduced a newsfeed [»] and a search portal [»].

What we know about Web 2.0 in Software Engineering — Part 1: Wikis, blogs, and microblogs

Web 2.0 technologies such as wikis, blogs, tags and feeds have been adopted and adapted by software engineers. A week before our second workshop on Web 2.0 for Software Engineering (Web2SE), I’ve conducted a (non-systematic) literature review to find out what we already know about the use of Web 2.0 by software developers. In this first of two blog posts, I’ll focus on wikis, blogs and micro-blogs.

Wikis

One of the first articles describing the advantages and disadvantages of wikis in software development was authored by Panagiotis Louridas for IEEE Software in 2006 [»]. He explains that wikis offer flexibility that relieves project managers from having to get everything perfect from the beginning. Wikis are easy to change, but due to their flexibility, they require tolerance and openness in the organization that they are implemented in.

So far, the use of wikis by software developers has been studied in four main areas: documentation, requirements engineering, collaboration, and knowledge sharing.

Wikis cater to many of the patterns for consistent software documentation presented by Filipe Correia and colleagues [»]: information proximity, co-evolution, domain-structured information, and integrated environments. One major advantage of using wikis for documentation is the ability to combine different kinds of content into a single document. For example, Ademar Aguiar and colleagues introduce the XSDoc wiki [»] to enable the combination of different kinds of software artifacts in a wiki structure. However, in a recent paper, Barthélémy Dagenais and Martin Robillard found that open source developers who originally selected a public wiki to host their documentation eventually moved to a more controlled infrastructure because of maintenance costs and a decrease in authoritativeness [»].

The flexibility of wikis is well-suited for a process with as much uncertainty as requirements engineering [»]. Steffen Lohmann and colleagues present a wiki-based platform that implements several community-oriented features aimed at engaging a larger group of stakeholders in collecting, discussing, developing, and structuring software requirements [»]. David Ferreira and Alberto Rodrigues da Silva also present an approach for enhancing requirements engineering through wikis. Their approach focuses on the integration of a wiki into other software development tools [»]. Semantic wikis that enhance traditional wikis with ontologies offer the opportunity to further process wiki content. For example, Peng Liang and colleagues present an approach for requirements reasoning using semantic wikis [»].

Khalid Rames Al-asmari and Liguo Yu report on early experiences that show how wikis can support various collaboration tasks. They describe that wikis are easy to use, reliable, and inexpensive compared to other communication methods [»]. Several wiki-based tools aimed at supporting collaboration in software development have been proposed. Ken Bauer and colleagues present WikiDev 2.0, a wiki implementation that integrates information about various artifacts, clusters them, and presents views that cross tool boundaries. Annoki, a tool from the same research group presented by Brendan Tansey and Eleni Stroulia at Web2SE 2010 [»], supports collaboration by improving the organization, access, creation, and display of content stored on the wiki. Galaxy Wiki, presented by WenPeng Xiao and colleagues, takes the integration of wikis into software development processes a step further [»]: developers write source code in the form of a wiki page and are able to compile, execute, and debug programs in wiki pages. Wikigramming by Takashi Hattori follows a similar approach [»]: each wiki page contains a Scheme function that is executed on a server. Users can edit any function at any time and see the results of their changes right away. Wikigramming will be presented at Web2SE 2011.

Closely related to the use of wikis for collaboration is their use for knowledge sharing. Ammy Phuwanartnurak and David Hendry report preliminary findings on information sharing that they discovered through the analysis of wiki logs [»]. They found that many wiki pages were co-authored, and that wiki participation differs by role. Viktor Clerc and colleagues found that wikis are particularly good for managing and sharing architectural knowledge [»]. Another paper, by Jörg Rech and colleagues, describes that the knowledge sharing enabled by wikis allows for increased reuse of software artifacts of different kinds [»]. Filipe Correia and colleagues propose Weaki, a weakly-typed wiki prototype designed to support incremental formalization of structured knowledge in software development [»]. Semantic wikis also play a role in knowledge sharing: Björn Decker and colleagues propose support for structuring the knowledge on wikis in the form of semantic wikis [»]. A more recent paper by Walid Maalej and colleagues provides a survey of the state of the art of semantic wikis [»]. They conclude that semantic wikis provide lightweight, incremental, and machine-readable software knowledge articulation and sharing facilities.

Further applications of wikis in software engineering include traceability and rationale management as described by Michael Geisser and colleagues [»], and the role of wiki-based editing functionalities on Q&A websites as we describe in our recent ICSE NIER paper [»].

Blogs

The role of blogs in software development is not as well understood as the role of wikis. The most comprehensive study to date of the use of blogs by software developers was conducted by Dennis Pagano and Walid Maalej and will be published at MSR this year [»]. The authors found that the most popular topics developers blog about are high-level concepts such as functional requirements and domain concepts, and that developers are more likely to blog after corrective engineering and management activities than after forward engineering and re-engineering. An earlier study from the same research group, by Walid Maalej and Hans-Jörg Happel, looks at how software developers describe their work [»]. They found that blogs and other media are a step towards diaries for software developers.

The theme of requirements engineering through blogs was also addressed by Shelly Park and Frank Maurer [»]. They discuss strategies for gathering requirements information through blogs. However, as observed by Norbert Seyff and colleagues, end-user involvement in software development is an ambivalent topic [»]. They present a mobile requirements elicitation tool that enables end-users to blog their requirements in situ without facilitation by analysts.

In a recent study that I conducted with Chris Parnin for Web2SE 2011, we found that blogs also play a role in documentation [»]. By analyzing the Google results for API calls of the jQuery API, we found that 87.9% of the API methods were covered by blogs, mainly featuring tutorials and personal experiences about those API methods.

Microblogs

The use of microblogging services such as twitter was first studied by Sue Black and colleagues for Web2SE 2010 [»]. Most respondents in their survey reported using several social media tools, and twitter as well as instant messaging were among the most popular tools. A more detailed study on the use of twitter by software developers was authored by Gargi Bougie and colleagues and will be published at Web2SE 2011 [»]. They found that software developers use twitter’s capabilities for conversation and information sharing extensively and that the use of twitter by software developers notably differs between users from different projects.

Several researchers have started to integrate microblogging services into development environments. Wolfgang Reinhardt presents an Eclipse-based prototype for integrating twitter and other microblogging services into the IDE [»]. Adinda, a browser-based IDE introduced by Arie van Deursen and colleagues [»], also includes microblogging for traceability purposes, in particular aimed at tracking which requirements are responsible for particular design decisions, code fragments, test cases, etc. A more detailed description of the microblogging integration envisioned by this research group is given by Anja Guzzi and colleagues [»]. They present an approach that combines microblogs with interaction data collected from the IDE. Participants in their study used the microblogging tool to communicate future intentions, ongoing activities, reports about the past, comments, and todos.

How do Programmers Ask and Answer Questions on the Web?

Stackoverflow.com is a popular Q&A website featuring questions and answers on a wide range of programming topics. In the two years since its founding in 2008, more than 1 million questions have been asked on Stack Overflow, and more than 2.5 million answers have been provided.

Stack Overflow — like many other Q&A websites such as Yahoo! Answers, Answers.com or the recently created Facebook Questions — is founded on the success of social media and built around an “architecture of participation” where user data is aggregated as a side-effect of using Web 2.0 applications.

To understand the role of Q&A websites in the software documentation landscape, Ohad Barzilay, Peggy, and I wrote a paper in which we pose research questions and report preliminary results, using qualitative and quantitative research methods. The paper was just accepted at the NIER (New Ideas and Emerging Results) track of ICSE 2011 in Hawaii.

We pose the following five research questions:

  1. What kinds of questions are asked on Q&A websites for programmers?
  2. Which questions are answered and which ones remain unanswered?
  3. Who answers questions and why?
  4. How are the best answers selected?
  5. How does a Q&A website contribute to the body of software development knowledge?

For the NIER paper, we focused on the first two questions. We created a script to extract questions along with all answers, tags and owners using the Stack Overflow API. We then analyzed quantitative properties of questions, answers and tags, and we applied qualitative codes to a sample of tags and questions.
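As a rough sketch of this kind of quantitative pass (the field names below mirror the public Stack Exchange API’s question objects, but this is an illustration, not our actual extraction script), tallying unanswered questions and tag frequencies from an API-style JSON payload might look like:

```python
from collections import Counter

# Hypothetical sketch: summarize a Stack Exchange-style JSON payload.
# The field names (items, tags, is_answered) mirror the public API,
# but this is an illustration, not the script used in the paper.
def summarize(payload):
    """Return (unanswered question count, tag frequency Counter)."""
    tag_counts = Counter()
    unanswered = 0
    for question in payload["items"]:
        tag_counts.update(question.get("tags", []))
        if not question.get("is_answered", False):
            unanswered += 1
    return unanswered, tag_counts

sample = {"items": [
    {"title": "How to center a div?", "tags": ["css", "html"], "is_answered": True},
    {"title": "NullPointerException in loop", "tags": ["java"], "is_answered": False},
]}
unanswered, tag_counts = summarize(sample)
print(unanswered, tag_counts["java"])
# prints 1 1
```

The qualitative coding of tags and questions mentioned above was done by hand on a sample; only counts like these lend themselves to scripting.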

Our preliminary findings indicate that Stack Overflow is particularly effective at code reviews, for conceptual questions and for novices. The most common questions include how-to questions and questions about unexpected behaviors.

Understanding the processes that lead to the creation of knowledge on Q&A websites will enable us to make recommendations on how individuals and companies, as well as tools for programmers, can leverage the knowledge and use Q&A websites effectively. It will also shed light on the credibility of documentation on these websites.

A pre-print of the paper is available here.
(© ACM, 2011. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in the Proceedings of the International Conference on Software Engineering (ICSE) 2011.)

This is the abstract of the paper:

Question and Answer (Q&A) websites, such as Stack Overflow, use social media to facilitate knowledge exchange between programmers and fill archives with millions of entries that contribute to the body of knowledge in software development. Understanding the role of Q&A websites in the documentation landscape will enable us to make recommendations on how individuals and companies can leverage this knowledge effectively. In this paper, we analyze data from Stack Overflow to categorize the kinds of questions that are asked, and to explore which questions are answered well and which ones remain unanswered. Our preliminary findings indicate that Q&A websites are particularly effective at code reviews and conceptual questions. We pose research questions and suggest future work to explore the motivations of programmers that contribute to Q&A websites, and to understand the implications of turning Q&A exchanges into technical mini-blogs through the editing of questions and answers.

Web2SE 2011: 2nd International Workshop on Web 2.0 for Software Engineering


Web 2.0 technologies such as wikis, blogs, tags and feeds have been adopted and adapted by software engineers. With Web2SE, we aim to provide a venue for pertinent work by highlighting current state-of-the-art research, by identifying future research directions, and by discussing implications of Web 2.0 on software engineering.

Following the success of the first Web2SE at ICSE 2010, Web2SE 2011 (the 2nd International Workshop on Web 2.0 for Software Engineering) has been accepted at ICSE 2011 in Honolulu, Hawaii. We welcome research papers (max. 6 pages) as well as poster and position papers (max. 2 pages) as submissions. The final version of the accepted papers will be published in the ICSE Proceedings. The deadline for submission is January 28, 2011 (Call for papers). We look forward to reading your papers!

Web2SE 2011 is organized by Margaret-Anne (Peggy) Storey, Arie van Deursen, Andrew Begel, Sue Black and myself. The workshop website is online at https://sites.google.com/site/Web2SE2011/.

You can also follow us on twitter and connect with us on facebook or on LinkedIn.

Here’s the abstract:

Social software is built around an “architecture of participation” where user data is aggregated as a side-effect of using Web 2.0 applications. Web 2.0 implies that processes and tools are socially open, and that content can be used in several different contexts. Web 2.0 tools and technologies support interactive information sharing, data interoperability and user centered design. For instance, wikis, blogs, tags and feeds help us organize, manage and categorize content in an informal and collaborative way. Some of these technologies have made their way into collaborative software development processes and development platforms. These processes and environments are just scratching the surface of what can be done by incorporating Web 2.0 approaches and technologies into collaborative software development. Web 2.0 opens up new opportunities for developers to form teams and collaborate, but it also comes with challenges for developers and researchers. Web2SE aims to improve our understanding of how Web 2.0, manifested in technologies such as mashups or dashboards, can change the culture of collaborative software development.

FlexiTools 2011

I’m on the Program Committee for the Workshop on Flexible Modeling Tools (FlexiTools) at ICSE 2011 in Honolulu, Hawaii.

The workshop addresses the problem that formal modeling tools (such as UML diagram editors) and more informal but flexible, free-form approaches (such as whiteboards or office tools) have complementary strengths and weaknesses. Whichever approach practitioners choose for a particular task, they lose the advantages of the other, with attendant frustration and loss of productivity. The goal of this workshop is to develop flexible modeling tools that blend the advantages of modeling tools and the more free-form approaches.

The focus of FlexiTools 2011 will be on challenge problems in the area of flexible modeling, with the goal of identifying a foundational set of challenges and concerns for the field of flexible modeling, as well as promising directions for addressing each.

Prospective participants are invited to submit 2-5 page position papers on any topic relevant to the dichotomy between modeling tools and more free-form approaches.

I’m looking forward to reading your papers! Submissions are due on February 7, 2011.

This is the workshop website: http://www.ics.uci.edu/~nlopezgi/flexitoolsICSE2011/

2010 – A year in travels

2010 has been a year spent on planes, ferries, buses, trains, and cars for me. I even reached the threshold for Air Canada Elite status a few months ago.

New Year’s Eve is a nice opportunity to look back on a year of travels, as I’m sitting at the ferry terminal south of Vancouver waiting for my last trip of the year — the ferry back to Victoria.

In terms of travels, 2010 started for me in February with a trip to Vancouver for the 2010 Winter Olympics. In March, I did a trip that combined a fun visit to Las Vegas, a talk at the University of California in Irvine, and a visit to my family in Germany. The next trip in April took me to IBM Toronto for the CAS University Days.

The highlight of the year was the trip to Cape Town, South Africa, via London in May for ICSE (see separate blog post here). Following ICSE, I spent a good three months at IBM in Ottawa, from late May until early September. During that time, I flew out to Vancouver for a weekend and did several weekend trips to Montreal. I also drove down to New York state for cross-border shopping, to Providence, Rhode Island, and the Boston area to visit friends, and to IBM Hawthorne to give a talk.

After returning to Canada’s west coast in September, I spent a week travelling around southern British Columbia with relatives from Germany and I attended SPLASH in Reno, CSER and CASCON in Toronto, and FSE in Santa Fe. The final trip of the year was a drive down to Seattle from Vancouver.

Here’s a Bing map as a summary: http://goo.gl/6c7th. I’m looking forward to another year of travels starting soon!

Work Item Tagging: Communicating Concerns in Collaborative Software Development

After a round of minor revisions, the extension of our ICSE 2009 paper on tagging in software development has been accepted for the TSE Special Issue of Best Papers from ICSE 2009.

For the TSE paper, Peggy and I replicated the previously reported case study on how software developers use tags for work items in two ways:

  1. We extended the study of the Jazz team in several ways; and
  2. We conducted another case study with a different team at IBM that was less likely to be biased towards using the tagging feature.

For the Jazz case study, we analyzed another year of archival data on the tagging activities, we observed the team and the tagging of work items for an additional five months on-site, and we conducted interviews with two additional team members. For the second case study, we observed a different group of developers for two weeks, conducted interviews with six individuals in various roles, and analyzed tagging data from a one-year period. Many of the original findings were replicated through these additional case studies, but the key differences in the extended paper are that:

  1. Our findings on the collaborative aspects of tagging are more comprehensive because of the extended ethnography and the eight additional interviews across both cases.
  2. The categorization for the types of tags used is treated much more thoroughly with new categories emerging from the analysis (many across both cases).

This is the abstract of the paper:

In collaborative software development projects, work items are used as a mechanism to coordinate tasks and track shared development work. In this paper, we explore how “tagging”, a lightweight social computing mechanism, is used to communicate matters of concern in the management of development tasks. We present the results from two empirical studies over 36 and 12 months respectively on how tagging has been adopted and what role it plays in the development processes of several professional development projects with more than 1,000 developers in total. Our research shows that the tagging mechanism was eagerly adopted by the teams, and that it has become a significant part of many informal processes. Different kinds of tags are used by various stakeholders to categorize and organize work items. The tags are used to support finding of tasks, articulation work and information exchange. Implicit and explicit mechanisms have evolved to manage the tag vocabulary. Our findings indicate that lightweight informal tool support, prevalent in the social computing domain, may play an important role in improving team-based software development practices.

Update [November 11, 2010]: The preprint of the paper is now available here (IEEE Computer Society).

Congress 2010 — An interdisciplinary experience in Montreal

As suggested by the outside member on my PhD committee — Ray Siemens from the Department of English at UVic — I attended a day of Congress 2010 in Montreal in early June.

Congress 2010 refers to the Congress of the Humanities and Social Sciences, an event held once a year at a Canadian university. The Congress is the premier destination for Canada’s scholarly community in the Humanities and Social Sciences. Congress 2010, held at Concordia University in Montreal, had about 9,000 attendees.

The main reason I attended was a panel discussion between Pierre Levy and Alan Liu, two of the leading minds in the field of new technologies and their impact on society. It was a bilingual conversation (truly Canadian, with simultaneous translation) called “Collective Intelligence or Silicon Cage?: Digital culture in the 21st century”. Levy argued that digital media can help us understand our knowledge as a society because it acts as a mirror of collective intelligence, while Liu warned that we run the risk of monotony and singularity, and that everything converges towards the same idea if we lack intermediary institutions between the individual and the universal. I was fortunate enough to have lunch with both panelists and could discuss some of my research with them. Collective intelligence and emergent knowledge structures in software development are closely related.

It was very stimulating and inspiring to attend an event from a different discipline, and I also found it really interesting to see how these events are organized in other disciplines. Some good ideas that we might be able to adapt for Software Engineering venues:

  • Use a mix of panel discussions and paper presentations, to foster a more interactive environment. We have controversial issues in Software Engineering as well.
  • Produce YouTube clips with highlights from every day, such as this one. It’s a great way to keep people who are unable to attend in the loop, and it captures the spirit of the event.
  • Choose a university campus as the venue, especially one that’s right downtown. It was great to be fully immersed in Montreal during the lunch breaks, with a huge selection of lunch places.
  • Use a different pricing model. I paid a total of 15 dollars to attend Congress 2010.
  • Be interdisciplinary. Meeting researchers from other disciplines can be very inspiring. It forces us to focus on the essence of our work, gives us the opportunity to find a broader perspective, and can lead to great ideas. When it comes to related work, I’m thinking 15th century now…

ICSE 2010 highlights

Now that the papers from ICSE 2010 are available in the ACM digital library (Volume 1, Volume 2, workshops), it’s time for a blog post about my personal highlights from ICSE 2010 in Cape Town, South Africa, in May 2010. This is of course very subjective, and it follows the tradition that Jorge Aranda started last year.


For me, ICSE 2010 started off with SUITE, the 2nd International Workshop on Search-Driven Development organized by Sushil Bajracharya, Adrian Kuhn, Joel Ossher and Yunwen Ye.

I was pleasantly surprised by how well the very discussion-focused format of SUITE worked. The paper presentations were short (5 minutes) and they were all done in the morning. That left quite a bit of time for discussion in the morning, and even more time for discussion in the afternoon. These discussions didn’t seem overly regulated, and it was great to see how topics emerged.

Topics of the workshop ranged from API search and immediate search in the IDE to dynamic filtering and Semantic Web. As discussion topics for the afternoon we selected IDE integration, developer needs, and the creation of a reference collection for SUITE researchers. It was great to see that developer needs turned out to be the most popular topic — a first sign that the ICSE community is focusing on human aspects more and more.


FlexiTools, the workshop on Flexible Modeling Tools, organized by Harold Ossher, André van der Hoek, Margaret-Anne Storey, John Grundy and Rachel Bellamy, covered a really interesting area. The workshop addressed the problem that formal modeling tools (such as UML diagram editors) and more informal but flexible, free-form approaches (such as whiteboards or office tools) have complementary strengths and weaknesses. The goal of the workshop is to develop flexible modeling tools that combine the advantages of both approaches.

The workshop was structured into three paper sessions and a concluding discussion session. The day started with requirements for flexible modeling tools, focusing on support for creative work, support for incremental work and changing conditions, support for alternatives, and support for capturing the evolution of models. In the following session on “Unstructured to Structured”, we discussed how unstructured informal models could incrementally be transformed into structured formal models. Several tools such as BITKit were demoed in the afternoon session on Tool Infrastructure.


Due to several double-bookings, I wasn’t able to attend all of the working conference on Mining Software Repositories, but I did manage to present our MSR challenge paper on bug lifetimes in FreeBSD.

My personal highlight of MSR was Michele Lanza’s keynote “The Visual Terminator”. His brilliant slides are online on slideshare.net. The keynote was about software visualization, a term he defined as “The use of computer graphics to understand software”. After telling the stories behind tools such as CodeCrawler and CodeCity, he argued that it is time to rethink software, that software is more than text, and that visualization is the key. While I didn’t agree with all the points he made — I don’t think “empirical validation of visualizations is suicide” — the keynote had everything that a good keynote should have: an outsider’s perspective (visualization is not the core topic of MSR), a great speaker, and some provocative insights. My favorite quote: “Academic research is like Formula 1: driving around in circles, wasting gasoline … but it generates spin-off values!”


Our workshop on Web 2.0 for Software Engineering went really well. Two presentations by Sue Black outlining the results of surveys on the use of social media and Web 2.0 in software development set the stage for the rest of the workshop, and they also provided insights into her own use of social media, in particular Twitter. The following two sessions were more tool focused — from tagging and commit messages to Codebook, mashups and wikis. Three topics were chosen by the participants for the concluding panel discussion: Information overflow, Privacy and Ethics, and the potential move of the IDE to the browser. The detailed notes from the discussion are available here.

To address information overflow, several solutions such as generating less information, generating summaries, voting, context-sensitive labeling, automated categorization, interaction mining and interruption management were discussed.

The discussion regarding privacy and ethics started with the example of the use of Twitter at conferences, in particular the question whether it is ethical to quote other individuals such as keynote speakers in conference-related tweets. We moved on to discuss our ethical obligations — both moral and legal — before reading communication channels from Open Source projects. A general problem identified in this discussion is that we do not have good metaphors for privacy in social media. The metaphors of filing cabinets and public art projects were suggested.

Looking at projects such as Mozilla Bespin and Heroku, there seems to be a trend of moving the IDE into the browser. It should be noted that neither the use of Web 2.0 mechanisms nor the storage of data “in the cloud” implies that the IDE has to be in the browser. Nonetheless, despite challenges such as concurrency, editor speed and naming schemes, there are reasons why the IDE could move to the browser: accessibility, collaboration, data integration, the same configuration for everybody, and superficial reasons such as “the browser is the future”. To conclude the discussion, we had a vote among the workshop participants asking if IDEs should move to the browser: 10 voted yes, 4.5 voted no, and there was one maybe.

In the spirit of the workshop topic, we used Web 2.0 tools throughout the workshop to take notes collaboratively. While our Google Waves suffered from low bandwidth, Twitter with the #web2se hashtag was quite active.

Doctoral Symposium

Due to the overlap with Web2SE, I wasn’t able to attend the doctoral symposium, and only went there to present my paper on emergent knowledge structures.

Main Conference

The main conference started off with a video welcome address by Desmond Tutu, and an excellent keynote by Clem Sunter. Without notes or slides, he gave a highly entertaining lecture on scenario planning with regard to South Africa and the world. He used the metaphors of foxes and hedgehogs to demonstrate different approaches to dealing with business decisions. According to Clem, foxes embrace uncertainty and change their mind when they realize that there’s something better out there. They reach the optimal decisions through their knowledge of the system as a whole. Hedgehogs on the other hand simplify life around one idea, more or less disregarding everything else. Clem did a great job relating these ideas to current events and leaders. His website mindofafox.com has more details.

The presentations of our paper on Awareness 2.0 and our NIER paper on tags went well. Highlights from the other paper presentations included Andy Begel’s presentation on Codebook. The room was very crowded when he talked about the survey they did at Microsoft, which revealed that engineers need better ways to find connections between each other. The Codebook framework addresses this issue.

In his talk about Supporting Developers with Natural Language Queries, Michael Würsch presented a framework that can process guided-input natural language queries that resemble plain English. The approach is based on an OWL ontology and the Semantic Web. The next paper in the same session addressed a similar problem, focusing on questions that require the integration of different kinds of project information. Thomas Fritz and Gail Murphy started by identifying 78 questions that developers want to ask but for which support is missing. They introduced an information fragment model along with a prototype implementation that was evaluated with positive results.

Rachel Bellamy gave a great presentation on the paper Moving into a New Software Project Landscape. They conducted a grounded theory study with 18 newcomers across 18 projects and found a wide range of interesting things, such as the three primary factors that impact the integration of newcomers: early experimentation, internalizing structures and cultures, and progress validation. Thomas Fritz’s presentation of their distinguished paper A Degree-of-Knowledge Model to Capture Source Code Familiarity started off with a cartoon clip that did a great job of outlining the idea of the paper: if several people collaborate on writing a paper, an author’s degree of knowledge of the paper increases when that author edits the paper, and it decreases as soon as another author edits it. Transferring that idea to source code, they showed that the degree-of-knowledge model can provide better results than existing approaches.

Cape Town and surroundings

Cape Town turned out to be a great place for a conference: Spectacular scenery, amazing food and wildlife not far away. I posted my best pictures here (public facebook album).