Stackoverflow.com is a popular Q&A website featuring questions and answers on a wide range of programming topics. In the two years since its founding in 2008, more than 1 million questions have been asked on Stack Overflow, and more than 2.5 million answers have been provided.
Stack Overflow — like many other Q&A websites such as Yahoo! Answers, Answers.com or the recently created Facebook Questions — is founded on the success of social media and built around an “architecture of participation” where user data is aggregated as a side-effect of using Web 2.0 applications.
To understand the role of Q&A websites in the software documentation landscape, Ohad Barzilay, Peggy and I wrote a paper in which we pose research questions and report preliminary results, using both qualitative and quantitative research methods. The paper was just accepted at the NIER (New Ideas and Emerging Results) track of ICSE 2011 in Hawaii.
We pose the following five research questions:
- What kinds of questions are asked on Q&A websites for programmers?
- Which questions are answered and which ones remain unanswered?
- Who answers questions and why?
- How are the best answers selected?
- How does a Q&A website contribute to the body of software development knowledge?
For the NIER paper, we focused on the first two questions. We created a script to extract questions along with all answers, tags and owners using the Stack Overflow API. We then analyzed quantitative properties of questions, answers and tags, and we applied qualitative codes to a sample of tags and questions.
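To make the quantitative step concrete, here is a minimal sketch of the kind of tally one could run over extracted questions. It is not the script we used for the paper: the record layout (`question_id`, `tags`, `answer_count`) is an assumption loosely modeled on question objects returned by the public API, the sample records are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up sample records for illustration only; the field names are
# assumptions, not a guaranteed match for the API's actual output.
SAMPLE_QUESTIONS = [
    {"question_id": 1, "tags": ["java", "android"], "answer_count": 2},
    {"question_id": 2, "tags": ["java"], "answer_count": 0},
    {"question_id": 3, "tags": ["python"], "answer_count": 1},
    {"question_id": 4, "tags": ["android"], "answer_count": 0},
]


def tag_counts(questions):
    """Count how many questions carry each tag."""
    counts = Counter()
    for question in questions:
        counts.update(question["tags"])
    return counts


def unanswered_ratio_by_tag(questions):
    """For each tag, return the fraction of its questions with no answers."""
    total = defaultdict(int)
    unanswered = defaultdict(int)
    for question in questions:
        for tag in question["tags"]:
            total[tag] += 1
            if question["answer_count"] == 0:
                unanswered[tag] += 1
    return {tag: unanswered[tag] / total[tag] for tag in total}


if __name__ == "__main__":
    print(tag_counts(SAMPLE_QUESTIONS).most_common())
    print(unanswered_ratio_by_tag(SAMPLE_QUESTIONS))
```

Counts like these only describe the data; the qualitative coding of tags and questions still has to be done by hand on a sample.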
Our preliminary findings indicate that Stack Overflow is particularly effective for code reviews, for conceptual questions and for novices. The most common questions include how-to questions and questions about unexpected behaviors.
Understanding the processes that lead to the creation of knowledge on Q&A websites will enable us to make recommendations on how individuals and companies, as well as tools for programmers, can leverage the knowledge and use Q&A websites effectively. It will also shed light on the credibility of documentation on these websites.
A pre-print of the paper is available here
(© ACM, 2011. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in the Proceedings of the International Conference on Software Engineering (ICSE) 2011.)
This is the abstract of the paper:
Question and Answer (Q&A) websites, such as Stack Overflow, use social media to facilitate knowledge exchange between programmers and fill archives with millions of entries that contribute to the body of knowledge in software development. Understanding the role of Q&A websites in the documentation landscape will enable us to make recommendations on how individuals and companies can leverage this knowledge effectively. In this paper, we analyze data from Stack Overflow to categorize the kinds of questions that are asked, and to explore which questions are answered well and which ones remain unanswered. Our preliminary findings indicate that Q&A websites are particularly effective at code reviews and conceptual questions. We pose research questions and suggest future work to explore the motivations of programmers that contribute to Q&A websites, and to understand the implications of turning Q&A exchanges into technical mini-blogs through the editing of questions and answers.