Improving Access to Software Documentation — Two ICSE 2016 papers

This is a cross-post from the University of Adelaide’s CREST blog.

Software development is knowledge-intensive, and the effective management and exchange of knowledge is key in every software project. While much of the information needed by software developers is captured in some form of documentation, it is often not obvious where a particular piece of information is stored. Different documentation formats, such as wikis or blogs, contain different kinds of information, written by different individuals and intended for different purposes. Navigating this documentation landscape is particularly challenging for newcomers.

In collaboration with researchers from Canada and Brazil, we are envisioning, developing and evaluating tool support around software documentation for different stakeholders. Two of these efforts will be presented at the International Conference on Software Engineering — the premier conference in software engineering — this year.

In the first project, a collaboration with Martin Robillard from McGill University in Canada, we developed an approach to automatically augment API documentation with “insight sentences” from Stack Overflow: sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. The preprint of the corresponding paper is available here.



Software developers need access to different kinds of information, which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with “insight sentences” from Stack Overflow: sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine-learning-based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the metadata available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.
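To make one of the features named in the abstract more concrete, here is a minimal sketch of how "the similarity of a sentence to the corresponding API documentation" could be computed. This is not the SISE implementation: the choice of cosine similarity over plain term-frequency vectors, the tokenizer, and the function names `tokenize` and `cosine_similarity` are all assumptions made for illustration, and the example strings are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(sentence: str, api_doc: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts.

    Illustrative stand-in for a sentence-to-documentation similarity
    feature; the measure actually used by SISE may differ.
    """
    a = Counter(tokenize(sentence))
    b = Counter(tokenize(api_doc))
    # Dot product over the terms of the sentence (missing terms count as 0).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no overlap with anything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical inputs, invented for this example.
    doc = "An ordered collection. The user can access elements by index."
    sentence = "Accessing elements by index is slow for linked lists."
    print(round(cosine_similarity(sentence, doc), 3))
```

A value like this would be only one input among the features listed above. How a classifier weighs it against the formatting, question, answer, and author features is described in the paper, not here.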

The second project was developed in collaboration with three Brazilian researchers: Igor Steinmacher from the Federal University of Technology — Paraná, Tayana Conte from the Federal University of Amazonas, and Marco Gerosa from the University of São Paulo. We developed and evaluated FLOSScoach, a portal to support project newcomers, which we found to be effective at lowering project entry barriers. The preprint of the corresponding paper is available here and FLOSScoach is available here.



Community-based Open Source Software (OSS) projects are usually self-organized and dynamic, receiving contributions from distributed volunteers. Newcomers are important to the survival, long-term success, and continuity of these communities. However, newcomers face many barriers when making their first contribution to an OSS project, leading in many cases to dropouts. Therefore, a major challenge for OSS projects is to provide ways to support newcomers during their first contribution. In this paper, we propose and evaluate FLOSScoach, a portal created to support newcomers to OSS projects. FLOSScoach was designed based on a conceptual model of barriers created in our previous work. To evaluate the portal, we conducted a study with 65 students, relying on qualitative data from diaries, self-efficacy questionnaires, and the Technology Acceptance Model. The results indicate that FLOSScoach played an important role in guiding newcomers and in lowering barriers related to the orientation and contribution process, whereas it was not effective in lowering technical barriers. We also found that FLOSScoach is useful and easy to use, and that it increased newcomers’ confidence to contribute. Our results can help project maintainers decide which points need more attention in order to help OSS project newcomers overcome entry barriers.