While much of the knowledge needed to develop software is captured in some form of documentation, there is often a gap between the information needs of software developers and the structure of this documentation. A hierarchical structure of sections and subsections enables effective navigation only if the section headers are adequate cues for the information needs of developers.
To help developers navigate documentation, we developed a technique for automatically extracting task descriptions from software documentation during my postdoc with Martin Robillard at McGill University. Our tool, called TaskNav, suggests these task descriptions in an auto-complete search interface for software documentation, along with concepts, code elements, and section headers.
We use natural language processing (NLP) techniques to detect every passage in a documentation corpus that describes how to accomplish some task. The core of the task extraction process is the use of grammatical dependencies identified by the Stanford NLP parser to detect every instance of a programming action described in a documentation corpus. Different dependencies are used to account for different grammatical structures (e.g., “returning an iterator”, “return iterator”, “iterator returned”, and “iterator is returned”):
In the simplest case, a task is indicated by a direct object relationship, as in the example shown above. TaskNav uses this information to extract two task descriptions from the example sentence: “generate receipt” and “generate other information”.
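The direct-object case can be illustrated with a minimal sketch. It is not TaskNav's implementation: it assumes the sentence has already been parsed, and it works on hand-written dependency triples rather than calling the Stanford parser. The function name `extract_tasks`, the input format, and the choice of relations (`dobj`, `nsubjpass`, `amod`, `nn`, and `conj*` for coordinated objects) are assumptions made for illustration.

```python
from collections import defaultdict

# Relations that link a verb to the thing it acts on (Stanford dependency names).
# "nsubjpass" covers passive phrasings such as "iterator is returned".
OBJECT_RELATIONS = {"dobj", "nsubjpass"}
# Relations whose dependents are kept as modifiers of the object noun.
MODIFIER_RELATIONS = {"amod", "nn"}


def extract_tasks(tokens, dependencies):
    """Return task descriptions as "verb object" strings.

    tokens: dict mapping token index -> lemma (hypothetical input format)
    dependencies: list of (relation, governor_index, dependent_index)
    """
    children = defaultdict(list)
    for relation, governor, dependent in dependencies:
        children[governor].append((relation, dependent))

    def noun_phrase(index):
        # Modifiers are emitted in sentence order, followed by the head noun.
        modifiers = sorted(d for r, d in children[index] if r in MODIFIER_RELATIONS)
        return " ".join(tokens[i] for i in modifiers + [index])

    tasks = []
    for relation, verb, obj in dependencies:
        if relation not in OBJECT_RELATIONS:
            continue
        # The object itself plus any nouns coordinated with it.
        objects = [obj] + [d for r, d in children[obj] if r.startswith("conj")]
        for o in objects:
            tasks.append(f"{tokens[verb]} {noun_phrase(o)}")
    return tasks


# Hand-written triples for a hypothetical clause; not the original example sentence.
tokens = {1: "generate", 2: "receipt", 3: "other", 4: "information"}
dependencies = [("dobj", 1, 2), ("conj_or", 2, 4), ("amod", 4, 3)]
tasks = extract_tasks(tokens, dependencies)
```

With these triples, the coordinated object yields one task per noun, which is how a single verb can produce both “generate receipt” and “generate other information”.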
Once the TaskNav user runs a search query after selecting the search terms from auto-complete, search results are presented in a sidebar. When the user selects a result, the corresponding document is opened in TaskNav. The paragraph that matched the query is highlighted, and the document is automatically scrolled to that paragraph.
We conducted a field study in which six professional developers used TaskNav for two weeks as part of their ongoing work. We found that search results identified through extracted tasks were more helpful to developers than those found through concepts, code elements, and section headers.
TaskNav can automatically analyze and index any documentation corpus based on a starting URL and some configuration parameters, such as which HTML tags should be ignored. Documentation users can benefit from TaskNav by taking advantage of the task-based navigation offered by the auto-complete search interface. For documentation writers, TaskNav provides analytics that show how documentation is used (e.g., top queries, most frequently read documents, and unsuccessful searches). Researchers can benefit from the data accumulated by TaskNav’s logging mechanism as it provides detailed data on how software developers search and use software documentation.
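To make the indexing configuration mentioned above more concrete, the following sketch shows one way an "ignored HTML tags" parameter could be applied before text is passed to task extraction. It uses Python's standard `html.parser` module. The class `TextExtractor`, the function `extract_text`, and the `ignored_tags` parameter are hypothetical names, not TaskNav's actual configuration, and the sketch assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they never open a region to skip.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextExtractor(HTMLParser):
    """Collect page text, skipping everything inside the ignored tags."""

    def __init__(self, ignored_tags):
        super().__init__()
        self.ignored_tags = set(ignored_tags)
        self.skip_depth = 0  # > 0 while inside an ignored element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Entering an ignored element, or nesting deeper inside one.
        if tag in self.ignored_tags or self.skip_depth:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, ignored_tags):
    parser = TextExtractor(ignored_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<div><nav><a href='/'>Home</a></nav>"
    "<p>Return an iterator.</p>"
    "<script>var x = 1;</script></div>"
)
text = extract_text(page, ignored_tags=["nav", "script"])
```

Here the navigation links and the script body are dropped, so only the documentation paragraph would reach the indexing step.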