Motivation:
Program comments have long been used as a common practice for
improving inter-programmer communication and code readability.
Modern software, including the Linux kernel, contains
millions of lines of comments.
These comments contain a great deal of useful information
and can guide the design of annotation languages, debugging tools, programming languages, IDEs, etc.
Unfortunately, comments are not used to their full potential.
In addition, unlike source code, comments cannot be tested. As a result, incorrect or
obsolete comments can mislead programmers and introduce new bugs
later.
What we have done in this direction:
We have taken the initiative to investigate how comments can be exploited beyond their current usage. Specifically, we study the feasibility and benefits of automatically analyzing comments to detect software bugs and bad comments. Our solution, iComment, combines four techniques (Natural Language Processing (NLP), machine learning, statistics, and program analysis) to achieve this goal. We evaluated iComment on four large code bases: Linux, Mozilla, Wine, and Apache. Our experimental results show that iComment automatically extracts 1832 rules from comments with 90.8-100% accuracy and detects 60 comment-code inconsistencies (33 new bugs and 27 bad comments) in the latest versions of the four programs. Nineteen of them (12 bugs and 7 bad comments) have already been confirmed by the corresponding developers, while the others are still being analyzed by the developers.

Related Publications:
NLP Tools We Used: Semantic Role Labeler, Weka
Future work (more to come):