As rapid advances in computing hardware have led to dramatic improvements in
computer performance, the issues of reliability, availability, maintainability,
and cost of ownership are becoming increasingly important. Unfortunately,
software bugs continue to be frequent, accounting for as much as 40% of
computer system failures. Software bugs can crash the system, making the
service unavailable. Moreover, "silent" bugs that go undetected can
corrupt information, generating wrong outputs or control commands, and
destroying valuable information. According to the National Institute of
Standards and Technology, software bugs cost the U.S. economy an estimated
$59.5 billion annually, or approximately 0.6% of the gross domestic product!
Identifying and fixing software bugs requires enormous human labor. Despite this effort, software released to end users still contains numerous bugs. These bugs continue to consume human time in the form of bug reporting at the user site, user-vendor communication, and subsequent "bug-fix" software releases. We need, above all, techniques that automate the process of debugging as much as possible.
In particular, debugging parallel applications is especially difficult because parallel programs suffer not only from the bugs that commonly exist in sequential programs but also from special types of bugs such as deadlocks and data races. Many of these bugs are non-deterministic, which makes interactive debugging time-consuming and significantly reduces the productivity of parallel application development. This problem is becoming increasingly severe as the demand for ever more computational capability has driven the creation of terascale parallel systems. Most existing parallel debugging tools cannot meet this challenge because it is prohibitively difficult to use an interactive debugger with only basic functionality to debug a parallel program on a system with thousands of nodes. Moreover, many timing-related bugs surface only on terascale systems and are not exposed on a small-scale testbed.
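To make the non-determinism of data races concrete, here is a minimal sketch (not taken from any of the tools listed on this page). It assumes a hypothetical shared `counter` that two threads, labeled `A` and `B`, each increment once without synchronization, with the increment modeled as a separate load step and store step. Rather than relying on real thread timing, it enumerates every possible schedule, so the result is reproducible.

```python
from itertools import combinations


def interleavings(n_a, n_b):
    """Yield every schedule of n_a steps of thread A and n_b steps of thread B.

    Each thread's own steps stay in program order; only the relative
    ordering between the two threads varies.
    """
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        slots = set(a_slots)
        yield ["A" if i in slots else "B" for i in range(total)]


def run(schedule):
    """Run an unsynchronized `counter += 1` in each thread under one schedule.

    The increment is modeled as two steps: load the shared counter into a
    thread-local register, then store register + 1 back.
    """
    counter = 0
    register = {"A": None, "B": None}  # thread-local copies of the counter
    step = {"A": 0, "B": 0}            # next step index for each thread
    for thread in schedule:
        if step[thread] == 0:
            register[thread] = counter        # load
        else:
            counter = register[thread] + 1    # store
        step[thread] += 1
    return counter


def outcomes():
    """Group all schedules by the final counter value they produce."""
    results = {}
    for schedule in interleavings(2, 2):
        results.setdefault(run(schedule), []).append("".join(schedule))
    return results


if __name__ == "__main__":
    for value, schedules in sorted(outcomes().items()):
        print(value, schedules)
```

Only the two schedules in which one thread finishes before the other starts produce the intended value of 2; the other four lose an update and leave the counter at 1. With more threads and more steps the number of schedules grows combinatorially, which is one reason such bugs may stay hidden on a small testbed and surface only at scale.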
To address the above problems, the goal of our project is to detect bugs in software efficiently and effectively in order to improve software robustness and security. In addition, we explore techniques that let software survive bugs without restarting.
Our project consists of the following research thrusts:
Static checking: exploit intelligent techniques such as data mining in source code analysis
- AutoISES: Automatically Inferring Security Specifications and Detecting Violations. [USENIX Security'08]
- /* iComment: Bugs or Bad Comments? */ [SOSP'07]
- MUVI: Automatically Inferring Multi-Variable Access Correlations and Detecting Related Semantic and Concurrency Bugs. [SOSP'07]
- HotComments: How to Make Program Comments More Useful? [HotOS]
- CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code
- PR-Miner: Automatically Extracting Implicit Programming Rules and Detecting Violations in Large Software Code [ESEC/FSE'05]
Dynamic checking: Dynamic bug detection with novel architecture and operating system support
- MUVI: Automatically Inferring Multi-Variable Access Correlations and Detecting Related Semantic and Concurrency Bugs. [SOSP'07]
- HARD: Hardware-Assisted Lockset-based Race Detection [HPCA'07]
- AVIO: Detecting Atomicity Violations via Access-Interleaving Invariants [ASPLOS'06]
- PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection [Micro'06]
- SafeMem: Exploiting ECC-Memory for Detecting Memory Leaks and Memory Corruption During Production Runs [HPCA'05]
- iWatcher: Efficient Architecture Support for Software Debugging [ISCA'04] (Also selected by IEEE Micro Special Issue on Top Picks from Architecture Conferences) [Also published in ACM TACO]
- AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-based Invariants [Micro'04]
- Triage: Diagnosing Production Run Failures at the User's Site. [SOSP'07]
- Delta Execution for Software Reliability [HotDep'07]
- Flashback: A Light-weight Extension for Rollback and Deterministic Replay for Software Debugging [USENIX'04]
- Rx: Treating bugs as allergies---a safe method to survive software failure [SOSP'05]
- Treating bugs as allergies: A safe method for surviving software failures [HotOS'05].
- AutoISES: Automatically Inferring Security Specifications and Detecting Violations. [USENIX Security'08]
- Sweeper: A Lightweight End-to-end System for Defending Against Fast Worms [EuroSys'07]
- LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting General Security Attacks [Micro'06]
People:
- Yuanyuan (YY) Zhou (Professor)
- Rini Kaushik
- Shan Lu
- Lin Tan
- Joe Tucek
- Weiwei Xiong
- Zuoning Yin
- Yoann Padioleau (Postdoc)
- Soyeon Park (Postdoc)
- Zhenmin Li (graduated)
- Feng Qin (graduated)
- Pin Zhou (graduated)
Collaborators:
- Professor Josep Torrellas
- Professor Jiawei Han
- Professor Sam Midkiff (Purdue)
Funding:
- NSF, as part of the medium ITR project PROBE
- UIUC Startup Grant