Systems are becoming increasingly complex. Providing the high performance, dependability, scalability and manageability that enterprise customers demand is not a trivial task. In addition, new Internet-based applications have led to highly diverse system workloads and access behaviors, which makes customizing a system for different workloads a challenging and sometimes formidable task. The increasing complexity also worsens system dependability: because modern systems consist of many components, any of which can fail, the failure rate of a system as a whole is much higher than before. Another result of the increasing complexity is that administrative (maintenance) cost has become a significant part of a system's TCO (Total Cost of Ownership). Maintaining a system requires skilled IT professionals to install, configure, operate, back up and tune it.
To address these problems, it is desirable to build systems that can "look after themselves". Such systems are called autonomic systems. The main idea is to have a system manage itself with minimal human intervention. An autonomic system includes several elements:
- self-tuning: adapt system algorithms or policies to the occurring workloads in order to achieve the best possible performance
- self-protection: detect, identify and protect itself against various types of attacks to maintain data security and integrity
- self-managing: configure itself dynamically to adapt to different workloads, user demands and environments
- self-healing: quickly recover from errors or failures
To provide autonomic capabilities, a system first needs to automatically capture and characterize the storage access behavior it observes, and then use that characterization to change its control policies or configurations to suit the application workloads. However, without proper analysis tools, storage access behaviors are difficult to characterize, especially for workloads whose behavior changes dynamically from one time period to another. Typical storage access behaviors include temporal locality, spatial locality, access frequencies, regular access patterns, block correlations, distributions and many other complex patterns.
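As a rough illustration of what characterizing access behavior can mean, the sketch below computes three simple statistics from a block trace: access frequencies, a temporal-locality indicator (the mean gap between repeated accesses to the same block) and a spatial-locality indicator (the fraction of accesses that hit the block immediately after the previous one). The trace format (a plain list of block numbers), the function name `characterize` and the choice of metrics are assumptions made for this example, not the project's actual tooling.

```python
from collections import Counter


def characterize(trace):
    """Summarize a block trace given as a list of block numbers.

    Hypothetical helper for illustration only; the metrics are
    deliberately simple stand-ins for real locality analysis.
    """
    freq = Counter(trace)   # access frequency per block
    last_seen = {}          # block -> index of its previous access
    reuse_gaps = []         # accesses elapsed between reuses of a block
    sequential = 0          # accesses that follow the previous block number

    for i, block in enumerate(trace):
        if block in last_seen:
            reuse_gaps.append(i - last_seen[block])
        last_seen[block] = i
        if i > 0 and block == trace[i - 1] + 1:
            sequential += 1

    return {
        # Most frequently accessed blocks.
        "hottest": freq.most_common(3),
        # Smaller mean gap suggests stronger temporal locality.
        "mean_reuse_gap": (
            sum(reuse_gaps) / len(reuse_gaps) if reuse_gaps else None
        ),
        # Larger fraction suggests stronger spatial locality.
        "sequential_fraction": (
            sequential / (len(trace) - 1) if len(trace) > 1 else 0.0
        ),
    }


if __name__ == "__main__":
    # Made-up trace, not measured data.
    print(characterize([10, 11, 12, 10, 11, 50, 10]))
```

Recomputing these statistics over successive time windows of the trace is one simple way to notice a workload whose behavior shifts from one period to the next.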
This project investigates a novel technology called system mining, which applies data mining techniques to improve system performance, dependability and manageability. We will use storage systems as the experimental platform on which to investigate and evaluate our ideas.
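To give a flavor of how a data mining technique might be applied to storage traces, the sketch below counts how often one block is followed by another within a short window of accesses and keeps the pairs that recur. This is a simplified co-occurrence count written for illustration; it is not the algorithm used in the publications listed below, and the names `correlated_pairs`, `window` and `min_support` are assumptions of this example.

```python
from collections import Counter


def correlated_pairs(trace, window=3, min_support=2):
    """Find ordered block pairs (a, b) where b follows a within
    `window` accesses at least `min_support` times.

    Illustrative simplification only, not a full frequent-sequence miner.
    """
    counts = Counter()
    for i, first in enumerate(trace):
        seen = set()  # count each follower at most once per position
        for later in trace[i + 1 : i + window]:
            if later != first and later not in seen:
                seen.add(later)
                counts[(first, later)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Made-up trace in which block 2 tends to follow block 1.
    print(correlated_pairs([1, 2, 9, 1, 2, 7, 1, 2]))
```

A storage system could, for instance, use pairs found this way as prefetching hints: when the first block of a frequent pair is read, the second becomes a candidate to fetch early.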
Publications:
- Lin Tan, Ding Yuan, Gopal Krishna and Yuanyuan Zhou. /* iComment: Bugs or Bad Comments? */. To appear in the Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP'07), October 2007.
- Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca A. Popa, Yuanyuan Zhou. MUVI: Automatically Inferring Multi-Variable Access Correlations and Detecting Related Semantic and Concurrency Bugs. To appear in the Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP'07), October 2007.
- Lin Tan, Ding Yuan and Yuanyuan Zhou. HotComments: How to Make Program Comments More Useful? In the Proceedings of the 11th Workshop on Hot Topics in Operating Systems (HotOS), May 2007. San Diego, California.
- Zhenmin Li, Zhifeng Chen and Yuanyuan Zhou. Mining Block Correlations to Improve Storage Performance. ACM Transactions on Storage [ACM-TOS], May 2005
- Zhenmin Li, Shan Lu, Suvda Myagmar and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code. Proceedings of the Sixth Symposium on Operating System Design and Implementation [OSDI'04], December, 2004 (14% acceptance rate, 27/193).
- Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan and Yuanyuan Zhou. Mining Block Correlations in Storage Systems. Proceedings of 3rd USENIX Conference on File and Storage Technologies [FAST’04], March 2004.
- Zhenmin Li, Jed Taylor, Elizabeth Partridge, Yuanyuan Zhou, William Yurcik, Cristina Abad, James J. Barlow, and Jeff Rosendale, UCLog: A Unified, Correlated Logging Architecture for Intrusion Detection, 12th International Conference on Telecommunication Systems - Modeling and Analysis [ICTSM], 2004.
- Zhenmin Li, Sudarshan M. Srinivasan, Zhifeng Chen, Yuanyuan Zhou, Peter Tzvetkov, Xifeng Yan, and Jiawei Han. Using Data Mining to Discover Patterns in Autonomic Storage Systems. Poster presentation and short paper, 1st Workshop on Algorithms and Architectures for Self-Managing Systems in conjunction with ISCA and SIGMETRICS, 2003
- William Yurcik, Jim Barlow, YuanYuan Zhou, Hrishikesh Raje, Yifan Li, Xiaoxin Yin, Mike Haberman, Jeff Rosendale, Dora Cai and Duane Searsmith. Scalable Data Management Alternatives to Support Data Mining Heterogeneous Logs for Computer Network Security. 2003 SIAM Workshop on Data Mining for Counter Terrorism and Security, June 2003.
People:
- Yuanyuan (YY) Zhou (Professor)
- Lin Tan
- Zhenmin Li (graduated)
- Sudarshan Srinivasan (graduated)
- Zhifeng Chen (graduated)
Collaborators:
- Professor Jiawei Han
Funding:
- NSF CAREER Award
- UIUC Startup Grant