BugBench: A Benchmark for Evaluating Bug Detection Tools
Shan Lu, Zhenmin Li, Feng Qin, Lin Tan, Pin Zhou and Yuanyuan Zhou
University of Illinois, Urbana-Champaign
Content of This Talk
- Share our experience
- Bug/application characteristics analysis

BugBench has been used by:
- Our previous work [Micro’04, ISCA’04, HPCA’05]
- Other research groups: UCSD, Purdue, NCSU, etc.
Current Benchmark Suite

Name   Program           Source      LOC     Crash Latency  Bug Type
NCOM   ncompress-4.2.4   Red Hat     1.9K    N/A            Stack smash
POLY   polymorph-0.4.0   GNU         0.7K    9040K Inst     Stack smash & global buffer overflow
GZIP   gzip-1.2.4        GNU         8.2K    15K Inst       Global buffer overflow
COMP   129.compress      SPEC95      2.0K    N/A            Global buffer overflow
GO     099.Go            SPEC95      29.6K   N/A            Global buffer overflow
MAN    man-1.5h1         Red Hat     4.7K    29.5M Inst     Global buffer overflow
BC     bc-1.06           GNU         17.0K   189K Inst      Global buffer overflow
SQUD   squid-2.3         squid       93.5K   0              Global buffer overflow
CALB   cachelib          UIUC        6.6K    N/A            Uninitialized read
CVS    cvs-1.11.4        GNU         114.5K  N/A            Double free
YPSV   ypserv-2.2        Linux NIS   11.4K   N/A            Memory leak
PFTP   proftpd-1.2.9     ProFTPD     68.9K   N/A            Memory leak
SQUD2  squid-2.4         squid       104.6K  N/A            Memory leak
HTPD   httpd-2.0.49      Apache      224K    N/A            Data race
MSQL1  msql-4.1.1        MySQL       1028K   N/A            Data race
MSQL2  msql-3.23.56      MySQL       514K    N/A            Atomicity
MSQL3  msql-4.1.1        MySQL       1028K   N/A            Atomicity
PSQL   postgresql-7.4.2  PostgreSQL  559K    N/A            Semantic
HTPD2  httpd-2.0.49      Apache      224K    N/A            Semantic

Bug categories: memory related (NCOM through SQUD2), multi-thread related (HTPD through MSQL3), semantic (PSQL, HTPD2).
Other types of bugs: still searching.
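To make the table's grouping and units concrete, here is a minimal Python sketch that stores the bug-type column, groups the types under the slide's three headings (memory related, multi-thread related, semantic), and converts crash-latency cells such as "9040K Inst" into instruction counts. The names `SUITE`, `CATEGORY` and `parse_crash_latency` are hypothetical, and reading `N/A` as "no latency reported" is an assumption.

```python
from collections import Counter
from typing import Optional

# Bug type of each BugBench program, copied from the table above.
SUITE = {
    "NCOM": "Stack smash",
    "POLY": "Stack smash & global buffer overflow",
    "GZIP": "Global buffer overflow",
    "COMP": "Global buffer overflow",
    "GO": "Global buffer overflow",
    "MAN": "Global buffer overflow",
    "BC": "Global buffer overflow",
    "SQUD": "Global buffer overflow",
    "CALB": "Uninitialized read",
    "CVS": "Double free",
    "YPSV": "Memory leak",
    "PFTP": "Memory leak",
    "SQUD2": "Memory leak",
    "HTPD": "Data race",
    "MSQL1": "Data race",
    "MSQL2": "Atomicity",
    "MSQL3": "Atomicity",
    "PSQL": "Semantic",
    "HTPD2": "Semantic",
}

# Hypothetical mapping from bug type to the slide's category headings.
CATEGORY = {
    "Stack smash": "memory",
    "Stack smash & global buffer overflow": "memory",
    "Global buffer overflow": "memory",
    "Uninitialized read": "memory",
    "Double free": "memory",
    "Memory leak": "memory",
    "Data race": "multi-thread",
    "Atomicity": "multi-thread",
    "Semantic": "semantic",
}

SCALE = {"K": 1_000, "M": 1_000_000}


def category_counts() -> Counter:
    """Count how many programs fall into each bug category."""
    return Counter(CATEGORY[bug] for bug in SUITE.values())


def parse_crash_latency(entry: str) -> Optional[int]:
    """Turn a crash-latency cell ('9040K Inst', '29.5M Inst', '0', 'N/A')
    into an instruction count. None stands for 'N/A' (assumed to mean
    that no latency is reported)."""
    value = entry.replace("Inst", "").strip()
    if value == "N/A":
        return None
    if value[-1] in SCALE:
        # float() accepts fractional prefixes such as '29.5'
        return round(float(value[:-1]) * SCALE[value[-1]])
    return int(value)


print(category_counts())
print(parse_crash_latency("29.5M Inst"))
```

Keeping the category mapping separate from the per-program data means a new bug type only needs one entry in `CATEGORY`.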
Functionality

Name   Catch Bug?                 Related Memory Object Type
       Valgrind  Purify  CCured
NCOM   No        No      Yes      Stack
POLY   Vary      Yes     Yes      Stack & global buffer
GZIP   Yes       Yes     Yes
COMP   No        No      Yes
GO     No        Yes     Yes
MAN    Yes       Yes     Yes
BC     Yes       Yes     Yes
SQUD   Yes       Yes     N/A

GZIP through SQUD: global buffer and heap buffer objects.
Valgrind: misses stack buffer overflows and moderate global-buffer overflows.
Purify: misses stack buffer overflows and 1-byte global-buffer overflows.
CCured: failed to apply (SQUD).
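The detection results above can be tallied per tool with a few lines of Python. This is a hypothetical helper (the names `RESULTS`, `tally` and `catch_rate` are illustrative): it copies the Yes/No/Vary/N/A cells, counts them, and reports the fraction of programs each tool caught among those it could be applied to. Treating "Vary" as not caught is an assumed, conservative convention.

```python
from collections import Counter
from fractions import Fraction

TOOLS = ("Valgrind", "Purify", "CCured")

# Catch-bug cells from the table above, in the order Valgrind, Purify, CCured.
RESULTS = {
    "NCOM": ("No", "No", "Yes"),
    "POLY": ("Vary", "Yes", "Yes"),
    "GZIP": ("Yes", "Yes", "Yes"),
    "COMP": ("No", "No", "Yes"),
    "GO": ("No", "Yes", "Yes"),
    "MAN": ("Yes", "Yes", "Yes"),
    "BC": ("Yes", "Yes", "Yes"),
    "SQUD": ("Yes", "Yes", "N/A"),
}


def tally(tool: str) -> Counter:
    """Count each outcome ('Yes', 'No', 'Vary', 'N/A') for one tool."""
    column = TOOLS.index(tool)
    return Counter(cells[column] for cells in RESULTS.values())


def catch_rate(tool: str) -> Fraction:
    """Fraction of applicable programs with a definite 'Yes'.
    'N/A' (tool could not be applied) is left out of the denominator;
    'Vary' counts as not caught, which is an assumed convention."""
    counts = tally(tool)
    applicable = sum(counts.values()) - counts["N/A"]
    return Fraction(counts["Yes"], applicable)


for tool in TOOLS:
    print(tool, dict(tally(tool)), catch_rate(tool))
```

Using `Fraction` keeps the rates exact, which avoids rounding noise when comparing tools on a small suite like this one.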
Overhead
- Valgrind: 6.4X (NCOM) to 119X (BC)
- Purify: 28% (POLY) to 76X (BC)
- CCured: 4% (POLY) to 3.7X (GZIP)

[Figure: overhead of Valgrind, Purify and CCured plotted against memory allocation frequency (# per MInst), heap usage ratio [Heap/(Heap+Stack)] and memory access frequency (# per instruction); NCOM and BC are highlighted.]
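The application characteristics plotted on this slide are simple ratios. A minimal Python sketch of how they could be computed from raw counters follows. The function names are hypothetical, and both the overhead definition ((tool time - native time) / native time) and the threshold for printing a percentage versus a multiple are assumptions, since the slide does not define them.

```python
def alloc_frequency(n_allocs: int, n_instructions: int) -> float:
    """Memory allocations per million instructions (# per MInst)."""
    if n_instructions <= 0:
        raise ValueError("instruction count must be positive")
    return n_allocs / (n_instructions / 1_000_000)


def heap_usage_ratio(heap: float, stack: float) -> float:
    """Heap / (Heap + Stack). The unit (bytes, accesses, ...) is an assumption."""
    total = heap + stack
    if total <= 0:
        raise ValueError("heap + stack must be positive")
    return heap / total


def access_frequency(n_accesses: int, n_instructions: int) -> float:
    """Memory accesses per instruction."""
    if n_instructions <= 0:
        raise ValueError("instruction count must be positive")
    return n_accesses / n_instructions


def overhead(tool_time: float, native_time: float) -> float:
    """Extra run time relative to the uninstrumented run (assumed definition)."""
    if native_time <= 0:
        raise ValueError("native time must be positive")
    return (tool_time - native_time) / native_time


def format_overhead(value: float) -> str:
    """Show overheads below 1 as a percentage and larger ones as a multiple,
    mirroring the mix of '28%' and '76X' on the slide (assumed threshold)."""
    return f"{value * 100:.0f}%" if value < 1 else f"{value:g}X"


print(format_overhead(overhead(1.28, 1.0)))
print(format_overhead(overhead(7.4, 1.0)))
```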
Experience Summary
- Building a benchmark is time-consuming, long-term work
  - This motivates automatic tools for extracting bugs
- Bug/application characteristics are important for selecting applications
- Building a benchmark needs cooperation from the entire community