Distributed Dynamic Partial Order Reduction based Verification of
72 Slides557.00 KB
Distributed Dynamic Partial Order Reduction based Verification of Threaded Software Yu Yang (PhD student; summer intern at CBL) Xiaofang Chen (PhD student; summer intern at IBM) Ganesh Gopalakrishnan Robert M. Kirby School of Computing University of Utah SPIN 2007 Workshop Presentation Supported by: Microsoft HPC Institutes NSF CNS 0509379 1
Thread Programming will become more prevalent FV of thread programs will grow in importance 2
Why FV for Threaded Programs 80% of chips shipped will be multi-core (photo courtesy of Intel Corporation.) 3
Model Checking will Increasingly be thru Dynamic Methods Also known as Runtime or In-Situ methods 4
Why Dynamic Verification Methods Even after early life-cycle modeling and validation, the final code will have far more details Early life-cycle modeling is often impossible - Use of libraries (API) such as MPI, OpenMP, Shmem, - Library function semantics can be tricky - The bug may be in the library function implementation 5
Model Checking will often be “stateless” 6
Why Stateless One may not be able to access a lot of the state - e.g. state of the OS . It is expensive to hash and lookup revisits . Stateless is easier to parallelize 7
Partial Order Reduction is Crucial ! 8
Why POR? Process P0: ------------------------------0: MPI Init 1: MPI Win lock 2: MPI Accumulate 3: MPI Win unlock 4: MPI Barrier 5: MPI Finalize ONLY DEPENDENT OPERATIONS Process P1: ------------------------------0: MPI Init 1: MPI Win lock 2: MPI Accumulate 3: MPI Win unlock 4: MPI Barrier 5: MPI Finalize 504 interleavings without POR (2 * (10!)) / (5!) 2 2 interleavings with POR !! 9
Dynamic POR is almost a “must” ! ( Dynamic POR as in Flanagan and Godefroid, POPL 2005) 10
Why Dynamic POR ? a[ j ] a[ k ]-- Ample Set depends on whether j k Can be very difficult to determine statically Can determine dynamically 11
Why Dynamic POR ? The notion of action dependence (crucial to POR methods) is a function of the execution 12
Computation of “ample” sets in Static POR versus in DPOR { BT }, { Done } Ample determined using “local” criteria Add Red Process to “Backtrack Set” This builds the Ample set incrementally based on observed dependencies Nearest Dependent Transition Looking Back Blue is in “Done” set Current State Next move of Red process 13
Putting it all together We target C/C PThread Programs Instrument the given program (largely automated) Run the concurrent program “till the end” Record interleaving variants while advancing When # recorded backtrack points reaches a soft limit, spill work to other nodes In one larger example, a 11-hour run was finished in 11 minutes using 64 nodes Heuristic to avoid recomputations was essential for speed-up. First known distributed DPOR 14
A Simple DPOR Example t0: lock(t) unlock(t) {}, {} t1: lock(t) unlock(t) t2: lock(t) unlock(t) 15
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {}, {} t1: lock(t) unlock(t) t2: lock(t) unlock(t) 16
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {}, {} t0: unlock t1: lock(t) unlock(t) t2: lock(t) unlock(t) 17
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {}, {} t0: unlock t1: lock(t) unlock(t) t1: lock t2: lock(t) unlock(t) 18
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock t2: lock(t) unlock(t) 19
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock {}, {} t1: unlock t2: lock(t) unlock(t) t2: lock 20
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock {t2}, {t1} t1: unlock t2: lock(t) unlock(t) t2: lock 21
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock {t2}, {t1} t1: unlock t2: lock(t) unlock(t) t2: lock t2: unlock 22
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock {t2}, {t1} t1: unlock t2: lock(t) unlock(t) t2: lock 23
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) {t2}, {t1} t2: lock(t) unlock(t) 24
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1,t2}, {t0} t0: unlock t1: lock(t) unlock(t) t2: lock {}, {t1, t2} t2: lock(t) unlock(t) 25
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1,t2}, {t0} t0: unlock t1: lock(t) unlock(t) t2: lock {}, {t1, t2} t2: unlock t2: lock(t) unlock(t) 26
A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1,t2}, {t0} t0: unlock t1: lock(t) unlock(t) {}, {t1, t2} t2: lock(t) unlock(t) 27
A Simple DPOR Example t0: lock(t) unlock(t) {t2}, {t0,t1} t1: lock(t) unlock(t) t2: lock(t) unlock(t) 28
A Simple DPOR Example t0: lock(t) unlock(t) t1: lock {t2}, {t0, t1} t1: unlock t1: lock(t) unlock(t) t2: lock(t) unlock(t) 29
For this example, all the paths explored during DPOR For others, it will be a proper subset 30
Idea for parallelization: Explore computations from the backtrack set in other processes. “Embarrassingly Parallel” – it seems so, anyway ! 31
We first built a sequential DPOR explorer for C / Pthreads programs, called “Inspect” Multithreaded Multithreaded C/C C/C program program instrumentation instrumented instrumented program program compile executable executable thread 1 thread n request/permit re q ue er s t/p scheduler mit Thread Threadlibrary library wrapper wrapper 32
We then made the following observations Stateless search does not maintain search history Different branches of an acyclic space can be explored concurrently Simple master-slave scheme can work here – one load balancer workers 33
We then devised a work-distribution scheme load balancer Request unloading report result idle node id worker a work description worker b 34
We got zero speedup! Why? Deeper investigation revealed that multiple nodes ended up exploring the same interleavings 35
Illustration of the problem (1 of 5) t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t1: unlock t2: lock t2: unlock 36
Illustration of the problem (2 of 5) t0: lock {t1}, {t0} To Node 1 t0: unlock t1: lock {t2}, {t1} t1: unlock t2: lock t2: unlock Heuristic : Handoff DEEPEST backtrack point for another node to explore Reason : Largest number of paths emanate from there 37
Detail of (2 of 5) Node 0 t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t0: lock { }, {t0,t1} t0: unlock t1: lock {t2}, {t1} t1: unlock t1: unlock t2: lock t2: lock t2: unlock t2: unlock 38
Detail of (2 of 5) Node 0 t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t0: lock { }, {t0,t1} Node 1 t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t1: unlock t1: unlock t2: lock t2: lock t2: unlock t2: unlock 39
t1 is forced into DONE set before work handed to Node 1 Detail of (2 of 5) Node 0 t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t0: lock { }, { t0,t1 } Node 1 t0: lock { t1 }, {t0} t0: unlock t1: lock {t2}, {t1} t1: unlock t1: unlock t2: lock t2: lock t2: unlock t2: unlock Node 1 keeps t1 in backtrack set 40
Illustration of the problem (3 of 5) t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} To Node 1 Decide to do THIS work at Node 0 itself t1: unlock t2: lock t2: unlock 41
Illustration of the problem (4 of 5) t0: lock {}, {t0,t1} t0: unlock {t1}, {t0} Being expanded by Node 1 {t2}, {t1} Being expanded by Node 0 42
Illustration of the problem (5 of 5) t0: lock {t2}, {t0,t1} t0: unlock {}, {t2} t2: lock t2: unlock 43
Illustration of the problem (5 of 5) t0: lock {t2}, {t0,t1} t0: unlock t1: lock {t1}, {t0} t1: unlock {}, {t2} t2: lock t2: unlock 44
Illustration of the problem (5 of 5) t0: lock {t2}, {t0,t1} t0: unlock t1: lock {t2}, {t0, t1} t1: unlock {}, {t2} {}, {t2} t2: lock t2: lock t2: unlock t2: unlock Redundancy! 45
New Backtrack Set Computation: Aggressively mark up the stack! t0: lock {t1,t2}, {t0} t0: unlock t1: lock {t2}, {t1} t1: unlock Update the backtrack sets of ALL dependent operations! Forms a good allocation scheme Does not involve any synchronizations Redundant work may still be performed Likelihood is reduced because a node aggressively “owns” one operation and all its dependants t2: lock t2: unlock 46
Implementation and Evaluation Using MPI for communication among nodes Did experiments on a 72-node cluster – 2.4 GHz Intel XEON process, 2GB memory/node – Two (small) benchmarks Indexer & file system benchmark used in Flanagan and Godefoid’s DPOR paper – Aget -- a multithreaded ftp client – Bbuf – an implementation of bounded buffer 47
Sequential Checking Time Benchmark Threads Runs Time (sec) fsbench 26 8,192 291.32 indexer 16 32,768 1188.73 aget 6 113,400 5662.96 bbuf 8 1,938,816 39710.43 48
Speedup on indexer & fs (small exs); so diminishing returns 40 nodes 49
Speedup on aget 50
Speedup on bbuf 51
Conclusions and Future Work Method described is VERY promising We have an in-situ model checker for MPI programs also! (EuroPVM / MPI 2007) – Will be parallelized using MPI for work distribution! The C/PThread Work needs to be pushed a lot more: – – – – Automate Instrumentation Try many new examples Improve work-distribution heuristic in response to findings Release tool 52
Questions? 53
Answers ! Properties: Currently – Local “assert”s – Deadlocks – Uninitialized Variables No plans for liveness Tool release likely in 6 months That is a very good question. Let’s talk! 54
Extra Slides 55
Concurrent operations on some database Class A operations: Class B operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); 56
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 57
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 58
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 59
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 60
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 61
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 62
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 63
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 64
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 65
Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 66
Initial random execution Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 67
Initial random execution Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count-a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 68
Initial random execution Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 69
Dependent operations? Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 70
Start an alternative execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 71
Get a deadlock! Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 72