Distributed Dynamic Partial Order Reduction based Verification of

January 19, 2023

61 Less than a minute

72 Slides557.00 KB

Save Document

Distributed Dynamic Partial Order Reduction based Verification of Threaded Software Yu Yang (PhD student; summer intern at CBL) Xiaofang Chen (PhD student; summer intern at IBM) Ganesh Gopalakrishnan Robert M. Kirby School of Computing University of Utah SPIN 2007 Workshop Presentation Supported by: Microsoft HPC Institutes NSF CNS 0509379 1

Thread Programming will become more prevalent FV of thread programs will grow in importance 2

Why FV for Threaded Programs 80% of chips shipped will be multi-core (photo courtesy of Intel Corporation.) 3

Model Checking will Increasingly be thru Dynamic Methods Also known as Runtime or In-Situ methods 4

Why Dynamic Verification Methods Even after early life-cycle modeling and validation, the final code will have far more details Early life-cycle modeling is often impossible - Use of libraries (API) such as MPI, OpenMP, Shmem, - Library function semantics can be tricky - The bug may be in the library function implementation 5

Model Checking will often be “stateless” 6

Why Stateless One may not be able to access a lot of the state - e.g. state of the OS . It is expensive to hash and lookup revisits . Stateless is easier to parallelize 7

Partial Order Reduction is Crucial ! 8

Why POR? Process P0: ------------------------------0: MPI Init 1: MPI Win lock 2: MPI Accumulate 3: MPI Win unlock 4: MPI Barrier 5: MPI Finalize ONLY DEPENDENT OPERATIONS Process P1: ------------------------------0: MPI Init 1: MPI Win lock 2: MPI Accumulate 3: MPI Win unlock 4: MPI Barrier 5: MPI Finalize 504 interleavings without POR (2 * (10!)) / (5!) 2 2 interleavings with POR !! 9

Dynamic POR is almost a “must” ! ( Dynamic POR as in Flanagan and Godefroid, POPL 2005) 10

Why Dynamic POR ? a[ j ] a[ k ]-- Ample Set depends on whether j k Can be very difficult to determine statically Can determine dynamically 11

Why Dynamic POR ? The notion of action dependence (crucial to POR methods) is a function of the execution 12

Computation of “ample” sets in Static POR versus in DPOR { BT }, { Done } Ample determined using “local” criteria Add Red Process to “Backtrack Set” This builds the Ample set incrementally based on observed dependencies Nearest Dependent Transition Looking Back Blue is in “Done” set Current State Next move of Red process 13

Putting it all together We target C/C PThread Programs Instrument the given program (largely automated) Run the concurrent program “till the end” Record interleaving variants while advancing When # recorded backtrack points reaches a soft limit, spill work to other nodes In one larger example, a 11-hour run was finished in 11 minutes using 64 nodes Heuristic to avoid recomputations was essential for speed-up. First known distributed DPOR 14

A Simple DPOR Example t0: lock(t) unlock(t) {}, {} t1: lock(t) unlock(t) t2: lock(t) unlock(t) 15

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {}, {} t1: lock(t) unlock(t) t2: lock(t) unlock(t) 16

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {}, {} t0: unlock t1: lock(t) unlock(t) t2: lock(t) unlock(t) 17

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {}, {} t0: unlock t1: lock(t) unlock(t) t1: lock t2: lock(t) unlock(t) 18

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock t2: lock(t) unlock(t) 19

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock {}, {} t1: unlock t2: lock(t) unlock(t) t2: lock 20

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock {t2}, {t1} t1: unlock t2: lock(t) unlock(t) t2: lock 21

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock {t2}, {t1} t1: unlock t2: lock(t) unlock(t) t2: lock t2: unlock 22

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) t1: lock {t2}, {t1} t1: unlock t2: lock(t) unlock(t) t2: lock 23

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1}, {t0} t0: unlock t1: lock(t) unlock(t) {t2}, {t1} t2: lock(t) unlock(t) 24

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1,t2}, {t0} t0: unlock t1: lock(t) unlock(t) t2: lock {}, {t1, t2} t2: lock(t) unlock(t) 25

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1,t2}, {t0} t0: unlock t1: lock(t) unlock(t) t2: lock {}, {t1, t2} t2: unlock t2: lock(t) unlock(t) 26

A Simple DPOR Example t0: lock(t) unlock(t) t0: lock {t1,t2}, {t0} t0: unlock t1: lock(t) unlock(t) {}, {t1, t2} t2: lock(t) unlock(t) 27

A Simple DPOR Example t0: lock(t) unlock(t) {t2}, {t0,t1} t1: lock(t) unlock(t) t2: lock(t) unlock(t) 28

A Simple DPOR Example t0: lock(t) unlock(t) t1: lock {t2}, {t0, t1} t1: unlock t1: lock(t) unlock(t) t2: lock(t) unlock(t) 29

For this example, all the paths explored during DPOR For others, it will be a proper subset 30

Idea for parallelization: Explore computations from the backtrack set in other processes. “Embarrassingly Parallel” – it seems so, anyway ! 31

We first built a sequential DPOR explorer for C / Pthreads programs, called “Inspect” Multithreaded Multithreaded C/C C/C program program instrumentation instrumented instrumented program program compile executable executable thread 1 thread n request/permit re q ue er s t/p scheduler mit Thread Threadlibrary library wrapper wrapper 32

We then made the following observations Stateless search does not maintain search history Different branches of an acyclic space can be explored concurrently Simple master-slave scheme can work here – one load balancer workers 33

We then devised a work-distribution scheme load balancer Request unloading report result idle node id worker a work description worker b 34

We got zero speedup! Why? Deeper investigation revealed that multiple nodes ended up exploring the same interleavings 35

Illustration of the problem (1 of 5) t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t1: unlock t2: lock t2: unlock 36

Illustration of the problem (2 of 5) t0: lock {t1}, {t0} To Node 1 t0: unlock t1: lock {t2}, {t1} t1: unlock t2: lock t2: unlock Heuristic : Handoff DEEPEST backtrack point for another node to explore Reason : Largest number of paths emanate from there 37

Detail of (2 of 5) Node 0 t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t0: lock { }, {t0,t1} t0: unlock t1: lock {t2}, {t1} t1: unlock t1: unlock t2: lock t2: lock t2: unlock t2: unlock 38

Detail of (2 of 5) Node 0 t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t0: lock { }, {t0,t1} Node 1 t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t1: unlock t1: unlock t2: lock t2: lock t2: unlock t2: unlock 39

t1 is forced into DONE set before work handed to Node 1 Detail of (2 of 5) Node 0 t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} t0: lock { }, { t0,t1 } Node 1 t0: lock { t1 }, {t0} t0: unlock t1: lock {t2}, {t1} t1: unlock t1: unlock t2: lock t2: lock t2: unlock t2: unlock Node 1 keeps t1 in backtrack set 40

Illustration of the problem (3 of 5) t0: lock {t1}, {t0} t0: unlock t1: lock {t2}, {t1} To Node 1 Decide to do THIS work at Node 0 itself t1: unlock t2: lock t2: unlock 41

Illustration of the problem (4 of 5) t0: lock {}, {t0,t1} t0: unlock {t1}, {t0} Being expanded by Node 1 {t2}, {t1} Being expanded by Node 0 42

Illustration of the problem (5 of 5) t0: lock {t2}, {t0,t1} t0: unlock {}, {t2} t2: lock t2: unlock 43

Illustration of the problem (5 of 5) t0: lock {t2}, {t0,t1} t0: unlock t1: lock {t1}, {t0} t1: unlock {}, {t2} t2: lock t2: unlock 44

Illustration of the problem (5 of 5) t0: lock {t2}, {t0,t1} t0: unlock t1: lock {t2}, {t0, t1} t1: unlock {}, {t2} {}, {t2} t2: lock t2: lock t2: unlock t2: unlock Redundancy! 45

New Backtrack Set Computation: Aggressively mark up the stack! t0: lock {t1,t2}, {t0} t0: unlock t1: lock {t2}, {t1} t1: unlock Update the backtrack sets of ALL dependent operations! Forms a good allocation scheme Does not involve any synchronizations Redundant work may still be performed Likelihood is reduced because a node aggressively “owns” one operation and all its dependants t2: lock t2: unlock 46

Implementation and Evaluation Using MPI for communication among nodes Did experiments on a 72-node cluster – 2.4 GHz Intel XEON process, 2GB memory/node – Two (small) benchmarks Indexer & file system benchmark used in Flanagan and Godefoid’s DPOR paper – Aget -- a multithreaded ftp client – Bbuf – an implementation of bounded buffer 47

Sequential Checking Time Benchmark Threads Runs Time (sec) fsbench 26 8,192 291.32 indexer 16 32,768 1188.73 aget 6 113,400 5662.96 bbuf 8 1,938,816 39710.43 48

Speedup on indexer & fs (small exs); so diminishing returns 40 nodes 49

Speedup on aget 50

Speedup on bbuf 51

Conclusions and Future Work Method described is VERY promising We have an in-situ model checker for MPI programs also! (EuroPVM / MPI 2007) – Will be parallelized using MPI for work distribution! The C/PThread Work needs to be pushed a lot more: – – – – Automate Instrumentation Try many new examples Improve work-distribution heuristic in response to findings Release tool 52

Questions? 53

Answers ! Properties: Currently – Local “assert”s – Deadlocks – Uninitialized Variables No plans for liveness Tool release likely in 6 months That is a very good question. Let’s talk! 54

Extra Slides 55

Concurrent operations on some database Class A operations: Class B operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); 56

Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 57

Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 63

Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 64

Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 65

Initial random execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 66

Initial random execution Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 67

Initial random execution Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count-a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 68

Dependent operations? Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); b count--; if (b count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 70

Start an alternative execution Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); a count--; if (a count 0) pthread mutex unlock(res); pthread mutex unlock(mutex); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 71

Get a deadlock! Class A operations: pthread mutex lock(mutex); a count ; if (a count 1) pthred mutex lock(res); pthread mutex unlock(mutex); pthread mutex lock(mutex); Class B operations: pthread mutex lock(mutex); b count ; if (b count 1) pthred mutex lock(res); a1 : acquire mutex a2 : a count a3 : a count 1 a4 : acquire res a5 : release mutex b1 : acquire mutex b2 : b count b3 : b count 1 a6 : acquire mutex a7 : a count -a8 : a count 0 a9 : release res a10 : release mutex b4 : acquire res b5 : release mutex b6 : acquire mutex b7 : b count b8 : b count 0 b9 : release lock b10 : release mutex 72

January 19, 2023

61 Less than a minute

Related Articles

File System and Full Volume Encryption Sachin Patel CSE 590TU 3/9/2006

Events Management Engage for Events

嵌入式處理器架構與程式設計 王建民 中央研究院 資訊所 2008 年 7 月

Emotional Intelligence View 360 Administration and Interpretation

CIS 2033 based on Dekking et al. A Modern Introduction to Probability

College Credit Now at Forsyth Central High School DUAL ENROLLMENT AND

嵌入式處理器架構與程式設計王建民中央研究院資訊所 2008 年 7 月