Tuesday, November 10, 2009

PRES: Probabilistic Replay with Execution Sketching on Multiprocessors

- What do you expect from a debugging tool?
+breakpoints
+stepping through the code line by line examining which branches are taken
+examining the state of variables
+examining the call stack

- What debugging tools do you usually use? What is good and bad about them?
+Visual Studio
+Covers all of the above. Has the ability to move backwards in a function; if you set a breakpoint & need to inspect what happened two lines earlier, you can drag it back and replay those lines. Conditional breakpoints. Can execute commands against the system's current state. Can modify the system's state & observe what impact that has.
+Doesn't have good support for "edit and continue," where you can change the inner workings of a method without needing to recompile & relaunch the application

- What are the challenges of debugging a sequential program? What are additional challenges of debugging a parallel program?
Debugging parallel programs can be extremely difficult due to timing-related heisenbugs. These bugs happen only with certain interleavings of the code, which are typically difficult to reproduce. The paper we read earlier in the semester on Chess from Microsoft Research is a good treatment of this problem and a possible solution.
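
To make the interleaving problem concrete, here is a toy sketch of my own (not taken from either paper). It simulates two threads that each do a non-atomic `counter += 1`, split into a read step and a write step, and enumerates every possible schedule. The names `run_schedule` and `all_schedules` are made up for this illustration, and the threads are simulated rather than real so that the result is repeatable.

```python
from itertools import permutations

def run_schedule(schedule):
    """Run two simulated threads, each doing a non-atomic counter += 1.

    schedule is a sequence of thread ids (0 or 1). Each occurrence lets that
    thread execute its next step: first a read, then a write.
    """
    counter = 0
    local = {0: None, 1: None}   # each thread's private copy of the counter
    step = {0: 0, 1: 0}          # how many steps each thread has executed
    for tid in schedule:
        if step[tid] == 0:
            local[tid] = counter          # step 1: read the shared value
        else:
            counter = local[tid] + 1      # step 2: write back the increment
        step[tid] += 1
    return counter

def all_schedules():
    # Distinct orderings of two steps from thread 0 and two from thread 1.
    return sorted(set(permutations([0, 0, 1, 1])))

results = {s: run_schedule(s) for s in all_schedules()}
buggy = [s for s, final in results.items() if final != 2]

for s, final in results.items():
    print(s, "->", final)
```

Only the schedules where one thread finishes both steps before the other starts give the expected value of 2; the rest lose an update. In a real program the buggy schedules are usually a tiny fraction of a huge space, which is why they are so hard to hit on demand.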

- How important is replay for debugging?
Replay is essential for effective debugging. If the error cannot be consistently reproduced, it is often very difficult to diagnose and cure.

- If you need to design a deterministic replay system, how are you going to do it for sequential programs? Does it work for parallel programs? If not, how would you make it work?
- What are the additional challenges to make your tool work on multi-core or multi-processor systems?
These questions seem like they would need a ridiculously long answer. The sequential problem is not too difficult, and you can refer to the Chess paper from earlier in the semester for the parallel case.
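
For the sequential case, the usual idea is to log every nondeterministic input during the original run and feed the log back during replay. Below is a minimal sketch of that idea under my own assumptions: the `Recorder` and `Replayer` classes and the `program` function are hypothetical names for this illustration, and a random number generator stands in for real inputs such as system calls, the clock, or the network.

```python
import random

class Recorder:
    """Wraps a nondeterministic input source and logs every value it returns."""
    def __init__(self, source):
        self.source = source
        self.log = []

    def next_input(self):
        value = self.source()
        self.log.append(value)
        return value

class Replayer:
    """Feeds back previously logged values instead of asking the real source."""
    def __init__(self, log):
        self._values = iter(log)

    def next_input(self):
        return next(self._values)

def program(env):
    # A sequential program whose control flow depends on its inputs.
    total = 0
    for _ in range(5):
        x = env.next_input()
        if x % 3 == 0:
            total -= x
        else:
            total += x
    return total

recorder = Recorder(lambda: random.randint(0, 99))
original = program(recorder)
replayed = program(Replayer(recorder.log))
print(original, replayed)
```

Because the program is sequential, the input log is enough to force the same path through the code. With multiple threads on multiple processors, the order of shared-memory accesses is one more source of nondeterminism, and this sketch does not capture it.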

- Given a replay system, what else should we do to make it really help debugging?
Solve the halting problem.

- How would virtual machine technologies help?
Virtual machines could help by being able to completely capture every aspect of the system's state, providing you with a snapshot of the system's state that includes active processes, RAM utilization, etc. The environment can often have an impact on a system that behaves as expected internally, and those factors are often difficult to capture & especially difficult to repeat.

- What is the state of the art?
Not sure what the question is...

What the paper mentions in section 1.2 on "State of the Art" is based around systems to record executions. This is different from Chess, which did not attempt to record bugs but instead tried to find a way to efficiently and exhaustively examine different thread interleavings. This seems to fall into the authors' bucket of existing software practices that repeatedly replay an execution to search for bugs caused by interleavings. Granted, Chess should be drastically more efficient than the common naïve practices, but it is still in that category. PRES attempts to record enough information that only a small number of replays is needed to reproduce the bug, but it limits the number of observations that are recorded in order to minimize the overhead of running a production system with PRES active.

PRES has three key components:
-recording only a subset of events. Although there may not be enough information to replay every intermediate event exactly as it occurred, enough information is recorded to reproduce the bug. After PRES has reproduced the bug once, it can reproduce it 100% of the time.
-a replay system that reproduces unrecorded actions & events
-using data from unsuccessful replays to help guide subsequent attempts
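
The three components above can be sketched on a toy model. This is my own simplified illustration, not the algorithm from the paper: the sketch records only the order of writes, the replayer guesses the unrecorded read order, and the "feedback" is nothing more than ruling out schedules that failed to reproduce the bug. All function names (`execute`, `record_sketch`, `reproduce`) are hypothetical.

```python
import random
from itertools import permutations

def execute(schedule):
    """Run two simulated threads that each do a non-atomic counter += 1.

    Each thread id appears twice in schedule: its first occurrence is the
    read step, its second is the write step.
    Returns (final_counter, write_order).
    """
    counter, local, seen, write_order = 0, {}, set(), []
    for tid in schedule:
        if tid not in seen:
            local[tid] = counter        # read step (not recorded)
            seen.add(tid)
        else:
            counter = local[tid] + 1    # write step (recorded in the sketch)
            write_order.append(tid)
    return counter, tuple(write_order)

def is_bug(counter):
    return counter != 2   # a lost update

def record_sketch(schedule):
    # Production run: keep only the order of writes, a subset of all events.
    _, write_order = execute(schedule)
    return write_order

def reproduce(sketch, rng, max_attempts=10):
    # Candidate schedules are those consistent with the recorded sketch.
    candidates = [s for s in sorted(set(permutations([0, 0, 1, 1])))
                  if execute(s)[1] == sketch]
    ruled_out = set()   # feedback gathered from unsuccessful replays
    for attempt in range(1, max_attempts + 1):
        remaining = [s for s in candidates if s not in ruled_out]
        if not remaining:
            return None, attempt - 1
        guess = rng.choice(remaining)
        counter, _ = execute(guess)
        if is_bug(counter):
            return guess, attempt   # full schedule is now known
        ruled_out.add(guess)
    return None, max_attempts

production_schedule = (0, 1, 0, 1)          # the run that failed in production
sketch = record_sketch(production_schedule)
full_schedule, attempts = reproduce(sketch, random.Random(0))
print("sketch:", sketch, "reproduced with:", full_schedule, "attempts:", attempts)
```

Once `reproduce` returns a full schedule, re-running `execute(full_schedule)` hits the bug every time, which mirrors the "reproduce it 100% of the time" property. Note that the reproduced schedule does not have to match the production schedule exactly; it only has to agree with the sketch and trigger the same bug.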
