Tuesday, November 10, 2009

OPL: Loop Parallelism, Task Queue, & Graph Partitioning

Loop Parallelism
1. The author says the goal of incremental parallelization is to "evolve" a sequential program into a parallel program. If you have experience in refactoring code from sequential to parallel, how did your approach differ from the steps presented in the paper?

I do not have experience doing this, but I do not imagine that you would have great results. I believe we talked about how evolutionary parallelism can achieve minor performance gains, but the architecture really needs to support the parallelism using the appropriate patterns for the problem domain for the speedups to be significant. Amdahl's Law says that we are limited by the amount of code that must be done sequentially, and I imagine refactoring loops will not parallelize a large percentage of a typical codebase. Though I am not a fan of premature optimization, there is a threshold above which you can be certain that an area of the code will be significantly computationally expensive relative to the rest of the code & look for parallelization opportunities there.

2. Many of us have used Eclipse or another IDE, with their respective plugins, for execution profiling. What has been your experience with these tools? What is your preferred method of locating the code in question (automatic or manual)?

I have had very good success with profiling tools in the .NET world. They have helped easily identify down to the line where performance bottlenecks are.

3. The author states, "The application of many transformations to many loops may be necessary before efficient performance is achieved." What is your opinion on this statement?

That is true. See my response to 1.

Task Queue Implementation Pattern
1. How would you compare the Task Queue pattern to other similar distribution patterns like divide-and-conquer and fork/join pattern?

I think the task queue is by definition a divide-and-conquer algorithm. Otherwise, there would be no need for a queue because there would be only one, large task. Perhaps James meant recursive splitting & fork-join.

The distinction with recursive splitting is that here the work items in the queue are broken down into their smallest-level pieces, which are then grabbed by threads that need more work. The problem that recursive splitting has that the task queue does not is that some effort should be spent to make sure that the splitting points for dividing the work into recursive tasks should split the work-load evenly. Otherwise, it can be quite easy for the load to become unbalanced, e.g. a worst-case quicksort that takes O(N^2) instead of the average O(N log N).

This pattern interests me because I have read several blog posts recently talking about it in different contexts. The typical example for explaining its optimality in queuing theory is people queuing at a grocery store. If customers divide themselves and form N queues for N cashiers, customers can become blocked for an inordinately long time behind a difficult/slow customer. In addition, adding and removing cashiers causes a significant amount of churn in the customers. By contrast, if all customers form a single queue and the customer at the head goes to the first available cashier, no customer is unfairly stuck behind another difficult/slow one, and cashiers can be added and removed depending on the queue length without affecting the waiting customers. The former is the distributed queue version and the later is the centralized queue version of this pattern.

Perhaps my memory is failing me a bit, but fork-join seems like a specific implementation of the decentralized version task queue pattern. In fact, the distributed task-stealing pattern that was first presented in the fork-join paper is explicitly mentioned here.

2. It was nice to see a pattern with some actual code examples. How important were the examples in understanding the concepts presented in this paper?

I did not find the code samples useful. I thought that the concept was explained quite well in the first few sections of the paper; better than most of the others, in fact.

Graph Partitioning Implementation Strategy Pattern
1. This was probably one of the more math-oriented patterns I have seen and/or read. How often have you experienced a situation where the graph partition strategy would have been useful? If never, can you think of any future situations where this pattern might be come in handy and could be applied?

I felt that the majority of the coverage was dedicated to the nitty-gritty details of several graph-partitioning algorithms. I think more high-level analysis would have been beneficial. That being said, I do think that this is a useful pattern to have in your toolbox because of the large number of problems that could be mapped to a set of nodes and weighted, optionally directed edges. Examples could be physical distance between points, latency between nodes on a network, or monetary costs to ship a package between two cities.

No comments:

Post a Comment