In geometric decomposition, the authors mention recursive data structures as good candidates for recursive algorithms and then mention arrays as good candidates for geometric decomposition. Arrays are recursive data structures, though, and have been used as examples in almost every one of the recursive algorithm patterns, e.g. quicksort. I think the authors need to give more context on when arrays are good candidates for being geometrically decomposed as opposed to recursively processed...which they seem to do in the next paragraph. According to the authors, recursive algorithms are fine when processing one range of data doesn't require knowledge of any other data in the structure. When there is some locality, though (updating an element depends on its neighbors), geometric decomposition is the way to go.
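To make the contrast concrete, here is a minimal sketch of geometrically decomposing an array: the blocks are defined purely by position, not by a data-dependent pivot as in quicksort. The `decompose` helper and the squaring workload are my own illustration, not anything from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(data, n_chunks):
    """Split a list into contiguous, roughly equal blocks --
    the 'geometric' part: chunk boundaries come from position alone."""
    k, r = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

# Each block can then be processed independently when elements depend
# only on nearby data (or, as in this toy workload, on nothing at all).
with ThreadPoolExecutor() as pool:
    parts = list(pool.map(lambda c: [x * x for x in c],
                          decompose(list(range(10)), 3)))
print(parts)  # [[0, 1, 4, 9], [16, 25, 36], [49, 64, 81]]
```

When locality does matter, the per-block work would additionally need boundary values from adjacent blocks, which is exactly where the communication concerns below come in.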
It's interesting to note in the multidimensional array examples that the surface area resulting from the data decomposition is important to consider. Since each block of data may need to communicate with adjacent blocks, the more surface area there is, the more communication is necessary. A similar idea can likely be extended to most data structures. In graphs, for example, you can consider the number of outgoing edges or the cut size from a section of the graph. There is a graph partitioning paper coming up, so hopefully that's yet another layer deep and not something parallel to this. Otherwise, I think I'm missing the point of this paper.
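The surface-area point can be sketched with a back-of-the-envelope count: for blocks with the same number of cells, the amount of boundary to exchange depends on the block's shape. The `halo_cells` function below is a rough approximation of my own (a 4-point stencil, ignoring corner effects), not a formula from the paper.

```python
def halo_cells(rows, cols):
    # Approximate count of cells a rows x cols block must exchange
    # with its neighbors each iteration under a 4-point stencil
    # (hypothetical model; ignores corners and domain edges).
    return 2 * (rows + cols)

# Three shapes with the same 'volume' (64 cells) but different surface:
for shape in [(1, 64), (4, 16), (8, 8)]:
    print(shape, halo_cells(*shape))
```

The squarest block, (8, 8), exchanges the least: 32 cells versus 130 for the (1, 64) strip. The same surface-to-volume reasoning is what the graph cut size would capture for graph partitioning.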
Another novel idea is to duplicate the neighbors of cells on the boundary of a chunk. In certain scenarios, the savings from chunks no longer needing to communicate with each other outweigh the overhead incurred by duplicating the data. However, this only seems practical to me if the cells on the boundary only need the initial values of their duplicated neighbors. If they need the most recently updated data from their neighbors, then I don't see how this eases the communication burden. Also, if each border chunk has access to all the needed information when the process begins, then I think it is a data parallelism situation and not a geometric decomposition one.
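For reference, the usual form of this idea (often called ghost or halo cells) does refresh the duplicated values, but only once per iteration: one batched exchange phase, then a compute phase with no further communication. A sketch in my own words, not any paper's actual code:

```python
def step_with_ghosts(chunks):
    """One Jacobi-style averaging step over a 1D array split into
    chunks. Exchange phase: each chunk copies one boundary value from
    each neighbor (its ghost cells). Compute phase: purely local."""
    # Exchange phase (the only communication in the whole step).
    ghosts = []
    for i, c in enumerate(chunks):
        left = chunks[i - 1][-1] if i > 0 else c[0]
        right = chunks[i + 1][0] if i < len(chunks) - 1 else c[-1]
        ghosts.append((left, right))
    # Compute phase: each chunk updates using only local + ghost data.
    new_chunks = []
    for (left, right), c in zip(ghosts, chunks):
        padded = [left] + c + [right]
        new_chunks.append([(padded[j - 1] + padded[j + 1]) / 2
                           for j in range(1, len(padded) - 1)])
    return new_chunks

print(step_with_ghosts([[0, 0], [4, 4], [0, 0]]))
# [[0.0, 2.0], [2.0, 2.0], [2.0, 0.0]]
```

So the duplication doesn't eliminate communication when updated values are needed; it amortizes it into one exchange per iteration instead of one per cell access.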
Load balancing can become a problem if the data is not evenly distributed amongst the chunks of the data structure or if the processing causes the distribution to become uneven. The solution presented here, which makes sense to me, is to create many more chunks than UEs so no UE is likely to sit idle while another has a significant amount of processing to do.
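The overdecomposition argument can be checked numerically with a toy scheduler. The greedy list-scheduling model below is my own illustration of the idea, assuming hypothetical chunk costs where one region of the data is much more expensive than the rest.

```python
import heapq

def makespan(chunk_costs, n_ues):
    """Greedy list scheduling: always hand the next chunk to the
    least-loaded UE; return the finishing time of the busiest UE."""
    loads = [0.0] * n_ues
    heapq.heapify(loads)
    for cost in chunk_costs:
        heapq.heappush(loads, heapq.heappop(loads) + cost)
    return max(loads)

# Same total work (13 units), 4 UEs, but one 'hot' region:
coarse = [10, 1, 1, 1]               # one chunk per UE
fine = [2.5] * 4 + [0.25] * 12       # many more chunks than UEs
print(makespan(coarse, 4))  # 10.0 -- one UE does nearly everything
print(makespan(fine, 4))    # 3.25 -- exactly the ideal 13/4
```

With coarse chunks the hot region pins one UE at 10 units while the others idle at 1; with many small chunks the same work spreads to the ideal 3.25 per UE.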
It seems strange that the pipeline paper only differentiates itself (in the authors' words) from the pipes and filters pattern, which we've already read twice, by explicitly addressing parallelism. To avoid confusion, I think this should be made a section of the pipes and filters pattern. Given that, it takes a while before this paper adds anything new to the two treatments we've already covered.
Since each stage can typically only process one piece of data at a time, the more stages there are, the more operations can be performed in parallel. However, each additional stage comes at the cost of increased communication overhead and latency. To make optimal use of the resources available at each stage, all stages should require roughly the same amount of work, so quicker stages don't leave resources idle for long while waiting for the longer ones to finish. This latter point (and perhaps the former, too) was made in the previous treatments of this topic, though.
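The balance point falls out of a simple idealized pipeline model: after a fill latency of the sum of the stage times, one item emerges every `max(stage_times)`, so the slowest stage gates throughput. This model is my own simplification (no queueing or communication cost), not a formula from the paper.

```python
def pipeline_time(stage_times, n_items):
    """Idealized pipeline: fill latency = sum of stage times, then
    one item completes per tick of the slowest stage."""
    return sum(stage_times) + (n_items - 1) * max(stage_times)

balanced = [1.0, 1.0, 1.0]   # same total work per item either way
skewed = [0.5, 2.0, 0.5]
print(pipeline_time(balanced, 100))  # 102.0
print(pipeline_time(skewed, 100))    # 201.0
```

Both pipelines do 3.0 units of work per item, but the skewed one takes nearly twice as long on 100 items because the fast stages sit idle waiting on the 2.0-unit stage.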
Is this even worth discussing? The entire treatment of the topic was only three sentences.