Monday, October 19, 2009

OPL: Event-based Implicit invocation & Map-reduce

Event-based, Implicit invocation

I’m not seeing a difference between this pattern and the observer pattern at the OOD level. The pattern is more general and can be applied to communication between systems, e.g. a SOA where the pricing module will fire an event to indicate a sale but not be concerned with what other modules (billing, CRM) are interested. Now that I’m re-reading Richard’s questions, that last example fits in with the pub-sub architectural model. I’m not sure that I see a difference between these patterns other than levels of abstraction. I think that the different examples at the end of the “Solutions” section do a good job of showing other levels of abstraction where the pattern works.

I’m not sure why you would prefer a synchronous dispatch over an asynchronous one. I’m sure there is a valid reason, but other than hardware limitations, I can’t think of one. Since the announcer shouldn’t know anything about its subscribers, why should it have to wait for them? It’s typically not expecting to get anything back. If a sender does want to reply to a message, it won’t be via a return value or anything like that; it will reply by raising an event of its own.

As far as error handling goes, the announcer should certainly not be notified of errors with the subscribers. The entire point of this pattern is to decouple the two types of entities. Since the announcer won’t know anything about the subscribers, it will likely have no idea how to handle any error that it receives from them.



Map-reduce

This pattern has been all the rage due to the fact that Google uses it. One example that they give, which I find much clearer than any of the examples given in the reading, is counting the number of times a string appears in a document. If there are w workers and p pages in the document, each worker can take p/w pages and find the number of occurrences of that string. Then since the reduce method is a simple summation, which is a distributive function (the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning), so even the reduce part of the algorithm can be done in parallel.

I think that the map portion of the pattern is simple and obvious (divide and conquer your independent tasks), but I think there is a significant amount of complexity in reducing in non-naïve manner that was glossed over. Maybe it’s simpler than I think, though, because I haven’t used the pattern before, or maybe doing naïvely is just fine.

This pattern reminds me a lot of the fork-join pattern, and I’m really thinking hard about what the differences are. The work stealing method presented in the fork-join paper seems like a very novel idea, but that is just an implementation detail that I imagine map-reduce could also use.

I don’t know if I like the idea of the framework handling errors. I usually prefer to have the system let the programmer specify exactly how errors should be handled so none get swallowed inadvertently. I think that if the framework attempts to handle the errors itself, it is violating the “fail fast” principal discussed in the other reading for this class. The framework can keep the computation going using data that is potentially incorrect, and one error can lead to drastically compounded problems in the output.

No comments:

Post a Comment