Saturday, October 24, 2009

OPL: Dense Linear Algebra, Graph Algorithms, & Monte Carlo

Dense Linear Algebra

This pattern addresses the fact that memory access is much more expensive than CPU processing, so once a piece of data has been loaded, we need to make the most use of it.

“The BLAS level 3 routines stand out because of the smaller order of complexity for data movement (O(N^2)) than for computation (O(N^3)). Due to this fact, BLAS Level 3 routines can approach peak performance on most parallel systems. For this reason, a programmer would greatly benefit from structuring an algorithm to favor BLAS Level 3 functions. For example, replacing a large number of dot products (a BLAS level 1 function) with a single matrix-matrix multiplication (a BLAS level 3 function) can turn a problem bound by memory bandwidth into a compute-bound problem.” I’m still having trouble figuring out what this paragraph is saying, because at first I wasn’t sure why you’d want to take O(N)/O(N) vector operations and turn them into O(N^2)/O(N^3) operations, which are obviously much more expensive. However, once I actually turned my brain on, I realized that an NxN matrix is an aggregation of N 1xN vectors. Things still don’t quite click, because it should take O(N^2) time to perform N BLAS level 1 operations on 1xN vectors. I realize that the point is to make the process compute-bound instead of having the bottleneck be data movement, but what I’m focusing on is that it takes O(N^3) as a BLAS level 3 operation when it should take O(N^2) either as one BLAS level 2 operation or as N BLAS level 1 operations. I know I’m missing something, because the key seems to be BLAS level 3’s different bounds on data movement and computation, whereas BLAS levels 1 & 2 have the same bound on both, but I still don’t see how O(N^3) is better than O(N^2). The O(N^2) BLAS level 1 & 2 operations should be able to be parallelized, too, right?

Naïve implementations of matrix multiplication repeatedly access data in a way that is suboptimal from a caching and data-persistence standpoint. A matrix is typically stored in either row-major or column-major order. Let’s say that we are using column-major order. That means accessing adjacent items in a matrix row is an expensive operation because the data are likely not stored in the same page on disk and are likely not cached together. This becomes a problem when naively computing a matrix product cell by cell, because we must traverse the same row many times. Instead of computing the product cell by cell, the paper proposes using an outer product, “in which the product of a column of A and a row of B acts as a rank one update to the entire product matrix C.” In this way, we do not have to repeatedly traverse each row and/or column, so spatial locality is still a problem, but a less frequent one.
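
To make that concrete, here’s a minimal Java sketch of the two loop orders (my own illustration, not code from the paper), with column-major NxN matrices stored in flat arrays so that element (i, j) lives at index i + j * n:

    class MatrixProducts {
        // Naive cell-by-cell product: the innermost loop reads row i of A with
        // stride n, the worst access pattern for column-major storage.
        static void naiveMultiply(double[] a, double[] b, double[] c, int n) {
            for (int j = 0; j < n; j++)
                for (int i = 0; i < n; i++) {
                    double sum = 0.0;
                    for (int k = 0; k < n; k++)
                        sum += a[i + k * n] * b[k + j * n]; // strided walk across a row of A
                    c[i + j * n] = sum;
                }
        }

        // Outer-product order: column k of A times row k of B is a rank-one update
        // to all of C, and the innermost loop now moves down columns with unit stride.
        static void outerProductMultiply(double[] a, double[] b, double[] c, int n) {
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++) {
                    double bkj = b[k + j * n];
                    for (int i = 0; i < n; i++)
                        c[i + j * n] += a[i + k * n] * bkj; // unit-stride column access
                }
        }
    }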

An extension of the above idea: rather than updating all of C with an entire column from A and an entire row from B, a smaller block that uses a subset of each is computed. An autotuning library is needed to determine the best block size for the given hardware, as block size depends on everything from the TLB to the number of floating point units available.

The idea of blocking above is typically optimized toward the quantity and speed of registers in the system. It can be extended further to create other layers of blocking optimized toward the L1, L2, and, if it exists, L3 caches. The lower-level blocks move around within the higher-level blocks until the higher-level blocks have been fully traversed, at which point the higher-level blocks move.
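
Continuing the same flat-array, column-major convention as above, here is a sketch of a single level of blocking; the block size B below is an arbitrary placeholder for the value an autotuner would pick:

    class BlockedProduct {
        static final int B = 64; // placeholder; an autotuning library would pick this per machine

        static void blockedMultiply(double[] a, double[] b, double[] c, int n) {
            for (int jj = 0; jj < n; jj += B)
                for (int kk = 0; kk < n; kk += B)
                    for (int ii = 0; ii < n; ii += B)
                        // Update the B x B block of C at (ii, jj) using blocks of A and B,
                        // so the working set is small enough to stay resident in cache.
                        for (int j = jj; j < Math.min(jj + B, n); j++)
                            for (int k = kk; k < Math.min(kk + B, n); k++) {
                                double bkj = b[k + j * n];
                                for (int i = ii; i < Math.min(ii + B, n); i++)
                                    c[i + j * n] += a[i + k * n] * bkj;
                            }
        }
    }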

One noteworthy optimization mentioned in the paper is to “use local variables to explicitly remove false dependencies.” If the value of a matrix cell is stored in a local variable, then operations can be reordered in a more optimal way than if there were a write-after-read (anti-) dependency on the cell itself.
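
A tiny illustration of that tip under the same conventions as above; the only change is accumulating into a local rather than reading and writing the array slot on every iteration:

    class LocalAccumulator {
        // Accumulating into 'sum' removes the repeated read-modify-write of c[i + j * n],
        // so the compiler and CPU are free to reorder and pipeline the multiply-adds.
        static void updateCell(double[] a, double[] b, double[] c, int n, int i, int j) {
            double sum = c[i + j * n]; // one read up front
            for (int k = 0; k < n; k++)
                sum += a[i + k * n] * b[k + j * n];
            c[i + j * n] = sum;        // one write at the end
        }
    }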



Graph Algorithms

Nothing new here as far as creating the abstraction goes. This was covered very well in my algorithms class during my undergrad at UIUC (CS 473). My main takeaway from the class was that almost any problem could be solved by converting it to a graph and using max flow, though the solution may be less than ideal.

I think the list of graph structures (bipartite, connected, complete, k-regular, tree, forest, DAG) and the parallelization opportunities listed make a nice starting point when it comes to doing some research on optimizing an algorithm. Same thing with the graph storage structures (vertex list, edge list, adjacency list, adjacency matrix), traversal options (breadth-first search, depth-first search, topological sort, minimum spanning trees [Kruskal, Prim], single source shortest paths [Bellman-Ford, Dijkstra], all-pairs shortest paths [Floyd-Warshall, Johnson], single source longest paths in a DAG [Floss], max flow [Edmonds-Karp, push-relabel, relabel-to-front], graph coloring [Chaitin, Bomen]), and partitioning operations (Kernighan-Lin, Fiduccia-Mattheyses, ParMetis). I found myself scouring my notes from CS 225 & CS 473 and searching the Web for all the algorithms that I used to know back during my undergrad but have since forgotten.



Monte Carlo

This method of choosing random points in the sample space can be beneficial when it is difficult to arrive at the answer via analytical methods.

I’ve seen SIMD mentioned frequently in this class during the last few weeks’ readings. I read up on it a bit on Wikipedia before, but I’ll have to dig a little deeper and see if/how it can impact my day job.

As mentioned in the paper, map-reduce and Monte Carlo often work well together because the random samplings are independent tasks that can be performed in the map phase and then have their statistics aggregated in the reduce phase.

I think the paper spends a bit too much time (it’s not much time in total, but these patterns seem to favor brevity) on how random number generation is difficult when not relying on any sampling of physical phenomena. The more interesting discussion is around how to get pseudorandom number generation and parallelized systems to play nicely together, which is a problem because the generator relies on state that can’t be shared by different threads without having them produce the same “random” number. The paper presents two possible solutions: either generate random numbers in batches so blocking/locking is not needed as frequently, or “[g]enerate multiple unique sequences in parallel, using advanced parallel version of the Mersenne Twister PRNG or SPRNG Ver4.0…”
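
To make the second solution concrete, here’s a hedged Java sketch of per-task generators: each task gets its own independently seeded java.util.Random, so no generator state is shared between threads. The seeding scheme is purely illustrative; as the quote says, a real system would use a parallel Mersenne Twister or SPRNG to guarantee independent sequences.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class MonteCarloPi {
        public static void main(String[] args) throws Exception {
            int tasks = 8;
            long samplesPerTask = 1_000_000;
            ExecutorService pool = Executors.newFixedThreadPool(tasks);
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                long seed = 1_234_567L + 997L * t; // distinct seed per task (illustrative only)
                futures.add(pool.submit(() -> {
                    Random rng = new Random(seed); // private generator: no shared state
                    long hits = 0;
                    for (long i = 0; i < samplesPerTask; i++) {
                        double x = rng.nextDouble(), y = rng.nextDouble();
                        if (x * x + y * y <= 1.0) hits++;
                    }
                    return hits;
                }));
            }
            long totalHits = 0;
            for (Future<Long> f : futures) totalHits += f.get(); // aggregate the statistics
            pool.shutdown();
            System.out.println("pi ~= " + 4.0 * totalHits / (tasks * samplesPerTask));
        }
    }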

I found both the “Experiment Execution” and “Result Aggregation” sections lacking. The former focuses on nothing but SIMD lanes, while the latter does not dig into any depth or offer many alternatives for combining results other than “use locks to synchronize threads.”

Friday, October 23, 2009

Joe Armstrong's Thesis "Making reliable distributed systems in the presence of software errors", Chapter 4: Programming Techniques

I took the time to read Chapter 3 on the language itself because I am quite interested in learning new languages whenever possible. One “best practice” for self-development that I often read about is to learn one new programming language every year. The goal is not to become a master in the language and use it in production systems, but to become competent with it so you can understand its patterns and paradigms and have those influence your day job. Functional languages have been all the rage lately and have had significant impacts on modern OO languages, e.g. LINQ in C#. Examples of applying functional concepts to your OO programming include creating generic functions that let you program more declaratively and using immutable value objects.

The one section that I found interesting outside the syntax of the language was the treatment on page 65 of tail recursion. Although I am familiar with the concept, it was an interesting refresher on how it works technically with return pointers & how you can convert to tail recursion by introducing an auxiliary function.
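
The auxiliary-function trick looks something like this when transplanted into Java as a sketch (Erlang’s compiler reuses the stack frame on a tail call; the JVM does not, so this only shows the shape of the transformation):

    class Factorial {
        // Naive version: the multiply happens AFTER the recursive call returns,
        // so every frame must be kept alive until the recursion bottoms out.
        static long factorial(long n) {
            return n <= 1 ? 1 : n * factorial(n - 1);
        }

        // Tail-recursive version: an auxiliary function threads an accumulator
        // through, leaving nothing pending after the recursive call.
        static long factorialTail(long n) {
            return factorialAux(n, 1);
        }

        private static long factorialAux(long n, long acc) {
            return n <= 1 ? acc : factorialAux(n - 1, n * acc);
        }
    }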



Abstracting Concurrency

The author advocates putting all concurrency-related code into generic reusable modules written by expert programmers and putting all sequential code into plug-ins that anyone can write. Concurrent programming is obviously harder than writing sequential code, even in a language like Erlang that natively supports it, due to the complexities of deadlocks, starvation, interprocess communication, etc. He also advocates creating a plug-in architecture because he claims that you cannot write the concurrent code in a side-effect-free manner, whereas you can easily do this with sequential code. To answer Ben’s question, I do believe this statement to be true. The language does not allow side effects in terms of mutating state, so the author is not referring to that. He is referring to the fact that the functions in the concurrent modules need to affect the environment by spawning & exiting processes.
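
A rough Java approximation of that split, with illustrative names rather than Armstrong’s actual API: the expert-written generic module owns the thread and the mailbox, while the plug-in is a pure function from (message, state) to new state.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.function.BiFunction;

    class GenericServer<M, S> {
        private final BlockingQueue<M> mailbox = new LinkedBlockingQueue<>();
        private final BiFunction<M, S, S> plugin; // sequential logic anyone can write
        private volatile S state;

        GenericServer(S initial, BiFunction<M, S, S> plugin) {
            this.state = initial;
            this.plugin = plugin;
            Thread loop = new Thread(() -> {
                try {
                    while (true) state = this.plugin.apply(mailbox.take(), state);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop the receive loop
                }
            });
            loop.setDaemon(true);
            loop.start(); // all spawning/threading is hidden inside the expert module
        }

        void send(M message) { mailbox.add(message); }
    }

A plug-in like (msg, n) -> n + 1 turns this into a message counter without its author ever touching a thread, a lock, or a queue.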

I am not sure why the “find” method in the VSHLR responds with an error when it cannot find the person. I thought that each process was supposed to complete successfully or exit. Granted, this is a central server and we do not want it to die, so the behavior makes logical sense, but it seems to go against the whole paradigm that the author has laid out thus far in the paper. (After reading the rest of the chapter & re-reading this section, the server should not need to exit here. The “fail fast” and “let another process handle the problem” philosophies mean that the server should simply return an error and let the client figure out what to do rather than take any corrective action.)

I like how the author was able to “factor out” concurrency in the example of the VSHLR server. I am wondering if this pattern is always so easy to apply, though.

The idea of having supervisor processes monitor the health of the nodes in the system so a client does not have to worry about its server dying seems like it would work, as long as there is a supervisor for the supervisor, and a supervisor for that supervisor, and so on. So far, the application seems durable, but I think that this same level of durability (except in the face of hardware failures) could be easily replicated in Java or C# without having separate JVMs running in separate processes, as the author said was necessary earlier. I have a “catch (Exception ex) {/*show error message*/}” at the top level in most of my apps that works just fine for errors that are not due to hardware failures. There are obviously many other benefits to Erlang that would be missed, but the durability argument is not shaping up to be quite what I thought it would be.

The whole idea of swapping code out on the fly is much simpler here than I imagined it to be. Here the server is just sent a new function/closure/lambda to execute instead of the one that it was using before. I know there is support for a much more extensive upgrade given what was mentioned in Chapter 3, and I’m interested to see how that works.
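
In Java terms, I picture something like this minimal sketch (my names, not the thesis’s): the code a worker runs is just a value it holds, so upgrading means handing it a replacement function.

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.UnaryOperator;

    class SwappableWorker {
        private final AtomicReference<UnaryOperator<String>> behavior =
                new AtomicReference<>(s -> s); // start with the identity behavior

        String handle(String request) {
            return behavior.get().apply(request); // run whatever code we hold right now
        }

        void swapCode(UnaryOperator<String> newBehavior) {
            behavior.set(newBehavior); // "upgrade" with no restart
        }
    }

Calling worker.swapCode(String::toUpperCase) upgrades every subsequent request without stopping the worker.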



Error Handling Philosophy

Even after the author lists four reasons why another process should handle the errors of a faulty process, I still do not see why that is much more advantageous than having the faulty process handle its own errors, unless the errors are hardware related and/or unrecoverable and the faulty process dies. In most systems that I build, we rely on failover systems to handle hardware failures, but I imagine most of the systems that go to the lengths of using Erlang want a more bulletproof Plan B than that. None of his arguments related to code clutter or code distribution sell me at all, though (I really want to be sold on the idea & think outside my Java/.NET box, but the author isn’t a convincing salesman thus far, or the pitch is going right over my head).

Later…

I guess that after reading a bit farther, I should recant what I said above. Having the error handling completely removed from the code the worker processes will be executing, even if it is not generic, seems much nicer than having to rely on “catch (Exception ex)” blocks sprinkled throughout the code. If the error handling code in the supervisors can be made generic enough to work with multiple different types of worker processes, all the better.

Monday, October 19, 2009

OPL: Event-based Implicit invocation & Map-reduce

Event-based, Implicit invocation

I’m not seeing a difference between this pattern and the observer pattern at the OOD level. This pattern is more general and can be applied to communication between systems, e.g. a SOA where the pricing module fires an event to indicate a sale but is not concerned with which other modules (billing, CRM) are interested. Now that I’m re-reading Richard’s questions, that last example fits in with the pub-sub architectural model. I’m not sure that I see a difference between these patterns other than levels of abstraction. I think that the different examples at the end of the “Solutions” section do a good job of showing other levels of abstraction where the pattern works.

I’m not sure why you would prefer a synchronous dispatch over an asynchronous one. I’m sure there is a valid reason, but other than hardware limitations, I can’t think of one. Since the announcer shouldn’t know anything about its subscribers, why should it have to wait for them? It’s typically not expecting to get anything back. If a subscriber does want to reply to a message, it won’t be via a return value or anything like that; it will reply by raising an event of its own.
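
Here’s a small sketch of both dispatch styles (my illustration, not the paper’s); the synchronous announcer ends up waiting on subscribers it is supposed to know nothing about, which is exactly what feels wrong to me:

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.function.Consumer;

    class Announcer<E> {
        private final List<Consumer<E>> subscribers = new CopyOnWriteArrayList<>();
        private final ExecutorService pool = Executors.newCachedThreadPool();

        void subscribe(Consumer<E> handler) { subscribers.add(handler); }

        void announceSync(E event) {   // blocks until every handler has finished
            for (Consumer<E> s : subscribers) s.accept(event);
        }

        void announceAsync(E event) {  // fire and forget; never waits on subscribers
            for (Consumer<E> s : subscribers) pool.submit(() -> s.accept(event));
        }
    }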

As far as error handling goes, the announcer should certainly not be notified of errors with the subscribers. The entire point of this pattern is to decouple the two types of entities. Since the announcer won’t know anything about the subscribers, it will likely have no idea how to handle any error that it receives from them.



Map-reduce

This pattern has been all the rage due to the fact that Google uses it. One example that they give, which I find much clearer than any of the examples in the reading, is counting the number of times a string appears in a document. If there are w workers and p pages in the document, each worker can take p/w pages and find the number of occurrences of that string. Then, because the reduce method is a simple summation, which is a distributive function (the result derived by applying the function to n aggregate values is the same as that derived by applying the function to all the data without partitioning), even the reduce part of the algorithm can be done in parallel.
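
That example fits in a short Java sketch (all names mine): counting is the map step done independently by each worker, and the reduce step is just summing the partial counts.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class CountOccurrences {
        public static void main(String[] args) throws Exception {
            List<String> pages = Arrays.asList("foo bar foo", "bar foo", "baz foo");
            String target = "foo";
            int workers = 2;
            int chunk = (pages.size() + workers - 1) / workers;
            ExecutorService pool = Executors.newFixedThreadPool(workers);
            List<Future<Long>> partials = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                List<String> share = pages.subList(w * chunk,
                        Math.min(pages.size(), (w + 1) * chunk)); // this worker's p/w pages
                partials.add(pool.submit(() ->                    // map: count independently
                        share.stream()
                             .flatMap(p -> Arrays.stream(p.split("\\s+")))
                             .filter(target::equals)
                             .count()));
            }
            long total = 0;
            for (Future<Long> f : partials) total += f.get();     // reduce: a simple sum
            pool.shutdown();
            System.out.println(target + " appears " + total + " times");
        }
    }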

I think that the map portion of the pattern is simple and obvious (divide and conquer your independent tasks), but I think there is a significant amount of complexity in reducing in a non-naïve manner that was glossed over. Maybe it’s simpler than I think, though, because I haven’t used the pattern before, or maybe doing it naïvely is just fine.

This pattern reminds me a lot of the fork-join pattern, and I’m really thinking hard about what the differences are. The work stealing method presented in the fork-join paper seems like a very novel idea, but that is just an implementation detail that I imagine map-reduce could also use.

I don’t know if I like the idea of the framework handling errors. I usually prefer to have the system let the programmer specify exactly how errors should be handled so none get swallowed inadvertently. I think that if the framework attempts to handle the errors itself, it is violating the “fail fast” principle discussed in the other reading for this class. The framework can keep the computation going using data that is potentially incorrect, and one error can lead to drastically compounded problems in the output.

Sunday, October 18, 2009

Joe Armstrong's Thesis "Making reliable distributed systems in the presence of software errors", Chapter 2: The Architectural Model

I’m very excited for this paper. I’ve heard a lot of buzz about how Erlang facilitates the building of highly reliable systems. I was a little disappointed when the introduction said “The thesis does not cover in detail many of the algorithms used as building blocks for construction fault-tolerant systems—it is not the algorithms themselves which are the concern of this thesis, but rather the programming language in which such algorithms are expressed.” This gives me the impression that the paper will focus on how Erlang helps you develop reliable systems but leave out lessons that I can apply in other languages. We’ll see how things go…

I’m in complete agreement with the definition of architecture presented. The author says that there are myriad such definitions, but this one contains all the elements that I find noteworthy. It’s by Grady Booch, though, and he usually has some good ideas </understatement>.

I think that a philosophy or set of guiding principles is a commonly overlooked portion of a software architecture, and I’m glad to see it called out here. As cheesy as this and related ideas like a system metaphor from Extreme Programming may sound, I have found that teams bigger than four or five people really do benefit from having a foundation that architects and developers can refer back to throughout the project’s lifecycle in an attempt to keep a uniform feel. A good example from a developer’s perspective, which bleeds over into a “set of construction guidelines” yet still touches on some architectural decisions, can be found here: http://www.cauldwell.net/patrick/blog/ThisIBelieveTheDeveloperEdition.aspx.

This sounds like an extremely interesting challenge. Each of the ten requirements would make for a challenging system, but mixing high concurrency, high reliability, in-place upgrades, and complex features sounds like quite the daunting task.

I’ve heard that Erlang isolates very small pieces of code in independent processes to restrict how far errors can propagate & that each process/module should always assume that any other module it talks to will fail. However, truly using a full process at the operating system level sounds like a very heavyweight solution due to the overhead of having so many processes and the cost of interprocess communication versus intraprocess communication. It’s good to see that the processes used by Erlang in this sense “are part of the programming language and are not provided by the host operating system.” It’s interesting that towards the end of the chapter the author presents the idea that the isolation from errors that lightweight, hardware-independent Erlang processes provide can only be matched in Java by running applications in different JVMs in different operating system processes, which is obviously a tad bit more resource intensive.

Concurrency oriented programming doesn’t feel like it has too many differences from OOP to me. It just sounds like OOP in a COPL. Real-world activities are mapped to concurrent processes in much the same way that real-world objects and actions are mapped to classes and methods. Nothing about the characteristics of a COPL seems new, but it is a novel idea to have all the functionality listed (support for processes, process isolation, PIDs, communication only via unreliable message passing, failure detection) implemented completely within constructs of the programming language, completely independent of the operating system. It does seem like it could be a problem to have to copy data between processes, though. I can’t imagine that processes could always be organized so that large chunks of data never need to be passed between them.

I’m never a fan of security through obscurity, so the author’s claim that not knowing the PID of a process means you can’t interact with it, and thus can’t compromise it in any way, doesn’t impress me. I’m sure that any decent architect would put more layers of security in place, but I dislike security through obscurity so much that I wouldn’t even bother bringing it up.

Message passing seems to be the best way to build reliable systems at any level of abstraction from my recent readings. At the enterprise architecture scale, event driven architectures and message passing systems are at the crux of many of the most durable sites as seen on High Scalability, e.g. http://highscalability.com/amazon-architecture. At the complete opposite end of the spectrum from systems integration is Erlang, which builds these exact same concepts, e.g. asynchronous message passing, into the programming language itself. Since I hear good things about reliable systems being coded in Erlang and reliable enterprise architectures building SOA stacks on EDAs and message passing systems, I’m guessing there’s something to this pattern.

The idea of failing fast as soon as you detect an unrecoverable error seems like common sense to me. Maybe I’m being naïve or not entirely comprehending the details of the idea, but I don’t understand why anyone would have their system continue after encountering an error which they can’t handle. I’ve come across this idea many times in the past but still have yet to feel like I’m not missing something.

Saturday, October 17, 2009

OPL Patterns: Pipes & Filters, Layered Architectures, and Iterative Refinement

Pipes & filters

The version of the pipes and filters pattern for this assignment was much shorter than the last reading because there was no detailed example and no extra fluff. The original had an overly complex example that I found useless. However, some omissions jumped out: differentiating push vs. pull filters and the problems with error handling and global state. Also, the original made four points on why the gains from parallel processing are often an illusion. The new one didn’t raise or counter any of those concerns.

The “Forces” sections contained some practical considerations that I don’t believe were mentioned in the original. It was nice to have those considerations in a short, independent section to call them out rather than burying them in dozens of pages of text as the POSA reading did, in my opinion. Overall, I found all three more concise because they lacked some verbiage and context, but I feel they retained 90% of the useful information in 10% of the space.

Layered architectures

I think there’s too much emphasis on how layers hurt performance. While true, it doesn’t matter for most applications that don’t involve systems programming or intense computations, and in the latter type of system, the critical paths can have their layers removed while the rest stays layered to maintain modularity.

I liked the point about keeping the API smaller on lower layers to decrease the work necessary to change the underlying implementation.

Iterative Refinement

It sounds like this is saying that you should find high-level steps that need to be done sequentially, and then inside each step, find tasks that can be performed in parallel. It sounds similar to an article that I just read in the November/December issue of IEEE Software: “Parallelizing Bzip2: A Case Study in Multicore Software Engineering.” Four pairs of students were tasked with taking Bzip2 and parallelizing it. Most teams found that simple, low-level optimizations such as parallelizing loops were insufficient for achieving substantial gains. The successful teams found that the best approach was to refactor the code into high-level modules that were independent but needed to be executed in sequence. They then optimized the code within each module (if I’m remembering the article correctly), which sounds just like this pattern.
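
That reading of the pattern fits in a few lines of Java (a sketch under my own naming): run the phases strictly in order, but fan the independent tasks inside each phase out to a pool.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class PhasedRunner {
        // Phases must run sequentially; the tasks inside one phase are independent.
        static void run(List<List<Runnable>> phases) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (List<Runnable> phase : phases) {
                List<Future<?>> inFlight = new ArrayList<>();
                for (Runnable task : phase) inFlight.add(pool.submit(task));
                for (Future<?> f : inFlight) f.get(); // barrier before the next phase
            }
            pool.shutdown();
        }
    }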

Thursday, October 15, 2009

Chess

I was instantly interested in this paper as soon as I read the title. Though my experience with concurrent programming is limited, many of the bugs I encountered during that time were “Heisenbugs.” However, the last time I read about an interesting testing library that was based on a research paper and that Microsoft had turned into an internal testing tool, Randoop, I was disappointed to find that the .NET code was not available.

Chess focuses on using only a few preemptions in a sequence but placing those preemptions at as many different points in the sequence as possible. The tool also allows you to specify which areas to test more heavily and which areas to run atomically.

I find it nice that the tool runs deterministically & that you can instantly get the last thread interleaving, though those are obvious features without which the tool wouldn’t be very useful. It’s also quite nice that the tool doesn’t need a custom VM to be able to intercept calls.

Good old Lamport timestamps back from my distributed systems class. They sound like they give a good amount of flexibility to the scheduling algorithm by only requiring a partial ordering. It just has to focus on interleaving a few key events at the right times with respect to each other, without worrying about absolute times.

Threadpool work items, async callbacks, timer callbacks

Chess just needs to know whether a synchronization operation “isWrite” (writes to shared state) or “isRelease” (releases other tasks that are blocked on a synchronization mechanism). “isWrite” causes edges to be created in the happens-before graph backwards to all previous nodes and forwards to all subsequent nodes. If the API that Chess has to work with isn’t well understood, program correctness can be maintained by setting both to true; only performance and efficiency will be affected.

Chess avoids blocking calls by wrapping them & using the “try” version instead, which does not block. The tool can then move that task off into the “disabled” queue until a release event occurs on that synchronization resource. This prevents Chess from being blocked.
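
My mental model of that wrapping, as a hedged sketch rather than CHESS’s actual instrumentation:

    import java.util.concurrent.locks.ReentrantLock;

    class InstrumentedLock {
        private final ReentrantLock lock = new ReentrantLock();

        // The scheduler calls this instead of a blocking acquire. A task that fails
        // to acquire never blocks; the callback stands in for "move this task to the
        // disabled queue until a release event occurs".
        boolean tryAcquire(Runnable moveToDisabledQueue) {
            if (lock.tryLock()) return true;
            moveToDisabledQueue.run();
            return false;
        }

        void release() {
            lock.unlock(); // a release event: the scheduler re-enables disabled waiters
        }
    }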

One problem with Chess is that it can’t guarantee the “real-time” semantics of timers, but few programs’ correctness or control flow relies on threads being scheduled and executed in real time.

Data races on non-synchronized shared memory must be put into the “happens-before” graph, too. Chess enforces that the threads run in a single-threaded fashion so that it can deterministically execute the program and capture replay info.

There are n^(k^c) thread interleavings with n threads, k atomic operations performed by each, and c preemptions performed on each run. Chess tames this explosion of the search space by keeping c low, usually around two, and by keeping k low by ignoring system functions and base libraries. The Chess team has found great success keeping c low but examining all possible interleavings given the small magnitude of c.

When running Chess in a test harness, things devs implemented to randomize the program, like a large number of threads and sleep calls, need to be undone to allow Chess to thoroughly and deterministically search the interleaving space.

Wednesday, October 14, 2009

Reentrancer

Reentrancer automatically converts shared, global state into thread-local state. While the refactorings that the paper presents will indeed make the program safer to run concurrently, it may not always be easy to convert shared state to thread-local state. Often the shared state is the result of lazy programming and can be refactored away with a bit of determination. However, there are many times where tasks cannot be run completely in isolation on their own threads and do truly need global state as a coordination mechanism. Given that, I’m not sure how useful I really think Reentrancer is.

The refactoring takes place in three parts: surrounding static fields with accessors, converting static initializers to lazy initializers, and finally making the data thread-local. The authors did point out that this process can lead to incorrect behavior. It doesn’t seem like it would be worth the risk of making your application both incorrect and slower (according to the authors’ own benchmarks). Not being able to preserve behavior in the face of exceptions could cause an enormous amount of rework to be necessary. Database and file system access were also mentioned as blockers to this refactoring, but I find those to be less problematic because in the applications that I work with, those external accesses are normally separated from the key processing of business logic.
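
Compressed into a before/after Java sketch, my reading of the three steps (ThreadLocal.withInitial is my stand-in for the accessor and lazy-initializer code the tool generates):

    class Before {
        static int hits = 0; // mutable global state shared by every thread
    }

    class After {
        // Step 1: the field goes behind accessors. Step 2: eager initialization
        // becomes lazy. Step 3: storage becomes per-thread, so threads no longer
        // interfere with one another.
        private static final ThreadLocal<Integer> hits = ThreadLocal.withInitial(() -> 0);

        static int getHits() { return hits.get(); }
        static void setHits(int value) { hits.set(value); }
    }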

I found that this paper had overlap with the previous two on automated refactoring tools, but I found it less readable and less useful in practice.

Beautiful Architecture Chapter 14: Rereading the Classics

I think expressing control flow in Smalltalk using an extremely small set of primitive operations provides the language with an elegance and flexibility that I had never seen before I started working with it. However, one downfall that the author points out is the inability of Smalltalk to correctly handle the order of operations for mathematical symbols. This wouldn’t bother me too much because I don’t write many mathematical expressions on a daily basis that I couldn’t easily group with parentheses.

I think that inheritance is such a critical and common piece of OOD that it would be next to impossible to find any decent system that wasn’t using inheritance. I’ve had good experiences with inheritance used to model “is-a” relationships, and I’ve had good experiences using it to implement design patterns. I’ve had bad experiences where the inheritance trees have gotten overly deep and wide, though I’d much prefer a wide one to a deep one. I don’t think there’s much to say about this question because it seems so obvious.

I’m not sure what the argument about the Sapir-Whorf Hypothesis is supposed to be about. I have had experience implementing features in a language that were not natively supported. After working with Squeak, I fell in love with iterator blocks. I implemented them in C# using anonymous delegates and came up with a solution that wasn’t as elegant as Smalltalk’s, but it was better than what I had before. Fortunately, LINQ came out in .NET 3.5 and introduced lambda expressions and extension methods to facilitate declarative programming against lists using iterator-block-like constructs. I have read Steve McConnell’s “Code Complete,” which was mentioned in this chapter, and the distinction between programming into and programming in a language was something that really stuck with me.

I think that dynamic dispatch is scalable. In my vision of how it works, there’s simply a dictionary lookup for each class to see if the method exists. I imagine this would be very fast unless you had an obnoxious class with millions of methods. You’d lose the ability to inline methods like an optimizing compiler for statically compiled languages could, but I don’t think the performance implications would be too bad.

I think it’s a good thing that you can modify any class. I don’t like being protected from myself by language designers. If I tinker with a core class and mess it up, that’s my problem. It’s very nice to have the ability to tweak those classes that don’t work out quite right for you. .NET has introduced a great feature in the last version of the framework called extension methods. They allow you to create methods that extend the functionality of any class, whether it’s a sealed core library class or your own code, as long as you don’t hide any existing method of the class. Although this doesn’t let you override existing methods that don’t work quite right for you, the ability to add what look like native methods to the class is extremely convenient.

I don’t think the metaprogramming features provided by Smalltalk, Ruby, or Python are good enough to convince me that a dynamic language is as easily maintainable in an enterprise application as a static one. In theory I believe the quote in the chapter that mentioned that your code is only correct if it passes tests that define the correctness of your program. However, in enterprise applications that are extremely large and have developers of varying skill levels working on them, the compile-time checks provided by static typing along with the productivity boost provided by Intellisense/autocompletion are essential. The example provided by the author of checking whether the parameters respond to a certain type of message results in code that is much uglier than its statically typed equivalent. I do think the great flexibility provided by dynamic typing is something that I would love if I were developing an application all by myself, but the safety net provided by static typing is something that I couldn’t live without in my day job as a consultant building enterprise applications in collaboration with unskilled developers on my clients’ teams.

I was hoping that the chapter would touch on a few more of the classics other than the Gang of Four book and Smalltalk. I did like, though, that the author emphasized that the GOF book is great not only because of its classic catalog of patterns but also because of the general guidelines it sets out for good OOD and how to avoid abusing the powerful features that it provides you. I was disappointed, though, by the value extracted from all of the code samples provided. If you don’t know Smalltalk (luckily I do thanks to Prof. Johnson’s CS 598), I think the return on investment for spending the time to read those samples would be quite low.

The Sapir-Whorf Hypothesis is an incredibly interesting theory that I touched on above. I’ve heard this theory mentioned during my days as an undergraduate in both my CS and my French classes. I also find it odd that I came across the SWH twice yesterday, once while reading this book and once while looking at a presentation on Ruby that was explaining how the flexibility of the language allows you to program *into* it very easily, much like Smalltalk.

I thought that I understood the Strategy design pattern until the author mentioned that add:withOccurrences: was a concise implementation of it. I just can’t see it.

I love the quote that Maillart “found that innovation…came not from laboratory work and mathematical theories, but from design offices and construction sites.”

Saturday, October 10, 2009

ReLooper

Although I read this whole paper, I can’t find too much to say about it that wasn’t already covered in the blogs and discussion about Concurrencer. This is because I see a lot of similarity between what ReLooper does and how Concurrencer will automatically convert an algorithm to use the Fork/Join framework for you. What ReLooper does actually seems like an easier and more naïve change than what Concurrencer does with the fork/join framework, so I’m surprised that ReLooper gets its own paper and Eclipse plug-in, while Concurrencer does the fork/join optimization as only one of its three features.

I would be happy with a compiler making these optimizations for me automatically if there were an iron-clad guarantee that no potential problems could be introduced. However, I think that is a hard guarantee to make, so I would settle for a tool like ReLooper that lets me imperatively define what I’d like to happen. What I would really prefer, though, is an option to tell the compiler declaratively, via an attribute in .NET for example (an annotation in Java?), that the marked code is safe to be automatically parallelized.
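
To show what I mean, here’s a hypothetical declarative marker, entirely my invention and not ReLooper’s API:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical: a tool or compiler seeing this annotation may assume the method's
    // loops carry no cross-iteration dependencies and parallelize them automatically.
    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.METHOD)
    @interface SafeToParallelize { }

    class ImageFilter {
        @SafeToParallelize
        void brighten(int[] pixels) {
            for (int i = 0; i < pixels.length; i++)
                pixels[i] += 10; // each iteration touches only its own element
        }
    }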

Beautiful Architecture Chapter 13: Object-Oriented Versus Functional

I found this chapter pretty interesting because the functional and OO debate is alive and well in most of the communities of which I am a part. I feel that this chapter was pitting functional against OO in an all-or-nothing matchup, whereas most of the communities that I follow talk about what specific set of constraints on a problem give functional languages an edge. When I think of functional languages in the enterprise, I do see a place for them but only as libraries to help with certain specific problems. However, I would never think to build a major pillar of an enterprise architecture in a pure functional language. As we saw from the survey that Jerry put together, 100% of the people in our class think that OO is better for the enterprise, and though it is a biased group, there were several respondents who claimed to have experience with functional languages.

I think that comparing functional languages with OO purely in terms of modularity does a major disservice to functional languages. Meyer does call out that he is ignoring the elegance of a declarative programming style over the equivalent imperative one. Although I am not a big fan of developing purely in a functional language, I respect functional languages for the positive impact that they have had on most modern OO languages. Being a C#/.NET developer by day, most of my examples will focus on that. C#’s support for (anonymous) delegates and declarative programming via extension methods (e.g. LINQ) makes the code I write on a daily basis extremely flexible and concise without sacrificing readability. I do things similar to Meyer’s example of making the visitor pattern much more concise via agents.

Meyer also mentions lazy evaluation as a nice feature of functional languages. However, C# fully supports this in a managed OO language via the “yield return” statement. That being said, I don’t want to put functional languages down as being useless in the face of C#, because I appreciate that this incredibly useful feature was most likely inspired by the functional programming style.

In the situations that Meyer describes functional languages lacking, such as the fact that a time module that only parses ISO8601 basic cannot be easily extended to support ISO8601 extended, I agree with Meyer and would definitely use OO languages in such a situation. That seems like an almost absurd shortcoming of functional languages.

Meyer makes a good point when he says that it is unnatural to not have a concept of state anywhere in the system. As Danny mentioned during the conversation, when you paint a house red, you don’t make a copy of the original house with red paint and end up with two houses; you just change the original one. I’ve heard people say ad nauseam that immutable objects help with parallel programming. While I’m sure this is true in some respects, I don’t think that it happens automatically. There are many scenarios where you truly do need to share state between threads, and if a change creates a new object, both threads will need to be synchronized in some manner to both use the new object.

Command-query separation is something that I try to adhere to in my day to day programming. As Meyer mentions, I feel that the referential transparency helps with the readability and maintainability of the code. When hidden side effects happen from a query operation, it can be very difficult to tell from reading a client of the guilty method. The only time that I return a value from a state-modifying command is when it’s some sort of error code or success/failure flag.
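
In code, the discipline I try to follow looks something like this sketch:

    class Account {
        private long cents;

        long balance() {            // query: returns a value, no side effects
            return cents;
        }

        void deposit(long amount) { // command: mutates state, returns nothing
            if (amount < 0) throw new IllegalArgumentException("negative deposit");
            cents += amount;
        }
    }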

Eiffel’s nonconforming inheritance, which disallows polymorphism, sounds extremely bizarre to me. I definitely plan on reading up on Meyer’s rationale for doing things that way in Eiffel, since he seems to have many interesting ideas around OO design in the various books and articles of his that I’ve read, though I do often disagree with him or find his coverage to be a bit academic and verbose.

Meyer mentioned that a benefit of OO languages is that they support dynamic binding and thus “remove the need for client classes to perform multibranch discriminations to perform operations.” Although the pattern matching statements that functional languages like OCaml provide are extremely powerful, I find that they can be much more verbose and, as a result, more confusing compared to a solution that uses dynamic dispatch.

Wednesday, October 7, 2009

Refactoring Sequential Java Code for Concurrency via Concurrent Libraries

I’m not sure there is any task in software engineering that is easier to retrofit into an existing application than it is to implement from the start in a project designed for it. If the task, e.g. introducing parallelism, takes more time than the alternatives, e.g. writing sequential code, it may be wise from a cost/benefit analysis to go back and implement the more costly method only in the spots that start to cause you pain, e.g. excessive slowness, or once you cross the threshold where the cost of delaying the implementation of the more costly method overtakes the cost of the implementation of the costly method (I’m trying to set a world record for the most uses of the word “cost” in one sentence), which I believe is called “the last responsible moment” in the Lean methodology.

The big advantage of using parallel libraries is that very smart people spend a very long time solving very hard problems for you. The disadvantage is the APIs presented by a library are necessarily generic and may not be the absolutely most ideal solution to your problem. However, it is usually worth taking the minor hit of using a somewhat suboptimal solution in order to leverage the knowledge of the experts and avoid making costly mistakes yourself.

The disadvantage to having these types of refactorings done completely automatically is that there will always be some edge cases where for one reason or another you do not want to follow the usual pattern. Though I can’t think of any specific examples against automating the three refactorings presented, there’s *always* some exception to the rule. The advantage is that it saves you time & effort.

Beautiful Architecture Chapter 12: When the Bazaar Sets Out to Build Cathedrals

Making the call to completely redesign KDE to fully utilize the power of Qt 4 was, as the author says, a very risky decision. I have worked on a few projects that contemplated such a task, some which made the decision to do a rewrite and some which didn’t.

A project that made the decision to rewrite the system from scratch was a ColdFusion web application that was relatively feature complete. The decision was made to rewrite the system using the most modern .NET framework, ORM tools, and AJAX. The decision makers felt that the project could not successfully continue on as a ColdFusion app due to the relative scarcity and skill set of CF developers compared to those who work in Java or .NET. A major factor in the decision to rewrite the system was the fact that a freeze had been put on existing functionality in the current CF system. This meant that we had a fixed target to shoot at for our base functionality.

A project that did not make the decision was a VB.NET (yuck) web application that had a crippling dependency on hundreds of stored procedures that were hundreds of lines long for all business logic. The complete absence of unit tests made it terrifying to have to implement any change to the system, no matter how small. The system was also constantly in a state of flux, adding dozens of new features every month. The decision was made to make a focused effort to re-architect and upgrade the system in place both because the ideal target framework for their developers was .NET, which entailed few fixes to upgrade from 2.0 to 3.5, and also because the amount of new functionality constantly being added to the system meant that we had a moving target to hit whose base functionality was constantly moving farther and farther away.

I think the bazaar model works for open source projects because the labor is free to the project owners. That means that different implementations of a solution can be built and accepted or rejected with no cost to the owners. With an enterprise application where someone is paying the bill for every hour spent on development, a more cathedral-like style of management is necessary up front to ensure that minimal time is wasted on solutions that would not be acceptable in the end product. That isn’t to say that cathedral-style projects never do any prototyping or always get it right on the first try, but the design and architectural guidance provided by the key members should keep waste to a minimum.

That said, this chapter gave me the impression that KDE does have strong cathedral influences. The chapter said that high-level design decisions are made by a core set of decision makers on an infrequent basis, and lower-level decisions are made more frequently by a more distributed set of decision makers. Although the decision making isn’t entirely centralized, it still sounds like quite a bit of cathedral-style planning takes place.

I found that having the chapter focus on both the design process and the technical implementation details gave quite a bit more context than other chapters, but I also feel that the volume of information made the chapter feel as if it would never end. I found the ideas of keeping data and control pipelines separate & initially using IMAP to transfer data quite interesting, though I’m sure the former is a common practice.

I didn’t get as much from the ThreadWeaver section, nor was it talked about as much during the class discussion. Although it was noticeably shorter than the Akonadi section, it presented an interesting interface for parallelizing multiple instances of tasks while still making it easy for each task to execute sequentially.

Saturday, October 3, 2009

Beautiful Architecture Chapter 11: GNU Emacs

I think that a system like Emacs could be created in a closed-source system given the right integration points.

I don’t think Firefox will replace Emacs. It seems like an unnatural hack to develop Lisp or C++ programs in Firefox. Sure it could be done, but that doesn’t mean that it should be done.

This chapter sucked. Thankfully it was shorter than the last few boring chapters & the other reading for the day was quite interesting. If I had to come up with something that was maybe useful, it would be that simplicity and openness can be a plus for certain types of apps. The only other thing that I found noteworthy in the whole chapter was how terrible LISP looks. After listening to the class discussion, I still wasn’t able to get anything useful from other people’s experience.

Thursday, October 1, 2009

A Java Fork/Join Framework

Alexander's first question scared me at first into thinking that this paper would be obsolete, but I was pleasantly surprised to find that the concepts in the paper and the concepts in the reading he provided on the new Java functionality complemented each other nicely. I think fork/join and other similar patterns, like the ones outlined in the OPL patterns, will be necessary in modern languages. As parallelizability across threads, cores, processors, & machines becomes the main mode of increasing performance and scalability, the standard mutex and monitor paradigms offered by most languages today are insufficient to prevent the average developer from running into severe problems with deadlock, starvation, and other concurrency-related issues.

I think that it is definitely a worthwhile trade-off to present developers with canned approaches to concurrency problems that may not align perfectly with their given tasks. C++ developers were initially skeptical that managed languages like Java and C# could ever perform well, but advances in computer hardware, garbage collection, and virtualization technologies, to name a few, have overcome most performance problems and made managed languages perform beyond the needs of typical developers. I think something similar will happen with concurrency-related patterns & frameworks; the need for absolute control will be overshadowed by advances in hardware and software technology so that the general patterns will work well enough.

I'm honestly not sure if I would use the fork/join API presented by the author because I skipped all the code samples. I was more interested in the underlying ideas of how fork/join frameworks work, especially novel ideas such as work stealing. Also, since Java will be coming out with support for fork/join, I didn't see any need to take a look at an interface that won't make it beyond research.

I have the feeling that faster interfaces for just about anything can be developed in C++ than in a managed language like Java. I'm sure this rule of thumb breaks down for some specific scenarios, but it seems reasonable in general. However, the convenience offered by Java in terms of automatic garbage collection and portability can often outweigh the performance gains that C++ would give you. It depends on whether you need a bleedingly fast system, like something that processes trades for the New York Stock Exchange, or whether you just need to crunch a lot of numbers for business reporting purposes and a general fork/join implementation would give you some performance gains over a single-threaded app. But I'm a consultant; my standard answer to questions like this is always "It depends."

I wasn't exactly clear on how forking, dividing work up, breaking it down, and stealing all worked. I got the feeling that some of the large blocks that got stolen would be dependent on smaller tasks that weren't (e.g. one of the big tasks at the top is merge sorting all the data and one of the smaller tasks is directly sorting 5% of the total data), but that doesn't seem like it would work at all so my understanding is wrong somewhere. I'll have to re-read the paper and listen to the class discussion to get a feel for the pros and cons of this model.
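
For reference, here is the shape of the idea in the java.util.concurrent fork/join API that later shipped with Java 7 (a sketch, not the paper's FJTask code): split until a threshold, fork one half, compute the other, then join.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;
        private final long[] data;
        private final int lo, hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data; this.lo = lo; this.hi = hi;
        }

        @Override protected Long compute() {
            if (hi - lo <= THRESHOLD) {         // small enough: just sum sequentially
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            left.fork();                         // queued; an idle worker may steal it
            long right = new SumTask(data, mid, hi).compute();
            return right + left.join();
        }

        static long parallelSum(long[] data) {
            return new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        }
    }

As I understand the work-stealing part, thieves take from the opposite end of a victim's deque, so what gets stolen is the biggest, oldest subtask, and a stolen block recursively spawns its own subtasks rather than depending on small tasks left behind on the victim.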

As for patterns in the paper, I believe that fork/join is a pattern in and of itself, though that's a bit of a cop-out answer.

I found it odd that the fork/join pattern becomes so problematic when many of the workers empty out their own queues and can't find much work to steal. I was naively thinking that using a monitor to block them and then signaling the monitor when data is ready would work, but I'll have to think a little harder on why that won't work.

This paper was much better than reading about how Guardian did a bad job of naming things and allocating bits :)