Wednesday, September 2, 2009

Beautiful Architecture Chapter 3: Architecting for Scale

Jackpot! Just the type of reading I'd been yearning for from this book. Though the last chapter was quite satisfying, it presented the information in a more abstract way than the Darkstar chapter and was mostly just preaching to the choir.

I found this chapter very interesting because many of the problems it tackles resemble ones I face as an enterprise developer, yet they are different enough to make them more interesting than my day job.

I thought the geographic partitioning idea was brilliantly simple. It's obvious to me that with more typical applications and data, users should connect to the server closest to them, both to decrease their latency and to balance the load on the system. However, in the interactive and social world of online gaming, I found the idea of having geographical regions in the game correspond to different servers very clever.
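To make the idea concrete, here is a minimal sketch of partitioning a game world by region. Everything here is illustrative (the region names, server hostnames, and coordinate layout are made up, not Darkstar's actual interfaces): each in-game zone is owned by one server, so players interacting in the same zone naturally land on the same machine.

```python
# Hypothetical mapping of in-game regions to the servers that own them.
REGION_SERVERS = {
    "northern_forest": "server-a.example.net",
    "castle_town": "server-b.example.net",
    "southern_desert": "server-c.example.net",
}

def region_of(x: float, y: float) -> str:
    """Split the (invented) world map into named zones by coordinate."""
    if y > 100:
        return "northern_forest"
    if y > 0:
        return "castle_town"
    return "southern_desert"

def server_for_position(x: float, y: float) -> str:
    """Route a player's world coordinate to the server owning that zone."""
    return REGION_SERVERS[region_of(x, y)]
```

The appeal is that the partition key (in-game location) also predicts which players will interact, so most gameplay stays within a single server.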

The architecture seemed to give the Darkstar team the utmost flexibility to reconfigure and reimplement the system with no impact on the clients, though their abstraction ended up being slightly leakier than they had hoped. I found it interesting that, transparently to the game servers and clients, the system could easily change which Darkstar instances were handling different events based on load, latency, or any other factor.
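The trick that makes this transparency possible is indirection: clients address game objects by a stable handle, while a directory maps each handle to whichever node currently hosts it. Rebalancing just updates the directory; the handle the client holds never changes. The sketch below is my own toy illustration of that pattern, not Darkstar's real interfaces.

```python
class Directory:
    """Toy handle-to-node directory; rebalancing is a map update."""

    def __init__(self):
        self._location = {}  # stable handle -> id of node hosting it

    def assign(self, handle: str, node: str) -> None:
        self._location[handle] = node

    def route(self, handle: str) -> str:
        return self._location[handle]

directory = Directory()
directory.assign("player:42", "node-1")
assert directory.route("player:42") == "node-1"

# A load balancer migrates the object; the client's handle is unchanged.
directory.assign("player:42", "node-7")
assert directory.route("player:42") == "node-7"
```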

There were a few big shockers for me. The first thing that surprised me is that their tasks are supposed to be very short-lived, with a maximum lifetime of 100ms by default. I would have thought that the overhead of having so many tasks distributed, loaded, and unloaded from different cores would be crippling. I was also amazed that the Darkstar team theorized that making each task and all data persistent would not impact their latency, given enough cores and a set of tasks that are easily parallelizable. Although I thought the idea was clever from a fault-tolerance viewpoint, I didn't fully grasp why persistence was a prerequisite for parallelizability of the tasks, as the author stated toward the end of the "Parallelism and Latency" section: "Remember that by making all of the data persistent, we are enabling the use of multiple threads..."
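My best reading of that quote: if every task touches game state only through a transactional data store, the runtime is free to run tasks on many threads and abort and retry any task whose reads were invalidated by a concurrent commit, so correctness no longer depends on which thread runs what. The sketch below is a toy optimistic-concurrency store along those lines (assumed mechanics, not Darkstar's actual Data Service), showing that concurrent tasks still produce the right result.

```python
import threading

class Store:
    """Toy versioned store: tasks commit optimistically and retry on conflict."""

    def __init__(self):
        self._data = {}
        self._version = {}
        self._lock = threading.Lock()

    def run_task(self, key, update):
        """Run `update` as a transaction; retry if another task committed first."""
        while True:
            with self._lock:
                value = self._data.get(key, 0)
                version = self._version.get(key, 0)
            new_value = update(value)  # task logic runs outside the lock
            with self._lock:
                if self._version.get(key, 0) == version:  # no conflicting commit
                    self._data[key] = new_value
                    self._version[key] = version + 1
                    return
                # Conflict detected: loop re-reads the fresh value and retries.

store = Store()
threads = [
    threading.Thread(
        target=lambda: [store.run_task("gold", lambda v: v + 1) for _ in range(100)]
    )
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert store._data["gold"] == 400  # no increments lost despite 4 threads
```

Under this model, the persistent transactional store is what lets the scheduler treat tasks as freely parallelizable units.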
