Building complex queries in Key/Value ecosystems

In the last few years I have had to deal with complex systems and their issues like the C10K Problem, hard to scale infrastructures, asynchronous transaction handling in heterogeneous environments and above all else – Database performance issues. Simple solutions are available for most bottlenecks, but deciding how and where to place the state in an distributed environment can be really hard.

Google introduced Map Reduce, which offers a kind of standardization. The basic idea of divide and conquer (D&C) has existed for years, but there were no standardized APIs to execute higher level tasks. Google invented its own filesystem (GFS), in which data is stored in a distributed system and in memory for performance benefits. They also implemented an interface for the distributed storage called Bigtable which uses MapReduce for query entries. This is also built upon the Google File System.

Several groups of NoSQL Databases emerged. They followed the principle of Bigtable with eventually consistent and distributed storage. The concept behind an in-memory storage and distributed system is a trade-off between consistency and availability. The aim of this is to enforce strong consistence only where it’s required. Such Projects include Project Voldemort (Persistent),  Hazelcast (Non-Persistent) and Scalaris (Non-Persistent).

When we have to work without SQL, a number of other problems arise; How do we build a search system? How do we execute complex queries? A big subset of the issues can be covered by merge-joins. I really recommend taking a look at: Building Scalable, Complex Apps on App Engine.