Seattle Conference on Scalability: Scalable Wikipedia with E


Google Tech Talks
June 14, 2008


Global online services at Amazon, eBay, Myspace, YouTube, or Google serve millions of customers with tens of thousands of servers located throughout the world. At this scale, components fail continuously and it is difficult to maintain a consistent state while hiding failures from the application. Peer-to-peer protocols provide availability by replicating services among peers, but they are mostly limited to write-once/read-many data sharing. Extending them beyond typical file sharing requires support for fast transactions on distributed hash tables (DHTs), an important yet missing feature.

We will present a distributed key/value store based on a DHT that supports consistent writes. Our system comprises three layers:

– a DHT layer for scalable, reliable access to replicated data,
– a transaction layer to ensure data consistency in the face of concurrent write operations,
– an application layer with an extremely high access rate.
For the application layer, we selected a distributed, scalable Wiki with full transaction support. We will show that our Wiki outperforms the public Wikipedia in terms of served page requests per second, and we will discuss how the development of the distributed code benefited from the use of Erlang.
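
To make the layering described above concrete, here is a minimal single-process sketch in Python. It is not the system presented in the talk: the class ToyReplicatedStore, its methods, the replica count, and the majority-based version check are all illustrative assumptions. It only shows how a key can be hashed to a set of replicas (the DHT layer) and how a versioned write can be rejected when a concurrent write got there first (a much simplified stand-in for the transaction layer).

```python
import hashlib


class ToyReplicatedStore:
    """Toy in-memory stand-in for a replicated DHT (hypothetical, for illustration)."""

    def __init__(self, node_count=8, replicas=3):
        # Each "node" is just a dict mapping key -> (version, value).
        # Assumes node_count >= replicas.
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas

    def _replica_nodes(self, key):
        # DHT layer: hash the key to a start position on the ring, then
        # take the next `replicas` nodes.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def read(self, key):
        # Return the newest (version, value) pair found on any replica;
        # a key that was never written reads as (0, None).
        copies = [node.get(key, (0, None)) for node in self._replica_nodes(key)]
        return max(copies, key=lambda copy: copy[0])

    def write(self, key, expected_version, value):
        # Transaction layer (simplified): commit only if a majority of
        # replicas still hold the version the writer originally read.
        replicas = self._replica_nodes(key)
        agreeing = [node for node in replicas
                    if node.get(key, (0, None))[0] == expected_version]
        if len(agreeing) <= len(replicas) // 2:
            return False  # a concurrent write won; re-read and retry
        for node in replicas:
            node[key] = (expected_version + 1, value)
        return True
```

A wiki-style application on top of this sketch would read a page together with its version, apply the edit, and call write with that version; a False result means another edit was committed in the meantime and the caller should re-read and retry. A real system would also have to handle node failures, message loss, and partially applied writes, none of which this sketch attempts.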

This is joint work of Zuse Institute Berlin and onScale solutions GmbH.

Speaker: Thorsten Schuett, Zuse Institute Berlin
Thorsten Schütt is a senior researcher at the Zuse Institute Berlin (ZIB) and a co-founder of onScale solutions GmbH. He received a CS diploma with distinction in 2002 from the Technical University Berlin. Since then he has worked as a research staff member in the Computer Science Research Department at ZIB and has participated in several EU projects, including GridLab, XtreemOS and Selfman. He is the principal system architect of the scalable, transactional key/value store at ZIB. His research interests include distributed data management, scalable grid systems, p2p algorithms and self-managing transactional storage systems.

Slides for this talk are available at