At ShuttleCloud, we’ve developed a distributed platform that can handle very high loads. This has given me a good knowledge of distributed systems from a practitioner’s perspective. We’re using, for example, CouchDB, Pacemaker and Corosync, Amazon RDS, RabbitMQ, etc.
However, implementing this kind of software is a completely different beast. It’s a broad and complex field. Most of the tech people I admire and follow on Twitter work in that field, and you can see that it is really difficult to keep up. There are too many things to learn! 😀
Lately I’ve been reading a lot about the subject, but reading is not enough. If you want to learn something, you had better start using it. That’s why I’ve decided to implement an in-memory key-value store (a very simple one).
The objective is to learn things like:
- Should I implement a Write-ahead log so that the DB can recover from a crash?
- Should I implement log-structured storage for the values so that the database is not limited by the RAM (well, keys still have to fit in memory)?
- How can the database scale reads and writes? Do I need sharding? Replicas? Should it use leader-based replication?
- How can the data be replicated across different machines?
- How does the system detect that the leader has failed?
My idea is to write posts about my decisions on those questions and to publish the implementation in this repo, so that I can learn from other people.
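To make the first question in the list a bit more concrete, here is a minimal sketch of what a write-ahead log for an in-memory key-value store could look like. It is only an illustration and not the implementation from the repo: the `KVStore` class, the JSON-lines record format, the `op`/`key`/`value` field names and the `demo` helper are all assumptions. Every write is appended to the log and synced to disk before the in-memory dictionary is updated, and on startup the log is replayed to rebuild the state.

```python
import json
import os
import tempfile


class KVStore:
    """Hypothetical in-memory store that logs each write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        # Append mode: new records go after whatever survived the last run.
        self.log = open(log_path, "ab")

    def _replay(self):
        """Rebuild the in-memory state from the log, if one exists."""
        if not os.path.exists(self.log_path):
            return
        valid_bytes = 0
        with open(self.log_path, "rb") as f:
            for line in f:
                # A record without its trailing newline was cut off by a crash.
                if not line.endswith(b"\n"):
                    break
                try:
                    record = json.loads(line)
                except ValueError:
                    break
                self._apply(record)
                valid_bytes += len(line)
        # Drop any torn trailing record so new appends start on a clean line.
        with open(self.log_path, "r+b") as f:
            f.truncate(valid_bytes)

    def _apply(self, record):
        if record["op"] == "set":
            self.data[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.data.pop(record["key"], None)

    def _append(self, record):
        # Log first, force it to disk, and only then touch the in-memory state.
        self.log.write(json.dumps(record).encode("utf-8") + b"\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self._apply(record)

    def set(self, key, value):
        self._append({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._append({"op": "delete", "key": key})

    def get(self, key, default=None):
        return self.data.get(key, default)

    def close(self):
        self.log.close()


def demo():
    """Write a few records, 'restart' the store, and return the recovered state."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.wal")
        store = KVStore(path)
        store.set("a", 1)
        store.set("b", 2)
        store.delete("a")
        store.close()

        recovered = KVStore(path)  # simulates a restart after a crash
        state = dict(recovered.data)
        recovered.close()
        return state
```

Calling `demo()` returns the state rebuilt purely from the log. Note what the sketch leaves out: the log grows forever (a real store would need compaction or snapshots), calling `fsync` on every write is slow, and nothing here addresses replication or leader failure.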