Notes: Designing Data-Intensive Applications
Chapter 1. Trade-offs in Data System Architecture
"There are no solutions; there are only trade-offs. [...] But you try to get the best trade-off you can get, and that's all you can hope for."
Thomas Sowell, interview with Fred Barnes (2005)
Data is central to much application development today. It has become normal to store data from many different users in a shared server-based data infrastructure. As users interact with an application, they both read the data that is stored and generate more data.
As data volumes or query rates grow, the data needs to be distributed across multiple machines, which introduces many challenges.
We call an application data-intensive if data management is one of the primary challenges in developing the application. While in compute-intensive systems the challenge is parallelizing a very large computation, in data-intensive applications we usually worry more about things like storing and processing large data volumes, managing changes to data, ensuring consistency in the face of failures and concurrency, and making sure services are highly available.
Building blocks:
1. Databases: store data so that the same application, or another one, can find it again later;
2. Caches: remember the result of an expensive operation, to speed up reads;
3. Search indexes: allow users to search data by keyword or filter it in various ways;
4. Stream processing: handle events and data changes as soon as they occur;
5. Batch processing: periodically crunch a large amount of accumulated data.
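To make the cache building block concrete, here is a minimal sketch of a read-through cache in Python. It is not from the book: the class name `ReadThroughCache`, its `load` callback, and the `misses` counter are all hypothetical names chosen for illustration. The point is only to show "remember the result of an expensive operation" and the staleness trade-off that comes with it.

```python
from typing import Callable, Dict, Hashable, Any


class ReadThroughCache:
    """Hypothetical in-memory cache sitting in front of a slower data source."""

    def __init__(self, load: Callable[[Hashable], Any]) -> None:
        self._load = load                      # the expensive operation (e.g. a database read)
        self._entries: Dict[Hashable, Any] = {}
        self.misses = 0                        # how often the slow path was taken

    def get(self, key: Hashable) -> Any:
        # On a miss, run the expensive operation once and remember its result.
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._load(key)
        return self._entries[key]

    def invalidate(self, key: Hashable) -> None:
        # Drop a remembered result so the next read goes back to the source.
        self._entries.pop(key, None)
```

A usage sketch, assuming a plain dict stands in for the database: create `db = {"user:1": "Alice"}` and `cache = ReadThroughCache(db.__getitem__)`. Repeated calls to `cache.get("user:1")` hit the source only once. If `db` changes afterwards, the cache keeps returning the old value until `cache.invalidate("user:1")` is called, which is the kind of trade-off (faster reads versus possibly stale data) this chapter is about.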
One of the key challenges with data systems is that different people need to do very different things with data. This chapter compares several contrasting concepts and explores their trade-offs. We will consider the following topics:
The difference between operational and analytical systems
The pros and cons of cloud services and self-hosted systems
When to move from a single-node system to distributed systems
Balancing the needs of the business and the rights of the user














