Building a High-Performance Web Application
As a web application grows in popularity, supporting it inevitably demands more and more resources. The first measures for coping with the increasing load are optimizing the application's algorithms and architecture. But what if everything that can be optimized already has been, and the bottleneck is the performance of the database? The solution is to resort to more complex techniques such as parallelization. This article discusses parallelization and, more specifically, the problem of configuring master-slave database replication.
As part of a project for a European customer, Iteratia provides full support for an online service that we developed a couple of years ago. In a nutshell, this resource is a directory of data with a rather simple structure and a large volume of information.
The amount of data in the database of this online service has grown from a few thousand to more than 10 million records within 2 years. Because of this rapid growth, code that worked fine with the initial number of records ran ten times slower once the database reached about a million records. Moreover, performance was degrading faster than the database was growing. Besides individual requests taking longer to execute, the number of requests was also growing: more search engines were displaying links to our service, more users were being directed to its pages, and more page views were being served. Immediate measures were taken to counter the increased load: the programmers optimized the code, taking expected traffic growth into account.
This improved the situation for some time, but this method of "optimization" has its limits. Each subsequent round of rewriting the code demanded more resources (both time and money), while system performance improved less and less. Therefore, when further code optimization was no longer possible and the speed of the database continued to fall as it grew, we had to find other ways to solve the problem. Our programmers searched for a solution while the database continued to grow to 3 million records.
The next step after code optimization was to increase the capacity of our hardware. The database was moved to a separate, powerful server. However, with traffic still growing, even this was not enough. Each subsequent purchase of more powerful hardware was far more expensive than the last, while the resulting gain in system performance remained insignificant. We spent a lot of money inefficiently and inevitably arrived at the need for parallelization.
While web server parallelization can be solved quite simply, the database is a little more complicated. We can distribute the load almost without limit and cope with slow web servers by launching more and more identically configured machines and balancing the incoming traffic between them, but the bottleneck in this whole scheme is still the database. No matter how many web servers we have (1, 5, or even 10), all of them send queries to the same database deployed on a single server.
An effective solution in such a situation is to parallelize the work with the database. An integral part of this is data replication, a mechanism for synchronizing multiple copies of an object. When data is updated on the master server, all copies of the data on the slave servers must be updated as well, so that all users see a single version of the data at any time.
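To make the division of work between master and slaves concrete, here is a minimal sketch in Python. It assumes the common arrangement in which writes go to the master and reads are spread across the slaves; the class name, server names, and the list of write keywords are hypothetical and chosen only for illustration, not taken from our production code.

```python
import itertools

# Statement keywords treated as writes in this sketch (an assumption, not a complete list).
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"}


class ReplicatedRouter:
    """Route writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads when no slaves are configured.
        self._read_cycle = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the name of the server that should execute this statement."""
        words = sql.split()
        keyword = words[0].upper() if words else ""
        if keyword in WRITE_KEYWORDS:
            return self.master
        # Reads are handed out to the slaves in round-robin order.
        return next(self._read_cycle)


router = ReplicatedRouter("db-master", ["db-slave-1", "db-slave-2"])
targets = [
    router.route("SELECT * FROM entries WHERE id = 1"),
    router.route("UPDATE entries SET title = 'x' WHERE id = 1"),
    router.route("SELECT count(*) FROM entries"),
]
```

A real router would also have to deal with transactions and with reads that must see a write made a moment earlier, which is exactly where replication delays start to matter.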
Vendors of modern DBMSs offer ready-made replication solutions, but in any working system errors can occur that lead to instability. A hardware failure, a long break in the network connection, or a malfunction in the operating system can cause data replication to fail, leading to desynchronization and requiring immediate action to remove the source of the problem and eliminate its consequences.
In other words, the replication process needs to be constantly monitored. If a problem is detected, programmers need time to find the source of the failure, remove it, resynchronize the data, and restart the replication, which can take many hours when dealing with large databases. Most importantly, during such failures a certain number of incoming requests will not be served, which is absolutely unacceptable for a high-demand online resource.
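The kind of check such monitoring performs can be sketched as follows. This is an illustrative example only: it assumes that some external probe already reports how many seconds each slave is behind the master, and the thresholds, status names, and server names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaStatus:
    name: str
    lag_seconds: Optional[float]  # None means the slave did not report at all


def classify(status, warn_after=30.0, fail_after=300.0):
    """Classify one slave by how far it is behind the master."""
    if status.lag_seconds is None:
        return "down"
    if status.lag_seconds >= fail_after:
        return "desynchronized"
    if status.lag_seconds >= warn_after:
        return "lagging"
    return "ok"


def replicas_needing_attention(statuses):
    """Return {name: state} for every slave that is not healthy."""
    states = {s.name: classify(s) for s in statuses}
    return {name: state for name, state in states.items() if state != "ok"}


report = replicas_needing_attention([
    ReplicaStatus("db-slave-1", 0.4),
    ReplicaStatus("db-slave-2", 45.0),
    ReplicaStatus("db-slave-3", None),
])
```

In this sketch only the second and third slaves would be reported, as "lagging" and "down" respectively; what happens next (alerting, resynchronization, removal from the pool) is the expensive part described above.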
In addition to data replication, another important issue is organizing data backups. First of all, this task requires significant effort from the database administrator (and money to pay for that effort). To get rid of some of the routine work, administrators automate the creation of backups as much as possible with the help of scripts. However, the reliability of such "helper" programs is questionable, because they are usually written for the administrator's own use, without thorough error handling or proper testing. A failure in any of these small utilities can disrupt the entire backup system. In other words, the backup system, just like database replication, requires careful monitoring. Finally, taking a full data dump with the DBMS's built-in commands may interfere with the live service because of the locks and the load that the dump places on the database while it runs, as can happen, for example, with the popular DBMS PostgreSQL.
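As an illustration of the error handling that such helper scripts often lack, here is a minimal sketch of a backup wrapper in Python. The function and exception names are hypothetical. The dump command is passed in as a parameter; in practice it might be something like `["pg_dump", "mydb"]` (with `mydb` a placeholder database name), while the demonstration below substitutes a trivial Python command so that the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class BackupError(Exception):
    """Raised when a backup did not complete successfully."""


def run_backup(command, output_path):
    """Run a dump command, store its stdout in output_path, and verify the result."""
    output_path = Path(output_path)
    # Write to a temporary name first so a failed run never looks like a good backup.
    partial = output_path.with_name(output_path.name + ".partial")
    try:
        with open(partial, "wb") as out:
            result = subprocess.run(command, stdout=out, stderr=subprocess.PIPE)
    except OSError as exc:
        partial.unlink(missing_ok=True)
        raise BackupError(f"could not run backup command: {exc}") from exc
    if result.returncode != 0:
        partial.unlink(missing_ok=True)
        message = result.stderr.decode(errors="replace").strip()
        raise BackupError(f"backup command exited with {result.returncode}: {message}")
    if partial.stat().st_size == 0:
        partial.unlink(missing_ok=True)
        raise BackupError("backup command produced no data")
    partial.replace(output_path)
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a real dump command that succeeds.
    good = run_backup([sys.executable, "-c", "print('dump')"], Path(tmp) / "db.sql")
    backup_ok = good.exists()
    try:
        # Stand-in for a dump command that fails.
        run_backup([sys.executable, "-c", "import sys; sys.exit(3)"], Path(tmp) / "bad.sql")
        failure_detected = False
    except BackupError:
        failure_detected = True
```

Even this small wrapper only reports failures to its caller; someone, or something, still has to notice the exception and react to it, which is the monitoring burden mentioned above.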
Most large systems that deal with loads similar to ours have a dedicated DBA staff of highly qualified employees to do all that work. We also tried to go down this route, but organizing such a process takes time and a considerable amount of money, so we were faced with the question of how to optimize the cost of server maintenance.
The solution that allowed us to avoid the potential problems described above was to delegate data replication and backup to the service scalr.com. For a relatively modest fee, it provides database scalability and backup automation with a high degree of reliability, and it is faster, safer and, most importantly, cheaper than any specialist. This is ready-to-use software, developed by highly qualified programmers specifically for this class of tasks and tested by thousands of users. The services of scalr.com are already used by more than 7,000 companies, including giants such as Oracle, Nokia, Samsung, and Walt Disney.
With the help of this service, you can create a so-called "server farm": all the client has to do is specify the desired minimum and maximum number of servers and the backup schedule. The right number of servers is then allocated, the user is given DNS endpoints, and everything else is done automatically. The service makes backups and restores data as needed; if a server fails, it copies the data, launches a new server, and adds it to the load distribution list. At the minimum rate, the service costs only $100 per month, which is, in fact, less than a day's pay for even the least qualified system administrator, who would do the same things less reliably. The system is fully automated.
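The idea of a farm bounded by a minimum and maximum number of servers can be sketched as below. This is not Scalr's API or its actual algorithm; the names and the sizing rule are assumptions made purely to illustrate how such limits might drive the decision to launch replacement servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FarmConfig:
    """Hypothetical farm settings: server limits and backup interval."""
    min_servers: int
    max_servers: int
    backup_every_hours: int

    def __post_init__(self):
        if not 1 <= self.min_servers <= self.max_servers:
            raise ValueError("need 1 <= min_servers <= max_servers")


def target_server_count(config, wanted_by_load):
    """Clamp the load-driven server count to the farm's limits."""
    return max(config.min_servers, min(config.max_servers, wanted_by_load))


def servers_to_launch(config, healthy, wanted_by_load):
    """How many new servers are needed to reach the target count."""
    return max(0, target_server_count(config, wanted_by_load) - healthy)


farm = FarmConfig(min_servers=2, max_servers=5, backup_every_hours=24)
after_failure = servers_to_launch(farm, healthy=1, wanted_by_load=1)
under_heavy_load = servers_to_launch(farm, healthy=3, wanted_by_load=9)
```

With these settings, a farm that has lost one of its two servers asks for one replacement, and a farm under heavy load is capped at the configured maximum of five servers.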
So, how does it work? The whole system is based on the Amazon cloud, so you do not need to buy your own servers. This does mean, however, that the scalr.com service is suitable only for customers who already have their system deployed on Amazon servers (like us) or who are ready to move it there; otherwise, DBMS connection delays will be too high. In addition, not all typical replication scenarios are supported. For example, you cannot pair a powerful main server with a small, cheap “Hot Standby” used solely as a live backup, because scalr.com provisions an identical hardware configuration for all the servers in the farm.
Such limitations suggest that the ready-made solution that Scalr.com provides is not ideal. Some specific requirements of the system may not suit a particular user. Nevertheless, the automated service is a simple and effective solution that will work in many cases.
In our case, the replication and backup problems that come with parallelizing the work with the database have been safely and effectively solved by moving to scalr.com's services. Scalr has given us a ready-to-use, reliable, automated solution for those two tasks, has reduced the money and time we would otherwise have spent developing an alternative, and has allowed us to forget about regular monitoring and tuning of the servers. This now gives us confidence in the high availability of our online service.