Benchmarking NoSQL solutions
The SQL world is easier to understand than the NoSQL world. The NoSQL landscape is confusing, with a multitude of solutions loaded with buzzwords. NoSQL solutions are often divided into categories by data model, such as key-value, document, or columnar. Sometimes they are also classified by the trade-offs they make under the CAP theorem.
However, it is quite hard to classify an application as key-value, document, or columnar. It is entirely possible to architect an application to use any kind of NoSQL database. For example, one can store a document as JSON in a key-value store, or decompose the document into columns and store them in a columnar database. Similarly, a document or columnar database can easily serve as a key-value store.
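As a rough illustration of that flexibility, here is a minimal Python sketch that keeps the same record in two shapes: once as a single JSON value, and once decomposed into one entry per field. A plain dict stands in for the database, and the key naming scheme (user:42, user:42:name) and helper names are assumptions made for this example, not something any of the three products requires.

```python
import json

# A plain dict stands in for any key-value store (hypothetical stand-in).
kv_store = {}

def put_document(store, key, document):
    """Store the whole document as one JSON string under a single key."""
    store[key] = json.dumps(document)

def get_document(store, key):
    """Read the JSON string back and decode it into a dict."""
    return json.loads(store[key])

def put_columns(store, key, document):
    """Decompose a flat document into one entry per field ("column")."""
    for field, value in document.items():
        store[f"{key}:{field}"] = value

def get_columns(store, key, fields):
    """Reassemble the document from its per-field entries."""
    return {field: store[f"{key}:{field}"] for field in fields}

user = {"name": "Ada", "city": "London", "age": 36}
put_document(kv_store, "user:42", user)
put_columns(kv_store, "user:42", user)

print(get_document(kv_store, "user:42") == user)
print(get_columns(kv_store, "user:42", ["name", "city", "age"]) == user)
```

The single-value form needs one read per document, while the per-field form lets a client fetch or update one field without touching the rest. Which one fits better depends on the access pattern rather than on the database category.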
This post focuses on benchmark results for three popular NoSQL solutions: Redis, Couchbase, and Cassandra. The test was done using the following setup:
The DB server ran on an AWS m4.large instance
The DB client ran on an AWS c4.xlarge instance
Only a single DB server was used, to compare performance as a single-instance solution
Both server and client were in the same availability zone
All DB solutions were used as key-value stores
Data was first written into the database, compacted, and optimized; the test itself was a 100% read workload
The DB client was larger than the DB server to ensure that the server could be pushed to its limits
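The post does not say which load tool was used, so the following is only a sketch of what a read-only test loop of this kind might look like. The function name run_read_benchmark and the get callable are hypothetical; here a Python dict stands in for the database client, and a real run would pass the client's own read call instead.

```python
import random
import time

def run_read_benchmark(get, keys, num_reads, seed=0):
    """Issue num_reads random reads and report throughput and mean latency."""
    rng = random.Random(seed)
    latencies = []
    start = time.perf_counter()
    for _ in range(num_reads):
        key = rng.choice(keys)
        t0 = time.perf_counter()
        get(key)  # hypothetical read call; replace with a real client read
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "reads": num_reads,
        "ops_per_sec": num_reads / elapsed,
        "mean_latency_us": 1e6 * sum(latencies) / num_reads,
    }

# Stand-in data set: 1,000 entries held in a dict instead of a database.
store = {f"key:{i}": "x" * 100 for i in range(1000)}
result = run_read_benchmark(store.__getitem__, list(store), num_reads=10_000)
print(result)
```

This loop is single-threaded, so on its own it measures one client connection. Pushing a server to its limits, as described above, would normally take many such loops running concurrently.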
The first test was done with just 1,000 entries in the database.
Redis delivered 50k ops/sec, Couchbase maxed out at 29k ops/sec, and Cassandra maxed out at 18k ops/sec. However, the biggest difference was in response times: Redis took around 150 microseconds per response, while Cassandra started reaching 1,000 microseconds (1 ms) for a simple key-value read.
Next, the test was done with 500k entries in the database.
Here, Redis and Couchbase both maxed out at 16k ops/sec. Redis response time was 250 microseconds, compared to 375 microseconds for Couchbase. Cassandra went beyond 1 ms when pushed past 9,000 ops/sec.
Up to this point, all the data fit in RAM, since the data size (plus indexes) was smaller than the RAM size, so all the databases were reading from RAM. Going further on the number of entries required configuring Redis to use swap along with RAM, which Redis does not recommend. The next test used 3M objects of 4k each. The total data size was approximately 12 GB, and indexing (on the primary key) brought the database size to approximately 16 GB on an 8 GB RAM machine. The reads were spread randomly across the 3M objects. Swap came from EBS SSD. Note that only Redis used swap; Couchbase and Cassandra were not configured with swap.
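The sizing above can be checked with a few lines of arithmetic. This sketch reads "4k" as 4,000 bytes, which is an assumption; with 4,096-byte objects the raw size would come out slightly higher.

```python
NUM_OBJECTS = 3_000_000
OBJECT_SIZE_BYTES = 4_000  # "4k" read as 4,000 bytes (assumption)
DB_SIZE_GB = 16            # approximate database size given in the text
RAM_GB = 8                 # RAM on the server, as given in the text

raw_gb = NUM_OBJECTS * OBJECT_SIZE_BYTES / 1e9
overhead_gb = DB_SIZE_GB - raw_gb
ram_fraction = RAM_GB / DB_SIZE_GB

print(f"raw data: {raw_gb:.1f} GB")
print(f"index and storage overhead: {overhead_gb:.1f} GB")
print(f"share of the database that can fit in RAM: at most {ram_fraction:.0%}")
```

With reads spread uniformly over all objects, at most about half of them can be served from RAM at any time, so the rest have to come from swap or disk.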
Cassandra was too slow to even compare, so I dropped it from the test after the first read. Redis performed faster than Couchbase.
However, something interesting happened while I was preparing for the 3M test. I ran out of disk space while testing Couchbase. Couchbase became unresponsive, so I rebooted the machine. After the reboot, Couchbase failed to start and read the data from disk. I spent hours trying to get it to read the data, but ultimately it resulted in data loss with Couchbase.
Next, I ran the same 3M-entry test, but with reads confined to roughly 5% of the entries. So the DB was storing 3M objects, but only about 125,000 of them were read.
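The restricted read pattern can be sketched as a key sampler that only ever returns keys from a fixed hot set. The figures 3,000,000 and 125,000 come from the text; how the hot set was actually chosen is not stated, so picking it uniformly at random across the key space is an assumption, as is the helper name make_hot_set_sampler.

```python
import random

def make_hot_set_sampler(total_keys, hot_keys, seed=0):
    """Return (next_key, hot_set): next_key() yields indexes from hot_set only."""
    if not 0 < hot_keys <= total_keys:
        raise ValueError("hot_keys must be between 1 and total_keys")
    rng = random.Random(seed)
    # Choose the hot set once, spread across the whole key space (assumption).
    hot_set = rng.sample(range(total_keys), hot_keys)

    def next_key():
        return hot_set[rng.randrange(hot_keys)]

    return next_key, hot_set

next_key, hot_set = make_hot_set_sampler(total_keys=3_000_000, hot_keys=125_000)
hot_lookup = set(hot_set)
reads = [next_key() for _ in range(10_000)]

print(len(hot_lookup))
print(all(key in hot_lookup for key in reads))
```

Because every read lands inside the same small subset, the working set is far smaller than the full database, which is what gives an OS page cache or a database's own cache a chance to keep it in RAM.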
In this test, performance looked similar to the 500k test. Redis, with the help of Linux swap management, was able to load the keys being read into RAM. Couchbase already has logic to cache data in memory. However, Redis did outperform Couchbase on both ops/sec and response times.
Summary:
Cassandra is quite slow compared to Redis and Couchbase.
Couchbase's process list is huge, with many components working together. It looks like a product built by acquisition. Its stability for production use is questionable.
Redis is blazing fast. It provides many more ops/sec on similar hardware, so ops per dollar spent is very good for Redis.
Redis Labs already has a product for storing cold values on Flash. Hopefully AWS will offer Flash as a tier that is cheaper than memory and more expensive than disk sometime in the future.
ElastiCache now supports Redis Cluster. If your application can afford (in dollar terms) to keep its data in RAM, consider that option strongly.