About db-benchmarks project


https://db-benchmarks.com aims to make database and search engine benchmarks:

โš–๏ธ Fair and transparent - it should be clear under what conditions this or that database / search engine gives this or that performance

๐Ÿš€ High quality - control over coefficient of variation allows producing results that remain the same if you run a query today, tomorrow or next week

๐Ÿ‘ช Easily reproducible - anyone can reproduce any test on their own hardware

๐Ÿ“Š Easy to understand - the charts are very simple

โž• Extendable - pluggable architecture allows adding more databases to test

And keep it all 100% Open Source!

This repository provides a test framework which does the job.

Why is this important?

Many database benchmarks are not objective. Others don’t do enough to ensure the accuracy and stability of their results, which in some cases defeats the whole purpose of benchmarking. A few examples:

Druid vs ClickHouse vs Rockset

https://imply.io/blog/druid-nails-cost-efficiency-challenge-against-clickhouse-and-rockset/ :

We actually wanted to do the benchmark on the same hardware, an m5.8xlarge, but the only pre-baked configuration we have for m5.8xlarge is actually the m5d.8xlarge … Instead, we run on a c5.9xlarge instance

Bad news, guys: when you run benchmarks on different hardware, at the very least you can’t then claim that something is “106.76%” or “103.13%” of something else. Even when you test on the same bare-metal server, it’s quite difficult to get a coefficient of variation lower than 5%, so a 3% difference measured on different servers can most likely be ignored. Given all that, how can one be sure the final conclusion is true?
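To make the stability criterion concrete, here is a minimal sketch (not part of the db-benchmarks framework; the latency numbers are made up for illustration) of how one might compute the coefficient of variation over repeated runs of the same query and apply a 5% threshold:

```python
import statistics

def coefficient_of_variation(samples):
    """CV = sample standard deviation / mean, as a percentage."""
    return statistics.stdev(samples) / statistics.mean(samples) * 100

# Hypothetical latencies (ms) from repeating one query on one server.
latencies_ms = [102.0, 98.5, 101.2, 99.8, 100.5]

cv = coefficient_of_variation(latencies_ms)
if cv > 5:
    print(f"CV {cv:.1f}% is too high - keep repeating the query")
else:
    print(f"CV {cv:.1f}% is acceptable - the result can be trusted")
```

A benchmark harness can loop a query until the CV of recent runs drops below such a threshold, instead of trusting a fixed number of repetitions.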

Lots of databases and engines


Mark did a great job running the taxi rides test on so many different databases and search engines. But since the tests were run on different hardware, the numbers in the resulting table aren’t really comparable. You always need to keep this in mind when evaluating the results in the table.

ClickHouse vs others


When you run each query just 3 times, you’ll most likely get a very high coefficient of variation for each of them, which means that running the same test a minute later may give results that differ by 20%. And how does one reproduce the test on one’s own hardware? Unfortunately, I couldn’t find any instructions for doing so.