About db-benchmarks project
https://db-benchmarks.com aims to make database and search engine benchmarks:
⚖️ Fair and transparent - it should be clear under exactly what conditions a given database / search engine delivers a given level of performance
🚀 High quality - control over the coefficient of variation allows producing results that stay the same whether you run a query today, tomorrow or next week
👪 Easily reproducible - anyone can reproduce any test on their own hardware
📊 Easy to understand - the charts are very simple
➕ Extendable - pluggable architecture allows adding more databases to test
And keep it all 100% Open Source!
This repository provides a test framework which does the job.
Why is this important?
Many database benchmarks are not objective. Others don’t do enough to ensure accuracy and stability of results, which in some cases defeats the whole purpose of benchmarking. A few examples:
Druid vs ClickHouse vs Rockset
We actually wanted to do the benchmark on the same hardware, an m5.8xlarge, but the only pre-baked configuration we have for m5.8xlarge is actually the m5d.8xlarge … Instead, we run on a c5.9xlarge instance
Bad news, guys: when you run benchmarks on different hardware, at the very least you can’t then claim that something is “106.76%” or “103.13%” of something else. Even when you test on the same bare-metal server, it’s quite difficult to get a coefficient of variation lower than 5%, so a 3% difference measured on different servers can most likely be ignored. Given all that, how can anyone be sure the final conclusion is true?
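The coefficient of variation mentioned above is simply the standard deviation of the measured query times divided by their mean: the lower it is, the more stable (and therefore comparable) the results are. A minimal sketch in Python; the timing values are hypothetical:

```python
import statistics

def coefficient_of_variation(timings):
    """CV = sample standard deviation / mean.
    A low CV (e.g. under 5%) means repeated runs give stable results."""
    return statistics.stdev(timings) / statistics.mean(timings)

# Hypothetical response times (seconds) from five runs of the same query.
timings = [1.02, 0.98, 1.05, 1.01, 0.99]

print(f"CV: {coefficient_of_variation(timings):.1%}")
```

With such a low CV you can meaningfully compare this query’s timing against another engine’s; with a CV of 20% a few-percent difference between engines is just noise.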
Lots of databases and engines
Mark did a great job running the taxi rides test on so many different databases and search engines. But since the tests were run on different hardware, the numbers in the resulting table aren’t directly comparable. You always need to keep this in mind when evaluating the results in the table.
ClickHouse vs others
When you run each query just 3 times, you’ll most likely get a very high coefficient of variation for each of them, which means that if you rerun the test a minute later, the results may vary by 20%. And how does one reproduce the test on one’s own hardware? Unfortunately, I couldn’t find any way to do it.