We believe a fair database benchmark should follow these key principles:
✅ Test different databases on exactly the same hardware
Otherwise, if the hardware differs even slightly, you have to acknowledge an error margin in your results.
✅ Test with full OS cache purged before each test
Otherwise, you can’t measure cold queries.
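As a sketch, purging the Linux page cache before each cold run can look like this. The sysctl path is the standard Linux one and root is required; the `path` parameter is only there so the logic can be exercised against a different file.

```python
import subprocess

def purge_os_cache(path="/proc/sys/vm/drop_caches"):
    """Drop the Linux OS page cache before a cold run (needs root)."""
    subprocess.run(["sync"], check=True)  # flush dirty pages to disk first
    with open(path, "w") as f:
        f.write("3\n")  # 3 = drop page cache + dentries and inodes
```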
✅ The database under test should have all its internal caches disabled
Otherwise, you’ll be measuring cache performance rather than query performance.
✅ Best if you measure a cold run too. It’s especially important for analytical queries, where cold queries may happen often
Otherwise, you completely hide how well the database handles I/O.
✅ Nothing else should be running during testing
Otherwise, your test results may be very unstable.
✅ You need to restart the database before each query
Otherwise, previous queries can still impact the current query’s response time, even after clearing internal caches.
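A minimal sketch of such a restart-then-measure loop, with placeholder commands — substitute your own service restart and query client invocations:

```python
import subprocess
import time

def time_cold_query(restart_cmd, query_cmd, settle_seconds=1.0):
    """Restart the server, pause briefly, then time a single query.

    restart_cmd / query_cmd are placeholders, e.g.
    ["systemctl", "restart", "your-db"] and a client command for your query.
    """
    subprocess.run(restart_cmd, check=True)
    time.sleep(settle_seconds)  # crude pause; a real warm-up check is better
    start = time.monotonic()
    subprocess.run(query_cmd, check=True)
    return time.monotonic() - start
```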
✅ You need to wait until the database warms up completely after it starts
Otherwise, you may end up competing with the database’s warm-up process for I/O, which can severely skew your test results.
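One hedged way to implement the wait is to poll a readiness probe until it reports the database is warm. The `is_warm` callable here is a placeholder for whatever check fits your database (a status query, disk I/O settling, etc.):

```python
import time

def wait_until_warm(is_warm, timeout=300.0, interval=1.0):
    """Poll is_warm() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_warm():
            return True
        time.sleep(interval)
    return False
```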
✅ Best if you provide a coefficient of variation, so everyone understands how stable your results are; and check yourself that it’s low enough
The coefficient of variation (standard deviation divided by the mean) is a very good metric of how stable your test results are. If it’s higher than N%, you can’t claim one database is N% faster than another.
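For example, with Python’s standard `statistics` module the coefficient of variation of a set of response times is just the sample standard deviation over the mean:

```python
import statistics

def coefficient_of_variation(times):
    """CoV as a percentage: sample stdev of the runs over their mean."""
    return statistics.stdev(times) / statistics.mean(times) * 100

# e.g. five runs of the same query, in milliseconds:
runs = [512, 498, 505, 520, 490]
```

If the CoV of your runs is, say, 5%, then a 3% difference between two databases is within the noise.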
✅ Best if you test on a fixed CPU frequency
Otherwise, if you are using the “ondemand” CPU governor (normally the default), it can easily turn your 500 ms response time into 1000+ ms.
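On Linux this can be sketched by writing “performance” into the cpufreq sysfs files (root required; the `pattern` parameter is a placeholder so the logic can be pointed at test files):

```python
import glob

def set_governor(governor="performance",
                 pattern="/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor"):
    """Pin every core's cpufreq governor so frequency scaling stays fixed."""
    for path in glob.glob(pattern):
        with open(path, "w") as f:
            f.write(governor + "\n")
```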
✅ Best if you test on SSD/NVMe rather than HDD
Otherwise, depending on where your files are located on the HDD, you can get up to 2x lower or higher I/O performance (we tested this), which can make at least your cold-query results wrong.