Our belief is that a fair database benchmark should follow some key principles:
✅ Test different databases on exactly the same hardware
Otherwise you can't draw any conclusions from differences of a few percent in the test results.
✅ Test with the OS cache fully purged before each test
Otherwise you can’t test cold queries.
✅ The database being tested should have all its internal caches disabled
Otherwise you’ll measure cache performance.
✅ Best if you measure a cold run too. It’s especially important for analytical queries, where cold queries may happen often
Otherwise you completely hide how well the database handles I/O.
✅ Nothing else should be running during testing
Otherwise your test results may be very unstable.
✅ You need to restart the database before each query
Otherwise previous queries can still affect the current query’s response time, even after the internal caches are cleared.
✅ You need to wait until the database warms up completely after it has started
Otherwise you can end up competing with the database’s warmup process for I/O, which can severely distort your test results.
✅ Best if you provide a coefficient of variation, so everyone understands how stable your results are, and make sure yourself that it’s low enough
The coefficient of variation (the standard deviation divided by the mean) is a very good metric of how stable your test results are. If it’s higher than N%, you can’t say one database is N% faster than another.
✅ Best if you test on a fixed CPU frequency
Otherwise, if you are using the “on-demand” CPU governor (which is normally the default), it can easily turn your 500 ms response time into 1000+ ms.
✅ Best if you test on SSD/NVMe rather than HDD
Otherwise, depending on where your files are located on the HDD, you can get up to 2x lower or higher I/O performance (we tested), which can make at least your cold query results wrong.
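
The coefficient of variation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: you have several response times in milliseconds for the same query on each database, the function names are hypothetical, and "N% faster" is taken to mean the gap between the mean response times as a percentage of the slower mean.

```python
import statistics


def coefficient_of_variation(samples_ms):
    """Sample standard deviation divided by the mean, as a percentage."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(samples_ms)
    if mean <= 0:
        raise ValueError("mean response time must be positive")
    return statistics.stdev(samples_ms) / mean * 100.0


def difference_is_meaningful(samples_a, samples_b):
    """Hypothetical rule of thumb: the gap between the two means, in percent
    of the slower mean (an assumption), must exceed the larger of the two
    coefficients of variation before one database is called faster."""
    mean_a = statistics.mean(samples_a)
    mean_b = statistics.mean(samples_b)
    gap_pct = abs(mean_a - mean_b) / max(mean_a, mean_b) * 100.0
    worst_cv = max(coefficient_of_variation(samples_a),
                   coefficient_of_variation(samples_b))
    return gap_pct > worst_cv


if __name__ == "__main__":
    # Made-up response times in ms, purely for illustration.
    stable_a = [100, 102, 98, 101, 99]
    stable_b = [110, 112, 108, 111, 109]
    noisy_a = [100, 140, 80, 120, 60]

    print(f"CV stable_a: {coefficient_of_variation(stable_a):.2f}%")
    print(f"CV noisy_a:  {coefficient_of_variation(noisy_a):.2f}%")
    print("stable_a vs stable_b:", difference_is_meaningful(stable_a, stable_b))
    print("noisy_a vs stable_b: ", difference_is_meaningful(noisy_a, stable_b))
```

With the made-up numbers in the example, the two stable series differ by about 9% while varying by under 2%, so the difference counts. The noisy series has the same mean as the first stable one, but it varies by more than 30%, so the same 9% gap tells you nothing.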