110 million comments from Hacker News: medium data full-text / analytics test

Intro

In this test we use the collection of 1.1M curated Hacker News comments with numeric fields from https://zenodo.org/record/45901, multiplied by 100. By modern standards, 110 million documents is a medium-sized data set. You can find data sets of similar size on large blogs and news sites, large online stores, classifieds sites and so on. Such applications typically have:

  • fairly short textual data in one or more fields
  • a number of attributes
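
The intro says the 1.1M-comment collection is "multiplied by 100" but does not describe how. One plausible way to do it is sketched below, purely as an illustration: it assumes the source is a CSV whose first column is a document id, and it repeats every row while assigning fresh sequential ids so that documents stay unique. The function name `multiply_rows` and the three-column sample are hypothetical and are not taken from the test itself.

```python
import csv
import io
from typing import Iterable, Iterator, List


def multiply_rows(rows: Iterable[List[str]], factor: int) -> Iterator[List[str]]:
    """Yield every input row `factor` times, replacing column 0 with a new sequential id.

    Assumption: column 0 holds the document id and the source rows fit in memory.
    """
    base = list(rows)
    next_id = 1
    for _ in range(factor):
        for row in base:
            yield [str(next_id)] + row[1:]
            next_id += 1


# Tiny stand-in for the real data set (hypothetical columns: id, text, numeric field).
sample = "1,first comment,10\n2,second comment,20\n"
rows = list(csv.reader(io.StringIO(sample)))
out = list(multiply_rows(rows, 3))
```

With `factor=100` and the 1.1M-row source, the same generator would produce 110 million rows. Note that copies made this way repeat the same text, so term statistics differ from those of 110 million genuinely distinct comments.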

1.1 million comments from Hacker News: small data full-text / analytics test

Intro

In this test we use the collection of 1.1M curated Hacker News comments with numeric fields from https://zenodo.org/record/45901. By modern standards, 1 million documents is a very small data set, yet it is typical for many applications: blogs and news sites, online stores, job, automotive and real estate sites and so on. Such applications typically have:

  • fairly short textual data in one or more fields
  • a number of attributes