Performance Testing a Postgres Database vs Elasticsearch 5: Column Statistics

24 January 2017 on . 2 minutes to read

This is the first post on benchmarking a postgres database vs a (1 node) elasticsearch instance. The subject of this test are numeric column statistics, based on 10 Million products inserted into both the database and elasticsearch index.

Up to date list of articles diving into my ecommerce performance investigations:

Rails.logger.level = :info

Benchmark.ips do |x|
  column = :brand_id"Product Brand ID Elasticsearch Stats") {Product.elasticsearch_stats(column)}"Product Brand ID PG Stats") {Product.pg_stats(column)}!
Warming up --------------------------------------
Product Brand ID Elasticsearch Stats
                        42.000  i/100ms
Product Brand ID PG Stats
                         1.000  i/100ms
Calculating -------------------------------------
Product Brand ID Elasticsearch Stats
                        451.179  (± 8.4%) i/s -      2.268k in   5.066563s
Product Brand ID PG Stats
                          3.249  (± 0.0%) i/s -     17.000  in   5.236520s

Product Brand ID Elasticsearch Stats:      451.2 i/s
Product Brand ID PG Stats:        3.2 i/s - 138.86x  slower

Point, blouses Elasticsearch.

Further Breakdown:

column = :brand_id

Product Search (3.6ms)  curl http://localhost:9200/products_development/_search?pretty -d '{"query":{"match_all":{}},"size":0,"from":0,"timeout":"11s","_source":false,"aggs":{"brand_id_stats":{"stats":{"field":"brand_id"}}}}'
  "avg"   => 502.5,
  "count" => 1000000,
  "max"   => 1002.0,
  "min"   => 3.0,
  "sum"   => 502500000.0

   (115.7ms)  SELECT AVG("products"."brand_id") FROM "products"
   (75.1ms)  SELECT COUNT(*) FROM "products"
   (0.3ms)  SELECT MAX("products"."brand_id") FROM "products"
   (0.2ms)  SELECT MIN("products"."brand_id") FROM "products"
   (110.1ms)  SELECT SUM("products"."brand_id") FROM "products"
  :avg   => 502.5,
  :count => 1000000,
  :max   => 1002,
  :min   => 3,
  :sum   => 502500000

As you can see here, the operational time for all the complex queries take an order of magnitude longer than the entire elasticsearch query. The min and max queries are quick, but I’d bet that’s because these operations were performed on an indexed numerical column. I’ll update this post in the future against a non indexed column for comparison.

If you enjoy having free time and the peace of mind that a professional is on your side, then you’d love to have me work on your project.

Contact or view a list of available services to see how I’ll make your life better, easier and bring satisfaction back into you running your business.