Why You Should Benchmark With Production Services: Redis Edition

25 January 2017

Learn How to Benchmark Redis with Ruby

Intro

There comes a time to benchmark your application and its services. However, if it’s not done properly, your results can be very misleading. In this post, I’ll throw some numbers at you to demonstrate the massive differences between benchmarking locally, on production, and locally while pointing at a cloud resource. The numbers for each set of tests differ by an order of magnitude, demonstrating the importance of using the proper setup.


Searchkick 2: How to Quickly Reindex Your Elasticsearch Models

25 January 2017

Intro

You’ve got your app hooked up to elasticsearch for some blazing speeds, but all of a sudden, your elasticsearch cluster comes under heavy load. The problem? Each model update is being sent to elasticsearch solo, putting huge load on your cluster. See, the best way to keep elasticsearch in sync is with bulk indexing. Luckily for you, Searchkick now supports this, so you don’t have to implement it manually anymore.

Searchkick 2 introduces a great way to keep your elasticsearch cluster in sync with your models: queuing. In prior versions, this had to be implemented manually; having it built into the gem is a huge time saver. In this article, I’ll run through the setup, and crunch some numbers as well.
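
To make the idea concrete, here is a minimal sketch in plain Ruby of why queuing helps. It is not Searchkick's implementation: ReindexQueue, push, and drain_in_batches are hypothetical names, and an in-memory array stands in for the Redis list. The only point is that many individual updates collapse into a handful of bulk requests.

```ruby
# Hypothetical in-memory stand-in for a Redis-backed reindex queue.
class ReindexQueue
  def initialize
    @ids = []
  end

  # Called once per model update; cheap, no request to the search cluster.
  def push(id)
    @ids << id
  end

  # Drains the queue, drops duplicate ids, and yields the rest in batches.
  # Each yielded batch would correspond to one bulk request.
  def drain_in_batches(batch_size)
    ids = @ids.uniq
    @ids = []
    ids.each_slice(batch_size) { |batch| yield batch }
  end
end

queue = ReindexQueue.new
# 3,000 updates touching 1,000 distinct records
3000.times { |i| queue.push(i % 1000) }

bulk_requests = []
queue.drain_in_batches(250) { |batch| bulk_requests << batch }

puts "updates queued: 3000, bulk requests sent: #{bulk_requests.size}"
```

In this toy run, 3,000 updates turn into 4 bulk requests instead of 3,000 single ones. The batch size of 250 is an arbitrary choice for the example.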

Setup

# config/initializers/searchkick.rb
Searchkick.redis = ConnectionPool.new { Redis.new }

# app/models/product.rb (assuming the conventional Rails model location)
class Product < ActiveRecord::Base
  searchkick callbacks: :queue
end

# Procfile
searchkick_worker: bundle exec sidekiq -c5 -qsearchkick

Manual Test

To verify your setup is correct:

  • Start your Procfile (or run bundle exec sidekiq -c5 -qsearchkick in a console tab)
  • Add some ids to the queue: Redis.new.lpush "searchkick:reindex_queue:products_development", Product.limit(3000).pluck(:id)
  • Start the queueing job: Searchkick::ProcessQueueJob.perform_later(class_name: "Product")

If successful, you’ll see some Searchkick::ProcessQueueJobs and Searchkick::ProcessBatchJobs kick off in your sidekiq worker.

Reindex Everything

What follows is a quick benchmark against a local elasticsearch instance. It times single (inline), bulk, and queued reindexing for 20 thousand records.

Rails.logger.level = :info
records = Product.order(:id).limit(20000);
single = Benchmark.measure {Product.where("id <= ?", records.last.id).find_each(&:reindex)}
bulk = Benchmark.measure {Searchkick.callbacks(:bulk) {Product.where("id <= ?", records.last.id).find_each(&:reindex)}}

# Note: the following only measures starting the job and adding the IDs to redis. The job execution time is added to the results below.
batch_worker = Benchmark.measure {
  Redis.new.lpush "searchkick:reindex_queue:products_development", records.pluck(:id) #The key name for redis
  Searchkick::ProcessQueueJob.perform_later(class_name: "Product")
}


single        #<Benchmark::Tms:0x007ffbd7184e00 @label="", @real=69.67301190498983, @cstime=0.0, @cutime=0.0, @stime=11.990000000000002, @utime=28.52000000000001, @total=40.51000000000001>
bulk          #<Benchmark::Tms:0x007ffbd8406678 @label="", @real=14.314875675016083, @cstime=0.0, @cutime=0.0, @stime=5.420000000000002, @utime=8.309999999999988, @total=13.72999999999999>
batch_worker  #<Benchmark::Tms:0x007ffbc30d1ff8 @label="", @real=0.11151637800503522, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.10999999999998522, @total=0.10999999999998522> # + (06:29.387 - 06:14.327): Time it took jobs to execute. 06:29.387 is time of completion for last job, 06:14.327 is start time for first job

                   user     system      total        real
single        28.520000  11.990000  40.510000 ( 69.673012)
bulk           8.310000   5.420000  13.730000 ( 14.314876)
batch_worker   0.110000   0.000000   0.110000 (  0.111516) # Add 15.06 (time for all jobs to complete) = 15.17 (total time)

So, either of those methods will be much quicker than using the classic, inline reindex callbacks.
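
To put a number on "much quicker", here is a quick back-of-the-envelope calculation using the wall-clock times from the results above. The variable names are just for this example; the only inputs are the measured values, plus the 15.06 seconds the background jobs took to finish.

```ruby
# Wall-clock ("real") seconds copied from the results above.
single_real = 69.673012
bulk_real   = 14.314876
# Queue approach: time to enqueue plus the 15.06 s the jobs took to finish.
queue_real  = 0.111516 + 15.06

bulk_speedup  = single_real / bulk_real
queue_speedup = single_real / queue_real

puts format("bulk:  %.1fx faster than inline", bulk_speedup)
puts format("queue: %.1fx faster than inline", queue_speedup)
```

On this local run, that works out to roughly 4.9x for bulk and 4.6x for the queue, with the queue having the added benefit of moving the work out of the request cycle.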

Other Thoughts

Prior to Searchkick 2, this functionality had to be implemented manually, so having it baked into the gem is a huge added bonus, for multiple reasons. First, it’s standardized and open source. Second, moving it into a job allows you to monitor it via the sidekiq web interface, and the job will also notify and retry should anything go wrong. There are a couple of places to improve the background job method, which I’ve used before to implement very similar functionality in Searchkick 1.5.1. I’ll have some PRs along to add those improvements, hopefully soon. Regardless, Ankane does a fantastic job with the Searchkick gem; it’s hands down the best way to use elasticsearch with Rails. The reasoning behind that as well as the PRs will be featured in an upcoming post.


Performance Testing a Postgres Database vs Elasticsearch 5: Column Statistics

24 January 2017

This is the first post on benchmarking a postgres database vs a (1 node) elasticsearch instance. The subject of this test is numeric column statistics, based on 10 million products inserted into both the database and the elasticsearch index.
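
As a rough illustration of what "numeric column statistics" can mean, the sketch below computes count, min, max, avg, and sum over a plain Ruby array. These are the five values an Elasticsearch stats aggregation reports; whether the benchmark computes exactly this set is an assumption on my part. The column_stats helper and the sample data are made up for the example.

```ruby
# Hypothetical helper: summary statistics for a numeric column.
def column_stats(values)
  return { count: 0, min: nil, max: nil, avg: nil, sum: 0 } if values.empty?

  sum = values.sum
  {
    count: values.size,
    min: values.min,
    max: values.max,
    avg: sum.to_f / values.size,
    sum: sum
  }
end

brand_ids = [3, 7, 7, 12, 21] # made-up sample data
stats = column_stats(brand_ids)
puts stats
```

Doing this in application code obviously doesn't scale to 10 million rows, which is why the work gets pushed down to the database or the search index.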

Up to date list of articles diving into my ecommerce performance investigations:



Rails.logger.level = :info

Benchmark.ips do |x|
  column = :brand_id
  x.report("Product Brand ID Elasticsearch Stats") {Product.elasticsearch_stats(column)}
  x.report("Product Brand ID PG Stats") {Product.pg_stats(column)}
  x.compare!
end
Warming up --------------------------------------
Product Brand ID Elasticsearch Stats
                        42.000  i/100ms
Product Brand ID PG Stats
                         1.000  i/100ms
Calculating -------------------------------------
Product Brand ID Elasticsearch Stats
                        451.179  (± 8.4%) i/s -      2.268k in   5.066563s
Product Brand ID PG Stats
                          3.249  (± 0.0%) i/s -     17.000  in   5.236520s

Comparison:
Product Brand ID Elasticsearch Stats:      451.2 i/s
Product Brand ID PG Stats:        3.2 i/s - 138.86x  slower

Point, blouses Elasticsearch.
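
Iterations per second can be hard to picture, so here is a small conversion of the results above into milliseconds per call. The variable names are just for this example; the two inputs are the i/s figures reported by Benchmark.ips.

```ruby
# Iterations per second copied from the Benchmark.ips output above.
es_ips = 451.179
pg_ips = 3.249

es_ms = 1000.0 / es_ips
pg_ms = 1000.0 / pg_ips
ratio = es_ips / pg_ips

puts format("Elasticsearch: %.1f ms per call", es_ms)
puts format("Postgres:      %.1f ms per call", pg_ms)
puts format("Ratio:         %.1fx", ratio)
```

That is about 2.2 ms per call for Elasticsearch versus about 308 ms for Postgres on this data set.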


Intro to the Ecommerce SaaS Benchmark Application

24 January 2017

In my search for speed and scalability, I’ve had the pleasure of spending a lot of time recently with Elasticsearch. It’s fast, powerful and continually updated to make it better at all it does. Besides Elasticsearch, I have my eyes on other technologies such as RLEC (Redis Labs Enterprise Cluster), Citus DB, and many others which are geared towards scalability and ultimate performance. As a consultant, much of what I do revolves around enabling businesses to make money quicker and more efficiently. The core of many businesses these days is ecommerce. As such, I’ve created a stubbed out Ecommerce SaaS project which will be specifically used to benchmark various technologies and how they scale at different orders of magnitude.

As time progresses, I’ll collect more data and expand the application’s features to more closely mimic an actual ecommerce app, so that we can investigate what effects different technologies, platforms and data sets have on the app’s performance.

Up to date list of articles diving into my ecommerce performance investigations:




The ABC of My Life: Always Be Constructing

10 January 2017

Always Be Closing for Productivity and Profit

For sales, there’s the classic line from Glengarry Glen Ross, “ABC: Always Be Closing”. It’s used as the mantra to drive the salesmen’s actions towards the end goal of more sales. Over the last 18 months (since I decided to become a consultant), I’ve been living by my own ABC, though I hadn’t sat down and thought about it much until now. The ABC for my life is this: Always Be Constructing.

