Searchkick 2: How to Quickly Reindex Your Elasticsearch Models

25 January 2017


You’ve hooked your app up to Elasticsearch for blazing-fast search, but suddenly your Elasticsearch cluster comes under heavy load. The problem? Every model update is sent to Elasticsearch in its own request, and all those individual requests add up. The better way to keep Elasticsearch in sync is bulk indexing, which sends many documents in a single request. Luckily, Searchkick now supports this out of the box, so you no longer have to implement it manually.
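Bulk indexing wins because many documents share one HTTP request. As a rough illustration, here is a minimal sketch of the newline-delimited body that Elasticsearch’s _bulk endpoint accepts: for each record, an action line followed by the document source. The bulk_body helper and the sample documents are hypothetical (Searchkick builds and sends this payload for you), and older Elasticsearch versions also expect a _type in the action line.

```ruby
require "json"

# Hypothetical helper: build a newline-delimited JSON body for Elasticsearch's _bulk endpoint.
# Each document contributes two lines: an action/metadata line and the document source.
def bulk_body(index_name, docs)
  lines = docs.flat_map do |doc|
    [
      { index: { _index: index_name, _id: doc[:id] } }.to_json,
      doc.reject { |key, _| key == :id }.to_json
    ]
  end
  # The bulk API requires the body to end with a newline.
  lines.join("\n") + "\n"
end

docs = [
  { id: 1, name: "Apple" },
  { id: 2, name: "Banana" }
]

body = bulk_body("products_development", docs)
puts body
# {"index":{"_index":"products_development","_id":1}}
# {"name":"Apple"}
# {"index":{"_index":"products_development","_id":2}}
# {"name":"Banana"}
```

Two updates become one request here; with thousands of queued IDs, the saving in round trips is what takes the pressure off the cluster.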

Searchkick 2 introduces a great way to keep your Elasticsearch cluster in sync with your models: queuing. In prior versions this had to be implemented manually, so having it built into the gem is a huge time saver. In this article, I’ll run through the setup and crunch some numbers as well.


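Setup

First, give Searchkick a Redis connection (this uses the redis and connection_pool gems) and switch the model to queued callbacks:

# config/initializers/searchkick.rb
Searchkick.redis = ConnectionPool.new { Redis.new }

class Product < ActiveRecord::Base
  searchkick callbacks: :queue
end

Then add a Sidekiq worker for the searchkick queue to your Procfile. Searchkick’s queue jobs are Active Job jobs, so Active Job’s adapter must be Sidekiq (config.active_job.queue_adapter = :sidekiq) for this worker to pick them up.

searchkick_worker: bundle exec sidekiq -c5 -qsearchkick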
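Conceptually, callbacks: :queue changes what happens on save: instead of sending the record to Elasticsearch right away, the callback only records its ID in a Redis list, and a background job later pulls IDs off in batches and indexes them in bulk. The sketch below models that queue in plain Ruby with an array standing in for Redis. ReindexQueueSketch and its methods are hypothetical and are not Searchkick’s actual classes.

```ruby
require "set"

# Hypothetical in-memory stand-in for the Redis list that queued callbacks push to.
class ReindexQueueSketch
  def initialize
    @ids = [] # newest IDs at the front, like Redis LPUSH
  end

  # What a queued after-commit callback would do: record the ID, nothing else.
  def push(id)
    @ids.unshift(id)
  end

  # Pop up to `limit` distinct IDs from the oldest end (like RPOP),
  # so a record saved several times is indexed only once per batch.
  def reserve(limit:)
    batch = Set.new
    while batch.size < limit && (id = @ids.pop)
      batch << id
    end
    batch.to_a
  end

  def length
    @ids.length
  end
end

queue = ReindexQueueSketch.new
[1, 2, 1, 3, 1].each { |id| queue.push(id) } # record 1 is saved three times
batch = queue.reserve(limit: 1000)
p batch # prints [1, 2, 3]
```

In this sketch, deduplicating at reserve time means a frequently updated record still costs only one document in the next bulk request.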

Manual Test

To verify your setup is correct:

  • Start the processes in your Procfile (or run bundle exec sidekiq -c5 -qsearchkick in a separate console tab)
  • Push some IDs onto the queue: Searchkick.redis.with { |redis| redis.lpush("searchkick:reindex_queue:products_development", Product.limit(3000).pluck(:id)) }
  • Enqueue the job that processes the queue: Searchkick::ProcessQueueJob.perform_later(class_name: "Product")
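The Redis key in the second step follows the pattern searchkick:reindex_queue:<index name>, and with default settings Searchkick names the index after the pluralized model name and the Rails environment, which gives products_development here. A minimal sketch of that naming, assuming no custom index_name or index_prefix option; both helper names are hypothetical:

```ruby
# Hypothetical helpers showing how the queue key in the manual test is composed.
# Assumes Searchkick's default index naming (no index_name/index_prefix options).
def default_index_name(plural_model_name, env)
  "#{plural_model_name}_#{env}"
end

def reindex_queue_key(index_name)
  "searchkick:reindex_queue:#{index_name}"
end

key = reindex_queue_key(default_index_name("products", "development"))
puts key # prints searchkick:reindex_queue:products_development
```

Under the same assumption, the production queue for this model would live at searchkick:reindex_queue:products_production, so the key in the test needs to match the environment you are running in.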

If successful, you’ll see Searchkick::ProcessQueueJob and Searchkick::ProcessBatchJob jobs start running in your Sidekiq worker.
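The mix of job types comes from how the queue is drained: roughly, a queue job reserves one batch of IDs and hands it to a batch job that indexes those records in bulk, and if it reserved a full batch it enqueues another queue job to look for more. The plain-Ruby simulation below estimates how many of each job to expect. It assumes a batch size of 1000 and that re-enqueue-on-full-batch behavior; simulate_drain is a hypothetical helper, not part of Searchkick.

```ruby
# Hypothetical simulation of draining a reindex queue with two kinds of jobs:
# - a "queue job" reserves up to batch_size IDs and enqueues one "batch job" for them;
#   if it reserved a full batch, it enqueues another queue job to check for more.
# - a "batch job" would bulk-index its IDs (not modeled here).
def simulate_drain(total_ids, batch_size: 1000)
  remaining = total_ids
  queue_jobs = 0
  batch_sizes = []
  pending_queue_jobs = 1 # the job started by hand with perform_later

  while pending_queue_jobs > 0
    pending_queue_jobs -= 1
    queue_jobs += 1
    reserved = [remaining, batch_size].min
    remaining -= reserved
    next if reserved.zero? # queue was already empty

    batch_sizes << reserved
    pending_queue_jobs += 1 if reserved == batch_size
  end

  { queue_jobs: queue_jobs, batch_jobs: batch_sizes.size, batch_sizes: batch_sizes }
end

result = simulate_drain(3000)
puts "queue jobs: #{result[:queue_jobs]}, batch jobs: #{result[:batch_jobs]}"
# prints queue jobs: 4, batch jobs: 3
```

Under these assumptions, the 3000-ID manual test produces three batch jobs, and the last queue job simply finds the list empty.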

Reindex Everything

What follows is a quick benchmark against a local Elasticsearch instance. It times three ways of reindexing the same 20,000 records: inline single-record updates, bulk callbacks, and the queue-based background jobs.

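Rails.logger.level = :info # keep SQL logging out of the timings
records = Product.order(:id).limit(20000);
last_id = records.last.id # every ID <= last_id belongs to these 20,000 records

# single: one Elasticsearch request per record (the per-record call is assumed to be #reindex)
single = Benchmark.measure { Product.where("id <= ?", last_id).find_each(&:reindex) }
# bulk: the same calls, grouped into bulk requests by Searchkick.callbacks(:bulk)
bulk = Benchmark.measure { Searchkick.callbacks(:bulk) { Product.where("id <= ?", last_id).find_each(&:reindex) } }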

# Note: this only measures pushing the IDs to Redis and enqueuing the job; the jobs' run time is reported separately.
batch_worker = Benchmark.measure do
  Searchkick.redis.with { |redis| redis.lpush("searchkick:reindex_queue:products_development", records.pluck(:id)) }
  Searchkick::ProcessQueueJob.perform_later(class_name: "Product")
end

                    user     system      total        real
single         28.520000  11.990000  40.510000 ( 69.673012)
bulk            8.310000   5.420000  13.730000 ( 14.314876)
batch_worker    0.110000   0.000000   0.110000 (  0.111516)

The batch_worker row only covers pushing the IDs and enqueuing the first job. The jobs themselves ran from 06:14.327 (first job started) to 06:29.387 (last job finished), or 15.06 seconds, for a total of about 15.17 seconds.

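So either batched approach is much quicker than the classic inline reindex callbacks: bulk took about 14.3 seconds and the queue about 15.2 seconds in total, versus 69.7 seconds when each record was sent individually. The queue also moves almost all of that work to background workers; the calling process spent only about 0.11 seconds.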
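The comparison can be checked with a little arithmetic on the reported timings: the queue’s total is the enqueue time plus the span from the first job starting to the last one finishing, and dividing the single-record time by each total gives the speedup. A small sketch using the numbers and timestamps (minutes:seconds) from the benchmark output above; to_seconds is a hypothetical helper:

```ruby
# Hypothetical helper: parse an "MM:SS.mmm" timestamp into seconds.
def to_seconds(timestamp)
  minutes, seconds = timestamp.split(":")
  minutes.to_i * 60 + seconds.to_f
end

single_real = 69.673012
bulk_real = 14.314876
enqueue_real = 0.111516 # pushing IDs to Redis + enqueuing the first job

job_time = to_seconds("06:29.387") - to_seconds("06:14.327") # last job done - first job started
queue_total = enqueue_real + job_time

puts format("jobs: %.2f s, queue total: %.2f s", job_time, queue_total)
puts format("bulk speedup: %.1fx, queue speedup: %.1fx", single_real / bulk_real, single_real / queue_total)
# prints:
# jobs: 15.06 s, queue total: 15.17 s
# bulk speedup: 4.9x, queue speedup: 4.6x
```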

Other Thoughts

Prior to Searchkick 2, this functionality had to be implemented manually, so having it baked into the gem is a big bonus for several reasons. First, it’s standardized and open source. Second, running it as background jobs lets you monitor it through the Sidekiq web interface, and failed jobs are reported and retried automatically. There are a couple of places where the background job approach could be improved, based on my experience implementing very similar functionality on Searchkick 1.5.1, and I hope to submit PRs with those improvements soon. Regardless, Ankane does a fantastic job with the Searchkick gem; it’s hands down the best way to use Elasticsearch with Rails. The reasoning behind that, as well as the PRs, will be featured in an upcoming post.

