Event Driven Elasticsearch Bulk Updates

25 January 2017. 2 minutes to read

Always update elasticsearch in bulk

You’ve got elasticsearch hooked up (or are considering it) to bring some real speed back to your application, but you’ve hit a snag: each time a model is updated it gets reindexed, and doing those operations one at a time adds a huge load to your application, database, and elasticsearch instance because the processing is highly inefficient. Sit back and read how I implemented a fix for this on top of the Searchkick gem that brought efficiency back to the reindexing process.

We needed to handle large updates to our document set while also ensuring that any individual update was picked up immediately. That second requirement ruled out a scheduled job, and since polling redis for updates seemed like a terrible idea, it left me with an event driven design. (Note: since Searchkick 2 was released, very similar functionality, though not quite as fast as the custom implementation here, is part of the gem.) What follows is a close approximation of what was implemented.

High Level:

  • On update, insert the record’s ID into a Redis set
  • After inserting the ID, queue up a Searchkick::ForemanJob
  • Searchkick::ForemanJob takes chunks of IDs and sends each chunk to a Searchkick::BulkUpdaterJob
  • Searchkick::BulkUpdaterJob sends a bulk request to elasticsearch

Callback

# models/product.rb
searchkick callbacks: false # disable Searchkick's built-in reindex-on-save; this pipeline replaces it

after_commit :add_id_to_redis_queue

def add_id_to_redis_queue
  # A set enforces uniqueness, so repeated saves of one record queue a single reindex
  Redis.new.sadd :products, id
  Searchkick::ForemanJob.perform_later
end
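With that in place, an update queues an ID instead of reindexing inline. A quick sanity check from a Rails console (the attribute name here is just a placeholder):

product = Product.first
product.update!(name: "Widget Pro")
Redis.new.smembers :products #=> ["1"] (redis stores set members as strings)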

Foreman Job

# jobs/searchkick/foreman_job.rb
def perform
  redis = Redis.new
  product_ids = redis.smembers :products
  # Respect the index's configured batch size, falling back to Searchkick's default of 1000
  product_ids.each_slice(Product.searchkick_index.options[:batch_size] || 1000) do |product_id_batch|
    Searchkick::BulkUpdaterJob.perform_later(product_id_batch)
    # Remove only the ids just handed off; ids added mid-run stay in the set for the next pass
    redis.srem :products, product_id_batch
  end
end
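A design note on the smembers/srem pair: it’s safe because srem only removes the exact IDs that were handed off, so IDs added mid-run survive for the next pass. If your Redis is 3.2 or newer, spop with a count argument can pop a batch atomically instead; a rough sketch of that variant (not what ran in production):

# jobs/searchkick/foreman_job.rb (hypothetical spop variant)
def perform
  redis = Redis.new
  batch_size = Product.searchkick_index.options[:batch_size] || 1000
  # spop removes and returns up to batch_size members in a single atomic call
  while (product_id_batch = redis.spop(:products, batch_size)).any?
    Searchkick::BulkUpdaterJob.perform_later(product_id_batch)
  end
end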

Bulk Updater Job

# jobs/searchkick/bulk_updater_job.rb
def perform(product_ids)
  # search_import eager loads any associations the search document needs
  products = Product.where(id: product_ids).search_import
  Product.searchkick_index.import(products) # one bulk request instead of one per record
end
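The search_import scope used above is something you define on the model yourself; Searchkick’s convention is to use it for eager loading during imports. A minimal sketch of what it might look like, assuming a hypothetical brand association being denormalized onto the search document:

# models/product.rb
scope :search_import, -> { includes(:brand) }

def search_data
  {
    name: name,
    brand_name: brand.name # denormalized from the associated model
  }
end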

Wrap Up

Put together, this leverages redis sets to enforce uniqueness of the queued IDs and Searchkick’s bulk import to efficiently load data from your database and then send it off to elasticsearch. It provides a good boost in speed when reindexing a single model, but it really shines when data from other models is denormalized onto the search document, as in the sketch above: denormalizing requires loading information from associated models, so doing bulk loads from the database dramatically reduces the resource load on your Postgres instance while simultaneously reducing the load on elasticsearch.

Is your current search solution lacking in speed, or causing extreme load on your application’s resources, and you’d like an expert to check it out? Reach out and I’d be happy to discuss possible solutions to bring your project back up to speed.


If you enjoy having free time and the peace of mind that a professional is on your side, then you’d love to have me work on your project.

Contact me or view a list of available services to see how I’ll make your life better and easier, and bring satisfaction back into running your business.