You’ve got or are considering elasticsearch hooked up to bring some real speed back to your application, but have hit a snag. Each time a model is updated, it’s reindexed, but doing these operations one at a time has added a huge load to your application, database, and elasticsearch instance because of highly inefficient processing. Sit back, and read how I implemented a fix for this on the Searchkick gem, that brough efficiency back to the reindexing process.
We had the need to be able to handle large updates to our document set, but also to ensure that any updates which needed to happen were immediately picked up. This second requirement removed a scheduled job from our list of possibilities, and since polling redis for updates seemed like a terrible idea, left me with an event driven design. (Note, since Searchkick 2 has been released, very similar functionality, though not quite as fast as my custom implementation here, is now part of the gem) What follows is a close approximation of what was implemented.
- Insert Records IDs to a Redis Set
- After inserting ID, queue up a Searchkick::ForemanJob
- Searchkick::Foreman job takes chunks of ids and sends to a Searchkick::BulkUpdaterJob
- BulkUpdaterJob sends bulk requests to elasticsearch
#models/products.rb after_commit :add_id_to_redis_queue def add_id_to_redis_queue Redis.new.sadd :products, id end
#jobs/searchkick/foreman_job.rb def perform redis = Redis.new product_ids = redis.smembers :products product_ids.each_slice(Product.searchkick_index.options[:batch_size] || 1000) do | product_id_batch | Searchkick::BulkUpdaterJob.perform_later(product_id_batch) redis.srem :products, product_id_batch end end
Bulk Updater Job
#jobs/searchkick/bulk_updater_job.rb def perform(product_ids) products = Product.where(id: product_ids).search_import Product.searchkick_index.import(products) #This is a batch operation end
Put all together, this leverages redis sets to enforce uniqueness of added keys and Searchkick’s bulk import to both efficiently load data from your database, and then send it off to elasticsearch. This will provide a good boost in speed to reindexing a single model, but really shines when data from other models is being denormalized on the search document. The reason why is to do so requires loading information from associated models, so doing bulk loads from the database dramatically reduces the resource load on your Postgres instance, while at the same time reducing the load on elasticsearch.
Is your current search solution lacking in speed or causing extreme load on your application’s resources and you’d like an expert to check it out? Reach out and I’d be happy to discuss possible solutions to bring your project back up to speed.
If you enjoy having free time and the peace of mind that a professional is on your side, then you’d love to have me work on your project.