05 February 2017
Spree and Deface
Spree is a very powerful and highly configurable Rails based ecommerce solution. It’s loaded full of awesome features which will let you handle most any situation or special edge case you may have for your business needs. From a user’s perspective, it’s awesome. In the current version of the Spree guide for customizing view, the recommendation is to use deface for these customizations. Deface is very powerful and aims to make Spree upgrades seamless. Unfortunately, it adds extra complexity and will make your codebase diverge from the standard Rails conventions. These conventions are part of what make efficient development using Rails possible, so why would you get away from that? Looking for a customized ecommerce solution? Spree is fantastic, shoot me a message if you’d like to discuss your needs.
In this post I’ll run through a quick example of what using Deface looks like, and then propose we return to the built in Rails functionality of overriding views as a means to have a more useable and efficient codebase.
01 February 2017
In my last post, I dove into sub aggregations. This time, we’ll look at aggregations, which are the elasticsearch equivalent of SQL’s count group. If you’ve seen ecommerce sites where they say how many products belong to each category or brand, this is most likely how they accomplish that quickly and at scale.
29 January 2017
Elasticsearch is a fantastic way to store denormalized data for searching or serving up as an API in order to reduce database load. Those cases are just the surface of what elasticsearch has to offer. The next step in is using aggregations (formerly known as filters). Though a simple terms count aggregation (very similar to
count(*) distinct in sql) is a great place to start, I’m going to dive into something more complex and powerful: Sub Aggregations.
25 January 2017
You’ve got or are considering elasticsearch hooked up to bring some real speed back to your application, but have hit a snag. Each time a model is updated, it’s reindexed, but doing these operations one at a time has added a huge load to your application, database, and elasticsearch instance because of highly inefficient processing. Sit back, and read how I implemented a fix for this on the Searchkick gem, that brough efficiency back to the reindexing process.
We had the need to be able to handle large updates to our document set, but also to ensure that any updates which needed to happen were immediately picked up. This second requirement removed a scheduled job from our list of possibilities, and since polling redis for updates seemed like a terrible idea, left me with an event driven design. (Note, since Searchkick 2 has been released, very similar functionality, though not quite as fast as my custom implementation here, is now part of the gem) What follows is a close approximation of what was implemented.
- Insert Records IDs to a Redis Set
- After inserting ID, queue up a Searchkick::ForemanJob
- Searchkick::Foreman job takes chunks of ids and sends to a Searchkick::BulkUpdaterJob
- BulkUpdaterJob sends bulk requests to elasticsearch
#models/products.rb after_commit :add_id_to_redis_queue def add_id_to_redis_queue Redis.new.sadd :products, id end
#jobs/searchkick/foreman_job.rb def perform redis = Redis.new product_ids = redis.smembers :products product_ids.each_slice(Product.searchkick_index.options[:batch_size] || 1000) do | product_id_batch | Searchkick::BulkUpdaterJob.perform_later(product_id_batch) redis.srem :products, product_id_batch end end
Bulk Updater Job
#jobs/searchkick/bulk_updater_job.rb def perform(product_ids) products = Product.where(id: product_ids).search_import Product.searchkick_index.import(products) #This is a batch operation end
Put all together, this leverages redis sets to enforce uniqueness of added keys and Searchkick’s bulk import to both efficiently load data from your database, and then send it off to elasticsearch. This will provide a good boost in speed to reindexing a single model, but really shines when data from other models is being denormalized on the search document. The reason why is to do so requires loading information from associated models, so doing bulk loads from the database dramatically reduces the resource load on your Postgres instance, while at the same time reducing the load on elasticsearch.
Is your current search solution lacking in speed or causing extreme load on your application’s resources and you’d like an expert to check it out? Reach out and I’d be happy to discuss possible solutions to bring your project back up to speed.
25 January 2017
There comes a time to benchmark your application and its services. However, if it’s not done properly, then your results can very misleading. In this post, I’ll throw some numbers at you to demonstrate the massive differences between benchmarking locally, on production, and locally pointing at a cloud resource. The numbers for each set of tests are different by an order of magnitude, demonstrating the importance of using the proper setup.