Back in my days at a student, one of my networking professors brought up the same point every time we’d meet for a class: “When you’re implementing something, always ask yourself, ‘How does it scale?’” Over the last few months I’ve spent a lot of time dealing with systems and making sure they scale well using various technologies. I ran into an unexpected on today, though. It was Spree. Yes, the extremely configurable and awesome e-commerce platform does not play well with certain scenarios.
The issue was that whenever an of an order was updated which altered shipping costs, a callback chain was invoked. This either creates or updates shipping information for each item being ordered. What this means for the application is that (in order to remain highly configurable and track everything perfectly) each individual item from an order has a ShippingItem in the database. Buy 10 widgets, you create 10 ShippingItems. Buy 1000 widgets (or even 200), and Heroku will throw H* timeouts, which introduces [other errors with your Spree Addresses] that have to be solved so your users can view in progress orders and continue the checkout process.
The unforunate effect of this is that the more items someone wants to order from your site, the slower it will run. To throw some numbers in, an order with a total quantity of 2000 items took approximately 2 minutes to update the user’s address during the order process on a 1x Heroku dyno. The saving grace here is that the Spree team keeps a Slack channel open, and @brendan was able to point me to a previous issue and fix over at the Spree Github. Even better, dropping it in worked like a charm. Now, it has introduced a few more complications, such as needing to update a handful of template files and adjust a little logic, but in the course of an hour we had large orders running through the customer’s system again.
The point of this post isn’t to point out any weaknesses on Spree’s part, but to highlight the importance of making sure your systems scale. “Your” involves not just the code you write, but also other code being included. More importantly, make sure to test it on a system identical to your production environment. The reason I say that is I’d run many test orders through the system, but had only ever ran large orders through locally, assuming that it wouldn’t make a difference. I’d even noticed the substantially large amount of SQL queries for the checkout process on the area that had ended up hanging. What was different though was that on my development machine those queries completed at least an order of magnitude quicker (I recently timed it, it was ~4s vs the aforementioned 2 minutes). When things like that jump out at you, make a note and be sure to test on production. To quickly touch on it, there is a difference between performance testing and load testing. In this case, a very specific type of load testing was needed; namely checking out with large quantities in the cart.
To go back to what that professor shot our way every class period: Make sure your system scales. This doesn’t mean overly optimize early on or spend an inordinate amount of time testing every aspect of your system, but make sure to get real numbers in as benchmarks and base levels of speed so that you can check in time to time and be sure your application stays healthy and performant. At a bare minimum, fire up Siege and point it at some endpoints, then run through the most common user flows while New Relic or Librato is hooked up and you’ve got a window up tailing the logs so you can be sure to have a mental model and a few specific numbers of where the majority of requests to your application will go.
If you enjoy having free time and the peace of mind that a professional is on your side, then you’d love to have me work on your project.