Add Output to Your Long Running Rake Tasks

21 July 2017

Expectations vs Reality

Have you ever worked on an item, tested it thoroughly on a staging environment, done extra dry runs for good measure, and been completely satisfied with the results, only to see it hit production with no idea whether it’s working properly? I had exactly that experience recently with a one-off rake task. What follows describes what happened, what I learned, and how to keep it from happening on your projects. Chalk up another lesson about what it means for a feature to be complete.

The Task

I was recently part of a team project to build ingestion and display of user data. The basic process was this:

  • Get a list of all the objects from an S3 bucket
  • Sort them by when they were last modified
  • Enqueue background jobs (in order) for each of the sorted files
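The three steps above can be sketched in plain Ruby. This is a minimal illustration under my own assumptions, not the project's code: `S3Object` is a hypothetical stand-in that mirrors only the `key` and `last_modified` fields of the SDK's object records, and `enqueue` is a placeholder lambda where a real task would schedule a background job (for example with ActiveJob's `perform_later`, if the app uses ActiveJob).

```ruby
# Hypothetical stand-in for the S3 object records (key + last-modified time);
# the real SDK returns richer objects, but only these two fields matter here.
S3Object = Struct.new(:key, :last_modified)

# Hypothetical helper: sort oldest-first by last_modified, then pass each key,
# in order, to `enqueue` (a placeholder for whatever schedules the job).
def enqueue_in_order(objects, enqueue)
  sorted = objects.sort_by(&:last_modified)
  sorted.each { |obj| enqueue.call(obj.key) }
  sorted.size
end

objects = [
  S3Object.new("b.json", Time.utc(2017, 7, 2)),
  S3Object.new("a.json", Time.utc(2017, 7, 1)),
  S3Object.new("c.json", Time.utc(2017, 7, 3))
]

enqueued = []
count = enqueue_in_order(objects, ->(key) { enqueued << key })
puts "Enqueued #{count} jobs: #{enqueued.join(', ')}"
# prints "Enqueued 3 jobs: a.json, b.json, c.json"
```

Sorting the whole list before enqueuing anything is what forces every page to be fetched first, which matters later in this post.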

Simple enough: about 15 lines of code handled these requirements. For good measure, we took the following steps to ensure quality:

  • Test locally
  • Test against a staging system
  • Code reviews from 4 other team members
  • Run as a rake task rather than copy-pasting into a console, to ensure consistency
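Wrapping the work in a rake task means everyone runs the same code with the same arguments. A minimal sketch follows; the `ingest:enqueue_from_s3` task name and `bucket_name` argument are hypothetical, and the task body only prints start and finish markers where the real list, sort, and enqueue steps would go.

```ruby
require 'rake'

# Hypothetical task and argument names. In a Rails app you would usually
# write `task :enqueue_from_s3, [:bucket_name] => :environment` so the app
# (models, job classes) is loaded before the task body runs.
namespace :ingest do
  desc "Enqueue background jobs for every object in a bucket, oldest first"
  task :enqueue_from_s3, [:bucket_name] do |_task, args|
    bucket_name = args[:bucket_name] || "example-bucket"
    puts "Starting ingestion for #{bucket_name}"
    # List, sort, and enqueue here (the steps described earlier).
    puts "Finished ingestion for #{bucket_name}"
  end
end

# Equivalent to running `rake "ingest:enqueue_from_s3[my-bucket]"` from a shell.
Rake::Task["ingest:enqueue_from_s3"].invoke("my-bucket")
```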

From my perspective, the above looked fantastic. However, there was one key question I left out during development…

Does it scale?

From a performance standpoint, it scaled nicely. Sorting a few hundred thousand S3 objects in Ruby was no problem for either RAM or CPU, and performance was where my head was at when I wrote the code. (We’ll see how it goes if this ever has to be done again with three or four orders of magnitude more items.)

Where it didn’t scale nicely was from the perspective of the people running the task after the deploy, myself included. In the end it did exactly what it was supposed to, but for a 10-minute stretch we had no output or metrics at all, so there was no way to tell whether it was hung. When you’re doing a late-night deploy with four people, nobody relishes spending extra time blindly waiting on code that might be stuck instead of being able to retry immediately.

The gap was that nothing was printed between the moment the task started pulling down information about the objects in the bucket and the moment it started enqueuing background jobs. Even at 1,000 objects per S3 list request, that’s still hundreds of network calls with large payloads to consume before we saw any progress.
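When a slow step has no natural place to print progress (a single blocking call, say), one option is a heartbeat thread that reports elapsed time until the step finishes. This is a related technique, not what the task in this post used; `with_heartbeat` is a hypothetical helper, and the short interval and `sleep` in the usage below stand in for a real network call.

```ruby
require 'stringio'

# Hypothetical helper: run a block while a background thread prints a
# heartbeat every `interval` seconds, so a silent step still shows it is alive.
def with_heartbeat(label, interval: 5, io: $stdout)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  heartbeat = Thread.new do
    loop do
      sleep interval
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      io.puts "#{label}: still running (#{elapsed.round}s elapsed)"
    end
  end
  yield # the method returns whatever the block returns
ensure
  heartbeat&.kill # stop the heartbeat whether the block succeeded or raised
end

# Usage: a short interval and a sleep stand in for a slow network call.
log = StringIO.new
result = with_heartbeat("Listing S3 objects", interval: 0.1, io: log) do
  sleep 0.5
  :done
end
puts log.string
```

A heartbeat only proves the process is alive, not that it is making progress, so per-page output is still preferable wherever the work can report it.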

How I Fixed It

Since this task may need to be run again in the future, I made two changes:

  • Added incremental feedback for long-running subtasks
  • Provided final output when the task completed
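Both ideas generalize beyond S3. As a hedged sketch (not code from this task), a hypothetical `each_with_progress` helper can print a line every `every` items and a summary with the total and elapsed time. It also sets `$stdout.sync = true`: when rake output is piped to a file or log collector, Ruby buffers `$stdout`, so progress lines can appear late or all at once unless writes are flushed immediately.

```ruby
require 'stringio'

# When output is piped (not a terminal), Ruby buffers $stdout; sync makes
# each write flush immediately so progress shows up as it happens.
$stdout.sync = true

# Hypothetical helper: yields each item, prints progress every `every` items,
# then prints a final summary with the total and elapsed time.
def each_with_progress(items, label:, every: 1000, io: $stdout)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  total = items.size
  items.each_with_index do |item, index|
    yield item
    done = index + 1
    # Skip the last item here; the summary line below reports it.
    io.puts "#{label}: #{done}/#{total}" if (done % every).zero? && done < total
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  io.puts "#{label}: finished #{total} in #{elapsed.round(1)}s"
  total
end

log = StringIO.new
processed = []
each_with_progress((1..2500).to_a, label: "Enqueued jobs", every: 1000, io: log) do |n|
  processed << n # stand-in for enqueuing one background job
end
puts log.string
```

With 2,500 items this prints progress at 1,000 and 2,000, then the summary line.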

For the S3 portion of the task, my code looked like the following:

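```ruby
require 'aws-sdk-s3' # AWS SDK for Ruby v3 (v2 used require 'aws-sdk')

s3 = Aws::S3::Client.new # bucket_name is assumed to be set earlier in the task
objects = []

# list_objects is pageable: each iteration yields one response (up to 1,000 objects)
s3.list_objects(bucket: bucket_name).each do |response|
  objects.concat(response.contents)
  puts "Objects Received from #{bucket_name}: #{objects.count}"
end

puts "Total Objects Received from #{bucket_name}: #{objects.count}"
```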

That way, future runs will print an update as each network call completes, followed by a final count of how many objects were found. That count also helps confirm that all the data was successfully reprocessed.
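The final count is most useful when something checks it. As a sketch under my own assumptions (not part of the original task), the task could compare the number of objects listed with the number of jobs it enqueued and raise on a mismatch; an uncaught exception makes rake report the error and exit with a non-zero status. `verify_counts!` is a hypothetical name.

```ruby
# Hypothetical end-of-task check: report how many objects were listed and how
# many jobs were enqueued, and raise (failing the rake task) if they differ.
def verify_counts!(listed:, enqueued:, bucket_name:)
  puts "Listed #{listed} objects from #{bucket_name}; enqueued #{enqueued} jobs"
  return true if listed == enqueued

  raise "Count mismatch for #{bucket_name}: listed #{listed}, enqueued #{enqueued}"
end

verify_counts!(listed: 3, enqueued: 3, bucket_name: "example-bucket")
# prints "Listed 3 objects from example-bucket; enqueued 3 jobs"
```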

Takeaways

I’ve placed a lot of focus recently on adding good metrics and instrumentation to critical parts of codebases. Instrumentation is an excellent example of how a healthy application is a process, not a goal. This experience showed me that even for code that will only be run once, displaying proper feedback about progress is key. Just as it’s helpful to see file download progress in your browser, seeing a task’s progress as it runs eases a lot of potential pain points.


If you’ve got critical pieces of business logic into which you have little or no insight and would like that fixed, I’m your man, whether that’s implementation or consulting on best practices with teams and architects. Being able to see which actions happen, how long they take, and how often they run, in addition to what your standard APM service reports, can make all the difference in having confidence that your applications are running as expected.

