Rack, CSV streaming, and Ruby's Enumerator

When the response body of an HTTP request gets big, it's a good idea to stream it. A classic example is a CSV download: you might get away without streaming for smaller response bodies, but as soon as the CSV file is over a few megabytes you're going to see timeouts and performance issues.

We saw some of these issues with Mailmatch, and I'm going to take you through how we solved them by adding streaming support to our CSV downloads. Note that this tutorial is fairly specific to Ruby, Rack and Sinatra, although you should be able to apply the principles to your Rack-based framework of choice.

It turns out that streaming is baked into Rack's protocol. A Rack response is an array of status, headers and body, and the body can be anything that responds to each and yields strings. In practice this is often an array containing the response body as a single string. We can take advantage of this behavior by providing our own streaming object that responds to each.
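To make the each contract concrete, here is a minimal sketch of a body object and a bare Rack-style app. CountdownBody and the lambda app are hypothetical names made up for illustration, and the app is called by hand rather than through a real server.

```ruby
# A hypothetical body object: anything that responds to #each and
# yields Strings can serve as a Rack response body.
class CountdownBody
  def initialize(from)
    @from = from
  end

  # The server calls #each and writes every yielded String to the client.
  def each
    @from.downto(1) { |n| yield "#{n}\n" }
  end
end

# A Rack app is any object that responds to #call and returns
# [status, headers, body].
app = lambda do |_env|
  [200, { 'content-type' => 'text/plain' }, CountdownBody.new(3)]
end

# Simulate what a server does with the response (no real server here).
status, headers, body = app.call({})
chunks = []
body.each { |chunk| chunks << chunk }
```

Each chunk is produced only when the server asks for it, which is what lets a large body go out piece by piece instead of being built up front.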

Our Sinatra route is going to look up a record, set the content disposition headers, and return the result of List#as_csv (which we'll define later).

get '/lists/:id/csv' do  
  @list = List.first!(id: params[:id])

  attachment 'list.csv'
  @list.as_csv
end  

Our as_csv method is going to return an Enumerator. We're doing a little magic at the start of the method: when no block is given, enum_for wraps the method itself in an Enumerator, so the yield calls below become the values it produces.

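require 'csv'

class List
  def as_csv
    # Without a block, return an Enumerator that wraps this method.
    return enum_for(:as_csv) unless block_given?

    # Yield one CSV-formatted line per email.
    emails.each do |email|
      yield CSV.generate_line(email.as_csv)
    end
  end
end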

That's it! The Enumerator responds to each and makes sure our response is streamed to the client one CSV line at a time.
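If you want to convince yourself that the lines really are generated on demand, a small sketch like the following can help. DemoList and its generated counter are hypothetical stand-ins for the List model, with plain arrays in place of email records.

```ruby
require 'csv'

# Hypothetical stand-in for List that counts how many lines it has built.
class DemoList
  attr_reader :generated

  def initialize(rows)
    @rows = rows
    @generated = 0
  end

  def as_csv
    return enum_for(:as_csv) unless block_given?

    @rows.each do |row|
      @generated += 1
      yield CSV.generate_line(row)
    end
  end
end

list = DemoList.new([['a@example.com', 'Ann'], ['b@example.com', 'Bob']])
body = list.as_csv               # an Enumerator; no lines built yet
built_before = list.generated    # still 0 at this point
first_line = body.next           # builds only the first line
built_after = list.generated     # now 1
```

Calling next pulls a single line out of the Enumerator, and the counter shows that the second row has not been touched yet.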

In this case emails is just an array we're iterating over, but in practice you'll want to paginate over large datasets so you don't have to load them all into memory.
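One way to paginate might look like the sketch below. PagedList, fetch_page and PAGE_SIZE are assumptions made up for illustration; with a real database, fetch_page would run a limit/offset (or keyset) query instead of slicing an in-memory array.

```ruby
require 'csv'

# Hypothetical paginated variant of List#as_csv.
class PagedList
  PAGE_SIZE = 2 # deliberately tiny for the demo

  def initialize(rows)
    @rows = rows # stands in for a database table
  end

  # Hypothetical page fetch; returns at most `limit` rows from `offset`.
  def fetch_page(offset, limit)
    @rows[offset, limit] || []
  end

  def as_csv
    return enum_for(:as_csv) unless block_given?

    offset = 0
    loop do
      page = fetch_page(offset, PAGE_SIZE)
      break if page.empty?

      # Only one page of rows is held in memory at a time.
      page.each { |row| yield CSV.generate_line(row) }
      offset += PAGE_SIZE
    end
  end
end

rows = [['a@example.com'], ['b@example.com'], ['c@example.com']]
lines = PagedList.new(rows).as_csv.to_a
```

The route stays exactly the same, since as_csv still returns an Enumerator; only the source of the rows changes.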