Join 15,000+ Subscribers
Get tactical growth tips once a week

PostgreSQL offsets and pagination performance

PostgreSQL is notoriously bad at performing well with high offsets, mainly because it needs to do a full table scan and count all the rows up to the offset. This can be a problem is you want to paginate over a full table without holding all the rows in memory.

On the surface of it PostgreSQL Cursors may seem like a good solution to this problem--rather than executing a whole query at once a cursor encapsulates the query allowing you to read a few rows at a time. However in practice cursors use a transaction behind the scenes and you'll quickly run out of memory.

A good solution is to use an indexed auto-incrementing ID on the table and then paginate based on that.

At Clearbit, we use a UUIDs for identifiers rather than integers, but we have an internal iid column which is a indexed numeric sequence. If you're not using the default integer IDs than you'll need to do something similar and add a column.

The example below is using Sequel, but can be easily ported to your ORM of choice. We're essentially paging over the ordered records using the iid column in a WHERE clause instead of an OFFSET.

batch.rb

def batch_page(page_size = 2_000)  
  last_record = order(:iid.asc).first
  last_iid    = last_record && last_record.iid || 0
  last_iid   -= 1

  loop do
    result = order(:iid.asc)
      .where { iid > last_iid }
      .limit(page_size)
      .all

    break if result.empty?
    yield result

    last_iid = result.last.iid
  end

  self
end  

The generated SQL query ends up looking like this--it's very fast to run since the iid column is indexed.

SELECT * FROM "people" WHERE ("iid" > 2006) ORDER BY "iid" ASC LIMIT 2000  
Share this post
Join 15,000+ Subscribers
Get tactical growth tips once a week