-
Notifications
You must be signed in to change notification settings - Fork 56
Description
Hello, thanks for your work on this gem.
Summary
I noticed a performance issue when using maintenance_tasks and multiple columns for a cursor and fetching pages of records. The issue is with how the 'next page' query is written in ActiveRecordCursor, which causes Postgres to do inefficient data fetches and is over 100x slower than it could be.
Background
Our table requests has an index on created_at, uuid like this:
CREATE INDEX idx ON requests USING btree (created_at, uuid)and our maintenance task, when defined like this:
class OurTask < MaintenanceTasks::Task
def cursor_columns
[:created_at, :uuid]
end
def collection
Request.all
end
end
will trigger queries for pages like this:
SELECT *
FROM requests
WHERE ( created_at > '2023-01-01 00:00:00' OR ( created_at = '2023-01-01 00:00:00' AND ( uuid > 'aaaaaaaa-aaaa-aaaa-b46e-38cd3572ec76' ) ) )
ORDER BY created_at, uuid
LIMIT 1000
This query does not perform as expected in Postgres, because of the OR. See related SO question here - the gist of it is that Postgres doesn't use an index condition but rather does a filter, which results in reading records from disk rather than using the index values.
In our table of about 30 million records, fetching a page of 1000 records was taking around 140s.
Solution
My proposed solution is to change how we construct the query here to use what Postgres calls "Composite Type Comparison", which I believe has wide support across DBMS. The query would look like:
SELECT *
FROM requests
WHERE ( created_at, uuid) > ('2023-01-01 00:00:00', 'aaaaaaaa-aaaa-aaaa-b46e-38cd3572ec76')
ORDER BY created_at, uuid
LIMIT 1000This should be functionally equivalent and also causes Postgres to use only the index to determine the specific set of records to fetch from disk.
This query returns the page of 1000 records in under 1s.
Would you accept a PR to change this (whether for Postgres only, or for all DBs)?