First/Last per Group: PostgreSQL DISTINCT ON and MongoDB DISTINCT_SCAN Performance

This content originally appeared on DEV Community and was authored by Franck Pachot

On Stack Overflow, the most frequent question for PostgreSQL is: "Select first row in each GROUP BY group?". I've written about it previously, presenting multiple alternatives and execution plans: How to Select the First Row of Each Set of Grouped Rows Using GROUP BY.
Solving this problem in a way that's both straightforward and high-performing can be challenging with SQL databases. However, MongoDB's aggregation provides a simple syntax and an efficient execution plan. When you use $first or $last alongside $sort and $group, MongoDB can perform an efficient loose index scan, which is similar to an index skip scan, reading only what is necessary for the result.

PostgreSQL

With PostgreSQL, the DISTINCT ON ... ORDER BY syntax is the easiest for the developer, but not the best for performance.

create table demo (
  a int,
  b timestamptz,
  c float,
  d text,
  primary key (a, b, c)
);
-- insert a hundred thousand rows
insert into demo
 select 
  a,
  now() as b,
  random() as c,
  repeat('x',5) as d
 from generate_series(1,5) a
    , generate_series(1,20000) c
 -- discard the unlikely duplicate random values:
 on conflict do nothing
;
-- run 9 more times, for 10 runs in total (now() differs each run):
\watch count=9
vacuum analyze demo;

In PostgreSQL 18, all rows are read, but most are eliminated so that only one row remains per group:

explain (buffers, analyze, verbose, costs off)
select
 distinct on (b) a, b, c, d
from demo
where a=1
order by b, c
;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Unique (actual time=0.025..94.601 rows=10.00 loops=1)
   Output: a, b, c, d
   Buffers: shared hit=199959
   ->  Index Scan using demo_pkey on public.demo (actual time=0.024..77.263 rows=200000.00 loops=1)
         Output: a, b, c, d
         Index Cond: (demo.a = 1)
         Index Searches: 1
         Buffers: shared hit=199959
 Planning Time: 0.077 ms
 Execution Time: 94.622 ms

If you want to read only what is necessary, you need to write a complex recursive CTE:

with recursive skip_scan as (
 (
  -- get the first row
  select * from demo
  where a=1
  order by b,c limit 1
 ) union all (
  -- get the next row
  select demo.*
  from skip_scan , lateral(
   select * from demo
   where demo.a = skip_scan.a and demo.b > skip_scan.b
   order by b,c limit 1
  ) demo
 )
)
select * from skip_scan
;

This simulates a loose index scan with nested loops, but at the cost of a query that is much harder to write, read, and maintain.
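To make the mechanics concrete, here is a small JavaScript sketch (my illustration, not part of the original article) of what a loose index scan does over a sorted composite index: seek to the first key of each (a, b) prefix and skip the rest. The linear skip below stands in for a B-tree seek.

```javascript
// Hedged sketch: simulate a loose (skip) scan over a sorted composite index
// of [a, b, c] keys, returning the first entry for each distinct b in partition a.
function looseScan(indexEntries, a) {
  const results = [];
  // seek to the start of the partition for this value of a
  let i = indexEntries.findIndex(e => e[0] === a);
  if (i === -1) return results;
  while (i < indexEntries.length && indexEntries[i][0] === a) {
    const entry = indexEntries[i];
    results.push(entry); // first (lowest c) entry for this b
    // skip forward past all keys sharing the same (a, b) prefix;
    // a real index would do this with a B-tree seek, not a linear walk
    while (i < indexEntries.length &&
           indexEntries[i][0] === a &&
           indexEntries[i][1] === entry[1]) i++;
  }
  return results;
}
```

Each group costs one "seek" instead of reading every key, which is why the work is proportional to the number of groups rather than the number of rows.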

MongoDB aggregation

In MongoDB, using an aggregation pipeline makes it easy and efficient to get either the first or last document from each group.

I create a collection and fill it with similar data:

// Create collection and compound unique index to mimic PRIMARY KEY(a, b, c)
db.demo.drop();
db.demo.createIndex(
  { a: 1, b: 1, c: 1 },
  { unique: true }
);

// Function to insert bulk records
function insertDemoBatch() {
  const batch = [];
  const now = new Date();
  for (let a = 1; a <= 5; a++) {
    for (let j = 1; j <= 20000; j++) {
      batch.push({
        a: a,
        b: now,                   // similar to now()
        c: Math.random(),         // random float [0,1)
        d: 'x'.repeat(5)          // repeat string
      });
    }
  }
  try {
    db.demo.insertMany(batch, { ordered: false }); // ignore duplicates
  } catch (e) {
    print(`Insert completed with some duplicates ignored: ${e.writeErrors?.length ?? 0} errors`);
  }
}

// Run 10 times; "now" (new Date()) differs on each run
for (let i = 0; i < 10; i++) {
  insertDemoBatch();
}

Here is the aggregation that groups and keeps the first value of each group:

db.demo.aggregate([
  { $match: { a: 1 } },         // equivalent to WHERE a=1
  { $sort: { b: 1, c: 1 } },    // equivalent to ORDER BY b, c
  { $group: {
      _id: "$b",                 // equivalent to DISTINCT ON (b)
      a: { $first: "$a" },       
      b: { $first: "$b" },
      c: { $first: "$c" },
      d: { $first: "$d" }      
  }},
  { $project: {                  // equivalent to SELECT a, b, c, d
      _id: 0, a: 1, b: 1, c: 1, d: 1 
  }},
]).explain("executionStats");

The execution plan is efficient: it reads only one document per group (totalDocsExamined: 10), and the index scan examines one key per group plus one final key to detect the end of the scan (keysExamined: 11):

...
        executionStats: {                                                                                                                                                                            
          executionSuccess: true,
          nReturned: 10,
          executionTimeMillis: 0,
          totalKeysExamined: 11,
          totalDocsExamined: 10,
          executionStages: {
            isCached: false,
            stage: 'FETCH',
            nReturned: 10,
            executionTimeMillisEstimate: 0,
            works: 11,
            advanced: 10,
            needTime: 0,
            needYield: 0,
            saveState: 1,
            restoreState: 1,
            isEOF: 1,
            docsExamined: 10,
            alreadyHasObj: 0,
            inputStage: {
              stage: 'DISTINCT_SCAN',
              nReturned: 10,
              executionTimeMillisEstimate: 0,
              works: 11,
              advanced: 10,
              needTime: 0,
              needYield: 0,
              saveState: 1,
              restoreState: 1,
              isEOF: 1,
              keyPattern: { a: 1, b: 1, c: 1 },
              indexName: 'a_1_b_1_c_1',
              isMultiKey: false,
              multiKeyPaths: { a: [], b: [], c: [] },
              isUnique: true,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: {
                a: [ '[1, 1]' ],
                b: [ '[MinKey, MaxKey]' ],
                c: [ '[MinKey, MaxKey]' ]
              },
              keysExamined: 11
            }
          }
        }
      },
      nReturned: Long('10'),
...

MongoDB uses DISTINCT_SCAN in aggregation when the pipeline starts with a $sort and $group using $first or $last. The planner checks for a matching index that's not multi-key and has the correct sort order, adjusting the scan direction if needed. If these conditions are met, MongoDB rewrites the pipeline to use DISTINCT_SCAN and $groupByDistinct, optimizing by skipping to relevant index entries and retrieving only needed documents.

This pattern is common in real‑world queries such as:

  • Latest or earliest measure for each metric in a time‑series database
  • Last contract with each supplier
  • Last purchase from each client
  • Most recent transaction for each account
  • Earliest login event for each user
  • Lowest‑paid employee in each department
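As an illustration, the "most recent transaction for each account" case maps directly onto the same pipeline shape. This is a hedged sketch, assuming a hypothetical transactions collection with fields account, ts, and amount, and an { account: 1, ts: 1 } index so the plan is eligible for DISTINCT_SCAN:

```javascript
// Hypothetical collection: transactions { account, ts, amount }
// with an index { account: 1, ts: 1 } (assumption, not from the article)
db.transactions.aggregate([
  { $sort: { account: 1, ts: 1 } },    // group key first, then time
  { $group: {
      _id: "$account",                 // one result per account
      ts:     { $last: "$ts" },        // $last = latest in sort order
      amount: { $last: "$amount" }
  }}
])
```

With the matching index in place, explain("executionStats") should show the same DISTINCT_SCAN shape as the demo collection above.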

