Grouping Rows by Overlapping Range on Postgresql

======================================================

In this article, we will explore how to group rows in a PostgreSQL table based on overlapping ranges. This can be useful when working with data that represents intervals or time ranges.

Understanding the Problem

Given a table ranges containing rows with range information, we want to create a new column group_id that indicates whether two ranges overlap. We will assume that the table is populated with values of type int8range.

For example, suppose we have three ranges: (10, 20), (15, 25), and (20, 30). The first two ranges overlap in the range (15, 20) and the third range overlaps with both the first and second ranges. We want to assign a group_id value of 1 to these overlapping ranges.

Solution

To solve this problem, we will use PostgreSQL’s built-in functions for working with intervals and ranges.

First, let’s create a subquery that aggregates all ranges from the table into a single multirange. We can do this using the range_agg function, which concatenates adjacent ranges into a new range.

SELECT range_agg(range) AS mr
FROM ranges;

This query will produce an array of multiranges for each row in the original table.

Next, we’ll use the unnest function to split this multirange back into individual ranges. We can do this using the WITH ORDINALITY clause, which adds a numbering to each range that we can use as our group_id.

SELECT u.r AS group_id,
       count(*) OVER (PARTITION BY u.o) AS group_count
FROM (
  SELECT range_agg(range) AS mr
  FROM ranges
) AS ra
  CROSS JOIN LATERAL unnest(ra.mr) WITH ORDINALITY AS u(r,o)
  JOIN ranges ON u.r @> ranges.range;

This query will assign a unique group_id to each row based on its position in the multirange.

Finally, we’ll use a WHERE clause to filter out groups that contain only one member (i.e., non-overlapping ranges).

SELECT range, group_id
FROM (
  SELECT u.r AS group_id,
         count(*) OVER (PARTITION BY u.o) AS group_count
  FROM (
    SELECT range_agg(range) AS mr
    FROM ranges
  ) AS ra
    CROSS JOIN LATERAL unnest(ra.mr) WITH ORDINALITY AS u(r,o)
    JOIN ranges ON u.r @> ranges.range
) q
WHERE group_count > 1;

Putting it All Together

Let’s combine the above steps into a single query that generates the desired output.

SELECT range, group_id
FROM (
  SELECT t.*,
         u.o AS group_id,
         count(*) OVER (PARTITION BY u.o) AS group_count
  FROM ranges t
    CROSS JOIN LATERAL unnest(
      SELECT range_agg(range) AS mr
      FROM ranges
    ) WITH ORDINALITY AS u(r,o)
    JOIN ranges ON u.r @> ranges.range
) q
WHERE group_count > 1;

Example Use Cases

This solution can be applied to various use cases where you need to group rows based on overlapping intervals or ranges.

For example, suppose you’re working with a table of flight schedules and want to identify groups of flights that overlap in time. You can use this approach to generate a group_id column for each row.

Similarly, if you’re analyzing sensor readings over time and want to identify clusters of readings that overlap within a certain timeframe, this solution can help.

Conclusion

In conclusion, grouping rows by overlapping ranges on PostgreSQL requires using the built-in functions for working with intervals and ranges. By aggregating all ranges into a single multirange, splitting it back into individual ranges, and then filtering out non-overlapping groups, we can generate a group_id column that indicates whether two or more ranges overlap.

We hope this article has provided you with the necessary knowledge to tackle similar problems in your PostgreSQL-based applications.

Last modified on 2025-02-28