title: Understanding GROUP BY in SQL
published: true
description:
tags: sql

published_at: 2026-03-08 10:26 +0000

GROUP BY is one of those SQL clauses that starts out feeling simple and then becomes confusing as soon as real queries get involved. The short version is that it lets you take many rows and treat some of them as belonging to the same group, usually so you can calculate something for each group.

That idea matters more than the syntax. GROUP BY is not mainly about sorting, filtering, or removing duplicates. It is about collapsing rows into groups based on shared values, then producing one result row per group.

What `GROUP BY` does

Imagine a table called orders:

order_id | user_id | status  | amount
---------|---------|---------|-------
1        | 10      | paid    | 50
2        | 10      | paid    | 20
3        | 11      | pending | 15
4        | 12      | paid    | 40
5        | 11      | paid    | 30

If you run this:

SELECT status
FROM orders;

you get one row per order. If instead you run this:

SELECT status
FROM orders
GROUP BY status;

you get one row per distinct status value:

paid
pending

At first glance, that looks like DISTINCT, and in this case the result is the same. But GROUP BY becomes useful when you combine it with aggregate functions like COUNT, SUM, AVG, MIN, or MAX.

For example:

SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status;

This gives you one row for each status, plus the number of rows inside each group:

status  | order_count
--------|------------
paid    | 4
pending | 1

So the pattern is this: define groups, then calculate something per group.
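To try the pattern yourself, here is a minimal sketch that loads the sample rows from the table above and computes two aggregates per status. The column types are an assumption (the article only shows the values), and the script is meant for an empty scratch database such as an in-memory SQLite session.

```sql
-- Sample data copied from the orders table shown earlier.
-- Column types are assumed; adjust them for your database.
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    status   VARCHAR(20) NOT NULL,
    amount   INTEGER NOT NULL
);

INSERT INTO orders (order_id, user_id, status, amount) VALUES
    (1, 10, 'paid',    50),
    (2, 10, 'paid',    20),
    (3, 11, 'pending', 15),
    (4, 12, 'paid',    40),
    (5, 11, 'paid',    30);

-- Define groups (status), then calculate something per group.
-- ORDER BY is only here to make the output order predictable.
SELECT status,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM orders
GROUP BY status
ORDER BY status;

-- Expected result:
-- paid    | 4 | 140
-- pending | 1 | 15
```

Five input rows go in, two summary rows come out: one per distinct status.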

The underlying idea

The mental model that helps most is to think of SQL working in stages.

First, SQL starts with rows from a table. Then GROUP BY partitions those rows into buckets based on the column or columns you specify. After that, aggregate functions are applied within each bucket. The final result has one row per bucket, not one row per original input row.

If you group by one column, rows with the same value in that column go together. If you group by multiple columns, rows only go together when all those values match.

For example:

SELECT user_id, status, COUNT(*) AS count
FROM orders
GROUP BY user_id, status;

Now the groups are based on the combination of user_id and status.

That means these rows:

user_id | status
--------|--------
10      | paid
10      | paid
11      | pending
11      | paid

become groups like:

(10, paid)
(11, pending)
(11, paid)

Each distinct combination becomes its own group.
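Run against the full sample table rather than the four rows listed above, the same query also produces a group for user 12. The sketch below assumes the column types and is intended for an empty scratch database.

```sql
-- Sample data copied from the orders table shown earlier (types assumed).
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    status   VARCHAR(20) NOT NULL,
    amount   INTEGER NOT NULL
);

INSERT INTO orders (order_id, user_id, status, amount) VALUES
    (1, 10, 'paid',    50),
    (2, 10, 'paid',    20),
    (3, 11, 'pending', 15),
    (4, 12, 'paid',    40),
    (5, 11, 'paid',    30);

-- Rows share a group only when BOTH user_id and status match.
SELECT user_id, status, COUNT(*) AS order_count
FROM orders
GROUP BY user_id, status
ORDER BY user_id, status;

-- Expected result:
-- 10 | paid    | 2
-- 11 | paid    | 1
-- 11 | pending | 1
-- 12 | paid    | 1
```

User 11 appears twice because their two orders have different statuses, so they fall into different groups.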

Why SQL is strict about grouped queries

A common source of confusion is this kind of query (assume orders also has a created_at timestamp column):

SELECT user_id, created_at
FROM orders
GROUP BY user_id;

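Most databases reject this query. Once rows are grouped by user_id, there may be many created_at values inside each group, and SQL needs to know which one you want. The systems that do accept it (SQLite, or MySQL with ONLY_FULL_GROUP_BY disabled) return created_at from an arbitrary row in each group, which is rarely what you meant.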

That is why grouped queries usually follow this rule: every selected column must either be part of the GROUP BY clause, or be wrapped in an aggregate function.

This works:

SELECT user_id, COUNT(*) AS total_orders
FROM orders
GROUP BY user_id;

This also works:

SELECT user_id, MAX(amount) AS largest_order
FROM orders
GROUP BY user_id;

But this does not make sense in standard SQL:

SELECT user_id, amount
FROM orders
GROUP BY user_id;

because a user may have many amounts, and GROUP BY produces one row per user.

That rule is not arbitrary. It comes directly from what grouping means. After grouping, individual rows are no longer the main unit. Groups are.
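One way to repair the ambiguous query is to say which amount you want from each group. The sketch below, using the sample rows with assumed column types, answers that question three different ways at once.

```sql
-- Sample data copied from the orders table shown earlier (types assumed).
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    status   VARCHAR(20) NOT NULL,
    amount   INTEGER NOT NULL
);

INSERT INTO orders (order_id, user_id, status, amount) VALUES
    (1, 10, 'paid',    50),
    (2, 10, 'paid',    20),
    (3, 11, 'pending', 15),
    (4, 12, 'paid',    40),
    (5, 11, 'paid',    30);

-- "amount" alone is ambiguous per user, so each column below
-- names a specific way to reduce many amounts to one value.
SELECT user_id,
       MIN(amount) AS smallest_order,
       MAX(amount) AS largest_order,
       SUM(amount) AS total_amount
FROM orders
GROUP BY user_id
ORDER BY user_id;

-- Expected result:
-- 10 | 20 | 50 | 70
-- 11 | 15 | 30 | 45
-- 12 | 40 | 40 | 40
```

User 12 has a single order, so every aggregate returns the same value for that group.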

Examples that show when `GROUP BY` is useful

A common use is counting rows per category.

SELECT user_id, COUNT(*) AS total_orders
FROM orders
GROUP BY user_id;

This tells you how many orders each user has placed.

Another common use is summing values.

SELECT user_id, SUM(amount) AS total_spent
FROM orders
GROUP BY user_id;

Now you get total spending per user.

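You can also group by dates or derived values. This example assumes orders also has a created_at timestamp column, and the exact date function varies between databases: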

SELECT DATE(created_at) AS order_date, COUNT(*) AS total_orders
FROM orders
GROUP BY DATE(created_at);

That gives daily totals.
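The sample table earlier in the article has no created_at column, so the sketch below uses a cut-down orders table with invented timestamps purely for illustration. DATE(...) works this way in SQLite, MySQL, and PostgreSQL; other systems may need something like CAST(created_at AS DATE) instead.

```sql
-- Hypothetical variant of the orders table with a created_at column.
-- The timestamps are invented for this example.
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    created_at TIMESTAMP NOT NULL
);

INSERT INTO orders (order_id, user_id, created_at) VALUES
    (1, 10, '2026-03-01 09:15:00'),
    (2, 10, '2026-03-01 17:40:00'),
    (3, 11, '2026-03-02 08:05:00'),
    (4, 12, '2026-03-02 12:30:00'),
    (5, 11, '2026-03-02 23:59:00');

-- Group on the derived date, not the raw timestamp, so that
-- orders placed at different times on the same day share a group.
SELECT DATE(created_at) AS order_date, COUNT(*) AS total_orders
FROM orders
GROUP BY DATE(created_at)
ORDER BY order_date;

-- Expected result:
-- 2026-03-01 | 2
-- 2026-03-02 | 3
```

Grouping by created_at directly would give five groups of one row each, because every timestamp is different.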

And you can combine grouping with filtering. For example, if you only care about paid orders:

SELECT user_id, SUM(amount) AS total_paid
FROM orders
WHERE status = 'paid'
GROUP BY user_id;

The WHERE clause filters rows before grouping happens. That matters. You are grouping only the rows that survive the filter.
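The effect of filtering first is easiest to see by running the query with and without the WHERE clause. This is a sketch over the sample rows, with column types assumed.

```sql
-- Sample data copied from the orders table shown earlier (types assumed).
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    status   VARCHAR(20) NOT NULL,
    amount   INTEGER NOT NULL
);

INSERT INTO orders (order_id, user_id, status, amount) VALUES
    (1, 10, 'paid',    50),
    (2, 10, 'paid',    20),
    (3, 11, 'pending', 15),
    (4, 12, 'paid',    40),
    (5, 11, 'paid',    30);

-- Without a filter: every order contributes to its user's total.
SELECT user_id, SUM(amount) AS total_amount
FROM orders
GROUP BY user_id
ORDER BY user_id;
-- Expected result:
-- 10 | 70
-- 11 | 45
-- 12 | 40

-- With WHERE: the pending row (order 3) is removed before grouping.
SELECT user_id, SUM(amount) AS total_paid
FROM orders
WHERE status = 'paid'
GROUP BY user_id
ORDER BY user_id;
-- Expected result:
-- 10 | 70
-- 11 | 30
-- 12 | 40
```

Only user 11's total changes, because that is the only user with a pending order.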

When to use `GROUP BY`

Use GROUP BY when your question is about categories, buckets, or summaries rather than individual rows.

If you want to know how many users signed up each day, how much revenue each product generated, or how many orders each status has, that is a grouping problem. The same goes for average salary per department, maximum score per player, or number of tickets per support agent.

A good test is to ask whether your result should contain one row per original record, or one row per logical group. If it is one row per group, GROUP BY is probably involved.

When not to use it

Do not use GROUP BY just because you want unique rows. Sometimes people reach for it when DISTINCT is simpler and clearer.

For example:

SELECT DISTINCT user_id
FROM orders;

is usually better than:

SELECT user_id
FROM orders
GROUP BY user_id;

Both may return the same values, but DISTINCT says exactly what you mean: give me unique user_id values. GROUP BY suggests you are preparing to aggregate.

Also do not use GROUP BY when you still need row-level detail. Once you group, you lose the original per-row shape unless you use more advanced techniques like window functions or subqueries.

For example, if you want every order row plus the total number of orders for that user, a plain GROUP BY is not enough, because it collapses rows. That is often a sign you want a window function instead.
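A minimal sketch of that window-function approach, using the sample rows with assumed column types. It needs a database with window function support (for example PostgreSQL, MySQL 8.0 or later, or SQLite 3.25 or later).

```sql
-- Sample data copied from the orders table shown earlier (types assumed).
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    status   VARCHAR(20) NOT NULL,
    amount   INTEGER NOT NULL
);

INSERT INTO orders (order_id, user_id, status, amount) VALUES
    (1, 10, 'paid',    50),
    (2, 10, 'paid',    20),
    (3, 11, 'pending', 15),
    (4, 12, 'paid',    40),
    (5, 11, 'paid',    30);

-- No GROUP BY here: every order row is kept, and the count is
-- computed over the set of rows that share the same user_id.
SELECT order_id,
       user_id,
       amount,
       COUNT(*) OVER (PARTITION BY user_id) AS user_order_count
FROM orders
ORDER BY order_id;

-- Expected result:
-- 1 | 10 | 50 | 2
-- 2 | 10 | 20 | 2
-- 3 | 11 | 15 | 2
-- 4 | 12 | 40 | 1
-- 5 | 11 | 30 | 2
```

The output still has five rows, one per order, which is exactly what a plain GROUP BY user_id cannot give you.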

`GROUP BY` vs `DISTINCT`

DISTINCT removes duplicate rows from the result set. GROUP BY forms groups, usually so aggregates can be computed.

That difference is easier to see with examples.

This:

SELECT DISTINCT status
FROM orders;

returns unique statuses.

This:

SELECT status
FROM orders
GROUP BY status;

also returns one row per status, but it does so by grouping rows.

And this is where GROUP BY goes beyond DISTINCT:

SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status;

DISTINCT cannot do that by itself. It can remove duplicates, but it does not summarize each group with counts or sums.

So a rough rule is this: use DISTINCT when you want uniqueness, use GROUP BY when you want summaries per category.

A note on `HAVING`

HAVING often appears next to GROUP BY, so it is worth a brief mention here.

WHERE filters rows before grouping. HAVING filters groups after grouping.

For example:

SELECT user_id, COUNT(*) AS total_orders
FROM orders
GROUP BY user_id
HAVING COUNT(*) >= 2;

This returns only users who have at least two orders.

You cannot do that with WHERE COUNT(*) >= 2, because aggregates are computed after rows are grouped. HAVING exists for conditions on aggregated results.
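WHERE and HAVING can appear in the same query, each doing its own job. The sketch below uses the sample rows (column types assumed) and shows how adding a WHERE clause changes which groups pass the HAVING condition.

```sql
-- Sample data copied from the orders table shown earlier (types assumed).
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    status   VARCHAR(20) NOT NULL,
    amount   INTEGER NOT NULL
);

INSERT INTO orders (order_id, user_id, status, amount) VALUES
    (1, 10, 'paid',    50),
    (2, 10, 'paid',    20),
    (3, 11, 'pending', 15),
    (4, 12, 'paid',    40),
    (5, 11, 'paid',    30);

-- HAVING only: users with at least two orders of any status.
SELECT user_id, COUNT(*) AS total_orders
FROM orders
GROUP BY user_id
HAVING COUNT(*) >= 2
ORDER BY user_id;
-- Expected result:
-- 10 | 2
-- 11 | 2

-- WHERE plus HAVING: rows are filtered to paid orders first,
-- then groups with fewer than two remaining rows are dropped.
SELECT user_id, COUNT(*) AS paid_orders, SUM(amount) AS total_paid
FROM orders
WHERE status = 'paid'
GROUP BY user_id
HAVING COUNT(*) >= 2
ORDER BY user_id;
-- Expected result:
-- 10 | 2 | 70
```

User 11 drops out of the second result because only one of their two orders is paid.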

Common mistakes

One mistake is selecting columns that are neither grouped nor aggregated. That usually means the query does not match the shape of the result you are asking for.

Another is using GROUP BY when DISTINCT would be clearer. The query may still work, but it makes the intent less obvious.

A third is forgetting that grouping changes the meaning of the result. Once rows are grouped, you are no longer dealing with individual records. You are dealing with summaries of sets of records.

It is also easy to confuse ORDER BY and GROUP BY. ORDER BY sorts rows. GROUP BY combines rows into groups. A grouped result can still be sorted afterward:

SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;

Here, the rows are grouped first, and then the grouped result is sorted.

Closing thought

GROUP BY is best understood as a change in level of detail. A normal query works at the row level. A grouped query works at the group level.

Once that clicks, a lot of SQL becomes easier to reason about. You stop memorizing syntax and start asking a simpler question: am I trying to return rows, or am I trying to return summaries of rows?

If the answer is summaries, GROUP BY is usually the tool.
