Database Normalization

May 18, 2026

#databases #sql #normalization #schema-design

Normalization is the process of organizing database tables to reduce redundancy and prevent data anomalies. It’s a set of rules (called “normal forms”) that guide you toward a clean schema where each piece of data lives in exactly one place.

If you’ve ever had a bug where updating a customer’s address in one place didn’t update it everywhere, you’ve experienced the problem normalization solves.

Why Normalize?

Consider this denormalized table:

order_id	customer_name	customer_email	product	price	quantity
1	Alice	alice@mail.com	Widget	9.99	2
2	Alice	alice@mail.com	Gadget	24.99	1
3	Bob	bob@mail.com	Widget	9.99	5

Problems:

Update anomaly — If Alice changes her email, you must update every row she appears in
Insert anomaly — You can’t add a new customer without them having an order
Delete anomaly — If you delete Bob’s only order, you lose his contact info entirely
Wasted space — Alice’s name and email are stored repeatedly

Normalization eliminates these problems by splitting data into related tables.

First Normal Form (1NF)

Rule: Every column contains atomic (indivisible) values. No repeating groups or arrays.

Violates 1NF

student_id	name	phone_numbers
1	Alice	555-0101, 555-0102
2	Bob	555-0201

Satisfies 1NF

students:

student_id	name
1	Alice
2	Bob

student_phones:

student_id	phone_number
1	555-0101
1	555-0102
2	555-0201

Each cell now contains exactly one value.

Second Normal Form (2NF)

Rule: Must be in 1NF, and every non-key column must depend on the entire primary key (not just part of it).

This only matters for tables with composite primary keys.

Violates 2NF

Primary key: (student_id, course_id)

student_id	course_id	student_name	course_name	grade
1	101	Alice	Math	A
1	102	Alice	English	B
2	101	Bob	Math	B+

student_name depends only on student_id, not on the full key. course_name depends only on course_id.

Satisfies 2NF

students:

student_id	student_name
1	Alice
2	Bob

courses:

course_id	course_name
101	Math
102	English

enrollments:

student_id	course_id	grade
1	101	A
1	102	B
2	101	B+

Now every non-key column depends on the full primary key of its table.

Third Normal Form (3NF)

Rule: Must be in 2NF, and no non-key column depends on another non-key column (no transitive dependencies).

Violates 3NF

employee_id	name	department_id	department_name	department_head
1	Alice	10	Engineering	Carol
2	Bob	10	Engineering	Carol
3	Charlie	20	Marketing	Dave

department_name and department_head depend on department_id, not on employee_id.

Satisfies 3NF

employees:

employee_id	name	department_id
1	Alice	10
2	Bob	10
3	Charlie	20

departments:

department_id	department_name	department_head
10	Engineering	Carol
20	Marketing	Dave

Now each fact is stored in exactly one place.

A Practical Example

Let’s normalize an e-commerce order system step by step.

Starting point (unnormalized)

order_id	date	customer_name	customer_email	customer_address	items
1	2026-05-18	Alice	alice@mail.com	123 Main St	Widget x2 ($9.99), Gadget x1 ($24.99)

After normalization

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    address TEXT
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date DATE NOT NULL DEFAULT CURRENT_DATE
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL  -- price at time of order
);

Now:

Updating a customer’s email happens in one place
Products exist independently of orders
Order history is preserved even if a product’s current price changes (we store unit_price at time of purchase)

When to Denormalize

Normalization optimizes for data integrity and write efficiency. But sometimes you need to denormalize for read performance:

Acceptable denormalization

Caching computed values — Store order_total on the orders table instead of calculating it every time
Reporting tables — Pre-joined, flattened tables for analytics queries
Read-heavy data — If you always read a user with their profile, storing them together avoids a JOIN

The rule of thumb

Normalize first, denormalize when you have a measured performance problem. Premature denormalization creates data integrity bugs that are much harder to fix than slow queries.

-- Example: adding a cached total
ALTER TABLE orders ADD COLUMN total DECIMAL(10, 2);

-- Update it when order items change
UPDATE orders SET total = (
    SELECT SUM(quantity * unit_price)
    FROM order_items
    WHERE order_id = orders.id
) WHERE id = 42;

Quick Reference

Normal Form	Rule	Eliminates
1NF	Atomic values, no repeating groups	Multi-value cells
2NF	No partial dependencies on composite keys	Redundancy from partial key dependence
3NF	No transitive dependencies	Redundancy from non-key dependencies

For most applications, 3NF is sufficient. Higher normal forms (BCNF, 4NF, 5NF) exist but are rarely needed in practice.

Common Mistakes

Over-normalizing — Creating a separate table for every tiny piece of data makes queries complex and slow. If data always appears together and doesn’t repeat, it can stay in one table.
Under-normalizing — Storing everything in one giant table leads to update anomalies and wasted space.
Forgetting about time — Prices change, addresses change, names change. Store historical values where they matter (like unit_price on order items).

What’s Next

With a normalized schema, you can write efficient queries using SQL JOINs to combine data from related tables. If you’re choosing a database, check out Relational vs NoSQL to understand when a relational approach is the right fit.

Table of Contents