Database Normalization

May 18, 2026
#databases #sql #normalization #schema-design

Normalization is the process of organizing database tables to reduce redundancy and prevent data anomalies. It’s a set of rules (called “normal forms”) that guide you toward a clean schema where each piece of data lives in exactly one place.

If you’ve ever had a bug where updating a customer’s address in one place didn’t update it everywhere, you’ve experienced the problem normalization solves.

Why Normalize?

Consider this denormalized table:

order_id customer_name customer_email product price quantity
1 Alice alice@mail.com Widget 9.99 2
2 Alice alice@mail.com Gadget 24.99 1
3 Bob bob@mail.com Widget 9.99 5

Problems:

  • Update anomaly — If Alice changes her email, you must update every row she appears in
  • Insert anomaly — You can’t add a new customer without them having an order
  • Delete anomaly — If you delete Bob’s only order, you lose his contact info entirely
  • Wasted space — Alice’s name and email are stored repeatedly

Normalization eliminates these problems by splitting data into related tables.

First Normal Form (1NF)

Rule: Every column contains atomic (indivisible) values. No repeating groups or arrays.

Violates 1NF

student_id name phone_numbers
1 Alice 555-0101, 555-0102
2 Bob 555-0201

Satisfies 1NF

students:

student_id name
1 Alice
2 Bob

student_phones:

student_id phone_number
1 555-0101
1 555-0102
2 555-0201

Each cell now contains exactly one value.

Second Normal Form (2NF)

Rule: Must be in 1NF, and every non-key column must depend on the entire primary key (not just part of it).

This only matters for tables with composite primary keys.

Violates 2NF

Primary key: (student_id, course_id)

student_id course_id student_name course_name grade
1 101 Alice Math A
1 102 Alice English B
2 101 Bob Math B+

student_name depends only on student_id, not on the full key. course_name depends only on course_id.

Satisfies 2NF

students:

student_id student_name
1 Alice
2 Bob

courses:

course_id course_name
101 Math
102 English

enrollments:

student_id course_id grade
1 101 A
1 102 B
2 101 B+

Now every non-key column depends on the full primary key of its table.

Third Normal Form (3NF)

Rule: Must be in 2NF, and no non-key column depends on another non-key column (no transitive dependencies).

Violates 3NF

employee_id name department_id department_name department_head
1 Alice 10 Engineering Carol
2 Bob 10 Engineering Carol
3 Charlie 20 Marketing Dave

department_name and department_head depend on department_id, not on employee_id.

Satisfies 3NF

employees:

employee_id name department_id
1 Alice 10
2 Bob 10
3 Charlie 20

departments:

department_id department_name department_head
10 Engineering Carol
20 Marketing Dave

Now each fact is stored in exactly one place.

A Practical Example

Let’s normalize an e-commerce order system step by step.

Starting point (unnormalized)

order_id date customer_name customer_email customer_address items
1 2026-05-18 Alice alice@mail.com 123 Main St Widget x2 ($9.99), Gadget x1 ($24.99)

After normalization

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    address TEXT
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date DATE NOT NULL DEFAULT CURRENT_DATE
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL  -- price at time of order
);

Now:

  • Updating a customer’s email happens in one place
  • Products exist independently of orders
  • Order history is preserved even if a product’s current price changes (we store unit_price at time of purchase)

When to Denormalize

Normalization optimizes for data integrity and write efficiency. But sometimes you need to denormalize for read performance:

Acceptable denormalization

  • Caching computed values — Store order_total on the orders table instead of calculating it every time
  • Reporting tables — Pre-joined, flattened tables for analytics queries
  • Read-heavy data — If you always read a user with their profile, storing them together avoids a JOIN

The rule of thumb

Normalize first, denormalize when you have a measured performance problem. Premature denormalization creates data integrity bugs that are much harder to fix than slow queries.

-- Example: adding a cached total
ALTER TABLE orders ADD COLUMN total DECIMAL(10, 2);

-- Update it when order items change
UPDATE orders SET total = (
    SELECT SUM(quantity * unit_price)
    FROM order_items
    WHERE order_id = orders.id
) WHERE id = 42;

Quick Reference

Normal Form Rule Eliminates
1NF Atomic values, no repeating groups Multi-value cells
2NF No partial dependencies on composite keys Redundancy from partial key dependence
3NF No transitive dependencies Redundancy from non-key dependencies

For most applications, 3NF is sufficient. Higher normal forms (BCNF, 4NF, 5NF) exist but are rarely needed in practice.

Common Mistakes

  • Over-normalizing — Creating a separate table for every tiny piece of data makes queries complex and slow. If data always appears together and doesn’t repeat, it can stay in one table.
  • Under-normalizing — Storing everything in one giant table leads to update anomalies and wasted space.
  • Forgetting about time — Prices change, addresses change, names change. Store historical values where they matter (like unit_price on order items).

What’s Next

With a normalized schema, you can write efficient queries using SQL JOINs to combine data from related tables. If you’re choosing a database, check out Relational vs NoSQL to understand when a relational approach is the right fit.