Database Normalization
Table of Contents
Normalization is the process of organizing database tables to reduce redundancy and prevent data anomalies. It’s a set of rules (called “normal forms”) that guide you toward a clean schema where each piece of data lives in exactly one place.
If you’ve ever had a bug where updating a customer’s address in one place didn’t update it everywhere, you’ve experienced the problem normalization solves.
Why Normalize?
Consider this denormalized table:
| order_id | customer_name | customer_email | product | price | quantity |
|---|---|---|---|---|---|
| 1 | Alice | alice@mail.com | Widget | 9.99 | 2 |
| 2 | Alice | alice@mail.com | Gadget | 24.99 | 1 |
| 3 | Bob | bob@mail.com | Widget | 9.99 | 5 |
Problems:
- Update anomaly — If Alice changes her email, you must update every row she appears in
- Insert anomaly — You can’t add a new customer without them having an order
- Delete anomaly — If you delete Bob’s only order, you lose his contact info entirely
- Wasted space — Alice’s name and email are stored repeatedly
Normalization eliminates these problems by splitting data into related tables.
First Normal Form (1NF)
Rule: Every column contains atomic (indivisible) values. No repeating groups or arrays.
Violates 1NF
| student_id | name | phone_numbers |
|---|---|---|
| 1 | Alice | 555-0101, 555-0102 |
| 2 | Bob | 555-0201 |
Satisfies 1NF
students:
| student_id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
student_phones:
| student_id | phone_number |
|---|---|
| 1 | 555-0101 |
| 1 | 555-0102 |
| 2 | 555-0201 |
Each cell now contains exactly one value.
Second Normal Form (2NF)
Rule: Must be in 1NF, and every non-key column must depend on the entire primary key (not just part of it).
This only matters for tables with composite primary keys.
Violates 2NF
Primary key: (student_id, course_id)
| student_id | course_id | student_name | course_name | grade |
|---|---|---|---|---|
| 1 | 101 | Alice | Math | A |
| 1 | 102 | Alice | English | B |
| 2 | 101 | Bob | Math | B+ |
student_name depends only on student_id, not on the full key. course_name depends only on course_id.
Satisfies 2NF
students:
| student_id | student_name |
|---|---|
| 1 | Alice |
| 2 | Bob |
courses:
| course_id | course_name |
|---|---|
| 101 | Math |
| 102 | English |
enrollments:
| student_id | course_id | grade |
|---|---|---|
| 1 | 101 | A |
| 1 | 102 | B |
| 2 | 101 | B+ |
Now every non-key column depends on the full primary key of its table.
Third Normal Form (3NF)
Rule: Must be in 2NF, and no non-key column depends on another non-key column (no transitive dependencies).
Violates 3NF
| employee_id | name | department_id | department_name | department_head |
|---|---|---|---|---|
| 1 | Alice | 10 | Engineering | Carol |
| 2 | Bob | 10 | Engineering | Carol |
| 3 | Charlie | 20 | Marketing | Dave |
department_name and department_head depend on department_id, not on employee_id.
Satisfies 3NF
employees:
| employee_id | name | department_id |
|---|---|---|
| 1 | Alice | 10 |
| 2 | Bob | 10 |
| 3 | Charlie | 20 |
departments:
| department_id | department_name | department_head |
|---|---|---|
| 10 | Engineering | Carol |
| 20 | Marketing | Dave |
Now each fact is stored in exactly one place.
A Practical Example
Let’s normalize an e-commerce order system step by step.
Starting point (unnormalized)
| order_id | date | customer_name | customer_email | customer_address | items |
|---|---|---|---|---|---|
| 1 | 2026-05-18 | Alice | alice@mail.com | 123 Main St | Widget x2 ($9.99), Gadget x1 ($24.99) |
After normalization
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
address TEXT
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10, 2) NOT NULL
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
order_date DATE NOT NULL DEFAULT CURRENT_DATE
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL,
unit_price DECIMAL(10, 2) NOT NULL -- price at time of order
);
Now:
- Updating a customer’s email happens in one place
- Products exist independently of orders
- Order history is preserved even if a product’s current price changes (we store
unit_priceat time of purchase)
When to Denormalize
Normalization optimizes for data integrity and write efficiency. But sometimes you need to denormalize for read performance:
Acceptable denormalization
- Caching computed values — Store
order_totalon the orders table instead of calculating it every time - Reporting tables — Pre-joined, flattened tables for analytics queries
- Read-heavy data — If you always read a user with their profile, storing them together avoids a JOIN
The rule of thumb
Normalize first, denormalize when you have a measured performance problem. Premature denormalization creates data integrity bugs that are much harder to fix than slow queries.
-- Example: adding a cached total
ALTER TABLE orders ADD COLUMN total DECIMAL(10, 2);
-- Update it when order items change
UPDATE orders SET total = (
SELECT SUM(quantity * unit_price)
FROM order_items
WHERE order_id = orders.id
) WHERE id = 42;
Quick Reference
| Normal Form | Rule | Eliminates |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Multi-value cells |
| 2NF | No partial dependencies on composite keys | Redundancy from partial key dependence |
| 3NF | No transitive dependencies | Redundancy from non-key dependencies |
For most applications, 3NF is sufficient. Higher normal forms (BCNF, 4NF, 5NF) exist but are rarely needed in practice.
Common Mistakes
- Over-normalizing — Creating a separate table for every tiny piece of data makes queries complex and slow. If data always appears together and doesn’t repeat, it can stay in one table.
- Under-normalizing — Storing everything in one giant table leads to update anomalies and wasted space.
- Forgetting about time — Prices change, addresses change, names change. Store historical values where they matter (like
unit_priceon order items).
What’s Next
With a normalized schema, you can write efficient queries using SQL JOINs to combine data from related tables. If you’re choosing a database, check out Relational vs NoSQL to understand when a relational approach is the right fit.