Normalization in DBMS. Learn about normalization and different types.
Abstract
- Normal forms are rules for Database Normalization. Each rule is called a “Normal Form”.
- Normalization – Process of meaningfully organizing data in database for easy usage.
- Database normalization is essential to minimize redundancy and to ensure only related data is stored in each table.
- Normal Forms are of mainly three types – First Normal Form, Second Normal Form, Third Normal Form.
Scope of Article
The article explains database normalization terminology for beginners. A basic understanding of this terminology is helpful when discussing the design of relational database.
This article does not deal with the Relational Algebra in Database and the Keys in Database.
Definition:
Normal Forms are the rule by which we achieve normalization. Now, what is normalization and why is it essential?
Normalization is a process of organizing the data in the database to avoid data redundancy and anomalies.
Let’s understand this with the help of an example. Consider the table below for a manufacturing company that stores Employee data.
Emp_ID | Emp_Name | Emp_Address | Emp_Dept |
101 | Rick | Delhi | D001 |
101 | Rick | Delhi | D002 |
123 | Karen | London | D450 |
133 | Max | Sydney | D500 |
133 | Max | Sydney | D003 |
Now let us understand the problem with this table.
- Firstly – Employee Rick from the table is working in 2 departments. Now if we want to update the address for Rick, we would have to update it at 2 different places. This could create a lot of unforeseen issues. There could be even simple mistake like wrong spelling entered at one location. (This is Update anomaly)
- Secondly – Suppose a new employee want to join the company, who would be under training, then we would not be able to add that employee to this table as he is not assigned to any department. (This is Insert Anomaly)
- Lastly – If at some point the Department D450 is closed. Then we would have to delete the data corresponding to Karen, and the information would be lost. (This is delete anomaly)
To overcome these anomalies we require Normalization, and Normalization (decomposing bigger tables to smaller tables to meaningfully organize data) can be achieved through Normal Forms. Let us understand the Normal Forms in detail.
TYPES OF NORMAL FORMS:
- First Normal Form(1NF):
- Eliminate repeating groups in individual tables.
- Create separate table for each set of related data.
- Identify each set of related data with a primary key.
As per rule of first normal form, an attribute(column) of a table cannot hold multiple values. It should hold only atomic value (which cannot be divided further).
Let’s understand this via an example.
The table below shows and Employee Information table.
Emp_ID | Emp_Name | Emp_Address | Emp_Mobile |
101 | Karen | London | (0207) 7555 7777 |
102 | Jack | Burmingham | (1217)90127922, (1217)90238989 |
103 | Frieda | London | (0207) 4566 7654 |
104 | Matthew | Nottingham | (0115) 9155 5555 |
From the table above it is evident that Employee Jack is having two phone numbers in the attribute Emp_Mobile. This contradicts the 1NF Form. To make it complaint the table should be arranged as below.
Emp_ID | Emp_Name | Emp_Address | Emp_Mobile |
101 | Karen | London | (0207) 7555 7777 |
102 | Jack | Birmingham | (1217)9012 7922 |
102 | Jack | Birmingham | (1217) 9023 8989 |
103 | Frieda | London | (0207) 4566 7654 |
104 | Matthew | Nottingham | (0115) 9155 5555 |
- Second Normal Form:
- Create separate table for set of values that apply to multiple record.
- Relate these tables with a foreign key.
A table is said to be in 2NF form if both the conditions hold true:
a. Table is in 1NF.
b. There should be no partial dependency existing in the Relation.
What is a partial dependency?
It means, No subset of the candidate key should determine any non – prime attribute(attributes that are not candidate keys).
Non – prime attribute –> An attribute that is not a part of the candidate key is a non – prime attribute.
Consider the example below.
Teacher_ID | Subject | Teacher_Age |
111 | Maths | 38 |
111 | Physics | 38 |
222 | Biology | 38 |
333 | Physics | 40 |
333 | Chemistry | 40 |
Candidate Keys (Teacher_ID,subject)
Non – Prime attributes(Teacher_Age).
The table given above is already in 1NF form, but it violates the second condition as the Teacher_age attribute is dependent on the Teacher_ID , meaning a non – prime attribute is dependent on the a subset of the candidate key. Hence, to make this table in 2NF form we need to split this table into two tables as shown below.
Teacher_Detail Table.
Teacher_ID | Teacher_Age |
111 | 38 |
222 | 38 |
333 | 40 |
Teacher_Subject Table.
Teacher_ID | Subject |
111 | Maths |
111 | Physics |
222 | Biology |
333 | Physics |
333 | Chemistry |
- Third Normal Form:
- Eliminate attributes that do not depend on the keys.
A table design is said to be in 3NF form if the following hold true.
- 1. Table must be in 2 NF
- Transitive functional dependency of non-prime attributes should be removed.
In other words, A table if it is in 2NF and for each functional dependency X->Y at least one of the following conditions hold
- X is a Super Key of the table
- Y is a prime attribute of the table
An attribute that is part of the candidate key is a prime attribute.
Consider the table below.
Emp_Id | Emp_Name | Emp_Zip | Emp_State | Emp_City | Emp_Disctrict |
1001 | Ajay | 282005 | UP | Agra | Dayal Bagh |
1002 | Jeet | 222008 | TN | Chennai | M-city |
1003 | Jenny | 222008 | TN | Chennai | Urrapakkam |
1004 | Meet | 292008 | UK | Pauri | Bhagwan |
1005 | DK | 222999 | MP | Gwalior | Ratan |
Super Keys – {Emp_id},{Emp_id, Emp_name}, {Emp_id, Emp_name, Emp_zip}…so on
Candidate Keys – {Emp_id}
Here Emp_state,Emp_city,Emp_District are dependent on Emp_Zip and Emp_Zip is in turn dependent on Emp_Id that makes non – prime attributes transitively dependent on Super Key{Emp_id}.
To make this table complaint with 3NF we have to break the table into two tables to remove this dependency.
Employee Table:
Emp_Id | Emp_Name | Emp_Zip |
1001 | Ajay | 282005 |
1002 | Jeet | 222008 |
1003 | Jenny | 222008 |
1004 | Meet | 292008 |
1005 | DK | 222999 |
Employee Zip Table:
Emp_Zip | Emp_State | Emp_City | Emp_Disctrict |
282005 | UP | Agra | Dayal Bagh |
222008 | TN | Chennai | M-city |
Summary
- Normalization is the process of organizing data by disintegrating bigger table into smaller one’s
- with proper dependencies
- Redundant Data wastes a lot of disk space and creates maintenance problems (Update, Insert and Delete Anomaly). Hence the DB tables should be Normalized.
- The process of Normalization is achieved by following some rules which are defined as Normal Forms
- There are basically 3 types of Normal Form – 1NF, 2NF, 3NF. Strictness increases as we go from 1NF to 3NF.
- Apart from the above mentioned Normal Form there exist one more Form called Boyce Codd Normal Form (BCNF) . This is an advanced version of 3NF and is even stricter than 3NF.
Special thanks to Ami Jangid for contributing to this article on takeUforward. If you also wish to share your knowledge with the takeUforward fam, please check out this article