Normalization in DBMS : What is Normalization

Normalization in DBMS. Learn about normalization and different types.

Abstract

  • Normal forms are rules for Database Normalization. Each rule is called a “Normal Form”.
  • Normalization – Process of meaningfully organizing data in database for easy usage.
  • Database normalization is essential to minimize redundancy and to ensure only related data is stored in each table.
  • Normal Forms are of mainly three types – First Normal Form, Second Normal Form, Third Normal Form.

Scope of Article

The article explains database normalization terminology for beginners. A basic understanding of this terminology is helpful when discussing the design of relational database.

This article does not deal with the Relational Algebra in Database and the Keys in Database.

Definition:

Normal Forms are the rule by which we achieve normalization. Now, what is normalization and why is it essential?

Normalization is a process of organizing the data in the database to avoid data redundancy and anomalies.

Let’s understand this with the help of an example. Consider the table below for a manufacturing company that stores Employee data.

Emp_IDEmp_NameEmp_AddressEmp_Dept
101RickDelhiD001
101RickDelhiD002
123KarenLondonD450
133MaxSydneyD500
133MaxSydneyD003

Now let us understand the problem with this table.

  • Firstly – Employee Rick from the table is working in 2 departments. Now if we want to update the address for Rick, we would have to update it at 2 different places. This could create a lot of unforeseen issues. There could be even simple mistake like wrong spelling entered at one location. (This is Update anomaly)
  • Secondly – Suppose a new employee want to join the company, who would be under training, then we would not be able to add that employee to this table as he is not assigned to any department. (This is Insert Anomaly)
  • Lastly – If at some point the Department D450 is closed. Then we would have to delete the data corresponding to Karen, and the information would be lost. (This is delete anomaly)

To overcome these anomalies we require Normalization, and Normalization (decomposing bigger tables to smaller tables to meaningfully organize data) can be achieved through Normal Forms. Let us understand the Normal Forms in detail.

TYPES OF NORMAL FORMS:

  1. First Normal Form(1NF):
  • Eliminate repeating groups in individual tables.
  • Create separate table for each set of related data.
  • Identify each set of related data with a primary key.

As per rule of first normal form, an attribute(column) of a table cannot hold multiple values. It should hold only atomic value (which cannot be divided further).

Let’s understand this via an example.

The table below shows and Employee Information table.

Emp_IDEmp_NameEmp_AddressEmp_Mobile
101KarenLondon(0207) 7555 7777
102JackBurmingham(1217)90127922, (1217)90238989
103FriedaLondon(0207) 4566 7654
104MatthewNottingham(0115) 9155 5555

From the table above it is evident that Employee Jack is having two phone numbers in the attribute Emp_Mobile. This contradicts the 1NF Form. To make it complaint the table should be arranged as below.

Emp_IDEmp_NameEmp_AddressEmp_Mobile
101KarenLondon(0207) 7555 7777
102JackBirmingham(1217)9012 7922
102JackBirmingham(1217) 9023 8989
103FriedaLondon(0207) 4566 7654
104MatthewNottingham(0115) 9155 5555
  • Second Normal Form:
  • Create separate table for set of values that apply to multiple record.
  • Relate these tables with a foreign key.

A table is said to be in 2NF form if both the conditions hold true:

a.   Table is in 1NF.

b.   There should be no partial dependency existing in the Relation.

What is a partial dependency?

It means, No subset of the candidate key should determine any non – prime attribute(attributes that are not candidate keys).

Non – prime attribute –> An attribute that is not a part of the candidate key is a non – prime attribute.

Consider the example below.

Teacher_IDSubjectTeacher_Age
111Maths38
111Physics38
222Biology38
333Physics40
333Chemistry40

Candidate Keys (Teacher_ID,subject)

Non – Prime attributes(Teacher_Age).

The table given above is already in 1NF form, but it violates the second condition as the Teacher_age attribute is dependent on the Teacher_ID , meaning a non – prime attribute is dependent on the a subset of the candidate key. Hence, to make this table in 2NF form we need to split this table into two tables as shown below.

Teacher_Detail Table.

Teacher_IDTeacher_Age
11138
22238
33340

Teacher_Subject Table.

Teacher_IDSubject
111Maths
111Physics
222Biology
333Physics
333Chemistry
  • Third Normal Form:
  • Eliminate attributes that do not depend on the keys.

A table design is said to be in 3NF form if the following hold true.

  1. 1.     Table must be in 2 NF
  • Transitive functional dependency of non-prime attributes should be removed.

In other words, A table if it is in 2NF and for each functional dependency X->Y at least one of the following conditions hold

  • X is a Super Key of the table
  • Y is a prime attribute of the table

An attribute that is part of the candidate key is a prime attribute.

Consider the table below.

Emp_IdEmp_NameEmp_ZipEmp_StateEmp_CityEmp_Disctrict
1001Ajay282005UPAgraDayal Bagh
1002Jeet222008TNChennaiM-city
1003Jenny222008TNChennaiUrrapakkam
1004Meet292008UKPauriBhagwan
1005DK222999MPGwaliorRatan

Super Keys – {Emp_id},{Emp_id, Emp_name}, {Emp_id, Emp_name, Emp_zip}…so on

Candidate Keys – {Emp_id}

Here Emp_state,Emp_city,Emp_District are dependent on Emp_Zip and Emp_Zip is in turn dependent on Emp_Id that makes non – prime attributes transitively dependent on Super Key{Emp_id}.

To make this table complaint with 3NF we have to break the table into two tables to remove this dependency.

Employee Table:

Emp_IdEmp_NameEmp_Zip
1001Ajay282005
1002Jeet222008
1003Jenny222008
1004Meet292008
1005DK222999

Employee Zip Table:

Emp_ZipEmp_StateEmp_CityEmp_Disctrict
282005UPAgraDayal Bagh
222008TNChennaiM-city
222008TNChennaiUrrapakkam
292008UKPauriBhagwan
222999MPGwaliorRatan

Summary

  • Normalization is the process of organizing data by disintegrating bigger table into smaller one’s
  • with proper dependencies
  • Redundant Data wastes a lot of disk space and creates maintenance problems (Update, Insert and Delete Anomaly). Hence the DB tables should be Normalized.
  • The process of Normalization is achieved by following some rules which are defined as Normal Forms
  • There are basically 3 types of Normal Form – 1NF, 2NF, 3NF. Strictness increases as we go from 1NF to 3NF.
  • Apart from the above mentioned Normal Form there exist one more Form called Boyce Codd Normal Form (BCNF) . This is an advanced version of 3NF and is even stricter than 3NF.

Special thanks to Ami Jangid for contributing to this article on takeUforward. If you also wish to share your knowledge with the takeUforward fam, please check out this article