Normalization

Database Normalization is a technique of organizing the data in the database. It is a systematic approach of decomposing tables into atomic entities to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies.

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is difficult and next to impossible.

Normalization is a method to remove all these anomalies and bring the database to a consistent state.

It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.

Normalization is used or serves mainly two purposes,

  • Eliminating redundant (useless or repetitive) data.
  • Ensuring data dependencies make sense i.e. data is logically stored.

 

Anomalies in DBMS

There are three types of anomalies that occur when the database is not normalized. These are – Insertion, update and deletion anomaly.

  • Update anomalies− If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
  • Deletion anomalies− Say we tried to delete a record, but parts of it were left undeleted because of unawareness, the data is also saved somewhere else.
  • Insert anomalies– Say we tried to insert data in a record that does not exist at all.

 

Let’s take an example to understand this.

Suppose a manufacturing company stores the employee details in a table named employee that has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for storing employee’s address and emp_dept for storing the department details in which the employee works. At some point of time the table looks like this:

 

emp_id      emp_name     emp_address     emp_dept

101               Rick                 Delhi                      D001

101               Rick                 Delhi                      D002

123               Maggie            Agra                      D890

166               Glenn              Chennai                D900

166               Glenn              Chennai                D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.

Update anomaly: In the above table we have two rows for employee Rick as he belongs to two departments of the company. If we want to update the address of Rick then we have to update the same in two rows or the data will become inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the database, Rick would be having two different addresses, which is not correct and would lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company, who is under training and currently not assigned to any department then we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.

Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the rows that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned only to this department.

To overcome these anomalies and having a peaceful life we need to normalize the data.

 

Bibliography source:

http://www.studytonight.com/dbms/database-normalization.php

https://www.tutorialspoint.com/dbms/database_normalization.htm

https://beginnersbook.com/2015/05/normalization-in-dbms/

 

Leave a comment