Why is normalization required?

Normalization

Database Normalization is a technique of organizing the data in the database. It is a systematic approach of decomposing tables into atomic entities to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies.

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is difficult and next to impossible.

Normalization is a method to remove all these anomalies and bring the database to a consistent state.

It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.

Normalization is used or serves mainly two purposes,

  • Eliminating redundant (useless or repetitive) data.
  • Ensuring data dependencies make sense i.e. data is logically stored.

 

Anomalies in DBMS

There are three types of anomalies that occur when the database is not normalized. These are – Insertion, update and deletion anomaly.

  • Update anomalies− If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
  • Deletion anomalies− Say we tried to delete a record, but parts of it were left undeleted because of unawareness, the data is also saved somewhere else.
  • Insert anomalies– Say we tried to insert data in a record that does not exist at all.

 

Let’s take an example to understand this.

Suppose a manufacturing company stores the employee details in a table named employee that has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for storing employee’s address and emp_dept for storing the department details in which the employee works. At some point of time the table looks like this:

 

emp_id      emp_name     emp_address     emp_dept

101               Rick                 Delhi                      D001

101               Rick                 Delhi                      D002

123               Maggie            Agra                      D890

166               Glenn              Chennai                D900

166               Glenn              Chennai                D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.

Update anomaly: In the above table we have two rows for employee Rick as he belongs to two departments of the company. If we want to update the address of Rick then we have to update the same in two rows or the data will become inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the database, Rick would be having two different addresses, which is not correct and would lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company, who is under training and currently not assigned to any department then we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.

Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the rows that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned only to this department.

To overcome these anomalies and having a peaceful life we need to normalize the data.

 

Bibliography source:

http://www.studytonight.com/dbms/database-normalization.php

https://www.tutorialspoint.com/dbms/database_normalization.htm

https://beginnersbook.com/2015/05/normalization-in-dbms/

 

Transitive functional dependency.

In a Database Management System

A transitive dependency is a functional dependency which holds by virtue of transitivity.

Transitive dependencies occur when there is an indirect relationship that causes a functional dependency.

A transitive dependency can occur only in a relation that has three or more attributes.

The symbol “->” means than.

”A -> C” is a transitive dependency when it is true only because both “A -> B” and “B -> C” are true

Suppose A, B, and C designates three distinct attributes (or distinct collections of attributes) in the relation. Suppose all three of the following conditions hold:

A → B

(If A is true than B holds true)

It is not the case that B → A

(And when B is not true than A holds true)

B → C

(And when B is true C is also true in the relation.)

 

Consider the below table:

Book                                           Author                                               Author_age

Game of Thrones             George R. R. Martin                                      66

Harry Potter                      J. K. Rowling                                                  49

Dying of the Light             George R. R. Martin                                     66

 

{Book}  -> {Author}

If we know the book, we knows the authors name

 

{Author} does not -> {Book}

If we don’t know the authors name than we can still have a book.

 

{Author} -> {Author_age}

If we know Author, we also know Authors age.

 

Therefore as per the rule of transitive dependency: {Book} -> {Author_age} should hold, that makes sense because if we know the book name we can know the author’s age from the table.

 

Bibliography Source

https://en.wikipedia.org/wiki/Transitive_dependency

https://beginnersbook.com/2015/04/transitive-dependency-in-dbms/