The importance of Database Relations Normalization and its effects on efficiency.

'Schema Busters' is a multipart technical overview of Database Design elements that you can implement when designing and accessing your database implementation. In this first section we will discuss the importance of a concept titled Normalization which aims to eliminate a number of undesirable attributes that a database relation may contain. Initially we will discuss these undesirable attributes and then describe a number of popular methods, called Normalizations, to remove any undesirable attributes and convert an existing database relation into what is known as a Higher Normal Form Relation. In each section we will also provide real world examples where such normalizations improve the quality of relations and query efficiency.

Undesirable #1: Non-atomic field values

Storing atomic (or indivisible) values in a field makes accessing data simpler and reduces redundancy. Atomicity can be obtained by disallowing the direct storage of multiple values in a single attribute field. That is, a single field contains at most a single element of the attribute domain associated with this field. Take for example a CAR relation with a Color attribute. Even though a single car may come in several colors, we would not want to store a list of colors in a single field since it makes accessing a single color element indirect and complicated (shown in figure 1).

CAR

Car_IdNameColor
001Honda Civic{Black}
002Toyota Camry{Red,White,Blue}
Figure 1 – Non-atomic attribute Color

To resolve this issue we need to convert our CAR relation into what is called a First Normal Form relation. There are different ways of achieving this, some more efficient than others. Below is my recommended method:

  • Remove the Color attribute from the CAR entity and place it into an associative table CAR_COLOR along with the primary key of CAR, Car_Id.
Pretty easy so far, right? These two tables are now considered to be in First Normal Form. We are now able to read in single car-to-color associations without the need of value parsing.

Undesirable #2: Non-Full Functional Dependency

Enforcing full functional dependency more accurately and properly associates relational data together. For example, a table may have a composite primary key with different elements being functionally dependent on pieces of the composite key. Full functional dependency separates data relations into intuitive dependency sets and allows for further database normalization. Take for example the following CAR_SHOP, which refers to a shop that works on cars, relation and functional dependencies (denoted FD):

CAR_SHOP

Car_IdShop_IdDriverOwnerPrice

  • FD1: Car_Id, Shop_Id >> Price
  • FD2: Car_Id >> Driver
  • FD3: Shop_Id >> Owner

The functional dependencies describe relations between data. FD1 simply states that for a given car at a given shop the price can be evaluated, FD2 states that a given car has a given driver, and FD3 states that a given shop has a given owner. We can break up the CAR_SHOP entity to be simpler and more descriptive by performing the following manipulations:

  • Create a CAR entity that associates Car_Id and Driver.
  • Create a SHOP entity that associates Shop_Id and Owner.
  • Modify the CAR_SHOP entity that associates Car_Id, Shop_Id, and Price.
As you can see, this simply reflects the FDs described for our original CAR_SHOP entity. These three new entities are considered to be in Second Normal Form.

Undesirable #3: Transitive Dependencies

Transitive dependencies exist when a non-key attribute of an entity is found on both the left and right-hand side of a known functional dependency. This can lead to attributes being functionally dependent to non-key attributes. Take for example a modified version of the CAR entity we described in the previous section:

CAR

Car_IdVINOWNERDL_NUMBERSSN
Notice that we can derive the following functional dependency Car_Id >> DL_Number, SSN by the transitive property as shown below:
  • Car_Id >> Owner >> DL_Number, SSN
Since it is relatively safe to assume that a single Owner can have more than one car, we can easily see that having two entries in the CAR relation would generate repetitious data. In order to simplify our entity we can take the following steps:
  • Modify the CAR entity and associates Car_Id, VIN, and Owner.
  • Create an OWNER entity that associates Owner, DL_Number, and SSN.
We can verify that these manipulations are valid by double checking that the original functional dependencies still hold on our new relations, CAR and OWNER. FD1 now holds on the modified CAR entity and FD2 holds strictly on the OWNER entity.

Conclusion (for now)

While there are more normalizations we can consider, several have limited practicality in real-world environments and can be made unnecessary if a database is designed carefully. I say this not to belittle their importance, but to recognize the matter of pertinence in our discussion. If your projects are requiring you to perform additional normalizations, you may consider picking up Fundamentals of Database Systems by Elmasri & Navanthe or another database schema textbook and doing some independent studies. Hopefully by now you have gained a general understanding of the importance of database relation normalization and how it can improve your database schema. Look for my next entry in this series where we will discuss additional database schema concepts and practices.

.blogtable { width:100%; border:solid 1px #333333; color:White; } .blogtable th { background-color:#555555; color:White; } .blogtable td { background-color:#aaaaaa; color:White; padding:5px; }