The importance of Database Indexing and its effects on efficiency.

'Schema Busters' is a multipart technical overview of Database Design elements that you can implement when designing and accessing your database implementation. In this second section we will discuss the importance of Indexing database columns to speed up data accesses. We will discuss the pros and cons of indexing, and describe the logic behind Indexes.

What is an Index?
Think of database indexes like you would an index in a library book. When you go into the library and need to find a book of a particular genre, you usually head straight to the genre's section. This is similar to performing Select statements on non-indexed tables. You might be looking at the right shelves but you don't necessarily know where to start looking. Indexes help to more quickly identify the location of a book, or in the case of databases a record.

How do Indexes work?
Take our library example from above. What would be an efficient and simple solution to help speed up the time it takes to locate a particular book? Sorting! Let's sort all of the books by their title, and now anybody who knows the title of their book can quickly slim their search down to a significantly smaller number of books. Database indexes work in the same exact way. Indexes keep track of a sorted set of values with an associated row number that allows them to point back to the original record containing the sought index value.

Creating an Index
You can create an index on a table column either at table creation or by using the CREATE INDEX syntax. Listed below are examples of both of these approaches on the same table with the same results:

  • CREATE TABLE Settings
    (
    settingId INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    settingName VARCHAR(128) NOT NULL,
    settingDescription VARCHAR(512) NOT NULL,
    settingGroup VARCHAR(256) NOT NULL,
    settingValue VARCHAR(512) NOT NULL,
    INDEX(settingName),
    INDEX(settingGroup)
    );
  • CREATE TABLE Settings
    (
    settingId INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    settingName VARCHAR(128) NOT NULL,
    settingDescription VARCHAR(512) NOT NULL,
    settingGroup VARCHAR(256) NOT NULL,
    settingValue VARCHAR(512) NOT NULL,
    );
    CREATE INDEX indexSettingName ON Settings (settingName);
    CREATE INDEX indexSettingGroup ON Settings (settingGroup);

Efficiency Comparisons
Indexes manipulate the fact that they are looking through sorted records in order to refine their search scopes to specific regions by doing simple comparisons. While the method that indexes refine their search scopes is dependent upon the DBMS and the type of Index being created, we can still show a generic example of efficiency improvements due to indexing based on the concept of binary searching. Keep in mind that this is a rudimentary example of indexing logic and is simply an estimate of efficiency gains. Again we will refer to our library example when determining efficiency gains. Let's assume that the number of books contained in this non-indexed library is N, where N is any real whole number greater than 0.

Imagine that you are looking for a particular book in a non-indexed library. Where do you start? Best case scenario is you will pick up the book you want first, however in the worst case scenario you could end up looking at every single book before finding the one you want. This yields a best case of 1 comparison versus a worst case of N comparisons.

Now imagine that you are looking for the same book in an indexed library sorted by book title. We can now logically eliminate half of the books to look through simply by comparing the middle book title with our sought after title. In one row comparison we have halved our search time, and the greatest part is this process can be applied recursively! This yields a best case of 1 comparison versus a worst case of log2(N) comparisons. The logarithmic base of 2 comes from our division into two subsets equivalent in size.

First let's note that in the best case we gain nothing, however efficiency is better evaluated in the worst case or in an aggregate of the two. In the worst case we considerably reduce the amount of books we will have to look through. If you are familiar with Big-O Notation you will notice this difference immediately. If you're not familiar with Big-O Notation, try substituting 1,000 for N. Our non-indexed library has a worst case of 1,000 searches whereas our indexed library has a worst case of approximately 10 searches. It takes 1% of the time to find the book we want in the non-indexed library (if we're considering the worst case scenario)!

Side Note
If you have been using databases for a decent amount of time you have probably already come across indexes but may not be immediately aware of it. Primary Keys and Unique constraints are actually forms of indexing. By default these indexes are clustered, but you can specify otherwise when creating the constraint. This is important to note when deciding how to define primary keys for your tables.

Index Issues To Consider
Indexing is only beneficial to access queries, so before you go and add indexes to all of your tables on every column possible know that indexing can have a negative effect on your database performance. Indexing columns will increase the time it takes to perform INSERT, UPDATE, and DELETE commands since the index will also need to be modified appropriately to reflect changes. Always ask yourself 'Will the majority of activity on this table be SELECT statements?' and 'Where do I want to see stronger performance?' before adding indexes to a table.

It is also necessary to consider which columns you wish to index. Take our library example again: If we knew the author's name but not the book title, then we would gain no benefit from an index on book titles. We would need to add an index on the author attribute instead of the book title column. In all reality you would probably benefit from an index on both attributes, but it is hard to understand multiple indexes with our library example since we cannot sort books by author and title separately due to physical limitations of the books. Fortunately there are no physical limitations with database data so it is entirely plausible to have two separate and distinct indexes. Choosing which columns to index is highly dependent upon project and client specifications. If you are having trouble identifying where you need to insert indexes consider using a database analyzer that can track where you are experiencing the most transaction traffic and provide suggestions that may benefit your database design.

Conclusion
Almost every project you come across can benefit from indexing database tables. Learning how to properly create indexes and utilize them to your advantage can further increase their usability, and the best part is they are relatively simple to understand and implement. Indexing is one of the primary means of boosting application speed, especially with a database-heavy application. For more information on indexing consider picking up Fundamentals of Database Systems by Elmasri & Navanthe or another database schema textbook.