More minute methods to further improve your database performance
'Schema Busters' is a multipart technical overview of database design elements you can implement when designing and accessing your database. In this third installment we will discuss several tricks and tweaks to consider when constructing your database schema and queries. While some of the proposed methods are fundamental and broadly applicable, others are the subject of considerable debate because they are highly situational. Take your application logic into careful consideration before adopting these methods, since some of them can actually work to the detriment of your system. The examples below are intended for a MySQL implementation, but I am confident that similar issues can be resolved in a similar fashion on other database engines. With that said, let's begin with an extremely MySQL-specific concern.
MySQL Storage Engines and the COUNT Function
While administering your database you may have noticed that MySQL offers two primary storage engines: MyISAM, which is well suited to read-heavy applications, and InnoDB, which is well suited to write-heavy applications. In general the two perform similarly, one notable exception being how they handle the COUNT function. Before discussing the benefits of MyISAM's meta-data collection, I would like to remind readers that InnoDB has considerable benefits when building an insert/update-heavy application, which is common among company intranet applications. Since the large majority of lifeBLUE's projects involve front-facing websites wrapped around a back-end management utility, we frequently choose the MyISAM engine over InnoDB.
MyISAM keeps an internal cache of table meta-data, whereas InnoDB keeps no such cache. One of the values found in this meta-data is the table's row count, which in turn means an unfiltered COUNT(*) call (one with no WHERE clause) has minimal cost. InnoDB, on the other hand, has to read every row present in the table to evaluate a COUNT(*) call, which is about as expensive as running SELECT * against the table. The major drawback to utilizing MyISAM is the loss of InnoDB-specific functionality such as transactions, row-level locking, and foreign key constraints.
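As a sketch (the table and column names here are hypothetical), the storage engine is chosen per table at creation time, and the cheap count applies only to the unfiltered form:

```sql
-- Hypothetical example: the engine is selected per table with the ENGINE clause.
CREATE TABLE page_views (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL
) ENGINE=MyISAM;

-- Answered from MyISAM's cached row count -- effectively free.
SELECT COUNT(*) FROM page_views;

-- A WHERE clause forces a real scan (or index read) on either engine.
SELECT COUNT(*) FROM page_views WHERE url LIKE '/blog/%';
```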
Subqueries vs. Joins
For those unfamiliar with the terminology, subqueries embed one query within another, most commonly within SELECT statements, while a JOIN operation produces a combined result set from two different tables linked by matching values in each. Two example queries with the same results are provided below:
SELECT d.Id, (SELECT alias FROM resources r WHERE r.Id = d.Id) AS alias FROM document d
SELECT d.Id, r.alias FROM resources r INNER JOIN document d ON (r.Id = d.Id) GROUP BY r.Id
A general rule of thumb is to replace correlated subqueries with joins. A correlation occurs when the subquery references a table from the outer query; in our example, the subquery is correlated because it references the d.Id property of the document table, which belongs to the outer query. It is also important to point out that implicit JOIN operations can be less efficient than explicit JOIN operations, since the join condition is not specified up front, which can result in larger intermediate table reads. Above we utilized an explicit JOIN operation, and below is the same query as an implicit JOIN operation:
SELECT d.Id, r.alias FROM resources r, document d WHERE r.Id = d.Id
If you want to take it a step further, pay close attention to the type of join being used in order to reduce the size of the returned dataset (types of JOIN operations explained here).
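As an illustration using the same tables as above, an INNER JOIN returns only matched rows, while a LEFT JOIN preserves every row from the left table and therefore tends to return a larger dataset:

```sql
-- INNER JOIN: only documents that have a matching resource row.
SELECT d.Id, r.alias
FROM document d
INNER JOIN resources r ON r.Id = d.Id;

-- LEFT JOIN: every document, with a NULL alias where no resource matches;
-- typically a larger result set than the INNER JOIN above.
SELECT d.Id, r.alias
FROM document d
LEFT JOIN resources r ON r.Id = d.Id;
```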
Table Partitioning
This practice requires a strong understanding of the application logic, and should be evaluated from both a usability and a development standpoint. There are two approaches to this kind of table partitioning (sometimes called vertical partitioning): read-optimization and write-optimization. Let's start with read-optimization:
The idea is simply to maintain a smaller table of frequently accessed data and a second table to hold infrequently accessed data. Imagine that you have an extremely generic content management system where each page contains an alias, a description, a URL, and an HTML body, all in a single-table layout:
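A minimal sketch of that single-table layout (the column names follow the text; the names and sizes are assumptions):

```sql
CREATE TABLE page (
    Id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    alias VARCHAR(100) NOT NULL,        -- page name
    description VARCHAR(255) NOT NULL,  -- short summary of the content
    url VARCHAR(255) NOT NULL,          -- page location
    body TEXT NOT NULL                  -- full HTML body
) ENGINE=MyISAM;
```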
The alias and description simply give a name and summary of the content, and the URL is obviously the page location. The HTML body will most likely be accessed only when the page itself is requested, whereas the alias, description, and URL may also be displayed in search results and recent-item listings. Our resulting tables would look like this:
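Sketching the resulting split (again, the names and sizes are assumptions), the bulky body moves into its own table keyed by the page Id:

```sql
-- Frequently accessed columns stay in the narrow table.
CREATE TABLE page (
    Id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    alias VARCHAR(100) NOT NULL,
    description VARCHAR(255) NOT NULL,
    url VARCHAR(255) NOT NULL
) ENGINE=MyISAM;

-- The HTML body is fetched only on a direct page view.
CREATE TABLE page_body (
    pageId INT NOT NULL PRIMARY KEY,  -- matches page.Id
    body TEXT NOT NULL
) ENGINE=MyISAM;
```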
For write-optimization we apply the exact same practice, just with the understanding that we are now separating a table's frequently updated values from its infrequently updated ones.
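As a hypothetical illustration of the write-optimized split, a per-page view counter that changes on every request could be moved out of the mostly-static page table so that the hot updates touch only a tiny row:

```sql
-- Updated on every page view; keeping it separate avoids rewriting
-- rows in the mostly-static page table on each request.
CREATE TABLE page_stats (
    pageId INT NOT NULL PRIMARY KEY,  -- matches page.Id
    viewCount INT NOT NULL DEFAULT 0
) ENGINE=MyISAM;

UPDATE page_stats SET viewCount = viewCount + 1 WHERE pageId = 42;
```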
Data Types & Size Constraints
There are several ways to tighten up your data usage by manipulating size constraints and data types. I will provide a few examples and leave the rest to the imagination. By now you should understand that a VARCHAR column stores a length prefix (one byte for columns of up to 255 characters) to keep track of how many characters each entry contains. If we are handling entries with a known fixed size, such as ZIP codes, it is beneficial to use the CHAR data type instead of VARCHAR, since doing so eliminates the length prefix. This may seem small at first, but after a thousand entries we have effectively eliminated a kilobyte of additional data. The worst memory hits come in the form of redundant data entries, so it is important to implement proper database normalization to reduce redundant entries along with other undesirable attributes. See my first Schema Busters entry for more information on database normalization.
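For example, in a hypothetical address table, the fixed-width five-character ZIP code fits a CHAR while the variable-width street line stays a VARCHAR:

```sql
CREATE TABLE address (
    Id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    street VARCHAR(100) NOT NULL,  -- variable length: VARCHAR is appropriate
    zip CHAR(5) NOT NULL           -- fixed five characters: no length prefix needed
) ENGINE=MyISAM;
```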
If you have been following along from the first entry in this series, you should now have enough information to crank out an extremely clean and efficient database design. If some areas still remain grey, I highly recommend picking up a textbook on database systems, such as Fundamentals of Database Systems by Elmasri & Navathe. Also be sure to check back for future entries on improving database and application efficiency. From this point on we will be moving into application-level concepts to improve efficiency and promote usability; any additional database-related entries can be expected in the form of an addendum.