We recently had some performance issues in an application that had been in production for almost 3 years. This application is highly transactional and had grown almost 5-fold in volume. It brings together data from multiple systems. It had already been through a couple of rounds of optimization to the web application. That left us with the database as the most likely candidate for performance improvements. All the usual suspects were checked: indexes (coverage and fragmentation), updated statistics, waits, locks, and hardware bottlenecks. None of these turned up any issues that needed to be addressed. So, we turned our attention to the database schema itself.
A review of the schema showed that the same two tables were being joined to grab one piece of data for almost every query made on the database. To explain this data, we need to explain the application a little more. Of all the systems being brought together, one system has the identifier that is common for all of them. It was this identifier that required two table joins to get. While the database had been through a good round of database design including normalization, it was time to denormalize this field.
Denormalization is essentially putting a field on a table that could have been otherwise read or derived from existing tables. In this case, we already had a routine that was matching the main table with the reference to the tables with our field. So, this routine was modified to add the identifier to the main table at the time a record is associated. The risk in adding the identifier field is that the identifier could change in the source tables and that wouldn’t be reflected in the main table. In this case, the identifier cannot change. So, we had a good solution. We refactored the application to use the field from the main table.
Prior to making this change, we had performance tested the web application. After making the change, we ran the performance test again. Moving this single field, allowed 100% more users before any page loads averaged more than 3 seconds. It decreased our average page load time by 83%. Similarly, it decreased the worst 5% of our page load times by 86%.
While normalization is a good thing for a relational database, the results clearly show that sometimes you can be too normal. So the next time you have an application that could use a performance boost, don’t forget to think a little less normal.