Database Resilience
Complete Developer Podcast - A podcast by BJ Burns and Will Gant - Thursdays
Your application’s data is probably one of the most valuable and longest-lived pieces of your application. When your application goes to the big bit bucket in the sky, its data will likely live on, perhaps being migrated into another system that is, itself, waiting for its own eventual obsolescence. Such valuable data is unlikely to remain the sole property of your application for very long, no matter how much you try to keep other people in your company out of it. As a result, if you want your application to be stable, one of the main sources of problems you’ll encounter is your own database.
Further, because databases hold such critical data and because they are the “furthest in” component in most application models, it’s much harder to scale them effectively and safely. A small issue in a database can ripple outward, potentially crippling multiple applications. Besides the ordinary worries of things like network issues and hardware failures, the very way that databases tend to get used in modern apps can cause scaling issues. Making things worse, many database stability and resiliency issues don’t make themselves apparent until the system is already under load, which makes it difficult to find problems before they occur. Many developers also don’t do a good job of making sure that their code doesn’t cause problems at the database level. Finally, the causes of many database issues are transient, involve the interactions of multiple disparate systems, and often include timing considerations.
However, there is some hope in all this. There are some realistic things you can do that will improve the stability of your database and of your interactions with it. These things will not only make it more likely that the database remains in a working state during your application’s lifecycle, but they will also make it less likely that your application (which depends on its database) gets broken by another client of the same database.
Most of these suggestions can be simply summarized as “do less stuff on the main database server”, but like most summaries, that’s not enough to act on. Databases are the most brittle part of many applications. Not only do they contain the most critical data for a system, but load management strategies for databases are always more complex than simply making another copy of the data somewhere. A broken database can also break a lot of things. Even if you think your application is the only one that will ever use your database, you’ll probably find that this doesn’t hold true over time. The database is often the easiest, but not the most sustainable, place to hook into the workings of an existing application, especially if one wants to avoid asking the team about it. As a result, having a resilient and robust database implementation is critical to application stability over the long term. Database resiliency covers a lot more than “just” system administration tasks and often has implications that reach into code as well.
Episode Breakdown
Keep schemas backward-compatible with a couple of versions of deployed applications
While it can be tempting to do things like deleting columns in a single step, doing so can wreak havoc with application stability. You’ll want to update the applications that use a column to no longer do so, WELL BEFORE deleting the column. While columns are a simple example of this, other parts of your database also matter in this regard. For instance, deleting or changing an index can have similar implications. While nothing will explicitly break, removing an index that an application needs can significantly degrade database performance, so it is best avoided until the index is no longer in use. You should also be cautious when adding columns (especially required ones), changing column types,