Sample Data Generation

Complete Developer Podcast - A podcast by BJ Burns and Will Gant - Thursday

When you start a new software project, you probably don’t spend much time thinking about how to set up useful, realistic test data. By the time you do start thinking about it, your application has usually grown complex enough that the job is no longer easy. You will also be tempted to create your sample data set in a hacky, short-term way. Such solutions can work for a while, but their shortcomings eventually become obvious. In particular, you’ll start to notice bugs in production that would have been caught in development if developers had only had good data.

As your application ages, there are many reasons to pay closer attention to how you manage sample data for developers. Good sample data enables more agility in your development workflow and makes troubleshooting easier. A clean way to generate sample data on developer machines also makes it easier to onboard new team members. And if you make sample data generation a repeatable process, it becomes far easier to do repeatable system testing from a known good state.

Done properly, generating sample data becomes a first-class part of your application development workflow. As the code evolves, the sample data should evolve with it, so that it stays useful for testing how the system will behave in production. While this may seem like a QA responsibility, making it a development responsibility has several distinct advantages. In particular, it tightens the feedback loop between development and QA. It also makes it quicker to set up a new environment, whether for a new developer or for other stages of the development process, possibly including quality assurance.

Having good sample data in your local development environment is critical to writing code effectively there.
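One way to make sample data generation a repeatable process is a seeded generator script that rebuilds the local database from scratch. The sketch below is only an illustration under assumptions not taken from the episode: it uses Python’s standard-library `sqlite3` and `random` modules, and the `customers`/`orders` schema and the `generate_sample_data` function are hypothetical. Because the random number generator is seeded, every run with the same seed produces identical rows, giving each developer and each test run the same known good starting state.

```python
import random
import sqlite3

# Hypothetical schema for illustration only; a real project would use its own tables.
SCHEMA = """
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT  -- nullable on purpose, so edge cases exist in the sample data
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL
);
"""

FIRST_NAMES = ["Ada", "Grace", "Linus", "Barbara", "Ken"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Liskov", "Thompson"]


def generate_sample_data(conn: sqlite3.Connection, seed: int = 42,
                         customers: int = 100, max_orders: int = 5) -> None:
    """Drop and recreate the schema, then fill it with deterministic sample data.

    The same seed always produces the same rows, so every developer (and every
    test run) starts from the same known state.
    """
    rng = random.Random(seed)  # private RNG, independent of the global random state
    conn.executescript(SCHEMA)
    for cid in range(1, customers + 1):
        name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
        # Leave roughly 10% of emails NULL to mirror messier production data.
        email = None if rng.random() < 0.1 else f"user{cid}@example.com"
        conn.execute("INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
                     (cid, name, email))
        # Give each customer between 0 and max_orders orders.
        for _ in range(rng.randint(0, max_orders)):
            conn.execute("INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
                         (cid, rng.randint(100, 50_000)))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    generate_sample_data(conn, seed=42)
    print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 100
```

Rerunning the script with the same seed resets a developer database to exactly the same state, while changing the seed or the row counts yields a different but equally reproducible data set. Deliberately including some NULL values is one way to keep the awkward cases you see in production represented locally.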
Ideally, your sample data will reflect many of the scenarios you are likely to encounter in production. Not only does this make troubleshooting easier, it also increases the likelihood that you will spot potential problems earlier in development. Finally, it makes it much easier to onboard new developers and reduces the friction developers experience when they begin working on a different part of the system.

Episode Breakdown

Why you need to have good sample data in your local working environment

Sparsely populated local development databases are fast, and that speed, along with the missing variety of real data, can mask performance and data integrity issues. If something can happen in production but cannot happen in your local (or, worse, QA) environment, it will consistently surprise you in production.

Having appropriate test data in your local environment lets you do reasonable sanity checks without getting QA involved. This tightens your feedback loop and makes you more productive.

This test data also makes it easier to troubleshoot new scenarios quickly as they come up. Say you find an issue that occurs when a certain column is null in a specific use case: it’s much less effort to null out that column as needed than to create all the required test data manually from scratch.

Additionally, as you make schema and system changes, having appropriate sample data forces you to write proper data migration code, rather than putting it off until it breaks in the QA or production environment.

Why shared dev databases are evil

It can be tempting to use a single database shared by all developers, since that makes it easier to have appropriate test data. However, this approach has far more downsides than upsides. First, it means that the actions of a single developer can break things for...
