I love data. There, I said it. “Wait…the title says that you hate data migrations? What gives?” If you read the title and the first line and said “yep, makes sense”, you’re in good company. Data is complicated, and often inconsistent (especially user input data, sigh). You always hope you’ve put everything in place to keep it consistent, but at some point something will undoubtedly go wrong. You cannot account for every situation that will arise while developing software (or building anything for that matter). If you can, congratulations, that’s awesome, but you are the exception to the norm.
Maintaining a consistent data structure has to be one of the hardest things I’ve done as a software engineer. CSS may be more frustrating, but you can at least rapidly iterate on finding the solution. Additionally, even though CSS is arguably the most front-facing part of your application, and you may get the most complaints that boil down to a CSS issue, it usually won’t cause a war room situation. Just a critical priority hot fix. If the data is off in one place for one account, it can send the entire company into a panic. “What’s wrong? Why is it off? Is it system-wide or just a one-off? How long has this been happening? How quickly can we diagnose and fix it? How confident are we that this cannot happen again?” And the list goes on and on. It’s highly stressful and can lead to multi-day war rooms. Not fun. If you are a young, budding engineer, it can be a great time to learn from your senior counterparts, but be prepared for everyone’s worst tendencies to come out. You’ll possibly feel inadequate. That’s all okay; you are learning. But anyway, back to data.
I’ve done a ton of data migrations at this point; it’s part of working at a startup. Sure, they aren’t necessarily as complicated as those in large corporations where you have to completely migrate to new machines, coordinate with all the teams that reference that data, version the API endpoints, etc. What I’m most familiar with is the sheer volume of them. It seems like every feature can have a data migration of some kind in the startup world. Some requirements changed, and now the feature you built has to do something different, or you missed an abstraction and now the data is too specific.
It’s so easy to make a mistake, and when (not if) you do, I hope you catch it immediately. I also hope you have routine database backups configured to be able to quickly restore the data. Whether that is through restoring it to a recovery instance and copying over the data, or fully rolling the database back, you’ll at least save the day. It will absolutely suck and derail everything else you’ve been working on, but it’s better than losing everything (including customers).
This is also a public service announcement to please configure a regular cadence of database backups.
— Me
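To make the "restore it and save the day" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is only an illustration, not how any particular production setup works: the function name `migrate_with_snapshot` is made up for this example, the snapshot lives in memory, and it assumes the caller has no open transaction on the connection. A real system would lean on its database's own backup tooling instead.

```python
import sqlite3


def migrate_with_snapshot(live: sqlite3.Connection, statements: list) -> bool:
    """Hypothetical helper: snapshot the database, run the migration,
    and copy the snapshot back if any statement fails.

    Assumes `live` has no uncommitted work when this is called.
    """
    snapshot = sqlite3.connect(":memory:")
    live.backup(snapshot)  # full copy of the database before we touch it
    try:
        for statement in statements:
            live.execute(statement)
        live.commit()
        return True
    except sqlite3.Error:
        live.rollback()        # undo any uncommitted row changes
        snapshot.backup(live)  # overwrite the live database with the snapshot
        return False
    finally:
        snapshot.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('ada')")
    conn.commit()

    # The second statement fails, so the rename is undone by the restore.
    ok = migrate_with_snapshot(
        conn,
        ["ALTER TABLE users RENAME TO people", "UPDATE missing_table SET x = 1"],
    )
    print(ok, conn.execute("SELECT name FROM users").fetchall())
```

The snapshot matters here because a plain rollback is not guaranteed to undo schema changes that were already committed; copying the snapshot back restores the whole database regardless.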
The best situation is if you are at a startup early enough (with few or no customers) that you have time to fix it. You can fix it over a few days/weeks (or even just completely reset the database and recreate everything). There will come a time when this is no longer possible, but take advantage of it while you can. This is where I’m at right now, and it has been glorious (particularly since we are dealing with a large volume of data from the blockchain).
Speaking of the blockchain (a distributed immutable ledger), that has been one of the best databases to build on top of, especially if you are indexing it for part of your application. If your data gets out of date, corrupted, or needs to be migrated to a different structure, you can always recreate it from scratch. It will take a while, but it’s always possible. You will have other problems with queries (if trying to query in real time for your UI), rate limiting (if using an API provider instead of your own node), ensuring you are only syncing validated blocks, and others, but I think it’s an acceptable trade-off. This also assumes that either the data for your application is on the blockchain (a decentralized app AKA dApp) or that your customers’ data is on the blockchain. At this point, I have a hard time imagining why I would build something that doesn’t utilize the blockchain in some way.
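Here is a rough sketch of what "recreate it from scratch" can look like when the ledger is the source of truth. Everything in it is an assumption made for illustration: the `Block` and `Transfer` shapes, the `rebuild_balances` name, and the use of a simple confirmation depth as a stand-in for "validated" blocks. A real indexer would read blocks from a node or API provider rather than from an in-memory list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    amount: int


@dataclass(frozen=True)
class Block:
    number: int
    transfers: tuple[Transfer, ...]


def rebuild_balances(blocks: list[Block], confirmations: int = 2) -> dict[str, int]:
    """Replay the chain from the first block to rebuild a balance index.

    Blocks within `confirmations` of the chain head are skipped, as a
    simplified stand-in for "only sync validated blocks".
    """
    if not blocks:
        return {}
    head = max(block.number for block in blocks)
    safe_height = head - confirmations
    balances: dict[str, int] = {}
    for block in sorted(blocks, key=lambda b: b.number):
        if block.number > safe_height:
            break  # too recent to trust yet
        for transfer in block.transfers:
            balances[transfer.sender] = balances.get(transfer.sender, 0) - transfer.amount
            balances[transfer.recipient] = balances.get(transfer.recipient, 0) + transfer.amount
    return balances


if __name__ == "__main__":
    chain = [
        Block(1, (Transfer("mint", "alice", 10),)),
        Block(2, (Transfer("alice", "bob", 4),)),
        Block(3, (Transfer("bob", "carol", 1),)),
        Block(4, (Transfer("alice", "carol", 6),)),
    ]
    print(rebuild_balances(chain, confirmations=2))
```

Because the function depends only on the blocks it is given, throwing away a corrupted index and replaying the chain produces the same result every time, which is the property that makes these migrations so forgiving.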
All of this is to say that, even with the war rooms and stressful migrations, I still love data. The problems to solve have always been fascinating to me, even when they are frustrating. I remember being in high school and really loving pattern recognition. It seemed like my brain was really good at that type of problem. I was literally searching the internet for “jobs for people good at pattern recognition”, trying to figure out what I wanted to do (but not really finding anything that felt right). I ended up enjoying software engineering for the logical reasoning, but it wasn’t until I started getting into the database side of engineering that I realized the data problems were exactly what I had been looking for.