For what ought to be a precise industry, computing is surprisingly loose in a few semantic areas, and one of those is the term 'database'. A database is, essentially, any store of data: it might be an Excel spreadsheet of customer names or even a Word document of contact addresses.
More properly, we use the term to describe a formally structured and organised store of information. These datastores are created, managed and accessed by specific software, one of the most familiar examples being Microsoft Access.
The data is held in separate 'tables' and these are linked together using the data items themselves. So, for example, customer names and details might sit in one table while the customer addresses sit in another. Each address is on a separate row and has a unique reference, and that reference is also held on each customer row to which the address applies.
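For readers who are curious what that looks like in practice, here is a minimal sketch using Python and its built-in sqlite3 module; the table names, column names and sample rows are all invented for illustration.

    import sqlite3

    # An in-memory database, purely for illustration.
    conn = sqlite3.connect(":memory:")

    conn.execute("""
        CREATE TABLE address (
            address_id INTEGER PRIMARY KEY,  -- the unique reference
            street     TEXT NOT NULL,
            town       TEXT NOT NULL,
            post_code  TEXT
        )""")

    conn.execute("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            address_id  INTEGER REFERENCES address (address_id)  -- the link between the tables
        )""")

    # One address shared by two customers: the unique reference sits on the
    # address row and is repeated on each customer row it applies to.
    conn.execute("INSERT INTO address VALUES (1, '10 High Street', 'Norwich', 'NR1 1AA')")
    conn.execute("INSERT INTO customer VALUES (1, 'A. Smith', 1)")
    conn.execute("INSERT INTO customer VALUES (2, 'B. Jones', 1)")

    # Joining the two tables back together using the shared reference.
    rows = conn.execute(
        "SELECT c.name, a.street, a.town FROM customer c "
        "JOIN address a ON a.address_id = c.address_id")
    for name, street, town in rows:
        print(name, "lives at", street + ",", town)

The JOIN at the end is the database doing exactly what the paragraph above describes: stitching the customer rows and the address rows back together using the shared reference.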
The example above consists of just two tables, but even a small business system will comfortably have ten times that number. Indeed, database design is an art in itself. When we take on a new project and the initial business analysis is complete, we will sketch out a data model - an initial design for the database - to check our understanding and identify areas requiring further analysis. Over the years we have found that once we have the data model right, many aspects of the system we are constructing will fall into place.
As well as being the foundation of a new system, the database remains its most important feature. Screens can be badly designed, load times may be poor, navigation may be unintuitive, but these are all faults that can be fixed and, once fixed, leave no sign that anything was ever wrong. Mistakes in your data are not so forgiving.
Let me break out here and tell you a story from earlier in my career. I was working as a test manager on a migration project for a large financial institution. All of the existing data - clients, accounts, balances and so on - was being moved from a system that was twenty years old onto a new one. My job was to make sure that it all transferred correctly.
Everything went well in testing and the time came for us to do a trial migration. As part of the migration process, we generated a lot of reports so that we could tally the numbers between the original system and the new one: number of clients, number of accounts, total value and so on. We even threw in some figures with no immediate business value - average number of accounts per client, mean total per client - just so we could be sure that everything matched.
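In essence, the kind of tally we were producing boils down to something like the sketch below. The figures and the shape of the data are made up, and the real reports were considerably more involved, but the principle is the same: produce identical summary figures from both sides and compare them.

    from statistics import mean

    # Dummy data: each entry is a (client_id, balance) pair extracted from one system.
    old_system = [(1, 100.0), (1, 250.0), (2, 80.0)]
    new_system = [(1, 100.0), (1, 250.0), (2, 80.0)]

    def reconciliation_figures(accounts):
        clients = {client_id for client_id, _ in accounts}
        balances = [balance for _, balance in accounts]
        return {
            "number of clients": len(clients),
            "number of accounts": len(accounts),
            "total value": sum(balances),
            "accounts per client": len(accounts) / len(clients),
            "mean balance": mean(balances),
        }

    # The check is simply that both sides produce identical figures.
    assert reconciliation_figures(old_system) == reconciliation_figures(new_system)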
The testing had been thorough and I was very pleased with what we'd done to date, so it is no exaggeration to say that I was horrified when our trial run against live data returned results best described as nonsense. I reported the problem at the project board meeting and assured my colleagues that we would get to the bottom of the matter.
Two weeks went by and the results against our test data continued to be spot on, as you would expect, but the more we delved into the live data, the more confused we became. Armed with a set of inexplicable examples, I returned to the project board and a member of the user group came clean: at some point in the system's history, a bug in the code had meant that the relationship between a client and their accounts was not recorded properly. Without these relationships in place, of course, we had no way of making a meaningful comparison between the old system and the new one. Furthermore, we had to - effectively - build bugs into the new system to accommodate this 'junk' data from the old one.
Quite apart from the importance of testing, this story emphasises the fact that if there is one aspect of an IT system where you cannot afford to make mistakes, it is with your data. Once data is lost or corrupted, there is no easy way of correcting the problem. Indeed, having 'holes' in your data can commit you to a future of weak software, where the processes and validations have to be weakened to compensate for the mistakes that you have made in the past.
When businesses change hands, websites and IT systems are assets of the sale and so, of course, is the data. If you aren't selling your business, then your data remains at the heart of what you do. Either way, it is vital that you take care of that information. Here are my recommendations for doing just that:
- Make sure that you, your IT department or your IT provider understand your data.
- When building a system, ensure that your data is captured in an appropriate structure.
- Back up all your data daily (a bare-bones sketch of what that can look like follows this list).
- When you make changes to the system, particularly - but by no means exclusively - to your data model, ensure that you carry out an appropriate amount of regression testing so that your data is not adversely affected.
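To give a flavour of that third recommendation, here is a bare-bones daily backup written in Python. The file locations are assumptions, and a real arrangement should also keep copies away from the premises, but the principle - a dated copy of the data, kept somewhere other than the live machine, made every day - really is this simple.

    import shutil
    from datetime import date
    from pathlib import Path

    # Hypothetical locations: the live data file and a folder on a separate
    # disk (or, better still, a separate machine) where dated copies are kept.
    LIVE_DATABASE = Path("/var/data/business.db")
    BACKUP_FOLDER = Path("/mnt/backups")

    def daily_backup() -> Path:
        BACKUP_FOLDER.mkdir(parents=True, exist_ok=True)
        target = BACKUP_FOLDER / f"business-{date.today().isoformat()}.db"
        shutil.copy2(LIVE_DATABASE, target)  # copy the file, keeping its timestamps
        return target

    if __name__ == "__main__":
        print("Backed up to", daily_backup())

Scheduled to run once a day - with cron on Linux or Task Scheduler on Windows - that alone puts you in a far better position than many small businesses.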
Prevention is better than cure, but what do you do if you already have data that is corrupted? You need to cater for the existing data that is substandard, but that is no reason to weaken the processing around your new data. The simplest measure is to flag your old data so that your validations know when to apply strict rules and when to apply weaker ones. Suppose, for example, that you insist that anyone registering with you provides a post code, but that this rule was, at some point, not enforced. By flagging those addresses with no post code as, say, 'weak data', you can skip the post code validation when processing those addresses, whilst being as rigorous as you like with data that is not affected.
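Here is a sketch of what that flag can look like in code; the field names are invented and a real system would validate far more than this, but it shows the strict rule applying only to data captured under the current regime.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Address:
        street: str
        town: str
        post_code: Optional[str]
        weak_data: bool = False  # set on rows captured before the post code rule was enforced

    def validate(address: Address) -> list:
        """Return a list of problems; an empty list means the address passes."""
        problems = []
        if not address.street:
            problems.append("street is required")
        if not address.town:
            problems.append("town is required")
        # Historic rows flagged as 'weak data' are allowed through without a post code;
        # everything else is held to the strict rule.
        if not address.post_code and not address.weak_data:
            problems.append("post code is required")
        return problems

    old = Address("10 High Street", "Norwich", None, weak_data=True)
    new = Address("12 High Street", "Norwich", None)
    print(validate(old))  # [] - the historic record is tolerated
    print(validate(new))  # ['post code is required'] - new data must meet the standard

The important point is that the weakness is confined to the old records: everything captured from now on is held to the full standard.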
The bottom line here is to appreciate your data. Your understanding of your business almost certainly maps down to the data (whether you are aware of that or not!) and you must make sure that the people who build and manage your systems take the time to listen to you and thus share that understanding. If your data is corrupted or lost, that is equivalent to losing a filing cabinet in a fire, so you need to protect the data you have and treat it with respect. I think I have mentioned before that, of the businesses located in the Twin Towers, a third went under after 9/11 because they had no backups of their business data.