I have been thinking a lot lately about keeping users away from their data. This isn’t something new that I have been doing; it is fairly standard practice, especially in object-oriented programming. It is just that I have been thinking about it more formally.
The first question is: why should a user be kept from their data? Perhaps a little history is in order.
Many years ago, and even today in legacy systems, large amounts of data were held on mainframes. In the commercial world this often meant writing programs in COBOL against an IMS hierarchical database or a relational database such as DB2, with CICS handling the online transactions. The analysis and database modelling were done with data flow diagrams and entity relationship diagrams. The programs would fetch the data and display it on the screen in fields produced by a screen overlay generator. The user could edit the data or enter new data, and save it back to the database.
The whole procedure was data driven.
Then along came Windows with its WIMP (Windows, Icons, Menus, Pointer) user interface, and everything changed. Sort of.
Programs were now often UI-centric (user interface centric) rather than data-centric. This sounds like a good thing; after all, shouldn’t application developers concentrate on their users? Of course they should. But this approach led to designing the user interface first and then, more often than not, fetching the data to display in the various controls on each form. So in many cases it was, in effect, no better than the data-centric approach.
The other problem with UI-centric design is that users often don’t really know what they want. This isn’t because users are not very bright; many of them are extremely bright. Nor is it because users are not programmers. Whenever I build an application for myself (and many, if not most, programmers build their own tools from time to time), I am never sure what I want at the outset.
So then, how are we to build applications?
If we return to the premise of this article, that users should be kept away from their data, and explain why, we will be in a better position to answer this question.
Data is the most valuable part of any computer application. Indeed, it is often the most valuable asset in the business. Valuable assets need to be protected, and the easiest way to protect them is to keep them under lock and key. So it is with data: it is valuable, so keep it locked away. I also keep it locked away from the prying eyes of developers, including myself, but that is a subject for another time.
The reason I keep it locked up and inaccessible to users is that they don’t need to see it, nor do they want to, and if they did see it they probably wouldn’t understand it. Perhaps I should explain.
Users are not interested in data; they are interested in information. The common example given in all the books on programming is that of a customer who places an order. The user is interested in the order, and then in the invoice that is sent when the order is filled. But if the data is kept in a relational database, there will be tables for, perhaps, Customer, Order, OrderItems, Products, Invoices and InvoiceItems, and maybe some tables to take care of many-to-many relationships. The user doesn’t care, because the user isn’t a relational database analyst. If you want to test this, the next time you are at a dinner party try to start a conversation about relational databases. Two things will happen. No one will talk to you, and you won’t receive another invitation to a party, anywhere, ever.
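To make the gap between data and information concrete, here is a minimal Python sketch, assuming a simplified version of the tables above (only Customer, Products, Order and OrderItems). The row classes, their column names and the `build_order_summary` function are hypothetical. The normalized rows are what the database holds; the `OrderSummary` is what the user actually wants to look at.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical rows, one class per normalized table.
@dataclass(frozen=True)
class CustomerRow:
    customer_id: int
    name: str

@dataclass(frozen=True)
class ProductRow:
    product_id: int
    description: str
    unit_price: Decimal

@dataclass(frozen=True)
class OrderRow:
    order_id: int
    customer_id: int

@dataclass(frozen=True)
class OrderItemRow:
    order_id: int
    product_id: int
    quantity: int

# The information the user cares about: one order, readable as a whole.
@dataclass(frozen=True)
class OrderLine:
    description: str
    quantity: int
    line_total: Decimal

@dataclass(frozen=True)
class OrderSummary:
    order_id: int
    customer_name: str
    lines: list
    total: Decimal

def build_order_summary(order_id, customers, products, orders, items):
    """Join the normalized rows into one user-facing OrderSummary."""
    order = next(o for o in orders if o.order_id == order_id)
    customer = next(c for c in customers if c.customer_id == order.customer_id)
    products_by_id = {p.product_id: p for p in products}
    lines = [
        OrderLine(
            products_by_id[i.product_id].description,
            i.quantity,
            products_by_id[i.product_id].unit_price * i.quantity,
        )
        for i in items
        if i.order_id == order_id
    ]
    total = sum((line.line_total for line in lines), Decimal("0"))
    return OrderSummary(order_id, customer.name, lines, total)

summary = build_order_summary(
    1,
    customers=[CustomerRow(7, "Smith")],
    products=[ProductRow(10, "Widget", Decimal("2.50")),
              ProductRow(11, "Gadget", Decimal("4.00"))],
    orders=[OrderRow(1, 7)],
    items=[OrderItemRow(1, 10, 4), OrderItemRow(1, 11, 1)],
)
print(summary.customer_name, summary.total)  # Smith 14.00
```

Only the assembly function needs to know how the tables relate to one another; the screen that shows the order works with `OrderSummary` alone.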
The second reason that data should be separated from users is the old adage: garbage in, garbage out. That used to be tolerated, but it is no longer acceptable; garbage in is just not an option these days. Of course, we cannot prevent users from entering inaccurate data. If the surname should be Smith and the user enters Jones, there is nothing we can do about it, short of writing some validation that changes all instances of Jones into Smith. But I don’t think that would be acceptable.
But what if the user enters Smith into the Date of Birth field on the screen? That we can prevent. The question is where and how. With a purely data-centric view we could attempt to validate at the database level by writing some code in SQL. Now, SQL is a very powerful language in the hands of experts, but really, life is too short to go down this route. The other problem is that you are making a trip to the database for no good reason, which is slow, and painfully slow if you are operating over a network.
So the validation must take place before the data hits the database. The usual place is a business layer, although in a small application you can put it in the presentation layer. Either way, the validation is written in the same language as the application, which is a much easier proposition. If the validation fails, the user is informed and the database is never touched.
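A minimal Python sketch of that idea, assuming the Date of Birth field arrives as text in ISO `YYYY-MM-DD` format. `ValidationError`, `parse_date_of_birth`, `save_customer` and the `InMemoryCustomerStore` (a stand-in for the real database) are all hypothetical names. The point is simply that the check runs in application code and fails before any write is attempted.

```python
from datetime import date, datetime
from typing import Optional

class ValidationError(ValueError):
    """Raised by the business layer when input fails validation."""

def parse_date_of_birth(text: str, today: Optional[date] = None) -> date:
    """Convert the raw field text to a date, or raise ValidationError."""
    today = today or date.today()
    try:
        # Assumed input format: ISO YYYY-MM-DD.
        dob = datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        raise ValidationError(
            f"Date of birth {text!r} is not a valid date (expected YYYY-MM-DD)"
        ) from None
    if dob > today:
        raise ValidationError("Date of birth cannot be in the future")
    return dob

class InMemoryCustomerStore:
    """Hypothetical stand-in for the database; records every insert."""

    def __init__(self) -> None:
        self.rows: list = []

    def insert(self, surname: str, dob: date) -> None:
        self.rows.append((surname, dob))

def save_customer(surname: str, dob_text: str, store: InMemoryCustomerStore) -> None:
    """Business-layer entry point: validate first, write only if valid."""
    dob = parse_date_of_birth(dob_text)  # raises before the store is touched
    store.insert(surname, dob)

store = InMemoryCustomerStore()
try:
    save_customer("Jones", "Smith", store)
except ValidationError as err:
    print(f"Rejected: {err}")
print(len(store.rows))  # 0
```

Entering Smith as a date of birth raises `ValidationError`, which the presentation layer can turn into a message for the user; the store is never called.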
This leads to another point. Every application embodies a number of design decisions, and one of them is robustness versus correctness. A robust application will always try to do something to keep the application running. A correct application will never return an incorrect result, and may even shut down if there is an error. Information can come from many sources, not just user input. How that information is handled when there is an error is a design decision, as is the type of validation to be done on that information before it is sent to the database.
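One way to make that decision explicit is to pass the policy in rather than bury it in the code. This is a minimal Python sketch, assuming a quantity value that may come from any source (a user, a file, a feed); the `Policy` enum, `read_quantity` and its default of 0 are hypothetical.

```python
import logging
from enum import Enum

class Policy(Enum):
    ROBUST = "robust"    # keep running: substitute a safe default
    CORRECT = "correct"  # never return a wrong result: stop on bad input

def read_quantity(raw: str, policy: Policy, default: int = 0) -> int:
    """Interpret a quantity from any source under the chosen policy."""
    try:
        value = int(raw)
        if value < 0:
            raise ValueError(f"negative quantity {value}")
        return value
    except ValueError as err:
        if policy is Policy.ROBUST:
            logging.warning("Bad quantity %r (%s); using default %d", raw, err, default)
            return default
        raise  # CORRECT: let the caller stop or handle it

print(read_quantity("5", Policy.CORRECT))   # 5
print(read_quantity("abc", Policy.ROBUST))  # 0, after logging a warning
try:
    read_quantity("abc", Policy.CORRECT)
except ValueError as err:
    print("Stopped:", err)
```

Under `Policy.ROBUST` a bad value is logged and replaced with the default so processing continues; under `Policy.CORRECT` the `ValueError` propagates and nothing incorrect reaches the database.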
There are many ways to keep data away from a user. You can build your own data access layer, and I have done this many times. Or you can use some form of code generation. I believe that this is the best approach, and it is the way I build applications these days. I will talk more about code generators in another article.
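For the hand-built route, a data access layer can be as small as one class per entity that owns all the SQL. The sketch below assumes Python's built-in `sqlite3` module and an in-memory database as stand-ins for the application's real database; the `customer` table, `Customer` and `CustomerRepository` are hypothetical. It is also the kind of repetitive code a generator could produce, though here it is only an illustration.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Customer:
    customer_id: int
    surname: str
    date_of_birth: date

class CustomerRepository:
    """Hypothetical data access layer: the only code that knows the table layout."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            " id INTEGER PRIMARY KEY,"
            " surname TEXT NOT NULL,"
            " dob TEXT NOT NULL)"  # date stored as ISO-8601 text
        )

    def add(self, surname: str, date_of_birth: date) -> Customer:
        cur = self._conn.execute(
            "INSERT INTO customer (surname, dob) VALUES (?, ?)",
            (surname, date_of_birth.isoformat()),
        )
        self._conn.commit()
        return Customer(cur.lastrowid, surname, date_of_birth)

    def find_by_surname(self, surname: str) -> list:
        rows = self._conn.execute(
            "SELECT id, surname, dob FROM customer WHERE surname = ? ORDER BY id",
            (surname,),
        ).fetchall()
        # Convert raw rows into domain objects before they leave the layer.
        return [Customer(r[0], r[1], date.fromisoformat(r[2])) for r in rows]

repo = CustomerRepository(sqlite3.connect(":memory:"))
repo.add("Smith", date(1980, 5, 17))
print(repo.find_by_surname("Smith"))
```

Callers only ever see `Customer` objects; table names, column names and SQL never leave the repository class.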