GUID or Autoincrement?

Every commercial program I have written, except one, has used a relational database. The exception had a very small amount of data which I kept in an XML formatted file. That’s fine for very small amounts of data, and in this case it was the sort of data you would normally find in a configuration file.

Many years ago almost every application had an initialization file called myprogram.ini, or something similar. This contained all the data needed to run the program. Things like the location of the database, and perhaps some user-defined settings, such as windows sizes, last opened document etc. The ini files were easy to read and write and could be accessed through a couple of Windows API calls which would read and write the Private Profile String for the application.

This all changed and Microsoft recommended that we read and write Registry entries. In the main I ignored the recommendation and continued to use ini files – mostly because I was used to them, and they worked just fine.

In the .NET world Microsoft has reverted to text files, but instead of calling them ini files they are now Application Configuration Files and are stored as XML documents. This time I have accepted the recommendations because app.config files are easy to read and write, and they don’t require any API calls. Everything is done through managed code and the world remains a happy place conducive to producing contented programmers.

But when you are progamming against a database it doesn’t make sense to convert a relational database into XML. You need to use a relational database but you then run into the problem of what to use as a primary key for your tables.

I wrote some time ago about using Surrogate Keys as the primary keys for database tables. In many cases they are very useful, but what should you use as the surrogate?
I wrote then that I used autoincrement fields when using MS Access databases, but I have had a rethink about this practice.

First off, there is nothing wrong with using Autoincrement. In Access they are 4 bytes, or long integer type, so you can have up to 2 billion distinct records. That is more than enough for an Access database. You would be looking at a more substantial database long before you got to that many records. And, in any case, the types of programs that I write are for companies with nothing like that many records.

Autoincrement fields are also very easy to use. So easy, in fact that you can almost forget about them, you certainly can when creating new records. When you want to read a record it is retrieved using the long integer type in Access, or the Integer or System.Int32 type in .NET. Very straightforward.

But there is problem. In fact there are two problems. The first being that you don’t know the value of the autoincrement. Access doesn’t return the value. I have read that there is a way around this, and that as of JET 4.0, which is what I am using in .NET there is a value called @@IDENTITY, similar to that returned by SQLServer, and many other database systems. However, I am not sure that it is available outside of a data adapter, and I often don’t use them because of the overhead.

In any case, that problem is really just a symptom of the larger problem, and it isn’t confined to Access, but applies to any database which generates a new number to be used as a key. The problem is that the value is not known until the record is added to the database. And this is a problem in many applications.

An example will provide an explanation of the difficulty. Let’s suppose that a customer places an order for a number of items. You create a new order. If the application needs to track where items are sent, then there will be a table with the complex primary key of OrderNo and Stock Item No, or something like that. First we need to generate the Order record so that we get a new Order Number, and that number is used to create the Order-Stock record. Herein lies the problem.

If the user decides to cancel the order this is what happens. The application creates a new order so that an order number is generated. An Order-Stock object is created. It does not get saved until the user commits the changes, and in this case it will merely be destroyed. Another call is made to the database to destroy the just created record. And until the database is compacted it has still grown. And there were two unnecessary calls to the database server.

The problem is that users expect that deleting a create prodedure will restore them to a pre-existing condition, but this results in wasted database calls, creation and deletion of records needlessly, and hence additional coding.

The answer, in these cases, is to use GUIDs or Globally Unique Identifiers as the primary key. But these should never be exposed to the user. They are 16 bytes in size and look like gibberish. But they are generated by the application. They may be a little fiddlier to code, but if you do all your data access through data layer then you aren’t doing this coding very often. So far they seem to be the answer to my problems.

Comments are closed.