In this introductory article I will try to explain what database normalisation is and why it is important. In later articles we will look at the process of normalisation.

Is Normalisation Necessary?

The short answer is no. You can keep all your data in one table. This is called a flat file database and it can be the most appropriate format for some kinds of data. A spreadsheet is perhaps the most obvious example of a flat file database. However, even within a Database Management System such as Microsoft Access, MySQL, or the myriad other commercial DBMSs available you can keep all your data within one table.

Once your data, or rather its structure, becomes even slightly complex then it makes sense to separate it into a number of tables through the normalisation process.

The reasons for doing this are threefold:

  • Performance improvement
  • Reduction in database size
  • Data integrity

    Similarly there are reasons for not normalising a database:

    • Performance improvement
    • It is too hard

    Let’s look at each of these, although not in order.

    It is too hard

    No one ever gives this as a reason for not normalising a database but I have seen many databases which are incompletely or incorrectly normalised. Normalising is not easy, but it is not all that difficult either. If you can’t do it then find someone who can, or learn how to do it. End of story.

    Database size

    A normalised database will be smaller than its corresponding flat file counterpart. These days disk space is cheap, cheaper than paying for a database consultant. But smaller is usually faster and throwing away disk space is an ongoing cost.

    Nevertheless, this is not really a good reason to normalise. You can buy a lot of gig for the cost of normalising a relatively staightforward database.

    Performance

    This appeared as a positive and a negative. A normalised database will often outperform a non-normalised database. But if you take the normalisation to extremes you can end up with very complex joins between the tables and this will result in a degradation in performance.

    But, like the discussion on database size, performance is becoming less of an issue. Processors are getting faster. New desktop PCs have 3GHz or more processors and 256meg memory as a minumum with most now having 512meg or 1gig.
    It seems these days that if a database, or indeed any application, is not performing well then we can throw more grunt at it relatively cheaply.

    Data Integrity

    This is the only valid reason for normalising a database. Obviously, there is no reason in having a database if you cannot guarantee the integrity of your data – that’s why you have a DBMS in the first place.

    How does normalising help to maintain data integrity? There are two basic premises behind normalisation.

    1. Data is only held in one place in the database. This means that any inserts, updates or deletes only affect one table. There are cascading effects but we will look at those in later articles.
    2. Data is accessed through the key to the table and all data is dependent upon the key. When you use the primary key to look up a table you will only get one row of data. For example, if you have a table which lists people, members of an organisation, or staff in a company, then accessing the table using the primary key will return data for one person.

    That is a very rudimentary explanation but the reason to normalise a database is to ensure that your data is reliable.

    How far should I go

    This is a difficult question. There are a number of steps involved in normalising a database. The first step puts the database into 1st Normal Form, the second step into 2nd Normal Form, and so on.

    It is fairly widespread practice to normalise to third normal form. And in many cases that is enough. After third normal form there is a form known as BCNF, or Boyce-Codd Normal Form, which has relevance when there are a number of fields, each of which is a candidate key and they are inter-dependant upon each other. This form, together with the later forms, will be examined in a later article.

    There is a final step in normalisation, regardless of the level of normalisation undertaken, and that is decomposition. Once you have normalised your database you most certainly don’t want to undo all your good work. But you must be able to demonstrate that you have not lost any data through your normalisation process. That means that you should be able to derive your original flat file from your normalised tables.

    This step is often not undertaken explicitly and if the normalisation rules are followed rigorously then you are assured that you won’t have lost data. But it is a good idea to check some of your data to re-assure yourself that it is all there and all accessible.