<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>midnight muse &#187; Databases</title>
	<atom:link href="http://midnightmuse.com.au/category/databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://midnightmuse.com.au</link>
	<description>Richard Wright's musings about software and other things that take his fancy</description>
	<lastBuildDate>Thu, 06 May 2010 05:43:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Changes to SQL Please</title>
		<link>http://midnightmuse.com.au/2007/04/27/changes-to-sql-please/</link>
		<comments>http://midnightmuse.com.au/2007/04/27/changes-to-sql-please/#comments</comments>
		<pubDate>Fri, 27 Apr 2007 08:11:37 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2007/04/27/changes-to-sql-please/</guid>
		<description><![CDATA[SQL is a good language, but it just needs one or two changes.]]></description>
			<content:encoded><![CDATA[<p>I like SQL, I really do. I like declarative languages in general.</p>
<p>But there is one thing about it which really annoys me: the very different syntax for inserting and updating a record. It isn&#8217;t that the statements are difficult to remember; they aren&#8217;t. The standard CRUD (Create/Retrieve/Update/Delete) operations are the type of statements that most SQL users could type in their sleep.</p>
<p>Some queries can get very complex &#8211; far beyond the capabilities of my tiny brain. But I can insert and update records.</p>
<p>What I think would be nice would be a single statement that would update or insert a record, depending upon whether it already existed in the database table. An example is probably in order.</p>
<p>Suppose we have a very simple table <strong>Customer</strong> and we want to insert a new row. Then we would have something like this:<br />
<code>Insert into Customer (Surname, FirstName) Values('Smith','Bob')</code></p>
<p>Now let&#8217;s update that row.<br />
<code>Update Customer Set Surname = 'Jones', FirstName = 'Bill' where CustomerID = 1</code></p>
<p>Now here is the problem. When I get data from a database I put it into an entity object. I do it using an Object Relational Mapper, but it doesn&#8217;t really matter; you could write your own code and build your own classes. Whichever way you do it, you are eventually going to call a Save method, or something like it. The O/R mapper I use (and I think most operate in a similar way) knows when it fills the entity whether it is a new entity or not. So, for example, if I write the following code in VB.NET<br />
<code>Dim Customer as new CustomerEntity(1)</code><br />
or in C#<br />
<code>CustomerEntity Customer = new CustomerEntity(1);</code><br />
then I get the entity Customer with data from the database for Customer with an ID of 1. Everything is fine. If the Customer does not exist then my Customer has a property <strong>IsNew</strong> which is set to True.</p>
<p>When I Save this customer the O/R mapper uses the Insert syntax if the Customer is new, or the Update syntax if the customer already exists, and it does this on the basis of the IsNew property.</p>
<p>But I run into problems if I use a collection of customers. When I fill a collection from the database there is no problem, provided I don&#8217;t want to save it. The problem arises because the collection can change. In the .NET world a collection implements the IList interface, which means that a collection can be added to. So now the collection has no way of knowing whether each of its items already exists in the database. I might have a collection of, say, 10 customers, and then I add 3 more. When I save the collection I don&#8217;t want to make a separate trip to the database for each customer; I want them all saved in one query. But I can&#8217;t do that because there is no way of knowing whether the Save method should use the Insert syntax or the Update syntax.</p>
<p>If there were one syntax used for both then life would be so much easier.</p>
<p>Of course the syntax would have to change a bit. I am thinking about how I would like it to look. More on that later, perhaps.</p>
<p>But anything to save unnecessary round trips to the database has to be a plus.</p>
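<p>Some dialects do offer a single &#8220;insert or update&#8221; statement: SQL:2003 defines MERGE, and SQLite (3.24.0 or later) accepts INSERT ... ON CONFLICT ... DO UPDATE. What follows is only a minimal sketch of the idea, in Python with an in-memory SQLite database. The <code>save_customers</code> helper and the sample rows are hypothetical, and whether one <code>executemany</code> call amounts to one round trip depends on the database driver.</p>
<pre>
```python
import sqlite3


def save_customers(conn, customers):
    """Insert new customers and update existing ones with one statement.

    Assumes SQLite 3.24.0 or later, which added ON CONFLICT ... DO UPDATE.
    """
    conn.executemany(
        """
        INSERT INTO Customer (CustomerID, Surname, FirstName)
        VALUES (?, ?, ?)
        ON CONFLICT (CustomerID) DO UPDATE SET
            Surname = excluded.Surname,
            FirstName = excluded.FirstName
        """,
        customers,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    "CustomerID INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT)"
)
conn.execute("INSERT INTO Customer VALUES (1, 'Smith', 'Bob')")

# Customer 1 already exists, so it is updated; customers 2 and 3 are inserted.
save_customers(
    conn,
    [(1, "Jones", "Bill"), (2, "Brown", "Ann"), (3, "Green", "Tom")],
)
rows = conn.execute(
    "SELECT CustomerID, Surname, FirstName FROM Customer ORDER BY CustomerID"
).fetchall()
```
</pre>
<p>The caller never has to ask whether a row is new, so a collection holding both existing and added customers can be saved through the same statement.</p>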
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2007/04/27/changes-to-sql-please/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Dataset Is Not A Duck</title>
		<link>http://midnightmuse.com.au/2006/06/07/a-dataset-is-not-a-duck/</link>
		<comments>http://midnightmuse.com.au/2006/06/07/a-dataset-is-not-a-duck/#comments</comments>
		<pubDate>Wed, 07 Jun 2006 04:43:27 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/06/07/a-dataset-is-not-a-duck/</guid>
		<description><![CDATA[But it looks and sounds like a duck]]></description>
			<content:encoded><![CDATA[<p>I recently read an article by <a href="http://www.mindview.net/">Bruce Eckel</a> which was published in the very enjoyable and immensely entertaining <a href="http://www.amazon.com/gp/product/1590595009/ref=pd_bxgy_text_b/104-5544559-2295954?%5Fencoding=UTF8">Joel Spolsky&#8217;s <em>The Best Software Writing I</em></a> in which Bruce referred to Duck Typing. Bruce was talking about Python, but I think the same thing applies to some other languages such as Ruby and Smalltalk, and perhaps some others. The point is that they don&#8217;t rely on static typing, but rather Duck Typing, as in <em>If it looks like a duck and behaves like a duck then we can treat it like a duck.</em></p>
<p>Now that is all well and good, and I have no argument. But it got me thinking about datasets in .NET. Microsoft loves datasets, and so does almost every author of books about ADO.NET.</p>
<p>But not me. I don&#8217;t like them and I don&#8217;t use them. And I gather that I am not alone &#8211; it seems that many (is it most?) application programmers get their data using some mechanism other than datasets.</p>
<p>I am not going to delve into the mysteries of datasets here, suggest that they are necessarily evil, or even &lt;shudder&gt; propose that they are <a href="http://www.meyerweb.com/eric/comment/chech.html">considered harmful</a>; I will leave that to <a href="http://www.acm.org/classics/oct95/">Edsger Dijkstra</a>.</p>
<p>What I will do, however, is point out that the dataset is really a goose, camouflaged to look like a database, and therein lies the problem, or at least, one of its problems.</p>
<p>Microsoft describes the dataset like this: <em>The DataSet is an in-memory cache of data retrieved from a data source</em>. And that is correct, that&#8217;s exactly what the dataset is. And the good thing about it is that, unlike its predecessor, the Recordset found in DAO and ADO, it is disconnected from the database.</p>
<p>So when you fill a dataset with data, usually using the DataAdapter&#8217;s Fill method, you are creating an in-memory copy of the table or tables, or relations contained in the database. But being a disconnected copy, anything you do to the dataset is not written back to the database until you explicitly tell it to, typically by calling the DataAdapter&#8217;s Update method.</p>
<p>Now don&#8217;t get me wrong here. This is a <strong>Good Thing.</strong> Disconnected datasets are a much better way to deal with data than a connected recordset.</p>
<p>The problem is the duck. It looks like a database table, and behaves like a database table, and in fact, it is a table. But it is a copy of the table in the database at the time you created the dataset. If you keep it around too long the dataset and the underlying database tables can get out of sync.</p>
<p>Of course, this is a problem with all concurrent systems, but the difficulty with the dataset is that it looks so much like a duck that you can easily forget that it is really a goose.</p>
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/06/07/a-dataset-is-not-a-duck/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Code Generation</title>
		<link>http://midnightmuse.com.au/2006/03/13/code-generation/</link>
		<comments>http://midnightmuse.com.au/2006/03/13/code-generation/#comments</comments>
		<pubDate>Mon, 13 Mar 2006 06:35:26 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/03/13/code-generation/</guid>
		<description><![CDATA[<p>Why write code when you can have it generated automatically?</p>]]></description>
			<content:encoded><![CDATA[<p>Code generation, in some form, has been around for a long time. Computer Aided Software Engineering (CASE) tools, for example, have been with us for years.</p>
<p>In more recent times another bunch of tools have come onto the market. These are usually described as Object Relational, or O/R, Mappers. Here I will describe what these are and why you should use one.</p>
<p>I mostly program in Visual Studio .NET, mostly in Visual Basic but occasionally in C#. Both of these are Object Oriented languages, although some purists would argue about how well they perform in this area. For example, neither of these languages supports multiple inheritance. Programmers who use Eiffel, for example, will tell you that multiple inheritance is a must for any Object Oriented programming. Others will suggest that inheritance of any type is overhyped. Dan Appleman, a Visual Basic guru from long ago, suggests that inheritance is <em>the coolest feature you will never use.</em></p>
<p>My view is that inheritance is very useful, especially in form inheritance. Inheriting from multiple classes, though, is something that I don&#8217;t miss. One day I will have a go at Eiffel and maybe I will change my mind.</p>
<p>But all that is off the track. When you program in Visual Studio, in whatever language, you are programming in an Object Oriented environment, and it makes sense to utilise the features inherent in the framework, if they will work to your advantage.</p>
<p>Code generators, especially O/R mappers, definitely work to your advantage. The basic premise behind these is that in an OO world it is easier to work with objects than raw data. It is also safer (see <a href="/2006/03/13/keeping-users-away-from-data/">this article</a> for reasons why).</p>
<p>OK, so we should create objects, or entities, rather than program against the database. Can&#8217;t we do that without using an O/R mapper? Of course we can. But look what the O/R mapper does for you. Every database has CRUD operations &#8211; Create, Read, Update, Delete. You can write your own if you want to, and there is nothing all that difficult about doing that, but there is no point doing it if you can generate the code in a matter of seconds.</p>
<p>But that&#8217;s not all. By setting up relations within the O/R mapper, or within the database management system, you can create objects which span multiple tables. This way we are really dealing with useful information, not just raw data. Collections and read-only typed lists can be created, and the big advantage is that because the O/R mapper has been thoroughly tested (at least the reputable ones have) you can be confident that the code works. Every time we write our own code we introduce the possibility of creating bugs, in this case usually from our typos, and we all make those.</p>
<p>Estimates I have read suggest that the data access components of the average business application can make up around 30% of the total code. So you eliminate around 30% of your coding effort and 30% of the potential for introducing bugs, and, I believe, you also simplify your design process.</p>
<p>O/R mapping is not a panacea, but I don&#8217;t find typing the most enjoyable part of software development. Anything that gives me a reliable and efficient method for removing some of the drudgery from programming has to be a benefit, both for me and my clients.</p>
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/03/13/code-generation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keeping users away from data</title>
		<link>http://midnightmuse.com.au/2006/03/13/keeping-users-away-from-data/</link>
		<comments>http://midnightmuse.com.au/2006/03/13/keeping-users-away-from-data/#comments</comments>
		<pubDate>Sun, 12 Mar 2006 23:12:56 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/03/13/keeping-users-away-from-data/</guid>
		<description><![CDATA[<p>Just because a user owns the data doesn't mean they should be given access to it.</p>]]></description>
			<content:encoded><![CDATA[<p>I have been thinking a lot lately about keeping users away from their data. This isn&#8217;t something new that I have been doing, it is fairly standard practice especially in object oriented programming, it is just that I have been thinking about it more formally.</p>
<p>The first question is: why should a user be kept from their data? Perhaps a little history is in order.</p>
<p>Many years ago, and even today in legacy systems, large amounts of data were held on mainframe systems. In the commercial world this often meant writing programs in COBOL using an IMS hierarchical database or a relational database like DB2, often accessed through CICS. The database modelling was done with data flow diagrams and entity relationship diagrams. The programs would grab the data and display it on the screen in fields produced by a screen overlay generator. The user could edit the data or enter new data, and save it back to the database.</p>
<p>The whole procedure was data driven.</p>
<p>Then along came Windows with its WIMP &#8211; Windows, Icons, Menus, Pointer &#8211; user interface, and everything changed. Sort of.</p>
<p>Programs were now often UI, or User Interface, centric, rather than data centric. This sounds like a good thing, after all, shouldn&#8217;t application developers concentrate on their users? Of course they should. But this approach led to designing the user interface first and then, more often than not, fetching the data to display in the various controls on each form. So, in effect, it was no better than the data centric approach, in many cases.</p>
<p>The other problem with UI centric design is that often the users really don&#8217;t know what they want. This isn&#8217;t because users are not very bright, many of them are extremely bright. Nor is it because users are not programmers. I know that whenever I build an application for myself &#8211; and many/most programmers build their own tools from time to time &#8211; I am never sure what I want at the outset.</p>
<p>So then, how are we to build applications?</p>
<p>If we get back to the original premise of this article which is that users should be kept away from their data and explain why, then we will be in a better position to answer this question.</p>
<p>Data is the most valuable part of any computer application. Indeed it is often the most valuable asset in the business. Valuable assets need to be protected. The easiest way to protect them is to keep them under lock and key. So it is with data. It is valuable so keep it locked away. I also keep it locked away from the prying eyes of developers, including myself, but that is a subject for another time.</p>
<p>The reason I keep it locked up and not accessible to users is that they don&#8217;t need to see it nor do they want to see it, and if they did see it they probably wouldn&#8217;t understand it. Perhaps I should explain.</p>
<p>Users are not interested in data, they are interested in information. The common example given in all the books on programming is that of a customer who places an order. The user is interested in the order, and the invoice that is sent when the order is filled. But if the data is kept in a relational database then there will be tables for perhaps: Customer, Order, OrderItems, Products, Invoices, InvoiceItems, and maybe some tables to take care of many-to-many relationships. The user doesn&#8217;t care because the user isn&#8217;t a relational database analyst. If you want to test this then the next time you are at a dinner party try to start a conversation on relational databases. Two things will happen. No one will talk to you, and you won&#8217;t receive another invitation to parties, anywhere, ever.</p>
<p>The second reason that data should be separated from users is because of the old adage <em>Garbage in, garbage out.</em> This used to be the case but it is no longer acceptable. Garbage in is just not an option these days. Of course we cannot prevent users from entering inaccurate data. If the surname should be <strong>Smith</strong> and the user enters <strong>Jones</strong> then there is nothing we can do about it short of writing some validation that changes all instances of Jones into Smith. But I don&#8217;t think that would be acceptable.</p>
<p>But what if the user enters <strong>Smith</strong> into the Date of Birth field on the screen? We can prevent that. The question is where and how do we prevent it? With a purely data-centric view we could attempt to validate at the database level and write some code in SQL. Now SQL is a very powerful language in the hands of the experts but really, life is too short to go down this route. The other problem is that you are accessing the database for no good reason, and this is a slow process, painfully slow if you are operating on a network.</p>
<p>So the validation must take place before the data hits the database. The usual place is in a Business Layer, although in a small application you can stick it in the presentation layer. And the validation is written in the same language as the application, a much easier proposition. If the validation fails then the user is informed and the database is never touched.</p>
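<p>As a rough illustration of validating in the business layer before the database is touched, here is a minimal Python sketch. The function name <code>parse_date_of_birth</code>, the YYYY-MM-DD input format and the &#8220;not in the future&#8221; rule are all assumptions made for the example, not rules taken from any particular application.</p>
<pre>
```python
from datetime import date, datetime


def parse_date_of_birth(text):
    """Return a date for input such as '1970-01-31', or raise ValueError.

    Runs entirely in application code, so bad input never reaches the database.
    """
    try:
        dob = datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        raise ValueError("Date of birth must look like YYYY-MM-DD") from None
    if dob > date.today():
        raise ValueError("Date of birth cannot be in the future")
    return dob


def try_parse(text):
    """Return (value, error message) so the UI can inform the user."""
    try:
        return parse_date_of_birth(text), None
    except ValueError as err:
        return None, str(err)


good_value, good_error = try_parse("1970-01-31")
bad_value, bad_error = try_parse("Smith")
```
</pre>
<p>Entering <strong>Smith</strong> in the Date of Birth field produces an error message for the user and no database call at all.</p>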
<p>This leads to another point. Every application comprises a number of design decisions. One is Robustness vs Correctness. A Robust application will always try to do something to keep the application running. A Correct application will never return an incorrect result, and may even shut down if there is an error. Information can come from many sources, not just user input. How that information is handled if there is an error is a design decision, as is the type of validation to be done on that information before it is sent to the database.</p>
<p>There are many ways to keep data away from a user. You can build your own data access layer, and I have done this many times. Or you can use some form of code generation. I believe that this is the best approach and it is the way that I build applications these days. I will talk more about code generators in another article.</p>
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/03/13/keeping-users-away-from-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ORM and Related Stuff</title>
		<link>http://midnightmuse.com.au/2006/03/09/orm-and-related-stuff/</link>
		<comments>http://midnightmuse.com.au/2006/03/09/orm-and-related-stuff/#comments</comments>
		<pubDate>Thu, 09 Mar 2006 01:28:39 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/03/09/orm-and-related-stuff/</guid>
		<description><![CDATA[<p>Moving data through an application has always been a problem.</p>]]></description>
			<content:encoded><![CDATA[<p>I have said before, and I will undoubtedly say it many times again &#8211; every application uses data. Most business applications use very formal data structures usually via a relational database.</p>
<p>There are a number of ways of looking at applications. The two most common approaches, at least in years gone by, are UI centric and data centric. The UI centric view looks at the application from the point of view of the User Interface. The data centric model considers the data to be the most important part of the application and builds around it.</p>
<p>Neither of these approaches is intrinsically wrong, but too great an emphasis on one over the other can lead to problems in the design of the whole application.<br />
By concentrating on the User Interface there is the danger of designing the database to fit in with what the User wants to do. Conversely, by concentrating on the database the user may experience difficulties in what they wish to accomplish because of the constraints of the database.</p>
<p>This is, probably, rather obvious. The user has certain goals which they need to meet. For example, the user may wish to create and send an invoice. But the data needed to create the invoice is likely to be contained in a number of tables: a Customer table, an Invoice table, an Orders table, and so on. The problem for the application designer is not how to build a User-centric interface and a data-centric data layer, for that is not really a problem at all, but rather how to build the transport layer between the two.</p>
<p>In the object oriented world this is usually accomplished by building a business layer which sits between the UI and the data, a typical 3-tier system. And, again, this problem has essentially been solved, and solved in many ways.</p>
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/03/09/orm-and-related-stuff/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GUID or Autoincrement?</title>
		<link>http://midnightmuse.com.au/2006/03/01/guid-or-autoincrement/</link>
		<comments>http://midnightmuse.com.au/2006/03/01/guid-or-autoincrement/#comments</comments>
		<pubDate>Wed, 01 Mar 2006 01:26:13 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/03/01/guid-or-autoincrement/</guid>
		<description><![CDATA[<p>Choosing the right type of primary key for your database tables affects the way you program.</p>]]></description>
			<content:encoded><![CDATA[<p>Every commercial program I have written, except one, has used a relational database. The exception had a very small amount of data which I kept in an XML formatted file. That&#8217;s fine for very small amounts of data, and in this case it was the sort of data you would normally find in a configuration file.</p>
<p>Many years ago almost every application had an initialization file called myprogram.ini, or something similar. This contained all the data needed to run the program. Things like the location of the database, and perhaps some user-defined settings, such as windows sizes, last opened document etc. The ini files were easy to read and write and could be accessed through a couple of Windows API calls which would read and write the Private Profile String for the application.</p>
<p>This all changed and Microsoft recommended that we read and write Registry entries. In the main I ignored the recommendation and continued to use ini files &#8211; mostly because I was used to them, and they worked just fine.</p>
<p>In the .NET world Microsoft has reverted to text files, but instead of calling them ini files they are now Application Configuration Files and are stored as XML documents. This time I have accepted the recommendations because app.config files are easy to read and write, and they don&#8217;t require any API calls. Everything is done through managed code and the world remains a happy place conducive to producing contented programmers.</p>
<p>But when you are programming against a database it doesn&#8217;t make sense to convert a relational database into XML. You need to use a relational database but you then run into the problem of what to use as a primary key for your tables.</p>
<p>I wrote some time ago about using <a href="./index.php/articles/database/Surrogate_Keys">Surrogate Keys</a> as the primary keys for database tables. In many cases they are very useful, but what should you use as the surrogate?<br />
I wrote then that I used autoincrement fields when using MS Access databases, but I have had a rethink about this practice.</p>
<p>First off, there is nothing wrong with using Autoincrement. In Access they are 4 bytes, or long integer type, so you can have up to 2 billion distinct records. That is more than enough for an Access database. You would be looking at a more substantial database long before you got to that many records. And, in any case, the types of programs that I write are for companies with nothing like that many records.</p>
<p>Autoincrement fields are also very easy to use. So easy, in fact that you can almost forget about them, you certainly can when creating new records. When you want to read a record it is retrieved using the long integer type in Access, or the Integer or System.Int32 type in .NET. Very straightforward.</p>
<p>But there is a problem. In fact there are two problems. The first is that you don&#8217;t know the value of the autoincrement. Access doesn&#8217;t return the value. I have read that there is a way around this, and that as of JET 4.0, which is what I am using in .NET, there is a value called @@IDENTITY, similar to that returned by SQL Server and many other database systems. However, I am not sure that it is available outside of a data adapter, and I often don&#8217;t use them because of the overhead.</p>
<p>In any case, that problem is really just a symptom of the larger problem, and it isn&#8217;t confined to Access, but applies to any database which generates a new number to be used as a key. The problem is that the value is not known until the record is added to the database. And this is a problem in many applications.</p>
<p>An example will provide an explanation of the difficulty. Let&#8217;s suppose that a customer places an order for a number of items. You create a new order. If the application needs to track where items are sent, then there will be a table with the composite primary key of OrderNo and Stock Item No, or something like that. First we need to generate the Order record so that we get a new Order Number, and that number is used to create the Order-Stock record. Herein lies the problem.</p>
<p>If the user decides to cancel the order this is what happens. The application creates a new order so that an order number is generated. An Order-Stock object is created. It does not get saved until the user commits the changes, and in this case it will merely be destroyed. Another call is made to the database to destroy the just created record. And until the database is compacted it has still grown. And there were two unnecessary calls to the database server.</p>
<p>The problem is that users expect that cancelling a create procedure will restore them to the pre-existing condition, but this results in wasted database calls, needless creation and deletion of records, and hence additional coding.</p>
<p>The answer, in these cases, is to use GUIDs or Globally Unique Identifiers as the primary key. But these should never be exposed to the user. They are 16 bytes in size and look like gibberish. But they are generated by the application. They may be a little fiddlier to code, but if you do all your data access through a data layer then you aren&#8217;t doing this coding very often. So far they seem to be the answer to my problems.</p>
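<p>To make the difference concrete, here is a minimal Python sketch with an in-memory SQLite database. It assumes hypothetical <code>Orders</code> and <code>OrderStock</code> tables and made-up stock item numbers. Because the key is generated in the application, a cancelled order never touches the database, and a confirmed one is written once with its key already known.</p>
<pre>
```python
import sqlite3
import uuid


def build_order(stock_items):
    """Create an order and its order-stock rows in memory, keyed by a new GUID."""
    order_id = str(uuid.uuid4())  # the key is known before any database call
    lines = [(order_id, item_no, qty) for item_no, qty in stock_items]
    return order_id, lines


def commit_order(conn, order_id, lines):
    """Save the order and its lines together in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO Orders (OrderID) VALUES (?)", (order_id,))
        conn.executemany(
            "INSERT INTO OrderStock (OrderID, StockItemNo, Quantity) "
            "VALUES (?, ?, ?)",
            lines,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE OrderStock ("
    "OrderID TEXT, StockItemNo TEXT, Quantity INTEGER, "
    "PRIMARY KEY (OrderID, StockItemNo))"
)

# A cancelled order is simply discarded; no record is created or deleted.
cancelled_id, cancelled_lines = build_order([("A100", 2)])

# A confirmed order is written once, with its key already known.
order_id, lines = build_order([("A100", 2), ("B200", 1)])
commit_order(conn, order_id, lines)

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
line_count = conn.execute("SELECT COUNT(*) FROM OrderStock").fetchone()[0]
```
</pre>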
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/03/01/guid-or-autoincrement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Second Normal Form</title>
		<link>http://midnightmuse.com.au/2006/01/08/second-normal-form/</link>
		<comments>http://midnightmuse.com.au/2006/01/08/second-normal-form/#comments</comments>
		<pubDate>Sun, 08 Jan 2006 00:43:26 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/01/08/second-normal-form/</guid>
		<description><![CDATA[<p>Putting your data into Second Normal Form is the second step in normalising a database.</p>]]></description>
			<content:encoded><![CDATA[<p>In a previous article I discussed <a href="./index.php/articles/general/First_Normal_Form">First Normal Form</a>. I have also mentioned <a href="./index.php/articles/general/The_key%2C_the_whole_key...">elsewhere</a> that the shorthand version of database normalisation is to ensure that the database is <em>dependent upon the key, the whole key, and nothing but the key, so help me Codd.</em> First Normal Form is ensuring that the data is dependent upon the key.</p>
<p>
Second Normal Form, or 2NF, is ensuring that the data in any table is dependent upon the whole key. That is, a table, or strictly a relation, is in Second Normal Form if it is in 1NF (First Normal Form) and every non-key attribute is dependent upon the whole of the primary key, not just part of it.</p>
<p>
The classic example of this is often given as a table for Parts provided by a number of Suppliers. A particular Part can be supplied by a number of different suppliers, so the key to the table is a composite key made up of Part Number and Supplier Number.</p>
<p>
What is important to note is that only data that is dependent upon both elements of the key, that is Part Number and Supplier Number, belongs in the table. You would not put Part Description in the table, because that is only dependent upon the Part Number. Similarly, the Supplier Address would be in the Supplier table. However, Price would go in this Parts_Supplier table because different suppliers are likely to charge different amounts for the same parts.</p>
<p>
In summary, when an item of data is dependent upon more than one key then it needs to go into a table with a composite key. But, to be in 2NF all the data needs to be dependent upon the entire composite key.</p>
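<p>The Parts and Suppliers example can be sketched as a schema. This is only an illustration, using Python and an in-memory SQLite database; the table names, column names and sample rows are assumptions made for the example. Description depends on the part alone, Address on the supplier alone, and only Price sits in the table with the composite key.</p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Depends only on the part number.
    CREATE TABLE Part (
        PartNo      INTEGER PRIMARY KEY,
        Description TEXT
    );
    -- Depends only on the supplier number.
    CREATE TABLE Supplier (
        SupplierNo INTEGER PRIMARY KEY,
        Address    TEXT
    );
    -- Price depends on the whole composite key (part AND supplier).
    CREATE TABLE PartSupplier (
        PartNo     INTEGER REFERENCES Part (PartNo),
        SupplierNo INTEGER REFERENCES Supplier (SupplierNo),
        Price      REAL,
        PRIMARY KEY (PartNo, SupplierNo)
    );
    INSERT INTO Part VALUES (1, 'Widget');
    INSERT INTO Supplier VALUES (10, '1 High St');
    INSERT INTO Supplier VALUES (20, '2 Low Rd');
    INSERT INTO PartSupplier VALUES (1, 10, 4.50);
    INSERT INTO PartSupplier VALUES (1, 20, 4.75);
    """
)

# The same part from two suppliers: one description, two prices.
rows = conn.execute(
    """
    SELECT p.Description, s.Address, ps.Price
    FROM PartSupplier AS ps
    JOIN Part AS p ON p.PartNo = ps.PartNo
    JOIN Supplier AS s ON s.SupplierNo = ps.SupplierNo
    ORDER BY ps.SupplierNo
    """
).fetchall()
```
</pre>
<p>Changing the description of a part now means updating one row in Part, however many suppliers stock it.</p>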
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/01/08/second-normal-form/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Which database should I use?</title>
		<link>http://midnightmuse.com.au/2006/01/07/which-database-should-i-use/</link>
		<comments>http://midnightmuse.com.au/2006/01/07/which-database-should-i-use/#comments</comments>
		<pubDate>Sat, 07 Jan 2006 00:41:34 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/01/07/which-database-should-i-use/</guid>
		<description><![CDATA[<p>All relational databases weren't created equal, especially MS Access.</p>]]></description>
			<content:encoded><![CDATA[<p>Nearly all my programming accesses databases. In fact, the only application I have written in the past couple of years which didn&#8217;t grab data from a formal database was a tiny application for a client, which, unlike Rome, was built in a day. But even this application used some data. It was just that, in this case, the amount of data was very small and was unlikely to change often, if ever. But there was the possibility that it might change and so it needed to be accessible and not hard coded.</p>
<p>
I can&#8217;t think of any reason, off the top of my head, to hard code data.</p>
<p>
So, for this application, the data was put in an XML file. There were fewer than a dozen nodes. As I said, it was a very small amount of data.</p>
<p>
But most commercial applications, and probably most other types too, rely upon a lot more data than this. These days that data is usually kept in a relational database and the application has to have some mechanism for accessing that data. CRUD is the technical term &#8211; Create, Read, Update, Delete (although some people say the R stands for Retrieve) &#8211; and it is fundamental to most commercial applications.<br />
The question then arises: <em>Which database should I use?</em></p>
<p>
There are many databases and RDBMSs (Relational Database Management Systems) on the market. Somehow you have to choose one.</p>
<p>
I have probably used MS Access more than any other, and for a number of reasons. First, I am used to it. I have been using Access since version 1.0, although I didn&#8217;t use it for long. It was awful. Version 1.1 was a big improvement, and each subsequent version has added something extra.</p>
<p>
The other reason that I use it is that my clients use it. Most of my clients have Microsoft Office installed. They already know, or else I can show them, how to use Access data in a Word document or an Excel spreadsheet.</p>
<p>
There are two downsides to this. The first is compatibility. I have been programming in Visual Basic since Access was first launched. Each version of Access has its own version of the Jet database engine and they are not compatible. You can upgrade a database from one version of Access to the next, but going back is usually impossible. So if a client upgrades their version of Access and then updates the database then the application probably won&#8217;t work because it is using the wrong version of the Jet engine. This can be fixed by attaching the correct engine to the program and recompiling.</p>
<p>
The other problem is that the client can change the database from within their version of Access. You can secure the database with various protection methods but the bottom line is, the database &#8211; its structure and its data &#8211; belongs to the client. They can do what they like with it. Who am I to keep them out?</p>
<p>
However, when choosing a database, I think the decision really boils down to the type of structure, the amount of data and the number of users. Generally, my view is that small is good for Access, big means a proper RDBMS.</p>
<p>
Isn&#8217;t Access a proper RDBMS? Well, no, not exactly. And herein lies the difference. Access is, in fact, a flat file database. If you look at an MS Access database you will see that all the data, the forms, the reports, the queries, are in one file. You can separate the files by having another database and attaching it to the first. And a number of Access applications are written this way. The data is kept in one file and the forms and reports in another.<br />
But even in this case the data is all in one file. Access uses what is called an Indexed Sequential Access Method, or ISAM, file. Notice the word <em>Sequential</em>. That means that if you want to read the file you start at the beginning and keep going until you find what you are after. That is horribly inefficient.</p>
<p>
So the file is <em>Indexed</em>. All you have to do now to find some data is look up the index, and this is much quicker, provided the index is arranged efficiently. Access uses a B-tree to index data. A B-tree cuts down on the number of searches you must make through your index to find what you are after. As far as speed goes, the number of searches is a logarithmic function of the number of items. Taking logarithms to base 10, this means that 100 items will take twice as long to search as ten, and a million will take six times as long. Note that a B-tree is not the same thing as a binary tree: each node holds many keys, so the real base of the logarithm is the number of branches per node rather than 10, but you get my drift.</p>
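<p>
The arithmetic behind the logarithm claim can be checked with a few lines of Python. This is only a sketch of the cost model in the paragraph above (lookups proportional to the logarithm of the number of items), not a measurement of any real index. The function name <em>relative_lookups</em> is made up for the example.</p>

```python
import math


def relative_lookups(n_items, baseline_items=10, base=10):
    """Ratio of index lookups for n_items versus baseline_items,
    assuming the cost grows with log(n) in the given base."""
    return math.log(n_items, base) / math.log(baseline_items, base)


# The ratios come out the same whichever base is used, because the
# base cancels out when one logarithm is divided by another.
for base in (10, 2):
    hundred = round(relative_lookups(100, base=base), 6)
    million = round(relative_lookups(1_000_000, base=base), 6)
    print(base, hundred, million)
```

<p>
Because the base cancels in the ratio, the comparison with ten items holds whatever the branching of the tree. What the base does change is the absolute number of steps, which is smaller when each node has more branches.</p>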
<p>
Proper RDBMSs are not ISAM files and the way that data is accessed is quite different.<br />
What this means for you is that the method used within a program to access data should be different depending upon the type of database.</p>
<p>
When I use an Access database I grab one record at a time. (I will often get a lot of records and fill a list or combo box.) If I want to get data from two tables I get the key to the second table with a read of the first table&#8217;s data, then I send a second call to the database to read the second table. There are occasions when I do it slightly differently, but this is the usual method.</p>
<p>
If I am getting data from, say, SQL Server or MSDE, the Microsoft Desktop Engine, I am more likely to grab a bunch of data with the one call. For filling list boxes this isn&#8217;t any different than what I do with Access. But, if I want data from more than one table I will construct the join in the program and let the RDBMS select the data.</p>
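<p>
The two access styles can be put side by side in a short sketch. It uses Python with an in-memory SQLite database as a stand-in for both Access and SQL Server, so it shows only the shape of the calls, not the behaviour of either product. The tables, columns and function names are all hypothetical.</p>

```python
import sqlite3

# Hypothetical customer and orders tables, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        customer_no INTEGER,
        total       REAL
    );
    INSERT INTO customer VALUES (1, 'Jones'), (2, 'Smith');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 2, 40.0);
""")


def name_for_order_two_calls(order_no):
    """Record-at-a-time style: read the first table, then use the key
    it returns to make a second call for the second table."""
    (customer_no,) = conn.execute(
        "SELECT customer_no FROM orders WHERE order_no = ?", (order_no,)
    ).fetchone()
    (name,) = conn.execute(
        "SELECT name FROM customer WHERE customer_no = ?", (customer_no,)
    ).fetchone()
    return name


def name_for_order_join(order_no):
    """Set-based style: one call, and the database resolves the join."""
    (name,) = conn.execute(
        "SELECT c.name FROM orders AS o "
        "JOIN customer AS c ON c.customer_no = o.customer_no "
        "WHERE o.order_no = ?",
        (order_no,),
    ).fetchone()
    return name


print(name_for_order_two_calls(101), name_for_order_join(101))
```

<p>
Both functions return the same answer. The difference is the number of round trips to the database, which matters far more once the database sits on a server rather than in a local file.</p>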
<p>
Updates in Access are nearly always done one record at a time. In SQL Server I am more likely to make changes to a dataset and upload all the transactions in one call.<br />
As I said earlier, for small databases MS Access is often the easiest way to go. But hopefully your business won&#8217;t stay small forever. As it grows your data, and hence your RDBMS, needs will change. To get the most out of your data you need to consider the most efficient way to process it.</p>
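<p>
As a rough illustration of collecting changes and sending them together, here is a sketch in Python against an in-memory SQLite database. SQLite stands in for SQL Server here, and its <em>executemany</em> still runs the statement once per row internally, so this is an analogy for uploading a dataset in one call rather than a demonstration of it. The product table and the prices are invented.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_no INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)

# Changes are collected in the program first, as (new_price, product_no) pairs.
pending = [(11.0, 1), (22.0, 2), (33.0, 3)]

# One transaction and one call from the program: the "with" block commits
# if every update succeeds and rolls back if any of them fails.
with conn:
    conn.executemany(
        "UPDATE product SET price = ? WHERE product_no = ?", pending
    )

prices = [
    row[0]
    for row in conn.execute("SELECT price FROM product ORDER BY product_no")
]
print(prices)
```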
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/01/07/which-database-should-i-use/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First Normal Form</title>
		<link>http://midnightmuse.com.au/2006/01/06/first-normal-form/</link>
		<comments>http://midnightmuse.com.au/2006/01/06/first-normal-form/#comments</comments>
		<pubDate>Fri, 06 Jan 2006 00:36:47 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/01/06/first-normal-form/</guid>
		<description><![CDATA[The first step in database normalisation is to put the data into first normal form.]]></description>
			<content:encoded><![CDATA[<p>First Normal Form, or 1NF, is relatively straightforward. The idea behind it is that every attribute is atomic, that is, single valued, or, as it is often put, there are no repeating groups.</p>
<p>
For example, if you have a customer table within your database you would not put orders in the customer table because each customer, hopefully, has more than one order.</p>
<table summary="Customer Table" border="1">
<tr>
<th>Name</th>
<th>Order No</th>
</tr>
<tr>
<td>Jones</td>
<td>1</td>
</tr>
<tr>
<td>Smith</td>
<td>2</td>
</tr>
<tr>
<td>Jones</td>
<td>3</td>
</tr>
</table>
<p>
Obviously, there would be more data in the table. Probably columns for the customer address, phone number, perhaps customer number etc. But the point should be obvious. Jones has two orders, and so an order cannot be uniquely identified. If we select Jones we don&#8217;t know which order we are referring to. So the order numbers must be moved to another table. And doing that will bring the data into first normal form.</p>
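<p>
The split described above might look something like this. It is a minimal sketch in Python with an in-memory SQLite database. The customer numbers are an assumption added so that the orders table has something to point at, and the rows simply mirror the Jones and Smith table.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per customer: the name is stored once, however many orders exist.
    CREATE TABLE customer (
        customer_no INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- One row per order, pointing back at the customer who placed it.
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        customer_no INTEGER NOT NULL REFERENCES customer(customer_no)
    );
    INSERT INTO customer VALUES (1, 'Jones'), (2, 'Smith');
    INSERT INTO orders VALUES (1, 1), (2, 2), (3, 1);
""")

jones_orders = [
    row[0]
    for row in conn.execute(
        "SELECT o.order_no FROM orders AS o "
        "JOIN customer AS c ON c.customer_no = o.customer_no "
        "WHERE c.name = 'Jones' ORDER BY o.order_no"
    )
]
customer_rows = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(jones_orders, customer_rows)
```

<p>
Jones now appears once in the customer table, and each order is identified by its own order number in the orders table.</p>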
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/01/06/first-normal-form/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Object Oriented Databases</title>
		<link>http://midnightmuse.com.au/2006/01/05/object-oriented-databases/</link>
		<comments>http://midnightmuse.com.au/2006/01/05/object-oriented-databases/#comments</comments>
		<pubDate>Thu, 05 Jan 2006 00:35:10 +0000</pubDate>
		<dc:creator>Richard</dc:creator>
				<category><![CDATA[Databases]]></category>

		<guid isPermaLink="false">http://midnightmuse.com.au/2006/01/05/object-oriented-databases/</guid>
		<description><![CDATA[<p>I have been reading a lot of articles recently dealing with the pros and cons of object oriented databases.</p>]]></description>
			<content:encoded><![CDATA[<p>Object oriented databases. You either love them or you hate them. Like much that goes on in the IT world, the battle lines have been drawn and it is fought out much like the Crusades of the Middle Ages. It has turned into a religious war.</p>
<p>
I&#8217;ll tell you where I stand at the outset &#8211; I don&#8217;t like them. Although I have to admit that some of the reasons I don&#8217;t like them are difficult to put into words. It may be that I am old and cussed and opposed to change.</p>
<p>
Back in the 1980s (I told you I was old) I wrote a sort of object oriented relational database. I thought it was the ant&#8217;s pants of database design. The reason I did it was because of one of the tenets of OO design &#8211; encapsulation. I built a database structure where all the data <strong>and</strong> all the methods and properties associated with the data were encapsulated within the database. It made sense to me.<br />
Unfortunately, it didn&#8217;t work very well. Part of the problem was that I was just learning to write in C++ and that is not a language that you learn when you have a spare afternoon to fill in. I&#8217;m sure that the major problem was my bad coding.</p>
<p>
But even after tweaking my code to make it work better, or even just work sometimes, I came to the conclusion that encapsulation, at the database level, was not a good idea.<br />
If you are writing an application to sit on a desktop and everything will be used locally it doesn&#8217;t matter too much. You can mix the program with the data and no one will be the wiser. Until someone needs to maintain your code and sees the mess you have made and then you look like an idiot. But if it is all running locally then you can mix it up, it will run just the same. I don&#8217;t recommend this approach, if for no other reason, because it is almost impossible or at least very difficult (translation &#8211; expensive) to maintain.</p>
<p>
But once you have more than one user accessing your data then everything changes. The usual model is to separate the components into an n-tier application, where n is as big as you like, but if it is too big then once again you look like an idiot. What are the tiers? Usually there is a data layer, perhaps a business layer and an application layer. These last two can be further broken down if needed.</p>
<p>
And what normally happens is that the data layer sits on a server, or in peer to peer networking it will sit on one PC which acts as the server as far as the data layer is concerned, and as far as many other things which don&#8217;t affect us here are concerned. The application sits on each user&#8217;s computer. And the business layer? I have called it that because that is often how it is described, but, and remember it may not even exist, depending upon what it is and the size of the system, it could reside in one of a number of places. For example, one model for large enterprises with large applications is to have the business layer sit on a separate server. Enough of the business layer. It is not really relevant here and I won&#8217;t refer to it again.</p>
<p>
Assuming we are using an object oriented approach, the question then is, where does the encapsulation occur, within the application or within the database? The former model uses, usually, a relational database. The application grabs the data and creates an object out of the data. In the OO model the application grabs the object from the database. You can see that both are doing the same thing, it is just a matter of where everything is going on, sort of. And you can probably see why I liken it to a religious war.</p>
<p>
I should add that there is much more to OO than just encapsulation, so if you are an OO zealot please don&#8217;t tell me I missed all the good stuff, like inheritance and polymorphism, they are more fun than sex and more exciting than a roller coaster.</p>
<p>
There are a thousand and one reasons why you should use an OO database, according to its adherents. I can only look at one. The claim is made that objects provide a more realistic model of the real world. That is, you can create objects in your application which are a much closer representation of the real world objects which your program is dealing with. This may be fine for some objects. One of the oft quoted examples is Computer Aided Design, CAD. The objects which comprise a CAD drawing don&#8217;t easily fit into the relational model, but are perfect candidates for OO, or so the argument goes. That may be right. I have never programmed anything remotely resembling CAD or built a database to deal with CAD objects.</p>
<p>
Perhaps my thinking is coloured by the type of programming I do. Most of it is small to medium sized applications for small to medium sized businesses. What are their real world objects? A transaction, a customer, an order &#8211; these are the common objects they deal with every day. And encapsulating all the object properties just doesn&#8217;t work well.<br />
For example, if I wish to create a new customer within an OO framework using a relational database then I need a class, we will call it customer. The class has a constructor (it probably has quite a few) so that I can create a new object to represent the new customer. Within the customer class I have a method called, say, SaveNew which contains the SQL statements to add a new customer record to the customer table of the database. I could do exactly the same with an OO database. The SaveNew method would be part of the database structure. And this seems appealing. Isn&#8217;t adding a new customer a fundamental operation? Once I have built the database I never need worry about how I add a new customer. I merely call the SaveNew method which is contained within the database. How it does it remains hidden, only the call to the method is exposed.</p>
<p>
But that is exactly what happens by building a class and having the method contained within the class. The bigger problem arises when I wish to update a customer record. Once again I can build an Update method. And this works fine if I am updating every field. But in the real world, which we are supposedly modelling, this is not what happens. There are times when we only wish to update one or two fields within a record. It is then much easier to write the SQL call in the application and by-pass the customer class altogether.</p>
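<p>
To make the customer example concrete, here is a sketch of such a class in Python over an in-memory SQLite database. It is an illustration under assumptions, not the code from any real application: the columns are invented, and the SaveNew method is spelled <em>save_new</em> to follow Python naming habits.</p>

```python
import sqlite3


class Customer:
    """Application-side class that hides the SQL for one customer record."""

    def __init__(self, customer_no, name, phone):
        self.customer_no = customer_no
        self.name = name
        self.phone = phone

    def save_new(self, conn):
        """Insert this customer; callers never see the SQL."""
        conn.execute(
            "INSERT INTO customer (customer_no, name, phone) VALUES (?, ?, ?)",
            (self.customer_no, self.name, self.phone),
        )

    def update(self, conn):
        """Write every field back, whether or not it has changed."""
        conn.execute(
            "UPDATE customer SET name = ?, phone = ? WHERE customer_no = ?",
            (self.name, self.phone, self.customer_no),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)

Customer(1, "Jones", "555-0100").save_new(conn)

# The shortcut described above: change a single field with a direct SQL
# call from the application, without going through the class at all.
conn.execute(
    "UPDATE customer SET phone = ? WHERE customer_no = ?", ("555-0199", 1)
)

row = conn.execute(
    "SELECT name, phone FROM customer WHERE customer_no = 1"
).fetchone()
print(row)
```

<p>
The <em>update</em> method rewrites every column, which is fine when the whole record has changed. The direct call touches only the phone number, and that is the case the class handles awkwardly.</p>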
<p>
This is just one example. But the bottom line is that I cannot get around the notion that it is much safer and more efficient to separate data from its methods. I will continue to put data in a database, and the methods in the application.</p>
]]></content:encoded>
			<wfw:commentRss>http://midnightmuse.com.au/2006/01/05/object-oriented-databases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
