Document Home

Database Design

OK, now let's think about this a bit. We've got baseball players, and they are on a team, and they play games, right? So let's start thinking about the teams and the players. If you are new to database design, and you look at this for the first time, you'd probably see a structure something like this:

Team Leaping Lizards
Manager Ralphzilla
Fax # 555-111-2222
Pitcher Gammera
Catcher Rodan
First Base Space Godzilla
Second Base Raymond Burr
... ...

But wait a minute, what happens if someone quits the team and is replaced? Do we just put the new player in the same slot? And how would we handle substitutions? What happens if Gammera's arm gets tired and he trades places with Space Godzilla? And how would we handle recording what happened in an individual game? We might well evolve the team record into a team-game record, in which the individual record would look something like the scorebook page to the right. But suppose you want to calculate the batting average for Raymond Burr, who is the team's utility player and might have played any position or batted in any slot in the order. We'd have to put together statements that searched something like this ... "where leadoff="Raymond Burr" or second_batter="Raymond Burr" or third_batter="Raymond Burr" ...". That's really clunky and there are about a gazillion ways to mess up a statement that lengthy. This, folks, is not the way databases are designed. That is not to argue that the scorebook page included is not representative of a very fine scorebook. The publishers, Triple E Productions, seem to have gone to a great deal of effort to put together a very legible scorebook in which it is easy to record a game. However, the way you would record a game is not the way you would store the information about the game. In effect, the scorebook page is the user interface to the records of the game. In fact, I wouldn't be at all surprised if at least one page in our application looks very much like a scorebook page.



There are two primary database design schemes used in serious database work currently, the object model and the relational. Postgresql is actually something of a hybrid, supporting some aspects of the object model in what is essentially a relational framework. A quick and dirty summary of the difference between the two is to say that the object model is oriented toward objects (duh), and the relational model is focused toward sets. Don't worry, this will be much more clear as we go on. In the object model, objects are members of classes that are capable of inheriting characteristics from the classes from which they are derived, but have some characteristic that distinquishes them from other members of that parent class. This is a very powerful model for describing a universe. Imagine for a moment that you are a naive observer of the universe in which a baseball game is being played, maybe an alien observing a complex of ball diamonds. You might be sitting somewhere, watching what is going on, and you see a bunch of creatures moving around, some large and some small. On that basis you might create the class "creatures". As time went on and you saw enough to differentiate, you might decide to call the really small ones "bugs", so you would derive a subclass of "creatures" called "bugs". After watching a little longer you might decide to call the ones walking around on two legs "people", and derive the appropriate subclass. As time went on you might notice that many around you were engaged in an elaborate organized pursuit that you recognized as a game and create the class "players". You can see where I'm going with this. At each step in the process, you identify characteristics that describe the subclass you've derived and distinquish it from the other members of the parent class. Here's where the art starts to come in, because it might take you a while to recognize that you shouldn't create another subclass for second basemen, but determine another way to characterize the position a player is occupying at any point in time. With time, however, you would be able to extend the characteristics of baseball players, and the game they play. Sooner or later you might even realize that it is inappropriate to include umpires in the parent class of "creatures"!


As you can see, the strength of the object model lies in its descriptive capabilities. If you think of how it is to describe a complex process on the level of what is actually happening the appeal of the model is probably obvious. Object-orientation extends far beyond databases, most languages have a degree of object orientation that provides that provides purists with endless opportunities for debates about the true path. If you can see how objects can simplify the modelling of a new domain in a database, you can see that applying the same technique to the realm of process description is even more compelling. It is a very common experience in application development to have the user population define new conditions and exceptions that apply to the operation as the development effort is underway, and in any event the the environments in which systems exist rarely remain static. Applications can break from the gradual accumulation of exception conditions grafted onto a structure that does not adequately support them. The ability to simply define a class of methods that inherit characteristics from their parent class but may execute in response to relatively rare conditions can ease this difficulty considerably.


The previous paragraph was primarily a preface to saying that there a relatively few truly object-oriented databases out there, but that virtually any language in widespread use has acquired some object features. That may change, but object databases require relatively more processing power to perform many of the set-oriented queries for which we use computers and therefore their adoption has been somewhat slow. When you think about it, that's not surprising. For the most part, the records we maintain haven't changed their basic nature. There may be a breakthrough application out there somewhere that will change that, but for the most part we already know the basic structure of the information of the information we want to maintain in the applications being developed. What generally concerns us more are changes in the way we use that information.


The primary database model in use today is known as the relational model. While there are conceptual similarities between the relational and object models in that both rely on grouping items on common characteristics, the relational model is more oriented toward describing the set of which an item is a part. Database design in a relational system is largely a matter of partitioning the data into those sets in a process known as normalization. This is not as difficult a process as it might sound. However, it does require a fairly high level of knowledge of the domain being described to be able to construct a model that you won't have to tear down and reconstruct. This is not necessarily a bad thing, because the points at which you have to tear it down represent the times at which your understanding of the nature of the data has reached a new level.


Next - Designing the Baseball Database