OK, let's start working through this. Obviously, in baseball we have teams, and one of the things we said that we were interested in is constructing some sort of report to send to each team's manager. In the more formal language of the development process, this would be called a system requirement. Therefore, we're going to have some sort of record with the fax number and the manager's name. This might lead us to a structure something like this:
| team | character | 30 |
| fax_num | character | 10 |
| manager_fname | character | 15 |
| manager_lname | character | 15 |
| team_id | character | 3 |
| team | character | 30 |
| fax_num | character | 10 |
| manager_fname | character | 15 |
| manager_lname | character | 15 |
| game_id | char | 3 |
| date | date | |
| home_team | character | 2 |
| away_team | character | 2 |
| home_ump | character | 30 |
| first_ump | character | 30 |
| second_ump | character | 30 |
| third_ump | character | 30 |
| player_id | character | 5 |
| player_lname | character | 15 |
| player_fname | character | 15 |
| player_id | character | 5 |
| game_id | character | 5 |
| team_id | character | 3 |
| postion | character | 15 |
| at_bats | integer | 2 |
| singles | integer | 1 |
| doubles | integer | 1 |
| triples | integer | 1 |
| home_runs | integer | 1 |
| walks | integer | 1 |
| on_by_error | integer | 1 |
| errors_committed | integer | 1 |
I'll jump right to the final structure for the database without going through all of the intermediate steps. You can probably see that in each iteration above we were, in effect, refining the granularity of the individual record. What we are really doing is going to the finest level of detail we can reasonably expect to maintain, leaving the task of storing a retrieving the data to the computer. After all, that's what it is for. The irony about this is that a fine levels of detail are generally easier to program. Remember algorithms? Since the structure of our data is very fine, all of the hard stuff will be handled in the background. If you design the database appropriately, the code you write will be lean, clean, and much easier to maintain. Hopefully this will be obvious as we get into later sections. This is a table-driven system, so the information about the system is maintained by tables in the system.

In this structure the table around which the application revolves (the participant_game table) is the one which maintains the discrete events associated with the subject area, the ball game. (You should recognize that this statement is purely to help you visualize the structure of the database. There is nothing inherent in the nature of a relational database system that requires anything like a central database forming a hub for the database structure. It is, however, a fairly common and very efficient structure for certain types of data, and many large systems are aggregates of many subsystems of similar structure.) The fields (or columns) in table participant_game are shown at right.
. The first, key, is simply a unique identifier for any given row in this table, and as such is known as the primary key for the table. (While it is possible to have primary keys that allow duplicate values, this is considered to be very bad form. I doubt that we will get sufficiently deep into relational theory here to show you why this is the case. You can either take my word for it at this point, or research the issue yourself. Both are very wise courses of action. (grin)). In a context like this, it is perfectly acceptable to use an auto-incrementing field to automatically generate a unique identifier, but I personally prefer to create unique keys by concantenating (sticking together) several of the identifying columns in the table. My rationale for this is simple ... sometimes you find yourself in a situation in which the records in a table are not stored in the order in which you need to use them, and when that happens you are going to have to append a condition to the statement that retrieves the records sprcifying the order in which you want them returned. If the clause by which the records are retrieved has several conditions it is going to be slower to execute than one constructed of just one condition. It's that simple. Therefore I try to anticipate the default order in which I might want to retrieve the records. For example, if I wanted to look at what one player did by inning I can just make sure the database has an index on the key and select for that participant_id. (This will be more clear as we get into later sections.) The net effect of that might be a small difference, depending on the manner in which the database engine implements the condition, but the gain is real. Within the constraints imposed by your hardware, the performance of any application is determined precisely in this fashion. In this instance, the value stored in the column key is constructed by concantenating the values in the columns game_id,inning, and inning_order.
You will probably notice that six of the ten columns in this table are coded. In the context of this table these columns represent what are known as foreign keys, allowing a relationship (a join) to be created with another table. Most of the tables in the system are just look-up tables, holding descriptions of the coded value. Why do this? Beyond the savings of space associated with storing a 2 byte code instead of a 30 byte description several hundred thousand times, retrieving records based on that two byte code is much more efficient than retrieving them based on a 30 byte description, even if it is always spelled consistently.
Yoy may haved noticed in the paragraph above that I went from talking about players to talking about participants. If you also noticed that the manager's name is not longer included in the team table you probably picked up on what I'm going to do. In this structure, you would include individuals like managers and umpires in the participant table, with the role they are assume in a given game indicated by the role_code in the relevant records in the participant game table. If you think about it a little bit you'll be able to see the pwer of this approach. Virtually nothing about it is hard-wired to the characteristics of the Mega-Monsters Major League. Since players are only associated with teams in the participant_game table, participants can move from team to team between games, or even change status from player to umpire to manager in three different games. As a result the system would be well-suited to an organization that runs leagues with several age classes. You may notice that I put columns home_team and away_team in the game table rather than completely normalizing the design with a game_team structure. In essence, since there would be only two records per game in such a table, normalizing to that extent would return no benefit. The point at which you stop normalizing generally seems to be a matter of personal preference. I tend to take it further than many, my own perspective is that a normalized design has sufficient advantages in flexibility to override whatever advantages a less-normalized design has for summarization and analysis. My feeling is that I can always create scripts to create less-normalized structures to facilitate such activities, but one creates clunky, inefficient program structures when ou use a poorly-normalized database to store your transactions. I can cope with the first by leaving a p75 executing scripts all the time. The price for the second lies in applications that are slow to execute and difficult to maintain.
Other elements of the database design will become more clear as we implement the application around it. The script bbdb_create will create the tables in your database. Take a look at it now, you'll get more details on its operation as we begin the section on HTML Documents and Simple CGI Scripts. (As a take-home exercise, can you spot the table created in that script that does not exist in the database structure illustration? Hurry now, the clock's ticking.)