Single Form Example


The way I'm going to go about doing this is very similar to the strategy I employ in this document as a whole ... I'm going to start with a minimum piece and progressively wrap it in larger structures, until we can reasonably say that we've satisfied expectations for recording a game.


In all honesty, I've followed this approach because the more complex structures I started with did not function precisely the way I wanted and expected them to. As I wanted a solid structure underlying the application I kept ripping elements away until I ultimately started with a blank page and started building the underlying elements all over again. The time I've spent in this process has been modestly frustrating, but this is not at all unusual and you should not succumb to self-doubt as you go through similar periods. Because you will.


One thing to remember as you develop CGI scripts that generate dynamic browser pages is that the script executes on the server, and its output is rendered as a web document by the browser on your desktop. Each response from the browser is handled by a fresh incarnation of the script. In other words, each time you enter information and submit a form on a browser page that is generated by a cgi script, the script is executed from the beginning, and the information submitted in the form, along with whatever other pieces of information can be extracted about prior executions of the script, is what is used to control the execution of the script. The process of carrying this information between executions of the script is called maintaining state, and is something of an art form in itself. You might think of it as setting little triggers in the environment on the server that will lead the script to execute in the manner you desire. Hopefully, this will become clearer as I move along.
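To make the "fresh incarnation" idea a little more concrete, here is a minimal sketch that needs no web server at all. The run_once() routine is a hypothetical stand-in for one execution of a cgi script, and the hash passed to it plays the role of the submitted form fields; neither appears in the real script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one execution of a cgi script. The hash
# reference plays the role of the submitted form fields; nothing else
# survives from one call to the next.
sub run_once {
    my ($params) = @_;
    my $count = 0;      # always starts from scratch, just like the script
    $count++;
    return exists $params->{ec}
        ? "processing event code '$params->{ec}' (count=$count)"
        : "first pass: drawing an empty form (count=$count)";
}

print run_once({}), "\n";               # nothing submitted yet
print run_once({ ec => 'K' }), "\n";    # a form has come back
```

The counter never gets past 1, because every run begins again at the top. The only thing that changes the behavior of the second run is what was submitted to it.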


In a sense, there are two levels on which you will ultimately want to maintain state information: within a session and between sessions (when the user has closed the browser and restarted the connection). The second is the context in which what are called cookies are used. You may have heard them described as threats to internet privacy, but far more often they are simply small text files in which details of the way you want to connect to a given site are stored, freeing you from entering them each time you connect. At this point I'm primarily concerned with the first type of state management, but we will get into cookie management as the interface becomes more detailed.


Keeping that in mind, let's start with the most basic example of a dynamic cgi script, one to create the basic entry script that was illustrated on the previous page, and accept information from it. In its entirety, the script can be found here.


#!/usr/bin/perl -wT
##in essence, this script creates a scorekeeping interface for the baseball database
BEGIN {
# Set the DISPLAY variable to the name of the local machine
# where the debugger window and web browser appear.
$ENV{DISPLAY} = "mymachine:0" ;
}
As I discussed earlier, the first line in the script invokes the perl interpreter in warning and taint modes, and the second line is a short description of what the script does. I've left in the BEGIN block that follows to introduce you to the way I currently go about debugging cgi scripts. Since the script is running on a remote machine, the display has to be redirected to the machine at which I am sitting. BEGIN blocks in perl are executed as soon as the compiler has parsed them, before the rest of the script runs, which makes them the place to set up special environmental conditions. When I come to the point of discussing security in more depth, some of what will be done will involve BEGIN blocks that close down aspects of the environment that could be exploited to take a greater level of access to the host than you would wish to grant to a browser user. In this script, the BEGIN block simply sets the DISPLAY environment variable to direct the output to my machine, in a manner similar to that used in the section on remote x-windows display. When I want to debug a script I change the switches in the shebang line to replace -T with -d:ptkdb. This invokes the ptkdb debugger, which is directed to the display on my desktop as a result of setting the DISPLAY environment variable. The process of debugging scripts will be discussed in more detail in a later section.
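If the timing of a BEGIN block seems abstract, the following small sketch may help. It is not part of the scorekeeping script; the @order array exists only to record which piece of code ran first, and mymachine:0 is the same placeholder display name used above.

```perl
use strict;
use warnings;

our @order;     # illustration only: records the order in which code runs

push @order, 'ordinary statement';

BEGIN {
    # Runs as soon as the compiler has parsed this block, before the
    # ordinary statement above it has been executed.
    $ENV{DISPLAY} = 'mymachine:0';      # placeholder display name
    push @order, 'BEGIN block';
}

print join(' -> ', @order), "\n";       # prints "BEGIN block -> ordinary statement"
```

Even though the BEGIN block appears later in the file, it runs first, which is why it is a safe place to adjust the environment before anything else in the script gets a chance to look at it.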


##set up modules and pragmas
use strict;
use CGI qw (:standard);
use FileHandle;
The modules I use in this script are straightforward, but there are two considerations pertinent to the module environment that I'll mention at this point. First, I had some trouble getting the object interface to function reliably in some contexts. While I readily admit that this may have been attributable to the form of my implementation, I find it disconcerting when two parallel implementations using the same structure succeed or fail without discernible reason. Regardless, at this point the script will not be using the more advanced features of the object-oriented interface, so I'll proceed using the procedural interface. (As a side note, don't be afraid to follow a similar strategy if the situation warrants it. Make your stuff work, then work on getting fancier with it. Modules are written by people just like you, and something characteristic of your implementation may uncover a bug previously hidden. If you can implement the basic functionality required for a given portion of your application using the most basic elements from a given module then you have both a great deal of freedom in implementing enhancements and quite probably the most stable underpinning for your implementation, because it is generally in the more sophisticated methods that bugs creep in.) Regardless, that is the reason that I specifically retrieve the standard set of methods from the CGI module.

The second consideration relates to the use of the FileHandle module. (If the perl installed on the machine hosting Apache in your environment is at or above version 5.6 this module is in the base distribution, which means that you don't have to install it manually.) In this script, and more extensively in the scripts to come, I use external files for various nefarious purposes. Running under strict mode, however, perl will not directly allow writing to an external file in the manner I used in the rsh cluster management script. It would be possible to create an indirect reference to a created file handle and perform taint checks on the data passed to it, but that is part of what the FileHandle module does. There are times when it makes sense to "re-invent the wheel" to deepen your understanding of what is going on, but something that could compromise the security of the system is not really the appropriate context for such experimentation.


my $forms=new CGI;
my $test=param('ec');
if (!$test) {unlink '/home/www/recs' if -e '/home/www/recs';}
After the modules have been specified, I create a new CGI object and a test variable that holds the value of a CGI parameter. If that parameter does not exist, it must be the first time through the script, and old copies of the external file written by previous incarnations of the script are unlinked from the file system. (Strictly speaking, the test is also true when the event code is submitted empty or as 0, so the file is discarded in those cases too.) As mentioned in the section on management scripts, this is a faster operation than removing the file. The information in the file will actually continue to exist until the space the file occupies is overwritten, but the information the script is storing is not confidential, and the script will be re-creating the file many times in any given session, making it unlikely that the space occupied by the file would stay intact for long.
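One wrinkle in a test like !$test is that perl treats an empty string and the string "0" as false, just as it does a missing value. The sketch below shows a stricter variation that asks only whether the parameter was supplied at all. The first_pass() helper is hypothetical, and a temporary directory stands in for /home/www so that the sketch can be run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true only when the field was not submitted at all.
sub first_pass {
    my ($value) = @_;
    return !defined $value;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/recs";                 # stand-in for /home/www/recs

# Create a leftover file, as if written by an earlier session.
open(my $out, '>', $file) or die "cannot create $file: $!";
print {$out} "stale record\n";
close($out) or die "cannot close $file: $!";

# An event code of "0" is defined, so the file survives ...
unlink $file if first_pass('0') && -e $file;
my $kept = -e $file ? 1 : 0;

# ... while a missing parameter (undef) removes it.
unlink $file if first_pass(undef) && -e $file;
my $gone = -e $file ? 0 : 1;

print "kept after '0': $kept, removed after undef: $gone\n";
```

Whether the stricter test is what you want depends on whether 0 or an empty field can ever be a legitimate event code in your own data.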


my ($ec,$rc,$pc,$erc,$et);
The next line declares a set of scalars that will hold the results from the single screen after they are retrieved from the CGI object. If you think about what is being entered here, the names of the scalars should be readily decipherable: $ec will hold the event code, $rc the role code, $pc the participant code, $erc the event result code, and $et will hold the entered event text.


print header("text/html");
print start_html(),
("<font size='5'><center>Scorecard Entry</center></font><br><br>"),
("<p><br><br>");

The next line in the script begins to send material back to the browser to draw a new page. If you pull up the html for one of the documents you created earlier it should be easy to spot the nature of what the script is sending back to the browser. In this script, as in the next few, I use a mixture of the procedures made available by the CGI module with simple print statements that send html back to the browser. The first two print statements both begin with CGI.pm methods, header() and start_html(), but the second and third lines of the second print statement simply send back html. Two things are worthy of mention here. First, a single perl statement can span multiple lines. For all practical purposes, whatever constraints are placed on the length of statements in your perl code are placed there by your own self-restraint. (One recreation in the perl community is the development of code that is virtually impossible to decipher, by using strategies that combine illegible code with code that hides its true intent through misdirection. This is called obfuscated perl. Also something of a tradition is the development of unique approaches, both visually and programmatically, to spelling the phrase "just another perl hacker". Called japhs, these are considered to represent a rite of passage in certain parts of the community.) If you find such endeavors intriguing, the place to indulge in them is not in your scripts. Just as it is wise to extensively comment your scripts, it is a good idea to make them as legible as possible. Quite beyond the fact that it may fall to someone else to maintain your scripts, it is quite embarrassing to return to a script six months later and find it impossible to understand.

At the same time, judicious use of multi-line statements can significantly improve the legibility of code. The appropriate grouping of the html tags and text returned to the browser via print statements can make the structure of the generated page far easier to recognize within the script, especially when combined with the use of comment lines and whitespace.



print start_form(-method=>"post");
The next line defines the beginning of the html form that will be used to accept data entry. In this script I use the CGI module's start_form() method to prepare and send the appropriate material to initialize the form. There are a number of options that can be used with the start_form() method, some of which are likely to get into subsequent incarnations of the application. This time around the only option specified is the post method, which is the default for forms created by start_form(). (I specified the post method at one point when I was trying to track down the source of some problems I was having with the script and left it specified because it provides me with a good opportunity to discuss what it means.) It is appropriate to note that when post and get are referred to as methods, this is something different from the methods, such as start_form(), that are included in the CGI perl module. CGI is an acronym for common gateway interface, which is the specification of how web server software hands a request from a browser or other client to an external program and takes back its output. Get and post are request methods defined by http, the protocol that the browser and the server speak to each other, and they differ in where the form information is put in the request. Put simply, when information is sent to the server via the get method, the variable portion of the message (what is specified in the form) is incorporated within the url string, while post sends it in the body of the request, after the url and the headers. In general, the get method is used to retrieve material from the server (download files) while the post method is used to send data to the server. This is because the space available within the url string is limited in practice, while data posted in the body of the request can run on until the cows come home, for all practical purposes. Either way, the information arrives as a single encoded string that must be parsed into data elements and their associated values. You could write perl code, or indeed code in virtually any other language that the server could invoke, and use it to parse those strings. But you don't have to ... that is what the CGI perl module and its associated methods do. This little bit of confusion could have been avoided if the perl module had been named ralphie, so we then referred to the start_form() method of ralphie specifying the post method, but most people would not have considered ralphie a very descriptive name.
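Just to show what the CGI module is saving you from, here is a rough sketch of the parsing job done by hand. Whichever method carries it, the form data reaches the script as one url-encoded string of name=value pairs. The parse_form_data() routine and the sample string are my own inventions for illustration, and the real module copes with many cases (repeated field names, file uploads and so on) that this sketch ignores.

```perl
use strict;
use warnings;

# Hypothetical helper: split "name=value&name=value" into a hash,
# undoing the url encoding ('+' for a space, %XX for other characters).
sub parse_form_data {
    my ($raw) = @_;
    my %field;
    for my $pair (split /[&;]/, $raw) {
        next unless length $pair;               # skip empty pairs
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;      # "name" with no "=value"
        for ($name, $value) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        $field{$name} = $value;                 # a repeated name overwrites
    }
    return %field;
}

my %field = parse_form_data('ec=K&rc=P&pc=1234&erc=SO&et=struck+out%2C+swinging');
print "$field{ec} / $field{et}\n";      # prints "K / struck out, swinging"
```

In the script itself, each call to param() hands back one of these values, with all of the decoding already done.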


print "Event Code:<input type=text name='ec' size='2'><br>",
'Role Code:<input type="text" name="rc" size="2"><br>',
'Participant Code:<input type="text" name="pc" size="4"><br>',
'Event Result Code:<input type="text" name="erc" size="2"><br>',
'Event Text:<input type="text" name="et" size="25"><br><br>';
After I start the form I fill it with one long print statement that simply prints the html elements back to the browser. This section illustrates the essential nature of what the web server does, which is to relay the standard output of the script to the client browser. Hence, the use of print statements. Acquiring a sense of the nature of that communication between the web server and the browser is much more important when generating dynamic pages than it is when writing static pages or writing scripts that are executed from pages that are at least in part statically defined, and this should become clearer as we proceed. The statement is one long print, with each line in the form on its own line in the script and the lines separated by commas. This is a good illustration of the point that I made earlier, that appropriate use of multi-line statements, well-formatted, can substantially enhance the legibility of code.


print submit(-name=>"store");
print end_form;
print "<p><br>";
print end_html;
The submit button in this example is created using the submit method from the CGI module. The only option specified is the name of the button, which also serves as its label when it is displayed. Pressing the submit button signals to the browser that entry in the form is complete and that it should be posted to the server. Following the creation of the submit button the script signals the end of the form and the end of the html document to the browser with the end_form and end_html methods. (Older releases of CGI.pm also accept the spelling endform; current releases expect end_form.)


$ec=param('ec');
$rc=param('rc');
$pc=param('pc');
$erc=param('erc');
$et=param('et');
Once the submit button is pushed, the request sent to the server carries a string containing the names of the parameters and the values associated with them. The CGI module parses that string to make the values available to the script, and the next set of lines retrieves the values of the parameters and assigns them to scalars of the same name.


my $recs=new FileHandle('/home/www/recs','>>');
my $line=$ec.','.$rc.','.$pc.','.$erc.','.$et."\n";
print $recs $line;
$recs->close;

Neither this script nor the examples that immediately follow actually insert the entered values into the database. If you are designing a data-entry interface that is going to be amenable to rapid-fire entry, you don't want the user to have to wait for that process to finish. You probably want the user to receive some verification of successful entry and some sort of fail-over procedure that would allow entry to continue if the database server were unavailable. It is quite possible, for example, for someone to push a power switch, intentionally or not, turning off the database server machine, and database server systems have been known to abend (break and quit), although this is an increasingly rare occurrence with mature software. Therefore, the strategy I've adopted is to write out an external file with the data that has been entered. Once the data entry interface is functioning as I wish, I'll write a separate script that will be fired off after the external file is written, to perform the appropriate actions to insert that data into the database. This is a particularly appropriate strategy to employ in a mosix cluster, where the spawned process will likely migrate to another machine. This is just a taste of things to come ... at this point all I'm worried about is writing the external file that holds the record that has just been entered.

As I mentioned earlier, the FileHandle module has been loaded in the script to provide a taint-checked interface to external file operations. The next line in the script creates a new FileHandle object holding a reference to a file handle. The two arguments used in the object instantiation are first, the name of the file on which an object is desired and second, the mode associated with the file handle. The mode determines the operations that can be performed on the file with which the file handle is associated, in essence the permissions that the script has with that file. The mode specified (>>) indicates that the file will be created if it does not exist and that anything written through the handle is appended to the end of it; the handle cannot be used to read the file or to overwrite what is already there. If you recall, early in the script I unlink the file from the file system (if it exists). My personal perspective is that a file that is going to serve as a temporary repository for entered records should be created anew each time through the script. (I suspect that the fail-over procedure in the insertion script will ultimately make use of an external file with a higher level of persistence, but that's different (grin).) In any event, unlinking the file early in the script removes any requirement that this operation be performed at the end of the script, thus removing the potential for deleting the file prematurely. Given, however, that the script will carry on without waiting for the spawned process to finish, to be safe I'll copy the file before running the insert script and run the insert script on that copy. (In case you didn't notice, I just thought of that.) There will be other considerations we need to deal with concerning these external files as the application becomes more polished, such as how best to allow for multiple users entering simultaneously, but for now I'll simply write the file.
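If you would like to watch append mode at work without touching /home/www, the following sketch opens, writes and closes a scratch file twice, the way two runs of the script would, and then reads it back. The temporary directory and the two sample records are made up for the purpose, and the error checks are an addition that the script above does not have.

```perl
use strict;
use warnings;
use FileHandle;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/recs";                 # stand-in for /home/www/recs

# Two separate open/append/close cycles, as two runs of the script would do.
for my $line ("K,P,1234,SO,first\n", "W,B,5678,BB,second\n") {
    my $recs = FileHandle->new($file, '>>')
        or die "cannot open $file for appending: $!";
    print {$recs} $line;
    $recs->close or die "cannot close $file: $!";
}

# Read the file back through a second handle opened for reading.
my $in = FileHandle->new($file, '<') or die "cannot read $file: $!";
my @lines = $in->getlines;
$in->close;

print scalar(@lines), " records\n";     # prints "2 records"
```

The first pass through the loop creates the file and the second adds to it, which is the behavior the scorekeeping script relies on between its unlink at the top and the next fresh session.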

The next line in the script creates the record to be written to the external file. The $line scalar is created holding what is commonly known as a comma-delimited ascii record by concatenating the scalars that represent a record, separating them with commas, and sticking a newline character on the end of the line. This form is read by virtually any software with the ability to import ascii files, though if the file were going to be distributed one would probably put a header line at the beginning of the file. The line is then written to the filehandle and the filehandle is closed. Ultimately, it is at this point that the file will be copied for processing and the insertion script spawned.
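One thing to keep in the back of your mind is that plain concatenation will produce a misleading record if someone types a comma into the event text. The sketch below shows one common way around that, which is to wrap such fields in double quotes. The csv_line() routine is hypothetical and is not what the script above does; if you need more than this, the Text::CSV module on CPAN is the usual answer.

```perl
use strict;
use warnings;

# Hypothetical helper: build one comma-delimited record, quoting any field
# that contains a comma, a double quote, or a line break.
sub csv_line {
    my @fields = map { defined $_ ? $_ : '' } @_;   # undef becomes empty
    for my $f (@fields) {
        if ($f =~ /[",\r\n]/) {
            $f =~ s/"/""/g;         # double any embedded quotes
            $f = qq{"$f"};          # then wrap the whole field in quotes
        }
    }
    return join(',', @fields) . "\n";
}

print csv_line('K', 'P', '1234', 'SO', 'struck out, swinging');
# prints: K,P,1234,SO,"struck out, swinging"
```

Using join() in place of a chain of dots also keeps the separator in one place, which makes it easier to change your mind about it later.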



But wait a minute ... we're not going to want to enter one record at a time. For now, save this one to the cgi-bin directory of the web server and run it a few times. Open a telnet window on the web server and take a look at the data stored in the external file each time through. (The command "more /home/www/recs" will display the contents of the file.) Once you are sure that you have a feel for what is going on, the changes in the next section will seem like no big deal.


