Now that the essential structure of this data entry scheme is in place, I'm ready to start actually pulling the records into the database. Or am I? Let's think about this for a minute. The script as it exists now gets a number of records from the user, writes them to a file, and then loops back to do the same thing all over again. Obviously, with this structure we're not going to have more than one user entering at the same time ... I'd have to make the script pause, or they'd be stomping all over that output file before the records were actually in the database. Even if I were to do something really clunky and have users run slightly different versions of the script that specify different destination files, there is no guarantee that the records would be saved before another iteration of the script, run by any given user, came back to the point of writing the file. I could have the script actually insert the records, but as I've discussed before, and will again shortly, I've decided that I don't want to burden the execution of the script with the overhead of waiting for that to happen. (Put another way, I don't want the user to have to wait for the addition to the database to finish.) What I should do is figure out some coherent scheme for writing a unique file each time around.
To do that, the script is clearly going to have to generate the filename under which the records will be saved dynamically, each time around. The wrinkle this adds lies in the operation of taint checking. If you recall, taint checking attempts, as much as possible, to make sure that values not generated by the script cannot be used by a malicious user. In essence, perl marks any value that comes from outside the script (form input, environment variables, file contents) as tainted. It then refuses to let a tainted value be used in an operation that affects the outside world, such as running a command or opening a file for writing, until the value has been explicitly checked. I may get further into the actual process of taint checking at some point. For now, what we want is a perl mechanism for specifying how things should be written that co-exists with taint checking.
The scheme I ultimately used to assign the filename is the third I tried; I'll go through the first two just as an illustration of the path development can take. My first shot involved reading the IP address in the REMOTE_HOST environment variable and using that as the basis for the filename. Since this comes from the environment, and is external to the script, I ran into difficulty with taint checks when I attempted to create the filehandle (the hook perl uses to refer to the specified external file). This difficulty was ultimately resolved through the use of the FileHandle::Deluxe module. However, as I worked through the problem it occurred to me that, since it is possible to spoof an IP address in an HTTP request, it would be possible to store a command in that variable. While that possibility might be relatively remote, most security vulnerabilities are based on problems far more obscure. Given that, I decided to adopt another approach.
My second approach involved simply using a counter scalar that would have been progressively incremented as new users accessed the script. This was and is a workable solution, but I had trouble implementing it for reasons I ultimately found to be attributable to problems with the Ralphzilla system. In the process of trying to resolve that problem I derived a third approach that I felt was likely to operate more quickly than the counter approach, and the following discussion revolves around that one. I would suggest that things sometimes work that way. As you grapple with difficulties in implementing some type of structure, you find yourself approaching it from different angles. It is in such contexts that you derive more sophisticated and elegant approaches that you feed back into your overall way of doing things. I know this sounds silver-lining optimistic, but the times when the straightforward solution you had envisioned does not work as expected are also the times when you are forced into looking at things from different angles. If you can refrain from letting anxiety dictate your approach in such circumstances, you'll find that these are the times when your overall productivity makes the most dramatic improvements. It just doesn't feel like it at the time <grin>.
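The counter approach remains workable. Every CGI request starts a fresh copy of the script, so the counter cannot simply live in a scalar. It has to be kept in a file, and the read-increment-write sequence has to happen under an exclusive lock so that two users never receive the same number. What follows is a minimal sketch in core Perl. It assumes a hypothetical helper named next_counter and a counter file whose path you choose; neither comes from the original script.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Hypothetical helper (not part of the original script): read, increment,
# and store a counter kept in a small text file. The exclusive lock makes
# the read-increment-write sequence atomic with respect to other copies
# of the script that call this same routine (flock is advisory).
sub next_counter {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT, 0644)
        or die "can't open $path: $!";
    flock($fh, LOCK_EX) or die "can't lock $path: $!";   # wait for other writers

    my $line = <$fh>;                                     # undef for a new, empty file
    # A regex capture also yields an untainted value under -T.
    my $n = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $n++;

    seek($fh, 0, 0)  or die "can't rewind $path: $!";
    truncate($fh, 0) or die "can't truncate $path: $!";
    print {$fh} "$n\n";
    close($fh) or die "can't close $path: $!";            # flushes, then releases the lock
    return $n;
}
```

Each new session would then take its number from next_counter with the chosen file path. The lock is held only for the few statements between flock and close, so concurrent users wait for each other very briefly.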
```perl
#!/usr/bin/perl -wT
##in essence, this script creates a scorekeeping interface for the baseball database
BEGIN {
    # Set the DISPLAY variable to the name of the local machine
    # where the debugger window and web browser appear.
    $ENV{DISPLAY} = "mymachine:0";
}
##set up modules and pragmas
use strict;
use CGI qw(:all);
use FileHandle::Deluxe qw(:all);
##instantiate a new CGI object and retrieve key parameters ...
##if there is no action parameter, execute the get_session subroutine
my $forms = new CGI;
my $item;
my ($action, $session_id, $session_file, $pass);
$session_file = '/home/www/sessions';
if (! $forms->param('action')) {
    get_session();
}
else {
    $action     = $forms->param('action');
    $session_id = $forms->param('session_id');
    $pass       = $forms->param('pass');
}
```
Now obviously the structure of the beginning of the script must be modified to incorporate some sort of session identifier. From the perspective of this script and the way it operates, I could just as well have generated a separate session identifier each time a screen of data is entered. In most circumstances, though, it would be appropriate for the entire entry session to be regarded as a single entity, so my approach takes that as a design requirement.
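```perl
sub get_session {
    $action = 'start';
    $pass   = 1;
    my ($session, $line);
    # read the ids issued so far (the file may not exist yet)
    my $used = '';
    if (open(my $in, '<', $session_file)) {
        $used = join('', <$in>);
        close($in);
    }
    $session = FileHandle::Deluxe->new($session_file, append => 1);
    $session->autoflush(1);    # write each id out immediately
    my $session_test = 0;
    while (! $session_test) {
        $session_id = int(rand(100000) * 10000000);
        # accept the id only if it does not already appear as a whole line
        if ($used !~ /^$session_id$/m) { $session_test = 1; }
    }
    my $out_line = $session_id . "\n";
    print $session $out_line;
    $session->close;
}
```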
In any event, an early line in the previous incarnation of the script assigned
the value "start" to the scalar $action if the CGI parameter "action" was
undefined. In this script that condition triggers the execution of the
get_session subroutine, which is of course new to this implementation. As I
began to work on this version I realized that while it is desirable to
maintain the state of the session throughout the entry process, it is also
desirable to write the records from each pass into a file with a unique name,
because that gives me the most flexibility in how I can go about entering the
records stored in these ASCII files into the database. The ramifications of
that will become apparent shortly; right now I just want to explain why the
second line of the subroutine initializes the scalar $pass with the value "1".
It gives me a way to distinguish any given entry screen from the screen before
it and the screen after it. After initializing the scalars $session and $line
the script uses the FileHandle::Deluxe module to create a filehandle object on
the file that holds session ids, in append mode, which means that, if it
is written to, the referenced file will be created if it does not already
exist. As the scalars holding the filename and the data to be written to this
file are generated within the script, they are not subject to subversion.
Therefore, the script could use the standard form of filehandle creation
without complaint from the taint-checking pragma, but since I use the module
elsewhere in the script I use it here for consistency. After creating the
handle I turn off buffering on the filehandle by setting its autoflush
attribute to 1 (a value of 0 would leave buffering in place). What this means
is that data will be handed to the operating system as the print statement is
executed, rather than being buffered in memory and only written periodically
or when the filehandle is closed. If you think about this a
little the reasoning will be apparent. The session id is used to maintain the
state of an individual browser session. As each client starts a session it
will execute this subroutine to generate a unique id for that session. If
buffering is not turned off, it is possible that the same id will be generated
twice, and if the first is not yet written the subroutine will not catch that.
As you'll see shortly, the possibility that this will occur is
extraordinarily unlikely, but that is how many software bugs get created ...
they arise from sequences of events that are very unlikely, but not
impossible.
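The effect of the autoflush setting is easy to observe with ordinary Perl filehandles. The sketch below uses a plain open plus IO::Handle's autoflush method rather than FileHandle::Deluxe, and the helper name visible_before_close is made up for illustration. It appends one id to a file and then, before closing the writer, counts how many lines a separate reader can see.

```perl
use strict;
use warnings;
use IO::Handle;   # provides the autoflush method on filehandles

# Hypothetical helper: append one id through a handle with the given
# autoflush setting, and return how many lines a separate reader sees
# while the writer is still open.
sub visible_before_close {
    my ($path, $autoflush) = @_;
    open(my $out, '>>', $path) or die "can't append to $path: $!";
    $out->autoflush($autoflush);
    print {$out} "12345\n";                 # a short session id
    open(my $in, '<', $path) or die "can't read $path: $!";
    my @lines = <$in>;                      # what another process would see now
    close($in);
    close($out);                            # a buffered line is written here at the latest
    return scalar @lines;
}
```

With autoflush off, the short line is still sitting in Perl's output buffer when the reader looks. A second copy of get_session checking the file at that moment would therefore miss it.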
After creating the filehandle object, the script initializes the scalar $session_test to the value "0". This scalar controls the small while loop that actually generates the value assigned to the $session_id scalar. That loop executes while $session_test is "0". Within it, a value is assigned to $session_id by calling the perl rand() function with an argument of 100000. This returns a random fractional number that is at least 0 and less than 100000, and that number is the basis of what is stored in $session_id. As we'll see in a bit, the records are written into a file whose name is constructed primarily from that scalar. Since the raw random number has a form like "99999.9999999", the resulting file was originally named something like "recs99999.9999999.txt". A file named like that is perfectly legal in the Linux world, as indeed it has been in the Windows world since long filenames became supported. However, as I got deeply into the script that reads these files and inserts the records, which I'll be talking about in a little bit, I had some difficulty with the portion of that script that reads in the file names and processes them. Thinking that perhaps something I was using had trouble with a file name that included two periods, I multiplied the generated number by ten million and converted the result into an integer (albeit a pretty big one) to serve as the session id. I have left it in this form, even though I ultimately found that the culprit in that context lay elsewhere.
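The two forms of the id are easy to compare side by side. This sketch reuses the expressions from get_session and store() in two hypothetical helpers, new_session_id and record_file. Note that Perl's documentation warns that rand() is not cryptographically secure. That does not matter for uniqueness, which is all get_session needs, but it would matter if session ids ever had to be unguessable.

```perl
use strict;
use warnings;

# Same expression as get_session: rand(100000) returns a fraction that is
# at least 0 and below 100000; scaling by ten million and truncating gives
# a whole number of at most 12 digits, so the file name has a single period.
sub new_session_id {
    return int(rand(100000) * 10000000);
}

# Same construction as store(): 'recs', the id, the pass number, '.txt'.
sub record_file {
    my ($id, $pass) = @_;
    return 'recs' . $id . $pass . '.txt';
}

print rand(100000), "\n";                       # the raw fraction, which normally contains a period
print record_file(new_session_id(), 1), "\n";   # only digits before '.txt'
```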
I have related that story primarily to suggest that such steps are perfectly appropriate when you are trying to diagnose a problem. When I'm trying to resolve a difficulty, one of the first things I look for is any element of the construct that differs from the norm in some regard. It is not at all unusual to discover that some element of the software environment an application rests on has difficulty with a specific technique the application employs. Furthermore, a later version of whatever package had that problem may resolve it. Even so, that later version may depend on a software library that conflicts with some other component of your system. To take an example pertinent to Ralphzilla, it's as if an upgrade that fixed a problem required a library that did not respond appropriately to requests sent to it by a component of the MOSIX suite. In the Windows world, applications running on versions prior to Windows 2000 frequently created problems for each other by installing custom versions of common system libraries. Those custom versions didn't implement a given function in the manner other applications expected. (Windows XP and, to a lesser extent, Windows 2000 resolve that issue with mechanisms that let multiple copies of those files co-exist, but that's another discussion.) This sort of thing is entirely possible in the software world. It is perhaps even more so in the open-source part of that world, which relies on cooperation between "loosely coupled" entities. Open-source advocates have an extensive set of arguments suggesting that their model of development will tend to evolve more dynamically. It is nonetheless true that, for any given permutation of components, a specific version of one component may have some level of trouble with something you are trying to implement. Just keep that in mind.
It is quite possible that when you run into a problem, it is not due to a flaw in your work. Break the construct you've created into smaller components, and make sure each of those elements is functioning appropriately, that is, that what a given function call returns is what you expect it to return. You may have to develop an alternate expression for what you are trying to do.
After generating the $session_id scalar, the script executes a pattern match
within an if statement to determine whether that specific $session_id has
already been used and stored in the session file. The statement says, in
effect, to look through the contents of that file for an occurrence of the
string held in $session_id standing alone on a line. If the pattern is not
there, it stores "1" in the $session_test scalar, which means that it is okay
to continue. Perl's pattern matching and regular expression capabilities are a
powerful feature of the language, and more than a few books have been written
specifically on that, so I won't attempt an extensive treatment of the subject
here. Take a look at the links page; I'll find some good references and post
them there. Very briefly, the binding operator "=~" applies a pattern match to
a string and effectively translates into "contains the following pattern".
Its negation, "does not contain the following pattern", is written "!~", and
that is what I use here. You may note that I originally set the scalar
$session_test to a value that will not let execution out of the loop, and
require that the pattern match not find the $session_id before it can
continue. I could instead have set that value to "1" and had a positive
result from the pattern match set $session_test to "0"; the logic of either
approach mirrors the other. I generally prefer to construct key conditional
tests in the first manner, because in more complex pattern matches an
expression framed in a way that is just slightly wrong could return a false
positive result. It is also easier to debug a script whose execution is never
let out of a loop than one in which execution is always let out of that loop.
You may never recognize the latter ... I guarantee you will always recognize
the former. <grin>
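The whole-line check can be tried on its own. This sketch assumes the session file has already been read into a single string, one id per line; the helper name id_in_use is hypothetical. Without the ^ and $ anchors and the /m modifier, the id 123 would be reported as taken simply because it occurs inside 41234.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $id already appears as a whole line of
# $contents. /m lets ^ and $ match at every line boundary, and \Q...\E
# makes the id match literally.
sub id_in_use {
    my ($contents, $id) = @_;
    return ($contents =~ /^\Q$id\E$/m) ? 1 : 0;
}

my $used = "41234\n98765\n";
for my $id ('98765', '123') {
    my $state = id_in_use($used, $id) ? 'taken' : 'free';
    print "$id: $state\n";                 # 98765: taken, then 123: free
}
# The negated operator !~ reads "does not match", as in get_session:
print "123 is free\n" if $used !~ /^123$/m;
```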
Obviously, this pattern match and the file holding used session ids are only relevant as long as at least one other user is accessing the system. But there is no ready way to determine when a given session id is no longer active, short of constructing some sort of system for recording the time of the last exchange associated with that session id and making some assumptions about the appropriate time-out value for a session. Given the large number of potential session ids, it would be far less clunky simply to schedule a cron job (an automatic job run by the Linux system) to delete this file at a time when no one would be using the system, perhaps at 4am each morning. A cron job like this could ultimately take on a wide range of system clean-up functions; I wouldn't be surprised if I revisit the concept sometime down the road. In the absence of that, you could simply delete the file manually before the system comes into use for a given game.
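The cleanup job could be as small as the following Perl script. This is a minimal sketch. The script's install path and the 4am crontab entry shown in the comment are assumptions, not part of the original system. So is the optional age check, which leaves the file alone if it was touched recently.

```perl
#!/usr/bin/perl
# Hypothetical cleanup job, run from cron with a crontab entry such as:
#   0 4 * * * /usr/local/bin/clean_sessions.pl
use strict;
use warnings;

# Remove the session id file. Returns 1 if it was deleted, 0 if there was
# nothing to delete or it was modified less than $min_age_days ago.
sub clean_sessions {
    my ($path, $min_age_days) = @_;
    return 0 unless -e $path;                                # nothing to do
    return 0 if defined $min_age_days && -M $path < $min_age_days;
    unlink($path) or die "can't remove $path: $!";
    return 1;
}

clean_sessions('/home/www/sessions');
```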
Back in the main body of the script, we now have scalars holding a unique session id, an indication that this is the first pass through the script, and the value "start" in the scalar $action. Just as in the previous incarnation of the script, that value in $action will send the script into the sel_form subroutine. While in this circumstance I could as well have used the value of $pass to do that, I want to retain as much of the structure from the previous incarnation as possible, and using $action represents a cleaner implementation of the main body event handler.
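The main body is not shown in this section. One common way to write such an event handler in Perl is a dispatch table: a hash mapping each value of $action to the subroutine that handles it. This is only a sketch of that pattern, not the script's actual code. The subroutines are stubs, and apart from start => sel_form, which the text describes, the action names are hypothetical.

```perl
use strict;
use warnings;

# Stand-ins for the script's real subroutines.
sub sel_form { return 'sel_form' }
sub get_form { return 'get_form' }
sub store    { return 'store' }

# Map each value of the "action" parameter to its handler.
# Only 'start' => sel_form comes from the text; 'get' and 'save' are hypothetical.
my %handler = (
    start => \&sel_form,
    get   => \&get_form,
    save  => \&store,
);

sub dispatch {
    my ($action) = @_;
    my $code = $handler{$action} or die "unknown action '$action'";
    return $code->();
}
```

A side benefit is that an action value the table does not recognize, such as one from a tampered request, is rejected outright.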
In this incarnation, the sel_form and the get_form subroutines are very much
the same as they were in the previous version, the only real difference being
that each has lines printing hidden fields holding the values of $session_id
and $pass back to the client. The store subroutine, however, has been
substantially modified.
```perl
sub store {
    $forms->delete('action');
    $forms->delete('pass');
    my $file = '/home/www/save/recs' . $session_id . $pass . '.txt';
    my $recs = new FileHandle::Deluxe($file, append => 1,
        safe_dirs => ['/home/www/save'], lock => LOCK_EX)
        or die "can't open file";
```
At this point I construct a scalar to hold the name of the output file, and use
that scalar to create a filehandle object on a file of the specified name.
Recall that there were two primary considerations that drove this iteration of
the script:
1:
Since it is possible that the application will be used in a multiple-user
context, the file saved by each user should have a unique name. It was this
consideration that led to the incorporation of the session id to uniquely
identify any given entry session.
2:
Given that I decided to maintain a single id for any given session, but didn't
want to assume that the file would be gone by the time a new file was ready to
be written, I needed a way to give the file a slightly different name each time
around while maintaining the ability to group the files from any individual
session. (That grouping has no bearing on this iteration, but I wouldn't be
surprised to see it surface as a feature somewhere down the road.) This led to
the development of the $pass counter.
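Session ids vary in length, so running the session id and the pass number together with no separator can, in principle, map two different sessions to the same name. The sketch below shows such a collision and a variant with an underscore separator. The helper names are made up. Whether the record-loading script described later could accept a changed naming scheme is an assumption to check before adopting anything like this. Zero-padding the id to a fixed width with sprintf would be another way to remove the ambiguity.

```perl
use strict;
use warnings;

# As in store(): id and pass are run together with no separator.
sub joined_name    { my ($id, $pass) = @_; return 'recs' . $id . $pass . '.txt' }
# Hypothetical variant: a separator makes every (id, pass) pair distinct
# and lets a reader script recover both parts.
sub separated_name { my ($id, $pass) = @_; return 'recs' . $id . '_' . $pass . '.txt' }

print joined_name(4123, 12), "\n";      # recs412312.txt
print joined_name(41231, 2), "\n";      # recs412312.txt, the same name
print separated_name(4123, 12), "\n";   # recs4123_12.txt
print separated_name(41231, 2), "\n";   # recs41231_2.txt

# Recovering the parts from a separated name:
my ($id, $pass) = separated_name(41231, 2) =~ /^recs(\d+)_(\d+)\.txt$/;
```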
In the next line the $recs filehandle is created as a FileHandle::Deluxe object. This is precisely the kind of context for which taint checking was devised, because the screen is accepting input from the user. The FileHandle::Deluxe object creation specifies, in essence, how the stream of data written to that object should be handled. In this case, I am saying four things. The object should be associated with the file named in the $file scalar. It should be opened in append mode, so that new data is added at the end of the file, which is created if it does not already exist. The directory "/home/www/save" should be considered safe. And the file should be opened with an exclusive lock. In general, what is being said here (among other things) is that the specified file, being written to a directory considered safe, should be treated as safe from the standpoint of taint checks. The exclusive lock I place on the file has a different purpose, which I'll get to as I discuss the script that actually stores the records.
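On the first pass, $session_id and $pass are generated inside the script. On every later pass they come back from the browser through $forms->param, which means they are tainted and under the user's control. Whatever FileHandle::Deluxe's safe_dirs option checks, the conventional Perl way to launder such values is a regex capture that accepts only the expected form. What follows is a minimal sketch, assuming both values should always be plain non-negative integers; the helper name untaint_digits is hypothetical.

```perl
use strict;
use warnings;

# Return the value untainted if it consists only of 1 to 15 digits,
# otherwise nothing (undef in scalar context). Under -T the captured $1 is
# not tainted, so the result can be used in a file name. \A and \z anchor
# the whole string; $ would also accept a trailing newline.
sub untaint_digits {
    my ($value) = @_;
    return unless defined $value && $value =~ /\A(\d{1,15})\z/;
    return $1;
}

# Hypothetical use in the main body, right after reading the parameters:
#   $session_id = untaint_digits(scalar $forms->param('session_id'));
#   $pass       = untaint_digits(scalar $forms->param('pass'));
#   die "bad session parameters" unless defined $session_id && defined $pass;
```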
Following filehandle object creation, the subroutine proceeds much as it did
before. You may notice that I've changed the format of the output record from
the comma-delimited form used previously to one in which the data elements are
delimited by the string "ZzZ".
```perl
    my $line = $ecj . "ZzZ" . $rcj . "ZzZ" . $pcj . "ZzZ" . $ercj . "ZzZ" . $etj . "ZzZ\n";
```
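A reader script has to split these records back into fields. The sketch below assumes one record per line in exactly the format above, with every field followed by "ZzZ"; the helper name parse_record and the sample values are made up. A multi-character delimiter like "ZzZ" is unlikely to occur inside a field, whereas a comma easily can, as in the first sample field.

```perl
use strict;
use warnings;

# Hypothetical helper: split one ZzZ-delimited record back into fields.
sub parse_record {
    my ($line) = @_;
    chomp $line;                            # drop the trailing newline
    my @fields = split /ZzZ/, $line, -1;    # -1 keeps empty fields, even trailing ones
    pop @fields;                            # every field ends in ZzZ, so the last piece is always empty
    return @fields;
}

my @fields = parse_record("Smith, J.ZzZ3ZzZ1ZzZ0ZzZ2ZzZ\n");
print scalar(@fields), " fields, first is '$fields[0]'\n";   # 5 fields, first is 'Smith, J.'
```

The -1 limit matters when the last real field is empty. A plain split would silently drop it along with the other trailing empty fields.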