

Starting to Build Modules

Extending the Utilities Module

Samples for this page

In the discussion on the previous page I briefly touched on accessing the subroutines included in the BB_UTIL module from within the main namespace, rather than having to point explicitly to the BB_UTIL namespace. On this page I'm going to take virtually all of the subroutines from the insert script, move them into the BB_UTIL module, refine their access points, and set the module up so that the names of the subroutines are visible in the main namespace. The only subroutine I will leave behind is insert_recs() itself, which I intend ultimately to put into another module containing the subroutines involved with accessing the database. I know it sounds as if I've a lot on my plate here, but I think you will see that while this step is not trivial, it is far from overwhelming.



The relative ease of this step, however, depends on the structure of the subroutines I am moving, and on the fact that they have well-defined roles and interfaces. I have not discussed this explicitly before, but in general you should expect most of your subroutines to be less than ten lines long, excluding comment lines, and often less than five. The free_space() subroutine, one of the longer ones, is really only seventeen lines of code, and I could easily trim three or four more lines if I thought it appropriate and if lines of code were the only measure of the efficiency of a piece of code. Although I am raising this issue, I also want to caution you against getting all wound up in counting lines of code. The nature of the language you are using will be the prime determinant of how many lines of code it takes to implement any given task. It would be quite possible to have a ten- or twelve-line subroutine in perl that does what would require 70-120 lines of nested functions in another language. A large part of the reason lies in the nature of perl itself: many of perl's built-in functions stand in for what would take many lines of code in other languages, primarily C. My point is rather that you should think of a subroutine as a single chunk of work with a well-defined set of inputs. For example, it would likely not be good form to have a single monolithic drive_to_the_store() subroutine, but a good implementation of drive_to_the_store() would have several calls to right_turn(), left_turn(), and stop(), each of which might well be a single block of code. Don't get all nuts about this; a lot depends on context. The right_turn() subroutine, for example, would likely be a great deal more complex when implemented to control a robot than when used to return a basic set of directions. There will be times when you look at something you did the week before and think to yourself "I could have expressed that more cleanly."
Sometimes it will be something you jump right into modifying and sometimes something you revisit when you get the chance. As a general rule, remember that the more actions a given subroutine performs and the more data it requires to operate, the more it should be considered a candidate for dissolution into smaller subroutines.
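A minimal sketch of that kind of decomposition, in the "basic set of directions" sense rather than the robot sense. The names right_turn(), left_turn(), stop(), and the street names are hypothetical; each helper just returns one line of directions.

```perl
use strict;
use warnings;

# Hypothetical helpers: each one does a single, small piece of work.
sub right_turn { my ($street) = @_; return "Turn right onto $street"; }
sub left_turn  { my ($street) = @_; return "Turn left onto $street"; }
sub stop       { my ($place)  = @_; return "Stop at $place"; }

# The higher-level routine reads like a summary of the whole task.
sub drive_to_the_store {
    return (
        right_turn('Main Street'),
        left_turn('Elm Street'),
        stop('the store'),
    );
}

print "$_\n" for drive_to_the_store();
```

Changing how a single turn is described touches one small subroutine, while drive_to_the_store() stays readable at a glance.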



Making all or some of a module's subroutines visible in the main namespace once the module has been "use"d is known as exporting them. While it is of course possible to write a custom implementation of this functionality, in practice most modules use the standard Exporter module to export symbols to the calling namespace. In a basic implementation such as the one here, the following few lines are sufficient:

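package BB_UTIL;

use strict;
use warnings;
use Exporter;
our @ISA = ('Exporter');	# inherit Exporter's import() method
our @EXPORT = ('free_space', 'hash_assign', 'number_config', 'err_print', 'write_recs');	# names copied into the caller by "use BB_UTIL;"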
After loading the Exporter module, the next line initializes the @ISA array. You may well not have seen this before; it is the way perl implements class inheritance. (It is my understanding that one of the major changes in the next major release of perl, version 6.0, will be in its implementation of objects. This might mean that by the time I start to objectify some of this stuff I'll have the chance to implement different versions side by side, which ought to be a very illuminating exercise.) All the @ISA array does is list the classes from which the module inherits methods. It can contain the names of a number of different packages; in this situation it holds only Exporter. That inheritance is what makes the export work: when a script says "use BB_UTIL;", perl loads the module and then calls BB_UTIL->import, and since BB_UTIL defines no import() of its own, perl finds the one it inherits from Exporter, which copies the names listed in @EXPORT into the caller's namespace. I am sure that at some point I will get deeper into the operation of the Exporter, but for now all I am concerned with is showing you how to make it work.
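To see the whole round trip in one runnable file, here is a minimal sketch. The module is inlined in a BEGIN block and registered in %INC so that "use BB_UTIL;" finds it without a separate BB_UTIL.pm; in a real project it would live in its own file ending with a true value (1;). The greet() subroutine is hypothetical.

```perl
use strict;
use warnings;

# Inlined stand-in for BB_UTIL.pm so the sketch runs as a single file.
BEGIN {
    package BB_UTIL;
    use Exporter;
    our @ISA    = ('Exporter');   # import() is inherited from Exporter
    our @EXPORT = ('greet');      # hypothetical subroutine name
    sub greet { my ($name) = @_; return "hello, $name"; }
    $INC{'BB_UTIL.pm'} = __FILE__; # mark the module as already loaded
}

package main;
use BB_UTIL;    # calls BB_UTIL->import, found through @ISA

my $msg = greet('world');   # no BB_UTIL:: prefix needed
print "$msg\n";
```

Newer code often writes use Exporter 'import'; instead of setting @ISA; that imports Exporter's import() method directly rather than inheriting it, with the same effect on @EXPORT.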



The strings assigned to the @EXPORT array (the name is case-sensitive) in the next line represent the symbols to be imported into the calling namespace. As you can see, I am exporting all of the subroutines in the module as it currently exists. Obviously, some care must be exercised when doing this, lest a symbol be exported that conflicts with a built-in perl function. In the insert script, for example, I make extensive use of split, and if a module I used were to export a different split it would represent a major problem for the anticipated operation of the script. There is a way to do this intentionally in perl, when you want to change or augment the behavior of a built-in function. This process is called overriding, and although it is generally used to augment a function's capabilities, I can envision situations in which it might be used to disable one. Regardless, the point I wish to make here is that care must be taken not to do it inadvertently. It is also not a good idea to clutter up the main namespace.
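One common way to keep the main namespace uncluttered is to list names in @EXPORT_OK instead of @EXPORT, so nothing is imported unless the calling script asks for it by name. This is a minimal sketch with the module inlined as before; the subroutine bodies are hypothetical stubs.

```perl
use strict;
use warnings;

BEGIN {
    package BB_UTIL;
    use Exporter;
    our @ISA       = ('Exporter');
    our @EXPORT    = ();                           # nothing imported by default
    our @EXPORT_OK = ('err_print', 'write_recs');  # imported only on request
    sub err_print  { return 'err_print called' }   # hypothetical stub bodies
    sub write_recs { return 'write_recs called' }
    $INC{'BB_UTIL.pm'} = __FILE__;
}

package main;
use BB_UTIL ('err_print');   # ask for exactly one name

my $result = err_print();    # imported, so callable without a prefix
print "$result\n";

my $imported = defined(&main::write_recs) ? 'yes' : 'no';
print "write_recs imported into main: $imported\n";   # not requested

my $full = BB_UTIL::write_recs();   # still reachable by its full name
print "$full\n";
```

The trade-off is a slightly longer "use" line in each script in exchange for an explicit record of which names it pulls in.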



If you look through the subroutines I have added to the module, you will see that whatever changes were required to move them into the module have to do with how they communicate with whatever called them. Because the first two, hash_assign() and number_config(), are called to deal with files external to the application, I do not have to worry about what is passed in, and only in the case of hash_assign() do I have to worry about what comes out. Ironically, given my discussion in the previous section, in this circumstance a reference to the %hostnum hash must be explicitly returned so that it can be assigned to the $hostref scalar:
		return \%hostnum;
(While there are ways around this, in general the additional considerations required, in concert with potential conflicts in the namespace, seriously outweigh whatever benefit might result from adopting a strategy simply to avoid having to explicitly return a value. And minimizing problems like that is what we are about here.)
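A minimal sketch of the returned-reference pattern. This hash_assign() is a hypothetical stand-in: the "host number" line format is assumed, and unlike the real subroutine, which reads its own configuration file, it takes a filehandle so the example can run against an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for hash_assign(): builds a hash from "host number"
# lines and hands back a reference to it.
sub hash_assign {
    my ($fh) = @_;
    my %hostnum;                       # lexical: created fresh on each call
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;     # skip blank lines
        my ($host, $num) = split ' ', $line;
        $hostnum{$host} = $num;
    }
    return \%hostnum;                  # the hash lives on through this reference
}

my $config = "alpha 1\nbeta 2\n";
open my $cfg_fh, '<', \$config or die "cannot open in-memory config: $!";
my $hostref = hash_assign($cfg_fh);
close $cfg_fh;

print "beta is number $hostref->{beta}\n";   # prints "beta is number 2"
```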


The err_print() and write_recs() subroutines do, however, require somewhat more substantial modification.
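sub err_print	{
	# $error is the message text; $log is a reference to the scalar holding the log filehandle
	my ($error,$log)=@_;
	my @time=localtime;
	my $year=$time[5]+1900;	# localtime counts years from 1900
	my $month=$time[4]+1;	# and months from 0
	# hours:minutes on month/day/year:::message, with the minutes zero-padded
	print {${$log}} sprintf('%d:%02d on %d/%d/%d:::%s', $time[2], $time[1], $month, $time[3], $year, $error)."\n";
}

sub write_recs	{
	# $recs is a reference to the array of records; $repository is a reference to the output filehandle
	my ($recs,$repository)=@_;
	foreach my $recs_line (@{$recs})	{
		print {${$repository}} $recs_line;
	}
}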

In both cases, I can no longer print directly to the filehandles, but must now print through references to those filehandles, because execution is leaving the main namespace. As you can see, I named the scalars that hold the references $log and $repository in a fit of originality. When printing to the filehandles they reference I ran into a little quirk of perl's print function. I could print successfully to the dereferenced filehandle through the scalar, as in "print $$repository $recs_line;", but I consistently received the error message "Scalar found where operator expected ...". In another context I might have let that slide, since I could verify that the appropriate stuff was getting written, but rather than let you think that I had made a mistake I tracked the problem down on the web. What I found was that in that position perl expects either a bareword or a plain scalar variable holding the filehandle; anything more complex, such as a dereference, should be wrapped in a block that returns the filehandle. The dereferencing steps work, but they also apparently run afoul of the warnings pragma, which is what generates this message. To print without getting this message, an extra set of braces must be placed around the dereference, as in {${$log}}.
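A small sketch of the two accepted forms, writing to an in-memory file so it runs anywhere: a plain scalar holding the handle can go straight into the filehandle slot, while a handle reached through a reference comes from a block.

```perl
use strict;
use warnings;

my $buffer = '';
open my $repo_fh, '>', \$buffer or die "cannot open in-memory file: $!";

# A plain scalar holding the handle is accepted in the filehandle slot as is.
print $repo_fh "first record\n";

# Through a reference, the handle is supplied by a block that returns it.
my $repository = \$repo_fh;
print {${$repository}} "second record\n";

close $repo_fh;
print $buffer;   # both records landed in the same "file"
```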


There is an additional wrinkle that I have used in write_recs(). If I am not mistaken, it was in the communication between this subroutine and main that I first started reading the contents of a file into an array and passing the array to a subroutine. In this version I am getting away from that. Why, you ask? Well, in this version I have to be able to get to both the array and the filehandle, as indeed was the case when the subroutine was in the main body of the script. However, there is no inherent structure to the argument array. Clearly, one either must remember the order of the items passed in the array or create some sort of structure that can be used to hold that definition and pass that around. While I can see some contexts in which it would make sense to assemble a reasonably complex structure like that, it would be overkill for this application, at least in its current form, and would not resolve the problem in this current context. This will probably be more clear if I use a graphic.



The ptkdb window to the right illustrates the context that results when I pass the @recs array and the $repository filehandle reference directly to the write_recs() subroutine. If you look in the expression panel on the right side you will see the @_ arguments array at the top, with the two lines from the data file being processed as the first two elements and the filehandle reference as the third. Below that you can see the contents of the @recs array after it has been populated with the statement "my (@recs,$repository)=@_;". As you can see, the entire arguments array has been pulled into @recs, and if you were to check the contents of $repository you would see that it is undefined. That is exactly what you should expect: there is no structure in the arguments array beyond a flat list of elements.
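The same flattening can be reproduced without a debugger. In this sketch, show_flattening() is a hypothetical helper that reports what the list assignment did, and \*STDOUT merely stands in for the repository filehandle reference.

```perl
use strict;
use warnings;

# Hypothetical helper: unpacks @_ the way the problem version of write_recs did.
sub show_flattening {
    my (@recs, $repository) = @_;   # @recs swallows every argument
    return (scalar(@recs), defined($repository) ? 'defined' : 'undef');
}

my @lines       = ("line one\n", "line two\n");
my $fake_handle = \*STDOUT;          # stands in for the filehandle reference
my ($count, $state) = show_flattening(@lines, $fake_handle);
print "\@recs got $count elements; \$repository is $state\n";
# prints: @recs got 3 elements; $repository is undef
```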


There are a number of ways to deal with this. The first, and most basic, would be simply to pass the @recs array as the last element in the argument list. That is kind of like dealing with a broken turn signal by hanging your arm out the window and signalling. It works, but when it is cold it is not a lot of fun, and it does not work real well when it is dark. How would you deal with a situation in which you have two variable-length arrays in the argument list? Kludging something together to address that would get real clunky real fast. One viable alternative would be to write a function prototype, with which you can specify the form of the arguments passed to the function. That's all well and good, but using a function prototype here is a little like renting a tractor-trailer truck when you need a pickup truck. Besides, a prototype effectively does what I am going to do by hand, which, as you might have anticipated, is to pass references to the items with which the subroutine will be concerned. (When a subroutine has a prototype that asks for an array reference, perl takes the reference to the array argument when the call is compiled. In that sense constructing the references manually is likely to avoid a little bit of overhead.) If you look in the write_recs() subroutine itself you will see that the only change required to make this work (beyond the changes required to print to the filehandle reference) is to dereference the scalar that points to the array. This is done simply by putting an @ sign in front of the braced reference, which forces it to be evaluated as an array: @{$recs}.
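A sketch of both approaches with two variable-length arrays, the case that defeats the "put the array last" trick. The subroutine names are hypothetical; the first takes references built by the caller, the second uses a (\@\@) prototype so perl builds the references itself.

```perl
use strict;
use warnings;

# By hand: the caller writes \@list, and the sub dereferences with @{...}.
sub count_both_by_ref {
    my ($first, $second) = @_;          # two array references
    return scalar(@{$first}) + scalar(@{$second});
}

# With a prototype: (\@\@) makes perl pass references to two bare arrays.
# It only takes effect for calls compiled after this declaration.
sub count_both_proto (\@\@) {
    my ($first, $second) = @_;          # still two array references
    return scalar(@{$first}) + scalar(@{$second});
}

my @first_list  = (1, 2, 3);
my @second_list = (4, 5);
print count_both_by_ref(\@first_list, \@second_list), "\n";   # prints 5
print count_both_proto(@first_list, @second_list), "\n";      # also prints 5
```

Both subroutines receive exactly the same @_; the prototype only changes what the caller has to type, and it is bypassed entirely by method calls and by calls written as &count_both_proto(...).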



Given that the module is configured as it is, the subroutines are called in much the same manner as they were called when they were physically located within the script, with the exception that the references that the subroutines use must be constructed and appropriately fed to the subroutine. It is important to note that given the way that the export is configured in the module, the simple act of "use"ing the module will result in the importation of the symbols specified for export at the beginning of the module. If you read down through the script you will see that the first real change occurs around line 30 of this version. Where before I had declared the scalar and run the subroutine, which filled the scalar with the reference to the hash, in this version the scalar $hostref in the main namespace is not inherently visible to the subroutine, so the subroutine creates the hash and returns a reference to the hash, which is assigned to the $hostref scalar. There are, of course, other ways to get this done, but I find this form the most appealing because it accurately reflects the way most people would think about what is going on here.


If you read that last sentence and thought to yourself "I can sense another digression coming on here.", you were right. When you write subroutines in general and modules in particular you can use the flexibility of perl to implement the interface to any given function or subroutine in the manner that will make the most sense when it is called. In this instance, for example, I consider
my $hostref=hash_assign;
to be inherently legible. Similarly, the lines assigning the output from the free_space() subroutine to the $free scalar,
				$free=free_space(\$dir,$hostref,$df);
or calling err_print with the $error and the $log filehandle reference
			err_print($error,$log);
represent inherently intuitive manners in which to access the functionality provided by the subroutine. This is another of those things at which you get better as your experience grows, but giving some thought to the issue will make a substantial impact on the maintainability of your code regardless of your level of experience, and that is something that will benefit you as well as anyone who ever has to be responsible for maintaining your code.


Rather than getting detailed about code that is probably getting very familiar by now, I will just skip through the small changes to the insert script required to make accessing the subroutines within the module work. After the change in the way the hash_assign() subroutine is accessed, the next change is in the way free_space() is accessed, near line 73. As you can see, the function is passed a reference to the $dir scalar, the $hostref reference to the %hostnum hash (which, you may realize, no longer resides in the main namespace but in the module namespace), and the $df Filesys::DiskFree object. The returned result is of course stored in the $free scalar. In the previous version, all that was required was passing the $dir scalar directly to free_space().
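On the receiving side, each kind of argument is unpacked differently. This sketch is not the real free_space(): its arithmetic is invented, the hash is assumed to map a directory to a reserve amount, and StubDisk with its avail() method is a hypothetical stand-in for the Filesys::DiskFree object so the example runs without that module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the disk-space object.
package StubDisk;
sub new   { my ($class, %avail) = @_; return bless {%avail}, $class; }
sub avail { my ($self, $dir) = @_; return $self->{$dir}; }

package main;

# Hypothetical free_space()-style sub showing the three kinds of argument.
sub free_space_sketch {
    my ($dir_ref, $hostref, $df) = @_;
    my $dir     = ${$dir_ref};            # scalar reference -> plain string
    my $reserve = $hostref->{$dir} // 0;  # hash reference -> element lookup
    my $avail   = $df->avail($dir);       # object -> method call
    return $avail - $reserve;
}

my $dir     = '/data';
my $hostref = { '/data' => 100 };
my $df      = StubDisk->new('/data' => 750);
my $free    = free_space_sketch(\$dir, $hostref, $df);
print "$free\n";   # prints 650
```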


In lines 88 and 89, just after the space availability/filehandle viability test loops, the $log and $repository scalars are initialized with references to the filehandles open on the log and repository files.
my $log=\$log_fh;
my $repository=\$repo_fh;
These two reference scalars are of course important inputs to the write_recs() and err_print() subroutines. (You would think I would develop some consistency in the way I name these things, wouldn't you?) You can see the first example of the use of the err_print() subroutine in line 131, and again in lines 164 and 181. In line 175 the $recs_ref scalar is initialized as a reference to the @recs array, which you probably recall holds the records stored in a single output file. In line 174 you can see the write_recs() subroutine called, with that reference and the reference to the repository filehandle as arguments.
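A sketch of the calling side, using a cut-down err_print() without the timestamp and an in-memory log. It also shows an alternative the script does not use: because $log_fh is a lexical filehandle, it already holds a reference to a glob and could be passed to a subroutine as is. The err_print_direct() name is hypothetical.

```perl
use strict;
use warnings;

# Cut-down err_print(): takes a reference to the scalar holding the handle.
sub err_print {
    my ($error, $log) = @_;
    print {${$log}} "$error\n";
}

# Hypothetical variant: takes the lexical filehandle itself.
sub err_print_direct {
    my ($error, $fh) = @_;
    print {$fh} "$error\n";
}

my $log_text = '';
open my $log_fh, '>', \$log_text or die "cannot open in-memory log: $!";

my $log = \$log_fh;                          # as in the insert script
err_print('first problem', $log);
err_print_direct('second problem', $log_fh); # no extra reference needed

close $log_fh;
print $log_text;
```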


As I said before, if you compare this version of the insert script with the previous pre-module version, there is really not a whole lot of difference. Putting things into modules primarily forces you to do things that might be considered housekeeping, because you have to pay attention to where the module will get what it needs. For many of the tasks for which you might want to use perl, you really don't have to worry about creating modules, but once an application reaches a certain size, or if you find yourself cutting subroutines and pasting them into new scripts, or if you are developing a series of scripts that revolve around common tasks, using modules to share code can substantially broaden your approach. You will likely see that better as I move along.


Remember the subroutine I was going to use to pass hidden fields between incarnations of the CGI script, one or two incarnations ago? Well, on the next page I am going to revisit that one, and use it as the first subroutine in a module that will contain interface elements. I can tell that your anticipation is hard to contain.




Previous: The First Subroutine in the First Module
Next: Starting the Interface Module