

Samples for this Section


Starting to Build Modules

The First Subroutine in the First Module


I gave some thought to which subroutine should be the first to go into the first module. Something tough, but not too tough, with more than one piece of data required from the surrounding environment, but not an overwhelming amount. (In a situation in which a subroutine requires quite a lot of information from the main namespace there should be a serious question as to whether that subroutine should be partitioned off into a separate namespace at all, but that is not an issue that I am going to address here. Just keep that in mind as I start through this stuff, and I think you will see why it is something to consider.) Although there are other good candidates, I quickly settled on free_space() in the insert script as the most logical choice. Beyond the considerations mentioned above, free_space() provides the kind of functionality that might have relevance on a broader scope within the application, and that is probably the primary rationale for incorporating code into portable libraries, which perl calls modules.



(As an aside, operating system and application code is increasingly made up of shared libraries. In the Windows world these libraries, generally written in C or C++, are compiled into dll files registered at either the system or application level. Many of the errors associated with earlier versions of Windows are actually attributable to custom versions of dlls installed by applications. But that's beside the point. These files expose certain interfaces and return certain results, in just the same way as the modules I develop here will. There may be some difference in the form, but what is being done is the same thing. In the linux world, it is fairly common for the application to be compiled from source code on the machine on which it will be run, especially if it has not been installed as part of the specific linux distribution installed on the machine. In that environment, the shared files are generally called lib<something or other> and installed in one of the directories named lib. You might ask, why are you rambling on about this? If you know this, you're probably grumbling at the digression. Hey, this is my turf. My point is that the way this application will be developed reflects the way the vast majority of code is written these days. In any specific circumstance different tools may be used, and the expression will vary, but the forms are much the same. If you are new to this, you should know that as you start picking it up your understanding of that surrounding environment will broaden at an accelerating pace. Now the flip side of this coin is that one of the things you will become aware of is just how much you don't know. But hey ... that's just the way of things. There will come a time when you'll be talking about something, and will realize with surprise that you are expressing things that you did not even know about a few months previously. Kinda sneaks up on you. Learning is like that.)



The process of building modules is not all that complicated at heart. It made sense to me that free_space() should be in a module of system utilities, so I decided to name the module BB_UTIL. By convention, perl expects that modules will be named with the extension .pm, so the full name of the file is BB_UTIL.pm. I used all capitals in the identifying part of the name in order to make references to it stand out in the body of whatever code uses it. You will see what I mean in just a bit. Now there is no inherent requirement to partition modules on the basis of their functionality. Obviously, the module that holds free_space() is going to have to be resident on the machine that runs the insert script, or at least, a module that holds it must be there. And of course, it would be possible to include all of the custom modules for the system's operation in a central location on the cluster's file system and access them from there. In fact, I think that's a good idea, but I'm not going to get into it right now.



In general perl use, it would not be at all unusual to have several associated scripts on a machine share a module that had several subroutines that had little in common with each other beyond the fact that each might be used by more than one script in some fashion. This is a perfectly legitimate thing to do, and worthwhile to keep in your bag of tricks. One of the "real world" uses of perl is in applications that sit behind the scenes, making everything else work, and you can use perl in this fashion to amaze your friends and confound your enemies. Things you put together like that seem to materialize out of thin air. In the context of a larger scale, somewhat more integrated application like this, however, you are more likely to see modules organized along functional lines. From an organizational standpoint, this kind of functional partitioning provides a coherent structure, and on a perl system level should help to effectively manage the application's namespace.



At base, this module need have nothing more than

package BB_UTIL;
1;

Then you can use the module in a program and, as long as perl can find it, it will not complain. (Every perl module must finish by returning a true value, and ending the file with 1; is the usual way to do that ... that is just the way that perl works.) Note that the shebang (#!/usr/bin/perl) line is not present. Since the code in a perl module is executed within the framework of a calling script, the interpreter has already been invoked. Without any subroutines within the structure, of course, the module doesn't provide a whole lot of functionality, but it is a viable module. To get some functionality, I have to start putting stuff inside it. I can get a basic, Yugo-style module simply by dropping subroutines into it and providing some capability for them to communicate with the outside world. In the case of free_space(), that meant that I had to expand the argument list that I pass to the subroutine to include whatever the subroutine would have to know to do what it does. Thus, in the first real, functional line of the subroutine as implemented in the module,
my ($dir,$hostref,$df)=@_;

I initialize scalars scoped to this subroutine that have the same names as in the original script, so I don't have to worry about renaming throughout the subroutine.
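To make the shape of this concrete, here is a minimal sketch of a package with one subroutine that gets everything it needs through its argument list. The names DEMO_UTIL and top_dir() are made up for illustration, and the body is not the real free_space() code. I have put the package in the same file as the calling code only so the sketch runs on its own; in practice the package would sit in its own .pm file, without the surrounding braces, and end with 1;.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a module file (normally DEMO_UTIL.pm).
{
    package DEMO_UTIL;

    # Everything the subroutine needs arrives through @_.
    sub top_dir {
        my ($dir, $hostref) = @_;            # lexicals scoped to this subroutine
        my @parts = split('/', $$dir, 4);    # $dir holds a reference to a scalar
        return "$parts[1] on $$hostref[0]";  # $hostref holds a reference to an array
    }
}

# Back in the main namespace: made-up sample data.
my $dir   = '/mnt/mfs/data/file.txt';
my @hosts = ('node1', 'node2');

# Call the subroutine by its full, package-qualified name.
my $label = DEMO_UTIL::top_dir(\$dir, \@hosts);
print "$label\n";
```

Nothing inside top_dir() reaches out to the main namespace. If it needs something, that something has to be passed in.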



An important thing to recognize here is that all three of the items pulled from the argument array hold references, the first two created in the script itself and the third as a reference in its object initialization. This is an important element in assuring both the integrity of what is being done by the module and its portability to other contexts. I am working with the real, original stuff, and not a copy of it. (Now there are times when it would be appropriate to be working with a copy of it ... I'll cross that bridge when and if I come to it.)
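The difference between working with the original and working with a copy can be seen in a small sketch. The subroutines bump_copy() and bump_ref() are hypothetical and have nothing to do with the insert script; they just show what happens to a scalar in the caller when a value is passed versus when a reference is passed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copies its argument into a lexical, so the caller's scalar is untouched.
sub bump_copy {
    my ($n) = @_;
    $n++;
    return $n;
}

# Receives a reference, so the increment lands on the caller's scalar.
sub bump_ref {
    my ($nref) = @_;
    $$nref++;
    return $$nref;
}

my $count = 10;

my $from_copy  = bump_copy($count);   # returns 11
my $after_copy = $count;              # the original has not changed

my $from_ref   = bump_ref(\$count);   # returns 11
my $after_ref  = $count;              # the original has changed

print "after bump_copy: $after_copy, after bump_ref: $after_ref\n";
```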



Since I am working with references, however, I have to reach back a level further if I need to do something with the actual contents of the scalar, array or hash to which the reference points. (I have one of each of 'em here. Funny how that works, huh?) This process is called de-referencing. A good example of that is found in the second line of the subroutine. In the previous version I was able to directly split the scalar to get at the value contained within, but in this case the scalar I'm dealing with is just a reference to the $dir scalar in the main namespace. Therefore, the expression


my @full_path=split('/',$dir,4);

is changed to

my @full_path=split('/',$$dir,4);

with the extra $ sign used to de-reference the scalar. Other than that, there are no other changes required to the code in the subroutine. As the previous version of the subroutine already accessed the slices of the array referenced by $hostref through de-referencing, no changes are required there. The $df object is a little different in a way that steps outside of the bounds of this framework, and I'll get into that when and if I start getting into objects.
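For reference, here is a short sketch of de-referencing each of the three basic types. The data and the names ($dir_ref, $hosts_ref, $free_ref) are invented for the example and are not taken from the insert script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data living in the main namespace.
my $dir   = '/mnt/mfs/data/file.txt';
my @hosts = ('node1', 'node2', 'node3');
my %free  = (node1 => 120, node2 => 75);

# A backslash in front of a variable produces a reference to it.
my ($dir_ref, $hosts_ref, $free_ref) = (\$dir, \@hosts, \%free);

# Scalar reference: an extra $ gets at the string itself.
my @full_path = split('/', $$dir_ref, 4);

# Array reference: one element, a slice, and the element count.
my $first = $$hosts_ref[0];          # same as $hosts_ref->[0]
my @pair  = @$hosts_ref[0, 1];       # slice of the referenced array
my $count = scalar @$hosts_ref;

# Hash reference: look up one key.
my $node2 = $$free_ref{node2};       # same as $free_ref->{node2}

print "last piece of path: $full_path[3]\n";
print "first host: $first, pair: @pair, count: $count\n";
print "node2 free: $node2\n";
```

The arrow forms shown in the comments do the same job and many people find them easier to read once the expressions get longer.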



Similarly, the amount of change to the insert script required to make use of the subroutine is small. First, I have to make sure that perl can find the module. Now there are a number of ways to do that, including modifying either the environment variable that holds perl's search path (PERL5LIB) or the @INC array, which is how perl represents that search path internally. Probably the easiest thing to do is also uniquely well-suited to the use of a common directory for modules related to this system under the mfs mount point. Quite simply, I just tell the system to use a certain directory as a library path:


use lib "/home/www/bb_lib";

is the path I've used as I've developed this example. As a result, when I say "use BB_UTIL;" perl is able to find the module. Now all that is required to use the functionality included in the module is to call it appropriately:

$free=BB_UTIL::free_space(\$dir,$hostref,$df);

This, of course, calls the free_space() subroutine from the BB_UTIL namespace and passes the three arguments as references, storing the returned value into the $free scalar.
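Going back for a moment to the question of how perl finds a module, here is a sketch that goes through the whole search-path mechanism in one self-contained script. It writes a throwaway module, DEMO_LIB.pm (a hypothetical name), into a temporary directory, puts that directory on @INC, and loads the module from there. The temporary directory is only there so the example does not depend on any real path on your machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway library directory holding a hypothetical module.
my $libdir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$libdir/DEMO_LIB.pm") or die "cannot write module: $!";
print $fh "package DEMO_LIB;\n";
print $fh "sub hello { return 'hello from DEMO_LIB' }\n";
print $fh "1;\n";
close($fh) or die "cannot close module: $!";

# 'use lib "/some/dir";' does essentially this, but at compile time.
# Setting PERL5LIB in the environment before perl starts is another route.
unshift @INC, $libdir;

# 'use DEMO_LIB;' would run at compile time, before the file above exists,
# so this sketch loads the module at run time with require instead.
require DEMO_LIB;

my $greeting = DEMO_LIB::hello();
print "$greeting\n";
```

With a fixed directory such as the one I use above, none of the run-time juggling is needed; the "use lib" line followed by "use BB_UTIL;" covers it.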



And that, ladies and gentlemen, is it. I now have a function that I can use to determine the amount of available space on the cluster, as long as the appropriate arguments are passed to it. Many would argue that having to specify the module namespace as I do here is a pain in the neck. I actually feel that it is not all that big a deal, and helps to keep the main namespace clean. But I will concede that it can be handy to refer to a module's subroutines without specifically pointing to its namespace, and most modules do provide the capability to incorporate their subroutines or a specified subset of them into the main namespace. You may have noticed that I'm nearing the end of the page, however, which indicates that I'm near the end of the virtual space I've allocated for this page. That must mean that I'm going to get into that on the next page, huh?<grin>



The version of the insert script and the module used here are in the samples file for this section as recs_insert_6_a.pl and first_version_of_module_rename_to_BB_UTIL.pm. I've chosen that way of naming the module file so I wouldn't have to keep renaming the module itself and using different names for it in the code, which I suspected might be somewhat confusing.



Next: Extending the Utility Module