I gave some thought to which subroutine should be the first to go into the first module. I wanted something tough, but not too tough, with more than one piece of data required from the surrounding environment, but not an overwhelming amount. (When a subroutine requires quite a lot of information from the main namespace, there should be a serious question as to whether that subroutine should be partitioned off into a separate namespace at all, but that is not an issue that I am going to address here. Just keep that in mind as I start through this stuff, and I think you will see why it is something to consider.) Although there are other good candidates, I quickly settled on free_space() in the insert script as the most logical choice. Beyond the considerations mentioned above, free_space() provides the kind of functionality that might have relevance on a broader scope within the application, and that is probably the primary rationale for incorporating code into portable libraries, which Perl calls modules.
(As an aside, operating system and application code is increasingly made up of shared libraries. These are generally written in C or C++, and in the Windows world they are compiled into DLL files registered at either the system or application level. Many of the errors associated with earlier versions of Windows are actually attributable to custom versions of DLLs installed by applications. But that's beside the point. These files expose certain interfaces and return certain results, in just the same way as the modules I develop here will. There may be some difference in the form, but what is being done is the same thing. In the Linux world, it is fairly common for an application to be compiled from source code on the machine on which it will run, especially if it was not installed as part of the Linux distribution on that machine. In that environment, the shared files are generally called lib<something or other> and installed in one of the directories named lib. You might ask, why are you rambling on about this? If you know this, you're probably grumbling at the digression. Hey, this is my turf. My point is that the way this application will be developed reflects the way the vast majority of code is written these days. In any specific circumstance different tools may be used, and the expression will vary, but the forms are much the same. If you are new to this, you should know that as you start picking it up, your understanding of that surrounding environment will broaden at an accelerating pace. Now the flip side of this coin is that one of the things you will become aware of is just how much you don't know. But hey ... that's just the way of things. There will come a time when you'll be talking about something, and will realize with surprise that you are expressing things that you did not even know about a few months previously. Kinda sneaks up on you. Learning is like that.)
The process of building modules is not all that complicated at heart. It made sense to me that free_space() should be in a module of system utilities, so I decided to name the module BB_UTIL. By convention, perl expects module files to carry the extension .pm, so the full name of the file is BB_UTIL.pm. I used all capitals in the identifying part of the name to make references to it stand out in the body of whatever code uses it. You will see what I mean in just a bit. Now there is no inherent reason to partition modules on the basis of their functionality. Obviously, the module that holds free_space() has to be resident on the machine that runs the insert script, or at least some module that holds it must be there. And of course, it would be possible to keep all of the custom modules for the system's operation in a central location on the cluster's file system and access them from there. In fact, I think that's a good idea, but I'm not going to get into it right now.
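To make the naming convention concrete, here is a minimal sketch of how a file like BB_UTIL.pm might be laid out. It is not the actual module: the body of free_space() is a placeholder of my own, and the only things taken from the discussion above are the package name, the file name and the subroutine name.

```perl
# File: BB_UTIL.pm  (skeleton only; the real free_space() logic is not shown)
package BB_UTIL;

use strict;
use warnings;

# Placeholder body (an assumption for illustration): it simply reports
# how many arguments it was handed.
sub free_space {
    my @args = @_;
    return scalar @args;
}

1;    # a module file must finish by returning a true value
```

The package statement is what gives the subroutine its own namespace, and the trailing 1; is what tells perl that the file loaded successfully.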
In general Perl use, it would not be at all unusual to have several associated scripts on a machine share a module whose subroutines have little in common with each other beyond the fact that each might be used by more than one script in some fashion. This is a perfectly legitimate thing to do, and worthwhile to keep in your bag of tricks. One of the "real world" uses of Perl is in applications that sit behind the scenes, making everything else work, and you can use Perl in this fashion to amaze your friends and confound your enemies. Things you put together like that seem to materialize out of thin air. In the context of a larger-scale, somewhat more integrated application like this, however, you are more likely to see modules organized along functional lines. From an organizational standpoint, this kind of functional partitioning provides a coherent structure, and at the Perl system level it should help to manage the application's namespace effectively.
An important thing to recognize here is that all three of the items pulled from the argument array hold references, the first two created in the script itself and the third as a reference in its object initialization. This is an important element in assuring both the integrity of what is being done by the module and its portability to other contexts. I am working with the real, original stuff, and not a copy of it. (Now there are times when it would be appropriate to be working with a copy of it ... I'll cross that bridge when and if I come to it.)
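The effect of working with "the real, original stuff" can be sketched as follows. Everything named here (touch_originals, $dir, @nodes, %stats and their values) is a hypothetical stand-in of mine rather than the code from the insert script. The point is only that three references come off the argument array and that changes made through them show up in the caller's data.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for data that lives in the main script.
my $dir   = '/some/dir/name';
my @nodes = ('node1', 'node2');
my %stats = (checked => 0);

sub touch_originals {
    # Pull the three references off the argument array, in order.
    my ($dir_ref, $nodes_ref, $stats_ref) = @_;

    # Changes made through a reference land on the caller's own data.
    push @{$nodes_ref}, 'node3';
    $stats_ref->{checked}++;

    return ref($dir_ref);    # reports what kind of reference arrived
}

my $kind = touch_originals(\$dir, \@nodes, \%stats);
print "$kind @nodes $stats{checked}\n";    # prints "SCALAR node1 node2 node3 1"
```

Had the subroutine been handed copies instead, @nodes and %stats in the calling script would have been left untouched.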
Since I am working with references, however, I have to reach back a level further if I need to do something with the actual contents of the scalar, array or hash to which the reference points. (I have one of each of 'em here. Funny how that works, huh?) This process is called de-referencing. A good example of that is found in the second line of the subroutine. In the previous version I was able to directly split the scalar to get at the value contained within, but in this case the scalar I'm dealing with is just a reference to the $dir scalar in the main namespace. Therefore, the expression that does the split has to de-reference the scalar first.
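As a hedged illustration of de-referencing, the sketch below shows the general Perl syntax for each of the three kinds of reference. The variable names, the path and the values are assumptions made up for the example, and the split is only a guess at the flavor of the real one, not the line from the module.

```perl
use strict;
use warnings;

# Hypothetical data in the "main namespace".
my $dir   = '/some/dir/name';
my @sizes = (10, 20, 30);
my %limit = (max => 100);

my ($dir_ref, $sizes_ref, $limit_ref) = (\$dir, \@sizes, \%limit);

# Scalar: $$ref (or ${$ref}) yields the string behind the reference,
# so split works on the path and not on the reference itself.
my @parts = split m{/}, $$dir_ref;

# Array: @{$ref} is the whole array, $ref->[0] a single element.
my $count = scalar @{$sizes_ref};
my $first = $sizes_ref->[0];

# Hash: %{$ref} is the whole hash, $ref->{key} a single value.
my $max = $limit_ref->{max};

print "$parts[-1] $count $first $max\n";    # prints "name 3 10 100"
```

Splitting $dir_ref without the extra $ would split the printed form of the reference itself, something like SCALAR(0x...), which is rarely what anyone wants.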
Similarly, the amount of change to the insert script required to make use of the subroutine is small. First, I have to make sure that perl can find the module. Now there are a number of ways to do that, including modifying either the environment variable that holds perl's search path or the @INC array, which is how perl represents that search path internally. Probably the easiest thing to do is also uniquely well-suited to the use of a common directory for modules related to this system under the mfs mount point. Quite simply, I just tell the system to use a certain directory as a library path:
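The general shape of that can be sketched like this. The directory name is purely hypothetical, so substitute wherever the shared modules really live. To keep the sketch runnable without a module file, an inline stand-in package takes the place of the real "use BB_UTIL;" line, and its free_space() body is a placeholder of mine.

```perl
use strict;
use warnings;

# Hypothetical directory; "use lib" puts it on perl's module search path (@INC).
use lib '/mnt/mfs/perl_lib';

# In the real script this would be "use BB_UTIL;", which loads BB_UTIL.pm
# from the search path. The inline package below is only a stand-in.
{
    package BB_UTIL;
    sub free_space {
        my ($dir_ref) = @_;
        return "checked $$dir_ref";    # placeholder result
    }
}

my $dir = '/some/dir/name';

# The subroutine is called through the module's namespace.
my $result = BB_UTIL::free_space(\$dir);
print "$result\n";    # prints "checked /some/dir/name"
```

Because use lib only adds to @INC, naming a directory that does not exist yet does no harm; perl simply finds nothing there when it goes looking.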
And that, ladies and gentlemen, is it. I now have a function that I can use to determine the amount of available space on the cluster, as long as the appropriate arguments are passed to it. Many would argue that having to specify the module namespace as I do here is a pain in the neck. I actually feel that it is not all that big a deal, and helps to keep the main namespace clean. But I will concede that it can be handy to refer to a module's subroutines without specifically pointing to its namespace, and most modules do provide the capability to incorporate their subroutines or a specified subset of them into the main namespace. You may have noticed that I'm nearing the end of the page, however, which indicates that I'm near the end of the virtual space I've allocated for this page. That must mean that I'm going to get into that on the next page, huh?<grin>
The versions of the insert script and the module used here are in the samples file for this section as recs_insert_6_a.pl and first_version_of_module_rename_to_BB_UTIL.pm. I chose that way of naming the module file so that I wouldn't have to keep renaming the module and referring to it by different names in the code, which I suspected might be somewhat confusing.