Ralphie's Banner Page

Providing Common Network Applications and Services with Mosix Clusters of Surplus Machines
Permanently Under Construction


I have been using linux, generally Debian, as a platform for various network services at the remote office location I administer, which consists of roughly 100 users at one end of a T1 line. While the bulk of our official services are provided through Netware and Windows NT servers, I've come to rely increasingly on Debian and perl to fill in the cracks of our local architecture. For the last 16 months, for example, I've been using squid as a proxy cache to minimize the web traffic on the T1 line. Squid has reduced the number of web requests going out by roughly 50%, and the overall traffic in megabytes through the pipe by 10-15%. This is with a relatively small population using the cache; those numbers would go up with a larger pool of users.


For some time I've been intrigued by clusters, but the relatively meager resources available to a state agency have kept me from investigating further. (We still have users relying on 200 MHz Pentiums as their primary machine.) Early this year (2002), however, we were able to replace virtually all of the p75 - p100 machines in everyday use, and I resolved to use some of them to build a cluster. As I refreshed my previous investigations into clusters, I came to realize that an important niche for which not a lot was available was the use of less-powerful machines to provide network resources. Anyone who is familiar with linux knows the feeling of surprise that comes when you realize how much meaningful processing can be accomplished by machines considered irrelevant by today's standards. I've used a p75 machine as a scripting host for some time, performing various maintenance and preprocessing chores on network resources as scheduled cron jobs. The key to providing responsive networks and applications is far less hardware-based than most people realize; it's more a matter of having the appropriate resource available when the user wants it. It makes far more sense to have a p75 preprocess data into an intermediate or final state, for example, than it does to have that processing done at the time the user requests it, and then blame the hardware for delays in response. At the same time, such a strategy can avoid the loss of information that comes from collapsing levels of detail in data warehouses. At a minimum, such resources make it possible to construct more richly-detailed multi-dimensional representations of the data. It is also possible, however, to overrun the capacity of this older hardware as you broaden its use. A p75 slicing its time between an sql query being executed from a cgi script and another process bogs down just as quickly as it does when a user interface puts multiple demands on it.
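As a rough illustration of the preprocessing idea above, here is a minimal sketch of a batch step that collapses raw records into a summary file, and a request-time step that only reads that file. The language (Python), the function names, the file name, and the record layout are all assumptions made for illustration; none of them describe the actual scripts running on these machines.

```python
import json
import os
import tempfile
from collections import defaultdict


def summarize(records):
    """Collapse (category, value) pairs into a per-category count and total."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for category, value in records:
        summary[category]["count"] += 1
        summary[category]["total"] += value
    return dict(summary)


def write_summary(records, path):
    """Batch step: intended to be run on a schedule, not at request time."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory and rename it into
    # place, so a reader never sees a half-written summary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(summarize(records), handle)
    os.replace(tmp_path, path)


def read_summary(path, category):
    """Request-time step: a cheap lookup against the precomputed file."""
    with open(path) as handle:
        return json.load(handle).get(category)


if __name__ == "__main__":
    # Hypothetical sample data standing in for raw log or database rows.
    sample = [("web", 120.0), ("mail", 40.0), ("web", 80.0)]
    out = os.path.join(tempfile.gettempdir(), "summary.json")
    write_summary(sample, out)
    print(read_summary(out, "web"))
```

The expensive work happens in write_summary, which a cron job could run overnight on an older machine. The request path then costs only a file read, whatever the size of the raw data.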


Applications that are high-profile may have sufficient budgets to avoid having to deal with the constraints imposed by stochastic variations in resource use, but such cases are far fewer than most assume. Many applications, whether analytic or productivity-related, may be run only occasionally, or once a day, leaving the resources they use idle the remainder of the time. The key to optimizing network use is to handle jobs appropriately. If that is done, whatever investments are made in new hardware are immediately effective, and not merely attempts to buy a better response time for a single application. The use of clusters at the low end can be singularly effective in this regard, both by providing resources that might well not be available if competing with higher-profile apps and by streamlining the application of processing power to the task appropriate at the moment.


This is starting to sound like an exercise in academic jargon, with the concomitant lack of real stuff that you can apply in the world outside of staged presentations. My intent here is to walk through the construction of this cluster, delineating the configuration of the machines in it, the installation and configuration of the Debian linux environment on which mosix runs, the installation and configuration of mosix, and the installation and configuration of the application software that runs within the environment. If you are familiar with linux, you should have little trouble translating what is presented here into whatever distribution you choose, beyond the pain of figuring out why your distribution puts everything in a completely different place than mine does. If you are not familiar with linux, I'd suggest that you pick up a set of CDs and a book from somewhere and spend at least a few weeks playing with it before going too much further. This will all make a lot more sense if you at least have a bit of a framework to hang it on. If you are just picking up linux, go ahead and pick up Debian. People will probably tell you that this or that distribution is this or that, and there is no question that if you're new and after a desktop environment you might want to pick up something like Mandrake, but this is a server app, folks, and servers are generally considered to be Debian's forte. Besides, that's what I use. (grin)


By no means should I be considered a Mosix expert, and this will likely be a decidedly non-linear document. As I've written this, I've found bits here and there that should have been configured differently. Rather than back-spacing and ironing those out, I've left the loops and diversions in. I hope that will be one of the strengths of this document. If you are using this to learn, you'll pick up information from those changes just as I did. It is not likely that what is presented here will amount to a cookbook for implementing a mosix installation, but I intend to come as close as possible, and in so doing present a framework that you should be able to use as a reference for your own implementation. Within the document I intend to point toward sources of information that are relevant in contexts beyond the current one. This is also intended to be a work in progress. As I'm able to tweak the environment under various scenarios I intend to update the document, which should provide a pool of practical experience from which an individual configuring their own cluster might be able to draw. Clusters like this have the potential to provide higher levels of local network resources to a wide range of organizations that might not have the financial wherewithal to acquire them otherwise. Developing such resources could be a good opportunity for network and systems professionals to extend their capabilities in a real-world context while minimizing the associated costs for both the organization and the professional involved, a win-win proposition. If this document helps to foster that, it will have served its purpose.


Addendum, June 2006: By request, this site has morphed into a detailed discussion of the development of a mod_perl-driven custom web application. That aspect of this site starts with a simple cgi script, and moves into implementing the application in version 1.3 of the mod_perl environment on an Apache web server. The front-end machine of the cluster houses the web server itself, and the database server that houses the data used by the application resides on one of the machines interior to the cluster. Relatively little of this aspect of the site has anything to do with mosix per se; I hope at some point to return to that element in these pages.


And now, the moment you've all been waiting for, here it is, the massive cluster ... the one and only ...



RALPHZILLA!


(All of my linux machines are named ralph something or other, with the exception of bridget.)


The links below detail the development of the various elements of Ralphzilla.


NAVIGATION

Introduction
    Basic Hardware Components

Debian
    Partition and File System Structure
    Initial Debian Setup
    Potato (Stable) Installation
    Woody Upgrade
    Installing Straight to Testing (Woody) Across the Network

Mosix
    Kernel Configuration
    Mosix Configuration
    MFS
    Communication
    Cluster Package Management

Application Software
    Hylafax
    Postgresql

Various and Sundry Items I Put Here For Lack of A Better Place to Put Them
    Adding A Machine to the Cluster - Quick Reference
    Directing Output from Xwindows Apps to a Remote Windows Desktop
    Squid Proxy Cache
    RAID Configuration
    Basic Perl Configuration


Using Samba to Create a Virtual File Server The Database Application - Introduction An Integrated Dynamic Environment Bells and Whistles Creating Containers for Subroutine Arguments
A Basic Samba Implementation Database Design Extending the Integrated Environment Moving Onto Some New Turf - Mod_Perl Maintaining the League Schedule
Designing the Baseball Database Related Stuff: Debugging Perl and Cron Starting a Structure Managing Teams and Players
... Apache The Rest of the Record, and System Logging and Fail-Overs Restructuring the Entry Process Using the Apache API Revising the Database Structure
... HTML, Basic CGI, and Examples
Rounding Out the CGI-Based Entry System Adding an Authentication Handler, Some Interface Bells and Whistles, and Other Odds and Ends ...
... Generating HTML with More Complex CGI and Examples
Starting to Build Modules Adding the Editing Interface ...


Management Scripts
Links


CONTACT ME
In time, I may create some fancier form of feedback for these pages, but in the interim if you wish to offer feedback on the material please drop me a line.