For some time I've been intrigued by clusters, but the relatively meager
resources available to a state agency have prohibited me from being able to
investigate further. (We still have users relying on 200 MHz Pentiums as
their primary machines.) Early this year (2002), however, we were able to
replace virtually all of the p75 - p100 machines in everyday use, and I
resolved to use some of them to build a cluster. As I refreshed my previous
investigations into clusters, I came to realize that an important niche for
which not a lot was available was the use of less-powerful machines to provide
network resources. Anyone who is familiar with linux knows the feeling of
surprise that comes when you realize how much meaningful processing can be
accomplished by machines considered irrelevant by today's standards. I've
used a p75 machine as a scripting host for some time, performing various
maintenance and preprocessing chores on network resources as scheduled cron
jobs. The key to providing responsive networks and applications is far less
hardware-based than most people realize; it's more a matter of having the
appropriate resource available when the user wants it. It makes far more sense
to have a p75 preprocess data into an intermediate or final state, for example,
than it does to have that processing done at the time the user requests it, and
then blame the hardware for delays in response. At the same time, the use of
such a strategy can also avoid the problems associated with collapsed levels of
detail in data warehouses, resulting in loss of information. At a minimum such
resources make it possible to construct more richly-detailed multi-dimensional
representations of the data. It is also possible, however,
to overrun the capacity of this older hardware as you broaden its use. A p75
slicing its time between an sql query being executed from a cgi script and
another process bogs down just as quickly as it does when a user interface puts
multiple demands on it.
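
The preprocessing strategy described above (a scheduled job builds an intermediate result, and the request path only reads it) might be sketched roughly as follows. This is a minimal illustration in Python, not the scripts actually used on these machines; the function names, the sample rows, and the `summary.json` file name are all hypothetical stand-ins.

```python
import json
import os
import tempfile
from collections import defaultdict


def build_summary(records):
    """Aggregate raw (category, amount) rows into per-category totals."""
    totals = defaultdict(float)
    for category, amount in records:
        totals[category] += amount
    return dict(totals)


def write_snapshot(path, summary):
    """Write the summary via a temp file so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(summary, handle)
    os.replace(tmp_path, path)  # rename within one directory replaces atomically


def read_snapshot(path):
    """Request-time path: load the precomputed result, no heavy work."""
    with open(path) as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the output of a slow query.
    rows = [("permits", 3.0), ("licenses", 1.5), ("permits", 2.0)]
    snapshot = os.path.join(tempfile.gettempdir(), "summary.json")
    write_snapshot(snapshot, build_summary(rows))
    print(read_snapshot(snapshot))
```

In a setup like the one described, the build-and-write half would presumably run from cron on the older machine, while whatever serves the user calls only the read half.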
Applications that are high-profile may have sufficient budgets to avoid having
to deal with the constraints imposed by stochastic variations in resource use,
but those times are far fewer than most assume. Many applications, whether
analytic or productivity-related, may be run only occasionally, or once a day,
leaving the resources they use idle the remainder of the time. The key to
optimizing network use is to handle jobs appropriately. If that is done,
whatever investments are made in new hardware are immediately effective, and
not merely attempts to buy a better response time for a single application.
The use of clusters at the low-end can be singularly effective in this regard,
both by providing resources that might well not be available if competing with
higher-profile apps and by streamlining the application of processing power to
the task appropriate at the moment.

This is starting to sound like an exercise in academic jargon, with the
concomitant lack of real stuff that you can apply in the world outside
of staged presentations. My intent here is to walk through the construction of
this cluster, delineating the configuration of the machines in it, the
installation and configuration of the debian linux environment on which mosix
runs, the installation and configuration of mosix, and the installation and
configuration of the application software that runs within the environment. If
you are familiar with linux, you should have little trouble translating what is
presented here into whatever distribution you choose, beyond the pain of
figuring out why your distribution puts everything in a completely different
place than mine does. If you are not familiar with linux, I'd suggest that you
pick up a set
of cds and a book from somewhere and spend at least a few weeks playing with it
before going too much further. This will all make a lot more sense if you at
least have a bit of a framework to hang it on. If you are just picking up
linux, go ahead and pick up debian. People will probably tell you that this or
that distribution is this or that, and there is no question that if you're new
and after a desktop environment you might want to pick up something like
Mandrake, but this is a server app, folks, and servers are generally considered
to be debian's forte. Besides, that's what I use. (grin)

By no means should I be considered a Mosix expert, and this will likely be a
decidedly non-linear document. As I've written this, I've found bits here and
there that should have been configured differently. Rather than back-spacing
and ironing those out, I've left the loops and diversions in. I hope that
will represent one of the strengths of this document. If you are using this to
learn, you'll pick up information from those changes just as I did. It is not
likely that what is presented here will represent a cookbook for implementing a
mosix installation, but I
intend to come as close as possible, and in so doing present a framework that
you should be able to use as a reference for your own implementation. Within
the document I intend to point toward sources of information that are relevant
in contexts beyond the current. This is also intended to be a work in
progress. As I'm able to tweak the environment under various scenarios I
intend to update the document, which should provide a pool of practical
experience from which an individual configuring their own cluster might be able
to draw. Clusters like this have the potential of providing higher levels of
local network resources to a wide range of organizations that might not have
the financial wherewithal to acquire them otherwise. Developing such
resources could represent a good opportunity for network and systems
professionals to extend their capabilities in a real-world context while
minimizing the associated costs for both the organization and the professional
involved, a win-win proposition. If this document helps to foster that, it
will have served its purpose.

Addendum, June 2006: By request, this site has morphed into a detailed discussion of the development of a mod_perl-driven custom web application. That aspect of this site starts with a simple cgi script, and moves into implementing the application in version 1.3 of the mod_perl environment on an Apache web server. The front-end machine of the cluster houses the web server itself, and the database server that houses the data used by the application resides on one of the machines interior to the cluster. Relatively little of this aspect of the site has anything to do with mosix per se; I hope at some point to return to that element in these pages.

And now, the moment you've all been waiting for, here it is, the massive
cluster ... the one and only ...
The links below detail the development of the various elements of Ralphzilla.
| ... | ... | ... | ... | Management Scripts |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | Links |