
Mosix Kernel Configuration

The process of configuring and compiling a custom Linux kernel seems to be greeted by those unfamiliar with or new to Linux with a trepidation somewhat akin to performing open-heart surgery in the dark with only a Phillips screwdriver. The reality is much less intimidating, and since there is no reason why you can't make a backup copy of your current kernel, you're really not risking anything. Now, if you want to get into the business of modifying the kernel source and building a kernel from your changes, that's another matter, but it's a level of detail you do not have to get into to use Linux effectively, and indeed few of the people who use Linux ever touch the source.


Obviously, before you can build a new kernel you have to get the source. Building a Mosix kernel takes just a couple of steps more than usual, but don't worry, we're going to walk through them. First, download the Linux 2.4.17 kernel source, either as a .gz or as a .bz2 archive. Unless you have configured Samba and given yourself wholly inappropriate rights within the directory structure of the machine on which you are going to build the kernel, you won't be able to save the file directly to its ultimate destination, so just save it somewhere you won't lose it. (Not that I've ever done something like that, of course, but I've heard of people doing it.)


That brings up a point. There is no reason why you have to build the kernel on one of the machines that is going to be part of the cluster. If the machines in your cluster are all P75s to P100s and you have a much faster Debian machine available, it's perfectly all right to build the kernel on that machine; as you will see, we are going to wrap the kernel up for distribution anyway. The main reason not to is that you may not want to disturb the source tree and kernel configuration on the faster machine. You can get around that by renaming the current source tree before installing the new one, and by saving separate configuration (.config) files when you use the same source version. But if you are like most of us, you have plenty of other things to do anyway, so there's really nothing wrong with doing the build on the slower machine and letting it chug happily away while you do something else. At least that way you don't have to remember to put anything back the way it was.


Once you have downloaded the source, make sure that three Debian packages are installed on your machine: patch, kernel-package, and kernel-patch-mosix. As with virtually all package installations on a working Debian system, use apt-get install. In this case, issue the command "apt-get install patch kernel-package kernel-patch-mosix", and apt will fetch the relevant packages, along with any they depend on, and install them for you.
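If you want to confirm that the packages are really in place before going further, the short script below is one way to do it. It is a minimal sketch, assuming the standard Debian status database at /var/lib/dpkg/status; it only reads that file and never runs apt-get itself. The list also includes fakeroot, which the build step needs later on.

```python
"""Report which packages needed for the Mosix kernel build are not installed.

A sketch that reads the dpkg status database; it does not install anything.
"""
from pathlib import Path

# Packages named in the text; fakeroot is needed later for make-kpkg.
REQUIRED = ["patch", "kernel-package", "kernel-patch-mosix", "fakeroot"]


def installed_packages(status_text: str) -> set:
    """Return names of packages whose Status line ends in 'installed'."""
    installed = set()
    # Stanzas in the status file are separated by blank lines.
    for stanza in status_text.split("\n\n"):
        name = status = None
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
            elif line.startswith("Status:"):
                status = line.split(":", 1)[1].strip()
        # "install ok installed" counts; "deinstall ok config-files" does not.
        if name and status and status.split()[-1] == "installed":
            installed.add(name)
    return installed


def missing_packages(status_text: str, required=REQUIRED) -> list:
    """List required packages not marked as installed, in the order given."""
    have = installed_packages(status_text)
    return [pkg for pkg in required if pkg not in have]


if __name__ == "__main__":
    status_file = Path("/var/lib/dpkg/status")
    if status_file.exists():
        missing = missing_packages(status_file.read_text(errors="replace"))
        if missing:
            print("apt-get install " + " ".join(missing))
        else:
            print("All required packages are installed.")
```

If anything is missing, the script prints the apt-get command that would fetch it, which you can then run yourself as root.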


Once this is done, copy the file holding the new Linux source to the /usr/src directory and decompress it. If you downloaded the .gz file, use "gzip -d linux-2.4.17.tar.gz"; if you downloaded the .bz2 file, use "bzip2 -d linux-2.4.17.tar.bz2". (You could also use gunzip or its bzip2 analog, bunzip2. This is just the way I do it.) Either way you are left with the tar file, and "tar -xvf linux-2.4.17.tar" will create the Linux source tree.

If the Debian kernel-patch-mosix package has been installed, you should see a "kernel-patches" directory in /usr/src, and under it the path i386/apply, containing the script "mosix". Once you've verified that it is there, go to the root directory of the Linux source tree (/usr/src/linux) and issue the command "/usr/src/kernel-patches/i386/apply/mosix". That will patch the kernel source appropriately. Note that you must run the script from the root directory of the source tree, and the downloaded source must be for kernel 2.4.17, or the patch will not apply.

If these conditions are met and the patch still doesn't apply, take a good look at the error messages and try to assess whether requisite packages might be missing from your installation. If everything seems complete and the patch still will not apply, I would suggest downloading the distribution from the Mosix web page and following the instructions for a manual installation; that might give you enough feedback to identify the problem. However, I seriously doubt this will be necessary. I've gone through this process twice, and as long as the kernel source was the right version and the mosix script was run from the right location, I've had no trouble patching the source.
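The unpack-and-patch steps can also be scripted, which is handy if you end up building kernels for several machines. This is a sketch under stated assumptions, not a replacement for the commands above: it assumes the paths used in the text (/usr/src, /usr/src/linux and the Debian patch script under /usr/src/kernel-patches/i386/apply), refuses any tarball whose name does not say 2.4.17, and must be run as root to write into /usr/src.

```python
"""Unpack a linux-2.4.17 tarball into /usr/src and run the Debian Mosix patch script."""
import re
import subprocess
import sys
import tarfile
from pathlib import Path
from typing import Optional

EXPECTED_VERSION = "2.4.17"  # the Mosix patch package only applies to this version
PATCH_SCRIPT = Path("/usr/src/kernel-patches/i386/apply/mosix")


def tarball_version(name: str) -> Optional[str]:
    """Return the version from a name like linux-2.4.17.tar.bz2, else None."""
    match = re.fullmatch(r"linux-(\d+\.\d+\.\d+)\.tar\.(?:gz|bz2)", name)
    return match.group(1) if match else None


def unpack(tarball: Path, dest: Path) -> None:
    """Extract a .tar.gz or .tar.bz2; mode "r:*" detects the compression itself."""
    with tarfile.open(tarball, "r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse members that would land outside dest.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)


def apply_mosix_patch(source_root: Path) -> None:
    """Run the patch script from the root of the source tree, as the text requires."""
    subprocess.run([str(PATCH_SCRIPT)], cwd=source_root, check=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    tarball = Path(sys.argv[1])
    if tarball_version(tarball.name) != EXPECTED_VERSION:
        sys.exit(f"{tarball.name} is not the {EXPECTED_VERSION} source; the patch will not apply")
    unpack(tarball, Path("/usr/src"))
    apply_mosix_patch(Path("/usr/src/linux"))
```

Because tarfile detects the compression on its own, the same script handles either download, and check=True makes it stop with an error if the patch script fails rather than carrying on.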


At this point you are ready to begin configuring the kernel. The three commonly used configuration tools are "make config", "make menuconfig", and "make xconfig". The first could drive a person crazy, stepping through every option one by one, and the third, while effective, requires a configured X server. So we will use the second, which depends on ncurses for its display. You almost certainly have ncurses, although menuconfig also needs the ncurses development headers to build its display program, so if it complains, install the ncurses development package with apt-get.


When you issue the command "make menuconfig", the script performs a number of setup steps, some of which run only the first time, and then displays the main menu. If the Mosix option is not at the top of this screen, your patch did not apply properly, and you should resolve that before going further.

The second screen of the main menu.

The third screen of the main menu.

These are the options available for Mosix configuration. The only non-default option I've selected is MFS, the Mosix File System.

I show you the code maturity level screen only because in other contexts I have had drivers that would not show up unless this option was turned on. Since then, I always turn it on.

Identify the processor type and family your machines use; the kernel will be built and optimized for that processor. Note that SMP support is enabled here only because I forgot to turn it off. Turning it off would save a little space in your kernel, but leaving it on does not really cost you any performance.

Make sure plug and play support and ISA plug and play support are turned on. If they work on your older machine, so much the better.

If you are going to use RAID on your machine, you will need to highlight the appropriate option(s) here.

I show you this screen because if you are going to use a SCSI adapter, you will have to specify it here. You reach this screen from the bottom of the SCSI support menu.

This is the opening screen under Network Device Support. Recall that I'm using a 3Com 3c590, an Intel EtherExpress Pro, and NE2000s. All of these are found under the Ethernet (10 or 100Mbit) option. (I'm going through this section in detail to give you a sense of how the options for a given section can be laid out, and because I've found the network device section can be a little confusing.)

Highlighting the 3c590. You'll note that "M" is between the brackets, which means the driver will be loaded as a module. Highlighting a selection and pressing the space bar once selects M (where modules are allowed), pressing it a second time displays "*", meaning support will be built into the kernel, and pressing it a third time turns support for the device off.

Highlighting the NE2000. Note that I had to toggle the Other ISA cards selection to get the NE2000 to appear. Note also that I've specified that support for it be built into the kernel rather than loaded as a module. As I said before, I've found ISA card drivers somewhat more problematic to load as modules than PCI card drivers.

Highlighting the EtherExpressPro. Note that Intel is not mentioned, unlike most of the drivers, which do name the manufacturer. It is not at all unusual to find your card listed somewhat differently from what you might expect. Be prepared to hunt if you have to, and perhaps search Google for Linux plus the name of your card; you may find something that clues you in to a name you would not have expected.
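The three-way toggle described for the 3c590 screen ends up as one line in the saved .config file. The small model below shows that mapping. It follows the cycle exactly as described in the text (M, then *, then off) and ignores the fact that menuconfig only offers M for options that can be built as modules. The symbol CONFIG_VORTEX for the 3c590 driver is an assumption based on 2.4-era Config.in files and is worth checking against your own tree.

```python
"""Model the menuconfig tristate toggle and the .config line each state produces."""

# Bracket contents -> how the choice is written to the saved .config file.
CONFIG_LINE = {
    "M": "{sym}=m",             # built as a loadable module
    "*": "{sym}=y",             # built into the kernel image
    " ": "# {sym} is not set",  # support turned off
}

# Successive presses of the space bar, in the order described in the text.
NEXT_STATE = {" ": "M", "M": "*", "*": " "}


def press_space(state: str, times: int = 1) -> str:
    """Return the bracket contents after pressing the space bar `times` times."""
    for _ in range(times):
        state = NEXT_STATE[state]
    return state


def config_line(symbol: str, state: str) -> str:
    """Render the .config entry for an option in the given display state."""
    return CONFIG_LINE[state].format(sym=symbol)


if __name__ == "__main__":
    # Walk the 3c590 entry (assumed symbol CONFIG_VORTEX) through one full cycle.
    state = "M"
    for _ in range(3):
        print(f"[{state}] {config_line('CONFIG_VORTEX', state)}")
        state = press_space(state)
```

Knowing the mapping makes it easy to check your choices afterward by searching .config with grep, rather than paging back through the menus.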

Under File Systems, specifying the DOS file system.

Network File Systems is an option under File Systems. Here I am specifying the Smbfs file system.

After you have finished configuring your kernel options and select Exit from the main menu, you will be asked whether you want to save your new kernel configuration. If you want to build a kernel based on what you've done, select "Yes".
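Before starting a long build, it can be worth checking that the saved configuration really contains the choices made in the screens above. The checker below is a minimal sketch. Every symbol name in its table is an assumption based on 2.4-era Config.in files; CONFIG_MOSIX in particular is a guess at the name the Mosix patch adds and should be confirmed in your patched tree. Adjust the table to match your own hardware.

```python
"""Check a saved .config for the options chosen in the menuconfig screens."""
import sys
from pathlib import Path

# symbol -> acceptable values ("y" built in, "m" module); names are assumptions.
WANTED = {
    "CONFIG_MOSIX": {"y"},            # assumed name added by the Mosix patch
    "CONFIG_EXPERIMENTAL": {"y"},     # code maturity level options
    "CONFIG_PNP": {"y", "m"},         # Plug and Play support
    "CONFIG_ISAPNP": {"y", "m"},      # ISA Plug and Play support
    "CONFIG_NE2000": {"y"},           # ISA card built in, as recommended
    "CONFIG_MSDOS_FS": {"y", "m"},    # DOS file system
    "CONFIG_SMB_FS": {"y", "m"},      # SMB file system (smbfs)
}


def parse_config(text: str) -> dict:
    """Map each set symbol to its value; '# ... is not set' lines are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            symbol, value = line.split("=", 1)
            values[symbol] = value
    return values


def problems(text: str, wanted=WANTED) -> list:
    """Describe every wanted option whose value is missing or unexpected."""
    values = parse_config(text)
    report = []
    for symbol, allowed in wanted.items():
        value = values.get(symbol, "not set")
        if value not in allowed:
            report.append(f"{symbol}: {value} (want {'/'.join(sorted(allowed))})")
    return report


if __name__ == "__main__" and len(sys.argv) == 2:
    for line in problems(Path(sys.argv[1]).read_text()):
        print(line)
```

Run it with the path to the configuration, for example /usr/src/linux/.config. An empty report means everything in the table is set the way you wanted.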

Once you've saved the configuration, you are ready to build the kernel. As menuconfig exits, it will tell you that you now need to run "make dep". For what you are doing here, don't worry about it; make-kpkg takes care of that step. Instead, from /usr/src/linux, issue the command "fakeroot make-kpkg --revision=mosix.1.0 kernel_image". This command requires the fakeroot package; if you get a "command not found" message, install it with "apt-get install fakeroot" and run the command again. The machine will then read the configuration you've stored and build a kernel based on it. You have some time on your hands now: depending on the speed of the machine doing the build, this takes anywhere from half an hour to several hours. You might want to watch what's going on on the screen for a few minutes, but in all likelihood this is going to take a while. Take an early lunch, work on something else, whatever. Don't sit and wait for it to finish; it'll drive you crazy.
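If you would rather have the build stop cleanly when fakeroot is missing, a small wrapper like the one below does that. It is only a sketch: it runs the build command from /usr/src/linux with the revision spelled out as make-kpkg's --revision option, and the --build switch is a hypothetical guard so the script does nothing unless you ask for it.

```python
"""Run the make-kpkg build for the Mosix kernel under fakeroot."""
import shutil
import subprocess
import sys
from pathlib import Path

SOURCE_ROOT = Path("/usr/src/linux")  # make-kpkg runs from the top of the source tree
REVISION = "mosix.1.0"


def build_command(revision: str) -> list:
    """fakeroot lets make-kpkg assemble the .deb without real root privileges."""
    return ["fakeroot", "make-kpkg", f"--revision={revision}", "kernel_image"]


def build(source_root: Path = SOURCE_ROOT, revision: str = REVISION) -> None:
    if shutil.which("fakeroot") is None:
        sys.exit("fakeroot not found: run 'apt-get install fakeroot' first")
    # check=True raises CalledProcessError if the build fails, so a failure is not missed.
    subprocess.run(build_command(revision), cwd=source_root, check=True)


if __name__ == "__main__" and "--build" in sys.argv:
    build()
```

Keeping the revision in one constant also makes it easy to give each variant of your kernel its own revision string.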


Once the process is complete, you will have compiled a custom kernel. Congratulations! Because you used make-kpkg, the kernel and its modules are wrapped up together in a .deb package. You'll find it in the /usr/src directory, named "kernel-image-2.4.13_mosix.1.0_i386.deb". Don't worry about the fact that it says 2.4.13; that is getting written into the makefile by the make config process. You started with the 2.4.17 kernel source, and a 2.4.17 kernel is what you have.
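To see for yourself which version string the Makefile carries, and so what the package name will say, you can read the version fields at the top of /usr/src/linux/Makefile. The sketch below assumes the 2.4-style VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION assignments, and builds the package name from the pattern quoted in the text rather than from make-kpkg's documentation.

```python
"""Read the kernel version recorded in the top-level Makefile."""
import re
from pathlib import Path

FIELDS = ("VERSION", "PATCHLEVEL", "SUBLEVEL", "EXTRAVERSION")
ASSIGNMENT = re.compile(r"^(VERSION|PATCHLEVEL|SUBLEVEL|EXTRAVERSION)\s*=\s*(.*)$")


def makefile_version(text: str) -> str:
    """Combine the first assignment of each field into a string like 2.4.17."""
    values = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(1) not in values:
            values[match.group(1)] = match.group(2).strip()
    base = ".".join(values.get(field, "") for field in FIELDS[:3])
    return base + values.get("EXTRAVERSION", "")


def expected_deb(version: str, revision: str, arch: str = "i386") -> str:
    """Package name following the pattern quoted in the text."""
    return f"kernel-image-{version}_{revision}_{arch}.deb"


if __name__ == "__main__":
    makefile = Path("/usr/src/linux/Makefile")
    if makefile.exists():
        version = makefile_version(makefile.read_text(errors="replace"))
        print(version, expected_deb(version, "mosix.1.0"))
```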


You are now ready to install the kernel. The first thing you might want to do is make a copy of the kernel your machine currently boots from. Just copy vmlinuz in the root directory to something like vmlinuz.001 ("cp /vmlinuz /vmlinuz.001"). The kernel installation script makes a copy too, but it doesn't hurt to have an extra level of reassurance. Once you've done that, issue the command "dpkg -i /usr/src/kernel-image-2.4.13_mosix.1.0_i386.deb"; the files will be installed in the appropriate places on your machine, and lilo will be run to update the boot loader. You can now reboot the machine, but you might want to have either a boot diskette or a rescue diskette for the machine on hand, just in case something goes wrong. Not that anything will, you understand. Think of them as a talisman you can wave in front of the computer, just to let it know that it can't seriously hurt you.
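If you rebuild and reinstall several times, a fixed name like vmlinuz.001 gets overwritten. The sketch below is one way to keep every backup: it picks the next free number (.001, .002, and so on) before copying. It copies the file /vmlinuz points to, since shutil.copy2 follows symbolic links by default.

```python
"""Keep a numbered copy of the current kernel before installing the new package."""
import shutil
from pathlib import Path


def next_backup(kernel: Path) -> Path:
    """Return the first unused name of the form <kernel>.001, <kernel>.002, ..."""
    n = 1
    while (candidate := kernel.with_name(f"{kernel.name}.{n:03d}")).exists():
        n += 1
    return candidate


def backup_kernel(kernel: Path = Path("/vmlinuz")) -> Path:
    """Copy the kernel image to the next free numbered name and return that path."""
    target = next_backup(kernel)
    shutil.copy2(kernel, target)  # follows the /vmlinuz symlink to the real image
    return target


if __name__ == "__main__":
    print("Saved", backup_kernel())
```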


TROUBLESHOOTING

If your machine does have problems as it reboots, watch closely to see what the boot process is telling you. Obviously there are several levels on which you could have problems:

1 If the boot simply freezes the machine, which is not impossible, though very unlikely, you'll need to use either the rescue diskette or a boot diskette to continue. It is likely there was a major problem with the kernel generated, probably due to basic hardware incompatibilities. Once you've gotten the machine back up, take a look in the syslog (/var/log/syslog) to see if there are messages about hardware failures and at what point the boot stopped. This should give you a clue regarding where the problem is coming from, and you can go back into the kernel configuration script and make appropriate changes.

2 Far more likely is that you forgot to specify a device the system needs in order to recognize the drives. This is not at all unlikely: the kernel that Debian uses on the rescue diskette (or loads from the CD) includes support for a wide variety of devices, so it is easy to take for granted that a device will be recognized, and then be surprised when you boot and your hard drive isn't there. Closely related is the trouble that can result from choosing the wrong driver, which is not at all difficult to do. I've pointed out that you sometimes need to hunt for the appropriate network card driver, and that can extend to other hardware as well. For example, I use quite a few Tekram DC-390U2W SCSI cards. Tekram has its own drivers, which I've had trouble patching into the kernel, and you can find messages on the Net suggesting that you use a specific set of NCR drivers. If you do that, the machine will find the adapter but will not show any drives attached (a particularly frustrating situation). It is possible that the NCR driver would work with some tweaking, but that's the last thing you want to deal with if you don't have to. The driver that should be loaded for this adapter is a Symbios Logic driver. (Symbios Logic makes the SCSI chipset used by the card.) If you don't know that, and you mount key directories on drives attached to that adapter, your kernel is going to have some trouble loading your system.

Like the situation described in 1, above, you should boot with a boot floppy or rescue disk and look at the syslog for any clues. The most important thing is to determine what device is giving you problems. You can then look around for information on why it might be giving you problems; searching the web and newsgroups is a good start.


3 For some reason, a device you want to load as a module won't load that way. The symptoms are very similar to those in 2, above, and you'd track the problem down the same way, by looking at the syslog. There are a number of things you might try here, but I'd be most inclined to compile support for the device directly into the kernel.
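In all three cases the syslog is the first place to look, and it can be long. The filter below pulls out lines that look like driver or hardware trouble. It is a rough sketch: the keyword list is a guess at useful search terms, not a list of messages the kernel is known to print, and you can add the name of the driver you suspect on the command line.

```python
"""Pull likely driver and hardware messages out of the syslog after a bad boot."""
import sys
from pathlib import Path

# A guess at useful search terms; extend it for your own hardware.
KEYWORDS = ("error", "fail", "unable", "not found", "unresolved symbol", "timeout")


def suspicious_lines(lines, extra=()):
    """Yield (line number, line) for lines mentioning any keyword, case-insensitively."""
    words = [w.lower() for w in (*KEYWORDS, *extra)]
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(w in lowered for w in words):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    log = Path("/var/log/syslog")
    if log.exists():
        with log.open(errors="replace") as f:
            for number, line in suspicious_lines(f, extra=sys.argv[1:]):
                print(f"{number}: {line}")
```

For example, passing "sym53c8xx" or "eepro" as an argument would also show every line mentioning that driver, including lines that report success.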


Once you are beyond these few basic steps, troubleshooting your kernel configuration is likely to be hardware-specific enough that it is difficult to do more here than point you in the right direction. Recognize that the drivers for two devices can conflict with each other in unusual ways. This is most common with relatively unusual hardware, but it can happen in ways that surprise you. Remember that the purpose of the cluster is to combine relatively generic surplus machines. So if you run into this situation on one of your cluster machines, take a look at what hardware is in the machine and whether it should be there. The answer, of course, depends on the role the machine is to play. The only machine you are likely to really have to worry about is the one playing the role of Ralphzilla-raider, the database server and shared file space, as you add devices to provide storage.


This reinforces the idea that most machines within your cluster should be configured as simply as possible, with as common a set of components as possible. If you can do that, you could conceivably use one kernel to boot each machine in the cluster. However, it is quite likely that you won't be able to do that. Given that, catalog your kernels carefully. There is no reason why they all must remain kernel-image-2.4.13_mosix.1.0_i386.deb. You could have generic_kernel.deb and raid_kernel.deb. If you leave them in a configuration in which they can get confused with each other, they will.
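One way to keep that catalog honest is to write it down somewhere a script can check it. The sketch below keeps a table of roles and the renamed packages built for them, plus a table of which machine plays which role, and files each freshly built package under its catalog name. All role names, host names and file names here are hypothetical examples.

```python
"""Keep a small catalog of which kernel package belongs on which machine."""
import shutil
from pathlib import Path

# Hypothetical roles and the renamed packages built for them.
CATALOG = {
    "generic": "generic_kernel.deb",
    "raid": "raid_kernel.deb",
}

# Hypothetical machines and the role each one plays in the cluster.
MACHINES = {
    "node1": "generic",
    "node2": "generic",
    "fileserver": "raid",
}


def package_for(host: str) -> str:
    """Return the kernel package a host should install; KeyError if uncatalogued."""
    return CATALOG[MACHINES[host]]


def file_build(built: Path, role: str, archive: Path) -> Path:
    """Copy a freshly built package into the archive under its catalog name."""
    archive.mkdir(parents=True, exist_ok=True)
    target = archive / CATALOG[role]
    shutil.copy2(built, target)
    return target
```

Raising an error for a machine that is not in the table is deliberate: a host with no entry is exactly the case where the wrong kernel tends to get installed.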