
Partition and File System Structure

If you have not yet done so, take some time to think through the partition structure of your cluster machines, because creating that structure will be the first substantive step of the installation. Recall what I said in the hardware section about adding drives: if you have the space in your box and on your IDE channels, go ahead and add drives. The naming convention of devices in Linux, for those who don't remember it, is here.
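
As a quick refresher on that naming convention, here is a minimal Python sketch that decodes the kind of device names used in this section (hda, hdc, sda1 and so on). The function name describe_device and the wording of its output are my own invention for illustration; only the classic IDE lettering (a and b on the primary channel, c and d on the secondary) and the detection-order lettering of SCSI disks are assumed.

```python
import re

# Classic Linux IDE naming: the letter identifies the channel and position.
IDE_POSITIONS = {
    "a": "primary channel, master",
    "b": "primary channel, slave",
    "c": "secondary channel, master",
    "d": "secondary channel, slave",
}


def describe_device(path):
    """Describe a device node such as /dev/hda1 or /dev/sdb (illustrative helper)."""
    match = re.fullmatch(r"/dev/(hd|sd)([a-z])(\d*)", path)
    if match is None:
        raise ValueError(f"unrecognized device name: {path}")
    bus, letter, number = match.groups()
    if bus == "hd":
        # Only the first two IDE channels (hda through hdd) are spelled out here.
        position = IDE_POSITIONS.get(letter, "position beyond the first two channels")
        disk = "IDE drive, " + position
    else:
        # SCSI disks are lettered in the order the kernel detects them.
        index = ord(letter) - ord("a") + 1
        disk = f"SCSI disk number {index}"
    if number:
        return f"{disk}, partition {int(number)}"
    return f"{disk}, whole disk"
```

Calling describe_device("/dev/hdc2"), for example, reports the master drive on the secondary IDE channel, second partition.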


For those not familiar with it, a good way to look at the Unix/Linux file system is to recognize that the several default directories under the root ( / ) can each be located on a separate disk. This differs from the way DOS and Windows handle physical drives, though it can be simulated in the more advanced Microsoft operating systems starting with Windows NT, including Windows 2000 and Windows XP. Indeed, there is no reason why you can't create a directory of your own and mount a drive or drives on it, as I've done with both Ralphzilla-raider and Ralphzilla-faxa. More on that later.


At this point, the primary thing you need to be aware of about the structure of the file system is that the two directories most susceptible to growth are /usr and /var. The /usr directory is the default location for program files, and the /var directory is where more dynamic files, like log files, are stored. Depending on what you run on a specific machine, the /var directory can come to occupy a substantial amount of space. The backup program I use, for example, keeps files in /var identifying what is on each tape it has created, and on that machine I have to prune those periodically or they will come to occupy hundreds of megabytes of disk space. Similarly, the PostgreSQL database stores its data files in a subdirectory of /var by default, but you would probably not want to leave them there if database performance is a concern, as will be discussed in more detail in the section dealing with application software.
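
If you want to keep an eye on how much those two directories have grown, something along the following lines will do. This is only a sketch in Python, not the tool I actually use; the function names directory_size_mb and report are made up for the example, and it counts only regular files, so the figures will differ somewhat from what du reports.

```python
import os
import stat


def directory_size_mb(path):
    """Total size of the regular files under path, in megabytes (2**20 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                # lstat does not follow symlinks, so links are not counted twice.
                info = os.lstat(full)
            except OSError:
                continue  # the file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total / 2**20


def report(paths=("/usr", "/var")):
    """Return one line of text per directory, giving its current size."""
    return [f"{path}: {directory_size_mb(path):.1f} MB" for path in paths]
```

Printing the lines returned by report() from a nightly cron job would give you early warning before /var fills its partition. Run it as root if you want unreadable subdirectories included.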


For the most part, the configuration of the disk and file systems for the client machines should be relatively straightforward. The file system on Ralphzilla-free, for example, resides on two physical drives of 1 Gb. and 850 Mb. The 1 Gb. drive, hda, is occupied by one bootable partition that holds the root filesystem. The second drive, hdc, holds a 100 Mb. partition configured as swap space (type 82) and a 714 Mb. partition on which the /var directory is mounted. Recalling that Ralphzilla-free is one of the two machines in the cluster without any dedicated functions, I should think that the configuration of its file system, i.e., spread across a couple of disks to aid performance and minimize any disk space constraints, would be fairly characteristic of machines playing a similar role. Remember, however, that many of the older machines you might be working with may not be able to boot from a drive other than the first drive on the first IDE channel. If by some chance your configuration requires that you place a drive other than the one on which you intend to mount the root directory in that position, create a small (ca. 10-15 Mb) partition as the first partition on hda and mount a boot directory (/boot) there. (You will be given that opportunity when you start initializing partitions.) Remember, of course, to flag that partition as bootable.


You may note that the 100 Mb. swap partition is about twice the size of the generally accepted requirement of 1.5 times the amount of RAM in the machine. There is no special reason for that; I just rounded off. The configuration and speed of the swap space on a machine are of prime importance when the machine is being used for multiple purposes, which requires that processes be able to swap the contents of their memory space out to disk so that another application or process can make use of that memory. If a considerable amount of that were happening on a machine in the cluster, especially one configured to be a free processor, there would be a serious problem with the configuration of the cluster itself. Those who know anything about clusters know that many are constructed without any local disk space at all, relying on network space for what little swapping any individual box should be doing. Ralphzilla, however, is explicitly configured with hardware you might take out of your organizational graveyard. It doesn't have a fiber-channel connected disk array; it has software-only RAID 0 hosted by a p200 on a 10 Mbps ethernet pipe. Obviously, some level of customization has to be reached, and the first level of that customization is that swap space, and the file system in general, is maintained locally.
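
The rule of thumb is simple enough to put in a few lines of Python. Note that I have not said here how much RAM is in Ralphzilla-free; the 32 Mb figure in the sketch is purely an assumption, chosen because it is consistent with the "about twice" remark above. The function and constant names are likewise invented for the example.

```python
def recommended_swap_mb(ram_mb, factor=1.5):
    """Rule-of-thumb swap size: 1.5 times the installed RAM, in Mb."""
    return ram_mb * factor


# ASSUMPTION: 32 Mb of RAM is a hypothetical figure, not taken from the text.
ASSUMED_RAM_MB = 32
ACTUAL_SWAP_MB = 100  # the swap partition on hdc, as described above


def swap_ratio(actual_swap_mb, ram_mb):
    """How many times larger the actual swap is than the rule of thumb."""
    return actual_swap_mb / recommended_swap_mb(ram_mb)
```

Under that assumption, recommended_swap_mb(ASSUMED_RAM_MB) comes to 48 Mb, and swap_ratio(ACTUAL_SWAP_MB, ASSUMED_RAM_MB) comes out a little over 2.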


A more complex installation, Ralphzilla-raider, was configured with all three drives involved in software-only stripe sets (RAID 0). I specify the partition types at this point because the linux raid partitions have to be specified as such in fdisk. You'll be better off if you create the partition structure appropriately for your machine in the beginning. Think of this as a kick in the head, to make sure you're paying attention at the start. You'll thank me for it in the end. The partitions for the three drives were configured as follows:

/dev/hda ( 2 Gb. IDE drive )
/dev/hda1 ca. 900 Mb. ext2, root mount point, bootable
/dev/hda2 ca. 148 Mb. swap space
/dev/hda3 1063 Mb. partition type linux raid

/dev/sda ( 4 Gb. SCSI )
/dev/sda1 1061 Mb. partition type linux raid
/dev/sda2 1085 Mb. partition type linux raid
/dev/sda3 1000 Mb. partition type ext2 (initial /var mount point)
/dev/sda4 1300 Mb. partition type ext2 (/usr mount point)

/dev/sdb ( 2 Gb. SCSI )
/dev/sdb1 1063 Mb. partition type linux raid
/dev/sdb2 1082 Mb. partition type linux raid
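
Before you sit down at fdisk, it can help to add up what you are about to allocate. The following Python sketch transcribes the listing above into a small table and totals it per drive. The names LAYOUT, allocated_per_drive and raid_total_mb are hypothetical, and since I have not yet said which raid partitions are paired into which stripe set, the sketch only totals all of them together, which gives an upper bound on the combined striped space.

```python
# Sizes in Mb and partition types, transcribed from the listing above.
LAYOUT = {
    "/dev/hda": [("hda1", 900, "ext2"), ("hda2", 148, "swap"), ("hda3", 1063, "raid")],
    "/dev/sda": [
        ("sda1", 1061, "raid"),
        ("sda2", 1085, "raid"),
        ("sda3", 1000, "ext2"),
        ("sda4", 1300, "ext2"),
    ],
    "/dev/sdb": [("sdb1", 1063, "raid"), ("sdb2", 1082, "raid")],
}


def allocated_per_drive(layout):
    """Sum the partition sizes on each drive."""
    return {
        drive: sum(size for _name, size, _kind in parts)
        for drive, parts in layout.items()
    }


def raid_total_mb(layout):
    """Total size of every partition typed linux raid, across all drives."""
    return sum(
        size
        for parts in layout.values()
        for _name, size, kind in parts
        if kind == "raid"
    )
```

Comparing each total from allocated_per_drive(LAYOUT) with the nominal size of the drive is a quick sanity check on your plan; bear in mind that the drive sizes quoted above are rounded.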

Configuring the machine for RAID requires that the Linux kernel have RAID support enabled. While this can be done in the Debian installation, a special kernel will have to be compiled to support mosix, so RAID will have to be enabled in that kernel as well. This will be discussed in the final section of this "chapter", kernel creation.



Ralphzilla-faxa developed a parallel configuration soon after its initial configuration, when one of the 850 Mb. drives initially in the system started corrupting the resident file system. As I configured the replacement, it occurred to me that the initial configuration may not have allocated a sufficient amount of space for storage of faxes as they are being processed. (The intended working environment of this system will include support for applications that generate automatic faxes of lab results. While unusual, it is by no means beyond the realm of possibility that two applications, each generating 100-200 faxes, might be running at once. The load this places on the fax software will be examined at a later point, but it seemed clear to me that the holding area for outgoing faxes might well be inadequate at 300-400 Mb.) I looked around and realized that I had several 412 Mb. drives pulled from old machines, and likely usable for little else. I placed one on each of the two IDE interfaces as the slave drive, partitioned them as linux raid, and created a RAID 0 stripe set, which is mounted via the fstab on the mount point fax_cache.
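
For those who have not worked with stripe sets, the idea behind RAID 0 with two equal-sized members can be sketched in a few lines of Python. This illustrates the general round-robin technique only, not the kernel's actual code, and the function names are hypothetical. Data is cut into fixed-size chunks, and consecutive chunks alternate between the member drives, which is where the performance gain comes from.

```python
def locate_chunk(chunk_index, member_count):
    """Map a logical chunk number to (member index, chunk offset on that member).

    Assumes equal-sized members, so chunks are dealt out round-robin.
    """
    return chunk_index % member_count, chunk_index // member_count


def stripe_capacity_mb(member_sizes_mb):
    """Usable size of a RAID 0 set built from equal-sized members, in Mb."""
    return sum(member_sizes_mb)


# The two 412 Mb. drives used for the fax holding area.
FAX_CACHE_MEMBERS_MB = [412, 412]
```

With two members, chunks 0, 2, 4 and so on land on the first drive and chunks 1, 3, 5 on the second. stripe_capacity_mb(FAX_CACHE_MEMBERS_MB) gives 824 Mb, roughly double the 300-400 Mb. that worried me. Remember that RAID 0 offers no redundancy: lose either drive and the whole set is gone, which is tolerable for a holding area but not for anything you need to keep.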

