
MOSIX System Communication Issues


Preface
When I originally configured the cluster I used the secure shell (ssh) tools, as described below. As that narrative indicates, installing the sshd daemon and ssh clients is detailed, but not overwhelming. At some point within the past few weeks, however, the tools enabling the cluster to use ssh stopped working. My current suspicion points toward problems with the pluggable authentication modules (PAM) and their interaction with ssh. While I intend to diagnose the origin of those problems, I am not inclined to hold up the work I am doing elsewhere on the cluster to do so. Hey, it's my choice. Since this has happened to me, however, I have to recognize that it could happen to you as well, so I'm first going to add a section on configuring around rsh, followed by the bulk of the original section on ssh. If there seem to be some temporal discontinuities as you read this, recognize that it has been written at two, and ultimately three, separate periods. I will ultimately add the troubleshooting trail for the ssh difficulties to the end of that section.


If Mosix is to function appropriately, a trust relationship must be established between the machines in the cluster. Specifically, the root account on each machine must be able to access the memory space of the other machines, which means the root account on each machine must have root rights on every other machine in the cluster. What is described here is the most basic way to establish that relationship. It should not be considered a secure environment for sensitive data. Securing the environment will be discussed in a later section; our purpose at this point is to get the cluster functioning, and then we will discuss securing it. Again, do not operate your cluster with sensitive or confidential data using the configuration described here. Address the issues raised in the section on securing the environment before you do so, and even then recognize that that section is based on information available in March of 2002. Securing your system is your responsibility, not mine. And may your lawyers be eaten by swarms of silicon bees.


There. Now that that's out of the way, we can talk about real stuff.



RSH
The one advantage rsh does have relative to ssh is ease of configuration. There's really not much to it. Since the time of my original cluster configuration, the rsh client and server packages have again become part of the woody distribution, so they can be installed by issuing "apt-get install rsh-client rsh-server". After the packages have been installed, the required functionality is enabled by creating a file named .rhosts in the /root directory that lists the other machines in the cluster, one per line, each name followed by a space and the word "root" (without the quotes). A sample .rhosts file is included here. In effect, this tells the machine that commands submitted remotely from the root account on the listed machines should be executed as the root user, because the .rhosts file is in the root user's home directory. If the file were in the home directory of the user joe, the root accounts on the other machines would connect as joe. Conversely, if joe were the user named in /root/.rhosts, the user joe on each of the other machines would be able to remotely execute commands on the subject machine as the root user.
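If you would rather not type the .rhosts file by hand on every node, a short shell function can write it. This is a minimal sketch, not a script from the original setup: the helper name make_rhosts is hypothetical, and the host names in the usage note are simply the two machines named elsewhere in this section. It also restricts the file to mode 600, on the assumption (true of the traditional rshd) that a .rhosts writable by other users may be ignored.

```shell
#!/bin/sh
# make_rhosts FILE HOST...  (hypothetical helper)
# Writes one "HOST root" line per host into FILE and restricts FILE to mode 600.
make_rhosts() (
    file=$1
    shift
    umask 077                      # a newly created file is owner-only
    : > "$file" || exit 1          # start from an empty file
    for host in "$@"; do
        printf '%s root\n' "$host" >> "$file" || exit 1
    done
    chmod 600 "$file"              # in case FILE already existed with looser modes
)
```

After sourcing the script, something like "make_rhosts /root/.rhosts ralphzilla ralphzilla-b-free", run as root on each machine with your own node names substituted, would produce the file.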



SSH
You will recognize from what follows that at the time of initial setup I pursued this far enough to achieve cluster operation within the constraints of the packages installed by Debian. I would suggest that you do the same. If something breaks as we start tightening down on security we will be in a much more manageable context than if we were to go at it the other way around.


In any event, the initial configuration under which the machines in Ralphzilla operate uses ssh rather than rsh, because I could not find a package containing rsh in the testing distribution at the time I did the initial installation. Unfortunately, when I attempted to fully integrate ralphzilla-b-free into the system after reformatting and reinstalling it to test an optimal installation, I discovered that the ssh client and daemon packages had been removed from the main Debian distribution. Because I had already propagated the updated Packages files through the local installation, I had also lost the ability to install the package from the local archive. There are packages installed within the cluster for which that would not have represented a problem; ssh is not one of them. The prospect of potentially having differing versions of ssh and sshd, dependent on different configuration files, was not one that I relished. Therefore, I decided that it was appropriate to download and build copies of the requisite files to distribute to the cluster machines. The following section details this process.


Building SSH

It is not a big deal to build ssh, it just takes a little time. Once it has been done, the generated executables and configuration files can be easily installed on the other cluster machines. In the process of building ssh you will gain a little knowledge regarding how ssh works, which is A Good Thing.


You will have to procure two source code packages to compile ssh. (The following describes the build process for the Openssh implementation of ssh. If you desire to build another version you are of course free to do so, but the following discussion is likely to be somewhat different from what is required to build your implementation. I would also suggest that you stick with the same implementation throughout your cluster, for obvious reasons.)


Two source code packages are required to build the Openssh implementation: the ssh code itself and a secure socket library. For the library I used the implementation available from the Openssl project at www.openssl.org. At the time of this writing the full Openssl distribution included a set of libraries and an engine used to implement interaction with other SSL implementations. You do not need that engine for use in the cluster, although I can conceive of circumstances in which it would be handy to have on the cluster controller. The version installed here is 0.9.6c; if you have a different version there may be some slight differences from the process described here. In any event, download the gzipped tarball of source code and extract it. If you want the installation to correspond to the default Debian file structure, issue the command "./config --prefix=/usr" in the top-level directory of the extracted source tree. If for some reason you want a different structure, read the INSTALL file in that directory. Once config has created the makefile, just type "make" and, when that finishes, "make install".
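The Openssl steps above can be collected into a small shell function. This is a sketch, assuming a gzipped tarball named like openssl-0.9.6c.tar.gz that unpacks into a directory of the same base name; the function build_openssl and the DRY_RUN switch are illustrative additions, not part of the original procedure. With DRY_RUN=1 it only prints the commands, which is a cheap way to check them before running them for real.

```shell
#!/bin/sh
# Sketch: unpack and build Openssl with a Debian-style /usr prefix.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_openssl TARBALL  (hypothetical helper)
# Assumes openssl-0.9.6c.tar.gz unpacks into openssl-0.9.6c/.
# The body runs in a subshell, so the cd does not affect the caller.
build_openssl() (
    tarball=$1
    srcdir=$(basename "$tarball" .tar.gz)
    run tar xzf "$tarball" || exit 1
    run cd "$srcdir" || exit 1
    run ./config --prefix=/usr || exit 1
    run make || exit 1
    # needs root to write under /usr
    run make install
)
```

After sourcing the script, "DRY_RUN=1 build_openssl openssl-0.9.6c.tar.gz" previews the sequence; the same call without DRY_RUN=1, run as root, performs it.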


The relevant copy of the Openssh source is available from this page. Again, download and extract the gzipped tarball. To configure the makefile to build an installation conforming to Debian norms, issue the following command from the top of the resulting source code tree: "./configure --prefix=/usr --sysconfdir=/etc/ssh". As before, the INSTALL file will help if you wish to install to an alternative structure. "make" and "make install" will, of course, build and install the package.
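The same pattern works for the Openssh build. This sketch assumes the tarball unpacks into a directory named after itself; the file name openssh-3.1p1.tar.gz used below is only an example, and build_openssh and DRY_RUN are hypothetical additions. The important part is the configure line, which puts the binaries under /usr and the configuration files in /etc/ssh.

```shell
#!/bin/sh
# Sketch: unpack, configure and build Openssh with Debian-style paths.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_openssh TARBALL  (hypothetical helper)
# Assumes the tarball unpacks into a directory named after itself.
build_openssh() (
    tarball=$1
    srcdir=$(basename "$tarball" .tar.gz)
    run tar xzf "$tarball" || exit 1
    run cd "$srcdir" || exit 1
    run ./configure --prefix=/usr --sysconfdir=/etc/ssh || exit 1
    run make || exit 1
    # needs root; installs binaries under /usr and config files in /etc/ssh
    run make install
)
```

Build Openssl first, since the Openssh configure step looks for the SSL library.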



Copying Required Files to Another Machine

WARNING: It is entirely possible that the process I am about to describe will not install the full ssh functionality. For that reason I would suggest building the package on your cluster controller, because if you do need full functionality it will be on that machine. The process described here will provide you with a working client program and a working server daemon, which are what you need to install to allow process migration within the cluster.


Before I start, I'll say that at some point I'll probably script all this, but you probably wanted to know what was going on anyway, right? First, I would suggest making a directory tree to hold the files that will be copied to other machines. For example, I created a directory named "ssh" in my home directory, and under it created subdirectories named "bin", "sbin", and "etc". Logically enough, I copied the files matching ssh* from the /usr/bin directory to the bin directory, sshd from /usr/sbin to the sbin directory, and /etc/ssh/* to the etc directory. You will also want the ssh script from the /etc/init.d directory. I then used ftp to copy the files into a similar structure on the target machine. (Given that I routinely configure samba on cluster machines, I could have mounted shares and copied the directory structure in its entirety. If I had been moving the files to a large number of machines I probably would have done that.) Once the files are on the target machine, it is a straightforward matter to copy them to the appropriate locations. After you have placed the files, make sure that the ssh* files in /usr/bin, the sshd daemon in /usr/sbin, and the ssh script in /etc/init.d are world readable and executable (e.g., "chmod a+rx /usr/bin/ssh*").
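Since the copying above is a candidate for scripting anyway, here is one way it might look. It is a sketch built around two hypothetical helpers, stage_ssh_files and install_ssh_files; the extra init.d subdirectory in the staging tree is an assumption of this sketch for holding the /etc/init.d/ssh script, and moving the staging tree between machines (ftp, in my case) is left to you. Both functions take a root prefix argument, normally an empty string for the real filesystem, so you can rehearse against a scratch directory first.

```shell
#!/bin/sh
# Sketch of the manual copy described above, as two hypothetical helpers.

# stage_ssh_files SRCROOT STAGE  (run on the machine where ssh was built)
# Note: the staged etc directory includes the build machine's host keys.
stage_ssh_files() (
    src=$1 stage=$2
    mkdir -p "$stage/bin" "$stage/sbin" "$stage/etc" "$stage/init.d" || exit 1
    cp -p "$src"/usr/bin/ssh* "$stage/bin/" &&
    cp -p "$src/usr/sbin/sshd" "$stage/sbin/" &&
    cp -pR "$src"/etc/ssh/* "$stage/etc/" &&
    cp -p "$src/etc/init.d/ssh" "$stage/init.d/"
)

# install_ssh_files STAGE DESTROOT  (run as root on the target machine)
install_ssh_files() (
    stage=$1 dest=$2
    mkdir -p "$dest/usr/bin" "$dest/usr/sbin" "$dest/etc/ssh" "$dest/etc/init.d" || exit 1
    cp -p "$stage"/bin/* "$dest/usr/bin/" &&
    cp -p "$stage/sbin/sshd" "$dest/usr/sbin/" &&
    cp -pR "$stage"/etc/* "$dest/etc/ssh/" &&
    cp -p "$stage/init.d/ssh" "$dest/etc/init.d/" || exit 1
    # world readable and executable, as the text recommends
    chmod a+rx "$dest"/usr/bin/ssh* "$dest/usr/sbin/sshd" "$dest/etc/init.d/ssh"
)
```

On the build machine, "stage_ssh_files "" ~/ssh" fills the staging tree; after copying ~/ssh across, "install_ssh_files ~/ssh """ run as root on the target puts everything in place. The host keys that come along for the ride still have to be replaced.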


The host keys generated on the source machine belong to that machine and should not be reused on the machine to which you copied the files, so you will need to generate new keys for the target machine. You can generate the full set by issuing the following command in the three variants shown:

ssh-keygen -t rsa1 -N "" -f /etc/ssh/ssh_host_key
ssh-keygen -t rsa -N "" -f /etc/ssh/ssh_host_rsa_key
ssh-keygen -t dsa -N "" -f /etc/ssh/ssh_host_dsa_key
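One wrinkle with the commands above: ssh-keygen stops to ask before overwriting an existing key file, and the keys copied from the source machine will already be sitting in /etc/ssh. The following sketch (the helper name regen_host_keys is hypothetical) removes each old key pair before generating its replacement, using the same file names as the three commands. Be aware that newer Openssh releases have dropped rsa1 and, more recently, dsa key support, so on a current system you may only be able to pass rsa.

```shell
#!/bin/sh
# regen_host_keys KEYDIR TYPE...  (hypothetical helper)
# Replaces the host keys copied from the build machine with fresh ones.
regen_host_keys() (
    keydir=$1
    shift
    for type in "$@"; do
        # file names match the three ssh-keygen commands in the text
        case $type in
            rsa1) name=ssh_host_key ;;
            rsa)  name=ssh_host_rsa_key ;;
            dsa)  name=ssh_host_dsa_key ;;
            *)    echo "unknown key type: $type" >&2; exit 1 ;;
        esac
        # ssh-keygen asks before overwriting an existing key file,
        # so remove the copied private and public halves first
        rm -f "$keydir/$name" "$keydir/$name.pub"
        ssh-keygen -q -t "$type" -N "" -f "$keydir/$name" || exit 1
    done
)
```

Run as root, "regen_host_keys /etc/ssh rsa1 rsa dsa" corresponds to the three commands shown.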

At this point you should be able to test the sshd daemon by simply typing "sshd". If the daemon does not load, the error message(s) displayed should give you a good indication of what the problem is. If the daemon does load, you can use the update-rc.d script to update the links used in the system initialization process to include loading the sshd daemon. (From the /etc/init.d directory, issue the command "update-rc.d ssh defaults".)
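The test-then-enable sequence can be sketched as follows. The helper enable_sshd and its DRY_RUN switch are illustrative, not part of the original setup. Instead of launching the daemon by hand, it runs "sshd -t", which in current Openssh releases checks the configuration file and host keys and then exits without starting the daemon; I am assuming your build supports that flag. It calls sshd by its full path, since newer sshd versions refuse to start when invoked without an absolute path, then creates the init links with update-rc.d and starts the daemon through the init script.

```shell
#!/bin/sh
# Sketch: check the sshd configuration, then register and start the daemon
# through the Debian init script. Set DRY_RUN=1 to print commands instead.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# enable_sshd  (hypothetical helper); SSHD overrides the daemon path
enable_sshd() (
    sshd=${SSHD:-/usr/sbin/sshd}
    # -t: test mode, checks the config file and host keys, then exits
    if ! run "$sshd" -t; then
        echo "sshd configuration test failed" >&2
        exit 1
    fi
    # create the /etc/rc?.d links for the ssh init script
    run update-rc.d ssh defaults || exit 1
    run /etc/init.d/ssh start
)
```

update-rc.d does not actually need to be run from /etc/init.d; it takes the script name and finds it there.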


To allow the root accounts of the machines in the cluster to exchange processes, ssh must be configured to allow such communication to take place without password authentication. Within the cluster, I've done this in the most basic way possible. Line 28 in the sshd_config file is changed to "RhostsAuthentication yes". This tells the sshd daemon to allow such access to the root account from a host, as long as that host is listed in the /root/.rhosts file. (Other accounts can be granted such access if the account name follows the host name in the .rhosts file. That is not really pertinent to the situation at hand; for further information see the man page for .rhosts.) Once the sshd_config file has been modified, the .rhosts file lists every machine in the cluster, and the sshd daemon on the machine in question has been restarted, the root accounts will have that access.
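Rather than editing line 28 by hand on every machine, the change can be made with a small function that finds a keyword in sshd_config, whether or not it is commented out, and rewrites it, appending it if it is absent. This is a sketch: set_sshd_option is a hypothetical helper, keyword matching is case-insensitive as it is in sshd, and the "Keyword=value" form is not handled. Two caveats from Openssh's own history: RhostsAuthentication applies to the Openssh releases of this period and was dropped in later ones, and IgnoreRhosts defaults to yes, which tells sshd not to read .rhosts files, so if the change alone seems to have no effect, that is the next keyword to check.

```shell
#!/bin/sh
# set_sshd_option FILE KEYWORD VALUE  (hypothetical helper)
# Rewrites the first line for KEYWORD (commented or not) as "KEYWORD VALUE",
# or appends that line if KEYWORD does not appear at all.
set_sshd_option() (
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || exit 1
    awk -v key="$key" -v value="$value" '
        BEGIN { done = 0; lkey = tolower(key) }
        {
            line = $0
            # look past leading blanks and an optional comment marker
            sub(/^[ \t]*[#]?[ \t]*/, "", line)
            split(line, f, /[ \t]+/)
            if (!done && tolower(f[1]) == lkey) {
                print key " " value
                done = 1
                next
            }
            print
        }
        END { if (!done) print key " " value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit $status
)
```

For example, "set_sshd_option /etc/ssh/sshd_config RhostsAuthentication yes" followed by "/etc/init.d/ssh restart". Writing back with cat keeps the original file's ownership and permissions.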


Obviously, this implies that the root account on any given machine can create havoc with any or all of the machines in the cluster. This vulnerability, however, is absolutely required for the cluster to work. The key to securing the cluster is restricting physical access to the hub/switch/whatever that provides the communication medium, constraining the traffic that can pass between the network interfaces in the cluster controller (the firewall), and protecting the password of the root account on the cluster controller, in this case Ralphzilla. The cluster controller is, after all, a member of the cluster. We will discuss security in a later section.



