Apache

If we're going to install and run an application as a browser-based application, then we're going to need web server software to respond to our requests. In some senses, the web server acts as a coordinator of the requests your browser sends, handling each one according to the configuration you define. When you are returning static html documents, like the page you are reading now, it is a fairly straightforward matter of sending back the documents requested by the page currently loaded in the browser. In other cases, it can be a great deal more complicated than that.


We are going to use the Apache web server package to fill this role. Apache is the most widely used web server in the world, running on a wide variety of platforms. I'm going to leave unaddressed the issue of whether it is the best web server, but it is very definitely a capable one. I think that its popularity is primarily attributable to the fact that it is free and that it can run on relatively minimal hardware. I have seen it written that a pentium 75 linux box running Apache and serving static web pages is capable of saturating a 10 Mbps ethernet connection, and I see no reason to doubt that. Our baseball application, of course, is not composed of static web pages, but neither do we require it to sustain the kind of throughput that amazon.com does.


There are two basic models for executing perl scripts under apache: basic CGI, and CGI scripts written to execute under mod_perl. The difference is that a basic CGI script written in perl is executed by invoking a fresh copy of the perl interpreter each time the script is started, while mod_perl uses a copy of the perl interpreter compiled into the apache server itself. (If you do much investigation into the Apache world you'll discover that the names of extensions to the basic apache executable are prefixed with "mod".) While you can use a variant of standard cgi execution that registers the scripts with Apache to achieve speed gains, executing scripts under the version of Apache with the embedded interpreter is much faster than invoking the interpreter at each script execution. Sites that use perl to do substantial back-end processing generally use mod_perl. We will do things both ways, so you understand the differences that come into play when you shift environments.


There are a number of perl modules that allow you to generalize the site development and maintenance process, ranging from those that allow the incorporation of templates into your scripts, simplifying html generation on the fly, to modules that implement entire frameworks under which your application is run. If you are confused by this, don't worry. We're going to step through the entire process, developing bits and pieces here and there and rolling them into the next level. In some of these circumstances the implementation of a given tool may represent my own first use of that tool. If you are following along, working to pick this stuff up, my suggestion would be that you work through any given piece of this application, and then attempt to extend that example into a context of your own choice. I generally find that it is a good idea to pick a project that is similar to the one under discussion, but with enough significant differences to force yourself to think through some things on your own. Bowling or basketball leagues? Not bad choices.


Initial Apache Installation

If you've never done this before, I can guarantee that you will be amazed at how easy it is to get apache configured and running. I can almost hear you saying "Whoa ... is that all there is to this?" Don't worry, it gets harder.


First, of course, you have to install the apache package itself. This is one installation in which I would definitely suggest that you stick with the debian packages. We're going to be doing a bit of switching around once we get things up and rolling and we need the advantages offered by consistency in setup. The importance of that is accentuated by the fact that the apache packages maintained by different groups, whether pre-compiled binaries or gzipped tarballs of source code, often have different scripts packaged with the distribution. This is partially attributable to differences between linux distributions and partially attributable to the very popularity of apache, which I suspect makes everyone want to add wrinkles that they think make sense. Regardless, if it's okay with you we'll just stick to the debian versions. (Actually, even if it is not okay with you, I'm still going to do it anyway. So there.)


Therefore, the first thing you should do is issue the command "apt-get install apache". Once the package has installed you should familiarize yourself with some settings in the httpd.conf file, located in the directory /etc/apache. This file largely configures the operation of any given instance of apache. At this point, there are really only two lines that you need to worry about. The first is the directory in which apache looks for html documents, known as DocumentRoot. On bridget, this setting is configured on line 314. By default, this is the directory "/var/www". Therefore, if I were installing the ralphzilla site (the pages you are reading) on bridget and wanted it to be accessible as http://bridget/mosix/index.html I'd create a directory named mosix under /var/www (/var/www/mosix) and copy the site structure under that directory. The second line that you need to verify is the one that specifies script locations. In the file on bridget this is on line 563. This line defines the value for ScriptAlias, which means that apache will treat files located in this directory as files to be executed. This value should be "/usr/lib/cgi-bin". If it's not, I'm surprised, but don't change it now. If you were to change it there are some other things we would have to change, and we don't need to mess with that right now. Just note what the directory is ... we will need to know that after a bit.
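
The line numbers on your machine will probably differ from the ones on bridget, so rather than scrolling through the file you may prefer to let grep find the two settings. What follows is only a sketch: the function name show_apache_dirs is made up for this illustration, and the default path assumes the debian layout described above.

```shell
#!/bin/sh
# Hypothetical helper (the name is invented for this example): print the
# DocumentRoot and ScriptAlias lines, with their line numbers, from an
# apache configuration file. The default path is the one given in the text.
show_apache_dirs() {
    conf="${1:-/etc/apache/httpd.conf}"
    # -n prefixes each match with its line number; -E enables extended
    # regular expressions. Leading whitespace is allowed, and lines that
    # are commented out with '#' do not match.
    grep -n -E '^[[:space:]]*(DocumentRoot|ScriptAlias)[[:space:]]' "$conf"
}
```

Once the function is defined in your shell, running "show_apache_dirs" with no argument reads /etc/apache/httpd.conf, and passing a different path as the first argument reads that file instead.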


Now that you have a sense of how apache is set up to run, you can just type "apachectl start" to fire it up. That should start several instances of the server with the settings defined in the httpd.conf file. The apachectl script should be in the /usr/bin directory ... if it isn't found, check to make sure that it is in fact there and check the contents of the PATH environment variable. Assuming that the script did run, list the running processes ("ps ax"). You should see several (by default, five) instances of "/usr/bin/apache" running. If you do not see them, look in the apache error log (/var/log/apache/error.log). Hopefully the log will give you sufficient diagnostic information to track down the problem. I rather doubt, however, that you'll have any problems.
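
If you would rather count the server instances than pick them out of the process list by eye, a small filter along these lines may do. It is a sketch, not part of the apache package: count_instances is a name I made up, and the path in the usage comment is simply the one quoted above, so adjust it if your listing shows something different.

```shell
#!/bin/sh
# Hypothetical helper (the name is invented for this example): read a
# process listing on standard input and print how many lines mention the
# given program path.
count_instances() {
    # The first grep keeps lines containing the path as a fixed string (-F).
    # The second drops any line mentioning grep, so the filter does not
    # count itself, and -c prints the number of lines that remain.
    grep -F -- "$1" | grep -v -c grep
}

# Typical use, left as a comment so the definition is harmless anywhere:
#   ps ax | count_instances /usr/bin/apache
```

A result of 0 is your cue to go read /var/log/apache/error.log.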


You should now have apache up and running. For those of you who have never set one up, we'll create your first web page. If you don't have one, download a freeware html editor (see the appropriate section in links) and install it. Create a new html document (depending on the editor you are using you may be asked to set a few parameters for the page ... don't worry about those settings now) and an html doc that would produce a blank page will be created. The result should look something like the image at right. (Different editors will put in different sets of lines at the very top of the page. Don't worry about that right now, we'll discuss some of those later and you can readily find out what the others represent by finding a good html reference on the web.) Look down through the generated material until you run into the <body> line, which will be followed, generally after a few blank lines, by a </body> line. In the example here you've probably noticed that there are several settings inside the brackets of the <body> tag, and being the quick individual that I know you to be (you must be very bright, after all, because you've perceived the wisdom in continuing to march through line after line of my document) you've quickly recognized that these set the default values for the appearance of the text on the page. If there are no values at all after body in the brackets, the default settings for the local browser will be used when the page is displayed. Between the two body lines type "Hello, world", with or without the quotes. Save the file as index.html in the /var/www directory. (There is a default index.html file located in the /var/www directory that is an introductory page to apache on debian. If you want to keep that file rename it to something else, e.g., old_index.html. For your page to be the default page for the server it has to be named index.html.) Make sure it is world-readable. You can restrict access to html docs to certain IDs if you wish, and we'll get into that later, but that is not what we're up to right now. Open your browser, and just type the name of the cluster controller into the address window (e.g., "http://ralphzilla"), and hit enter. Presto! You're a rocket scientist!
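
If you would rather skip the editor entirely, the same page can be written from the shell. This is a bare-bones sketch: the function name write_hello_page is hypothetical, the page contains only the minimum tags with no settings in <body>, and the function overwrites any index.html already in the target directory, so rename the stock debian page first if you want to keep it.

```shell
#!/bin/sh
# Hypothetical helper (the name is invented for this example): write a
# minimal "Hello, world" page as index.html in the given directory and
# make it world-readable. The default directory is the DocumentRoot
# named in the text.
write_hello_page() {
    dir="${1:-/var/www}"
    # The quoted 'EOF' keeps the shell from expanding anything in the page.
    cat > "$dir/index.html" <<'EOF'
<html>
<head>
<title>Hello</title>
</head>
<body>
Hello, world
</body>
</html>
EOF
    # 644 = owner may read and write, everyone else may only read.
    chmod 644 "$dir/index.html"
}
```

You will most likely need to be root to write into /var/www. After running "write_hello_page", load the page in your browser exactly as described above.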