Document Home

Postgresql

If you've not built a package from source before, postgresql would be a good place to start. The build process is both configurable and relatively straightforward, so if something goes wrong you are likely to be able to figure out the problem. I'll step through the process, which should keep you from wandering off. Should you not wish to build, or if you've made an abortive run at compiling and just want to get on with things, you have two options. First, you can install the debian packages, which are in the non-us distribution. As with ssh, the only rationale I can see for this is in this time of fear of terrorist assault and heavy-handed lawyering the debian people are playing a strategy that removes any possibility that they will be held culpable because a terrorist used a database that included strong encryption and installed it in binary form from the debian site. Regardless of the merits of a regulatory stance that has been rendered moot by the widespread proliferation of those tools, I can understand why the people behind debian would not wish to expose themselves to the gratuitous posturing of lawyers. So they only put together a version with weak-encryption and put that in the non-us distribution, leaving those of us in the U.S. to build our own or install the non-us version. Not an onerous problem, but clearly if you or I can build the version with strong encryption, so can the bad guys. Circling the wagons doesn't keep one safe for long at the best of times, and in a context in which a quarter of the circle is missing it is clearly not a winning strategy.


Regardless, if you have configured the non-us distribution in the /etc/apt/sources.list file you should be able to just issue the command "apt-get install postgresql" and the apt-tool will fetch the packages. If you are in the U.S. and haven't configured the non-us distribution as a source, I wouldn't do that just to install postgresql. Rather I would suggest that you either download the debs from the debian website into a single directory on the target machine and install them from that directory using "dpkg -i *.deb" or download the latest binary rpms from the of the postgresql ftp sites into a single directory and do a mass conversion of them to debs with the alien tool ("alien --to-deb *.rpm") and then install the resultant debs as above. I can attest that the conversion will work on a woody box, but I haven't tried to install so there is some chance that there would be a problem during installation, but the set of files that are in the rpm set seems complete and they converted without problem, so I think the odds are strong that this would result in a viable installation. Although the rpm method involves an extra step, it has the advantage on involving the latest release of postgresql, while the testing non-us distribution, perhaps a release or two behind the latest, has the advantage of being tested for at least long enough that the environment as a whole is not considered unstable. At this point I would say that it likely doesn't make much difference one way or another. If you were launching into a production context you might want to make the conservative choice, but I doubt that is the case here. Do whichever you wish. Just make sure that you install the libraries that allow you to use perl to write procedures (libpgperl for debian and postgresql-perl for the converted rpms). We may want to make use of them later.



Enabling the Perl Interface

Enabling the perl interface in the database server itself also places a little more onto our plate when it comes to manually building postgresql, but what the hey, if you were the kind of person who took the easy road you wouldn't be here anyway, right? You'd probably be looking at becoming a lawyer. The material that follows builds on previous versions of the material that is now part of the Postgresql Interactive Documentation, available here. PDF versions of this material are available here. You may notice in those that "make" is spelled "gmake" ... don't worry about that. Postgresql is used on many platforms, some of which use different compilers. That is the way the authors assure that readers understand that the GNU tools are to be used. Since we're building on linux, we are using the GNU tools by default. (Unless, that is, you went out and did something weird without telling me.)


Anyway, the first thing that must be done is to compile a version of perl with support for shared libraries. Perl is, of course, the interpreted language that is commonly used to install and configure packages within linux. Perl's applicability is far more broad than that, but it is in that context that we are primarily concerned with it now. (Actually, there is a quasi-documented workaround that can be used to circumvent the requirement for shared perl libraries, but since we should be dealing with a freshly-configured machine, there is really no reason not to do this by the book.) While compiling a new interpreter for perl should be relatively straightforward, its import for the health of your system is non-trivial, and you should pay close attention to what is going on. That's not to say that you should freak out. The worst that could happen is that you'd have to re-install the system, and I can almost guarantee that you'll be doing that on a machine sooner or later. One of the tradeoffs of dealing with surplus hardware is that at some point, the hardware in some machine is going to start acting flaky, and you'll be reconfiguring or replacing it. That's why we have several of them.


The first step is to grab a copy of the latest production perl, available here. At the time of this writing that is perl 5.6.1. Download the gzipped source and extract and untar it to a directory on your machine. If you have configured a raid drive pack on the machine, that would be a good location for the directory, because disk io is generally an important factor in the speed of any compilation process. In fact, if you have a little time, you might want to compile once from a directory located under a mount point associated with a single drive and once from a directory under a mount point to which a raid drive pack is mounted. Time each of the runs and compare the results. In fact, if your configuration is similar to Ralphzilla-raider's, with both tow-drive and three-drive stripe sets, you might want to compile three times and compare the results of all three. Since the new version of perl will not actually be installed until you type "make install", you can simply delete any of those directories after you are done experimenting, with the exception of course of the directory from which you are installing. Compiling is also very dependent upon the amount of ram in the machine and the speed of the processor, and different packages use different strategies in compiling, so YMMV (your mileage may vary), but this is nonetheless an interesting experiment, and doing little things like this is how you learn. If the idea intrigues you, go for it! Think of it as kind of like demolition derby.


So now you probably want to know how to do it. First, enter the root directory of the source tree. If you've unzipped and extracted the tarball in the normal fashion, it will be named perl-5.6.1. Delete the files named config.sh and Policy.sh ("rm -f config.sh" and "rm -f Policy.sh") and type "./Configure". Remember that linux finds file names in a case-sensitive fashion. Therefore, you must type "./Configure" ... type the word in any other fashion and the file will not be found. Only two of the settings set by the script have to be changed from the default value. The first change you'll want to make is to change the default prefix to "/usr" from "/usr/local", and the second will be when you inform the script that you do indeed wish to build a shared libperl.so. When the script asks you a question like this, you need to change the default values, in this case "N" to "Y". Each of these are set well into the script, so pay attention so you do not miss them or change the wrong one. (On the other hand, don't worry too much about it if you miss one or the other. You can either keep hitting enter until you finish the script, not a bad idea if this is your first time through, or press ctrl-c to abort the script. Delete Policy.sh and config.sh again if they have been created, and start all over again.) Once you've made those changes, you can just keep hitting enter until the script finishes.


At this point you should type "make", and the actual compilation process will begin. As with other packages we've built this is entirely automatic from this point and likely to take a while, so if you are doing this in a telnet session you can minimize the window and check your email or something like that. Once control of the session returns to you, type "make install" to have libraries and executables copied into the appropriate directories.


As I've mentioned, perl is used by some debian system configuration utilities to install and configure packages. From time to time after having installed a new perl I've had trouble with the package setup process not working, and those errors can be very difficult to track through. Before you go any further, you should double-check the integrity of that setup by trying to install a package. The package does not have to be large, virtually anything will do. One package that would make a good candidate is libdbd-pg-perl, in effect the perl database interface for postgresql. Installing this should also install the perl dbi package, which is what enables the use of dbd drivers, so it is a good test of the integration of your system configuration utilities and perl. Try installing the package with "apt-get install libdbd-pg-perl". If you get an error it will generally be "can't find debconf.pm in @INC... " @INC is the search path perl uses when looking for the perl modules that extend its functionality. To an extent, you can get by with this error, but it is fairly easy to fix. While there are a number of more elegant ways to do this, all you really need to do is move the Debconf directory from whatever location it is in to a directory in the @INC path, and those directories are mentioned in the error message. When I had it happen as I was writing this, for example, I just moved that directory from /usr/share/perl5 to /usr/lib/perl5/5.6.1, i.e., "mv /usr/share/perl5/Debconf /usr/lib/perl5/5.6.1". That should fix the problem. Indeed, while you're at it, you might just as well move all of the directories to the new location. You'd likely be doing it soon enough anyway.


I've seen the dbi interface and the postgresql dbd driver described as an alternative way to include perl as a language for providing the functionality for internal database triggers and procedures. Sound confusing? Generally, the dbi interface and the dbd driver are used by perl scripts as a way to communicate with the database server. The odds are good that when you access a web site that displays dynamic content, such as amazon or ebay, there is a perl script being invoked by the web server that is used to retrieve records from a database server through the use of such an interface. I have perl scripts operating in my environment that do things like parsing the text log written by the squid server and inserting the records into a database server, and another that reads records from that server, converts the data in the record into a form better for analysis, and inserts them into a second server on another machine. When used internal to the database server, such language interfaces implement generally rather small snippets of code to do things like enforce rules concerning the structure of the database (e.g., if a record exists in a table of charges there must be a record for the customer with whom the charges are associated in the customer table) and perform housekeeping operations (e.g., at 3 AM each morning copy records more than a month old into a records database and delete those records from the main database). I've not worked through how the dbi-dbd interface provides the internal functionality, at this point we are going to keep them conceptually seperate and may revisit the issue as a sidebar at a later date.



Building Postgresql

Once perl has been built with shared library support, you can configure, build, and install postgresql. Download the latest gzipped package from one of the ftp sites here, and extract it to a directory where you have space. (While you can select parts of the full postgresql distribution to download, I don't think that makes a lot of sense. Postgresql 7.2.1 is approximately 9.3 Mb. in size as a gzipped tarball. Even over a relatively slow modem, that might take 30-45 minutes to download. The hassle of recognizing that you need to get something else, and going out there and getting it, would take far more time that you'll be able to save by cutting 750 Kb from the total download.) As was the case when you built perl, if you have a RAID drive pack mounted you should locate that directory under the mount point for the pack. As with perl, the configure script for postgresql is in the root directory of the postgresql source tree ... for postgresql v. 7.2.1 that is postgresql-7.2.1. Unlike the perl configure script, the postgresql script runs largely automatically, making sure that requisite libraries are present. Whatever customization is done is accomplished by calling the configure script with switches, that set things like the prefix of the destination tree and the default data directory. My personal preference is not to worry about those because it is easy enough to set them as you start the daemon, either via a script or manually. The one switch you should set is the one that enables the perl interface in the database server, the resultant command would be "./configure --with-perl".


There is some chance that at some point the configure script will bomb out because a library required to build is not available. These errors can be a little difficult to fix, but generally only because the library mentioned in the error message may not be identified in a corresponding fashion as part of any debian package. There is really no way around this. If you get such a message, with postgresql or any other package you are trying to configure, I would suggest that you first search the descriptions of packages in the testing distribution here for the library name mentioned in the configure script error message. If you find something there, of course, your first response would be to install whatever package was listed by the search. This can be an effective strategy, if somewhat shotgun-like. While it could be a problem on space-constrained systems, if you are setting up a space-constrained database server you probably need to rethink your cluster configuration. Part of the appeal of this approach lies in the fact that it represents something of a mirror image of the process that results in error messages. For those relatively new to linux, you will find that in a fashion very similar to the Windows world, an error message can be a manifestation of an error condition that exists at two or three removes from the point at which the error occurs, and knowing what actually produced the error at times can require reasonably deep knowledge of linux, perl, or C, and sometimes all three at once. (It's not really that bad, I just liked saying that.) To cut to the chase, you can use the power of debian's package management system as your ally. Since that system will fill in the requisite packages behind the one you requested, you can often fix a problem without ever having traced the specific root of the problem. As time goes on and your familiarity with linux grows there will likely be times when you recognize specific problems, but still use the package management system to fix it simply because of the time it saves.


At other times, however, you will be forced back on a strategy of doing web and newsgroup searches on file or library names at google and downloading gzipped tarballs of the source of what you find, installing them manually. At such times you can find yourself downloading and building a package or library, and when you re-run configure it bombs out just a few lines further along than you were before, forcing you to go through another iteration of the entire process. This can happen several times as you install a single package, although that would be a fairly extreme instance. There will be times, however, when the process seems just that frustrating. That's just part of the deal, think of it as a learning experience.


Once the configure script has executed, finishing the build process is as simple as typing "make", and "make install". As with other build processes, this can take some time. One additional thing must be done while we are in the build process. While we have enabled support for the perl interface, we have to actually build the modules that provide that capability. To accomplish this, descend in the postgresql source tree to the directory postgresql-7.2.1/src/pl/perl. At this point type "make clean, make all, make install" either all at once or as a series of three commands.



Initializing Postgresql, and Starting the Server

If you didn't change the default location of anything, the default location of the postgresql files is in the directory /usr/local/pgsql and the default location of the data directory is /var/log/postgres/data. If you have created a RAID 0 stripe set to hold the data or have otherwise planned to put a set of databases in a custom location, you must first create the data directory to hold them. Following the model of Ralphzilla-raider, the following example will create a directory under the mount point /data. In that directory I created the postgres directory to hold the files created by postgresql. (In the parlance of postgresql, a group of databases serviced by a single server is called a "database cluster". That is what we are creating at this point. I have been avoiding the use of that term, and will continue to do so, because I don't want anyone getting confused over terminology.) Once you have created the directory you should make the postgres user, which was created by the "make install" script, the owner of that directory. ("chown postgres postgres", assuming that you are root and in what corresponds to Ralphzilla-raider's /data directory). Then su to the postgres account ("su postgres") and create the data directory with initdb ("/usr/local/pgsql/bin/initdb -D /data/postgres/data"). At this point you are ready to start the database server. We're going to do this without worrying about access control right now because there is still a little housekeeping to be done. Start the database server with the command "/usr/local/pgsql/bin/postmaster -D /data/postgres/data".


We're now going to create the language interfaces, pl/pgsql - the default postgresql interface, as well as plperl, the perl interface. This can be done database by database, or as the default condition for every database stored in this location. I'm going to follow the latter course, as it strikes me this is the more likely scenario, to either enable the interface or not enable it at all. Postgresql uses a base database structure, called template1, from which all of the databases created at that location inherit certain characteristics. When a language interface is enabled location-wide, support for whatever languages are enabled is included in template1. To create the default sql subset interface, type "/usr/local/pgsql/bin/createlang plpgsql template1". Assuming that you have built the shared library version of perl and the perl language interface, as described above, the perl interface should install without difficulty with the command "/usr/local/pgsql/bin/createlang plperl template1". Unfortunately, that is not currently the case. In the current configuration of Ralphzilla-raider, I get an error associated with the GCC linker when I attempt the createlang command with the plperl interface. The system should be configured appropriately. After doing some research I've found a real dearth of material on the net regarding this problem, and it could be associated in some way with the unique characteristics of our installation, possibly the kernel. Rather than getting stuck here, I'm going to move forward with other aspects of the cluster, and revisit the issue when it actually becomes pertinent.


Access Control

One strength of the configuration of Ralphzilla is that it creates an environment with the potential for a high level of data security. My current intent is to create a fairly simple database application in which all user actions are supported through a browser interface and the database server is accessed through cgi scripts that are called by the browser. That's not something that will happen overnight, nor will it start tomorrow, although I am giving thought to what the best sample application would be. (Suggestions are welcome. Feel free to drop me a line here.) Detailed discussion of access control and data security will be an element of that discussion, but it is appropriate at this point to address the basic elements of how access to the database is controlled in postgresql. There are many permutations of the general schema that could allow you to generate unique access control frameworks. Ironically, the nature of Ralphzilla renders the task of configuring access control on Ralphzilla-raider very straightforward. The pga_hba.conf file, located in the data directory, controls the authentication process. For ip connections, access rules are established by a series of records that follow the following format:

host dbname ip_address address_mask userauth auth_argument


For the purpose of this discussion, we are going to concern ourselves only with allowing the database server to accept connections from the specified machines, as long as the connection is as a user who has an ID in the database. As I said above, the structure of the cluster makes this specification very easy. Since all the machines in the cluster are in the class c network 192.1.1.x, and there is no need to restrict the range within that network, the relevant line to add to the pga_hba.conf file is:

host all 192.1.1.0 255.255.255.0 trust


If you read through it you can see that it says that a trust relationship exists from all machines in the 192.1.1.0 network segment to all databases. If we had serious data in the database we would not leave it in this condition, even semi-hidden within the cluster as it is. What we have created to this point is a database server blank slate. As we write upon that slate we'll also work on controlling who gets to see the data.