Use of the following software is assumed:
WAIS provides advanced server-side search and retrieval capabilities, including support for binary datatypes and very fast searches of the entire contents of large textual databases.
Use Mosaic as a front-end client and WAIS as a back-end server and you can provide your users with a friendly yet powerful window into your information universe and sophisticated query, retrieval, and indexing capabilities.
Download and install the freeWAIS 0.202
(or later) distribution from the UNC SunSITE
FTP server. Installation instructions are in the file
INSTALLATION in the freeWAIS distribution.
You can place data files of any type in a WAIS database; possibilities include HTML documents, plaintext documents, GIF images, audio files, and so on. In the following example, we will assume there will at least be HTML documents in the WAIS database, and possibly other types of files as well.

Create a directory (e.g.
~/fluff) and put copies of all the files you wish to place in the database in that directory. Make sure they all have relevant extensions (e.g.
".html" for HTML documents,
".gif" for GIF images) to make life easy for you in the short term.
Create a directory (e.g.
~/localwais/sources) to hold the WAIS index file for your
database. This index file will be created automatically by the WAIS
indexing program, waisindex, and will be consulted by the
WAIS server program,
waisserver, when clients ask the
WAIS database for query information or specific documents.
Create and run a shell script (call it
doindex) that will index all of the files in
~/fluff and place the resulting index file in
~/localwais/sources. The following is such a shell script:

    #!/bin/csh

    # Go to the directory with the documents to be indexed.
    cd ~/fluff

    # Create index, initially with HTML documents.
    waisindex -export -d ~/localwais/sources/marc -T HTML *.html

    # Add plaintext documents to index.
    waisindex -a -d ~/localwais/sources/marc -T TEXT *.txt

    # Add PostScript documents to index -- index contents, why not?
    waisindex -a -d ~/localwais/sources/marc -T PS *.ps

    # The following types are all indexed without contents
    # (thus use of the -nocontents flag). So all you can
    # do is search on filenames...

    # Add GIF images to index.
    waisindex -a -d ~/localwais/sources/marc -T GIF -nocontents *.gif

    # Add RGB images to index.
    waisindex -a -d ~/localwais/sources/marc -T RGB -nocontents *.rgb

    # Add HDF data files to index.
    waisindex -a -d ~/localwais/sources/marc -T HDF -nocontents *.hdf

    # Add audio files to index.
    waisindex -a -d ~/localwais/sources/marc -T AU -nocontents *.au

The script calls waisindex, the program that looks at files you are adding to a database and adds information about them to the database's index file. The information built up in that index file is used to allow very fast searches to be made across the entire contents of the files in the database.

The first call uses the -export flag, which specifies that the database we're creating is to be made available over the network (the actual effect is to make sure that the database has a reasonable name).

Later calls use the -a flag to tell the indexer to add to an existing index rather than create a new one. (The first call to waisindex created a new index.)

The -d ~/localwais/sources/marc arguments to waisindex tell the indexer what the name of the index should be. Since a single WAIS server can serve multiple WAIS indexes (databases), all the indexes are commonly kept in a single directory (in this case, ~/localwais/sources) and each index is given a distinct name (in this case, marc).

Each call uses the -T flag to specify the type of the files being indexed at that time.
WAIS types have historically been ad hoc but straightforward --
TEXT for text files,
GIF for GIF
images, etc. Mosaic recognizes these ad hoc types using a method
that the author thinks is actually pretty damn slick -- a WAIS
type retrieved as the result of a query is matched to a MIME type
as though it were a file extension.
In other words, since a file with the corresponding extension
is normally considered plaintext (MIME type
text/plain) by Mosaic, a WAIS query result of WAIS type
TEXT is also considered text/plain.
Similarly, if Mosaic were configured to recognize file extension
".foo" as some particular MIME type,
a WAIS query result of WAIS type
FOO would also be
considered of that type.
(Note: At some point in the future, WAIS will start using MIME types directly. Mosaic supports this already: if a WAIS type corresponds to a MIME type that Mosaic understands, then Mosaic will recognize that and act appropriately.)
The -nocontents flag is used while indexing binary filetypes for which it would make no sense to actually index the contents. (E.g., indexing a GIF file's binary contents would do nothing useful.) Use of the -nocontents flag means that only the filename for each file being indexed is added to the index.

waisindex can be made recursive -- files in subdirectories will be indexed also -- via the -r flag (which we don't use in this example).
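
The WAIS-type matching described under the -T flag above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mosaic's actual code: the extension table, the function name mime_type_for_wais_type, the lowercasing of the WAIS type, and the application/x-foo entry (a stand-in for whatever you might configure for ".foo") are all made up for the example.

```python
# Hypothetical extension-to-MIME table, standing in for the one a
# browser such as Mosaic would be configured with.
EXTENSION_TO_MIME = {
    ".text": "text/plain",
    ".txt": "text/plain",
    ".html": "text/html",
    ".gif": "image/gif",
    ".ps": "application/postscript",
    ".au": "audio/basic",
    ".foo": "application/x-foo",  # made-up entry for the ".foo" case
}


def mime_type_for_wais_type(wais_type, table=EXTENSION_TO_MIME):
    """Treat a WAIS type as though it were a file extension."""
    lowered = wais_type.lower()  # assumption: matching ignores case
    # A WAIS type that already names a known MIME type is used as is.
    if lowered in table.values():
        return lowered
    # Otherwise look it up as an extension; None means "unrecognized".
    return table.get("." + lowered)


if __name__ == "__main__":
    for wais_type in ("TEXT", "GIF", "FOO", "HDF"):
        print(wais_type, "->", mime_type_for_wais_type(wais_type))
```

In this sketch a type with no matching extension (HDF here) comes back as None; what a real client does with an unrecognized type is not covered by the sketch.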
To start waisserver -- the WAIS server program -- and therefore make your new index available to Mosaic clients over the network, construct and run a shell script (call it doserve) that looks like this:

    #!/bin/csh

    # Go to the directory containing the WAIS sources.
    cd ~/localwais/sources

    # Start the WAIS server in standalone mode;
    # have it use port 2010.
    waisserver -p 2010 &

The URL for connecting to the server from Mosaic is:
    wais://machine:2010/marc

In this URL, machine is the name of the system on which you are running the WAIS server, 2010 is the port you chose to run the WAIS server on, and marc is the name you gave the WAIS database.

When you do a query on your new database, the resulting URL will look like this:
query is the search string you enter.
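
As a rough sketch of how the pieces of such a URL fit together, the following Python function assembles one from the machine name, port, and database name. The function name wais_url is hypothetical, and the handling of the search string is an assumption: it follows the general wais URL form (database name, then "?", then the search string) as I understand it, with the string percent-encoded.

```python
from urllib.parse import quote


def wais_url(machine, port, database, query=None):
    """Build a wais URL for a database, optionally with a search string."""
    url = "wais://{}:{}/{}".format(machine, port, database)
    if query is not None:
        # Assumption: the search string follows a "?", percent-encoded.
        url += "?" + quote(query)
    return url


if __name__ == "__main__":
    print(wais_url("machine", 2010, "marc"))
    print(wais_url("machine", 2010, "marc", "crufty"))
```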
A WAIS gateway, in this context, is a server that accepts a query from a Web client via HTTP, issues a query to a WAIS database on behalf of the client, post-processes the results of querying the WAIS database, and returns the information to the Web client (again via HTTP). The purpose of this is to provide access to WAIS databases by clients that do not speak the WAIS protocol natively.

With Mosaic 2.0 and some of the other more advanced Web clients coming along now, the rules are changing, since it is now possible for the same client to access both the normal range of Web servers (HTTP, Gopher, FTP, NNTP) and WAIS servers, without requiring a gateway at any stage of the information retrieval process.
But, many Web clients still don't have native WAIS support -- two good examples are NCSA Mosaic for the Mac version 1.0 and NCSA Mosaic for Microsoft Windows version 1.0. Those clients still must go through a WAIS gateway, as must any instance of Mosaic for X version 2.0 that isn't compiled with native WAIS support.
The big catch here is that, at the present time, the WAIS gateways available on the network don't do a good job of providing full access to WAIS databases. In particular, access to anything other than plain text files is likely not to work, and multiformat query responses (see below) will not work.
The solution is to write a better WAIS gateway, probably based on the native WAIS support in Mosaic 2.0. We'll probably do that at some point, but it isn't done yet (that I know of).

So what do you do if you want to provide WAIS databases to people using various Web clients, some of which don't support native WAIS?
Web clients without native WAIS access should be set up to
automatically use one of the public WAIS gateways (probably either
NCSA's or CERN's) to handle wais URLs.
Mosaic for X version 1.2 and earlier did not do this properly, for which we are ashamed, but Mosaic for X 2.0 will do this properly if it's not compiled with direct WAIS support.

What this means is that a wais URL that looks like the following:
...should be automatically converted to a URL that looks something like the following:
www.ncsa.uiuc.edu:8001 is the address of the public NCSA WAIS gateway; everything after the first single slash in this URL is exactly the same as in the original
wais URL. This should give the gateway all the information it needs to access the specified WAIS database and provide the non-native-WAIS client with the equivalent of direct access, with a minor performance hit.
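
The conversion just described can be sketched as follows. This is an illustration rather than any client's real code: the function name through_gateway is hypothetical, and the sketch assumes the converted URL uses the http scheme, names the gateway host and port, and then carries the original wais URL (minus its "wais://" prefix) after the first single slash.

```python
NCSA_GATEWAY = "www.ncsa.uiuc.edu:8001"  # public NCSA gateway named above


def through_gateway(wais_url, gateway=NCSA_GATEWAY):
    """Rewrite a wais URL as an HTTP URL that goes through a WAIS gateway."""
    prefix = "wais://"
    if not wais_url.startswith(prefix):
        raise ValueError("not a wais URL: " + wais_url)
    # Everything after the gateway's slash is the original URL minus "wais://".
    return "http://" + gateway + "/" + wais_url[len(prefix):]


if __name__ == "__main__":
    print(through_gateway("wais://machine:2010/marc"))
```

A client without native WAIS support would apply a rewrite along these lines before fetching, so the user can keep following ordinary wais URLs.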
So, that's a stopgap solution that will provide transparent access to at least text files in WAIS databases by a wide range of Web clients.

One final note: If you happen to be using a Web client that is lacking both native WAIS support and the ability to automatically feed
wais URLs through a gateway, your remaining option is to explicitly use the
gateway form of the wais URLs as shown above. This is not a good solution, and hopefully it will stop being necessary in the very near future.
A big problem here is that, when WAIS as it currently exists is used as the search and retrieval engine for existing sets of HTML documents, any and all relative links and relative pointers to inlined images in all indexed HTML documents will break.
Why is this? Well, when you retrieve an HTML document from a WAIS server, the URL corresponding to that document will be an encoded WAIS "docid", or document identifier. This docid is not the same thing as the path and filename of the file that you're retrieving. (In fact, it looks like a horribly mangled stream of random and spurious bytes -- its structure and meaning are definitely not transparent at the user level.)
So, when an HTML document contains a relative link or inlined image
pointer, the document is pulled over via WAIS, and Mosaic tries to
resolve the relative link into an absolute URL by combining it with
the URL for the current document
... -- well, it just doesn't work.
One near-term but generally undesirable solution is to always use absolute URLs for hyperlinks and inlined images in all HTML documents on your server.
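
To make the failure concrete, here is a small Python sketch that resolves the same relative link against two base URLs using the standard library's urljoin. The host name, paths, and especially the docid are invented for illustration; real docids are opaque and look nothing like this one.

```python
from urllib.parse import urljoin

# The document as served by an HTTP server: the URL mirrors the file path.
http_base = "http://machine/docs/guide/index.html"

# The same document retrieved via WAIS. The docid below is invented;
# real docids are opaque and look nothing like a file path.
wais_base = "wais://machine:2010/marc/OPAQUE-DOCID-1234"

relative_link = "images/figure1.gif"
absolute_link = "http://machine/docs/guide/images/figure1.gif"

# Resolved against the HTTP URL, the relative link lands on the real file.
resolved_http = urljoin(http_base, relative_link)

# Resolved against the WAIS URL, it lands on something the server
# cannot map back to a file.
resolved_wais = urljoin(wais_base, relative_link)

# An absolute link is unaffected by the base URL.
resolved_abs = urljoin(wais_base, absolute_link)

print(resolved_http)
print(resolved_wais)
print(resolved_abs)
```

The last case is why absolute URLs work as a stopgap: they never depend on the URL of the document that contains them.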
The real solution is for HTTP servers (which, of course,
commonly use URLs that correspond exactly to directory and file names
and therefore allow relative links to freely work) to use WAIS as a
search engine only -- and to make sure that URLs given to
browsers as the results of searches are exactly normal HTTP URLs.
This is completely technically possible and will be more and more common in the very near future. An experimental WAIS back-end interface that provides this functionality is known to exist for Plexus, and either that interface or something similar will eventually be made available for NCSA httpd (and presumably other HTTP servers). I'll attempt to stay up to date on the progress of these efforts and roll the results of ongoing work into this tutorial.

One more thing: WAIS is evolving towards greater separation of indexing and retrieval. It should eventually be possible to have WAIS itself return arbitrary URLs (matching, say, the actual directory and file names of files it indexes), which would allow relative links to work. This is an intriguing idea because it would mean that you could potentially run an entire standard Web server entirely with a single WAIS server.
(See experimental information on integrating WAIS and HTTP servers.)
This is a useful capability if, for example, you have a set of images, each of which has a corresponding text description. You can set up your WAIS database in such a way that the text descriptions are searched, but appropriate images are given to the user as a result of successful search hits in the text descriptions.

The following describes how to set up a WAIS server to return multiformat responses. We'll assume you're using the
doindex script and directory structures as given in the examples above.
Create a directory called
~/multifluff. This is where you'll put all files to be
indexed with WAIS's multiple format support.
A condition of freeWAIS's multiformat support is that the various files follow certain file name and extension conventions very closely.

We'll assume, for this example, that you have a set of text files; each text file has either an associated GIF image, an associated PostScript document, or both a GIF image and a PostScript document.

You will give all the text files the extension ".TEXT", all the GIF files the extension ".GIF", and all the PostScript files the extension ".PS". Note use of uppercase.

It is assumed that related files have the same name, with the exception of the extension -- in other words, "foobar.TEXT" and "foobar.GIF" will be considered to be related. "blorf.GIF", however, will not.

Place the various text files and associated GIF and PostScript files in ~/multifluff. Be sure they have appropriate filenames and extensions, as described above: filenames match for related files; extensions are ".TEXT", ".GIF", and ".PS".
Add the following lines to the end of doindex:

    # Go to the directory containing the files in multiple formats.
    cd ~/multifluff

    # Index *.TEXT and associate *.GIF and *.PS.
    waisindex -a -d ~/localwais/sources/marc -T TEXT -M TEXT,GIF,PS *.TEXT

Note the -M flag to waisindex: the types in the comma-separated list following -M are used by the indexer to determine how to tie different files in ~/multifluff together. A given query will be able to return a matching TEXT file as well as an associated GIF image (if one exists with the same filename and extension ".GIF") and an associated PS document (if one exists with the same filename and extension ".PS").
Example: Here's an example set of files that you might place in ~/multifluff:

    crufty.GIF
    crufty.TEXT
    maybe-marc.GIF
    maybe-marc.PS
    maybe-marc.TEXT
    tarot.PS
    tarot.TEXT

After you index these files as described above, a query on "crufty" should return a hit corresponding to "crufty.TEXT". When you access that hit, Mosaic should tell you that you are at a "Multiple Format Opportunity" and present you with a menu from which you can choose the format to retrieve.
Works for me!
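
Before indexing, it can help to check which formats will end up tied to which text file. The following Python sketch applies the naming convention described above (same base name, uppercase extension) to the example file set. It is not how waisindex is implemented; the function name multiformat_groups is hypothetical, and the sketch assumes that only files with a TEXT counterpart produce hits.

```python
import os


def multiformat_groups(filenames, primary="TEXT", formats=("TEXT", "GIF", "PS")):
    """Map each primary file's base name to the formats available for it."""
    available = {}
    for name in filenames:
        base, ext = os.path.splitext(name)
        ext = ext.lstrip(".")
        if ext in formats:  # case-sensitive: ".gif" would not match "GIF"
            available.setdefault(base, set()).add(ext)
    # Assumption: only files that have the primary (indexed) type give hits.
    return {
        base: [fmt for fmt in formats if fmt in exts]
        for base, exts in available.items()
        if primary in exts
    }


files = [
    "crufty.GIF", "crufty.TEXT",
    "maybe-marc.GIF", "maybe-marc.PS", "maybe-marc.TEXT",
    "tarot.PS", "tarot.TEXT",
]
groups = multiformat_groups(files)

if __name__ == "__main__":
    for base in sorted(groups):
        print(base, groups[base])
```

A file such as "blorf.GIF" with no matching ".TEXT" file simply does not appear in the result, which mirrors the point that it is not related to anything.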
Important Note: Mosaic for X version 2.0 compiled with direct WAIS support is the only Web client known to actually handle multiformat responses. The modifications we made to the common Web library WAIS code to make this happen should be easy to roll into other clients, but to our knowledge no one has yet done so, and certainly no gateways will be able to handle multiformat responses.
However, this is a quite powerful capability, and if you are able to assume use of Mosaic for X, we certainly suggest you give it a shot and see if it works for you.
If you have questions, send mail to firstname.lastname@example.org and we'll try to help you. You can also post questions to the Usenet newsgroup
comp.infosystems.www, which the Mosaic authors read.