WAIS and HTTP Integration

Introduction

This document surveys existing methods for using WAIS as a back-end search engine for HTTP servers.

Information herein is currently experimental and may or may not work for you.


WAIS and Plexus

Plexus is a powerful Perl-based HTTP server written and maintained by Tony Sanders at BSDI.

WAIS and WN

WN is a multi-protocol server written and maintained by John Franks at NWU. It is shipped with support for WAIS as a back-end search engine.


WAIS and NCSA HTTPd 1.0

Rob McCool has written a CGI script that lets NCSA HTTPd 1.0, as well as other CGI-compliant servers, access a WAIS database in the way described in this document. The script is in the CGI archive and contains instructions for setting it up under HTTPd 1.0.


freeWAIS 0.202's URL Type

freeWAIS 0.202 is shipped with support for type "URL". Use of this type is a little tricky.

First, Mosaic 2.0 doesn't know how to deal with this type directly, but Mosaic 2.1 (when it is released) will.

Second, use of this type apparently implies overloading the "headline" of a WAIS hit with the URL. That works, but the description the user then sees for a given document is the URL itself, and URLs are pretty cryptic things to throw in front of average users.

But anyway, here's how it works:

    waisindex ... -t URL what-to-trim what-to-add ...

So what does that mean?

Well, first, -t URL tells waisindex to use type URL (note use of lowercase -t in this instance).

Second, what-to-trim and what-to-add are parameters that tell the indexer how to put together the URL that's returned as the result of a query.

Suppose your documents are normally stored in /X11/mosaic/public. Suppose also that these documents are normally served via a URL that begins with http://hoohoo.ncsa.uiuc.edu:8080.

This means that a file stored as /X11/mosaic/public/foo.html, for example, is normally served as http://hoohoo.ncsa.uiuc.edu:8080/foo.html.

The waisindex command you'd use in this case would be something like the following:

    waisindex -d ~/localwais/sources/www -export \
      -t URL /X11/mosaic/public http://hoohoo.ncsa.uiuc.edu:8080 \
      /X11/mosaic/public/*.html

Here, ~/localwais/sources/www is the name of the WAIS index file and /X11/mosaic/public/*.html are the files you are indexing.

When queries are made on this database, the string /X11/mosaic/public is removed from the beginning of the filename of a matching file and the string http://hoohoo.ncsa.uiuc.edu:8080 is put in its place.

As per our previous example: /X11/mosaic/public/foo.html turns into http://hoohoo.ncsa.uiuc.edu:8080/foo.html as the result of a WAIS hit.
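To make the trim-and-add step concrete, here is a minimal Python sketch of the rewrite described above. It is not the waisindex implementation; the function name wais_url is made up for illustration, and what happens to a filename that does not start with what-to-trim is an assumption (here it is simply returned unchanged).

```python
def wais_url(filename: str, what_to_trim: str, what_to_add: str) -> str:
    """Illustrative model of the URL-type rewrite: strip one prefix, prepend another."""
    if not filename.startswith(what_to_trim):
        # Assumption: this document does not say what waisindex does here,
        # so the sketch just leaves the filename alone.
        return filename
    # Drop the leading what_to_trim and put what_to_add in its place.
    return what_to_add + filename[len(what_to_trim):]


print(wais_url(
    "/X11/mosaic/public/foo.html",
    "/X11/mosaic/public",
    "http://hoohoo.ncsa.uiuc.edu:8080",
))
# prints http://hoohoo.ncsa.uiuc.edu:8080/foo.html
```

Note that what-to-trim is given without a trailing slash, so the slash that begins /foo.html survives and joins cleanly onto the host and port.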

As you can see, this is perfect -- the WAIS server passes back the exact same URL that would normally be used to access this file via HTTP. So, everything from relative hyperlinks to relative inlined image references in the file will work correctly when the file is retrieved.
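The point about relative references can be checked with a short Python sketch using the standard urllib.parse.urljoin function, which resolves a relative reference against a base URL much as a browser would. The names bar.html and images/logo.gif are hypothetical; they stand in for whatever relative links and inlined images foo.html might contain.

```python
from urllib.parse import urljoin

# The URL handed back for the WAIS hit becomes the base for relative references.
page_url = "http://hoohoo.ncsa.uiuc.edu:8080/foo.html"

# Hypothetical relative references found inside foo.html.
link = urljoin(page_url, "bar.html")
image = urljoin(page_url, "images/logo.gif")

print(link)   # prints http://hoohoo.ncsa.uiuc.edu:8080/bar.html
print(image)  # prints http://hoohoo.ncsa.uiuc.edu:8080/images/logo.gif
```

Because the base is the same HTTP URL the file is normally served under, both references resolve to locations on the HTTP server.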


NCSA HTTPd Development Team / httpd@ncsa.uiuc.edu / 9-11-95