w4ais 3.1

derived from EIT's wwwwais 2.5 by Miles O'Neal


What is w4ais?

W4AIS is a small ANSI C program that acts as a gateway between programs that create indexed catalogs of files and a forms-capable World-Wide Web browser. With the freely distributable freeWAIS or SWISH packages, this program, and your local Web site, you can let visitors search your indexed files from a Web form.

For an example of what the program can do, try http://www.rru.com/~meo/w4ais.html and search for any of the examples listed.

This program is an extension of EIT's wwwwais, which in turn was very loosely based on the Perl waisq interface program that comes with NCSA's httpd.


Great! How do I get started?

First, you need SWISH, freeWAIS or some other WAIS software, and a forms-capable web browser.

SWISH

Unlike the original wwwwais, w4ais is currently tested only with SWISH. The other indexers should still work, but I haven't tested them, so I really don't know. SWISH (Simple Web Indexing System for Humans) is a program that does many of the same things as WAIS but is highly simplified, so it's relatively easy to install and set up.

You may want to put SWISH things somewhere such as /usr/local/httpd/conf/swish. You may also want to create a directory to hold SWISH databases, such as /usr/local/httpd/swish/sources, or you can simply keep them in the indexed directories.

freeWAIS or other WAIS software

Once you get freeWAIS, you'll need to compile it and set it up. The following documentation may help to give you a better understanding of what you're going to do next.

If you're only using freeWAIS to do Web searching, you may want to put it all somewhere web-related such as /usr/local/httpd/wais. You'll also want to create a directory to hold the WAIS database for your site, somewhere like /usr/local/httpd/wais/sources.


What to do after compiling

You may wish to skip this section and instead read the documentation for your preferred indexing program. Otherwise, treat this as a brief introduction to get you going; it is by no means a tutorial or reference for any of the indexing programs mentioned (or anything else!).

After you've compiled (and installed) freeWAIS and/or SWISH, make sure the waisindex, waisq, waissearch, and/or swish programs are somewhere in your executable path (somewhere such as /usr/local/bin).

Now you'll want to put a script such as the following C-shell script somewhere you can run it. This script will index your Web site into a searchable database that WAIS clients can read.


#!/bin/csh

set rootdir = /usr/local/www
#	This is the root directory of the Web tree you want to index.

set index = /usr/local/httpd/wais/sources/index
#	This is the name your WAIS indexes will be built under.
#       Index files will be called index.* in the /usr/local/httpd/wais/sources
#       directory, in this example.

set indexprog = /usr/local/httpd/wais/waisindex
#	The full pathname to your waisindex program.

set nonomatch
cd $rootdir
set num = 0
foreach pathname (`du $rootdir | cut -f2 | tail -r`)

    echo "The current pathname is: $pathname"
    if ($num == 0) then
        set exportflag = "-export"
    else
        set exportflag = "-a"
    endif
    $indexprog -l 0 -nopairs -nocat -d $index $exportflag $pathname/*.html
    $indexprog -l 0 -nopairs -nocat -d $index -a $pathname/*.txt
    $indexprog -l 0 -nopairs -nocat -d $index -a $pathname/*.c
    $indexprog -nocontents -l 0 -nopairs -nocat -d $index -a $pathname/*.ps
    $indexprog -nocontents -l 0 -nopairs -nocat -d $index -a $pathname/*.gif
    $indexprog -nocontents -l 0 -nopairs -nocat -d $index -a $pathname/*.au
    $indexprog -nocontents -l 0 -nopairs -nocat -d $index -a $pathname/*.hqx
    $indexprog -nocontents -l 0 -nopairs -nocat -d $index -a $pathname/*.xbm
    $indexprog -nocontents -l 0 -nopairs -nocat -d $index -a $pathname/*.mpg
    $indexprog -nocontents -l 0 -nopairs -nocat -d $index -a $pathname/*.pict
    $indexprog -nocontents -l 0 -nopairs -nocat -d $index -a $pathname/*.tiff
    @ num++
end
echo "$num directories were indexed."

Make sure you've configured everything correctly, then run this script. The more files your Web site has, the longer indexing will take.

Taking as an example the above configuration in the script, you'd have the directory /usr/local/httpd/wais/sources and a number of files with the prefix index in the directory. The name of the database you've just created is index.src.
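
As a rough illustration of what the C-shell loop above does, here is a minimal Python sketch that only builds the waisindex command lines instead of running them. The flags and suffix lists are copied from the script; the function name waisindex_commands is made up for this example. Unlike the script, it skips suffixes that match no files and gives -export to the first command it emits (the script gives it to the *.html pass of the first directory), so treat it as an approximation rather than a drop-in replacement.

```python
import os
import tempfile

# Suffix groups copied from the csh script above.
TEXT_SUFFIXES = ["html", "txt", "c"]
BINARY_SUFFIXES = ["ps", "gif", "au", "hqx", "xbm", "mpg", "pict", "tiff"]


def waisindex_commands(rootdir, index, indexprog):
    """Build one waisindex argument list per (directory, suffix) that has files.

    Hypothetical helper: it returns the commands and never executes them.
    """
    commands = []
    for dirpath, dirnames, filenames in os.walk(rootdir):
        dirnames.sort()  # visit subdirectories in a predictable order
        for suffix in TEXT_SUFFIXES + BINARY_SUFFIXES:
            matches = sorted(
                os.path.join(dirpath, name)
                for name in filenames
                if name.endswith("." + suffix)
            )
            if not matches:
                continue
            argv = [indexprog]
            if suffix in BINARY_SUFFIXES:
                # Index file names only, as the script does for binary types.
                argv.append("-nocontents")
            argv += ["-l", "0", "-nopairs", "-nocat", "-d", index]
            # The first command starts the index; later ones append with -a.
            argv.append("-export" if not commands else "-a")
            commands.append(argv + matches)
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "welcome.html"), "w").close()
        open(os.path.join(root, "logo.gif"), "w").close()
        for argv in waisindex_commands(
            root, "/usr/local/httpd/wais/sources/index", "waisindex"
        ):
            print(" ".join(argv))
```

Printing the commands first is a cheap way to check the paths before handing them to waisindex for real.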

If you're using SWISH, you'll want to create a configuration file (one for each index, perhaps). The configuration file below will do pretty much the same thing as the above script for waisindex:


# SWISH configuration file

IndexDir /usr/local/www
# This is the root directory of the Web tree you want to index.

IndexFile /usr/local/httpd/swish/sources/index.swish
# This is the name your SWISH index will be built as.

IndexOnly .html .txt .c .ps .gif .au .hqx .xbm .mpg .pict .tiff
# Only files with these suffixes will be indexed.

IndexVerbose yes
# Set this to yes to show indexing information while swish is working.

NoContents .ps .gif .au .hqx .xbm .mpg .pict .tiff
# Files with these suffixes won't have their contents indexed,
# only their file names.

Assuming you named the config file swish.conf and placed it in /usr/local/httpd/swish/, you now type:

  swish -c /usr/local/httpd/swish/swish.conf

to run swish and index your site.

With the example configuration file above, you'd have the directory /usr/local/httpd/swish/sources containing one file called index.swish. The name of the database you've just created is index.swish.


The next step

Now you'll need the C source for w4ais. You can get it at http://www.rru.com/~meo/useful/w4ais.tgz. Please read the license before building the source or using the program. For licensing or business-related questions only, please contact Suzi Styrofoam at suzi@rru.com. Depending on several issues, you may also need to contact EIT, but if so, we will tell you, or it will be apparent from the licensing documentation.

The original wwwwais required you to specify a configuration file path in the source code. You might want to make the path to this file something like /usr/local/httpd/conf/w4ais/site.conf. For w4ais, this simply becomes the default path; others may be specified by the WWWW_ variables. These are typically set by the indexing program.

Now compile w4ais - it seems to compile fine with gcc.

Put the program in your Web server's /cgi-bin directory. It's a CGI (Common Gateway Interface) program, so it should be in this directory. Make sure it's executable by everyone.


Setting up the configuration file

The configuration file is easy to set up - it's much like NCSA and CERN server configuration files. Hash marks (#) and blank lines are ignored. To specify a variable and its value, type the variable name, a space, and its value. Here's an example configuration file:

# w4ais configuration file
# 30/May/1998
# Miles O'Neal, meo@rru.com
# Documentation at http://www.netads.com/Doc/w4ais.html
#
####################################################################
#
#   NEW STUFF [meo@rru.com 12/Jan/98]

Body /www/pages/rru/srchbody.inc
# file containing BODY tag (for colors, etc) [if unset, default provided]

PromptString "Search for:"
# prompt to place before keywords text field

# Only one of "Instructions" or "Help" should be provided.

#Instructions /www/pages/rru/srchinst.inc
# file containing instructions text (after keyword text field)
# [defaults are provided]

Help /~meo/srchinst.html
# URL containing instructions text (after keyword text field)

Trailer /www/pages/rru/srchtrail.inc
# file containing trailer text [if unset, no trailer text]

#
####################################################################

PageTitle /www/pages/rru/srchhead.html
#PageTitle "Searches-B-We Search Results"
# If this is a string, it will be a title only.
# If it specifies an HTML file, this file will be prepended to w4ais results.

SelfURL "http://www.rru.com/cgi-bin/w4ais"
# The self-referencing URL for w4ais.

MaxHits 50
# The maximum number of results to return.

SortType score
# How results are sorted. This can be "score", "bytes", "lines",
# "title", or "type".

Format table
# How results are formatted.  This can be "brief", "standard", or
# "table".  wwwwais only supported the "standard" format.

AddrMask "all"
# Only addresses specified here will be allowed to use the gateway.
# For the above mask option, these rules apply:
# 1) You can use asterisks in specifying the string, at either
#    end of the string:
#    "192.100.*", "*100*", "*2.100.2"
# 2) You can make lists of masks:
#    "*192.58.2,*.2", "*.100,*171.128*", ".58.2,*100"
# 3) A mask without asterisks will match EXACTLY:
#    "192.100.58.2"
# 4) Define as "all" to allow all sites.

#WaisqBin /usr/local/bin/waisq
#WaissearchBin /usr/local/bin/waissearch
SwishBin /www/httpd/cgi-bin/swish
# The full path to your waisq, waissearch or swish program.

SourceRules replace "/www/pages/rru/" "http://www.rru.com/"
SourceRules replace "/u/" "~"
SourceRules replace "public_html/" ""
#SourceRules prepend "http://www.rru.com/cgi-bin/munge?"
#SourceRules append "?$KEYWORDS#first_hit"

#SwishSource /www/pages/rru/index.swish "Roadkills-R-Us"
#WaisSource /usr/local/httpd/wais/index/index.src "Search EIT's Web (bolded results)"
#WaisSource quake.think.com 210 directory-of-servers "WAIS directory of servers"
# For waisq sources:
#    WaisSource full_path_to_source/source.src "description"
# For waissearch sources:
#    WaisSource host.name port source "description"

UseIcons yes
# yes = use icons, no = don't

IconUrl /icons
# Where all your icons are kept.

TypeDef .html "HTML file" $ICONURL/text.xbm text/html
TypeDef .htm "HTML file" $ICONURL/text.xbm text/html
   :
TypeDef .src "WAIS index" $ICONURL/index.xbm text/plain
TypeDef .?? "unknown" $ICONURL/unknown.xbm text/plain
# Information for figuring out file types based on suffix.
# Suffix matching is case insensitive.
#    TypeDef .suffix "description" file://url.to.icon.for.this.type/ MIME-type
# You can use $ICONURL in the icon URL to substitute the root icon directory.

Variables such as SelfURL, SortType, and MaxHits should be fairly self-explanatory; see below for a list of options. You can use the AddrMask option to allow only certain sites to search your site, and WaisqBin, WaissearchBin, and SwishBin are pointers to your waisq, waissearch, and swish programs, respectively.
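
The four AddrMask rules listed in the sample file's comments can be sketched in a few lines of Python. This is only an illustration of the rules as written, not the matching code w4ais actually uses; the function names mask_matches and address_allowed are invented here, and the sketch assumes the mask is compared against the client's dotted address string.

```python
def mask_matches(mask, address):
    """Check one mask against an address, following the commented rules."""
    if mask == "all":
        return True  # rule 4: "all" allows every site
    starts_wild = mask.startswith("*")
    ends_wild = mask.endswith("*")
    core = mask.strip("*")
    if starts_wild and ends_wild:
        return core in address            # "*100*"
    if starts_wild:
        return address.endswith(core)     # "*2.100.2"
    if ends_wild:
        return address.startswith(core)   # "192.100.*"
    return address == core                # rule 3: no asterisk, exact match


def address_allowed(addr_mask, address):
    """Rule 2: a comma-separated list allows an address if any mask matches."""
    return any(mask_matches(m.strip(), address) for m in addr_mask.split(","))


if __name__ == "__main__":
    for mask in ("all", "192.100.*", "*100*", "192.100.58.2", "*192.58.2,*.2"):
        print(mask, address_allowed(mask, "192.100.58.2"))
```

Note that under rule 3 a mask such as "192.100.58.2" does not admit "192.100.58.20"; add a trailing asterisk if you want prefix matching.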

The WaisSource and SwishSource variables

The WaisSource and SwishSource variables tell w4ais the sources you want the user to be able to search. For an index that will be searched with waisq, specify the full pathname to the source you want to search (ending in .src) and a short description of the database. For WAIS servers, specify the host name, port, source name (without a .src) and a short description. For SWISH indexes, specify the full path to the source (ending in .swish) and a short description.

If you've specified more than one WAIS and/or SWISH source in your configuration file, a pop-up menu will appear on the w4ais page with the descriptions as menu items (in the order that you specified the sources). Choose one and start searching! If you have only one source specified, no menu will appear. Just enter your search text and hit return to search that source.
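
To make the three source forms concrete, here is a small Python sketch that classifies WaisSource and SwishSource lines by their argument count, using the lines from the sample configuration file. It is a reading aid only: parse_source and menu_items are names invented for this example, and Python's shlex stands in for whatever tokenizer w4ais really uses.

```python
import shlex


def parse_source(line):
    """Split one source line into the fields described above."""
    fields = shlex.split(line)
    keyword, args = fields[0], fields[1:]
    if keyword == "SwishSource" and len(args) == 2:
        return {"searchprog": "swish", "source": args[0],
                "description": args[1]}
    if keyword == "WaisSource" and len(args) == 2:
        # Full path ending in .src: a local index searched with waisq.
        return {"searchprog": "waisq", "source": args[0],
                "description": args[1]}
    if keyword == "WaisSource" and len(args) == 4:
        # host, port, source name: a remote server reached with waissearch.
        return {"searchprog": "waissearch", "host": args[0],
                "port": int(args[1]), "source": args[2],
                "description": args[3]}
    raise ValueError("unrecognized source line: " + line)


def menu_items(sources):
    """Descriptions in configuration order; no menu for a single source."""
    return [s["description"] for s in sources] if len(sources) > 1 else []


if __name__ == "__main__":
    lines = [
        'SwishSource /www/pages/rru/index.swish "Roadkills-R-Us"',
        'WaisSource quake.think.com 210 directory-of-servers '
        '"WAIS directory of servers"',
    ]
    sources = [parse_source(line) for line in lines]
    print(menu_items(sources))
```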

Using SourceRules

When results are returned from WAIS servers and waisq, you may get a bunch of funny pathnames to files that you can't access. Using SourceRules, you can specify a series of operations to perform on the pathname result to change it into a URL, a CGI program to filter the file through, and so on.

There are three operations you can specify: replace, append, and prepend. They will parse the pathname in the order they appear in the configuration file. More than one command and its arguments can appear on the same line, but it's easier to read when commands are broken up over a few lines. You can't put a command and its argument(s) on different lines, however.

Commands apply to the WAIS or SWISH source specified just before the commands. Here's the syntax:

   replace "the string you want replaced" "what to change it to"
      This replaces all occurrences of the old string
      with the new one.
   prepend "a string to add before the result"
   append "a string to add after the result"

In any command argument, $KEYWORDS will be replaced with the keywords you used to search, so you can pass them to filtering programs that can use them. One good program is print_hit_bold.pl, a Perl program which bolds the found text in search results.

Study the above sample configuration file and try things out. You'll find you can do a lot of nifty things with WAIS sites and filters.
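
If it helps to see the rules in action, the following Python sketch applies SourceRules lines to a pathname in the order they appear. It follows the description above (replace changes all occurrences, prepend and append add text, $KEYWORDS is substituted in arguments), but it is not w4ais's own code: the function names are invented, shlex stands in for the real tokenizer, and the way the keywords are encoded when substituted is an assumption.

```python
import shlex

# Number of arguments each command takes, per the syntax shown above.
ARG_COUNTS = {"replace": 2, "prepend": 1, "append": 1}


def parse_source_rules(line):
    """Turn one SourceRules line into a list of (command, args) pairs."""
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "SourceRules":
        raise ValueError("not a SourceRules line: " + line)
    rules, i = [], 1
    while i < len(tokens):
        command = tokens[i]
        count = ARG_COUNTS[command]  # KeyError for an unknown command
        args = tokens[i + 1:i + 1 + count]
        if len(args) != count:
            raise ValueError("missing argument for " + command)
        rules.append((command, args))
        i += 1 + count
    return rules


def apply_rules(pathname, rules, keywords=""):
    """Apply the rules in order; keywords are substituted as given."""
    result = pathname
    for command, args in rules:
        args = [a.replace("$KEYWORDS", keywords) for a in args]
        if command == "replace":
            result = result.replace(args[0], args[1])
        elif command == "prepend":
            result = args[0] + result
        else:  # append
            result = result + args[0]
    return result


if __name__ == "__main__":
    config = [
        'SourceRules replace "/www/pages/rru/" "http://www.rru.com/"',
        'SourceRules replace "/u/" "~"',
        'SourceRules replace "public_html/" ""',
    ]
    rules = [r for line in config for r in parse_source_rules(line)]
    print(apply_rules("/www/pages/rru/docs/index.html", rules))
```

With the three sample rules, a pathname under /www/pages/rru/ becomes a URL on www.rru.com, while a pathname under /u/ is rewritten into a ~user form.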

Using TypeDef

The program often will need to know the types of files it's returning, so you won't be so confused when you get results back, and so your browser will know what to do if you ask for a file. The TypeDef option maps different suffixes to MIME types and short descriptions.

On a TypeDef line, you need to specify the suffix for the particular type (with a period), a short description to include in results (this shouldn't be any more than two or three words), the URL to the icon representing the file type (unused if you're not displaying icons), and the MIME type corresponding to the particular type.

The information specified with the suffix .?? will be associated with all other files that w4ais can't figure out.
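
The suffix lookup described in this section can be pictured with a short Python sketch: TypeDef lines are read into a table, $ICONURL is expanded, suffixes are compared case-insensitively, and the .?? entry is the fallback. The names parse_typedefs and lookup_type are invented for the example, and the real program may resolve suffixes differently in edge cases (for instance, files with several dots).

```python
import os
import shlex


def parse_typedefs(lines, icon_url):
    """Build {suffix: (description, icon URL, MIME type)} from TypeDef lines."""
    table = {}
    for line in lines:
        fields = shlex.split(line)
        if len(fields) != 5 or fields[0] != "TypeDef":
            continue  # ignore anything that is not a complete TypeDef line
        _, suffix, description, icon, mime = fields
        table[suffix.lower()] = (
            description,
            icon.replace("$ICONURL", icon_url),
            mime,
        )
    return table


def lookup_type(table, filename):
    """Case-insensitive suffix lookup with the .?? entry as the fallback."""
    suffix = os.path.splitext(filename)[1].lower()
    return table.get(suffix, table.get(".??"))


if __name__ == "__main__":
    lines = [
        'TypeDef .html "HTML file" $ICONURL/text.xbm text/html',
        'TypeDef .src "WAIS index" $ICONURL/index.xbm text/plain',
        'TypeDef .?? "unknown" $ICONURL/unknown.xbm text/plain',
    ]
    table = parse_typedefs(lines, "/icons")
    print(lookup_type(table, "Welcome.HTML"))
    print(lookup_type(table, "movie.xyz"))
```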

Appearance variables

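Variables such as Body, PromptString, Instructions, Help, and Trailer affect the appearance of the output; the comments in the sample configuration file above describe each one.

NOTE: Only one of Help or Instructions should be provided. If both appear in the configuration file, Help takes precedence.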

General configuration variables


Setting up the HTML form

In order to create a basic interface for your indexing software, you need an HTML form. The best thing to do is copy one from the Form/ directory provided with the w4ais distribution and modify it, but at a minimum you need something like the following in an HTML page for which you want the search capability:

<FORM METHOD="get" ACTION="/cgi-bin/w4ais">
Search for:
<INPUT TYPE="text" NAME="keywords" SIZE=40>
<INPUT TYPE="submit" VALUE=" Search ">
</FORM>

Enter keywords for your search. You can try both single and multi-word searches, separating words with "and", "or" and/or "not". You may use parentheses for grouping. A trailing asterisk ("*") matches all words beginning with whatever precedes the asterisk.

NOTE: The above all work for swish; I'd hope they also work for waisq and waissearch, but I haven't tested them.


Wait! What's the difference between waisq, waissearch, and swish?

Both the waisq and waissearch programs that come in the freeWAIS distribution search WAIS databases for the information you're looking for. However, waisq looks in databases on the host machine you're doing the searching on, and waissearch can look in databases on different machines all over the Internet. swish is much like waisq in that only indexes that are locally available to w4ais can be searched.

The waissearch program does things remotely by contacting WAIS servers on different machines, each of which has its own databases. In order for waissearch to do its thing, you need to tell it a machine name and a port to connect to, and that machine needs to have a WAIS server of its own running on that port.

By telling w4ais to use the waissearch program and specifying a host name and port in the URL (see below), you can search WAIS databases on other machines. If you wish to run your own WAIS server on your machine, make sure you have the waisserver program (from the freeWAIS package) and run it like this (all on one line with no returns):

./waisserver -p 2010 -d /usr/local/httpd/wais/sources
    -e /usr/local/httpd/logs/wais.log &

This will run a WAIS server on your machine on port 2010. This server should log its results to a file named wais.log and will search through any databases located in the /usr/local/httpd/wais/sources directory.

Say you have a database named "index.src" in that directory and you've started your WAIS server with the line above. To search "index.src" using your server, you could call w4ais with a URL like this (all on one line, of course):

http://foo.bar.com/cgi-bin/w4ais?searchprog=waissearch&
    host=foo.bar.com&port=2010&source=index&keywords=heart+of+gold

For both SWISH and W4AIS, I recommend a separate configuration file for each index. For instance, you might have an overall public site index, an internal index, and smaller indexes for individual users. The configuration files may be kept wherever you wish, but the obvious places are in the roots of the various page trees, or in central locations in the server config directories. I usually prefer the latter.

Calling w4ais with options

You can call w4ais with different options in the URL:

conffile
      example: /cgi-bin/w4ais?conffile=/www/httpd/conf/wwwwais/meo.conf
  description: Specifies the config file.  One will be compiled in
               as a default, but you can have multiple files so that
               different directories, users, projects, etc. can be
               configured with different appearances, defaults, etc.

selection
      example: /cgi-bin/w4ais?selection=none
      example: /cgi-bin/w4ais?selection="My+WAIS+server"
      example: /cgi-bin/w4ais?selection="Files+about+me"
  description: Specifies the index source to use (as set up
               in the configuration file). The argument is
               the source description. If selection
               is not defined as "none", any corresponding
               source information in the configuration file
               will override arguments and environment
               variables.

searchprog
      example: /cgi-bin/w4ais?searchprog=waissearch
  description: Specifies the program to do the searching.
               This can be "waisq", "waissearch", or
               "swish".

source
      example: /cgi-bin/w4ais?source=index.src
  description: Specifies the index database to search.

sourcedir
      example: /cgi-bin/w4ais?sourcedir=/usr/local/sources
  description: Specifies the directory the index database
               resides in. (You can have multiple databases
               in the same directory)

maxhits
      example: /cgi-bin/w4ais?maxhits=40
  description: Determines the maximum number of URLs
               to return after a search.

keywords
      example: /cgi-bin/w4ais?keywords=these+are+keywords
  description: You can specify the search keywords by using
               the keywords label.

isindex
      example: /cgi-bin/w4ais?isindex=these+are+keywords
  description: This works the same as keywords.

format
      example: /cgi-bin/w4ais?format=brief
  description: How results are formatted.  This can be "brief",
               "standard", or  "table".  wwwwais only supported
               the "standard" format.

sorttype
      example: /cgi-bin/w4ais?sorttype=bytes
  description: This determines how w4ais sorts its output.
               Valid arguments for sort are "score",
               "lines", "bytes", "title", and "type".

version
      example: /cgi-bin/w4ais?version=true
  description: This gives the version information for
               w4ais and the search program it runs.

host
      example: /cgi-bin/w4ais?host=eit.com
  description: This gives the host machine to search. This
               is only valid when using waissearch to do
               the searching.

port
      example: /cgi-bin/w4ais?port=2010
  description: This gives the port of the wais server that
               will be doing the searching. This is only
               valid when using waissearch with w4ais.

useicons
      example: /cgi-bin/w4ais?useicons=yes
  description: This tells w4ais to use icons for different
               files in the search results. This can be
               "yes" or "no".

iconurl
      example: /cgi-bin/w4ais?iconurl=http://www.eit.com/icons/
  description: This tells w4ais the master URL at which to
               find the icons it may need.

<keywords only>
      example: /cgi-bin/w4ais?these+are+keywords
  description: Keywords can be specified by themselves from
               "isindex" forms. This only works if no other
               options are used.

<no arguments>
      example: /cgi-bin/w4ais
  description: Calling the program with no arguments brings
               up a blank field in which users can enter
               search keywords.

Examples of specifying multiple options in the URL:

  /cgi-bin/w4ais?isindex=these+are+keywords&maxhits=80
  /cgi-bin/w4ais?source=index.src&keywords=test+search

There are other ways w4ais can get variable information - you can specify these variables in forms using either the GET or POST methods, and PATH_INFO is supported as well, so you can make something like:

<FORM METHOD="get" ACTION="/cgi-bin/w4ais/host=quake.think.com&port=210&searchprog=waissearch">
Search for:
<INPUT TYPE="text" NAME="keywords" SIZE=40>
<INPUT TYPE="submit" VALUE=" Search ">
</FORM>

The above form sets w4ais up to search the WAIS server at quake.think.com, port 210. Note that you can use the POST method as well in this example, and the result will be exactly the same.
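
If you generate links to w4ais from another program, the option URLs shown in this section can be assembled with a standard URL-encoding routine. The sketch below uses Python's urllib.parse.urlencode; the function name w4ais_url is invented here, and the option names are the ones listed under "Calling w4ais with options".

```python
from urllib.parse import urlencode


def w4ais_url(base, **options):
    """Append the given options to the gateway URL as a query string."""
    if not options:
        return base  # no arguments: w4ais shows a blank search field
    # urlencode turns spaces into "+" and escapes other special characters.
    return base + "?" + urlencode(options)


if __name__ == "__main__":
    print(w4ais_url(
        "http://foo.bar.com/cgi-bin/w4ais",
        searchprog="waissearch",
        host="foo.bar.com",
        port=2010,
        source="index",
        keywords="heart of gold",
    ))
```

Called this way, it reproduces the waissearch URL given earlier for the server on port 2010.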

Environment variables are also supported - just put "WWWW_" before variables and make everything uppercase. For instance, instead of putting searchprog=waisq in a URL, you can type setenv WWWW_SEARCHPROG waisq and run w4ais from a shell script.
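
The naming rule for environment variables is simple enough to show in a few lines. This Python sketch builds an environment dictionary from option names by upper-casing them and adding the WWWW_ prefix, as described above; the function name w4ais_environment is invented, and how w4ais ranks these variables against URL arguments is not covered here.

```python
import os


def w4ais_environment(options, base_env=None):
    """Return a copy of the environment with WWWW_ variables added."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in options.items():
        # searchprog -> WWWW_SEARCHPROG, maxhits -> WWWW_MAXHITS, and so on.
        env["WWWW_" + name.upper()] = str(value)
    return env


if __name__ == "__main__":
    env = w4ais_environment({"searchprog": "waisq", "maxhits": 40}, base_env={})
    for key in sorted(env):
        print(key + "=" + env[key])
```

The resulting dictionary can be handed to whatever launches the program, for example as the env argument of subprocess.run, with the path to your own w4ais binary.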


Other pointers

Here are some other WAIS resources on the web:

The detailed personal page in the Netherlands has disappeared.


That's it!

You can take advantage of the options described above by creating forms which give the user control over sorting, searching multiple databases, etc. Let me know if you make any nice interfaces with this!

If you come across any bugs or problems while searching a WAIS server, please send me the host, port, and source information so I can try things out and track down the problem.

If you know of other file indexers that can be interfaced with W4AIS, I highly recommend that you try modifying W4AIS to do so - after all, the point is to provide a single, easy to use method of searching many different types of indexes.


Last updated: 25 October 2001