SWISH 1.2.1

Configuration




SWISH is configured at three levels: at compile time, in the configuration file, and with command-line options. This chapter focuses on the configuration file and touches on the command-line options. For information on compile-time configuration, see the Installation Guide.

Future releases of SWISH will hopefully allow all compile-time options to be overridden through configuration.

The SWISH configuration file

You can specify variables and values in the configuration file by typing the variable name (it's not case sensitive), a space (tabs are OK), and the value you want for the variable. If the value has spaces, you can enclose it in quotes to keep the space. If you want to specify multiple values, separate the values with a single space. In the configuration file, lines beginning with a hash mark (#) and blank lines are ignored.
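The rules above can be modeled in a few lines of Python. This is only a sketch of the syntax as described in this chapter, not SWISH's own parser: the function name parse_config and the sample paths are made up for illustration, treating an indented "#" line as a comment is an assumption, and so is collecting repeated variables into one list.

```python
import shlex

def parse_config(text):
    """Parse configuration text into {lowercased variable name: [values]}."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines beginning with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # shlex splits on spaces and tabs and keeps quoted values whole.
        name, *values = shlex.split(stripped)
        # Variable names are not case sensitive, so normalize them.
        # Assumption: repeated variables accumulate their values.
        settings.setdefault(name.lower(), []).extend(values)
    return settings

# Hypothetical sample; the second entry uses a tab and a quoted value.
sample = '''
# directories to index
IndexDir /www/pages /www/docs
indextitle\t"My Site Index"
'''
config = parse_config(sample)
print(config)
```

Here IndexDir ends up with two values, and the quoted title stays a single value even though it contains spaces.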

The configuration variables are:


Basic index variables


Using ReplaceRules

When swish returns search results, the pathnames may point to files that users cannot access directly. Using ReplaceRules, you can specify a series of operations to perform on each result pathname, for example to change it into a URL.

There are three operations you can specify: replace, append, and prepend. They are applied to the pathname in the order in which you typed them. More than one command and its arguments can appear on the same line, but the file is easier to read when commands are spread over several lines. You cannot, however, split a command and its argument(s) across lines.

Here's the syntax:

    ReplaceRules replace "the string you want replaced" "what to change it to"
        This replaces all occurrences of the old string
        with the new one.
    ReplaceRules prepend "a string to add before the result"
    ReplaceRules append "a string to add after the result"
Study the sample configuration file in Appendix C and try things out. You'll find that by having swish return URLs instead of pathnames, you can create interfaces to swish that can allow users to get to the search results over the World-Wide Web.
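To make the effect of the three operations concrete, here is a small Python sketch that applies them in order to a result pathname. It only models the behavior described above and is not how swish implements it; the function name apply_replace_rules, the pathname, and the host www.example.com are hypothetical.

```python
def apply_replace_rules(path, rules):
    """Apply (operation, *args) tuples to a result pathname, in order."""
    for operation, *args in rules:
        if operation == "replace":
            old, new = args
            path = path.replace(old, new)  # every occurrence is replaced
        elif operation == "prepend":
            path = args[0] + path
        elif operation == "append":
            path = path + args[0]
        else:
            raise ValueError(f"unknown operation: {operation}")
    return path

# Models these two (hypothetical) configuration lines:
#   ReplaceRules replace "/usr/local/www/" "/"
#   ReplaceRules prepend "http://www.example.com"
rules = [
    ("replace", "/usr/local/www/", "/"),
    ("prepend", "http://www.example.com"),
]
url = apply_replace_rules("/usr/local/www/docs/index.html", rules)
print(url)  # http://www.example.com/docs/index.html
```

Because the operations run in the order given, swapping the two rules would produce a different result whenever the prepended string itself contains the text being replaced.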


Using FileRules

You can specify certain file directives in the configuration file - any files or directories matching these criteria will be ignored and will not be indexed. Append all of these operations to a FileRules directive:

Advanced index variables

These variables affect indexing by either changing the weight of a word, or by changing what constitutes a word. The defaults should be fine for most people. See the FAQ for suggestions on using these to fine tune your configuration.

Building configuration files

There are three basic ways to create configuration files:
  1. the ez-swish web page interface (basic, user-level, configuration files)
  2. the mkswishconf script (prompts for all configuration parameters)
  3. roll your own (copy a sample or in-use config file, tweak it)
Each of these has a targeted user base, as explained in the descriptions below.
  1. EZ swish web page interface

    The first method is useful when users want to create their own indexes, and don't care about heavy customization of the indexing algorithm. The web page scripts provide reasonable defaults for most things, and let the user fill in a minimum of information. Since the web server cannot, by default, write into a user's directory, the config file is written to a known location (/tmp/$LOGNAME-swish.conf), and the user is prompted to move it into the proper directory. If the user later desires, they can read this (self-documenting) file and play with the various configuration parameters.

    To use this, the ez-swish*cgi scripts should be installed in a known, public place, such as http://wherever/cgi-bin/ . The user points a web browser at http://wherever/cgi-bin/ez-swish.cgi and fills in the few blanks there. To get reasonable defaults, the user actually accesses http://wherever/cgi-bin/ez-swish.cgi?username where username is the user's login name. The script, as delivered, assumes the user's web pages will be in $HOME/public_html .

  2. mkswishconf

    The mkswishconf script guides a user through creation of a fully customized swish configuration file. The script is invoked with a single parameter, the path of the config file to be created. Examples include:
        mkswishconf /www/httpd/conf/swish/acctg.conf
        mkswishconf ./swish.conf
    
    This is most useful for people who don't know, or lack confidence in, any local UNIX-based editor, or who need extra handholding.

  3. Roll your own

    Most people simply copy one of the example files (in swish/Conf/) and edit that. Since the files are fairly well documented, this will work for most people, especially after reading this manual.

Command line options

Running SWISH with the -z, -h or -? options shows us the command-line options (and a couple of other things we aren't worried about here). This chapter briefly covers configuration options; other options devoted only to searching or indexing are covered in the Users Guide.

Ways to invoke swish:

    swish [-i dir file ... ] [-c file] [-f file] [-l] [-v [num]] [-e] [-E]
    swish -w word1 word2 ... [-f file1 file2 ...] [-m num] [-t str]
    swish -M index1 index2 ... outputfile
    swish -D file
    swish -V
Options (defaults are in brackets):
    -i : create an index from the specified files
    -w : search for words "word1 word2 ..."
    -t : tags to search in - specify as a string
         "HBthec" - in head, body, title, header,
         emphasized, or comments
    -f : index file to create or search from [index.swish]
    -c : configuration file to use for indexing
    -l : follow symbolic links when indexing
    -v : verbosity level (0 to 3) [0]
    -e : emphasize words in comments when indexing
    -E : emphasize words in META tags when indexing
    -m : the maximum number of results to return [50]
    -M : merges index files
    -D : decodes an index file
    -V : prints the current version

-m (number) (number of results)

While searching, this specifies the maximum number of results to return. The default is 50. If no numerical value is given, the default is assumed. If the value is 0 or the string all, there will be no limit to the number of results. There is no corresponding MaxHits parameter in the configuration file.
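The rules for interpreting the -m value can be summarized in a short Python sketch. This restates the paragraph above and is not taken from the swish source; the function name max_results is hypothetical, and None is used here to stand for "no limit".

```python
DEFAULT_MAX_RESULTS = 50

def max_results(value=None):
    """Return the result limit for a -m argument, or None for no limit."""
    if value is None:
        # -m given without a number: fall back to the default.
        return DEFAULT_MAX_RESULTS
    if value == "all" or int(value) == 0:
        # 0 and the string "all" both mean unlimited results.
        return None
    return int(value)

print(max_results(), max_results("all"), max_results("10"))
```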

-i directory file ... (files to index)

This specifies the directories and/or files to index. Directories will be indexed recursively. This overrides IndexDir in the configuration file.

There is no default.

-c configfile ... (configuration file)

This specifies the configuration file to use for indexing or searching. You can use this as the only option to swish to do automatic indexing, if all the necessary variables are set in the configuration file.

Many parameters in the configuration file may also be overridden by other command line options.

You can specify multiple configuration files in order to split up common preferences. For instance, you might store a file with the stopwords in it and have multiple other files that have different index file information.

  example 1: swish -c swish.conf
  example 2: swish -i /usr/local/www -f index.swish -v -c swish.conf
  example 3: swish -c swish.conf stopwords.conf
Notes on examples:
  1. The settings in swish.conf will be used to index the pages defined therein. Therefore all necessary parameters must be defined in swish.conf .
  2. The command-line options override the corresponding parameters in the configuration file.
  3. The variables in swish.conf will be read, then the variables in stopwords.conf will be read. Note that if the same variables occur in both files, older values may be added to, written over or ignored, depending on the parameter.
You can also use the same configuration file for multiple indexes, by specifying common parameters in the file and differing parameters (such as directories to index and the index file location) on the command line.
  example :
      swish -c /www/httpd/conf/swish/users.conf \
          -f /u/meo/public_html/index.swish  -i /u/meo/public_html
      swish -c /www/httpd/conf/swish/users.conf \
          -f /u/sbo/public_html/index.swish  -i /u/sbo/public_html
These commands generate indexes for two separate users' webs, with all other parameters in common. This requires, of course, an IndexTitle nebulous enough to work for both. Again, you could also use multiple configuration files: one per user and a common file with overall settings.
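When there are many users, the per-user command lines can be generated instead of typed by hand. The Python sketch below builds the same two commands shown in the example; the helper name index_command is made up, and the /u/<user>/public_html layout is simply the one used in the example.

```python
import shlex

SHARED_CONFIG = "/www/httpd/conf/swish/users.conf"

def index_command(user):
    """Build the swish argument list that indexes one user's web pages."""
    web_dir = f"/u/{user}/public_html"
    return ["swish", "-c", SHARED_CONFIG,
            "-f", f"{web_dir}/index.swish", "-i", web_dir]

for user in ("meo", "sbo"):
    # Print each command as a shell-quoted line; nothing is executed here.
    print(shlex.join(index_command(user)))
```

Each list could be passed to subprocess.run to actually start the indexing run, assuming swish is on the PATH.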

-f indexfile1 (index file to create)
-f indexfile1 indexfile2 ... (index file[s] to search)

If you are indexing, this specifies the file to save the generated index in, and you can only specify one file. If you are searching, this specifies the index files (one or more) to search from. The default index file is index.swish in the current directory.

-l (symbolic links)

Specifying this option tells swish to follow symbolic links when indexing. This overrides the FollowSymLinks configuration file parameter. The default is no.

-e (emphasize words in comments)

Specifying this option tells swish to emphasize words found in HTML comments when indexing. This means that words in comments will have a heavier weight when determining a file's score for the word[s]. This overrides the EmphasizeComments configuration file parameter. The default is no.

-E (emphasize words in META tags)

Specifying this option tells swish to emphasize words found in META tags when indexing. This means that words in META tags will have a heavier weight when determining a file's score for the word[s]. This overrides the EmphasizeMetaTags configuration file parameter. The default is yes.

-M indexfile1 indexfile2 indexfile3... (index merging)

This allows you to merge two or more index files - the last file you specify on the list will be the output file. Merging removes all redundant file and word data. To estimate how much memory the operation will need, sum up the sizes of the files to be merged and divide by two. That's about the maximum amount of memory that will be used. You can use the -v option to produce feedback while merging and the -c option with a configuration file to include new administrative information in the new index file.

There are no defaults for this option.
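The memory rule of thumb for merging is easy to compute before starting a merge. The Python sketch below sums the sizes of the input index files and halves the total; the function name estimate_merge_memory is hypothetical, and the demonstration uses stand-in files of known size rather than real swish indexes. Only the files to be merged are passed in, not the output file.

```python
import os
import tempfile

def estimate_merge_memory(index_paths):
    """Rule of thumb: total size of the input indexes, divided by two."""
    return sum(os.path.getsize(path) for path in index_paths) // 2

# Demonstration with two stand-in files (not real swish indexes).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, size in (("a.swish", 3000), ("b.swish", 5000)):
        path = os.path.join(tmp, name)
        with open(path, "wb") as handle:
            handle.write(b"\0" * size)
        paths.append(path)
    estimate = estimate_merge_memory(paths)

print(estimate)  # 4000 bytes for 3000 + 5000 bytes of input
```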

-D [indexfile] (decode)

This option is provided so you can check the word, file, and maintenance information in index files. You can specify multiple files to decode.

There is no default for this option.

-v [number] (verbosity option)

The -v option can take a numerical value from 0 to 3. Specify 0 for completely silent operation and 3 for detailed reports. This overrides the IndexReport configuration file parameter. If no number is specified, the verbosity is set to 0.

verbosity 0 - silent running

Level 0 is silent for normal operation. Only errors are reported.

verbosity 1 - normal details

Level 1 lets you know the bare statistics, as in the following (real) example from http://www.rru.com/ :

Removing very common words... no words removed.
Writing main index... 10611 unique words indexed.
Writing file index... 408 files indexed.
Running time: 15 seconds.
Indexing done!

verbosity 2 - directory-level commentary

Level 2 is the same as level 1, except that as swish traverses the directories to be indexed, it reports each directory it enters. Here's a partial, real-life example from the same website:

Checking dir "/www/pages/rru"...
Checking dir "/www/pages/rru/Images"...
Checking dir "/www/pages/rru/RNN"...
Checking dir "/www/pages/rru/RNN/Morgue"...
Checking dir "/www/pages/rru/RNN/NS"...
Checking dir "/www/pages/rru/RNN/Propoganda"...

verbosity 3 - detailed commentary

Level 3 gives all the information of level 2, plus commentary as it indexes each file, telling how many unique words it found in that file. Here's another partial, real-life example from the same website:

In dir "/www/pages/rru/RNN":
  headlines.html (168 words)
  index.html (202 words)
  morgue.html (270 words)

In dir "/www/pages/rru/RNN/Spumoni":
  current.html (624 words)
  current.txt (740 words)

In dir "/www/pages/rru/RNN/Spumoni/Extra":
  staff.html (51 words)
  subs.html (59 words)

In dir "/www/pages/rru/RNN/Spumoni/Morgue":
  001.html (755 words)
  002.html (288 words)
  003.html (705 words)
  004.html (309 words)
  005.html (1124 words)
  005s.html (397 words)
  007a.html (425 words)
  008.html (559 words)
  009.html (811 words)
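
A level 3 report like the one above is regular enough to post-process. The Python sketch below adds up the per-file counts for each directory; it assumes the report looks exactly like the sample shown here, and the function name summed_word_counts is hypothetical. Since each count is the number of unique words in one file, the sum is a rough size indicator and not the number of unique words in the directory.

```python
import re

DIR_LINE = re.compile(r'^In dir "(.+)":$')
FILE_LINE = re.compile(r'^\s+(\S+) \((\d+) words\)$')

def summed_word_counts(report):
    """Return {directory: sum of the per-file word counts listed under it}."""
    totals = {}
    current = None
    for line in report.splitlines():
        dir_match = DIR_LINE.match(line)
        if dir_match:
            current = dir_match.group(1)
            totals[current] = 0
            continue
        file_match = FILE_LINE.match(line)
        if file_match and current is not None:
            totals[current] += int(file_match.group(2))
    return totals

# Excerpt copied from the sample report above.
report = '''In dir "/www/pages/rru/RNN":
  headlines.html (168 words)
  index.html (202 words)
  morgue.html (270 words)

In dir "/www/pages/rru/RNN/Spumoni/Extra":
  staff.html (51 words)
  subs.html (59 words)
'''
totals = summed_word_counts(report)
print(totals)
```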

-V (version option)

The -V option causes swish to spit out its version number. The result looks like this:
  swish 1.2.1


Last update: 18/Aug/1998