Some years ago, well before the surge in popularity
	  of web search engines, I used to be interested in
	  textual databases, which is what web search engines
	  use as backends. At that time these were largely based
	  on the
	  Z39.50
	  protocol for textual database queries, and I used
	  FreeWAIS
	  for indexing my e-mail and news archives. It used to
	  be that only information retrieval people knew about
	  WAIS and similar tools, and they fell a bit into
	  obscurity while various people started applying
	  information retrieval to web searching, and I stopped
	  using FreeWAIS, relying more on archiving by topic.
	The popularity of web-based index searching has made
	  textual databases a current topic again, and this has
	  resulted in several new WAIS-like projects, if done
	  a bit amateurishly at times. Anyhow I wanted to start
	  using one to index e-mail and saved web pages, so I
	  had a look at several of them and briefly tried
	  some.
	The ones I tried are:
	
	  - Strigi
- This is the
	    KDE
	    default indexer; I tried version 0.5.11 under
	    KDE 3.5, and it is based on Lucene. The good points
	    are that it is fairly UNIXy, with decent command
	    line support and an interesting idea for a query
	    language. Well, it sort of worked, sometimes; but
	    it looped all the time, would get stuck, and the
	    communication between the GUI client and the indexer
	    and searcher was unreliable.
- Google Desktop for Linux
- An interesting package, a bit old, but it seemed to
	    mostly work. It seems to have comprehensive format
	    support, and its searches seem to be quite good. But
	    it has three big issues:
	    
- It is a closed-source application with
		virtually no documentation, and the GNU/Linux
		version seems to be a low priority. It could
		easily become abandonware.
- It is extremely un-UNIXy, as it has no command
		line options; once launched, all use and control
		is via a web interface, and all its files are
		binaries in an unknown format.
- In order to reduce system load while indexing,
		it indexes very slowly even when there is no
		load on the system.
 and a showstopper: if the system crashes while
	    it is indexing, something gets corrupted and the
	    crawler subprocess dies repeatedly. One can recover
	    by saving the "repo" directory and then
	    recreating the containing "desktop" directory, but
	    that is rather inconvenient.
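	The recovery steps above can be sketched as a small
	  shell function. The layout (a "repo" subdirectory
	  inside a "desktop" directory) follows the description
	  above, but since Google Desktop for Linux is
	  essentially undocumented, the actual path on a given
	  system is an assumption:

```shell
# Hedged sketch of the recovery procedure: save the index
# repository, recreate the containing "desktop" directory, and
# move the repository back.  The directory layout is taken from
# the description above; the real path is an assumption.
recover_gd_index() {
    desktop_dir="$1"                  # e.g. ~/.google/desktop (assumed)
    saved="$desktop_dir.repo-saved"
    mv "$desktop_dir/repo" "$saved"   # save the index repository
    rm -rf "$desktop_dir"             # discard the corrupted state
    mkdir -p "$desktop_dir"           # recreate the containing directory
    mv "$saved" "$desktop_dir/repo"   # restore the repository
}
```

	Something like recover_gd_index ~/.google/desktop would
	  then rebuild the containing directory while keeping
	  the index, though again that path is only a guess.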
- Beagle
- I have only very lightly tested Beagle, and it
	    seems fairly nice, with the rather negative point for
	    me that it is written for the Microsoft CLR. But it
	    has a showstopper too: Beagle keeps a cache of
	    indexed documents, and this cache contains a very
	    large number of files under 2KiB in size, which makes
	    things like backups very, very slow.
- recoll
- This is a very nice package indeed. It is very
	    UNIXy, with good command line support, sensible
	    configuration files, and even
	    extensive documentation.