February 24, 2000

January 12, 2000

uri-2.7 is available.

Renamed the uri struct member to _uri because some compilers reject the original name as a name clash.

January 3, 2000

mifluz-0.11

Several bug fixes, speedups, and code cleanups. Added the possibility to monitor what's going on inside the indexing. Preparing for full scale, real-world tests.

December 16, 1999

mifluz-0.10, webbase-5.6 and uri-2.6 are available.

This set of versions must be used together. See each product page for more information on the modifications. We've fixed memory leaks, configuration errors and bugs.

December 09, 1999

mifluz-0.9 is available.

A new compression algorithm was implemented. It reduces the index size by a factor of 8 compared to an uncompressed index. It works in the same context as the previously implemented compression (it compresses/uncompresses pages within Berkeley DB when they are written to or read from the db file), but the compression algorithm is specifically designed for compressing DB pages (the previous compression used zlib). Since pages are generally full of redundant data this can achieve good compression ratios.
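To make the idea of page-level compression concrete, here is a minimal sketch of the earlier zlib-based approach: a page is compressed just before it is written and uncompressed just after it is read. It does not reproduce the new DB-specific algorithm, and it does not use Berkeley DB at all; the page size, the entry layout and the function names are assumptions made for illustration.

```python
import zlib

PAGE_SIZE = 4096  # hypothetical page size, chosen only for this sketch


def write_page(page: bytes) -> bytes:
    """Compress a page just before it would be written to the db file."""
    return zlib.compress(page)


def read_page(stored: bytes) -> bytes:
    """Uncompress a page just after it has been read from the db file."""
    return zlib.decompress(stored)


# Build a page full of similar-looking index entries (made-up layout),
# padded with zero bytes up to the page size.
entries = b"".join(b"word%04d:doc%04d;" % (i % 8, i) for i in range(200))
page = entries.ljust(PAGE_SIZE, b"\0")

stored = write_page(page)
print(len(page), "bytes before,", len(stored), "bytes after compression")
```

Because the entries repeat the same prefixes over and over, even a general purpose compressor shrinks the page noticeably; an algorithm that knows the page layout can be expected to do better.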

December 8, 1999

Search-Mifluz-0.01 is available.

This is the pre-release version of the Perl interface to mifluz. It was generated using SWIG. We had to patch SWIG in order to achieve proper package encapsulation. The patches will be integrated in the next SWIG version but at present they are included in the Search-Mifluz distribution.

The release of Search-Mifluz was also the opportunity to use SourceForge as a repository for the project. SourceForge provides all facilities available on Senga for OpenSource projects. If we're satisfied with SourceForge for Search-Mifluz, we will consider moving all the products there. It's much easier to contribute to a shared source distribution environment than to deal with it on our own :-)

December 7, 1999

webbase-5.5 is available.

In this minor maintenance release we've fixed a few leaks and a memory overrun. It has been tested on a set of 150 000 URLs, some of them containing really weird data.

November 29, 1999

webbase-5.4 is available.

The most important thing is that many memory leaks have been removed. The crawler has been extensively tested (around 2 million URLs crawled on 150 000 different web sites). The mifluz full text indexing library is now integrated. It generates very big indexes at present but will improve dramatically next week thanks to Marcel Bosc. For more information on this subject refer to the mifluz mailing list and the htdig3-dev mailing list (on htdig). The hook to the full text indexing library is located in the new hooks library.

In order to definitely fix the problems related to long URLs, the url field is now a text field. To resolve the indexing issue, a field was added to the url and start table: url_md5. Following the same idea, the directory tree that contains the temporary copies of the pages (WLROOT) now contains cryptic MD5 based file names. It's activated by default with the version 2.4 of the uri library.
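As an illustration of why an MD5 digest helps here, the sketch below derives a fixed-length key from a URL of any length and builds a file name from it. This is not the webbase or uri library code; the function names and the two-level directory layout are hypothetical, and the real layout under WLROOT may differ.

```python
import hashlib
from pathlib import PurePosixPath


def url_md5(url: str) -> str:
    """Return a 32-character hex digest, whatever the length of the URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def cache_path(root: str, url: str) -> PurePosixPath:
    """Build a file name for the temporary copy of a page.

    Hypothetical layout: two levels of fan-out directories taken from the
    start of the digest, then the remaining digest characters.
    """
    digest = url_md5(url)
    return PurePosixPath(root) / digest[:2] / digest[2:4] / digest[4:]


short_url = "http://www.senga.org/"
long_url = "http://www.senga.org/search?" + "&".join(f"p{i}=v{i}" for i in range(500))

for url in (short_url, long_url):
    print(len(url), url_md5(url), cache_path("/var/tmp/wlroot", url))
```

The digest has the same length for both URLs, so it can be indexed like an ordinary fixed-size field while the full URL stays in the text field.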

The MySQL connection functions have been upgraded so that they take a ~/.my.cnf file into account. Always using -user, -password etc. is no longer mandatory.
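The sketch below shows the general idea of falling back to an option file when no command line options are given. It reads the [client] group, which is where MySQL option files conventionally keep user, password and host. The sample values are made up, the helper names are hypothetical, and letting command line values override the file is an assumption about precedence, not a description of what the crawler does.

```python
import configparser

# Made-up content standing in for a ~/.my.cnf file.
SAMPLE = """\
[client]
user = crawler
password = secret
host = localhost
"""


def client_options(text: str) -> dict:
    """Return the options of the [client] group, or {} if it is absent."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    return dict(parser["client"]) if parser.has_section("client") else {}


def connection_settings(file_text: str, cli_args: dict) -> dict:
    """Start from the option file, then apply any explicit command line values."""
    settings = client_options(file_text)
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings


print(connection_settings(SAMPLE, {"user": None, "host": "db.example.org"}))
```

Real option files can contain constructs that configparser does not understand, so this is only good enough to show the lookup order.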

The -schema option was added to crawler; it displays the built-in database schema. It's useful if you want to add fields of your own to the start table.

Thanks to Bertrand Demiddelaer who fixed a timeout problem. Many other small bugs were fixed while testing, refer to ChangeLog for detailed information.

November 05, 1999

mifluz-0.8 is available.

Version 0.7.0 forgot to include the examples subdirectory... Some portability and bug fixes. The docs on the API were extended and some examples were added to help you get started with mifluz.

The storage key (WordKey) class has evolved a bit: accessors for getting numerical fields were added. Input operators for streaming were added to WordKey, WordList, WordReference...

A speed-up that skips useless sequential walking when using partially defined search keys was added, as well as tests.
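One plausible reading of this speed-up can be sketched on a sorted list of keys. Assume keys of the form (word, document, location) and a search key where the word and the location are set but the document is left undefined. Instead of visiting every entry for the word, the walk jumps straight to the next key that could still match. The key layout and the function name are assumptions for the sake of the example; this is not the WordList code.

```python
from bisect import bisect_left


def walk_with_skips(keys, word, location):
    """Find all (word, *, location) keys in a sorted list of unique keys.

    Returns the matches and the number of entries actually visited.
    """
    matches = []
    visited = 0
    i = bisect_left(keys, (word,))  # first entry for this word, if any
    while i < len(keys) and keys[i][0] == word:
        visited += 1
        _, doc, loc = keys[i]
        if loc < location:
            # The wanted location may still follow within the same document.
            target = (word, doc, location)
        else:
            # Either a match or already past it: nothing more can match in
            # this document, so aim at the next document.
            if loc == location:
                matches.append(keys[i])
            target = (word, doc + 1, location)
        i = bisect_left(keys, target, i + 1)
    return matches, visited


keys = sorted([
    ("cat", 1, 1), ("cat", 1, 3), ("cat", 1, 7),
    ("cat", 2, 2), ("cat", 2, 5),
    ("cat", 3, 3),
    ("dog", 1, 3),
])
matches, visited = walk_with_skips(keys, "cat", 3)
print(matches, "found after visiting", visited, "of 6 entries for the word")
```

With long runs of entries per document, the binary search jump avoids most of the sequential walking that a plain scan would do.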

The use of the (important) WordList::Walk method was simplified.

October 12, 1999

mifluz-0.6 is available.

After two months of maturation and coding, the first working version of mifluz-0.6 is finally available. It is in alpha stage but we strongly believe that the architectural choices are appropriate and will allow mifluz to reach maturity rapidly. It provides very little functionality and is merely an inverted index manipulation library. It knows nothing about parsing documents or displaying search results.
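For readers unfamiliar with the term, an inverted index maps each word to the places where it occurs. The toy class below is only meant to show that shape; it keeps everything in memory and has nothing to do with the mifluz API. The class and method names are hypothetical. Note that the caller supplies the words, since splitting documents into words is outside the scope of such a library.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: word -> list of (document, position)."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, word, doc_id, position):
        """Record one occurrence of a word; the caller has already parsed the document."""
        self._postings[word].append((doc_id, position))

    def occurrences(self, word):
        """Return every (document, position) pair for the word, in sorted order."""
        return sorted(self._postings.get(word, []))

    def documents(self, word):
        """Return the sorted list of documents that contain the word."""
        return sorted({doc for doc, _ in self._postings.get(word, [])})


index = InvertedIndex()
for doc_id, text in [(1, "free software search"), (2, "search engine software")]:
    for position, word in enumerate(text.split()):
        index.add(word, doc_id, position)

print(index.documents("search"), index.occurrences("software"))
```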

We worked very closely with the Ht://dig Group and Berkeley DB staff. mifluz-0.6 is used in the 3.2 version of Ht://dig (or mifluz-0.6 is a packaging of the Ht://dig indexing library, depending on your point of view :-). We implemented a transparent compression layer in Berkeley DB 2.2.7 that will (maybe) be included in future releases of Berkeley DB.

A new developer, Marcel Bosc (bosc@senga.org), joined Senga two days ago. He will eventually take over on mifluz. The work required is huge and having someone working full time on this subject is great news. The immediate future is to integrate mifluz with the crawler and Catalog.

September 7, 1999

Catalog-1.01 is available.

This is a maintenance release.

  • Various bug fixes. All easy to fix bugs have been fixed. Take a look at bugzilla to see what hasn't been fixed.
  • The _PATHTEXT_ and _PATHFILE_ tags syntax has been extended to specify a range of path components.
  • Graham Barr added a recursive template feature for a catalog root page. This makes it possible to show sub-categories of the root categories in the root page of a catalog.

Don't hesitate to submit bugs or ideas to bugzilla. Hopefully the next version of Catalog will have a fast full text indexing mechanism and I'll be able to implement new functionalities.

Have fun !

July 13, 1999

The first releases of the URI manipulation C library (uri) and the internet crawler C library (webbase) are available. These two libraries are core components of our search engine. One would say : what ? another internet crawler ? we already have dozens ! Of course there is a difference with this one : it is able to efficiently crawl millions of URLs. The crawler information is stored in a MySQL database.

July 6, 1999

The whole www.senga.org site has been restructured. It now contains general information about Senga, at the home page level. The top level menu on the left gives access to the bug tracking system for all the products (Bug Track) and a catalog of resources that we use for development (Links). The Products page points to all the products or development projects at Senga. This is where you will find Catalog.

July 3, 1999

Catalog-1.00 is available.

This release includes PHP3 code to display a catalog. The author is Weston Bustraan (weston@infinityteldata.net). The main motivation to jump directly to version 1.00 is to avoid version number problems on CPAN.

July 2, 1999

Catalog-0.19 is available.

This is a minor release. The most noticeable addition is the new search mechanism.

  • Searching : two search modes are now available. AltaVista simple syntax and AltaVista advanced syntax. Both use the Text-Query and Text-Query-SQL perl modules.
  • Dmoz loading is much more fault tolerant. In addition it can handle compressed versions of content.rdf and structure.rdf. The comments are now stored in text fields instead of char(255).
  • The template system was extended with the pre_fill and post_fill parameters.
  • Searching associated to a catalog dumped to static pages is now possible using the 'static' mode.
  • Fixed two security weaknesses in confedit and recursive cgi handling.
  • Many sql queries have been optimized.
  • The configuration was changed a bit to fix bugs and to isolate database dependencies.
  • The tests were updated to isolate database dependencies.
  • Fixed numerous minor bugs, check ChangeLog if you're interested in details.

Many thanks to Tim Bunce for his numerous contributions and ideas. He is the architect of the Text-Query and Text-Query-SQL modules; Eric Bohlman and Loic Dachary did the programming.

Thanks to Eric Bohlman for his help on the Text-Query module. He was very busy but managed to spend the time needed to release it.

There is not yet anything usable for full text indexing but we keep working on it. The storage management is now handled by the reiserfs file system thanks to Hans Reiser who is working full time on this. Loic Dachary does his best to get something working, if you're interested go to http://www.senga.org/mifluz/.

For some mysterious reason CPAN lost track of the Catalog name. In order to install Catalog you should use perl -MCPAN -e 'install Catalog::db'. Weird but temporary.

Have fun !

May 26, 1999

There currently are four contributors to Catalog. Here they are:

  • Tim Bunce (Tim.Bunce@ig.co.uk) is working on a commercial project involving Catalog. He fixes bugs, changes the programming interface and has ideas on how to do things.
  • Christophe Le Bars (clb@alcove.fr) is packaging Catalog for Debian.
  • David Walker (dwalker@c-wheeler.agelena.net) is adding Postgres support.
  • Weston Bustraan (weston@infinityteldata.net) works on PHP3 code to display the content of Catalog.

Of course I won't be posting this list on the home page every month. If you want to know who's working on what, you can bookmark the list of assigned tasks.

May 18, 1999

Catalog-0.10 replaces the Catalog-0.9 version published yesterday because of an installation bug that makes it completely unusable except for people upgrading from Catalog-0.5. Thank you for your patience.

May 17, 1999

Catalog-0.10 is available.

This is a maintenance release. We are happy to announce that Catalog is now available at your nearest CPAN mirror. The bug tracking system installed two weeks ago proved very useful. It allows anyone to enter bug reports, ideas and suggestions about Catalog. If you are in need of commercial support on Catalog, two new companies are entering the business : Alcove and Atrid (for details go to the support page).

  • The Bundle::Catalog module has been changed to include Catalog to simplify the installation process.
  • The installation procedure has been simplified a bit and now includes the possibility to re-use an existing configuration and to specify the installation root of MySQL.
  • The dmoz.org loading process is better documented and the interface now clearly explains the loading steps.
  • The Catalog directory containing the documentation is now created by the installation process.
  • Tim Bunce's bug fixes and enhancements have been integrated.
  • A FreeBSD 3.1 section was added to the installation process. The makefiles no longer depend on GNU Make, except for the documentation makefile. We strongly suggest using GNU Make :-)
  • Contribution guidelines and a script have been added (CONTRIBUTIONS file). They provide a framework to easily contribute to the software, using the latest development branch.
  • A memory leak has been found in XML-Parser-2.23. We strongly recommend using XML-Parser-2.22 instead if you manipulate large amounts of data such as dmoz.org.
  • The loading of dmoz.org is now resistant to duplicates in the author section.
  • A bug in the _PATH_ tag handling was fixed. Additional tags have been added to have access to individual path components (_PATH0_, _PATH1_ ...).
  • A first step was made to make the code database independent. There is still some work to be done. If you have experience with Oracle, Informix or Postgres, you could already provide the table definitions and the database configuration procedures.
  • The verbosity of the error messages has been reduced.

For more details on bug fixes you can search the bug tracking system (bugzilla). We are working hard on the full text indexing library. There will be more on this subject very soon.

Have fun !

May 2, 1999

The Bugzilla bug tracking system is installed in http://www.senga.org/bugzilla/. It is used not only to report bugs of Catalog but also to suggest enhancements or new features. Anyone can add an entry, go ahead !

April 19, 1999

Catalog-0.5 is available.

The main features added to this version are:

  • XML external representation of a thematic catalog. This allows easy export and import of existing catalogs. The XML format is a custom one and you could argue that we should have used XML/RDF instead. The lack of tools handling XML/RDF prevented this.
  • A new module has been derived from Catalog to display and manage the dmoz (www.dmoz.org) catalog. This effectively allows anyone to run a mirror of dmoz. The database is only 400 MB for 400 000 URLs and 65 000 categories. Response time is really fast provided you've installed Apache + mod_perl.
  • The Makefiles and installation procedures have been rebuilt from scratch for more flexibility and clarity.
  • A Perl bundle was added to automate the installation of dependent modules. This became really necessary since Catalog now depends on 9 external modules found on CPAN.

Although Catalog was added to CPAN last month, the module list has not been re-generated since then and we are impatiently waiting for it.

A mirror of dmoz.org has been loaded to show that Catalog is able to handle a large number of records and categories.

March 16, 1999

Catalog-0.4 is available.

The main features added to this version are:

  • Intuitive browsing : /cgi-bin/Catalog/Sport/Events/Tennis/ will display the expected category content. This is much more readable than the name=catalog&context=cbrowse&id=3 parameters.
  • Static dump : the whole catalog can be dumped in a directory tree that replicates the category structure. The result may be copied and browsed using only static HTML pages. This can be very convenient if your web site is not cgi-bin enabled.
  • Search function : the thematic catalogs may now be searched in full text. Category names and record contents are searched. The search may be limited only to the category names or only to the record contents.
  • A complete example is installed with Catalog. A chapter was added to the documentation to comment the example. It is a step by step guide to configure the catalogs. The example contains a thematic catalog, a chronological catalog and an alphabetical catalog.
  • Option in configuration files for nph scripts.
  • The configuration generated by Makefile.PL is saved and reused in the config.cache file so that repeated installations do not require answering the same questions multiple times.
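The intuitive browsing feature in the list above boils down to reading the category from the path of the URL instead of from query parameters. Here is a minimal sketch of that mapping, assuming the script is reachable at /cgi-bin/Catalog as in the example URL; the function name is hypothetical and the real code is Perl, not Python.

```python
from urllib.parse import unquote


def category_path(request_path: str, script_prefix: str = "/cgi-bin/Catalog") -> list:
    """Return the category names that follow the script name in a URL path."""
    rest = request_path[len(script_prefix):]
    # Accept only the script itself or something below it, so that a path
    # such as /cgi-bin/Catalogue/... is not mistaken for a category request.
    if not request_path.startswith(script_prefix) or (rest and not rest.startswith("/")):
        raise ValueError(f"{request_path!r} is not below {script_prefix!r}")
    return [unquote(part) for part in rest.split("/") if part]


print(category_path("/cgi-bin/Catalog/Sport/Events/Tennis/"))
```

The resulting list of names can then be resolved to a category one level at a time, starting from the root of the catalog.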

Catalog now depends on the MD5 Perl module. A copy of this module is kept on the www.senga.org download page. We have upgraded the MySQL distribution to 3.22.19 because it is now stable. Some users may have noticed formatting errors in the HTML version of the documentation : they have been fixed.

Two real world uses of Catalog may be seen at Ghana International Trade Fair (english) and Interbat (french). The example delivered with Catalog is also available on www.senga.org for browsing only: a thematic catalog, a chronological catalog and an alphabetical catalog.

Last but not least, the Catalog name space was approved by Perl maintainers and Catalog should appear at your nearest CPAN site in the following weeks.

February 24, 1999

Catalog-0.3 is available.

The main features added to this version are:

  • A new kind of catalog has been added : the chronological catalog. As expected it shows the entries of a catalog ordered by date. That's what you need to add a What's New section to your existing catalog.
  • The context_allow instruction has been added to the sqledit.conf configuration file to allow only a specific set of actions. You must use this instruction if you want to publish a Catalog, otherwise the users will have the ability to alter the catalog by changing the parameters manually in the URL.
  • Fixed a security hole involving eval.
  • The catalog management interface has been improved, allowing editing of category properties, editing of the entries in a category. The display is nicer, graphic buttons are used instead of links.
  • The installation now requires a directory to put the HTML documentation and images used by the catalog management interface. This directory will also be used later on for examples.
  • The tests run when make test is used now cover most of the cases.
  • The documentation has been updated and improved, many typos have been fixed.
  • Some memory leaks have been found/fixed and the processes have a reasonable size when running Apache and mod_perl.
  • The dir file is automatically modified by the installation process if you've chosen to install the info format.
  • New tags are available in all templates : _SCRIPT_ and _HTMLPATH_.
  • A few common errors that may occur when using the catalog management interface in the wrong way now show explicit error messages in an HTML page instead of crashing. That prevents looking in the HTTP logs to find out what was wrong.
  • A mixture of POST and GET in the catalog management interface confused caches. It has been fixed.
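The context_allow instruction mentioned in the list above is an allow-list: only the actions named there are accepted, whatever a visitor types in the URL. The sketch below shows the principle on the context parameter of a query string. It does not reproduce the sqledit.conf syntax or the Catalog code; the function name is hypothetical, cedit is a made-up action name, and rejecting requests without a context is a choice made for this sketch.

```python
from urllib.parse import parse_qs

# Hypothetical allow-list for a published, read-only catalog.
ALLOWED_CONTEXTS = {"cbrowse"}


def is_allowed(query_string: str, allowed=ALLOWED_CONTEXTS) -> bool:
    """Accept a request only if it names exactly one context and it is allowed."""
    contexts = parse_qs(query_string).get("context", [])
    return len(contexts) == 1 and contexts[0] in allowed


print(is_allowed("name=catalog&context=cbrowse&id=3"))
print(is_allowed("name=catalog&context=cedit&id=3"))
```

Listing what is permitted, rather than what is forbidden, means that an action added later stays closed to the public until someone decides otherwise.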

Since a subtle bug was found in mysql-3.22.8-beta, we have switched to the latest version, mysql-3.22.16a-gamma. At the same time we've upgraded the DBI version used and mysql module. Those upgrades are not mandatory.

Catalog now uses the Test module to run tests. This requires perl 5.005. If you were running perl 5.004 (native on RedHat 5.2), you will have to compile perl 5.005. There is no rpm at the moment.

February 10, 1999

Catalog-0.2 is available. It fixes installation problems, the documentation and some bugs.

The installation process has been made simpler by removing the need to set the password and user of the MySQL database after the installation. This was confusing because most people thought it was a fatal error message.

The make test now works with a local invocation of the MySQL daemon to prevent possible corruption of an existing database.

At the request of Lynx users, all images of this site now have alt tags.