Strus documentation

Project organization

The Strus universe is implemented as several subprojects hosted on GitHub. Here is a list of them:

  1. strusBase implements helper functions as the common code base. It also defines the error buffer interface for buffering exception error messages. This interface is needed because exceptions cannot be thrown across the borders of dynamically linked libraries and modules. All other Strus projects depend on strusBase.
  2. strus provides the interface to the storage and the query processing of the search engine. It also provides the default key value store database connector (to LevelDB).
  3. strusAnalyzer provides document and query analysis for transforming content into retrievable items.
  4. strusTrace implements all methods of the strus analyzer and core as a proxy that logs the calls made through a specified interface. The whole mechanism is implemented as a separate aspect without touching any code of the core or the analyzer. The produced trace can be visualized as a call tree, or the logs can be processed as a readable dump.
  5. strusPattern provides an implementation of the analyzer pattern matching interface with a lexer based on the Intel hyperscan library.
  6. strusVector provides an implementation of the strus vector storage interface with a nearest-vector search implemented with brute force LSH (Locality Sensitive Hashing).
  7. strusModule provides the loading of search engine components from dynamically loadable modules. (Depends on strus and strusAnalyzer)
  8. strusRpc provides a proxy interface for strus objects residing on a server via RPC. If you want to use strus in a web server context, where loading modules by an instance other than the web server itself is not allowed or at least not recommended, you should access strus via RPC or a similar mechanism. (Depends on strus, strusAnalyzer and strusModule)
  9. strusUtilities provides some command line programs to access the search engine. (Depends on strusModule and strusRpc)
  10. strusBindings provides language bindings to use strus with other programming languages like PHP, Python, Java, etc. (Depends on strusModule and strusRpc)
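
The error buffer idea mentioned for strusBase can be illustrated with a minimal sketch. It is not the actual strusBase interface: the class `ErrorBuffer` and the functions `parsePositiveNumber` and `exampleCaller` are hypothetical names chosen for this illustration. The sketch assumes that every function exported by a library catches all exceptions at its border, writes the message into a fixed-size buffer, and returns a neutral value, so that the caller checks the buffer instead of catching an exception.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical error buffer (not the strusBase class): keeps the message of
// the last reported error in a fixed-size array, so reporting never allocates.
class ErrorBuffer
{
public:
    void report(const char* msg) noexcept
    {
        std::snprintf(m_msg, sizeof(m_msg), "%s", msg);
        m_hasError = true;
    }
    bool hasError() const noexcept { return m_hasError; }

    // Return the stored message and reset the buffer.
    std::string fetchError()
    {
        std::string rt(m_msg);
        m_msg[0] = '\0';
        m_hasError = false;
        return rt;
    }

private:
    char m_msg[256] = {0};
    bool m_hasError = false;
};

// Hypothetical function exported by a dynamically linked library.
// Every exception is caught at the library border and reported via the buffer.
int parsePositiveNumber(const std::string& src, ErrorBuffer& errbuf) noexcept
{
    try
    {
        int value = std::stoi(src);
        if (value <= 0) throw std::runtime_error("number is not positive");
        return value;
    }
    catch (const std::exception& err)
    {
        errbuf.report(err.what());
        return 0;
    }
    catch (...)
    {
        errbuf.report("unknown error");
        return 0;
    }
}

// Caller side: inspect the buffer after the call instead of using try/catch.
void exampleCaller()
{
    ErrorBuffer errbuf;
    int value = parsePositiveNumber("-5", errbuf);
    assert(value == 0 && errbuf.hasError());
    std::string msg = errbuf.fetchError();
    assert(msg == "number is not positive");
    assert(!errbuf.hasError());
}
```

The fixed-size message array is a design choice of this sketch: reporting an error then cannot itself fail with an out-of-memory exception.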

Interface documentation

Strus provides two classes of interfaces with contrasting objectives. One is the C++ interface, which is the base of the implementation. The other is the interface for language bindings, whose implementations wrap the C++ interface.

C++ interfaces

Language bindings

The language binding interfaces are for calling strus functions from other programming languages. The following list points to the bindings for the languages supported so far:

PHP, Java, Python

Functions documentation

When writing an application with Strus, you have various kinds of functions at hand. You can write your own functions in C++ and load them dynamically into your application, but strus also ships with many predefined functions. You find a complete list of the built-in functions of the core and the analyzer here.

Command line tools documentation

You do not need the command line tools of strus, because all functionality is accessible through the API.
However, many command line tools exist that help with accessing and maintaining a strus storage. A list of the standard command line tools and their documentation can be found here (utilities).

Developer documentation

Programming guidelines

The programming guidelines that contributors should respect can be found here. Suggestions for strengthening these rules are welcome.

Writing Strus extension modules in C++

Strus core

The Strus core can be extended with dynamically loadable modules that provide weighting functions, summarizers and posting join operators written in C++. This CodeProject article describes the expandability of Strus and tries to explain the basic models and concepts used in the query evaluation.
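
To give a rough idea of what a posting join operator does conceptually, here is a minimal sketch. It is not the Strus extension interface: the functions `joinIntersect` and `exampleJoin` are hypothetical, and a posting list is assumed to be simply a vector of document numbers sorted in ascending order. The operator joins the posting lists of two terms into the list of documents that contain both.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical join operator (not the Strus API): returns the document
// numbers that occur in both posting lists.
// Assumption: both inputs are sorted in ascending order without duplicates.
std::vector<int> joinIntersect(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> rt;
    std::size_t ai = 0, bi = 0;
    while (ai < a.size() && bi < b.size())
    {
        if (a[ai] < b[bi])
        {
            ++ai;               // document only in a, skip it
        }
        else if (b[bi] < a[ai])
        {
            ++bi;               // document only in b, skip it
        }
        else
        {
            rt.push_back(a[ai]); // document in both lists
            ++ai;
            ++bi;
        }
    }
    return rt;
}

// Usage: documents 3 and 8 contain both terms.
void exampleJoin()
{
    std::vector<int> termA = {1, 3, 5, 8};
    std::vector<int> termB = {3, 4, 8, 9};
    std::vector<int> both = joinIntersect(termA, termB);
    assert((both == std::vector<int>{3, 8}));
}
```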

Strus analyzer

You can also extend the Strus analyzer with your own dynamically loadable modules containing functions written in C++. A CodeProject article about the expandability of the Strus analyzer is planned. For now, here is a short list of components you can write as dynamically loadable analyzer modules for Strus:

  1. Segmenter
    You can define your own segmenters for the document formats you need to process.
  2. Tokenizer
    You can define your own tokenizers splitting the document segments into tokens.
  3. Normalizer
    You can define your own normalizer functions to produce the retrievable items from the document tokens for the storage and the query.
  4. Aggregator
    You can define your own aggregator functions to produce some statistical values from the document structure after analysis.
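
How the four component types listed above could work together can be sketched as follows. This is only an illustration under stated assumptions, not the Strus analyzer API: all type and function names (`Segmenter`, `Tokenizer`, `Normalizer`, `Aggregator`, `analyze` and the example components) are hypothetical, and each component is modeled as a plain function.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical component types (not the Strus interfaces).
using Segmenter  = std::function<std::vector<std::string>(const std::string&)>;
using Tokenizer  = std::function<std::vector<std::string>(const std::string&)>;
using Normalizer = std::function<std::string(const std::string&)>;
using Aggregator = std::function<double(const std::vector<std::string>&)>;

// Segmenter: split a plain-text document into its non-empty lines.
std::vector<std::string> lineSegmenter(const std::string& doc)
{
    std::vector<std::string> rt;
    std::istringstream in(doc);
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty()) rt.push_back(line);
    }
    return rt;
}

// Tokenizer: split a segment into whitespace-separated tokens.
std::vector<std::string> wordTokenizer(const std::string& segment)
{
    std::vector<std::string> rt;
    std::istringstream in(segment);
    std::string token;
    while (in >> token) rt.push_back(token);
    return rt;
}

// Normalizer: convert a token to lowercase to get the retrievable term.
std::string lowercaseNormalizer(const std::string& token)
{
    std::string rt(token);
    std::transform(rt.begin(), rt.end(), rt.begin(),
        [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return rt;
}

// Aggregator: a statistical value of the document, here the number of terms.
double termCountAggregator(const std::vector<std::string>& terms)
{
    return static_cast<double>(terms.size());
}

struct AnalyzedDocument
{
    std::vector<std::string> terms;
    double aggregate = 0.0;
};

// Run the components in order: segment, tokenize, normalize, then aggregate.
AnalyzedDocument analyze(const std::string& doc, const Segmenter& segmenter,
    const Tokenizer& tokenizer, const Normalizer& normalizer,
    const Aggregator& aggregator)
{
    AnalyzedDocument rt;
    for (const std::string& segment : segmenter(doc))
    {
        for (const std::string& token : tokenizer(segment))
        {
            rt.terms.push_back(normalizer(token));
        }
    }
    rt.aggregate = aggregator(rt.terms);
    return rt;
}

// Usage: two lines with two words each yield four lowercase terms.
void exampleAnalyze()
{
    AnalyzedDocument res = analyze("Hello World\nStrus Analyzer",
        lineSegmenter, wordTokenizer, lowercaseNormalizer, termCountAggregator);
    assert(res.terms.size() == 4);
    assert(res.terms[0] == "hello");
    assert(res.aggregate == 4.0);
}
```

Passing the components as parameters mirrors the idea that each of them can be replaced independently by one of your own.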