Strus query evaluation configuration source

Language grammar

The following grammar (in EBNF) defines the formal language that the strus utilities (strusUtilities) use to describe a query evaluation scheme.

Comments

Comments start with # and extend to the end of the line. A # can be used as part of a symbol only if it is inside a single- or double-quoted string.
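The comment rule can be illustrated with a minimal Python sketch. The function name strip_comment is hypothetical and not part of strus. The sketch assumes the backslash escaping described for STRING tokens in the grammar, and only shows how a # inside a quoted string is kept while a trailing comment is dropped.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing '#' comment, keeping '#' characters inside quotes."""
    quote = None  # the quote character of the string we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character inside a string
            elif ch == quote:
                quote = None  # closing quote
        elif ch in ("'", '"'):
            quote = ch  # opening quote
        elif ch == "#":
            return line[:i]  # comment starts here, outside of any string
        i += 1
    return line
```

Called on the line `FORMULA "_0 # _1"; # note`, the sketch keeps the quoted # and drops only the trailing `# note`.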

Handling of spaces

Spaces, control characters and line breaks have no meaning in the language.

Case sensitivity/insensitivity

Parameter names (keys) of the query evaluation scheme are case insensitive. Keywords and identifiers referring to elements in the storage are case insensitive.

EBNF

IDENTIFIER     : [A-Za-z][A-Za-z0-9_]*
STRING         : <single or double quoted string with backslash escaping>
NUMBER         : <integer or floating point number in non exponential notation>

config         = statement ";" config
               |
               ;
statement      = evalexpr | scalarexpr | selectexpr | weightexpr | restrictexpr | termdef | summarizeexpr
               ;
evalexpr       = "EVAL" [ NUMBER "*" ] functionname "(" parameterlist ")"
               ;
scalarexpr     = "FORMULA" STRING
               ;
selectexpr     = "SELECT" featureset
               ;
weightexpr     = "WEIGHT" featureset
               ;
restrictexpr   = "RESTRICT" featureset
               ;
termdef        = "TERM" featureset termvalue termtype
               ;
summarizeexpr  = "SUMMARIZE" functionname "(" parameterlist ")"
               ;
functionname   = IDENTIFIER
               ;
featureset     = IDENTIFIER
               ;
termtype       = IDENTIFIER
               ;
termvalue      = IDENTIFIER | STRING
               ;
parameterlist  = parameter { "," parameter }
               |
               ;
parameter      = parametername "=" parametervalue
               ;
parametername  = [ "." ] IDENTIFIER
               ;
parametervalue = IDENTIFIER | STRING | NUMBER
               ;
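
As an illustration of the lexical tokens above, the following Python sketch splits a source text into tokens. It is not the strus implementation. The names TOKEN and tokenize are hypothetical, and the NUMBER pattern assumes unsigned numbers because the grammar does not state whether a sign is allowed.

```python
import re

# One alternative per token class; whitespace and comments carry no group name.
TOKEN = re.compile(r"""
    \s+                                     # whitespace has no meaning
  | \#[^\n]*                                # comment up to the end of the line
  | (?P<IDENTIFIER>[A-Za-z][A-Za-z0-9_]*)
  | (?P<NUMBER>[0-9]+(?:\.[0-9]+)?)         # assumption: unsigned, no exponent
  | (?P<STRING>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<PUNCT>[;,=().*])
""", re.VERBOSE)


def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (token class, text) pairs, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup is not None:  # None for whitespace and comments
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Because strings are consumed as whole tokens, a # inside quotes never reaches the comment alternative.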

Meaning of the grammar elements

functionname

Name of the weighting or summarization function as provided by the query processor.

parametername

Name of a parameter passed to the weighting or summarization function. A parameter name prefixed with a dot '.' declares a feature parameter. The parameter names known to a weighting or summarization function depend on its implementation.
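
The distinction between the two kinds of parameters can be sketched in Python as follows. The function split_parameters is hypothetical and only illustrates the naming rule: names with a leading dot are treated as feature parameters, and names are lowercased because parameter names are case insensitive.

```python
def split_parameters(pairs):
    """Separate feature parameters ('.name') from ordinary parameters.

    pairs is a list of (parametername, parametervalue) tuples as they
    appear in a parameter list. Returns (features, values) as two dicts.
    """
    features, values = {}, {}
    for name, value in pairs:
        if name.startswith("."):
            features[name[1:].lower()] = value  # dot prefix: feature parameter
        else:
            values[name.lower()] = value        # ordinary function parameter
    return features, values
```

For the parameter list `k1=0.75, b=2.1, .match=docfeat`, the sketch puts match into the feature parameters and k1 and b into the ordinary ones.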

EVAL function

Defines a query evaluation function used for weighting.

FORMULA scalar-function

Defines a scalar function used to combine the results of the query evaluation functions into a single result. The variables _0, _1, ... refer to the query evaluation function results in the order of their definition. If no formula is specified, the results are simply added up.
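
How the variables _0, _1, ... map to function results can be sketched in Python. This is an illustration under assumptions, not the strus scalar function parser: the function combine is hypothetical, and Python arithmetic syntax (numbers, + - * /, parentheses) stands in for the scalar function language, whose full syntax is not described here.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def combine(results, formula=None):
    """Combine function results; without a formula they are added up."""
    if formula is None:
        return sum(results)

    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id[:1] == "_" and node.id[1:].isdigit():
            return results[int(node.id[1:])]  # _N is the N-th defined result
        raise ValueError("unsupported formula element")

    return ev(ast.parse(formula, mode="eval").body)
```

With results [2.0, 0.5] and the formula "0.7 * _1 * _0 + 0.3 * _0", the sketch yields approximately 1.3. With no formula it returns the sum 2.5.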

SUMMARIZE function

Defines a summarizer function used for building the results.

SELECT featureset

Defines the feature set used to select the documents to weight.

WEIGHT featureset

Defines the feature set used for weighting.

RESTRICT featureset

Defines the feature set used as restriction.

Example

The following example declares 'selfeat' as the feature set used for selection. All documents containing the feature 'selfeat' will be selected for ranking.
Two query evaluation functions are defined: 'bm25' (result _0) and 'metadata', which yields the value of the meta data element called 'pageweight' (result _1). The FORMULA statement combines the two results into the final weight 0.7 * _1 * _0 + 0.3 * _0.
For presentation of the result we use two summarizers: one extracting the title attribute and one taking the content elements of the best matching phrases.

SELECT selfeat;

EVAL bm25( k1=0.75, b=2.1, avgdoclen=1000, .match=docfeat );
EVAL metadata( name=pageweight );
FORMULA "0.7 * _1 * _0 + 0.3 * _0";

SUMMARIZE title = attribute( name=title );
SUMMARIZE content = matchphrase(
                        type=orig, nof=4, len=60,
                        structseek=40, .struct=sentence, .match=docfeat );