Strus query language

Language grammar

The query language used by the command line utility strusQuery has only a few syntax elements, and all of them are optional. Plain text without any operators of the strus query language is always a valid query.

Comments

Comments start with '#' and extend to the end of the line. A '#' can be used as part of a symbol if it appears inside a single- or double-quoted string.
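
The comment rule can be sketched in Python. The function `strip_comments` is a hypothetical helper and is not part of strus. It assumes that a quoted string ends at the next matching quote character, and it ignores escape sequences.

```python
def strip_comments(query: str) -> str:
    """Remove '#' comments that appear outside quoted strings (hypothetical helper)."""
    out = []
    quote = None        # the currently open quote character, or None
    in_comment = False
    for ch in query:
        if in_comment:
            if ch == "\n":
                in_comment = False
                out.append(ch)  # keep the line end; only the comment is dropped
            continue
        if quote:
            out.append(ch)
            if ch == quote:
                quote = None    # closing quote ends the string
        elif ch in ("'", '"'):
            quote = ch          # opening quote: '#' is literal from here on
            out.append(ch)
        elif ch == "#":
            in_comment = True   # comment runs to the end of the line
        else:
            out.append(ch)
    return "".join(out)


print(repr(strip_comments('Hello # greeting\n"C#" World')))  # 'Hello \n"C#" World'
```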

Handling of spaces

Spaces, control characters, and line ends have no meaning in the language.

Case sensitivity/insensitivity

Keywords and identifiers that refer to element types, metadata fields, section names, and feature set identifiers are case insensitive. Whether query terms are case sensitive depends on the query analyzer configuration.
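
Here is a minimal sketch of case-insensitive identifier lookup. It assumes identifiers are compared after Unicode case folding. `NameTable` and the field names "Date" and "Category" are hypothetical. Query terms are deliberately not resolved through such a table, because their case handling is up to the query analyzer.

```python
class NameTable:
    """Resolves identifiers case-insensitively (hypothetical helper)."""

    def __init__(self, names):
        # Map the case-folded form to the name as it was declared.
        self._names = {name.casefold(): name for name in names}

    def resolve(self, identifier):
        """Return the declared name, or None if the identifier is unknown."""
        return self._names.get(identifier.casefold())


metadata_fields = NameTable(["Date", "Category"])
print(metadata_fields.resolve("DATE"))      # Date
print(metadata_fields.resolve("category"))  # Category
print(metadata_fields.resolve("Author"))    # None
```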

Relation between query fields and query terms

The query analyzer maps each query field to one or more query terms. When a query field is used in an expression, the query terms resulting from it are grouped together implicitly, so that the resulting expression still corresponds to the original query expression.

Selection features

If no selection features are specified explicitly, the query parser derives them from the features specified in the query. You can use the operator '~' to exclude a feature from this implicitly defined set of selection features.
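
The rule can be sketched as follows. The sketch assumes the parser simply keeps every feature that is not marked with '~'. The `Feature` class and the `selection_features` function are hypothetical. How strus combines the remaining features into a selection expression is not described here, so the sketch stops at the list of selected features.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    text: str
    excluded: bool = False  # True if the feature was written as '~feature'


def selection_features(features: List[Feature],
                       explicit: Optional[List[str]] = None) -> List[str]:
    """Return the selection features of a query (hypothetical helper).

    Explicitly specified selection features take precedence. Otherwise,
    every feature that is not marked with '~' becomes a selection feature.
    """
    if explicit:
        return list(explicit)
    return [f.text for f in features if not f.excluded]


query = [Feature("Hello"), Feature("World", excluded=True)]
print(selection_features(query))  # ['Hello']
```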

Query elements

A query consists of expressions of query fields, which the query analyzer maps to expressions of query terms. The resulting expressions of query terms are represented internally as trees that can be translated into query instructions, which are sent to the storage for evaluation. The original expressions of query fields are parsed from a query string. Each query field has a type, identified by a name, that determines how the query analyzer processes the field.
In the simplest case, the query string contains no syntax elements and is interpreted as a single query field. The name of its query field type is then determined by the query analyzer program. The default query field names are the names of all search query fields used in the query analyzer program.
If the query analysis is more complex and uses more than one query field, assigning default query field names to plain text queries may no longer make sense.

Syntax elements

If you want to form a query beyond the default case of a single query field, you can use the following syntax elements:

':' TYPE

A colon followed by an identifier <TYPE> assigns the query field type <TYPE> to the preceding phrase or token.
Examples
		Hello:WORD Nature:CATEGORY
	
		Basketball Sports:CATEGORY
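
The following sketch shows how a single token with a type annotation might be split into its text and type. It assumes the token has already been isolated from the query, and that an untyped token falls back to a default type chosen by the caller. `parse_field` and its regular expression are hypothetical. The sketch does not handle a ':' TYPE that applies to a multi-word phrase.

```python
import re
from typing import Optional, Tuple

# An optional quoted or bare term, optionally followed by ':' TYPE.
FIELD_RE = re.compile(r"""^(?P<text>"[^"]*"|'[^']*'|[^\s:"']*)(?::(?P<type>\w+))?$""")


def parse_field(token: str,
                default_type: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Split a token such as 'Nature:CATEGORY' into (text, type)."""
    m = FIELD_RE.match(token)
    if m is None:
        raise ValueError(f"not a query field: {token!r}")
    text = m.group("text")
    if text[:1] in ("'", '"'):
        text = text[1:-1]  # drop the surrounding quotes
    return text, m.group("type") or default_type


print(parse_field("Nature:CATEGORY"))  # ('Nature', 'CATEGORY')
print(parse_field(":SENTDELIM"))       # ('', 'SENTDELIM')
```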
	

'~' FIELD

A field preceded by the operator '~' is not considered a selection feature when the selection features are defined implicitly.
Examples
		Hello ~World
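
Here is a sketch of how the '~' prefix could be recognized in a plain query. It assumes that features are separated by whitespace and that '~' is written directly in front of the feature. `split_features` is a hypothetical helper.

```python
from typing import List, Tuple


def split_features(query: str) -> List[Tuple[str, bool]]:
    """Split a plain query into (feature, excluded) pairs (hypothetical helper)."""
    result = []
    for word in query.split():
        if word.startswith("~"):
            result.append((word[1:], True))   # excluded from implicit selection
        else:
            result.append((word, False))
    return result


print(split_features("Hello ~World"))  # [('Hello', False), ('World', True)]
```

With the assumptions above, only "Hello" remains as an implicit selection feature.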
	

NAME compareop TERM {',' TERM}

An identifier followed by a compare operator (one of '<', '<=', '>', '>=', '=', '==', '!=') and a term <TERM>, or a comma-separated list of terms, specifies a query restriction. <NAME> refers to a metadata field, and <TERM> to an element that the metadata field is compared with. If you specify more than one <TERM>, the restriction condition is true if the metadata field fulfills the condition for at least one of the listed terms.
Examples
		Date <= '3/3/1979'
	
		Category = 'Sports','Politics'
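
The following sketch parses and evaluates a restriction. It rests on several assumptions. The operator set includes both '<' and '>'. The operators '=' and '==' are treated identically. The caller supplies a conversion function for typed metadata such as numbers. `parse_restriction` and `satisfies` are hypothetical names.

```python
import operator
import re
from typing import Callable, List, Tuple

# Assumed operator set; '=' and '==' are both treated as equality here.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "==": operator.eq, "!=": operator.ne,
}

# NAME compareop TERM {',' TERM}; two-character operators are tried first.
RESTRICTION_RE = re.compile(r"^\s*(\w+)\s*(<=|>=|==|!=|<|>|=)\s*(.+?)\s*$")
TERM_RE = re.compile(r"""'([^']*)'|"([^"]*)"|([^,\s]+)""")


def parse_restriction(text: str) -> Tuple[str, str, List[str]]:
    m = RESTRICTION_RE.match(text)
    if m is None:
        raise ValueError(f"not a restriction: {text!r}")
    name, op, rest = m.groups()
    # Each match fills exactly one group: single-quoted, double-quoted, or bare.
    terms = [a or b or c for a, b, c in TERM_RE.findall(rest)]
    # Metadata field names are case insensitive, so normalize them.
    return name.lower(), op, terms


def satisfies(value, op: str, terms: List[str], convert: Callable = str) -> bool:
    """True if `value op term` holds for at least one of the terms."""
    compare = OPS[op]
    return any(compare(value, convert(term)) for term in terms)


name, op, terms = parse_restriction("Category = 'Sports','Politics'")
print(name, op, terms)                    # category = ['Sports', 'Politics']
print(satisfies("Politics", op, terms))   # True
```

Values such as the date '3/3/1979' would need a conversion to a comparable type before they are passed to `satisfies`. Comparing them as plain strings gives wrong results for `<=`.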
	

OP '(' ARG { ',' ARG } { '|' RANGE } { '^' CARDINALITY } ')'

An identifier <OP> followed by an opening parenthesis '(' starts a join of posting sets. The argument features <ARG> are query fields (with or without a field type name) or nested expressions. Arguments are separated by commas. At the end of the argument list, you can optionally add a range and a cardinality specifier. The range specifies the proximity of the terms involved. The cardinality specifies the number of elements needed for a valid result when the operator selects a subset of the posting sets represented by the arguments.
Examples
		within( "War", "Religion" | 30 )
	
		sequence_struct( :SENTDELIM, "painting", "exhibition" | 30 )
	
		sequence_imm( any("John", "Anne"), "Doe" )
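
The join syntax lends itself to a recursive-descent parser. The sketch below is hypothetical and is not the strus parser. It tokenizes with a regular expression and represents fields and joins as dataclasses. It assumes that RANGE and CARDINALITY are non-negative integers and treats any identifier directly followed by '(' as an operator name. It covers only joins and typed or untyped fields, not restrictions or '~'.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Quoted strings, identifiers/numbers, or single punctuation characters.
TOKEN_RE = re.compile(r"""\"[^"]*"|'[^']*'|\w+|[^\s\w]""")
PUNCT = {":", ",", "(", ")", "|", "^"}


@dataclass
class QueryField:
    text: str
    field_type: Optional[str] = None


@dataclass
class Join:
    op: str
    args: List["Node"] = field(default_factory=list)
    range_size: Optional[int] = None
    cardinality: Optional[int] = None


Node = Union[QueryField, Join]


class Parser:
    def __init__(self, text: str):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self, offset: int = 0) -> Optional[str]:
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'!r}, got {tok!r}")
        self.pos += 1
        return tok

    def node(self) -> Node:
        # An identifier directly followed by '(' starts a join.
        tok = self.peek()
        if (tok is not None and tok not in PUNCT and tok[0] not in "\"'"
                and self.peek(1) == "("):
            return self.join()
        return self.query_field()

    def query_field(self) -> QueryField:
        text = ""
        if self.peek() is not None and self.peek() not in PUNCT:
            text = self.take()
            if text[0] in "\"'":
                text = text[1:-1]  # strip the quotes
        field_type = None
        if self.peek() == ":":     # optional ':' TYPE
            self.take(":")
            field_type = self.take()
        return QueryField(text, field_type)

    def join(self) -> Join:
        node = Join(self.take())
        self.take("(")
        node.args.append(self.node())
        while self.peek() == ",":
            self.take(",")
            node.args.append(self.node())
        if self.peek() == "|":     # optional range
            self.take("|")
            node.range_size = int(self.take())
        if self.peek() == "^":     # optional cardinality
            self.take("^")
            node.cardinality = int(self.take())
        self.take(")")
        return node


def parse_expression(text: str) -> Node:
    parser = Parser(text)
    result = parser.node()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return result


print(parse_expression('within( "War", "Religion" | 30 )'))
```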
	

Putting it all together

Finally, here is a query that uses all the syntax elements introduced above:

Example
		Category = 'Sports','Politics'
		Date <= '3/3/1979'
		university ~graduate
		sequence_imm( any("John", "Anne"), "Doe" )