Strus Python Bindings
0.16.0
Patrick P. Frey
Mozilla Public License v. 2.0 (MPLv2)
Python interface for strus, a set of libraries and programs to build a text search engine

strus.Context

Object holding the global context of the strus information retrieval engine

This context object can operate in one of two modes. If you create it without a parameter, the context is local: in a local context you can load modules, define resources, etc. If you create it with a connection string as parameter, all objects created by this context reside on the server (strusRpcServer) addressed by the connection string. In this case, loaded modules and resources are ignored; the modules to use are specified at server startup.

strus.StorageClient

Object representing a client connection to the storage

The only way to construct a storage client instance is to call Context::createStorageClient( config)

strus.StorageTransaction

Object representing a transaction of the storage

The only way to construct a storage transaction instance is to call StorageClient::createTransaction()

strus.Inserter

Object combining a client connection to the storage with a document analyzer, used to insert content that is analyzed before insertion

The only way to construct an inserter instance is to call Context::createInserter( storage, analyzer)

strus.InserterTransaction

Object representing a transaction of the storage for inserting content to be analyzed

The only way to construct an inserter transaction instance is to call Inserter::createTransaction()

strus.VectorStorageSearcher

Object used to search for similar vectors in the collection

The only way to construct a vector storage searcher instance is to call VectorStorageClient::createSearcher( from, to)

strus.VectorStorageClient

Object representing a client connection to a vector storage

The only way to construct a vector storage client instance is to call Context::createVectorStorageClient( config) or Context::createVectorStorageClient()

strus.VectorStorageTransaction

Object representing a vector storage transaction

The only way to construct a vector storage transaction instance is to call VectorStorageClient::createTransaction()

strus.DocumentAnalyzer

Analyzer object representing a program for segmenting, tokenizing and normalizing a document into atomic parts that can be inserted into a storage and retrieved from there.

The only way to construct a document analyzer instance is to call Context::createDocumentAnalyzer( doctype)

strus.QueryAnalyzer

Analyzer object representing a set of functions for transforming a field, the smallest unit in any query language, into a set of terms that can be used to build a query.

The only way to construct a query analyzer instance is to call Context::createQueryAnalyzer()

strus.QueryEval

Query evaluation program object representing an information retrieval scheme for documents in a storage.

The only way to construct a query eval instance is to call Context::createQueryEval()

strus.Query

Query program object representing a retrieval method for documents in a storage.

The only way to construct a query instance is to call QueryEval::createQuery( storage)

strus.Context

Constructor

Parameter

config
(optional) context configuration. If not defined, a context for local mode with its own module loader is created
{'rpc':"localhost:7181"}
{'trace':"log=dump;file=stdout"}
{'threads':12}
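A minimal sketch of choosing between local and remote mode with the configuration keys listed above ('rpc', 'threads'). The helper open_context is hypothetical and not part of the strus API; the Context class is passed in as a parameter (normally strus.Context) so the sketch does not depend on how the module is imported.

```python
def open_context(context_cls, rpc_address=None, threads=None):
    """Create a strus context (hypothetical helper, not part of strus).

    context_cls is expected to be strus.Context.
    """
    config = {}
    if rpc_address is not None:
        # Remote mode: objects created by the context live on the strusRpcServer.
        config['rpc'] = rpc_address
    if threads is not None:
        config['threads'] = threads
    # Without any configuration the context runs in local mode.
    return context_cls(config) if config else context_cls()
```

For example, open_context(strus.Context, rpc_address="localhost:7181") would create a remote context, while open_context(strus.Context) creates a local one.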

loadModule

Load a module

Parameter

name
name of the module to load
"analyzer_pattern"
"storage_vector"

Remarks

Only implemented in local mode with its own module loader (see constructor)

Notes

This function is not thread safe; in a multithreaded context it should only be called during the initialization phase, before calling endConfig.

Examples

loadModule( "storage_vector")

Result

no value returned

addModulePath

Add one or more paths from which to try to load modules

Parameter

paths
a string or a list of module search paths
["/home/bob/modules", "/home/anne/modules"]
"/home/bob/modules"

Remarks

Only implemented in local mode with its own module loader (see constructor)

Notes

This function is not thread safe; in a multithreaded context it should only be called during the initialization phase, before calling endConfig.

Examples

addModulePath( "/home/bob/modules")

Result

no value returned

addResourcePath

Add one or more paths from which to load analyzer resource files

Parameter

paths
a string or a list of resource search paths
["/home/bob/resources", "/home/anne/resources"]
"/home/bob/resources"

Remarks

Only implemented in local mode with its own module loader (see constructor)

Result

no value returned

defineWorkingDirectory

Define the working directory to which files are written

Parameter

path
a string specifying the working directory
"/srv/strus"

Notes

If a working directory is defined, all paths used for writing data must be relative to it

Result

no value returned

endConfig

End the configuration of the context and create the object builders

Parameter

no parameters defined

Remarks

If this function is not called, then the object builders are created on the first request

Result

no value returned
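A local context is typically configured once, before any other objects are created. The following sketch runs that initialization phase with the methods documented above; the helper name configure_local_context is hypothetical, and the paths and module names in the usage note are the example values from this section.

```python
def configure_local_context(ctx, module_paths, modules, resource_paths, workdir=None):
    """Run the configuration phase of a local context (hypothetical helper).

    These calls are not thread safe, so run this before sharing ctx
    between threads.
    """
    ctx.addModulePath(module_paths)      # a string or a list of paths
    for name in modules:
        ctx.loadModule(name)
    ctx.addResourcePath(resource_paths)  # a string or a list of paths
    if workdir is not None:
        ctx.defineWorkingDirectory(workdir)
    # Create the object builders now instead of on the first request.
    ctx.endConfig()
    return ctx
```

Usage, assuming a local context: configure_local_context(strus.Context(), "/home/bob/modules", ["analyzer_pattern", "storage_vector"], "/home/bob/resources", "/srv/strus").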

createStorageClient

Create a storage client instance

Parameter

config
(optional) configuration of the storage client (string or structure with named elements); leave undefined to use the default remote storage of the RPC server

Examples

createStorageClient( )
createStorageClient( "path=/srv/searchengine/storage; metadata=doclen UINT32, date UINT32, docweight FLOAT")
createStorageClient( {'path':"/srv/searchengine/storage", 'metadata':"doclen UINT32, date UINT32, docweight FLOAT"})
createStorageClient( {'path':"/srv/searchengine/storage", 'metadata':"doclen UINT32, date UINT32, docweight FLOAT", 'max_open_files':256, 'write_buffer_size':"4K", 'block_size':"4M", 'cache':"1G"})

Result

storage client interface (class StorageClient) for accessing the storage
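As the examples show, the string form and the structure form of a configuration carry the same information. The hypothetical helper below renders a structure as a config string, for instance for logging. Writing True as "yes" follows the createStorage examples ('acl':True versus acl=yes); writing False as "no" is an assumption.

```python
def storage_config_string(config):
    """Render a structured storage configuration as a config string.

    Hypothetical helper. Keys keep their insertion order; booleans become
    yes/no (the "no" spelling is an assumption).
    """
    parts = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = "yes" if value else "no"
        parts.append(f"{key}={value}")
    return "; ".join(parts)
```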

createVectorStorageClient

Create a vector storage client instance

Parameter

config
(optional) configuration of the vector storage client (string or structure with named elements); leave undefined to use the default remote vector storage of the RPC server

Examples

createVectorStorageClient( )
createVectorStorageClient( "path=/srv/searchengine/vecstorage")
createVectorStorageClient( {'path':"/srv/searchengine/vecstorage"})

Result

vector storage client interface (class VectorStorageClient) for accessing the vector storage

createStorage

Create a new storage (physically) described by config

Parameter

config
storage configuration (string or structure with named elements)

Remarks

Fails if the storage already exists

Examples

createStorage( "path=/srv/searchengine/storage; metadata=doclen UINT32, date UINT32, docweight FLOAT; acl=yes")
createStorage( {'path':"/srv/searchengine/storage", 'metadata':"doclen UINT32, date UINT32, docweight FLOAT", 'acl':True})

Result

no value returned

createVectorStorage

Create a new vector storage (physically) described by config

Parameter

config
storage configuration (string or structure with named elements)

Remarks

Fails if the storage already exists

Examples

createVectorStorage( "path=/srv/searchengine/vecstorage")
createVectorStorage( {'path':"/srv/searchengine/vecstorage"})

Result

no value returned

destroyStorage

Delete the storage (physically) described by config

Parameter

config
storage configuration (string or structure with named elements)

Remarks

Use this function with care: the storage is deleted physically

Notes

Also works on vector storages

Examples

destroyStorage( "path=/srv/searchengine/storage")
destroyStorage( {'path':"/srv/searchengine/storage"})

Result

no value returned

storageExists

Test whether the storage described by config exists

Parameter

config
storage configuration (string or structure with named elements)

Notes

Also works on vector storages; it does not distinguish between the two kinds of storage

Examples

storageExists( "path=/srv/searchengine/storage")
storageExists( {'path':"/srv/searchengine/storage"})

Result

True if the storage with this configuration exists
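Because createStorage fails if the storage already exists, a common pattern is to test first. A minimal sketch with a hypothetical helper, using only the context methods documented above. Two caveats follow from the notes above and from the pattern itself: the check and the creation are not atomic, so concurrent callers can still race; and storageExists does not distinguish storage kinds, so an existing vector storage at the same path also counts as existing.

```python
def open_or_create_storage(ctx, config):
    """Create the storage on first use, then return a client for it.

    Hypothetical helper; ctx is a strus context, config a string or dict.
    """
    if not ctx.storageExists(config):
        # createStorage fails if the storage already exists, so test first.
        ctx.createStorage(config)
    return ctx.createStorageClient(config)
```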

detectDocumentClass

Detect the type of document from its content

Parameter

content
the document content to classify

Examples

detectDocumentClass( "<?xml version='1.0' encoding='UTF-8'?><doc>...</doc>")

Result

the document class
{'mimetype':"application/xml", 'encoding':"UTF-8", 'scheme':"customer"}
{'mimetype':"application/json", 'encoding':"UTF-8"}

createDocumentAnalyzer

Create a document analyzer instance

Parameter

doctype
structure describing the segmenter to use (either document class description structure or segmenter name)
{'mimetype':"application/xml", 'encoding':"UTF-8", 'scheme':"customer"}
{'mimetype':"application/json", 'encoding':"UTF-8"}
{'segmenter':"textwolf"}
"application/json"
"json"

Examples

createDocumentAnalyzer( {'mimetype':"application/xml", 'encoding':"UTF-8"})

Result

document analyzer interface (class DocumentAnalyzer)

createQueryAnalyzer

Create a query analyzer instance

Parameter

no parameters defined

Examples

createQueryAnalyzer( )

Result

query analyzer interface (class QueryAnalyzer)

createQueryEval

Create a query evaluation instance

Parameter

no parameters defined

Examples

createQueryEval( )

Result

query evaluation interface (class QueryEval)

createInserter

Create an inserter based on a storage and a document analyzer

Parameter

storage
storage client to insert into
analyzer
document analyzer to use for preparing the documents to insert

Examples

createInserter( storage, analyzer)

Result

inserter interface (class Inserter)

unpackStatisticBlob

Unpack a statistics blob retrieved from a storage

Parameter

blob
binary blob with statistics to decode (created by StorageClient::getAllStatistics or StorageClient::getChangeStatistics)
procname
(optional) name of statistics processor to use for decoding the message (use default processor, if not defined)
"default"
""

Result

the statistics structure encoded in the blob passed as argument

close

Force cleanup to circumvent object pooling mechanisms in an interpreter context

Parameter

no parameters defined

Result

no value returned

debug_serialize

Debug method that returns the serialization of the arguments as string

Parameter

arg
structure to serialize as string for visualization (debugging)
deterministic
(optional) True if the output should be deterministic

Notes

This function is used to verify that the deserialization of binding language data structures works as expected

Examples

debug_serialize( {'surname':"John", 'lastname':"Doe", 'company':{'name':"ACME", 'url':"acme.com"}})

Result

the input serialization as string
"open name 'surname' value 'John' name 'lastname' value 'Doe' name 'company' open name 'name' value 'ACME' name 'url' value 'acme.com' close close"
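The example output shows how a nested structure is mapped to text: each structure becomes "open ... close", each key becomes name '...', and each string leaf becomes value '...'. The sketch below reimplements only that visible subset in plain Python, which can help when writing expected strings for tests of the bindings. It is a hypothetical illustration, not the library's implementation; lists, numbers and booleans are rejected because their format is not shown here.

```python
def serialize_like_debug(value):
    """Mimic the debug_serialize text format for nested dicts of strings.

    Hypothetical reimplementation of the format shown in the example only.
    """
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            items.append(f"name '{key}'")
            items.append(serialize_like_debug(item))
        return " ".join(["open"] + items + ["close"])
    if isinstance(value, str):
        return f"value '{value}'"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```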

introspection

Introspect a structure starting from a root path

Parameter

path
list of identifiers describing the access path to the element to introspect
["queryproc", "weightfunc"]
["weightfunc"]
["env"]

Result

the structure to introspect starting from the path

enableDebugTrace

Enable the debug trace interface for a named component for the current thread

Parameter

component
name of component to enable debug tracing for

Result

no value returned

disableDebugTrace

Disable the debug trace interface for a named component for the current thread

Parameter

component
name of component to disable debug tracing for

Result

no value returned

fetchDebugTrace

Fetch all debug trace messages of the current thread

Parameter

no parameters defined

Notes

Calling this function clears all messages stored for the current thread

Result

all messages
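Since tracing is enabled per component and per thread, and fetching clears the stored messages, a call can be traced in isolation as sketched below. The helper traced_call is hypothetical, and the component name must be one known to your strus installation (no names are listed in this section).

```python
def traced_call(ctx, component, func, *args):
    """Run func with debug tracing enabled for one component (hypothetical helper).

    Returns the result of func and the trace messages of the current thread.
    """
    ctx.enableDebugTrace(component)
    try:
        result = func(*args)
        # fetchDebugTrace also clears the messages stored for this thread.
        messages = ctx.fetchDebugTrace()
    finally:
        # Disable tracing again even if func raised an exception.
        ctx.disableDebugTrace(component)
    return result, messages
```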

nofDocumentsInserted

Get the number of documents inserted into this storage

Parameter

no parameters defined

Result

the total number of documents
112324
9873247

documentFrequency

Get the number of inserted documents in which a specific feature occurs

Parameter

type
the term type of the feature queried
"WORD"
"stem"
"Name"
term
the term value of the feature queried
"John"
"21314"
"Z0 ssd-qx"

Result

the number of documents in which the argument feature occurs
12321
98

documentNumber

Get the internal document number from the document identifier

Parameter

docid
document identifier
"doc://2132093"
"http://www.acme.com/pub/acme?D=232133"

Result

internal document number or 0, if no document with this id is inserted
892374
1233
0

documentForwardIndexTerms

Get an iterator on the tuples (value,pos) of the forward index of a given type for a document

Parameter

docno
internal local document number
312332
termtype
term type string
"WORD"
pos
(optional) ordinal start position in forward index (where to start iterating)
311

Result

iterator on tuples (value,pos)

documentSearchIndexTerms

Get an iterator on the tuples (value,tf,firstpos) of the search index of a given type for a document

Parameter

docno
internal local document number
123
termtype
term type string
"stem"

Result

iterator on tuples (value,tf,firstpos)

postings

Get an iterator on the set of postings inserted

Parameter

expression
query term expression
["sequence", 10, ["sequence", 2, ["word", "complet"], ["word", "diff"]], ["sequence", 3, ["word", "you"], ["word", "expect"]]]
["word", "hello"]
["sequence", 2, ["word", "ch"], ["number", "13"]]
restriction
(optional) meta data restrictions
[[["=", "country", 12], ["=", "country", 17]], ["<", "year", "2007"]]
["<", "year", "2002"]
start_docno
(optional) starting document number
973141
873

Result

iterator on a set of postings
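Judging from the examples, a query term expression is a nested list: a single term is [type, value], and a structure is [operator, range, argument, argument, ...] where each argument is again an expression ("sequence" and "within" appear in the examples). The two builder functions below are hypothetical conveniences that only assemble such lists; they do not validate operator names.

```python
def term(termtype, value):
    """A single query term, e.g. ["word", "hello"] (hypothetical helper)."""
    return [termtype, value]


def structure(operator, rng, *args):
    """A structural expression: operator, range, then argument expressions."""
    return [operator, rng, *args]
```

For example, storage.postings(structure("within", 5, term("word", "world"), term("word", "money"))) builds the expression ["within", 5, ["word", "world"], ["word", "money"]].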

select

Get an iterator on records of selected elements for matching documents starting from a specified document number

Parameter

what
list of items to select: names of document attributes or meta data or "position" for matching positions or "docno" for the document number
["docno", "title", "position"]
expression
query term expression
["within", 5, ["word", "world"], ["word", "money"]]
["word", "hello"]
["sequence", 2, ["word", "ch"], ["number", "13"]]
restriction
(optional) meta data restrictions
[[["=", "country", 12], ["=", "country", 17]], ["<", "year", "2007"]]
["<", "year", "2002"]
start_docno
(optional) starting document number
973141
873
accesslist
(optional) list of access restrictions (one of them must match)
["public", "devel"]

Result

iterator on the records of the selected elements
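Our reading of the restriction examples is a conjunctive normal form: the outer list is an AND of clauses, a clause is either a single condition [operator, field, value] or a list of conditions combined by OR, and a lone condition may also be passed on its own. The first example would then mean (country = 12 OR country = 17) AND year < 2007. The local evaluator below only illustrates that reading; strus evaluates restrictions inside the storage. Operators other than "=" and "<" are assumptions, since the examples show only those two.

```python
import operator

# Only "=" and "<" appear in the examples; the rest are assumptions.
_OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def _is_condition(node):
    # A condition looks like [op, field, value] with a string operator.
    return isinstance(node, list) and len(node) == 3 and isinstance(node[0], str)


def _as_number(value):
    # The examples mix numbers and numeric strings ("2007").
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def _check(cond, metadata):
    op, field, value = cond
    return _OPS[op](_as_number(metadata[field]), _as_number(value))


def matches(restriction, metadata):
    """Evaluate a CNF restriction against a dict of metadata (illustration only)."""
    if _is_condition(restriction):
        restriction = [restriction]
    for clause in restriction:
        alternatives = [clause] if _is_condition(clause) else clause
        if not any(_check(c, metadata) for c in alternatives):
            return False
    return True
```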

termTypes

Get an iterator on the term types inserted

Parameter

no parameters defined

Result

iterator on the term types

docids

Get an iterator on the document identifiers inserted

Parameter

no parameters defined

Result

iterator on the docids

docid

Get the document identifier associated with a local document number

Parameter

docno
local document number queried
79213
1

Result

the document identifier

usernames

Get an iterator on the user names (roles) used in document access restrictions

Parameter

no parameters defined

Result

iterator on the user names (roles)

attributeNames

Get the list of inserted document attribute names

Parameter

no parameters defined

Result

list of names
["name", "title", "docid"]

metadataNames

Get the list of inserted document metadata names

Parameter

no parameters defined

Result

list of names
["date", "ccode", "category"]

getAllStatistics

Get an iterator on message blobs that encode all statistics of the storage (e.g. feature occurrences and the number of documents inserted)

Parameter

sign
(optional) True = registration, False = deregistration; if False, the sign of all statistics is inverted

Notes

The blobs can be decoded with Context::unpackStatisticBlob

Result

iterator on the encoded blobs of the complete statistics of the storage

getChangeStatistics

Get an iterator on message blobs that encode changes in the statistics of the storage (e.g. feature occurrences and the number of documents inserted)

Parameter

no parameters defined

Notes

The blobs can be decoded with Context::unpackStatisticBlob

Result

iterator on the encoded blobs of the statistic changes of the storage
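The blobs returned by getAllStatistics and getChangeStatistics are meant to be decoded with Context::unpackStatisticBlob. A minimal sketch with a hypothetical helper that decodes them with the default statistics processor; the layout of the decoded structures is not described in this section, so the sketch simply collects them.

```python
def collect_statistics(ctx, storage, changes_only=False):
    """Decode the statistic blobs of a storage client (hypothetical helper)."""
    if changes_only:
        blobs = storage.getChangeStatistics()
    else:
        blobs = storage.getAllStatistics()
    # Each blob is decoded with the default statistics processor.
    return [ctx.unpackStatisticBlob(blob) for blob in blobs]
```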

createTransaction

Create a transaction

Parameter

no parameters defined

Result

the transaction object (class StorageTransaction) created

config

Get the configuration of this storage

Parameter

no parameters defined

Result

the configuration as structure
{'path':"/srv/searchengine/storage", 'metadata':"doclen UINT32, date UINT32, docweight FLOAT"}

configstring

Get the configuration of this storage as string

Parameter

no parameters defined

Result

the configuration as string
"path=/srv/searchengine/storage; metadata=doclen UINT32, date UINT32, docweight FLOAT"

close

Close the storage client

Parameter

no parameters defined

Result

no value returned

introspection

Introspect a structure starting from a root path

Parameter

path
list of identifiers describing the access path to the element to introspect
["config"]
["termtypes"]
["attributenames"]
["metadatanames"]

Result

the structure to introspect starting from the path

insertDocument

Prepare the insertion of a document into the storage

Parameter

docid
the identifier of the document to insert
doc
the structure of the document to insert (analyzer::Document)

Notes

The document is physically inserted when 'commit()' is called

Result

no value returned

deleteDocument

Prepare the deletion of a document from the storage

Parameter

docid
the identifier of the document to delete

Notes

The document is physically deleted when 'commit()' is called

Result

no value returned

deleteUserAccessRights

Prepare the deletion of all document access rights of a user

Parameter

username
the name of the user whose access rights are deleted (in the local collection)

Notes

The user access rights are changed accordingly with the next implicit or explicit call of 'flush'

Result

no value returned

commit

Commit all insert or delete or user access right change statements of this transaction.

Parameter

no parameters defined

Remarks

throws an error on failure

Result

no value returned

rollback

Rollback all insert or delete or user access right change statements of this transaction.

Parameter

no parameters defined

Result

no value returned
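Nothing is written before commit(), and commit() throws an error on failure. A minimal sketch of inserting a batch of already analyzed documents in one transaction, rolling back on any error. The helper name is hypothetical, and calling rollback() after a failed commit() is an assumption; this section only states that commit throws.

```python
def insert_documents(storage, documents):
    """Insert analyzed documents in one transaction (hypothetical helper).

    documents maps document ids to analyzed document structures, for
    example results of DocumentAnalyzer.analyzeSingle().
    """
    transaction = storage.createTransaction()
    try:
        for docid, doc in documents.items():
            transaction.insertDocument(docid, doc)
        # The documents are physically inserted only here.
        transaction.commit()
    except Exception:
        # Assumed to be permitted even if the commit itself failed.
        transaction.rollback()
        raise
```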

createTransaction

Create a transaction

Parameter

no parameters defined

Result

the transaction object (class InserterTransaction) created

insertDocument

Prepare the insertion of a document into the storage

Parameter

docid
the identifier of the document to insert, or an empty string if the document id is extracted by the analyzer
doc
plain content of the document to analyze and insert
documentClass
(optional) document class of the document to insert (autodetected if undefined)

Notes

The document is physically inserted when 'commit()' is called

Result

no value returned

deleteDocument

Prepare the deletion of a document from the storage

Parameter

docid
the identifier of the document to delete

Notes

The document is physically deleted when 'commit()' is called

Result

no value returned

deleteUserAccessRights

Prepare the deletion of all document access rights of a user

Parameter

username
the name of the user whose access rights are deleted (in the local collection)

Notes

The user access rights are changed accordingly with the next implicit or explicit call of 'flush'

Result

no value returned

commit

Commit all insert or delete or user access right change statements of this transaction.

Parameter

no parameters defined

Remarks

throws an error on failure

Result

no value returned

rollback

Rollback all insert or delete or user access right change statements of this transaction.

Parameter

no parameters defined

Result

no value returned
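For large imports it can help to commit in batches rather than holding everything in one transaction. The sketch below opens a fresh inserter transaction for each batch, because this section does not say whether a transaction can be reused after commit(). The helper name and the batch size are hypothetical; when no document class is given, the class is left to autodetection as documented above.

```python
def insert_contents(inserter, contents, batch_size=1000, document_class=None):
    """Analyze and insert raw contents, committing every batch_size documents.

    Hypothetical helper. contents is an iterable of (docid, content) pairs;
    use "" as docid if the analyzer extracts the document id itself.
    """
    transaction, pending = None, 0
    for docid, content in contents:
        if transaction is None:
            transaction = inserter.createTransaction()
        if document_class is None:
            transaction.insertDocument(docid, content)  # class is autodetected
        else:
            transaction.insertDocument(docid, content, document_class)
        pending += 1
        if pending == batch_size:
            transaction.commit()
            transaction, pending = None, 0
    if transaction is not None:
        transaction.commit()
```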

findSimilar

Find the vectors most similar to vec

Parameter

vec
vector to search for (double[])
maxNofResults
maximum number of results to return

Result

the list of most similar vectors (double[])

findSimilarFromSelection

Find the vectors most similar to vec among a selection of features addressed by index

Parameter

featidxlist
list of candidate indices (int[])
vec
vector to search for (double[])
maxNofResults
maximum number of results to return

Result

the list of most similar vectors (double[])
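To illustrate what "most similar" means, here is a brute-force sketch that ranks vectors by cosine similarity, optionally restricted to a list of candidate indices as with findSimilarFromSelection. This is a swapped-in illustration, not a description of the search method strus uses internally, and the (index, similarity) result pairs are our own choice of output shape.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equally long vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar_bruteforce(vectors, vec, max_nof_results, candidates=None):
    """Return up to max_nof_results (index, similarity) pairs, best first.

    candidates optionally restricts the search to a list of feature indices.
    """
    indices = range(len(vectors)) if candidates is None else candidates
    scored = ((idx, cosine(vectors[idx], vec)) for idx in indices)
    return heapq.nlargest(max_nof_results, scored, key=lambda item: item[1])
```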

close

Controlled close to free resources (forces resources to be freed in an interpreter context with a garbage collector)

Parameter

no parameters defined

Result

no value returned

createSearcher

Create a searcher object for scanning the vectors for similarity

Parameter

range_from
start of the range of features for the searcher (makes it possible to split the search across multiple searcher instances)
0
1000000
range_to
end of the range of features for the searcher (makes it possible to split the search across multiple searcher instances)
1000000
2000000

Result

the vector search interface (with ownership)
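The range parameters allow the feature set to be split across several searcher instances. The hypothetical helper below partitions the indices [0, nof_features) into contiguous ranges. It assumes that range_to is exclusive, which the adjoining example ranges (0 to 1000000, then 1000000 to 2000000) suggest but this section does not state.

```python
def searcher_ranges(nof_features, nof_searchers):
    """Split feature indices [0, nof_features) into contiguous ranges.

    Hypothetical helper; assumes range_to is exclusive.
    """
    if nof_features <= 0:
        return []
    step = -(-nof_features // nof_searchers)  # ceiling division
    return [(start, min(start + step, nof_features))
            for start in range(0, nof_features, step)]
```

Usage, assuming a vector storage client: searchers = [client.createSearcher(a, b) for a, b in searcher_ranges(client.nofFeatures(), 4)].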

createTransaction

Create a vector storage transaction instance

Parameter

no parameters defined

Result

the transaction instance

conceptClassNames

Get the list of concept class names defined

Parameter

no parameters defined

Result

the list
["flections", "entityrel"]

conceptFeatures

Get the list of indices of features represented by a learnt concept feature

Parameter

conceptClass
name identifying a class of concepts learnt
"flections"
"entityrel"
""
conceptid
index (indices of learnt concepts starting from 1)
1
121
3249432

Result

the resulting vector indices (index is order of insertion starting from 0)
[2121, 5355, 35356, 214242, 8309732, 32432424]

nofConcepts

Get the number of concept features learnt for a class

Parameter

conceptClass
name identifying a class of concepts learnt.
"entityrel"
""

Result

the number of concept features and also the maximum number assigned to a feature (starting with 1)
0
3535
324325
2343246

featureConcepts

Get the set of learnt concepts of a class assigned to a feature

Parameter

conceptClass
name identifying a class of concepts learnt
"flections"
""
index
index of vector in the order of insertion starting from 0
0
3785
123325
8793246

Result

the resulting concept feature indices (indices of learnt concepts starting from 1)
[2121, 5355, 35356, 214242, 8309732, 32432424]

featureVector

Get the vector assigned to a feature addressed by index

Parameter

index
index of the feature (starting from 0)
0
3785

Result

the vector
[0.08721391, 0.01232134, 0.02342453, 0.0011312, 0.0012314, 0.087232243]

featureName

Get the name of a feature by its index starting from 0

Parameter

index
index of the feature (starting from 0)
0
71243

Result

the name of the feature defined
"castle"

featureIndex

Get the index starting from 0 of a feature by its name

Parameter

name
name of the feature
"castle"

Result

-1 if not found, else the index of the feature (the index is the order of insertion, starting from 0)
-1
52636
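Note the two index conventions above: feature indices start from 0, while concept ids start from 1. The hypothetical helper below combines featureIndex, featureConcepts, conceptFeatures and featureName to list the features that share a learnt concept with a given feature. It handles the -1 that featureIndex returns for unknown names.

```python
def related_feature_names(client, concept_class, name):
    """Names of features sharing a learnt concept with the named feature.

    Hypothetical helper; client is a vector storage client.
    """
    index = client.featureIndex(name)
    if index < 0:  # featureIndex returns -1 for unknown names
        return []
    related = set()
    for concept in client.featureConcepts(concept_class, index):
        for feat in client.conceptFeatures(concept_class, concept):
            if feat != index:
                related.add(client.featureName(feat))
    return sorted(related)
```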

nofFeatures

Get the number of feature vectors defined

Parameter

no parameters defined

Result

the number of features
0
15612336

config

Get the configuration of this vector storage

Parameter

no parameters defined

Result

the configuration as structure
{'path':'storage', 'commit':10, 'dim':300, 'bit':64, 'var':32, 'simdist':340, 'maxdist':640, 'realvecweights':1}

configstring

Get the configuration of this vector storage as string

Parameter

no parameters defined

Result

the configuration as string
"path=storage;commit=10;dim=300;bit=64;var=32;simdist=340;maxdist=640;realvecweights=1"
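The string and structure forms of a configuration correspond to each other, as the config and configstring examples show. A minimal sketch of a hypothetical parser that turns a config string back into a structure, converting integer values. It assumes values contain no ";" and uses no quoting; both hold for the examples in this document.

```python
def parse_config_string(text):
    """Parse 'key=value;key=value' into a dict (hypothetical helper).

    Integer values are converted; everything else stays a string.
    """
    config = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        value = value.strip()
        config[key.strip()] = int(value) if value.lstrip("-").isdigit() else value
    return config
```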

close

Controlled close to free resources (forces resources to be freed in an interpreter context with a garbage collector)

Parameter

no parameters defined

Result

no value returned

addFeature

Add a named feature to the vector storage

Parameter

name
unique name of the feature added
"castle"
"conquest"
vec
vector assigned to the feature
[0.08721391, 0.01232134, 0.02342453, 0.0011312, 0.0012314, 0.087232243]

Result

no value returned

defineFeatureConceptRelation

Assign a concept (index) to a feature referenced by index

Parameter

conceptClass
name of the relation
"entityrel"
featidx
index of the feature
1242321
conidx
index of the concept
32874

Result

no value returned

commit

Commit the transaction

Parameter

no parameters defined

Remarks

throws an error on failure

Result

no value returned

rollback

Roll back the transaction

Parameter

no parameters defined

Result

no value returned
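A minimal sketch of filling a vector storage in one transaction with a hypothetical helper. Before adding each vector, it checks the vector's length against the 'dim' entry of the storage configuration. Reading 'dim' as the vector dimension is an assumption based on the config example ('dim':300), and so is calling rollback() after a failure.

```python
def add_features(client, features):
    """Add named vectors in one transaction (hypothetical helper).

    features maps feature names to lists of floats. If any vector has the
    wrong length or an insert fails, the transaction is rolled back.
    """
    dim = client.config().get('dim')  # assumed to be the vector dimension
    transaction = client.createTransaction()
    try:
        for name, vec in features.items():
            if dim is not None and len(vec) != dim:
                raise ValueError(f"feature {name!r}: expected {dim} values, got {len(vec)}")
            transaction.addFeature(name, vec)
        transaction.commit()
    except Exception:
        transaction.rollback()
        raise
```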

close

Controlled close to free resources (forces resources to be freed in an interpreter context with a garbage collector)

Parameter

no parameters defined

Result

no value returned

addSearchIndexFeature

Define how a feature to insert into the inverted index (search index) is selected, tokenized and normalized

Parameter

type
type of the features produced (your choice)
"word"
"stem"
selectexpr
expression selecting the elements to fetch for producing this feature
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
options
(optional) a list of option strings, each one of {"content" => the feature has its own position, "unique" => the feature gets a position, but sequences of "unique" features without "content" features in between are mapped to one position, "pred" => the position is bound to the preceding feature, "succ" => the position is bound to the succeeding feature}
"content"
"unique"
"succ"
"pred"

Examples

addSearchIndexFeature( "word", "/doc/elem", "word", ["lc", ["stem", "en"]])

Result

no value returned

addForwardIndexFeature

Define how a feature to insert into the forward index (for summarization) is selected, tokenized and normalized

Parameter

type
type of the features produced
"word"
selectexpr
expression selecting the elements to fetch for producing this feature
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
options
(optional) a list of option strings, each one of {"content" => the feature has its own position, "unique" => the feature gets a position, but sequences of "unique" features without "content" features in between are mapped to one position, "pred" => the position is bound to the preceding feature, "succ" => the position is bound to the succeeding feature}
"content"
"unique"
"succ"
"pred"

Result

no value returned

addPatternLexem

Declare an element to be used as a lexem by the post-processing pattern matching, but not put into the result of the document analysis

Parameter

type
term type name of the lexem to be fed to the pattern matching
"word"
selectexpr
an expression that describes what elements are taken from a document for this feature (tag selection in abbreviated syntax of XPath)
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]

Result

no value returned

defineMetaData

Define how a feature to insert as meta data is selected, tokenized and normalized

Parameter

fieldname
name of the addressed meta data field.
"date"
selectexpr
expression selecting the elements to fetch for producing this feature
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]

Result

no value returned

defineAggregatedMetaData

Declare some aggregated value of the document to be put into the meta data table used for restrictions, weighting and summarization.

Parameter

fieldname
name of the addressed meta data field.
"doclen"
function
defining how and from what the value is aggregated
["count", "word"]

Result

no value returned
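The definition methods above are usually called together to describe one document type. A minimal sketch for a hypothetical XML schema: the select expressions, type names and field names are illustrative only, and the tokenizer and normalizer descriptions are taken from the examples in this section.

```python
def configure_analyzer(analyzer):
    """Declare features for a hypothetical XML document type."""
    # Search index: words below /doc/text, lowercased and stemmed.
    analyzer.addSearchIndexFeature("word", "/doc/text//()", "word", ["lc", ["stem", "en"]])
    # Forward index (summarization): the same text, lowercased, diacritics converted.
    analyzer.addForwardIndexFeature("word", "/doc/text//()", "split", ["lc", ["convdia", "en"]])
    # Meta data: a date attribute converted to an integer.
    analyzer.defineMetaData("date", "/doc@date", "split", ["date2int", "d", "%Y-%m-%d"])
    # Meta data: the document length as the count of "word" features.
    analyzer.defineAggregatedMetaData("doclen", ["count", "word"])
```

Usage, assuming a context ctx: configure_analyzer(ctx.createDocumentAnalyzer({'mimetype':"application/xml", 'encoding':"UTF-8"})).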

defineAttribute

Define how a feature to insert as document attribute (for summarization) is selected, tokenized and normalized

Parameter

attribname
name of the addressed attribute.
"docid"
"title"
selectexpr
expression selecting the elements to fetch for producing this feature
"/doc/text//()"
"/doc/user@id"
"/doc/text[@lang='en']//()"
tokenizer
tokenizer function description to use for this feature
"split"
["regex", "[0-9]+"]
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]

Result

no value returned

addSearchIndexFeatureFromPatternMatch

Define a result of pattern matching as feature to insert into the search index, normalized

Parameter

type
type name of the feature to produce.
"concept"
patternTypeName
name of the pattern to select
"word"
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
options
(optional) a list of option strings, each one of {"content" => the feature has its own position, "unique" => the feature gets a position, but sequences of "unique" features without "content" features in between are mapped to one position, "pred" => the position is bound to the preceding feature, "succ" => the position is bound to the succeeding feature}
"content"
"unique"
"succ"
"pred"

Result

no value returned

addForwardIndexFeatureFromPatternMatch

Define a result of pattern matching as feature to insert into the forward index, normalized

Parameter

type
type name of the feature to produce.
"concept"
patternTypeName
name of the pattern to select
"word"
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]
options
(optional) a list of options, each one of {"BindPosPred" => the position is bound to the preceding feature, "BindPosSucc" => the position is bound to the succeeding feature}
"content"
"unique"
"succ"
"pred"

Result

no value returned

defineMetaDataFromPatternMatch

Define a result of pattern matching to insert as metadata, normalized

Parameter

fieldname
field name of the meta data element to produce.
"location"
patternTypeName
name of the pattern to select
"word"
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]

Result

no value returned

defineAttributeFromPatternMatch

Define a result of pattern matching to insert as document attribute, normalized

Parameter

attribname
name of the document attribute to produce.
"annotation"
patternTypeName
name of the pattern to select
"word"
normalizers
list of normalizer function descriptions to use for this feature in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]

Result

no value returned

definePatternMatcherPostProc

Declare a pattern matcher that runs on the document features after the rest of the document analysis

Parameter

patternTypeName
name of the type to assign to the pattern matching results
"location"
patternMatcherModule
module id of pattern matcher to use (empty string for default)
""
lexems
list of all lexems generated by the feeder (analyzer)
"word"
["word", "number"]
patterns
structure with all patterns

Result

no value returned

definePatternMatcherPostProcFromFile

Declare a pattern matcher that runs on the document features after the rest of the document analysis, with the patterns loaded from a file

Parameter

patternTypeName
name of the type to assign to the pattern matching results
"location"
patternMatcherModule
module id of pattern matcher to use (empty string for default)
""
serializedPatternFile
path to file with serialized (binary) patterns
"/srv/strus/patterns.bin"

Result

no value returned

defineSubDocument

Declare a sub document for handling multi-part documents in the analyzed content, or documents of different types with one configuration

Parameter

subDocumentTypeName
type name assigned to this sub document
"employee"
selectexpr
an expression that defines the content of the sub document declared
"/doc/employee"

Notes

A sub document consists of the section selected by the expression plus the selected data that does not belong to any sub document.

Result

no value returned
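As a rough analogy for what a select expression like "/doc/employee" picks out, the sketch below uses Python's ElementTree on a hypothetical input document. strus uses its own selection expression syntax and segmenter, so ElementTree's XPath subset only stands in for it here. Each <employee> element corresponds to one sub document. The <company> element is data that belongs to no sub document, which is one plausible reading of the Notes above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document with two <employee> sub documents.
content = """<doc>
  <company>ACME</company>
  <employee><name>Alice</name></employee>
  <employee><name>Bob</name></employee>
</doc>"""

root = ET.fromstring(content)          # root is the <doc> element
employees = root.findall("employee")   # plays the role of "/doc/employee"
names = [emp.findtext("name") for emp in employees]

# Data outside every sub document (here: <company>).
shared = [child.tag for child in root if child.tag != "employee"]

print(names)   # ['Alice', 'Bob']
print(shared)  # ['company']
```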

defineSubContent

Declare a sub part of a document with a different document class, requiring a switch of the segmenter

Parameter

selectexpr
an expression that defines the area of the sub content
"/doc/content"
documentClass
document class of the content, determines what segmenter to use for this part
{'mimetype':"application/json", 'encoding':"UTF-8"}

Result

no value returned

analyzeSingle

Analyze content and return the analyzed document structure (analyzing a single document)

Parameter

content
content string (NOT a file name !) of the document to analyze
"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><doc>...</doc>"
documentClass
(optional) document class of the document to analyze, if not specified the document class is guessed from the content with document class detection
{'mimetype':"application/xml", 'encoding':"UTF-8", 'scheme':"customer"}
{'mimetype':"application/json", 'encoding':"UTF-8"}

Result

structure of the document analyzed (sub document type names, search index terms, forward index terms, metadata, attributes)

analyzeMultiPart

Analyze content and return the analyzed document structures as an iterator (analyzing a multipart document)

Parameter

content
content string (NOT a file name !) with the documents to analyze
"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><doc>...</doc>"
documentClass
(optional) document class of the document set to analyze, if not specified the document class is guessed from the content with document class detection
{'mimetype':"application/xml", 'encoding':"UTF-8", 'scheme':"customer"}
{'mimetype':"application/json", 'encoding':"UTF-8"}

Notes

If you are not sure whether to use analyzeSingle or analyzeMultiPart, use analyzeMultiPart. It covers the analyzeSingle case by returning an iterator over a set that contains only the single document.

Result

iterator on structures of the documents analyzed (sub document type names, search index terms, forward index terms, metadata, attributes)

introspection

Introspect a structure starting from a root path

Parameter

path
list of identifiers describing the access path to the element to introspect

Result

the structure to introspect starting from the path

addElement

Defines an element (term, metadata) of query analysis.

Parameter

featureType
element feature type created from this field type
"stem"
"word"
fieldType
name of the field type defined
"text"
"word"
tokenizer
tokenizer function description to use for the features of this field type
"content"
"word"
["regex", "[A-Za-z]+"]
normalizers
list of normalizer function descriptions to use for the features of this field type in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]
["date2int", "d", "%Y-%m-%d"]

Result

no value returned

addElementFromPatternMatch

Defines an element from a pattern matching result.

Parameter

type
element type created from this pattern match result type
"name"
patternTypeName
name of the pattern match result item
"entity"
normalizers
list of normalizer functions
["lc", ["stem", "en"]]

Result

no value returned

addPatternLexem

Declare an element to be used as a lexem by the post-processing pattern matching, without putting it into the result of the query analysis

Parameter

termtype
term type name of the lexem to be fed to the pattern matching
"titlestem"
"titleword"
fieldtype
type of the field of this element in the query
"text"
"word"
tokenizer
tokenizer function description to use for the features of this field type
"content"
"word"
["regex", "[A-Za-z]+"]
normalizers
list of normalizer function descriptions to use for the features of this field type in the ascending order of appearance
"uc"
["lc", ["convdia", "en"]]

Result

no value returned

definePatternMatcherPostProc

Declare a pattern matcher on the query features after other query analysis

Parameter

patternTypeName
name of the type to assign to the pattern matching results
"entity"
"match"
patternMatcherModule
module id of pattern matcher to use (empty string for default)
lexems
list of all lexems generated by the feeder (analyzer)
["word", "number", "name"]
"word"
patterns
structure with all patterns

Result

no value returned

definePatternMatcherPostProcFromFile

Declare a pattern matcher on the query features after other query analysis

Parameter

patternTypeName
name of the type to assign to the pattern matching results
patternMatcherModule
module id of pattern matcher to use (empty string for default)
serializedPatternFile
path to file with serialized (binary) patterns

Result

no value returned

defineImplicitGroupBy

Declare an implicit grouping operation for a query field type. The implicit grouping operation is always applied when the analysis of this field results in more than one term, so that the field produces only one node in the query.

Parameter

fieldtype
name of the field type where this grouping operation applies
"text"
"word"
groupBy
kind of selection of the arguments grouped ("position": elements with the same position get their own group; "all" (or "", the default): all elements of the field are put into one group)
"position"
"all"
""
opname
query operator name generated as node for grouping
"join"
"within"
range
(optional) positional range attribute for the node used for grouping (0 for no range)
0
2
cardinality
(optional) cardinality attribute for the node used for grouping (0 for all)
0
3

Result

no value returned
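The following Python sketch illustrates the two groupBy modes on the terms produced from one query field. Terms are modelled as hypothetical (position, type, value) tuples. Group nodes are built in the shape [opname, range, cardinality, arg...], which follows the ["contains", 0, 1, ...] example of addFeature. That node shape, and the function name `implicit_group`, are assumptions for illustration and not the strus implementation.

```python
from itertools import groupby

def implicit_group(terms, group_by, opname, range_=0, cardinality=0):
    """Group the terms analyzed from one query field (illustrative sketch).

    terms: list of (position, type, value) tuples.
    Returns a list of query nodes: a single term stays a [type, value] leaf,
    a group of several terms becomes [opname, range, cardinality, leaf...].
    """
    if not terms:
        return []

    def make_node(group):
        leaves = [[ftype, value] for _, ftype, value in group]
        if len(leaves) == 1:
            return leaves[0]
        return [opname, range_, cardinality] + leaves

    if group_by == "position":
        # Elements with the same position get their own group.
        ordered = sorted(terms, key=lambda term: term[0])
        return [make_node(list(grp))
                for _, grp in groupby(ordered, key=lambda term: term[0])]
    # "all" or "" (default): all elements of the field go into one group.
    return [make_node(terms)]

terms = [(1, "word", "new"), (2, "word", "york"), (2, "stem", "york")]
print(implicit_group(terms, "position", "join"))
```

With "position", only the two terms sharing position 2 are wrapped in a "join" node. With "all", all three terms end up in one node.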

declareElementPriority

Declare that all query elements assigned to a feature type get a priority that causes the elimination of all elements with a lower priority that are completely covered by a single element of this type.

Parameter

type
feature type name
"word"
priority
priority value assigned to 'type'
1

Result

no value returned
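The elimination rule described above can be sketched in Python as follows. Elements are modelled as hypothetical (type, value, start, end) tuples with an exclusive end position. Types without a declared priority are assumed to have priority 0. An element is dropped when one single element of strictly higher priority covers its whole span. This is an illustration of the documented rule, not the strus code.

```python
def eliminate_covered(elements, priorities):
    """Drop elements completely covered by one higher-priority element.

    elements: list of (type, value, start, end) with `end` exclusive.
    priorities: map from feature type to priority; undeclared types get 0.
    """
    def prio(element):
        return priorities.get(element[0], 0)

    kept = []
    for element in elements:
        covered = any(
            prio(other) > prio(element)
            and other[2] <= element[2]
            and element[3] <= other[3]
            for other in elements
        )
        if not covered:
            kept.append(element)
    return kept

elements = [
    ("entity", "new york", 0, 2),
    ("word", "new", 0, 1),
    ("word", "york", 1, 2),
    ("word", "city", 2, 3),
]
print(eliminate_covered(elements, {"entity": 2, "word": 1}))
```

Here "new" and "york" disappear because the "entity" element covers them. "city" survives because it lies outside that span.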

analyzeTermExpression

Analyze a term expression

Parameter

expression
query term expression tree
["within", 5, ["word", "Worlds"], ["word", "powers"]]
["word", "PUBLIC"]

Result

structure analyzed
["within", 5, ["word", "world"], ["word", "power"]]
["word", "public"]
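The following Python sketch reproduces the example above by walking the expression tree. Leaves of the form [type, value] are normalized, and operator nodes keep their name and numeric arguments while their sub-expressions are analyzed recursively. The `normalize_term` function (lowercasing plus a naive plural strip) is a hypothetical stand-in for whatever normalizers the query analyzer is configured with. Treating any two-string list as a leaf is also an assumption of the sketch.

```python
def normalize_term(value):
    # Hypothetical normalizer: lowercase plus a naive plural strip.
    value = value.lower()
    if len(value) > 3 and value.endswith("s"):
        value = value[:-1]
    return value

def is_term(node):
    # A term leaf is [type, value] with two strings.
    return len(node) == 2 and all(isinstance(x, str) for x in node)

def analyze_expression(node):
    if is_term(node):
        term_type, value = node
        return [term_type, normalize_term(value)]
    # Operator node: keep the name and numeric arguments, recurse into sub-expressions.
    return [analyze_expression(arg) if isinstance(arg, list) else arg for arg in node]

print(analyze_expression(["within", 5, ["word", "Worlds"], ["word", "powers"]]))
print(analyze_expression(["word", "PUBLIC"]))
```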

analyzeSingleTermExpression

Analyze a term expression that must yield a single, unique result

Parameter

expression
query term expression tree
["within", 5, ["word", "Worlds"], ["word", "powers"]]
["word", "PUBLIC"]

Remarks

issues an error if the result does not exist or is not unique

Result

structure analyzed
["within", 5, ["word", "world"], ["word", "power"]]
["word", "public"]

analyzeMetaDataExpression

Analyze a metadata expression

Parameter

expression
query metadata expression tree
["<", "year", "26.9.2017"]

Result

structure analyzed
["<", "year", "17071"]

introspection

Introspect a structure starting from a root path

Parameter

path
list of identifiers describing the access path to the element to introspect

Result

the structure to introspect starting from the path

addTerm

Declare a term that is used in the query evaluation as a structural element without being part of the query (for example punctuation used for the summarization of match fields)

Parameter

set
identifier of the term set that is used to address the terms
"eos"
type
feature type of the term
"punct"
value
feature value of the term
"."

Result

no value returned

addSelectionFeature

Declare a feature set to be used as selecting feature

Parameter

set
identifier of the term set addressing the terms to use for selection
"select"

Result

no value returned

addRestrictionFeature

Declare a feature set to be used as restriction

Parameter

set
identifier of the term set addressing the terms to use as restriction
"restrict"

Result

no value returned

addExclusionFeature

Declare a feature set to be used as exclusion

Parameter

set
identifier of the term set addressing the terms to use as exclusion
"exclusion"

Result

no value returned

addSummarizer

Declare a summarizer

Parameter

name
the name of the summarizer to add
"matchphrase"
"matchpos"
"attribute"
parameter
the parameters of the summarizer to add (parameter name 'debug' reserved for declaring the debug info attribute)
{'sentencesize':40, 'windowsize':100, 'cardinality':5}
resultnames
(optional) the mapping of result names

Examples

addSummarizer( "attribute", [["name", "docid"], ["debug", "debug_attribute"]])
addSummarizer( "metadata", [["name", "cross"], ["debug", "debug_metadata"]])

Result

no value returned

addWeightingFunction

Add a weighting function to use as summand of the total document weight

Parameter

name
the name of the weighting function to add
"BM25"
parameter
the parameters of the weighting function to add
{'b':0.75, 'k':1.2, 'avgdoclen':1000, 'match':{'feature':"seek"}, 'debug':"debug_weighting"}

Examples

addWeightingFunction( "tf", {'match':{'feature':"seek"}, 'debug':"debug_weighting"})

Result

no value returned
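For orientation, the following Python sketch computes the textbook Okapi BM25 weight of one term in one document, using the parameter values from the example above (b=0.75, k=1.2, avgdoclen=1000). The assumption is that strus's `k` plays the role of k1. The df and nofdocs inputs are the kind of values that defineTermStatistics and defineGlobalStatistics can supply. strus's BM25 may use a different IDF variant or normalization, so this shows the general formula rather than the library's exact output.

```python
import math

def bm25(tf, doclen, df, nofdocs, k1=1.2, b=0.75, avgdoclen=1000.0):
    """Textbook Okapi BM25 weight of one term in one document.

    tf: term frequency in the document, doclen: document length,
    df: number of documents containing the term, nofdocs: collection size.
    """
    idf = math.log((nofdocs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1.0 - b + b * doclen / avgdoclen)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)

w = bm25(tf=3, doclen=800, df=74653, nofdocs=1234331)
print(w)
```

The weight grows with the term frequency and shrinks as the document gets longer than the average document length.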

defineWeightingFormula

Define the weighting formula to use for calculating the total weight from the weighting function results (sum of the weighting function results is the default)

Parameter

source
source of the weighting formula
"_0 / ln( _1 + 1)"
defaultParameter
(optional) default parameter values

Examples

defineWeightingFormula( "_0 / _1")

Result

no value returned
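The following Python sketch shows how a weighting formula combines the weighting function results. It assumes that _0, _1, ... denote the results of the weighting functions in the order they were added. Without a formula, the results are summed, which is the documented default. The formula "_0 / ln( _1 + 1)" is transcribed by hand into a Python function. strus parses the formula source itself, and the names `total_weight` and `example_formula` are illustrative only.

```python
import math

def total_weight(results, formula=None):
    """Combine weighting function results; results[i] stands for _i."""
    if formula is None:
        return sum(results)  # documented default: the sum of the results
    return formula(results)

def example_formula(r):
    # Python transcription of the source string "_0 / ln( _1 + 1)"
    return r[0] / math.log(r[1] + 1)

print(total_weight([2.0, 3.0]))  # 5.0
print(total_weight([2.0, 3.0], example_formula))
```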

createQuery

Create a query to instantiate based on this query evaluation scheme

Parameter

storage
storage to execute the query on

Result

the query instance

addFeature

Create a feature from the query expression passed

Parameter

set
name of the feature set this feature is addressed with
"select"
"seek"
expr
query expression that defines the postings of the feature and the variables attached
["contains", 0, 1, ["word", "hello"], ["word", "world"]]
{'from':"title_start", 'to':"title_end"}
weight
(optional) individual weight of the feature in the query
0.75
1.0
2.5

Remarks

expr
The query expression passed as parameter is refused if it does not contain exactly one element

Examples

addFeature( "select", ["contains", 0, 1, ["word", "hello"], ["word", "world"]])
addFeature( "titlefield", {'from':"title_start", 'to':"title_end"})

Result

no value returned

addMetaDataRestriction

Define a meta data restriction

Parameter

expression
meta data expression tree interpreted as CNF (conjunctive normal form, "AND" of "OR"s)
[[["=", "country", 12], ["=", "country", 17]], ["<", "year", "2007"]]
["<", "year", "2002"]

Notes

expression
Leaves of the expression tree are 3-tuples of the form {operator, name, operand} with
operator: one of "=", "!=", ">=", "<=", "<", ">"
name: name of the meta data element
operand: numeric value to compare with the meta data field (right side of the comparison operator)
If the tree has depth 1 (a single node), it is interpreted as a single condition.
If the tree has depth 2 (a list of nodes), it is interpreted as the intersection ("AND") of its leaves.
An "OR" of conditions without "AND" is therefore expressed as a list of lists of structures, e.g.
'[[["<=","date","1.1.1970"], [">","weight",1.0]]]' <=> 'date <= "1.1.1970" OR weight > 1.0'
whereas
'[["<=","date","1.1.1970"], [">","weight",1.0]]' <=> 'date <= "1.1.1970" AND weight > 1.0'

Result

no value returned
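The CNF rules in the Notes above can be made concrete with a small Python evaluator that checks a restriction against one document's metadata, represented here as a hypothetical dict. The outer list is an "AND" of clauses. A clause is either a single leaf or a list of leaves combined with "OR", and a lone leaf is a single condition. The sketch only handles numeric operands; date strings such as "1.1.1970" are converted by strus and are out of scope here. `eval_restriction` is an illustrative name, not part of the bindings.

```python
import operator

# Comparison operators listed in the Notes of addMetaDataRestriction.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    "<": operator.lt, ">": operator.gt,
}

def is_leaf(node):
    # A leaf is [operator, name, operand] with a string operator in front.
    return len(node) == 3 and isinstance(node[0], str)

def eval_leaf(leaf, metadata):
    op, name, operand = leaf
    return OPS[op](float(metadata[name]), float(operand))

def eval_restriction(expr, metadata):
    """Evaluate a CNF metadata restriction against one document's metadata."""
    if is_leaf(expr):            # depth 1: a single condition
        return eval_leaf(expr, metadata)
    for clause in expr:          # outer list: AND of clauses
        if is_leaf(clause):
            ok = eval_leaf(clause, metadata)
        else:                    # inner list: OR of conditions
            ok = any(eval_leaf(leaf, metadata) for leaf in clause)
        if not ok:
            return False
    return True

expr = [[["=", "country", 12], ["=", "country", 17]], ["<", "year", "2007"]]
print(eval_restriction(expr, {"country": 17, "year": 2001}))  # True
```

The example expression reads as (country = 12 OR country = 17) AND year < 2007.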

defineTermStatistics

Define term statistics to use for a term for weighting it in this query

Parameter

type
query term type name
"word"
value
query term value
"game"
stats
the structure with the statistics to set
{'df':74653}

Examples

defineTermStatistics( "word", "game", {'df':74653})

Result

no value returned

defineGlobalStatistics

Define the global statistics to use for weighting in this query

Parameter

stats
the structure with the statistics to set
{'nofdocs':1234331}

Result

no value returned

addDocumentEvaluationSet

Define a set of documents the query is evaluated on. By default, the query is evaluated on all documents in the storage.

Parameter

docnolist
list of documents to evaluate the query on (array of positive integers)
[1, 23, 2345, 3565, 4676, 6456, 8855, 12203]

Result

no value returned

setMaxNofRanks

Set number of ranks to evaluate starting with the first rank (the maximum size of the result rank list)

Parameter

maxNofRanks
maximum number of results to return by this query
20
50
5

Result

no value returned

setMinRank

Set the index of the first rank to be returned

Parameter

minRank
index of the first rank to be returned by this query
10
20

Result

no value returned
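A common use of setMinRank and setMaxNofRanks is paging through results. The following Python sketch assumes that minRank is a 0-based index and that maxNofRanks counts the ranks returned from minRank onwards. Under that assumption, page p of size n maps to setMinRank(p * n) and setMaxNofRanks(n). The helpers `page_to_ranks` and `select_ranks` are hypothetical; `select_ranks` only simulates which slice of a full ranking the query would return.

```python
def page_to_ranks(page, page_size):
    """Map a 0-based result page to (minRank, maxNofRanks)."""
    return page * page_size, page_size

def select_ranks(ranked_docs, min_rank, max_nof_ranks):
    """Simulate which ranks a query returns, assuming a 0-based minRank."""
    return ranked_docs[min_rank:min_rank + max_nof_ranks]

min_rank, max_nof_ranks = page_to_ranks(page=2, page_size=10)
print(min_rank, max_nof_ranks)  # 20 10
```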

addAccess

Allow read access to documents having a specific ACL tag

Parameter

rolelist
ACL tag or list of ACL tags selecting the documents that are candidates for the result
["public", "devel", "sys"]

Notes

If no ACL tags are specified, then all documents are potential candidates for the result

Result

no value returned
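The following Python sketch expresses the candidate rule from the Notes above as a predicate. With no ACL tags specified, every document qualifies. Otherwise, it is assumed that a document qualifies if it carries at least one of the allowed tags. `is_candidate` is a hypothetical helper; the actual filtering happens inside strus during query evaluation.

```python
def is_candidate(doc_tags, allowed_tags):
    """Return True if a document may appear in the result.

    doc_tags: ACL tags of the document.
    allowed_tags: all ACL tags granted with addAccess.
    """
    if not allowed_tags:
        # No ACL tags specified: every document is a potential candidate.
        return True
    # Assumption: sharing any one allowed tag is enough.
    return not set(doc_tags).isdisjoint(allowed_tags)

print(is_candidate(["devel"], ["public", "devel", "sys"]))  # True
```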

setWeightingVariables

Assign values to variables of the weighting formula

Parameter

parameter
parameter values (map of variable names to floats)

Result

no value returned

setDebugMode

Switch on debug mode that creates debug info of query evaluation methods and summarization as attributes of the query result

Parameter

debug
true if switched on, false if switched off (default off)

Notes

Debug attributes are specified in the declaration of summarizers and weighting functions (the 'debug' entry of the parameter argument of QueryEval::addSummarizer and QueryEval::addWeightingFunction)

Result

no value returned

evaluate

Evaluate this query and return the result

Parameter

no parameters defined

Result

the result (strus::QueryResult)

tostring

Map the contents of the query to a readable string

Parameter

no parameters defined

Result

the string
generated by papugaDoc (Strus 0.16.0)