Wikipedia search on a NUC with Strus

We run a full-text search engine on the complete English Wikipedia collection (without citations, but including the contents of tables) as a demo project. The machine we use is an Intel NUC (NUC6i3SYK with 32GB RAM and a 256GB SSD).

Why use a NUC for a demo system?

With new-generation SSDs, non-volatile memory units are placed closer to the CPU cores of modern servers. The hardware of a NUC is conceptually close to such a server, or more precisely to a single node of one. Because Strus scales, we can now use the NUC to make some predictions about how Strus will perform on real servers.


The scripts in the scripts directory of the strusWikipediaSearch project are needed for building the Wikipedia storage for retrieval. They have to be adapted for your use. We suggest using a more powerful machine than a NUC for building the data and the storages. On an Intel NUC the whole process of building the data and the storages takes roughly 10 days (4 days NLP + 4 days insert + 2 days Word2vec and some other helpers). This is substantially longer than the 5 1/2 hours needed by a previous version.
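The build-time figures above can be checked with a little arithmetic. The following Python sketch only adds up the rough phase durations given in the text and compares the total with the 5 1/2 hours of the previous version. The dictionary keys and function names are illustrative, not part of the strusWikipediaSearch scripts, and the "other helpers" are lumped into the Word2vec phase, as the text does. The text does not say whether the previous figure was measured on the same hardware, so the ratio is only indicative.

```python
# Rough phase durations on the NUC, as stated in the text (in days).
# The "other helpers" are counted together with Word2vec, as in the text.
PHASES_DAYS = {
    "NLP": 4,
    "insert": 4,
    "Word2vec and helpers": 2,
}

# Build time of the previous version, as stated in the text (in hours).
PREVIOUS_VERSION_HOURS = 5.5


def total_days(phases):
    """Sum the durations (in days) of all build phases."""
    return sum(phases.values())


def slowdown_factor(phases, previous_hours):
    """Ratio of the current total build time to the previous one."""
    return total_days(phases) * 24 / previous_hours


if __name__ == "__main__":
    days = total_days(PHASES_DAYS)
    print(f"total: {days} days = {days * 24} hours")
    factor = slowdown_factor(PHASES_DAYS, PREVIOUS_VERSION_HOURS)
    print(f"slowdown vs. previous version: {factor:.1f}x")
    # prints:
    # total: 10 days = 240 hours
    # slowdown vs. previous version: 43.6x
```

In other words, the current build takes roughly 40 times as long as the previous one.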


To build the data for the Wikipedia search, the following steps have to be done: