Index and search text from local files using ElasticSearch and sist2

This article explains how to install and run ElasticSearch with sist2 in Docker to index and search the files on your local machine. It indexes content from *.pdf, *.pptx, *.docx, and similar files and gives you a browser UI for searching the text inside them.

The motivation for this stack, for me at least, was to prepare for exams. 🤭 Some courses shared 40+ PDFs, and some of those files ran to more than 100 pages, so CTRL+F was out of the question. I still needed to find the right files fast, and sist2 is great for that! It shows a small preview of the file and the matching content. The length of the returned text can be adjusted in the web UI (see Tip #2).

Of course, this doesn’t mean that’s the only use for this tool. Take a look at the sist2 documentation for inspiration to index something else.

I looked at different options to index local files and make them searchable. I found some applications online, but most seemed to be designed for Windows 98. Besides, I wanted this stack to be shareable with coursemates (so they could use this on open-book exams as well).

Alternatives that I tried

FSCrawler with ElasticSearch and Kibana. 🤔 It worked, but I ended up not using that stack. Getting FSCrawler to run on Windows was hard because of conflicting versions and required dependencies. Kibana is also not very intuitive for a first-time user. On top of that, file thumbnail generation was not supported, and getting longer text previews in Kibana or opening files via localhost links needed more fine-grained configuration than I was willing to do.

Tip #1

Before you start indexing, convert file types that need external tools to open (.doc, .pptx, etc.) into .pdf. This lets you open the files directly in the browser.
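
The conversion can be scripted. Below is a minimal sketch, assuming LibreOffice is installed and its `soffice` command is on your PATH; LibreOffice is my choice for illustration and is not part of the original stack. The function names and the extension list are hypothetical, so adjust them to your own files.

```python
import shutil
import subprocess
from pathlib import Path

# Extensions to convert (an assumption; extend or trim as needed).
OFFICE_SUFFIXES = {".doc", ".docx", ".ppt", ".pptx"}


def build_convert_command(source: Path, out_dir: Path, soffice: str = "soffice") -> list:
    """Return the LibreOffice command line that converts one file to PDF."""
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_folder(folder: Path, out_dir: Path) -> list:
    """Convert every office file directly inside `folder` and return the source paths."""
    soffice = shutil.which("soffice")
    if soffice is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for source in sorted(folder.iterdir()):
        if source.is_file() and source.suffix.lower() in OFFICE_SUFFIXES:
            subprocess.run(build_convert_command(source, out_dir, soffice), check=True)
            converted.append(source)
    return converted
```

For example, `convert_folder(Path("lectures"), Path("documents"))` would write the PDFs into the documents folder that gets indexed (the `lectures` folder name is made up).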

Tip #2

In the sist2 UI, go to Settings -> Highlight context size in characters to increase the length of the text preview.

How to use

  1. Install Docker
  2. Download start.bat and docker-compose.yml
  3. Run start.bat

Source

Instructions and source can be found on GitHub: https://github.com/Nurech/sist2_index_files

CMD script

The script does the basic setup: it creates the folders, pulls the images, and then runs Docker Compose. Drag the files you want indexed into the \documents folder. The containers run exactly once, so the scan only indexes the files that are in the documents folder at that moment. If you want scanning and indexing to happen at regular intervals, you have to set up a Windows service that periodically runs the containers doing those jobs.

https://raw.githubusercontent.com/Nurech/sist2_index_files/main/start.bat
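
As an alternative to a Windows service, a small script can re-run the scan and index jobs on a timer. This is only a sketch of the idea: the service names `sist2-scan` and `sist2-index` are hypothetical, so replace them with the names used in your docker-compose.yml. Whether a repeated scan overwrites or extends the existing index depends on how the scan service is configured, so check that before relying on it.

```python
import subprocess
import time

# Hypothetical service names; use the ones defined in your docker-compose.yml.
ONE_SHOT_SERVICES = ["sist2-scan", "sist2-index"]


def build_run_command(service):
    """Return the command that runs one compose service once and removes its container."""
    return ["docker", "compose", "run", "--rm", service]


def rescan_once(services=ONE_SHOT_SERVICES, run=subprocess.run):
    """Run the scan and index services in order, stopping at the first failure."""
    for service in services:
        run(build_run_command(service), check=True)


def rescan_forever(interval_seconds=3600, run=subprocess.run, sleep=time.sleep):
    """Repeat the scan and index jobs, waiting `interval_seconds` between rounds."""
    while True:
        rescan_once(run=run)
        sleep(interval_seconds)
```

Calling `rescan_forever(interval_seconds=1800)` from the folder that holds docker-compose.yml would refresh the index every 30 minutes until you stop the script.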

Docker Compose

Starts the stack of ElasticSearch and sist2 images, which does the following:

  1. Start ElasticSearch
  2. Scan files and make an index
  3. Send index to ElasticSearch
  4. Launch web UI to view indexed files

https://raw.githubusercontent.com/Nurech/sist2_index_files/main/docker-compose.yml
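
If you want to check that step 3 really sent documents to ElasticSearch without opening the web UI, you can query ElasticSearch directly. The sketch below makes several assumptions that are not confirmed by the original setup: ElasticSearch is reachable from the host at http://localhost:9200, it does not require authentication, and the index is called `sist2`. Change these values to match your docker-compose.yml.

```python
import json
import urllib.request


def build_search_request(term, es_url="http://localhost:9200", index="sist2", size=5):
    """Build a POST request for the ElasticSearch _search endpoint.

    es_url and index are assumptions; adjust them to your own setup.
    """
    body = {
        "size": size,
        # query_string searches the indexed fields for the given term
        "query": {"query_string": {"query": term}},
    }
    return urllib.request.Request(
        url=f"{es_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(term, **kwargs):
    """Send the request and return the list of matching documents (hits)."""
    with urllib.request.urlopen(build_search_request(term, **kwargs)) as response:
        return json.load(response)["hits"]["hits"]
```

With the stack running, `search("entropy")` returns a list of hits, each with the standard `_id`, `_score`, and `_source` keys. An empty list means nothing matched or nothing has been indexed yet.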

Thanks for reading!


Index and search text from local files using ElasticSearch and sist2 was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.


APA
Joosep Parts | Sciencx (2024-03-29T13:41:12+00:00) » Index and search text from local files using ElasticSearch and sist2. Retrieved from https://www.scien.cx/2022/11/28/index-and-search-text-from-local-files-using-elasticsearch-and-sist2/.