This paper introduces Svarna, a free, open-source, web-based corpus workbench designed to address gaps in Modern Greek language technology by integrating five distinct databases. The platform consolidates over 507 million words and approximately 29 million sentences from institutional, literary, dialectal, social media, and historical registers into a single interface that requires neither login nor installation.
- Integrates five databases spanning institutional, literary, dialectal, social media, and historical registers, for a total of more than 507 million words and around 29 million sentences.
- Offers a concordancer with keyword-in-context (KWIC) highlighting, frequency analysis normalized per register, and collocation extraction based on mutual information.
- Includes a dictionary of 93 Greek discourse markers, text-level analysis tools for n-grams and variants, and register comparison via log-ratio.
- Features regular expression search and an optional LLM layer that supports pragmatic annotation and a free research mode.
- Built on SQLite FTS5 full-text indexes with a FastAPI backend, deployed as Docker containers on Azure under the MIT license.
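The three statistics named in the list above (per-register frequency normalization, mutual information for collocations, and log-ratio for register comparison) can be illustrated with a minimal Python sketch. It uses the textbook definitions, not Svarna's own code: the function names, the per-million scale, the use of pointwise mutual information, and the `smoothing` constant are all assumptions made for illustration, and the counts in the usage lines are invented toy values.

```python
import math


def per_million(count: int, register_size: int) -> float:
    """Relative frequency per million words (assumed normalization scale)."""
    return count * 1_000_000 / register_size


def pmi(pair_count: int, node_count: int, collocate_count: int,
        corpus_size: int) -> float:
    """Pointwise mutual information of a node word and a collocate.

    Computes log2( P(node, collocate) / (P(node) * P(collocate)) ),
    which simplifies to log2( pair * N / (node * collocate) ).
    """
    return math.log2(pair_count * corpus_size / (node_count * collocate_count))


def log_ratio(count_a: int, size_a: int, count_b: int, size_b: int,
              smoothing: float = 0.5) -> float:
    """Binary log of the ratio of relative frequencies in registers A and B.

    The smoothing constant is an assumption: it is added to both counts so
    that a word absent from one register does not cause a division by zero.
    """
    rel_a = (count_a + smoothing) / size_a
    rel_b = (count_b + smoothing) / size_b
    return math.log2(rel_a / rel_b)


# Toy counts, invented for illustration only.
freq = per_million(250, 5_000_000)
assoc = pmi(pair_count=16, node_count=100, collocate_count=80, corpus_size=1000)
contrast = log_ratio(40, 1_000_000, 10, 1_000_000, smoothing=0)
print(freq, assoc, contrast)
```

A log-ratio of 1 means the word is twice as frequent, in relative terms, in register A as in register B, and each further unit doubles that ratio again. Normalizing by register size first is what makes such comparisons meaningful when the underlying databases differ greatly in size.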
Svarna serves as a foundational tool for exploring the Greek data currently available and is expected to support more comprehensive research in the future.