About

Sõnaveeb [WordWeb] is the language portal of the Institute of the Estonian Language (EKI). It brings together linguistic information from a growing number of dictionaries and databases. The portal was released in February 2019. The information displayed in Sõnaveeb comes from Ekilex, a Dictionary Writing System maintained and developed by the Institute in collaboration with the software company TripleDev. As of February 2020, Ekilex contained about 70 lexical databases, both general and specialised dictionaries. The databases are constantly updated and edited, including changes made in response to user feedback. As of February 2020, the portal contained about 170,000 words and phrases in Estonian, about 70,000 words and phrases in Russian and 40,000 in English. The versions of Sõnaveeb are updated and archived once a year.

Since 2020, all information from the separate databases has been displayed in a unified mode as a single source titled EKI ühendsõnastik [The EKI Combined Dictionary, CombiDic]. The EKI Combined Dictionary displays information from different lexical databases: "The Dictionary of Estonian 2019", "Estonian Collocations Dictionary 2019", "Basic Estonian Dictionary" (2014), "The Estonian Morphological Database of the Institute of the Estonian Language 2019". It also displays information from bilingual lexical databases: "Estonian-Russian orthographic dictionary for students 2018" (1st edition 2011), "Estonian-Russian Dictionary 2018" (1st edition 1997–2009), "The Russian Morphological Database of the Institute of the Estonian Language 2018".

In addition to carefully selected usage examples, we display web examples from 'The Corpus of Web Examples for Estonian' via the API of the corpus query system KORP.

The creation and development of the portal were funded by the Digital Focus Program of the Ministry of Education and Research (2018–2021) and by the EKI-ASTRA program (2016–2022).

The creation of the dictionary and terminology database Ekilex was funded by the EKI-ASTRA program (2016–2022).

Technical support: OÜ TripleDev.

Copyright: Institute of the Estonian Language

Estonian data

Information about Estonian is displayed when Estonian is selected as the target language. The user can choose between two modes of information display: Sõnaveeb or Learner's Sõnaveeb.

Sõnaveeb displays all available information on the word that you are looking for.

Learner's Sõnaveeb shows less information: it covers fewer words and presents the information in a simpler way, with shorter explanations, less additional material, and fewer additional explanations and comments.

Russian data

Sõnaveeb displays all available information on the word that you are looking for.

Attention! The Russian data is not a 'real' dictionary. The translation equivalents have been collected automatically by reversing the Estonian-Russian dictionary.
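To illustrate what reversing a bilingual dictionary means, here is a minimal sketch. It is not the procedure actually used for Sõnaveeb, which is not described here; the function name `reverse_dictionary` and the three toy entries are assumptions chosen only for illustration. The sketch shows why a reversed dictionary is not a 'real' one: each Russian word simply inherits the Estonian headwords under which it happened to appear as an equivalent.

```python
from collections import defaultdict
from typing import Dict, List


def reverse_dictionary(et_ru: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert an Estonian -> Russian mapping into a Russian -> Estonian one.

    Hypothetical helper for illustration only.
    """
    ru_et: Dict[str, List[str]] = defaultdict(list)
    for headword, equivalents in et_ru.items():
        for equivalent in equivalents:
            # Each Russian equivalent becomes a headword; skip duplicates.
            if headword not in ru_et[equivalent]:
                ru_et[equivalent].append(headword)
    return dict(ru_et)


# Toy data, not taken from the actual Estonian-Russian dictionary.
toy_et_ru = {
    "maja": ["дом"],
    "kodu": ["дом"],
    "keel": ["язык"],
}

toy_ru_et = reverse_dictionary(toy_et_ru)
print(toy_ru_et)
```

In this toy example the Russian word "дом" ends up with two Estonian equivalents, "maja" and "kodu", with nothing to tell them apart, because no lexicographer has compiled the Russian entry as such.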

Learner's Sõnaveeb

The user can currently choose between two modes of information display: Sõnaveeb (for advanced users) or Learner's Sõnaveeb (for language learners). Sõnaveeb is intended primarily for native speakers. It displays all the information on a word that comes from different sources. The advanced mode is a sophisticated view that might require more options for further filtering. The Learner's Sõnaveeb is intended primarily for learners at the A2–B1 proficiency levels. It shows 5,000 basic Estonian words and information is presented in a simpler way: the definitions are shorter, knowledge is organized using controlled vocabulary, there is explicit information about the most frequent morphological forms, etc.

Pronunciation

Users can listen to the pronunciation of about 5,000 of the most frequent headwords, as well as their most important inflected forms, and of about 7,000 unadapted loan words. The information on pronunciation has been aggregated from different datasets. In the case of unadapted loan words, we used Estonians who speak foreign languages at high proficiency levels. For the pronunciation of the most frequent words and their inflected forms, we used professional actresses.

Users can also listen to the usage examples chosen by lexicographers. The text-to-speech synthesis has been developed by the Institute of the Estonian Language.

Dictate!

Speech recognition, developed by the Department of Cybernetics of the Tallinn Technological University, is used when dictating words. Speech recognition operates in real time. For optimum quality, users have to pronounce the search word clearly and steadily.

Morphological forms

The information on the morphological forms of Estonian comes from "The EKI Estonian Morphological Database 2019".

The information on the morphological forms of Russian comes from "The EKI Russian Morphological Database 2019".

Collocations

Collocations are words that are often used together in a language. Each language has special combinations of words, and knowing them is essential for speaking and writing that language naturally and fluently. Collocations are displayed primarily to help language learners use the language in the way native speakers do. The information on collocations comes from the dictionary database "Estonian Collocations Dictionary 2018". The collocations have been semi-automatically selected from the Estonian National Corpus (2013 and 2017).

Usage examples

Usage examples have been carefully selected from the Estonian National Corpus (2013 and 2017) and minimally edited by the lexicographers.

Web sentences

In Sõnaveeb, authentic examples from the corpus are displayed. They have been selected automatically and have not been edited. The examples are queried from 'The Corpus of Web Examples for Estonian' via the API of the corpus query system KORP.

The examples for Russian are queried from the ruSkELL 1.6 corpus via the Sketch Engine JSON API.

Frequency

The 20,000 most frequently used words are marked with stars according to their frequency class. The frequency information comes from the "Estonian National Corpus 2017".

Vocabulary of different language levels according to language proficiency

Considering the needs of language learners, 13,000 words are classified based on language proficiency levels. The data comes from etLex, a database of vocabulary of different proficiency levels compiled in the Institute of the Estonian Language.

The Common European Framework of Reference for Languages distinguishes between the following language proficiency levels: basic language user (levels A1, A2), independent language user (levels B1, B2) and proficient language user (levels C1, C2). In Sõnaveeb the general proficiency levels are marked as follows: A (basic user) and B (independent user). The stars marking the frequency class are in different colours.
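The grouping of the six CEFR levels into the general markers can be sketched as a small lookup. This is only an illustration: the function name `general_level` is hypothetical, and returning no marker for C1 and C2 is an assumption based on the fact that only A and B are listed above.

```python
from typing import Optional


def general_level(cefr_level: str) -> Optional[str]:
    """Map a CEFR level (A1..C2) to the general marker A or B.

    Hypothetical helper; C1 and C2 return None on the assumption
    that proficient-user levels carry no marker.
    """
    level = cefr_level.strip().upper()
    if level in ("A1", "A2"):
        return "A"  # basic user
    if level in ("B1", "B2"):
        return "B"  # independent user
    if level in ("C1", "C2"):
        return None  # assumed: no marker for proficient user
    raise ValueError(f"Not a CEFR level: {cefr_level!r}")


print(general_level("A2"))  # A
print(general_level("b1"))  # B
```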

Copyrights

The copyright of 'EKI ühendsõnastik' [Combined Dictionary of Estonian] belongs to Eesti Keele Instituut [Institute of the Estonian Language]
Copyright: Eesti Keele Instituut 2020

The copyright of 'EKI terminibaas Esterm' belongs to Eesti Keele Instituut [Institute of the Estonian Language]
Copyright: Eesti Keele Instituut 2020

The copyrights of terminological databases belong to the authors
Copyright: authors

The copyright of the language portal Sõnaveeb belongs to Eesti Keele Instituut [Institute of the Estonian Language]
Copyright: Eesti Keele Instituut 2020

The copyright of the dictionary system Ekilex belongs to Eesti Keele Instituut [Institute of the Estonian Language]
Copyright: Eesti Keele Instituut 2020

Referencing

A word or a phrase in 'EKI ühendsõnastik 2020' should be referenced as follows:

Word or phrase. EKI ühendsõnastik 2020. Eesti Keele Instituut, Sõnaveeb 2020. https://sonaveeb.ee/sõna või väljend (14.2.2020)

A word or a phrase in the terminological database should be referenced as follows:

Word or phrase. The name of the database. Eesti Keele Instituut, Sõnaveeb 2020. https://sonaveeb.ee/sõna või väljend (14.2.2020)
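The two reference templates above differ only in the name of the source, so they can be filled in by one small function. The following is a minimal sketch; the function name `format_reference` and its parameters are hypothetical. The URL is formed exactly as in the template, by appending the word or phrase to https://sonaveeb.ee/, and no percent-encoding is applied.

```python
from datetime import date


def format_reference(entry: str, source: str, accessed: date, year: int = 2020) -> str:
    """Fill in the reference template for a word or phrase.

    Hypothetical helper for illustration only.
    """
    # The access date is written as day.month.year without zero padding,
    # as in the template (14.2.2020).
    accessed_text = f"{accessed.day}.{accessed.month}.{accessed.year}"
    # The entry is appended to the site root, following the template.
    url = f"https://sonaveeb.ee/{entry}"
    return (
        f"{entry}. {source}. Eesti Keele Instituut, "
        f"Sõnaveeb {year}. {url} ({accessed_text})"
    )


reference = format_reference("keel", "EKI ühendsõnastik 2020", date(2020, 2, 14))
print(reference)
# keel. EKI ühendsõnastik 2020. Eesti Keele Instituut, Sõnaveeb 2020. https://sonaveeb.ee/keel (14.2.2020)
```

For a term from a terminological database, the name of that database is passed as `source` instead of 'EKI ühendsõnastik 2020'.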

Versioning

The EKI ühendsõnastik [Combined Dictionary of Estonian] and the terminological databases are constantly updated and edited. New versions are created and archived once a year and are marked with the date.

All versions are registered in METASHARE and archived in the Centre of Estonian Language Resources and in the Ekilex database.