Lingua v0.3.0 Release Notes
Release Date: 2019-01-16
This major release offers a lot of new features, including new languages. Finally! :-)
Languages
- Added 18 languages: Arabic, Belarusian, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Hungarian, Latvian, Lithuanian, Polish, Persian, Romanian, Russian, Swedish, Turkish
Features
- Language models can now be cached by MapDB to reduce memory usage and speed up loading times.
Improvements
- In the standalone app, you can now choose which language models to load in order to compare detection accuracy between strongly related languages.
- For test report generation using Maven, you can now select a specific language using the language attribute, so you no longer need to run the reports for all languages:
  mvn test -P accuracy-reports -D detector=lingua -D language=German
API changes
- Lingua's package structure has been simplified. The public API intended for end users now lives in com.github.pemistahl.lingua.api. Breaking changes to this package will be kept to a minimum in 0.*.* versions and will no longer be made starting from version 1.0.0. All other code is stored in com.github.pemistahl.lingua.internal and is subject to change without further notice.
- Added the new class com.github.pemistahl.lingua.api.LanguageDetectorBuilder, which is now responsible for building and configuring instances of com.github.pemistahl.lingua.api.LanguageDetector.
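
The split between a builder and the object it configures can be illustrated with a small, self-contained Java sketch. Everything in it is a hypothetical stand-in: ToyDetector, ToyDetectorBuilder, fromLanguages and the two-language minimum are assumptions made for illustration and are not taken from Lingua's API. The point is only the division of labor: the builder collects and validates configuration, and the detector it returns is immutable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy stand-ins only: these are NOT Lingua's classes and do not mirror its method names.
class BuilderSketch {

    /** Immutable product; its private constructor is only called by the builder below. */
    static final class ToyDetector {
        private final List<String> languages;

        private ToyDetector(List<String> languages) {
            // Defensive copy so later changes to the builder cannot leak in.
            this.languages = Collections.unmodifiableList(new ArrayList<>(languages));
        }

        List<String> languages() {
            return languages;
        }
    }

    /** Collects configuration, validates it, then creates the detector. */
    static final class ToyDetectorBuilder {
        private final List<String> languages = new ArrayList<>();

        private ToyDetectorBuilder() {}

        // Hypothetical factory method: start from an explicit set of language names.
        static ToyDetectorBuilder fromLanguages(String... names) {
            ToyDetectorBuilder builder = new ToyDetectorBuilder();
            for (String name : names) {
                builder.languages.add(name);
            }
            return builder;
        }

        ToyDetector build() {
            // Assumed rule for this sketch: detection needs at least two candidates.
            if (languages.size() < 2) {
                throw new IllegalStateException("need at least two languages to choose between");
            }
            return new ToyDetector(languages);
        }
    }

    public static void main(String[] args) {
        ToyDetector detector = ToyDetectorBuilder.fromLanguages("GERMAN", "DUTCH").build();
        System.out.println(detector.languages()); // prints [GERMAN, DUTCH]
    }
}
```

Because all configuration goes through the builder, invalid setups fail in build() rather than later during detection. For the actual method names offered by LanguageDetectorBuilder, refer to the project's own documentation.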
Test Coverage
- Test coverage of the public API has been extended from 6% to 23%.
Documentation
- In addition to the test reports, graphical plots have been created to make it even easier to compare the detection results of the different classifiers. The code for the plots is written in Python and stored in an IPython notebook under /accuracy-reports/accuracy-reports-analysis-notebook.ipynb.