Changelog History
Page 2
-
v0.3.0 Changes
January 16, 2019
This major release offers a lot of new features, including new languages. Finally! :-)
Languages
- added 18 languages: Arabic, Belarusian, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Hungarian, Latvian, Lithuanian, Polish, Persian, Romanian, Russian, Swedish, Turkish
Features
- Language models can now be cached by MapDB to reduce memory usage and speed up loading times.
Improvements
- In the standalone app, you can now choose which language models to load in order to compare detection accuracy between closely related languages.
- For test report generation using Maven, you can now select a specific language using the attribute `language` and no longer need to run the reports for all languages: `mvn test -P accuracy-reports -D detector=lingua -D language=German`.
API changes
- Lingua's package structure has been simplified. The public API intended for end users now lives in `com.github.pemistahl.lingua.api`. Breaking changes to this package will be kept to a minimum in `0.*.*` versions and will no longer be made starting from version `1.0.0`. All other code is stored in `com.github.pemistahl.lingua.internal` and is subject to change without further notice.
- added new class `com.github.pemistahl.lingua.api.LanguageDetectorBuilder`, which is now responsible for building and configuring instances of `com.github.pemistahl.lingua.api.LanguageDetector`
Test Coverage
- Test coverage of the public API has been extended from 6% to 23%.
Documentation
- In addition to the test reports, graphical plots have been created to make it easier to compare the detection results of the different classifiers. The code for the plots is written in Python and stored in an IPython notebook under `/accuracy-reports/accuracy-reports-analysis-notebook.ipynb`.
-
v0.2.2 Changes
December 28, 2018
This minor version update provides the following:
Improvements
- The included language model JSON files now use more efficient formatting, saving approximately 25% disk space in uncompressed format compared to version 0.2.1.
Bug Fixes
- The version of the Jacoco test coverage Maven plugin was incorrectly specified, leading to download errors. The most current snapshot version of Jacoco is now used, which provides enhancements for Kotlin test coverage measurement.
-
v0.2.1 Changes
December 20, 2018
This minor version update provides the following:
Performance Improvements
- Lingua's language detection has been sped up. It is now approximately 25% faster for large data sets.
Comparison with Apache Tika
- Accuracy report test classes have been written for Apache Tika to compare its language detection performance with Lingua's. Lingua outperforms Tika for short paragraphs of text by up to 15% in accuracy. A detailed comparison table can be found in the README.
-
v0.2.0 Changes
December 17, 2018
This release provides both new features and bug fixes. It is the first release that has been published to JCenter. Publication on Maven Central will follow soon.
Languages
- added detection support for Portuguese
Features
- extended language models for already existing languages to provide more accurate detection results
- the larger language models are now lazy-loaded to reduce waiting times during start-up, especially when starting the Lingua REPL
- added some unit tests for the `LanguageDetector` class that cover the most basic functionality (will be extended in upcoming versions)
- added accuracy reports and test data for each supported language, in order to measure language detection accuracy (can be generated with `mvn test -P accuracy-reports`)
- added accuracy statistics summary of the current implementation to README
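The lazy loading of language models mentioned in the feature list can be pictured with a small Java sketch. This is not Lingua's actual implementation: the class `LazyModelStore`, its `modelFor` method, and the map-based model representation are hypothetical stand-ins. The sketch only shows the general idea that nothing is loaded at construction time and that each model is loaded at most once, on first access.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a model store; not part of Lingua's API.
public class LazyModelStore {
    // Counts how often the expensive loader runs (for demonstration only).
    private final AtomicInteger loadCount = new AtomicInteger();
    private final Map<String, Map<String, Double>> cache = new ConcurrentHashMap<>();
    private final Function<String, Map<String, Double>> loader;

    public LazyModelStore(Function<String, Map<String, Double>> loader) {
        // Only the loader is stored here; no model is read at start-up.
        this.loader = loader;
    }

    // Loads the model on first access; later calls reuse the cached instance.
    public Map<String, Double> modelFor(String language) {
        return cache.computeIfAbsent(language, key -> {
            loadCount.incrementAndGet();
            return loader.apply(key);
        });
    }

    public int loadCount() {
        return loadCount.get();
    }

    public static void main(String[] args) {
        // The loader here returns made-up n-gram frequencies.
        LazyModelStore store = new LazyModelStore(lang -> Map.of("th", 0.02));
        System.out.println(store.loadCount()); // prints 0
        store.modelFor("English");
        store.modelFor("English");
        System.out.println(store.loadCount()); // prints 1
    }
}
```

Deferring the load this way moves the cost from start-up to the first request for a given language, which is the trade-off the release note describes for the REPL.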
API changes
- renamed method `LanguageDetector.detectLanguageFrom()` to `LanguageDetector.detectLanguageOf()` to use the grammatically correct English preposition
- in version `0.1.0`, the method now called `LanguageDetector.detectLanguageOf()` returned `null` for strings whose language could not be detected reliably. Now, `Language.UNKNOWN` is returned instead in those cases to prevent `NullPointerException`s, especially in Java code.
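The move from `null` to `Language.UNKNOWN` described above can be illustrated with a self-contained Java sketch. The enum and the keyword-based `detectLanguageOf` below are toy stand-ins written for this example, not Lingua's classes or detection logic. They only show why a sentinel value lets callers chain method calls without a null check.

```java
import java.util.Locale;

public class UnknownSentinelDemo {
    // Hypothetical, shortened stand-in for the library's Language enum.
    public enum Language { ENGLISH, GERMAN, UNKNOWN }

    // Toy detector for illustration only: it matches two keywords and is
    // unrelated to how Lingua actually detects languages.
    public static Language detectLanguageOf(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        if (lower.contains("the ")) return Language.ENGLISH;
        if (lower.contains("der ")) return Language.GERMAN;
        return Language.UNKNOWN; // a sentinel value instead of null
    }

    public static String describe(String text) {
        // Calling name() directly is safe because the result is never null.
        return detectLanguageOf(text).name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(describe("the cat sleeps")); // prints "english"
        System.out.println(describe("12345"));          // prints "unknown"
    }
}
```

Had `detectLanguageOf` returned `null` for the second input, the chained `name()` call in `describe` would have thrown a `NullPointerException`.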
Bug Fixes
- fixed a bug in Lingua's REPL that caused non-ASCII characters to be garbled in consoles which do not use UTF-8 encoding by default, especially on Windows systems
-
v0.1.0 Changes
November 16, 2018
This is the very first release of Lingua. It aims at accurate language detection results for both long and especially short text. Detection on short text fragments such as Twitter messages is a weak spot of many similar libraries.
Supported languages so far:
- English
- French
- German
- Italian
- Latin
- Spanish