Lingua/CHANGELOG and Lingua Releases

Latest Version

1.0.3

Changelog History

Page 2

v0.3.0 Changes
January 16, 2019
🚀 This major release offers a lot of new features, including new languages. Finally! :-)

Languages
- ➕ added 18 languages: Arabic, Belarusian, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Hungarian, Latvian, Lithuanian, Polish, Persian, Romanian, Russian, Swedish, Turkish
🔋 Features
- Language models can now be cached by MapDB to reduce memory usage and speed up loading times.
👌 Improvements
- In the standalone app, you can now choose which language models to load in order to compare detection accuracy between strongly related languages.
- ✅ For test report generation using Maven, you can now select a specific language with the language property, so the reports no longer have to be run for all languages: mvn test -P accuracy-reports -D detector=lingua -D language=German.
API changes
- Lingua's package structure has been simplified. The public API intended for end users now lives in com.github.pemistahl.lingua.api. Breaking changes to this package will be kept to a minimum in 0.*.* versions and will no longer be made starting from version 1.0.0. All other code is stored in com.github.pemistahl.lingua.internal and is subject to change without further notice.
- ➕ added new class com.github.pemistahl.lingua.api.LanguageDetectorBuilder which is now responsible for building and configuring instances of com.github.pemistahl.lingua.api.LanguageDetector
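The role of a dedicated builder class, as described in the API changes above, can be illustrated with a minimal, self-contained sketch. All names here (BuilderSketch, Detector, DetectorBuilder, fromAll, from) are hypothetical stand-ins chosen for illustration and are not Lingua's actual classes or method signatures. The sketch only shows the general idea: the detector has no public constructor, and all configuration goes through the builder.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the builder pattern; not Lingua's real API.
class BuilderSketch {
    enum Language { ENGLISH, FRENCH, GERMAN }

    // The product is immutable and can only be created through the builder.
    static final class Detector {
        private final Set<Language> languages;

        private Detector(Set<Language> languages) {
            this.languages = languages;
        }

        Set<Language> languages() {
            return languages;
        }
    }

    // The builder owns all configuration decisions.
    static final class DetectorBuilder {
        private final EnumSet<Language> languages;

        private DetectorBuilder(EnumSet<Language> languages) {
            this.languages = languages;
        }

        // Start from every language known to this sketch.
        static DetectorBuilder fromAll() {
            return new DetectorBuilder(EnumSet.allOf(Language.class));
        }

        // Start from an explicit selection of languages.
        static DetectorBuilder from(Language first, Language... rest) {
            return new DetectorBuilder(EnumSet.of(first, rest));
        }

        Detector build() {
            // Copy the set so later changes to the builder cannot leak into the detector.
            return new Detector(Collections.unmodifiableSet(EnumSet.copyOf(languages)));
        }
    }

    public static void main(String[] args) {
        Detector detector = DetectorBuilder.from(Language.ENGLISH, Language.GERMAN).build();
        System.out.println(detector.languages());
    }
}
```

Keeping construction in one place lets the set of configuration options grow without changing how the detector itself is used.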
✅ Test Coverage
- ✅ Test coverage of the public API has been extended from 6% to 23%.
📚 Documentation
- ✅ In addition to the test reports, graphical plots have been created in order to compare the detection results between the different classifiers even more easily. The code for the plots has been written in Python and is stored in an IPython notebook under /accuracy-reports/accuracy-reports-analysis-notebook.ipynb.
v0.2.2 Changes
December 28, 2018
⚡️ This minor version update provides the following:

👌 Improvements
- The included language model JSON files now use a more efficient format, saving approximately 25% disk space in uncompressed form compared to version 0.2.1.
🐛 Bug Fixes
- ✅ The version of the Jacoco test coverage Maven plugin was incorrectly specified, leading to download errors. The most recent snapshot version of Jacoco is now used, which provides enhancements for Kotlin test coverage measurement.
v0.2.1 Changes
December 20, 2018
⚡️ This minor version update provides the following:

🐎 Performance Improvements
- Lingua's language detection has been sped up. It is now approximately 25% faster for large data sets.
Comparison with Apache Tika
- Accuracy report test classes have been written for Apache Tika to compare its language detection performance with Lingua's. Lingua outperforms Tika for short paragraphs of text by up to 15% in accuracy. A detailed comparison table can be found in the README.
v0.2.0 Changes
December 17, 2018
🚀 This release provides both new features and bug fixes. It is the first release that has been published to JCenter. Publication on Maven Central will follow soon.

Languages
- ➕ added detection support for Portuguese
🔋 Features
- extended language models for already existing languages to provide for more accurate detection results
- the larger language models are now lazy-loaded to reduce waiting times during start-up, especially when starting the lingua REPL
- ➕ added some unit tests for the LanguageDetector class that cover the most basic functionality (will be extended in upcoming versions)
- ➕ added accuracy reports and test data for each supported language, in order to measure language detection accuracy (can be generated with mvn test -P accuracy-reports)
- ➕ added accuracy statistics summary of the current implementation to README
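The lazy loading of larger language models mentioned in the feature list above can be pictured with the following minimal sketch. It is not Lingua's implementation: the class LazyModelSketch, the loadModel helper, and the tiny hard-coded model are assumptions made for illustration. The point is only that nothing is loaded at start-up and that each model is loaded at most once, on first use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of lazy model loading; not Lingua's real code.
class LazyModelSketch {
    // Counts how often the simulated expensive load actually runs.
    static final AtomicInteger loads = new AtomicInteger();

    // Models are stored here only after they have been requested once.
    private static final Map<String, Map<String, Double>> cache = new ConcurrentHashMap<>();

    // Stand-in for reading and parsing a large model file from disk.
    private static Map<String, Double> loadModel(String language) {
        loads.incrementAndGet();
        return Map.of("th", 0.5, "he", 0.5);
    }

    // Loads the model on first access and reuses it afterwards.
    static Map<String, Double> modelFor(String language) {
        return cache.computeIfAbsent(language, LazyModelSketch::loadModel);
    }

    public static void main(String[] args) {
        System.out.println("loads at start-up: " + loads.get());
        modelFor("English");
        modelFor("English");
        System.out.println("loads after two lookups: " + loads.get());
    }
}
```

The trade-off is that the first detection call for a given language pays the loading cost instead of application start-up.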
API changes
- 📇 renamed method LanguageDetector.detectLanguageFrom() to LanguageDetector.detectLanguageOf() to use the grammatically correct English preposition
- in version 0.1.0, the method now named LanguageDetector.detectLanguageOf() returned null for strings whose language could not be detected reliably. Now, Language.UNKNOWN is returned instead in those cases to prevent NullPointerExceptions, especially in Java code.
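Why a sentinel value such as Language.UNKNOWN is safer than null can be shown with a small, self-contained sketch. The class UnknownSentinelSketch, its nested Language enum, and the toy keyword rule are hypothetical stand-ins, not Lingua's classes or its detection algorithm. Only the return-value convention mirrors the change described above.

```java
import java.util.Locale;

// Hypothetical sketch of the sentinel-value convention; not Lingua's real code.
class UnknownSentinelSketch {
    enum Language { ENGLISH, GERMAN, UNKNOWN }

    // Toy rule for illustration; a real detector relies on statistical language models.
    static Language detectLanguageOf(String text) {
        String padded = " " + text.toLowerCase(Locale.ROOT) + " ";
        if (padded.contains(" the ")) return Language.ENGLISH;
        if (padded.contains(" der ")) return Language.GERMAN;
        // Return a sentinel instead of null when no reliable decision is possible.
        return Language.UNKNOWN;
    }

    public static void main(String[] args) {
        Language result = detectLanguageOf("12345");
        // Calling a method on the result is always safe.
        // With a null return value, this line would throw a NullPointerException.
        System.out.println(result.name());
    }
}
```

Callers can still treat the undetected case specially by comparing against the sentinel, but forgetting to do so no longer crashes the program.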
🐛 Bug Fixes
- fixed a bug in lingua's REPL that caused non-ASCII characters to be garbled in consoles which do not use UTF-8 encoding by default, especially on Windows systems
v0.1.0 Changes
November 16, 2018
This is the very first release of Lingua. It aims at accurate language detection results for both long and especially short text. Detection on short text fragments such as Twitter messages is a weak spot of many similar libraries.

👌 Supported languages so far:
- English
- French
- German
- Italian
- Latin
- Spanish

Lingua changelog

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
