Lingua v0.5.0 Release Notes
Release Date: 2019-08-12
- ➕ Added 12 new languages: Bengali, Chinese (traditional and simplified are not yet differentiated), Gujarati, Hebrew, Hindi, Japanese, Korean, Punjabi, Tamil, Telugu, Thai, Urdu
- LanguageDetectorBuilder now supports the additional method withMinimumRelativeDistance(), which specifies the minimum distance between the logarithmized and summed-up probabilities of the candidate languages. If two or more languages yield nearly the same probability for a given input text, the wrong language is likely to be returned. With a higher minimum relative distance, Language.UNKNOWN is returned in such cases instead of risking a false positive.
- ✅ Test report generation can now use multiple CPU cores, running as many reports in parallel as there are CPU cores available. This has been implemented as an additional attribute for the respective Gradle task:
./gradlew writeAccuracyReports -PcpuCores=...
- The REPL now lets you freely specify the languages you want to try out by entering the desired ISO 639-1 codes. Previously, it was only possible to choose between certain language combinations.
- The overall detection algorithm has been improved, yielding slightly more accurate results for those languages that are based on the Latin alphabet.
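The idea behind withMinimumRelativeDistance() can be sketched roughly as follows. This is a simplified illustration, not Lingua's actual implementation: the class RelativeDistanceSketch, the method detect() and the exact formula for the relative distance are assumptions made for this example. It takes the summed log-probabilities per language (negative values, closer to zero is better) and falls back to UNKNOWN when the two best candidates are too close to each other.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names and formula are assumptions,
// not taken from the Lingua code base.
final class RelativeDistanceSketch {

    static final String UNKNOWN = "UNKNOWN";

    /**
     * Returns the best-scoring language, or UNKNOWN if the runner-up is too close.
     *
     * @param scores summed log-probabilities per language (assumed to be negative)
     * @param minimumRelativeDistance threshold below which the result is rejected
     */
    static String detect(Map<String, Double> scores, double minimumRelativeDistance) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        double secondScore = Double.NEGATIVE_INFINITY;

        // Track the highest and second-highest score in a single pass.
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            double score = entry.getValue();
            if (score > bestScore) {
                secondScore = bestScore;
                bestScore = score;
                best = entry.getKey();
            } else if (score > secondScore) {
                secondScore = score;
            }
        }

        if (best == null) {
            return UNKNOWN; // no candidates at all
        }
        if (secondScore == Double.NEGATIVE_INFINITY) {
            return best; // only one candidate, nothing to compare against
        }

        // Assumed definition: gap between the top two scores,
        // relative to the magnitude of the runner-up's score.
        double relativeDistance = (bestScore - secondScore) / Math.abs(secondScore);
        return relativeDistance < minimumRelativeDistance ? UNKNOWN : best;
    }
}
```

With scores of -10.0 for one language and -10.5 for another, the relative distance under this assumed formula is about 0.048, so a threshold of 0.0 returns the first language while a threshold of 0.25 returns UNKNOWN.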
🐛 Bug Fixes
🛠 Thanks to the great work of contributor Bernhard Geisberger, two bugs have been fixed.
The fix in pull request #8 ensures that the MapDB cache files are recreated automatically if their data has been corrupted.
The fix in pull request #9 makes the class LanguageDetector completely thread-safe. Previously, in some rare cases, two threads could mutate one of the internal variables at the same time, yielding inaccurate language detection results.
Thank you, Bernhard.
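
The kind of thread-safety problem addressed by pull request #9 can be illustrated with a generic sketch. How the fix was actually implemented is not described here, so the following is an assumption-based example of one common remedy: keeping per-call state in local variables instead of in fields shared between threads. The class LocalStateSketch and its methods are hypothetical and unrelated to Lingua's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not Lingua's actual code.
final class LocalStateSketch {

    /**
     * Counts the letters in a text. The counter is a local variable,
     * so concurrent calls cannot overwrite each other's intermediate state.
     * Keeping the counter in a shared field instead would allow two threads
     * to mutate it at the same time and produce wrong results.
     */
    static int countLetters(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetter(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /** Runs countLetters() for the same text from several threads and collects the results. */
    static List<Integer> runConcurrently(String text, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                jobs.add(() -> countLetters(text));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<Integer> future : pool.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each call only touches its own local counter, every task returns the same count for the same input regardless of how the threads are scheduled.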