CoreNLP v4.0.0 release notes (2020-05-04)

Release Date: 2020-05-04

Overview

Stanford CoreNLP v4.0.0 includes a major overhaul of tokenization and a large collection of new parsing and tagging models, along with miscellaneous enhancements and fixes.

✨ Enhancements
- UD v2.0 tokenization standard for English, French, German, and Spanish
- New mwt annotator for handling multiword tokens in French, German, and Spanish
- New models with more training data and better performance for tagging and parsing in English, French, German, and Spanish
- French NER
- New Chinese segmentation based on CTB9
- Improved handling of double codepoint characters
- Easier syntax for specifying language-specific pipelines and NER pipeline properties
- Improved CoNLL-U processing
- Improved speed and memory performance for CRF training
- Tregex support in CoreSentence
- Updated library dependencies
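
The "double codepoint characters" item presumably refers to characters outside the Basic Multilingual Plane, which Java stores as two UTF-16 code units (a surrogate pair). The sketch below uses only the standard library, not CoreNLP, to show why such characters need care when counting characters or computing offsets. The class and method names (`CodePointDemo`, `codePointLength`, `codePoints`) are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: this is not CoreNLP code.
class CodePointDemo {

    // Counts Unicode code points; String.length() counts UTF-16 code units instead.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Splits a string into one String per code point, keeping surrogate pairs together.
    static List<String> codePoints(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp); // advance by 1 or 2 code units
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "ok \uD83D\uDE00"; // "ok " followed by U+1F600
        System.out.println(text.length());           // 5 UTF-16 code units
        System.out.println(codePointLength(text));   // 4 code points
        System.out.println(codePoints(text).size()); // 4 elements
    }
}
```

A tokenizer that indexes by `char` can split such a character in half or report offsets that differ from the code point count, which is the kind of problem this sketch makes visible.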

🛠 Fixes
- NPE while simultaneously tokenizing on whitespace and sentence splitting on newlines
- NPE in EntityMentionsAnnotator during language check
- NPE in CorefMentionAnnotator while aligning coref mentions with titles and entity mentions
- NPE in NERCombinerAnnotator in certain configurations of models on/off
- Incorrect handling of eolonly option in ArabicSegmenterAnnotator
- Apply named entity granularity change prior to coref mention detection
- Incorrect handling of keeping newline tokens when using Chinese segmenter on Windows
- Incorrect handling of reading in German treebank files
- SR parser crashes when given bad training input
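
For the first fix in the list, a configuration of the following shape is the kind that combines whitespace tokenization with newline-based sentence splitting. This is a minimal sketch, assuming the release note refers to the `tokenize.whitespace` and `ssplit.eolonly` pipeline properties; the notes do not name the options. The class name `WhitespaceEolConfig` is hypothetical, and the code only builds a `java.util.Properties` object without loading CoreNLP.

```java
import java.util.Properties;

// Illustration only: builds pipeline properties, does not run CoreNLP.
class WhitespaceEolConfig {

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit");
        // Assumption: these are the two options the fix refers to.
        props.setProperty("tokenize.whitespace", "true"); // split tokens on whitespace only
        props.setProperty("ssplit.eolonly", "true");      // one sentence per input line
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With CoreNLP on the classpath, a `Properties` object like this would typically be passed to `new StanfordCoreNLP(props)`.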
