CoreNLP v4.2.0 Release Notes

Release Date: 2020-11-17
  • Overview

    This release features a collection of small bug fixes and updates. It is the first release built directly from the GitHub repo.

    Enhancements

    • Upgrade libraries (EJML, JUnit, JFlex)
    • Add character offsets to Tregex responses from server
    • Improve cleaning of treebanks for English models
    • Speed up loading of Wikidict annotator
    • New utility for tagging CoNLL-U files in place
    • Command line tool for processing TokensRegex

    Fixes

    • Output single token NER entities in inline XML output format
    • Add currency symbol part of speech training data
    • Fix issues with tree binarizing

Previous changes from v4.0.0

  • Overview

    The latest release of Stanford CoreNLP includes a major overhaul of tokenization and a large collection of new parsing and tagging models. There are also miscellaneous enhancements and fixes.

    Enhancements

    • UD v2.0 tokenization standard for English, French, German, and Spanish
    • New mwt annotator for handling multiword tokens in French, German, and Spanish
    • New models with more training data and better performance for tagging and parsing in English, French, German, and Spanish
    • French NER
    • New Chinese segmentation based off CTB9
    • Improved handling of double codepoint characters
    • Easier syntax for specifying language specific pipelines and NER pipeline properties
    • Improved CoNLL-U processing
    • Improved speed and memory performance for CRF training
    • Tregex support in CoreSentence
    • Updated library dependencies

    Fixes

    • NPE while simultaneously tokenizing on whitespace and sentence splitting on newlines
    • NPE in EntityMentionsAnnotator during language check
    • NPE in CorefMentionAnnotator while aligning coref mentions with titles and entity mentions
    • NPE in NERCombinerAnnotator in certain configurations of models on/off
    • Incorrect handling of eolonly option in ArabicSegmenterAnnotator
    • Apply named entity granularity change prior to coref mention detection
    • Incorrect handling of keeping newline tokens when using Chinese segmenter on Windows
    • Incorrect handling of reading in German treebank files
    • SR parser crashes when given bad training input