DKPro Core v1.11.0 Release Notes

Release Date: 2019-07-05
  • 🚀 We are pleased to announce the release of

    DKPro Core 1.11.0

    a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.

    https://dkpro.github.io/dkpro-core

    🚀 This is a feature release.

    โฌ†๏ธ Important upgrade notice

    • 🔄 Changed groupIds and artifactIds. The group ID is now org.dkpro.core and the artifact IDs follow the pattern dkpro-core-...-(asl/gpl).
    • 🔄 Changed package names. All packages now start with org.dkpro.core..., except the packages of the UIMA types, which remain unchanged for data compatibility.
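When updating imports by hand, a rough heuristic for the package rename can help locate candidates. The sketch below is hypothetical and not an official migration tool. It assumes that pre-1.11.0 packages start with de.tudarmstadt.ukp.dkpro.core. and that the UIMA type packages, which keep their old names, can be recognized by a "type" path segment (e.g. ...api.segmentation.type). Always check the result against the 1.11.0 API documentation.

```java
public class PackageMigration {
    // Assumption: package prefix used by DKPro Core releases before 1.11.0.
    static final String OLD_PREFIX = "de.tudarmstadt.ukp.dkpro.core.";
    // New prefix according to the 1.11.0 release notes.
    static final String NEW_PREFIX = "org.dkpro.core.";

    // Heuristic (assumption): UIMA type packages contain a "type" segment
    // and are left unchanged for data compatibility.
    public static boolean isTypePackage(String pkg) {
        for (String segment : pkg.split("\\.")) {
            if (segment.equals("type")) {
                return true;
            }
        }
        return false;
    }

    // Returns the suggested new package name, or the input unchanged if it
    // is not an old DKPro Core package or is a type package.
    public static String migrate(String pkg) {
        if (!pkg.startsWith(OLD_PREFIX) || isTypePackage(pkg)) {
            return pkg;
        }
        return NEW_PREFIX + pkg.substring(OLD_PREFIX.length());
    }

    public static void main(String[] args) {
        // prints "org.dkpro.core.opennlp"
        System.out.println(migrate("de.tudarmstadt.ukp.dkpro.core.opennlp"));
        // prints the type package unchanged
        System.out.println(migrate("de.tudarmstadt.ukp.dkpro.core.api.segmentation.type"));
    }
}
```

Packages outside the old DKPro Core prefix pass through untouched, so the helper can be applied to every import in a file.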

    Notable changes since DKPro Core 1.10.0

    • 🔄 Changed parts of the brat data conversion code such that it can be more easily used outside a UIMA component
    • 🔄 Changed type mapping such that out-of-tagset types map to the generic type (e.g. an unknown POS tag maps to POS, not to POS_X)
    • 🔄 Changed name of NYTCollectionReader to NitfReader
    • ➕ Added types to encode XML document structure in CAS
    • ➕ Added new XmlDocumentReader/Writer components using these types
    • ➕ Added basic reader for Annotated Gigaword corpus (only reads text so far) (thanks @az79nefy)
    • ➕ Added basic support for PubAnnotation JSON format
    • ➕ Added Maui component for keyword assignment
    • ➕ Added parameter to SfstAnnotator to enable lower-case lookup of first word in a sentence (thanks @rziai)
    • ➕ Added "order" feature to Token type
    • ➕ Added support for CoNLL-U document and paragraph IDs (thanks @manuelciosici)
    • ➕ Added support for CoNLL-U sentence IDs and text
    • ➕ Added standardized parameter to disable type mapping
    • ➕ Added support for TCF orthography layer using SofaChangeAnnotations
    • ➕ Added segmenter for Chinese using jieba (thanks @Horsmann)
    • ➕ Added MyStem for Russian
    • ➕ Added links to OpenMinTeD categories in type system documentation
    • ➕ Added support for reading/writing the CoreNLP CoNLL flavor
    • ➕ Added parameter to configure the Tika buffer size (useful for large documents)
    • ⚡️ Updated to OpenNLP 1.9.1
    • ⚡️ Updated to CoreNLP 3.9.2
    • ⚡️ Updated to ICU4J 64.2
    • ⚡️ Updated to Tika 1.19.1
    • ⚡️ Updated to LanguageTool 4.3
    • ⚡️ Updated to PDFBox 2.0.12
    • ⚡️ Updated IllinoisNLP components
    • ⚡️ Updated TreeTagger models/binaries in build.xml script (thanks @tilmanbeck)
    • ⚡️ Updated LIF dependencies
    • ⚡️ Updated dataset descriptions
    • ⚡️ Updated various general dependencies (e.g. Apache Commons etc.)
    • 👌 Improved robustness of checksum verification for text files used in datasets (e.g. license files)
    • 👌 Improved error messages in WebAnno TSV3 module
    • 🛠 Fixed crash in WebannoTsv3XWriter when annotations do not start/end at token boundaries
    • 🛠 Fixed bug in WebAnno TSV3 support causing span annotations with slot features to disappear
    • 🛠 Fixed trimming of whitespace in TeiReader
    • 🛠 Fixed bug in NifWriter causing named entity identifier not to be written
    • 🛠 Fixed crash in BratReader when reading discontinuous segments
    • 🛠 Fixed problem in BratWriter when dealing with slot features
    • 🛠 Fixed metadata of CoNLL2012Writer
    • 🛠 Fixed potential problem of datasets being written outside their target directory
    • ⬇️ Dropped the GrAF I/O module since the upstream libraries are outdated and no longer maintained

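A common way to prevent files from being written outside a target directory is to normalize each resolved path and verify that it still lies under the target. The sketch below shows that generic check using java.nio. It is not necessarily how DKPro Core implemented its fix. The class and method names are hypothetical, and the entry names are assumed to come from something like an archive listing.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class TargetDirGuard {
    // Resolves an entry name against the target directory and rejects it if
    // the normalized result escapes that directory (e.g. via "../").
    public static Path safeResolve(Path targetDir, String entryName) {
        Path base = targetDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(entryName).normalize();
        // Path.startsWith compares whole path components, not raw strings.
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("Entry escapes target directory: " + entryName);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path target = Paths.get("datasets", "example");
        System.out.println(safeResolve(target, "data/train.txt"));
        try {
            safeResolve(target, "../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Absolute entry names are rejected as well, because resolving an absolute path returns it unchanged. Symbolic links inside the target directory are not covered by this check.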
    🚀 A more detailed overview of the changes in this release can be found here.

    Thanks for contributions go to: @az79nefy, @ramonziai, @manuelciosici, @Horsmann, @tilmanbeck

    โฌ†๏ธ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.