DKPro Core v1.11.0 Release Notes
Release Date: 2019-07-05
We are pleased to announce the release of DKPro Core 1.11.0, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
This is a feature release.
Important upgrade notice
- Changed groupIds and artifactIds. The group ID is now org.dkpro.core and the artifact IDs are dkpro-core-...-(asl/gpl).
- Changed package names. All packages now start with org.dkpro.core..., except the packages of UIMA types, which remain unchanged for data compatibility.
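The package rename can be illustrated with a small sketch. It is a hypothetical helper, not part of DKPro Core. It assumes that pre-1.11.0 classes lived under the prefix de.tudarmstadt.ukp.dkpro.core (the notes above do not name the old prefix) and that UIMA type classes can be recognized by a ".type." package segment. Both are assumptions made for illustration, and the class names in the usage example are only examples.

```java
// Hypothetical migration helper; NOT part of DKPro Core.
final class DkproPackageMigration {
    // Assumption: package prefix used before 1.11.0 (not stated in the release notes).
    static final String OLD_PREFIX = "de.tudarmstadt.ukp.dkpro.core.";
    // New prefix as announced in the upgrade notice.
    static final String NEW_PREFIX = "org.dkpro.core.";

    /** Returns the class name to use with 1.11.0 for a given pre-1.11.0 class name. */
    static String migrate(String className) {
        if (!className.startsWith(OLD_PREFIX)) {
            return className; // not a DKPro Core class, leave untouched
        }
        // Assumption: UIMA types sit in a package containing ".type." and keep
        // their old names for data compatibility.
        if (className.contains(".type.")) {
            return className;
        }
        return NEW_PREFIX + className.substring(OLD_PREFIX.length());
    }

    public static void main(String[] args) {
        // A component class moves to the new package ...
        System.out.println(migrate("de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpSegmenter"));
        // ... while a UIMA type keeps its original package.
        System.out.println(migrate("de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"));
    }
}
```

In practice, a search-and-replace over import statements that follows the same rule should cover most of the migration. The Maven coordinates need to be updated separately in the POM.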
Notable changes since DKPro Core 1.10.0
- Changed parts of the brat data conversion code such that it can be more easily used outside a UIMA component
- Changed type mapping such that out-of-tagset types map to the generic type (e.g. an unknown POS tag maps to POS, not to POS_X)
- Changed name of NYTCollectionReader to NitfReader
- Added types to encode XML document structure in CAS
- Added new XmlDocumentReader/Writer components using these types
- Added basic reader for Annotated Gigaword corpus (only reads text so far) (thanks @az79nefy)
- Added basic support for PubAnnotation JSON format
- Added Maui component for keyword assignment
- Added parameter to SfstAnnotator to enable lower-case lookup of first word in a sentence (thanks @rziai)
- Added "order" feature to Token type
- Added support for CoNLL-U document and paragraph IDs (thanks @manuelciosici)
- Added support for CoNLL-U sentence IDs and text
- Added standardized parameter to disable type mapping
- Added support for TCF orthography layer using SofaChangeAnnotations
- Added segmenter for Chinese using jieba (thanks @Horsmann)
- Added MyStem for Russian
- Added links to OpenMinTeD categories in type system documentation
- Added support for reading/writing the CoreNLP CoNLL flavor
- Added parameter to configure the Tika buffer size (useful for large documents)
- Updated to OpenNLP 1.9.1
- Updated to CoreNLP 3.9.2
- Updated to ICU4J 64.2
- Updated to Tika 1.19.1
- Updated to LanguageTool 4.3
- Updated to PDFBox 2.0.12
- Updated IllinoisNLP components
- Updated TreeTagger models/binaries in build.xml script (thanks @tilmanbeck)
- Updated LIF dependencies
- Updated dataset descriptions
- Updated various general dependencies (e.g. Apache Commons etc.)
- Improved robustness of checksum verification for text files used in datasets (e.g. license files)
- Improved error messages in WebAnno TSV3 module
- Fixed crash in WebannoTsv3XWriter when annotations do not start/end at token boundaries
- Fixed bug in WebAnno TSV3 support causing span annotations with slot features to disappear
- Fixed trimming of whitespace in TeiReader
- Fixed bug in NifWriter causing the named entity identifier not to be written
- Fixed crash in BratReader when reading discontinuous segments
- Fixed problem in BratWriter when dealing with slot features
- Fixed metadata of CoNLL2012Writer
- Fixed potential problem of datasets being written outside their target directory
- Dropped the GrAF I/O module since the upstream libraries are outdated and no longer maintained
A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @az79nefy, @ramonziai, @manuelciosici, @Horsmann, @tilmanbeck
When upgrading, please note that you should not mix different versions of DKPro Core components in your projects, as they may not be compatible with each other.