Changelog History
v2.1.0 Changes
December 01, 2019
🚀 We are pleased to announce the release of DKPro Core 2.1.0, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework version 3.
https://dkpro.github.io/dkpro-core
🚀 This is a feature release.
Notable changes since DKPro Core 2.0.0
- ➕ Added option to export XMI using XML 1.1 to avoid issues with certain characters
- ➕ Added option to CoNLL readers to trim whitespace from field values, so that incidental space characters no longer cause problems (enabled by default)
- ➕ Added support for annotator notes in brat format
- 👌 Improved speed for writing WebAnno TSV format (backported from WebAnno)
- 🛠 Fixed a couple of issues with the CoNLL 2012 format
- 🛠 Fixed default extension for CoNLL-U writer
- 🛠 Fixed problem in CoNLL-U writer when text contains line breaks
- 🛠 Fixed problem that LanguageToolChecker did not fill in suggestions
- 🛠 Fixed setting div type on paragraphs created by CoNLL-U reader
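The XML 1.1 export option is relevant because XML 1.0 cannot represent most C0 control characters at all, not even as character references, whereas XML 1.1 permits them (except NUL) when escaped. The following is a minimal Python sketch of that gap, not DKPro Core code: the function names are hypothetical, and the ranges are taken from the `Char` productions of the XML 1.0 and XML 1.1 specifications.

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if the code point matches the Char production of XML 1.1."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def needs_xml11(text: str) -> list[str]:
    """Return the characters that XML 1.1 can carry but XML 1.0 cannot."""
    return [
        ch for ch in text
        if is_xml11_char(ord(ch)) and not is_xml10_char(ord(ch))
    ]


if __name__ == "__main__":
    sample = "form feed:\x0c backspace:\x08 tab:\t"
    # Tab is valid in both versions; form feed and backspace are not valid in 1.0.
    print([hex(ord(ch)) for ch in needs_xml11(sample)])
```

Text extracted from PDFs or legacy corpora sometimes contains such control characters, which is one plausible way a document could fail to serialize under XML 1.0.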
🚀 A more detailed overview of the changes in this release can be found [2].
Thanks to all contributors!
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
🚀 [1] https://github.com/dkpro/dkpro-core/releases/tag/rel%2Fdkpro-core-2.1.0
[2] https://github.com/dkpro/dkpro-core/issues?q=milestone%3A2.1.0
v2.0.0 Changes
September 08, 2019
🚀 We are pleased to announce the release of DKPro Core 2.0.0, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
🚀 This is a feature release.
⬆️ Important upgrade notice
This version requires UIMA v3.
⬆️ If you are upgrading from DKPro Core 1.10.x or earlier, please read the DKPro Core 1.11.0 upgrade notice [1].
Notable changes since DKPro Core 1.11.1
- Switched to UIMA v3
- ➕ Added filling in suggestions to LanguageToolChecker
- ➕ Added support for notes to BratReader
- ➕ Added basic read support for Perseus XML format
- 👌 Improved error message when StanfordNamedEntityRecognizerTrainer is called without training data
- 🚚 Moved StopwordRemover to tokit module and removed stopwordremover module
- 📇 Renamed lancaster module to smile
- ✂ Removed Tag type from syntax module
- ... and a few additional under-the-hood changes
🚀 A more detailed overview of the changes in this release can be found [2].
Thanks for contributions go to: @alaindesilets, @mischor
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
🚀 [1] https://github.com/dkpro/dkpro-core/releases/tag/rel%2Fdkpro-core-2.0.0
[2] https://github.com/dkpro/dkpro-core/issues?q=milestone%3A2.0.0
v1.12.0 Changes
December 01, 2019
🚀 We are pleased to announce the release of DKPro Core 1.12.0, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework version 2.
https://dkpro.github.io/dkpro-core
🚀 This is a feature release.
⬆️ Important upgrade notice
⬆️ If you are upgrading from DKPro Core 1.10.x or earlier, please read the DKPro Core 1.11.0 upgrade notice [1].
Notable changes since DKPro Core 1.11.1
- ➕ Added option to export XMI using XML 1.1 to avoid issues with certain characters
- ➕ Added option to CoNLL readers to trim whitespace from field values, so that incidental space characters no longer cause problems (enabled by default)
- ➕ Added support for annotator notes in brat format
- 👌 Improved speed for writing WebAnno TSV format (backported from WebAnno)
- 🛠 Fixed a couple of issues with the CoNLL 2012 format
- 🛠 Fixed default extension for CoNLL-U writer
- 🛠 Fixed problem in CoNLL-U writer when text contains line breaks
- 🛠 Fixed problem that LanguageToolChecker did not fill in suggestions
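The whitespace-trimming option for the CoNLL readers can be pictured with a small sketch. This is a minimal Python illustration rather than the DKPro Core implementation: the function name `split_conll_line` and the `trim` flag are hypothetical, and it assumes tab-separated fields as in CoNLL-U.

```python
def split_conll_line(line: str, trim: bool = True) -> list[str]:
    """Split one tab-separated CoNLL token line into its field values."""
    fields = line.rstrip("\n").split("\t")
    if trim:
        # Remove incidental spaces around each value, e.g. " NOUN" -> "NOUN".
        fields = [field.strip() for field in fields]
    return fields


if __name__ == "__main__":
    raw = "1\tDogs \tdog\t NOUN\n"
    print(split_conll_line(raw))              # ['1', 'Dogs', 'dog', 'NOUN']
    print(split_conll_line(raw, trim=False))  # ['1', 'Dogs ', 'dog', ' NOUN']
```

Without trimming, a value such as ` NOUN` would not compare equal to `NOUN` in a later lookup, which is one example of the kind of problem that stray spaces can cause.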
🚀 A more detailed overview of the changes in this release can be found [2].
Thanks to all contributors!
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
🚀 [1] https://github.com/dkpro/dkpro-core/releases/tag/rel%2Fdkpro-core-1.12.0
[2] https://github.com/dkpro/dkpro-core/issues?q=milestone%3A1.12.0
v1.11.1 Changes
August 17, 2019
🚀 We are pleased to announce the release of DKPro Core 1.11.1, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
🛠 This is a bugfix release.
⬆️ Important upgrade notice
⬆️ If you are upgrading from DKPro Core 1.10.x or earlier, please read the DKPro Core 1.11.0 upgrade notice [1].
Notable changes since DKPro Core 1.11.0
- 🛠 Fixed trimming of whitespace at the start and end of annotations
- 🛠 Fixed encoding of named entity categories in LIF format
- 🛠 Fixed unescaping of URI-encoded characters when writing files
- ➕ Added parameter to control whitespace normalization in HtmlDocumentReader
- ➕ Added parameters to control indentation and output method in XmlDocumentWriter
- 👌 Improved exception in Stanford CoreNLP NER trainer when no documents have been processed
🚀 A more detailed overview of the changes in this release can be found [2].
Thanks for contributions go to: @az79nefy, @ramonziai, @manuelciosici, @Horsmann, @tilmanbeck, @alaindesilets, @jcklie
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
🚀 [1] https://github.com/dkpro/dkpro-core/releases/tag/dkpro-core-1.11.0
[2] https://github.com/dkpro/dkpro-core/issues?q=milestone%3A1.11.1
v1.11.0 Changes
July 05, 2019
🚀 We are pleased to announce the release of DKPro Core 1.11.0, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
🚀 This is a feature release.
⬆️ Important upgrade notice
- 🔄 Changed groupIds and artifactIds. The group ID is now org.dkpro.core and the artifact IDs are dkpro-core-...-(asl/gpl)
- 🔄 Changed package names. The packages are now all starting with org.dkpro.core... - except the packages of UIMA types which remain unchanged for data compatibility.
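The two renames above could be applied mechanically along the following lines. This is a rough Python sketch under stated assumptions, not an official migration tool: it assumes that the pre-1.11.0 group ID, artifact IDs, and packages all used the prefix `de.tudarmstadt.ukp.dkpro.core` (the notes above do not state the old prefix), and it recognizes UIMA type packages by a `type` segment in the package name, which is only a heuristic. The function names are hypothetical.

```python
OLD_PREFIX = "de.tudarmstadt.ukp.dkpro.core"  # assumed pre-1.11.0 prefix
NEW_GROUP_ID = "org.dkpro.core"


def migrate_artifact_id(old_artifact_id: str) -> str:
    """Map e.g. '<OLD_PREFIX>.io.text-asl' to 'dkpro-core-io-text-asl'."""
    if not old_artifact_id.startswith(OLD_PREFIX + "."):
        return old_artifact_id  # not an old-style DKPro Core artifact
    module = old_artifact_id[len(OLD_PREFIX) + 1:]
    return "dkpro-core-" + module.replace(".", "-")


def migrate_package(old_package: str) -> str:
    """Rewrite a package name, leaving UIMA type packages untouched."""
    if "type" in old_package.split("."):
        return old_package  # heuristic: type packages keep their old name
    if old_package == OLD_PREFIX or old_package.startswith(OLD_PREFIX + "."):
        return NEW_GROUP_ID + old_package[len(OLD_PREFIX):]
    return old_package


if __name__ == "__main__":
    print(migrate_artifact_id(OLD_PREFIX + ".io.text-asl"))
    print(migrate_package(OLD_PREFIX + ".io.text"))
    print(migrate_package(OLD_PREFIX + ".api.segmentation.type"))
```

Any rewrite of this kind should be checked against the actual module list of the target release, since a module may also have been renamed or removed between versions.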
Notable changes since DKPro Core 1.10.0
- 🔄 Changed parts of the brat data conversion code such that it can be more easily used outside a UIMA component
- 🔄 Changed type mapping such that out-of-tagset types map to the generic type (e.g. an unknown POS tag maps to POS, not to POS_X)
- 🔄 Changed name of NYTCollectionReader to NitfReader
- ➕ Added types to encode XML document structure in CAS
- ➕ Added new XmlDocumentReader/Writer components using these types
- ➕ Added basic reader for Annotated Gigaword corpus (only reads text so far) (thanks @az79nefy)
- ➕ Added basic support for PubAnnotation JSON format
- ➕ Added Maui component for keyword assignment
- ➕ Added parameter to SfstAnnotator to enable lower-case lookup of first word in a sentence (thanks @rziai)
- ➕ Added "order" feature to Token type
- ➕ Added support for CoNLL-U document and paragraph IDs (thanks @manuelciosici)
- ➕ Added support for CoNLL-U sentence IDs and text
- ➕ Added standardized parameter to disable type mapping
- ➕ Added support for TCF orthography layer using SofaChangeAnnotations
- ➕ Added segmenter for Chinese using jieba (thanks @Horsmann)
- ➕ Added MyStem for Russian
- ➕ Added links to OpenMinTeD categories in type system documentation
- ➕ Added support for reading/writing the CoreNLP CoNLL flavor
- ➕ Added parameter to configure the Tika buffer size (useful for large documents)
- ⚡️ Updated to OpenNLP 1.9.1
- ⚡️ Updated to CoreNLP 3.9.2
- ⚡️ Updated to ICU4J 64.2
- ⚡️ Updated to Tika 1.19.1
- ⚡️ Updated to LanguageTool 4.3
- ⚡️ Updated to PDFBox 2.0.12
- ⚡️ Updated IllinoisNLP components
- ⚡️ Updated TreeTagger models/binaries in build.xml script (thanks @tilmanbeck)
- ⚡️ Updated LIF dependencies
- ⚡️ Updated dataset descriptions
- ⚡️ Updated various general dependencies (e.g. Apache Commons etc.)
- 👌 Improved robustness of checksum verification for text files used in datasets (e.g. license files)
- 👌 Improved error messages in WebAnno TSV3 module
- 🛠 Fixed crash in WebannoTsv3XWriter when annotations do not start/end at token boundaries
- 🛠 Fixed bug in WebAnno TSV3 support causing span annotations with slot features to disappear
- 🛠 Fixed trimming of whitespace in TeiReader
- 🛠 Fixed bug in NifWriter causing named entity identifier not to be written
- 🛠 Fixed crash in BratReader when reading discontinuous segments
- 🛠 Fixed problem in BratWriter when dealing with slot features
- 🛠 Fixed metadata of CoNLL2012Writer
- 🛠 Fixed potential problem of datasets being written outside their target directory
- ⬇️ Dropped the GrAF I/O module since the upstream libraries are outdated and no longer maintained
🚀 A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @az79nefy, @ramonziai, @manuelciosici, @Horsmann, @tilmanbeck
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
v1.10.0 Changes
September 10, 2018
🚀 We are pleased to announce the release of DKPro Core 1.10.0, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
🚀 This is a feature release.
Notable changes since DKPro Core 1.9.3
- ➕ Added support for Arabic to CoreNlpSegmenter (thanks @Jibun)
- ➕ Added support for Token "form" to CoNLL writers (thanks @Jibun)
- ➕ Added ability to provide extra non-standard parameters to CoreNlpSegmenter (thanks @Jibun)
- ➕ Added ArkTweet POS tagger trainer (thanks @schrieveslaach)
- ➕ Added WebAnno TSV3 reader/writer
- ➕ Added reader for Leipzig Corpora Collection
- ⬆️ Upgraded to CoreNLP 3.9.1 (stanfordnlp and corenlp modules)
- ⬆️ Upgraded to OpenNLP 1.9.0
- ⬆️ Upgraded to PDFBox 2.0.9 (io-pdf module)
- ⬆️ Upgraded to LanguageTool 4.2
- ⬆️ Upgraded to CogComp 4.0.7 (lbj module)
- ⬆️ Upgraded to Tika 1.18 (io-tika module)
- 👌 Improved handling of multi-line annotations in brat module (thanks @parisni)
- 🛠 Fixed crash in the brat reader on discontinuous annotations by reading only the first fragment
- ➕ Added dataset description for GUM 4.1.0 dataset
- Removed PARAM_INTERN_TAGS
- 👌 Improved component metadata
🚀 A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @Jibun, @parisni, @schrieveslaach, @jgrivolla
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
v1.9.3 Changes
July 28, 2018
🚀 We are pleased to announce the release of DKPro Core 1.9.3, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
🚀 This is a bug-fix and minor feature release.
Notable changes since DKPro Core 1.9.2
- ➕ Added ability to restore Backmapper alignment data after a CAS restore
- ➕ Added ability to specify a cluster resource name for the ArkTweet POS-tagger trainer
- Added PARAM_MODEL_ENCODING to TreeTaggerChunker
- 🛠 Fixed issue that DictionaryAnnotator did not match at the sentence end
- Ensured that all parameters have a description
🚀 A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @nilsreiter, @mjunsilo, @schrieveslaach, @jkirsch
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
v1.9.2 Changes
July 28, 2018
🚀 We are pleased to announce the release of DKPro Core 1.9.2, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
🚀 This is a bug-fix and minor feature release.
Notable changes since DKPro Core 1.9.1
- 👍 Allowed explicitly specifying a model artifact when running a model-based component
- 🛠 Fixed auto-loading of models in CoreNLP module
- 🛠 Fixed issue causing PdfReader to create annotations with leading/trailing whitespace
- ➕ Added more OMTD-SHARE metadata and UIMA capabilities
- Avoided failure when encountering a discontinuous segment in brat files
🚀 A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @nilsreiter, @mjunsilo
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
v1.9.1 Changes
April 05, 2018
🚀 We are pleased to announce the release of DKPro Core 1.9.1, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
🚀 This is a bug-fix and minor feature release.
Notable changes since DKPro Core 1.9.0
- 📇 Included OMTD-SHARE metadata
- 👌 Improved mapping capabilities and robustness of the BratReader
- ➕ Added option to mark split tokens in CamelCaseTokenSegmenter
- 🛠 Fixed hash for CC-BY 4.0 license in dataset API
- 🛠 Fixed NPE in CoNLL 2012 reader
- ⬆️ Upgraded to LanguageTool 4.1
- ⬆️ Upgraded to ICU4J 61.1
- ⬆️ Upgraded to JTok 2.1.18
- ⬆️ Upgraded to OpenNLP 1.8.4
🚀 A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @nilsreiter, @mjunsilo
⬆️ When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.