Changelog History
v2.1.0 Changes
December 01, 2019
We are pleased to announce the release of
DKPro Core 2.1.0
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework version 3.
https://dkpro.github.io/dkpro-core
This is a feature release.
Notable changes since DKPro Core 2.0.0
- Added option to export XMI using XML 1.1 to avoid issues with certain characters
- Added option to CoNLL readers to trim off whitespace from field values to avoid users having issues with incidental space characters (default is on)
- Added support for annotator notes in brat format
- Improved speed for writing WebAnno TSV format (backported from WebAnno)
- Fixed a couple of issues with the CoNLL 2012 format
- Fixed default extension for CoNLL-U writer
- Fixed problem in CoNLL-U writer when text contains line breaks
- Fixed problem that LanguageToolChecker did not fill in suggestions
- Fixed setting div type on paragraphs created by CoNLL-U reader
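
The effect of the whitespace-trimming option for CoNLL readers can be illustrated with a minimal Python sketch. This is not DKPro Core code: the function `parse_conll_line` and its `trim` flag are hypothetical names chosen for illustration, and the sketch assumes tab-separated fields, as is usual for CoNLL formats.

```python
def parse_conll_line(line, trim=True):
    """Split one tab-separated CoNLL line into its field values.

    Hypothetical helper for illustration only. The `trim` flag mimics the
    idea of the reader option described above and is on by default.
    """
    fields = line.rstrip("\n").split("\t")
    if trim:
        # Remove incidental spaces around each field value.
        fields = [field.strip() for field in fields]
    return fields


# A line whose POS column carries a stray trailing space.
raw = "1\tdogs\tdog\tNOUN \t_\n"
print(parse_conll_line(raw))              # field values with spaces trimmed
print(parse_conll_line(raw, trim=False))  # the stray space is preserved
```

Without trimming, a value such as `"NOUN "` would not match the tag `"NOUN"` in a later lookup, which is the kind of problem the option is meant to prevent.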
A more detailed overview of the changes in this release can be found [2].
Thanks to all contributors!
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
[1] https://github.com/dkpro/dkpro-core/releases/tag/rel%2Fdkpro-core-2.1.0
[2] https://github.com/dkpro/dkpro-core/issues?q=milestone%3A2.1.0
v2.0.0 Changes
September 08, 2019
We are pleased to announce the release of
DKPro Core 2.0.0
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
This is a feature release.
Important upgrade notice
This version requires UIMA v3.
If you are upgrading from DKPro Core 1.10.x or earlier, please read the DKPro Core 1.11.0 upgrade notice [1].
Notable changes since DKPro Core 1.11.1
- Switched to UIMA v3
- Added filling in suggestions to LanguageToolChecker
- Added support for notes to BratReader
- Added basic read support for Perseus XML format
- Improved error message when StanfordNamedEntityRecognizerTrainer is called without training data
- Moved StopwordRemover to tokit module and removed stopwordremover module
- Renamed lancaster module to smile
- Removed Tag type from syntax module
- ... and a few additional under-the-hood changes
A more detailed overview of the changes in this release can be found [2].
Thanks for contributions go to: @alaindesilets, @mischor
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
[1] https://github.com/dkpro/dkpro-core/releases/tag/rel%2Fdkpro-core-2.0.0
[2] https://github.com/dkpro/dkpro-core/issues?q=milestone%3A2.0.0
v1.12.0 Changes
December 01, 2019
We are pleased to announce the release of
DKPro Core 1.12.0
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework version 2.
https://dkpro.github.io/dkpro-core
This is a feature release.
Important upgrade notice
If you are upgrading from DKPro Core 1.10.x or earlier, please read the DKPro Core 1.11.0 upgrade notice [1].
Notable changes since DKPro Core 1.11.1
- Added option to export XMI using XML 1.1 to avoid issues with certain characters
- Added option to CoNLL readers to trim off whitespace from field values to avoid users having issues with incidental space characters (default is on)
- Added support for annotator notes in brat format
- Improved speed for writing WebAnno TSV format (backported from WebAnno)
- Fixed a couple of issues with the CoNLL 2012 format
- Fixed default extension for CoNLL-U writer
- Fixed problem in CoNLL-U writer when text contains line breaks
- Fixed problem that LanguageToolChecker did not fill in suggestions
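
As background for the XML 1.1 export option: XML 1.0 excludes most control characters from documents, while XML 1.1 admits every character except U+0000 (most control characters then have to be written as character references). The following Python sketch checks text against the `Char` productions of the two XML versions. It is an illustration only, not DKPro Core code, and the function names are hypothetical.

```python
def is_xml10_char(ch):
    """True if `ch` matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(ch):
    """True if `ch` matches the Char production of XML 1.1."""
    cp = ord(ch)
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def chars_needing_xml11(text):
    """Return the characters that XML 1.1 permits but XML 1.0 does not."""
    return [ch for ch in text if is_xml11_char(ch) and not is_xml10_char(ch)]


# A form feed (U+000C) sometimes survives text extraction from PDFs.
sample = "page one\x0cpage two"
print([hex(ord(ch)) for ch in chars_needing_xml11(sample)])
```

If such a check returns any characters for a document, exporting with XML 1.0 is likely to fail or produce a file that parsers reject, which is the situation the option addresses.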
A more detailed overview of the changes in this release can be found [2].
Thanks to all contributors!
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
[1] https://github.com/dkpro/dkpro-core/releases/tag/rel%2Fdkpro-core-1.12.0
[2] https://github.com/dkpro/dkpro-core/issues?q=milestone%3A1.12.0
v1.11.1 Changes
August 17, 2019
We are pleased to announce the release of
DKPro Core 1.11.1
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
This is a bugfix release.
Important upgrade notice
If you are upgrading from DKPro Core 1.10.x or earlier, please read the DKPro Core 1.11.0 upgrade notice [1].
Notable changes since DKPro Core 1.11.0
- Fixed trimming of whitespace at the start and end of annotations
- Fixed encoding of named entity categories in LIF format
- Fixed unescaping of URI-encoded characters when writing files
- Added parameter to control whitespace normalization in HtmlDocumentReader
- Added parameters to control indentation and output method in XmlDocumentWriter
- Improved exception in Stanford CoreNLP NER trainer when no documents have been processed
A more detailed overview of the changes in this release can be found [2].
Thanks for contributions go to: @az79nefy, @ramonziai, @manuelciosici, @Horsmann, @tilmanbeck, @alaindesilets, @jcklie
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
[1] https://github.com/dkpro/dkpro-core/releases/tag/dkpro-core-1.11.0
[2] https://github.com/dkpro/dkpro-core/issues?q=milestone%3A1.11.1
v1.11.0 Changes
July 05, 2019
We are pleased to announce the release of
DKPro Core 1.11.0
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
This is a feature release.
Important upgrade notice
- Changed groupIds and artifactIds. The group ID is now org.dkpro.core and the artifact IDs are dkpro-core-...-(asl/gpl)
- Changed package names. The packages are now all starting with org.dkpro.core... - except the packages of UIMA types which remain unchanged for data compatibility.
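
The coordinate change in the upgrade notice above could be applied to a list of dependencies with a small script. The Python sketch below rests on assumptions the notice does not state: that the old artifact IDs followed the pattern `de.tudarmstadt.ukp.dkpro.core.<module>-(asl|gpl)` and that dots in the module name become dashes in the new ID. The function name is hypothetical; please verify every result against the published artifacts before changing a POM.

```python
# Assumed pre-1.11.0 artifact ID prefix (not stated in the notice above).
OLD_PREFIX = "de.tudarmstadt.ukp.dkpro.core."
# New group ID as given in the upgrade notice.
NEW_GROUP_ID = "org.dkpro.core"


def migrate_artifact_id(old_artifact_id):
    """Map an assumed old-style artifact ID to the new naming scheme.

    Example of the assumed mapping:
    de.tudarmstadt.ukp.dkpro.core.io.conll-asl -> dkpro-core-io-conll-asl
    """
    if not old_artifact_id.startswith(OLD_PREFIX):
        raise ValueError("not an old-style DKPro Core artifact ID: " + old_artifact_id)
    module = old_artifact_id[len(OLD_PREFIX):]
    # Assumption: dots in the module part are replaced by dashes.
    return "dkpro-core-" + module.replace(".", "-")


for old in ("de.tudarmstadt.ukp.dkpro.core.opennlp-asl",
            "de.tudarmstadt.ukp.dkpro.core.io.conll-asl"):
    print(NEW_GROUP_ID + ":" + migrate_artifact_id(old))
```

Java imports need the same kind of review, with one exception taken from the notice itself: packages containing UIMA types keep their old names, so imports of type classes should stay as they are.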
Notable changes since DKPro Core 1.10.0
- Changed parts of the brat data conversion code such that it can be more easily used outside a UIMA component
- Changed type mapping such that out-of-tagset types map to the generic type (e.g. an unknown POS tag maps to POS, not to POS_X)
- Changed name of NYTCollectionReader to NitfReader
- Added types to encode XML document structure in CAS
- Added new XmlDocumentReader/Writer components using these types
- Added basic reader for Annotated Gigaword corpus (only reads text so far) (thanks @az79nefy)
- Added basic support for PubAnnotation JSON format
- Added Maui component for keyword assignment
- Added parameter to SfstAnnotator to enable lower-case lookup of first word in a sentence (thanks @rziai)
- Added "order" feature to Token type
- Added support for CoNLL-U document and paragraph IDs (thanks @manuelciosici)
- Added support for CoNLL-U sentence IDs and text
- Added standardized parameter to disable type mapping
- Added support for TCF orthography layer using SofaChangeAnnotations
- Added segmenter for Chinese using jieba (thanks @Horsmann)
- Added MyStem for Russian
- Added links to OpenMinTeD categories in type system documentation
- Added support for reading/writing the CoreNLP CoNLL flavor
- Added parameter to configure the Tika buffer size (useful for large documents)
- Updated to OpenNLP 1.9.1
- Updated to CoreNLP 3.9.2
- Updated to ICU4J 64.2
- Updated to Tika 1.19.1
- Updated to LanguageTool 4.3
- Updated to PDFBox 2.0.12
- Updated IllinoisNLP components
- Updated TreeTagger models/binaries in build.xml script (thanks @tilmanbeck)
- Updated LIF dependencies
- Updated dataset descriptions
- Updated various general dependencies (e.g. Apache Commons etc.)
- Improved robustness of checksum verification for text files used in datasets (e.g. license files)
- Improved error messages in WebAnno TSV3 module
- Fixed crash in WebannoTsv3XWriter when annotations do not start/end at token boundaries
- Fixed bug in WebAnno TSV3 support causing span annotations with slot features to disappear
- Fixed trimming of whitespace in TeiReader
- Fixed bug in NifWriter causing named entity identifier not to be written
- Fixed crash in BratReader when reading discontinuous segments
- Fixed problem in BratWriter when dealing with slot features
- Fixed metadata of CoNLL2012Writer
- Fixed potential problem of datasets being written outside their target directory
- Dropped the GrAF I/O module since the upstream libraries are outdated and no longer maintained
A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @az79nefy, @ramonziai, @manuelciosici, @Horsmann, @tilmanbeck
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
v1.10.0 Changes
September 10, 2018
We are pleased to announce the release of
DKPro Core 1.10.0
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
This is a feature release.
Notable changes since DKPro Core 1.9.3
- Added support for Arabic to CoreNlpSegmenter (thanks @Jibun)
- Added support for Token "form" to CoNLL writers (thanks @Jibun)
- Added ability to provide extra non-standard parameters to CoreNlpSegmenter (thanks @Jibun)
- Added ArkTweet POS tagger trainer (thanks @schrieveslaach)
- Added WebAnno TSV3 reader/writer
- Added reader for Leipzig Corpora Collection
- Upgraded to CoreNLP 3.9.1 (stanfordnlp and corenlp modules)
- Upgraded to OpenNLP 1.9.0
- Upgraded to PDFBox 2.0.9 (io-pdf module)
- Upgraded to LanguageTool 4.2
- Upgraded to CogComp 4.0.7 (lbj module)
- Upgraded to Tika 1.18 (io-tika module)
- Improved handling of multi-line annotations in brat module (thanks @parisni)
- Fixed discontinuous annotations crashing the brat reader by reading only the first fragment
- Added dataset description for GUM 4.1.0 dataset
- Removed PARAM_INTERN_TAGS
- Improved component metadata
A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @Jibun, @parisni, @schrieveslaach, @jgrivolla
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
v1.9.3 Changes
July 28, 2018
We are pleased to announce the release of
DKPro Core 1.9.3
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
This is a bug-fix and minor feature release.
Notable changes since DKPro Core 1.9.2
- Added ability to restore Backmapper alignment data after a CAS restore
- Added ability to specify a cluster resource name for the ArkTweet POS-tagger trainer
- Added PARAM_MODEL_ENCODING to TreeTaggerChunker
- Fixed issue that DictionaryAnnotator did not match at the sentence end
- Ensured that all parameters have a description
A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @nilsreiter, @mjunsilo, @schrieveslaach, @jkirsch
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
v1.9.2 Changes
July 28, 2018
We are pleased to announce the release of
DKPro Core 1.9.2
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
This is a bug-fix and minor feature release.
Notable changes since DKPro Core 1.9.1
- Allow explicitly specifying a model artifact when running a model-based component
- Fixed auto-loading of models in CoreNLP module
- Fixed issue causing PdfReader to create annotations with leading/trailing whitespace
- Added more OMTD-SHARE metadata and UIMA capabilities
- Avoid failing when encountering a discontinuous segment in brat files
A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @nilsreiter, @mjunsilo
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
v1.9.1 Changes
April 05, 2018
We are pleased to announce the release of
DKPro Core 1.9.1
a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.
This is a bug-fix and minor feature release.
Notable changes since DKPro Core 1.9.0
- Included OMTD-SHARE metadata
- Improved mapping capabilities and robustness of the BratReader
- Added option to mark split tokens in CamelCaseTokenSegmenter
- Fixed hash for CC-BY 4.0 license in dataset API
- Fixed NPE in CoNLL 2012 reader
- Upgraded to LanguageTool 4.1
- Upgraded to ICU4J 61.1
- Upgraded to JTok 2.1.18
- Upgraded to OpenNLP 1.8.4
A more detailed overview of the changes in this release can be found here.
Thanks for contributions go to: @nilsreiter, @mjunsilo
When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.