Mallet v2.1 Release Notes
Release Date: 2019-06-13
This is a serialization-breaking release due to the switch to HPPC, which affects feature alphabets.
Added
- Nonnegative Matrix Factorization
- Word embeddings (word2vec clone)
- PagedInstanceList supports iteration correctly
- lebiathan added stratified sampling of InstanceList
- This file!
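
As a rough illustration of what stratified sampling of a labelled collection involves, the sketch below groups items by label and draws the same fraction from each group, so label proportions are approximately preserved. It works on plain Java lists rather than MALLET's InstanceList, and the names StratifiedSampleSketch and sample are hypothetical, not MALLET's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of MALLET: stratified sampling over plain lists.
final class StratifiedSampleSketch {

    // Returns the indices of the sampled items. Each label contributes
    // round(fraction * groupSize) items, chosen at random within its group.
    static List<Integer> sample(List<String> labels, double fraction, long seed) {
        // Group item indices by label, keeping first-seen label order.
        Map<String, List<Integer>> byLabel = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            byLabel.computeIfAbsent(labels.get(i), k -> new ArrayList<>()).add(i);
        }

        Random random = new Random(seed);
        List<Integer> sampled = new ArrayList<>();
        for (List<Integer> group : byLabel.values()) {
            Collections.shuffle(group, random);
            int take = (int) Math.round(fraction * group.size());
            sampled.addAll(group.subList(0, take));
        }
        return sampled;
    }
}
```

With eight items labelled "a", four labelled "b", and a fraction of 0.5, the call returns four indices from the first group and two from the second.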
Changed
- All merging and propagation of sampling statistics for topic modeling is now multi-threaded (if num-threads is more than 1), leading to a 5-10% speed boost.
- The primitive collections library (for example, mapping String to int) has been changed from GNU Trove to Carrot Search Labs HPPC. This change removes all GNU dependencies.
- The license has been changed from CPL to Apache.
- VMID is now used as the unique identifier for serialized objects. (Breaks serialization!)
- Many small fixes suggested by ErrorProne.
- Unneeded imports removed.
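
The VMID mentioned in the list above is presumably java.rmi.dgc.VMID from the Java standard library, which produces identifiers intended to be unique across Java virtual machines. The sketch below only shows how such an identifier is created and compared. How MALLET stores it inside its serialized classes is not covered, and the class name VmidSketch is hypothetical.

```java
import java.rmi.dgc.VMID;

// Hypothetical illustration, not MALLET code.
final class VmidSketch {

    // Each call constructs a fresh identifier; separate calls are
    // expected to yield values that are not equal to each other.
    static VMID newId() {
        return new VMID();
    }

    // Two objects carrying IDs can be matched by comparing them with equals().
    static boolean sameOrigin(VMID left, VMID right) {
        return left.equals(right);
    }
}
```

Because an identifier of this kind becomes part of the serialized form, objects written by earlier versions without it would not be expected to deserialize cleanly, which is consistent with the "breaks serialization" warning.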
Removed
- The Matrix2 class has been removed.
- GRMM has been moved to a separate package.
Fixed
- Te Rutherford fixed a bug where non-String instance IDs were being cast as Strings.
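
The class of bug described above can be reproduced in a few lines: casting an Object that is not a String throws ClassCastException at runtime, whereas String.valueOf accepts any value. This is a generic sketch, not the actual MALLET patch, and the class and method names are hypothetical.

```java
// Hypothetical illustration, not the actual MALLET fix.
final class InstanceIdSketch {

    // Unsafe: throws ClassCastException when the ID is, say, an Integer or a URI.
    static String nameByCast(Object id) {
        return (String) id;
    }

    // Safe: converts any ID (including null) to its text form.
    static String nameByConversion(Object id) {
        return String.valueOf(id);
    }
}
```

Calling nameByConversion(42) returns "42", while nameByCast(42) fails with a ClassCastException.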
Previous changes from v2.0.8
Changed
- The default format for document-topic proportions now prints values for all topics in order. The earlier file format (sparse listing of topic/proportion) can be restored using command line options.
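
To make the difference between the two layouts concrete, the sketch below expands a sparse listing into the dense, in-order form. It assumes the sparse form is a whitespace-separated run of topic index and proportion pairs; the exact columns of MALLET's output files are not specified here, and the names DocTopicsSketch and toDense are hypothetical.

```java
// Hypothetical helper, not part of MALLET. The column layout is an assumption.
final class DocTopicsSketch {

    // Expands "topic proportion topic proportion ..." pairs into an array
    // with one proportion per topic, ordered by topic index.
    // Topics that do not appear in the sparse listing are left at 0.0.
    static double[] toDense(String sparsePairs, int numTopics) {
        double[] dense = new double[numTopics];
        String[] fields = sparsePairs.trim().split("\\s+");
        for (int i = 0; i + 1 < fields.length; i += 2) {
            int topic = Integer.parseInt(fields[i]);
            dense[topic] = Double.parseDouble(fields[i + 1]);
        }
        return dense;
    }
}
```

For example, toDense("2 0.7 0 0.3", 4) yields the array {0.3, 0.0, 0.7, 0.0}: every topic gets a column, and position alone identifies the topic.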