Mallet v2.1 Release Notes

Release Date: 2019-06-13
  • This is a serialization-breaking release due to the switch to HPPC, which affects feature alphabets.

    Added

    • Nonnegative Matrix Factorization
    • Word embeddings (word2vec clone)
    • PagedInstanceList supports iteration correctly
    • lebiathan added stratified sampling of InstanceList
    • This file!
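
For readers unfamiliar with stratified sampling, here is a minimal sketch of the idea in plain Java. It is not MALLET's implementation: the class StratifiedSampleSketch and its sample method are hypothetical, and it works on a list of label strings rather than an InstanceList. Each label group is shuffled and roughly the same fraction is kept from every group.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration; not part of MALLET.
final class StratifiedSampleSketch {

    /** Returns indices of a sample that keeps about `fraction` of each label group. */
    static List<Integer> sample(List<String> labels, double fraction, Random rng) {
        // Group item indices by their label.
        Map<String, List<Integer>> byLabel = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            byLabel.computeIfAbsent(labels.get(i), k -> new ArrayList<>()).add(i);
        }
        // Shuffle each group and keep the rounded share of it.
        List<Integer> chosen = new ArrayList<>();
        for (List<Integer> group : byLabel.values()) {
            Collections.shuffle(group, rng);
            int keep = (int) Math.round(group.size() * fraction);
            chosen.addAll(group.subList(0, keep));
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("spam", "spam", "spam", "spam", "ham", "ham");
        System.out.println(sample(labels, 0.5, new Random(42)));
    }
}
```

Sampling per group rather than over the whole list keeps the label proportions of the sample close to those of the full data.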

    Changed

    • All merging and propagation of sampling statistics for topic modeling is now multi-threaded (if num-threads is more than 1), leading to a 5-10% speed boost.
    • The primitive collections library (for example, mapping String to int) has been changed from GNU Trove to Carrotlabs HPPC. This change removes all GNU dependencies.
    • The license has been changed from CPL to Apache.
    • VMID is now used as the unique identifier for serialized objects. (Breaks serialization!)
    • Many small fixes suggested by ErrorProne.
    • Unneeded imports removed.
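
To illustrate the VMID item above, here is a small sketch that assumes the VMID in question is the JDK's java.rmi.dgc.VMID. The Tagged class and the roundTrip helper are hypothetical and only show the general pattern: each object gets its own identifier at construction, and that identifier survives Java serialization unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.rmi.dgc.VMID;

// Hypothetical illustration; not MALLET code.
final class VmidSketch {

    /** A serializable object that carries its own unique identifier. */
    static final class Tagged implements Serializable {
        private static final long serialVersionUID = 1L;
        final VMID id = new VMID(); // assigned once, when the object is created
    }

    /** Serializes and deserializes the object in memory. */
    static Tagged roundTrip(Tagged original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Tagged) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Tagged a = new Tagged();
        Tagged copy = roundTrip(a);
        System.out.println("same id after round trip: " + a.id.equals(copy.id));
    }
}
```

Because the identifier is stored inside the serialized form, changing its type alters that form, which is consistent with the note that this change breaks serialization.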

    Removed

    • The Matrix2 class has been removed.
    • GRMM has been moved to a separate package.

    Fixed

    • Te Rutherford fixed a bug where non-String instance IDs were being cast to String.
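
The general class of bug described above can be sketched as follows. This is not the actual patch: InstanceIdSketch and both methods are hypothetical names, and the sketch only assumes that an instance ID is held as an Object that may or may not be a String.

```java
// Hypothetical illustration; not the actual MALLET fix.
final class InstanceIdSketch {

    /** Unsafe: throws ClassCastException when the ID is not a String. */
    static String unsafeLabel(Object id) {
        return (String) id;
    }

    /** Safe: converts any ID (Integer, URI, ...) to its string form. */
    static String safeLabel(Object id) {
        return String.valueOf(id);
    }

    public static void main(String[] args) {
        Object numericId = 42; // an Integer, not a String
        System.out.println(safeLabel(numericId));
        try {
            unsafeLabel(numericId);
        } catch (ClassCastException e) {
            System.out.println("cast failed for a non-String ID");
        }
    }
}
```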

Previous changes from v2.0.8

  • Changed

    • The default format for document-topic proportions now prints values for all topics in order. The earlier file format (sparse listing of topic/proportion) can be restored using command-line options.
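
The difference between the two layouts can be sketched as below. This is an illustration only: the class DocTopicFormatSketch, the tab separator, the two-decimal rounding, and the threshold parameter are all assumptions, not MALLET's exact output format.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical illustration of dense versus sparse document-topic output.
final class DocTopicFormatSketch {

    /** Dense: one proportion per topic, in topic order. */
    static String dense(double[] proportions) {
        StringJoiner line = new StringJoiner("\t");
        for (double p : proportions) {
            line.add(String.format(Locale.ROOT, "%.2f", p));
        }
        return line.toString();
    }

    /** Sparse: topic/proportion pairs, only for topics above the threshold. */
    static String sparse(double[] proportions, double threshold) {
        StringJoiner line = new StringJoiner("\t");
        for (int topic = 0; topic < proportions.length; topic++) {
            if (proportions[topic] > threshold) {
                line.add(Integer.toString(topic));
                line.add(String.format(Locale.ROOT, "%.2f", proportions[topic]));
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        double[] doc = {0.7, 0.0, 0.3};
        System.out.println(dense(doc));
        System.out.println(sparse(doc, 0.05));
    }
}
```

In the dense layout a column position always means the same topic, which makes the file easy to load as a matrix. The sparse layout is shorter when most topics have negligible weight.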