DatumBox v0.7.0 Release Notes

Release Date: 2016-03-19
    • Speed & Memory:
      • Added multi-threading support on the majority of algorithms and methods, making the 0.7.0 version several times faster than 0.6.x.
      • Implemented Storage Hints & hybrid strategies, which enable efficient use of an LRU cache and faster training on large datasets that don't fit in memory.
      • All algorithms that require matrices now use sparse implementations to reduce the memory footprint.
      • Fixed a limitation in the clustering algorithms that forced the list of clusters to be stored in memory.
    • Algorithms & Methods:
      • Added L1, L2 and ElasticNet regularization in Logistic, Ordinal and Linear Regression algorithms.
      • The Collaborative Filtering algorithm was modified to support more generic user-user CF models.
      • Updated the NgramsExtractor algorithm to export more keywords and provide better signals for NLP models.
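For readers unfamiliar with the regularization terms mentioned above, the following is a minimal sketch of how an ElasticNet penalty combines the L1 and L2 terms. The class name, method names, and the exact scaling of the L2 term (some formulations use l2/2) are assumptions made for illustration and are not taken from the Datumbox source.

```java
/**
 * Illustrative sketch only: computes L1, L2 and ElasticNet penalties
 * for a weight vector. All names here are hypothetical and do not
 * mirror the Datumbox API.
 */
public final class ElasticNetPenaltySketch {

    /** L1 term: l1 * sum(|w_i|). Encourages sparse weights. */
    public static double l1Penalty(double[] weights, double l1) {
        double sum = 0.0;
        for (double w : weights) {
            sum += Math.abs(w);
        }
        return l1 * sum;
    }

    /** L2 term: l2 * sum(w_i^2). Shrinks large weights. (Scaling is an assumption.) */
    public static double l2Penalty(double[] weights, double l2) {
        double sum = 0.0;
        for (double w : weights) {
            sum += w * w;
        }
        return l2 * sum;
    }

    /** ElasticNet: the sum of the L1 and L2 terms above. */
    public static double elasticNet(double[] weights, double l1, double l2) {
        return l1Penalty(weights, l1) + l2Penalty(weights, l2);
    }

    public static void main(String[] args) {
        double[] weights = {0.5, -2.0, 0.0};
        System.out.println(elasticNet(weights, 0.1, 0.01));
    }
}
```

Setting `l2` to zero reduces this to pure L1 regularization, and setting `l1` to zero reduces it to pure L2.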
    • Framework Architecture:
      • The framework is now split into separate modules and the main library is renamed to "datumbox-framework-lib".
      • The Dataset class is replaced with the Dataframe class, which implements the Collection interface and enables parallel processing of records.
      • Major changes to the structure of interfaces and inheritance to simplify the architecture.
      • BaseMLrecommender now inherits from BaseMLmodel.
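To illustrate why implementing the Collection interface makes parallel processing straightforward, here is a hedged sketch using a hypothetical stand-in container. The `Records` class below is NOT the Datumbox Dataframe and does not reflect its API; it only shows that any `java.util.Collection` inherits `parallelStream()` from the standard library.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch only; all names are hypothetical. */
public final class ParallelRecordsSketch {

    /** Hypothetical record container backed by a list (stand-in, not Dataframe). */
    public static final class Records extends AbstractCollection<double[]> {
        private final List<double[]> rows = new ArrayList<>();

        @Override
        public boolean add(double[] row) {
            return rows.add(row);
        }

        @Override
        public Iterator<double[]> iterator() {
            return rows.iterator();
        }

        @Override
        public int size() {
            return rows.size();
        }
    }

    /** Sums the Euclidean norm of every record, processing records in parallel. */
    public static double sumOfRowNorms(Collection<double[]> records) {
        return records.parallelStream()
                .mapToDouble(row -> {
                    double squares = 0.0;
                    for (double v : row) {
                        squares += v * v;
                    }
                    return Math.sqrt(squares);
                })
                .sum();
    }

    public static void main(String[] args) {
        Records records = new Records();
        records.add(new double[] {3.0, 4.0});
        records.add(new double[] {6.0, 8.0});
        System.out.println(sumOfRowNorms(records));
    }
}
```

The per-record work here is stateless, which is what makes it safe to run in parallel; operations that mutate shared state would need additional synchronization.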
    • Code Improvements & Bug Fixes:
      • Added serialVersionUID in every serializable class.
      • Improved Exceptions and error messages.
      • Fixed a bug in Adaboost that caused recordIds to be mapped incorrectly.
      • Improved documentation and Javadoc comments.
      • Increased test coverage.
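
As background on the serialVersionUID item above, the sketch below shows what declaring the field on a serializable class looks like and round-trips an instance through Java serialization. The `ModelParameters` class and the value `1L` are hypothetical and chosen for illustration; they are not taken from the Datumbox code base.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Illustrative sketch only; all names are hypothetical. */
public final class SerialVersionUidSketch {

    /** Hypothetical serializable class with an explicit version identifier. */
    public static final class ModelParameters implements Serializable {
        // Explicit UID: without it, the JVM derives one from the class
        // structure, so compatible edits can still break deserialization.
        private static final long serialVersionUID = 1L;

        private final double[] weights;

        public ModelParameters(double[] weights) {
            this.weights = weights.clone();
        }

        public double[] getWeights() {
            return weights.clone();
        }
    }

    /** Serializes the object to bytes in memory and reads it back. */
    public static ModelParameters roundTrip(ModelParameters original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ModelParameters) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ModelParameters copy = roundTrip(new ModelParameters(new double[] {1.0, 2.0}));
        System.out.println(copy.getWeights().length);
    }
}
```

Pinning the identifier means previously stored objects remain readable after compatible class changes, as long as the declared value is left unchanged.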