DatumBox v0.7.0 Release Notes
Release Date: 2016-03-19
- Speed & Memory:
- Added multi-threading support to the majority of algorithms and methods, making version 0.7.0 several times faster than 0.6.x.
- Implemented Storage Hints and hybrid strategies, which enable efficient use of the LRU cache and faster training on large datasets that don't fit in memory.
- All algorithms that require matrices now use sparse implementations to reduce the memory footprint.
- Fixed a limitation in the clustering algorithms which forced us to store the list of clusters in memory.
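These notes do not describe how the LRU cache is implemented. As a general illustration of the eviction policy only, the following minimal sketch builds an LRU cache on the standard `LinkedHashMap`. The class name `LruCache` and the capacity parameter are hypothetical and are not part of the Datumbox API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of LRU eviction; not a Datumbox class.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion: drop the least recently used entry when over capacity.
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `b` is the least recently accessed entry at that point.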
- Algorithms & Methods:
- Added L1, L2 and ElasticNet regularization to the Logistic, Ordinal and Linear Regression algorithms.
- Modified the Collaborative Filtering algorithm to support more generic user-user CF models.
- Updated the NgramsExtractor algorithm to export more keywords and provide better signals for NLP models.
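For readers unfamiliar with the regularization terms mentioned above, here is a minimal sketch of the penalty that ElasticNet adds to a model's loss, assuming the common form `l1 * sum(|w|) + l2 * sum(w^2)`. The helper `ElasticNetPenalty` is hypothetical, and the exact scaling used inside Datumbox (for example a factor of 1/2 on the L2 term) is not stated in these notes.

```java
// Hypothetical helper for illustration; not part of the Datumbox API.
final class ElasticNetPenalty {
    private ElasticNetPenalty() {}

    // Returns l1 * sum(|w_i|) + l2 * sum(w_i^2) over the given weights.
    static double penalty(double[] weights, double l1, double l2) {
        double absSum = 0.0;
        double squaredSum = 0.0;
        for (double w : weights) {
            absSum += Math.abs(w);
            squaredSum += w * w;
        }
        return l1 * absSum + l2 * squaredSum;
    }
}
```

Setting `l2` to 0 leaves a pure L1 penalty, and setting `l1` to 0 leaves a pure L2 penalty, so the same function covers all three cases.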
- Framework Architecture:
- The framework is now split into separate modules and the main library has been renamed to "datumbox-framework-lib".
- The Dataset class has been replaced with the Dataframe class, which implements the Collection interface and enables the records to be processed in parallel.
- Made major changes to the structure of interfaces and inheritance to simplify the architecture.
- BaseMLrecommender now inherits from BaseMLmodel.
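Any class that implements `java.util.Collection` inherits `parallelStream()` in Java 8, which is one way a collection of records can be processed in parallel. The sketch below uses a plain `Collection<double[]>` as a stand-in for records. `ParallelRecords` is a hypothetical name, and whether the Dataframe class uses parallel streams internally is an assumption, not something these notes confirm.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical illustration; the real Dataframe record type is not shown here.
final class ParallelRecords {
    private ParallelRecords() {}

    // Sums every value of every record, with records distributed across worker threads.
    static double sumAll(Collection<double[]> records) {
        return records.parallelStream()
                .mapToDouble(record -> Arrays.stream(record).sum())
                .sum();
    }
}
```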
- Code Improvements & Bug Fixes:
- Added serialVersionUID to every serializable class.
- Improved exceptions and error messages.
- Fixed a bug in Adaboost that caused the recordIds to be mapped incorrectly.
- Improved the documentation and Javadoc comments.
- Increased the test coverage.
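As background on the serialVersionUID item: declaring the field explicitly pins the version identifier that Java serialization checks on deserialization, rather than leaving the JVM to derive one from the class structure. The sketch below shows the pattern on a hypothetical class, `TrainingSettings`, which is not part of Datumbox; the value `1L` is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration; not part of the Datumbox API.
class TrainingSettings implements Serializable {
    // Explicit version id, so compatible class changes do not break stored objects.
    private static final long serialVersionUID = 1L;

    final int maxIterations;

    TrainingSettings(int maxIterations) {
        this.maxIterations = maxIterations;
    }

    // Serializes the object to a byte array and reads it back.
    static TrainingSettings roundTrip(TrainingSettings settings) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(settings);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TrainingSettings) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```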