Apache Parquet v1.12.0 Release Notes
Release Notes - Parquet - Version 1.12.0
Sub-task
- PARQUET-1228 - parquet-format-structures encryption
- PARQUET-1229 - parquet-mr code changes for encryption support
- PARQUET-1286 - Crypto package in parquet-mr
- PARQUET-1328 - [java] Bloom filter read/write implementation
- PARQUET-1391 - [java] Integrate Bloom filter logic
- PARQUET-1516 - Store Bloom filters near to footer.
- PARQUET-1740 - Make ParquetFileReader.getFilteredRecordCount public
- PARQUET-1744 - Some filters throws ArrayIndexOutOfBoundsException
- PARQUET-1807 - Encryption: Interop and Function test suite for Java version
- PARQUET-1884 - Merge encryption branch into master
- PARQUET-1915 - Add null command
Bug
- PARQUET-1438 - [C++] corrupted files produced on 32-bit architecture (i686)
- PARQUET-1493 - maven protobuf plugin not work properly
- PARQUET-1455 - [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
- PARQUET-1554 - Compilation error when upgrading Scrooge version
- PARQUET-1599 - Fix to-avro to respect the overwrite option
- PARQUET-1684 - [parquet-protobuf] default protobuf field values are stored as nulls
- PARQUET-1699 - Could not resolve org.apache.yetus:audience-annotations:0.11.0
- PARQUET-1741 - APIs backward compatibility issues cause master branch build failure
- PARQUET-1765 - Invalid filteredRowCount in InternalParquetRecordReader
- PARQUET-1794 - Random data generation may cause flaky tests
- PARQUET-1803 - Could not find FileInputSplit in ParquetInputSplit
- PARQUET-1808 - SimpleGroup.toString() uses String += and so has poor performance
- PARQUET-1818 - Fix collision of encryption and bloom filters in format-structure Util
- PARQUET-1850 - toParquetMetadata method in ParquetMetadataConverter does not set dictionary page offset bit
- PARQUET-1851 - ParquetMetadataConverter throws NPE in an Iceberg unit test
- PARQUET-1868 - Parquet reader options toggle for bloom filter toggles dictionary filtering
- PARQUET-1879 - Apache Arrow can not read a Parquet File written with Parquet-Avro 1.11.0 with a Map field
- PARQUET-1893 - H2SeekableInputStream readFully() doesn't respect start and len
- PARQUET-1894 - Please fix the related Shaded Jackson Databind CVEs
- PARQUET-1896 - [Maven] parquet-tools build is broken
- PARQUET-1910 - Parquet-cli is broken after TransCompressionCommand was added
- PARQUET-1917 - [parquet-proto] default values are stored in oneOf fields that aren't set
- PARQUET-1920 - Fix issue with reading parquet files with too large column chunks
- PARQUET-1923 - parquet-tools 1.11.0: TestSimpleRecordConverter fails with ExceptionInInitializerError on openjdk 15
- PARQUET-1928 - Interpret Parquet INT96 type as FIXED[12] AVRO Schema
- PARQUET-1944 - Unable to download transitive dependency hadoop-lzo
- PARQUET-1947 - DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data
- PARQUET-1949 - Mark Parquet-1872 with not support bloom filter yet
- PARQUET-1954 - TCP connection leak in parquet dump
- PARQUET-1963 - DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty
- PARQUET-1966 - Fix build with JDK11 for JDK8
- PARQUET-1970 - Make minor releases source compatible
- PARQUET-1971 - Flaky test in github action
- PARQUET-1975 - Test failure on ARM64 CPU architecture
- PARQUET-1977 - Invalid data_page_offset
- PARQUET-1979 - Optional bloom_filter_offset is filled if no bloom filter is present
- PARQUET-1984 - Some tests fail on windows
- PARQUET-1992 - Cannot build from tarball because of git submodules
- PARQUET-1999 - NPE might occur if OutputFile is implemented by the client
New Feature
- PARQUET-41 - Add bloom filters to parquet statistics
- PARQUET-1373 - Encryption key management tools
- PARQUET-1396 - Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory
- PARQUET-1622 - Add BYTE_STREAM_SPLIT encoding
- PARQUET-1784 - Column-wise configuration
- PARQUET-1817 - Crypto Properties Factory
- PARQUET-1854 - Properties-Driven Interface to Parquet Encryption
Improvement
- PARQUET-313 - Implement 3 level list writing rule for Parquet-Thrift
- PARQUET-1528 - Add JSON support to `parquet-tools head`
- PARQUET-1593 - Replace the example usage in parquet-cli's help message with an actually existent subcommand
- PARQUET-1660 - [java] Align Bloom filter implementation with format
- PARQUET-1666 - Remove Unused Modules
- PARQUET-1696 - Remove unused hadoop-1 profile
- PARQUET-1710 - Use Objects.requireNonNull
- PARQUET-1723 - Read From Maps Without Using Contains
- PARQUET-1724 - Use ConcurrentHashMap for Cache in DictionaryPageReader
- PARQUET-1725 - Replace Usage of Strings.join with JDK Functionality in ColumnPath Class
- PARQUET-1726 - Use Java 8 Multi Exception Handling
- PARQUET-1727 - Do Not Swallow InterruptedException in ParquetLoader
- PARQUET-1728 - Simplify NullPointerException Handling in AvroWriteSupport
- PARQUET-1729 - Avoid AutoBoxing in EncodingStats
- PARQUET-1730 - Use switch Statement in AvroIndexedRecordConverter for Enums
- PARQUET-1731 - Use JDK 8 Facilities to Simplify FilteringRecordMaterializer
- PARQUET-1732 - Call toArray With Empty Array
- PARQUET-1735 - Clean Up parquet-columns Module
- PARQUET-1736 - Use StringBuilder instead of StringBuffer
- PARQUET-1737 - Replace Test Class RandomStr with Apache Commons Lang
- PARQUET-1738 - Remove unused imports in parquet-column
- PARQUET-1743 - Add equals to BlockSplitBloomFilter
- PARQUET-1749 - Use Java 8 Streams for Empty PrimitiveIterator
- PARQUET-1750 - Reduce Memory Usage of RowRanges Class
- PARQUET-1751 - Fix Protobuf Build Warning
- PARQUET-1756 - Remove Dependency on Maven Plugin semantic-versioning
- PARQUET-1759 - InternalParquetRecordReader Use Singleton Set
- PARQUET-1763 - Add SLF4J to TestCircularReferences
- PARQUET-1764 - The ParquetProperties constructor parameter list is so long
- PARQUET-1775 - Deprecate AvroParquetWriter Builder Hadoop Path
- PARQUET-1778 - Do Not Consider Class for Avro Generic Record Reader
- PARQUET-1782 - Use Switch Statement in AvroRecordConverter
- PARQUET-1790 - ParquetFileWriter missing Api for DataPageV2
- PARQUET-1791 - Add 'prune' command to parquet-tools
- PARQUET-1801 - Add column index support for 'prune' command in Parquet-tools/cli
- PARQUET-1802 - CompressionCodec class not found if the codec class is not in the same defining classloader as the CodecFactory class
- PARQUET-1805 - Refactor the configuration for bloom filters
- PARQUET-1821 - Add 'column-size' command to parquet-cli and parquet-tools
- PARQUET-1826 - Document hadoop configuration options
- PARQUET-1827 - UUID type currently not supported by parquet-mr
- PARQUET-1853 - Minimize the parquet-avro fastutil shaded jar
- PARQUET-1863 - Remove use of add-test-source mojo in parquet-protobuf
- PARQUET-1866 - Replace Hadoop ZSTD with JNI-ZSTD
- PARQUET-1890 - Upgrade to Avro 1.10.0
- PARQUET-1891 - Encryption-related light fixes
- PARQUET-1914 - Allow ProtoParquetReader To Support InputFile
- PARQUET-1924 - Do not Instantiate a New LongHashFunction
- PARQUET-1926 - Add LogicalType support to ThriftType.I64Type
- PARQUET-1929 - Bump Snappy to 1.1.8
- PARQUET-1930 - Bump Apache Thrift to 0.13.0
- PARQUET-1931 - Bump Junit 4.13.1
- PARQUET-1932 - Bump Fastutil to 8.4.2
- PARQUET-1938 - Option to get KMS details from key material (in key rotation)
- PARQUET-1939 - Fix RemoteKmsClient API ambiguity
- PARQUET-1940 - Make KeyEncryptionKey length configurable
- PARQUET-1941 - Bump Commons CLI from 1.3.1 to 1.4
- PARQUET-1951 - Allow different strategies to combine key values when merging parquet files
- PARQUET-1952 - Upgrade Avro to 1.10.1
- PARQUET-1961 - Bump Jackson to 2.11.4
- PARQUET-1964 - Properly handle missing/null filter
- PARQUET-1967 - Upgrade Zstd-jni to 1.4.8-3
- PARQUET-1969 - Test by GithubAction
- PARQUET-1973 - Support ZSTD JNI BufferPool
- PARQUET-1988 - Upgrade to ZSTD 1.4.8-6
- PARQUET-1994 - Upgrade ZSTD JNI to 1.4.9-1
Test
- PARQUET-1832 - Travis fails with too long output
- PARQUET-1980 - Build and test Apache Parquet on ARM64 CPU architecture
Wish
- PARQUET-1717 - parquet-thrift converts Thrift i16 to parquet INT32 instead of INT_16
Task
- PARQUET-1676 - Remove hive modules
- PARQUET-1703 - Update API compatibility check
- PARQUET-1796 - Bump Apache Avro to 1.9.2
- PARQUET-1842 - Update Jackson Databind version to address CVE
- PARQUET-1844 - Removed Hadoop transitive dependency on commons-lang
- PARQUET-1895 - Update jackson-databind
- PARQUET-1898 - Release parquet-mr 1.12.0