Apache Parquet v1.12.0 Release Notes

  • Release Notes - Parquet - Version 1.12.0

    Sub-task

    Bug

    • PARQUET-1438 - [C++] corrupted files produced on 32-bit architecture (i686)
    • PARQUET-1493 - Maven protobuf plugin does not work properly
    • PARQUET-1455 - [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
    • PARQUET-1554 - Compilation error when upgrading Scrooge version
    • PARQUET-1599 - Fix to-avro to respect the overwrite option
    • PARQUET-1684 - [parquet-protobuf] default protobuf field values are stored as nulls
    • PARQUET-1699 - Could not resolve org.apache.yetus:audience-annotations:0.11.0
    • PARQUET-1741 - APIs backward compatibility issues cause master branch build failure
    • PARQUET-1765 - Invalid filteredRowCount in InternalParquetRecordReader
    • PARQUET-1794 - Random data generation may cause flaky tests
    • PARQUET-1803 - Could not find FileInputSplit in ParquetInputSplit
    • PARQUET-1808 - SimpleGroup.toString() uses String += and so has poor performance
    • PARQUET-1818 - Fix collision of encryption and bloom filters in format-structure Util
    • PARQUET-1850 - toParquetMetadata method in ParquetMetadataConverter does not set dictionary page offset bit
    • PARQUET-1851 - ParquetMetadataConverter throws NPE in an Iceberg unit test
    • PARQUET-1868 - Parquet reader options toggle for bloom filter toggles dictionary filtering
    • PARQUET-1879 - Apache Arrow cannot read a Parquet file written with Parquet-Avro 1.11.0 with a Map field
    • PARQUET-1893 - H2SeekableInputStream readFully() doesn't respect start and len
    • PARQUET-1894 - Please fix the related Shaded Jackson Databind CVEs
    • PARQUET-1896 - [Maven] parquet-tools build is broken
    • PARQUET-1910 - Parquet-cli is broken after TransCompressionCommand was added
    • PARQUET-1917 - [parquet-proto] default values are stored in oneOf fields that aren't set
    • PARQUET-1920 - Fix issue with reading parquet files with too large column chunks
    • PARQUET-1923 - parquet-tools 1.11.0: TestSimpleRecordConverter fails with ExceptionInInitializerError on openjdk 15
    • PARQUET-1928 - Interpret Parquet INT96 type as FIXED[12] AVRO Schema
    • PARQUET-1944 - Unable to download transitive dependency hadoop-lzo
    • PARQUET-1947 - DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data
    • PARQUET-1949 - Mark PARQUET-1872 as not supporting bloom filters yet
    • PARQUET-1954 - TCP connection leak in parquet dump
    • PARQUET-1963 - DeprecatedParquetInputFormat in CombineFileInputFormat throws NPE when the first sub-split is empty
    • PARQUET-1966 - Fix build with JDK11 for JDK8
    • PARQUET-1970 - Make minor releases source compatible
    • PARQUET-1971 - Flaky test in github action
    • PARQUET-1975 - Test failure on ARM64 CPU architecture
    • PARQUET-1977 - Invalid data_page_offset
    • PARQUET-1979 - Optional bloom_filter_offset is filled if no bloom filter is present
    • PARQUET-1984 - Some tests fail on windows
    • PARQUET-1992 - Cannot build from tarball because of git submodules
    • PARQUET-1999 - NPE might occur if OutputFile is implemented by the client

    New Feature

    Improvement

    • PARQUET-313 - Implement 3 level list writing rule for Parquet-Thrift
    • PARQUET-1528 - Add JSON support to `parquet-tools head`
    • PARQUET-1593 - Replace the example usage in parquet-cli's help message with an actually existent subcommand
    • PARQUET-1660 - [java] Align Bloom filter implementation with format
    • PARQUET-1666 - Remove Unused Modules
    • PARQUET-1696 - Remove unused hadoop-1 profile
    • PARQUET-1710 - Use Objects.requireNonNull
    • PARQUET-1723 - Read From Maps Without Using Contains
    • PARQUET-1724 - Use ConcurrentHashMap for Cache in DictionaryPageReader
    • PARQUET-1725 - Replace Usage of Strings.join with JDK Functionality in ColumnPath Class
    • PARQUET-1726 - Use Java 8 Multi Exception Handling
    • PARQUET-1727 - Do Not Swallow InterruptedException in ParquetLoader
    • PARQUET-1728 - Simplify NullPointerException Handling in AvroWriteSupport
    • PARQUET-1729 - Avoid AutoBoxing in EncodingStats
    • PARQUET-1730 - Use switch Statement in AvroIndexedRecordConverter for Enums
    • PARQUET-1731 - Use JDK 8 Facilities to Simplify FilteringRecordMaterializer
    • PARQUET-1732 - Call toArray With Empty Array
    • PARQUET-1735 - Clean Up parquet-columns Module
    • PARQUET-1736 - Use StringBuilder instead of StringBuffer
    • PARQUET-1737 - Replace Test Class RandomStr with Apache Commons Lang
    • PARQUET-1738 - Remove unused imports in parquet-column
    • PARQUET-1743 - Add equals to BlockSplitBloomFilter
    • PARQUET-1749 - Use Java 8 Streams for Empty PrimitiveIterator
    • PARQUET-1750 - Reduce Memory Usage of RowRanges Class
    • PARQUET-1751 - Fix Protobuf Build Warning
    • PARQUET-1756 - Remove Dependency on Maven Plugin semantic-versioning
    • PARQUET-1759 - InternalParquetRecordReader Use Singleton Set
    • PARQUET-1763 - Add SLF4J to TestCircularReferences
    • PARQUET-1764 - The ParquetProperties constructor parameter list is too long
    • PARQUET-1775 - Deprecate AvroParquetWriter Builder Hadoop Path
    • PARQUET-1778 - Do Not Consider Class for Avro Generic Record Reader
    • PARQUET-1782 - Use Switch Statement in AvroRecordConverter
    • PARQUET-1790 - ParquetFileWriter missing Api for DataPageV2
    • PARQUET-1791 - Add 'prune' command to parquet-tools
    • PARQUET-1801 - Add column index support for 'prune' command in Parquet-tools/cli
    • PARQUET-1802 - CompressionCodec class not found if the codec class is not in the same defining classloader as the CodecFactory class
    • PARQUET-1805 - Refactor the configuration for bloom filters
    • PARQUET-1821 - Add 'column-size' command to parquet-cli and parquet-tools
    • PARQUET-1826 - Document hadoop configuration options
    • PARQUET-1827 - UUID type currently not supported by parquet-mr
    • PARQUET-1853 - Minimize the parquet-avro fastutil shaded jar
    • PARQUET-1863 - Remove use of add-test-source mojo in parquet-protobuf
    • PARQUET-1866 - Replace Hadoop ZSTD with JNI-ZSTD
    • PARQUET-1890 - Upgrade to Avro 1.10.0
    • PARQUET-1891 - Encryption-related light fixes
    • PARQUET-1914 - Allow ProtoParquetReader To Support InputFile
    • PARQUET-1924 - Do not Instantiate a New LongHashFunction
    • PARQUET-1926 - Add LogicalType support to ThriftType.I64Type
    • PARQUET-1929 - Bump Snappy to 1.1.8
    • PARQUET-1930 - Bump Apache Thrift to 0.13.0
    • PARQUET-1931 - Bump Junit 4.13.1
    • PARQUET-1932 - Bump Fastutil to 8.4.2
    • PARQUET-1938 - Option to get KMS details from key material (in key rotation)
    • PARQUET-1939 - Fix RemoteKmsClient API ambiguity
    • PARQUET-1940 - Make KeyEncryptionKey length configurable
    • PARQUET-1941 - Bump Commons CLI from 1.3.1 to 1.4
    • PARQUET-1951 - Allow different strategies to combine key values when merging parquet files
    • PARQUET-1952 - Upgrade Avro to 1.10.1
    • PARQUET-1961 - Bump Jackson to 2.11.4
    • PARQUET-1964 - Properly handle missing/null filter
    • PARQUET-1967 - Upgrade Zstd-jni to 1.4.8-3
    • PARQUET-1969 - Test by GithubAction
    • PARQUET-1973 - Support ZSTD JNI BufferPool
    • PARQUET-1988 - Upgrade to ZSTD 1.4.8-6
    • PARQUET-1994 - Upgrade ZSTD JNI to 1.4.9-1

    Test

    • PARQUET-1832 - Travis fails with too long output
    • PARQUET-1980 - Build and test Apache Parquet on ARM64 CPU architecture

    Wish

    • PARQUET-1717 - parquet-thrift converts Thrift i16 to parquet INT32 instead of INT_16

    Task