Apache Parquet v1.9.0 Release Notes
Release Date: 2016-10-19
Bug
- PARQUET-182 - FilteredRecordReader skips rows it shouldn't for schema with optional columns
- PARQUET-212 - Implement nested type read rules in parquet-thrift
- PARQUET-241 - ParquetInputFormat.getFooters() should return in the same order as what listStatus() returns
- PARQUET-305 - Logger instantiated for package org.apache.parquet may be GC-ed
- PARQUET-335 - Avro object model should not require MAP_KEY_VALUE
- PARQUET-340 - totalMemoryPool is truncated to 32 bits
- PARQUET-346 - ThriftSchemaConverter throws for unknown struct or union type
- PARQUET-349 - VersionParser does not handle versions like "parquet-mr 1.6.0rc4"
- PARQUET-352 - Add tags to "created by" metadata in the file footer
- PARQUET-353 - Compressors not getting recycled while writing parquet files, causing memory leak
- PARQUET-360 - parquet-cat json dump is broken for maps
- PARQUET-363 - Cannot construct empty MessageType for ReadContext.requestedSchema
- PARQUET-367 - "parquet-cat -j" doesn't show all records
- PARQUET-372 - Parquet stats can have awkwardly large values
- PARQUET-373 - MemoryManager tests are flaky
- PARQUET-379 - PrimitiveType.union erases original type
- PARQUET-380 - Cascading and scrooge builds fail when using thrift 0.9.0
- PARQUET-385 - PrimitiveType.union accepts fixed_len_byte_array fields with different lengths when strict mode is on
- PARQUET-387 - TwoLevelListWriter does not handle null values in array
- PARQUET-389 - Filter predicates should work with missing columns
- PARQUET-395 - System.out is used as logger in org.apache.parquet.Log
- PARQUET-396 - The builder for AvroParquetReader loses the record type
- PARQUET-400 - Error reading some files after PARQUET-77 bytebuffer read path
- PARQUET-409 - InternalParquetRecordWriter doesn't use min/max row counts
- PARQUET-410 - Fix subprocess hang in merge_parquet_pr.py
- PARQUET-413 - Test failures for Java 8
- PARQUET-415 - ByteBufferBackedBinary serialization is broken
- PARQUET-422 - Fix a potential bug in MessageTypeParser where we ignore and overwrite the initial value of a method parameter
- PARQUET-425 - Fix the bug when predicate contains columns not specified in projection, to prevent filtering out data improperly
- PARQUET-426 - Throw Exception when predicate contains columns not specified in projection, to prevent filtering out data improperly
- PARQUET-430 - Change to use Locale parameterized version of String.toUpperCase()/toLowerCase
- PARQUET-431 - Make ParquetOutputFormat.memoryManager volatile
- PARQUET-495 - Fix mismatches in Types class comments
- PARQUET-509 - Incorrect number of args passed to string.format calls
- PARQUET-511 - Integer overflow on counting values in column
- PARQUET-528 - Fix flush() for RecordConsumer and implementations
- PARQUET-529 - Avoid invoking job.toString() in ParquetLoader
- PARQUET-540 - Cascading3 module doesn't build when using thrift 0.9.0
- PARQUET-544 - ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
- PARQUET-560 - Incorrect synchronization in SnappyCompressor
- PARQUET-569 - ParquetMetadataConverter offset filter is broken
- PARQUET-571 - Fix potential leak in ParquetFileReader.close()
- PARQUET-580 - Potentially unnecessary creation of large int[] in IntList for columns that aren't used
- PARQUET-581 - Min/max row count for page size check are conflated in some places
- PARQUET-584 - show proper command usage when there's no arguments
- PARQUET-612 - Add compression to FileEncodingIT tests
- PARQUET-623 - DeltaByteArrayReader has incorrect skip behaviour
- PARQUET-642 - Improve performance of ByteBuffer based read / write paths
- PARQUET-645 - DictionaryFilter incorrectly handles null
- PARQUET-651 - Parquet-avro fails to decode array of record with a single field name "element" correctly
- PARQUET-660 - Writing Protobuf messages with extensions results in an error or data corruption.
- PARQUET-663 - Links are broken in README.md
- PARQUET-674 - Add an abstraction to get the length of a stream
- PARQUET-685 - Deprecated ParquetInputSplit constructor passes parameters in the wrong order.
- PARQUET-726 - TestMemoryManager consistently fails
- PARQUET-743 - DictionaryFilters can re-use StreamBytesInput when compressed
Improvement
- PARQUET-77 - Improvements in ByteBuffer read path
- PARQUET-99 - Large rows cause unnecessary OOM exceptions
- PARQUET-146 - make Parquet compile with java 7 instead of java 6
- PARQUET-318 - Remove unnecessary objectmapper from ParquetMetadata
- PARQUET-327 - Show statistics in the dump output
- PARQUET-341 - Improve write performance with wide schema sparse data
- PARQUET-343 - Caching nulls on group node to improve write performance on wide schema sparse data
- PARQUET-358 - Add support for temporal logical types to AVRO/Parquet conversion
- PARQUET-361 - Add prerelease logic to semantic versions
- PARQUET-384 - Add Dictionary Based Filtering to Filter2 API
- PARQUET-386 - Printing out the statistics of metadata in parquet-tools
- PARQUET-397 - Pig Predicate Pushdown using Filter2 API
- PARQUET-421 - Fix mismatch of javadoc names and method parameters in module encoding, column, and hadoop
- PARQUET-427 - Push predicates into the whole read path
- PARQUET-432 - Complete a todo for method ColumnDescriptor.compareTo()
- PARQUET-460 - Parquet files concat tool
- PARQUET-480 - Update for Cascading 3.0
- PARQUET-484 - Warn when Decimal is stored as INT64 while it could be stored as INT32
- PARQUET-543 - Remove BoundedInt encodings
- PARQUET-585 - Slowly ramp up sizes of int[]s in IntList to keep sizes small when data sets are small
- PARQUET-654 - Make record-level filtering optional
- PARQUET-668 - Provide option to disable auto crop feature in DumpCommand output
- PARQUET-727 - Ensure correct version of thrift is used
- PARQUET-740 - Introduce editorconfig
New Feature
- PARQUET-225 - INT64 support for Delta Encoding
- PARQUET-382 - Add a way to append encoded blocks in ParquetFileWriter
- PARQUET-429 - Enables predicates collecting their referred columns
- PARQUET-548 - Add Java metadata for PageEncodingStats
- PARQUET-669 - Allow reading file footers from input streams when writing metadata files
Task
- PARQUET-392 - Release Parquet-mr 1.9.0
- PARQUET-404 - Replace [email protected] for HTTPS URL on dev/README.md to avoid permission issues
- PARQUET-696 - Move travis download from google code (defunct) to github
Test
- PARQUET-355 - Create Integration tests to validate statistics
- โ PARQUET-378 - Add thoroughly parquet test encodings