Apache Parquet v1.9.0 Release Notes

Release Date: 2016-10-19
    Bug

    • PARQUET-182 - FilteredRecordReader skips rows it shouldn't for schema with optional columns
    • PARQUET-212 - Implement nested type read rules in parquet-thrift
    • PARQUET-241 - ParquetInputFormat.getFooters() should return in the same order as what listStatus() returns
    • PARQUET-305 - Logger instantiated for package org.apache.parquet may be GC-ed
    • PARQUET-335 - Avro object model should not require MAP_KEY_VALUE
    • PARQUET-340 - totalMemoryPool is truncated to 32 bits
    • PARQUET-346 - ThriftSchemaConverter throws for unknown struct or union type
    • PARQUET-349 - VersionParser does not handle versions like "parquet-mr 1.6.0rc4"
    • PARQUET-352 - Add tags to "created by" metadata in the file footer
    • PARQUET-353 - Compressors not getting recycled while writing parquet files, causing memory leak
    • PARQUET-360 - parquet-cat json dump is broken for maps
    • PARQUET-363 - Cannot construct empty MessageType for ReadContext.requestedSchema
    • PARQUET-367 - "parquet-cat -j" doesn't show all records
    • PARQUET-372 - Parquet stats can have awkwardly large values
    • PARQUET-373 - MemoryManager tests are flaky
    • PARQUET-379 - PrimitiveType.union erases original type
    • PARQUET-380 - Cascading and scrooge builds fail when using thrift 0.9.0
    • PARQUET-385 - PrimitiveType.union accepts fixed_len_byte_array fields with different lengths when strict mode is on
    • PARQUET-387 - TwoLevelListWriter does not handle null values in array
    • PARQUET-389 - Filter predicates should work with missing columns
    • PARQUET-395 - System.out is used as logger in org.apache.parquet.Log
    • PARQUET-396 - The builder for AvroParquetReader loses the record type
    • PARQUET-400 - Error reading some files after PARQUET-77 bytebuffer read path
    • PARQUET-409 - InternalParquetRecordWriter doesn't use min/max row counts
    • PARQUET-410 - Fix subprocess hang in merge_parquet_pr.py
    • PARQUET-413 - Test failures for Java 8
    • PARQUET-415 - ByteBufferBackedBinary serialization is broken
    • PARQUET-422 - Fix a potential bug in MessageTypeParser where we ignore and overwrite the initial value of a method parameter
    • PARQUET-425 - Fix the bug when predicate contains columns not specified in projection, to prevent filtering out data improperly
    • PARQUET-426 - Throw Exception when predicate contains columns not specified in projection, to prevent filtering out data improperly
    • PARQUET-430 - Change to use Locale parameterized version of String.toUpperCase()/toLowerCase()
    • PARQUET-431 - Make ParquetOutputFormat.memoryManager volatile
    • PARQUET-495 - Fix mismatches in Types class comments
    • PARQUET-509 - Incorrect number of args passed to String.format calls
    • PARQUET-511 - Integer overflow on counting values in column
    • PARQUET-528 - Fix flush() for RecordConsumer and implementations
    • PARQUET-529 - Avoid invoking job.toString() in ParquetLoader
    • PARQUET-540 - Cascading3 module doesn't build when using thrift 0.9.0
    • PARQUET-544 - ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
    • PARQUET-560 - Incorrect synchronization in SnappyCompressor
    • PARQUET-569 - ParquetMetadataConverter offset filter is broken
    • PARQUET-571 - Fix potential leak in ParquetFileReader.close()
    • PARQUET-580 - Potentially unnecessary creation of large int[] in IntList for columns that aren't used
    • PARQUET-581 - Min/max row count for page size check are conflated in some places
    • PARQUET-584 - Show proper command usage when there are no arguments
    • PARQUET-612 - Add compression to FileEncodingIT tests
    • PARQUET-623 - DeltaByteArrayReader has incorrect skip behaviour
    • PARQUET-642 - Improve performance of ByteBuffer based read / write paths
    • PARQUET-645 - DictionaryFilter incorrectly handles null
    • PARQUET-651 - Parquet-avro fails to decode array of record with a single field name "element" correctly
    • PARQUET-660 - Writing Protobuf messages with extensions results in an error or data corruption
    • PARQUET-663 - Links are broken in README.md
    • PARQUET-674 - Add an abstraction to get the length of a stream
    • PARQUET-685 - Deprecated ParquetInputSplit constructor passes parameters in the wrong order
    • PARQUET-726 - TestMemoryManager consistently fails
    • PARQUET-743 - DictionaryFilters can re-use StreamBytesInput when compressed
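
To illustrate the truncation and overflow class of bugs listed above (PARQUET-340, PARQUET-511), here is a minimal Java sketch. It is not the parquet-mr code: the class and method names (`NarrowingDemo`, `truncatedPoolSize`, `scaledPoolSize`, `checkedIncrement`) are hypothetical, and the 6 GiB figure and 0.5 ratio are assumed example values only.

```java
// Hypothetical demo class; not part of parquet-mr.
final class NarrowingDemo {

    // Buggy pattern: casting a long to int keeps only the low 32 bits.
    static int truncatedPoolSize(long bytes) {
        return (int) bytes;
    }

    // Safer pattern: keep the value in a long from start to finish.
    static long scaledPoolSize(long maxBytes, double ratio) {
        return Math.round(maxBytes * ratio);
    }

    // Counter that fails loudly instead of wrapping past Integer.MAX_VALUE.
    static int checkedIncrement(int count) {
        return Math.addExact(count, 1); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        long sixGiB = 6L * 1024 * 1024 * 1024; // assumed example size
        System.out.println(truncatedPoolSize(sixGiB));   // prints -2147483648
        System.out.println(scaledPoolSize(sixGiB, 0.5)); // prints 3221225472
        System.out.println(checkedIncrement(41));        // prints 42
    }
}
```

Whether a counter should be widened to `long` or guarded with `Math.addExact` depends on whether values above `Integer.MAX_VALUE` are legitimate. For memory sizes and value counts they usually are, so widening tends to be the better fit.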

    Improvement

    • PARQUET-77 - Improvements in ByteBuffer read path
    • PARQUET-99 - Large rows cause unnecessary OOM exceptions
    • PARQUET-146 - Make Parquet compile with Java 7 instead of Java 6
    • PARQUET-318 - Remove unnecessary objectmapper from ParquetMetadata
    • PARQUET-327 - Show statistics in the dump output
    • PARQUET-341 - Improve write performance with wide schema sparse data
    • PARQUET-343 - Caching nulls on group node to improve write performance on wide schema sparse data
    • PARQUET-358 - Add support for temporal logical types to AVRO/Parquet conversion
    • PARQUET-361 - Add prerelease logic to semantic versions
    • PARQUET-384 - Add Dictionary Based Filtering to Filter2 API
    • PARQUET-386 - Printing out the statistics of metadata in parquet-tools
    • PARQUET-397 - Pig Predicate Pushdown using Filter2 API
    • PARQUET-421 - Fix mismatch of javadoc names and method parameters in module encoding, column, and hadoop
    • PARQUET-427 - Push predicates into the whole read path
    • PARQUET-432 - Complete a todo for method ColumnDescriptor.compareTo()
    • PARQUET-460 - Parquet files concat tool
    • PARQUET-480 - Update for Cascading 3.0
    • PARQUET-484 - Warn when a Decimal is stored as INT64 although it could be stored as INT32
    • PARQUET-543 - Remove BoundedInt encodings
    • PARQUET-585 - Slowly ramp up sizes of int[]s in IntList to keep sizes small when data sets are small
    • PARQUET-654 - Make record-level filtering optional
    • PARQUET-668 - Provide option to disable auto crop feature in DumpCommand output
    • PARQUET-727 - Ensure correct version of thrift is used
    • PARQUET-740 - Introduce editorconfig
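
The idea behind PARQUET-585 (ramping up `int[]` sizes gradually so small data sets stay small) can be sketched as follows. This is an illustrative approximation, not parquet-mr's actual `IntList`: the class name `GrowingIntList`, the doubling policy, and the slab sizes 4 and 64 are all assumptions chosen for the example. Allocating the first slab lazily also means an unused list allocates nothing, which appears to be the concern PARQUET-580 describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; not parquet-mr's IntList.
final class GrowingIntList {
    static final int INITIAL_SLAB_SIZE = 4; // assumed starting size
    static final int MAX_SLAB_SIZE = 64;    // assumed upper bound per slab

    private final List<int[]> slabs = new ArrayList<>();
    private int[] current; // allocated lazily on the first add()
    private int posInSlab = 0;
    private int count = 0;

    void add(int value) {
        if (current == null || posInSlab == current.length) {
            allocateSlab();
        }
        current[posInSlab++] = value;
        count++;
    }

    // Each new slab is twice the previous one, capped at MAX_SLAB_SIZE.
    private void allocateSlab() {
        int nextSize = (current == null)
                ? INITIAL_SLAB_SIZE
                : Math.min(current.length * 2, MAX_SLAB_SIZE);
        current = new int[nextSize];
        slabs.add(current);
        posInSlab = 0;
    }

    int get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int remaining = index;
        for (int[] slab : slabs) {
            if (remaining < slab.length) {
                return slab[remaining];
            }
            remaining -= slab.length;
        }
        throw new AssertionError("unreachable");
    }

    int size() { return count; }

    int slabCount() { return slabs.size(); }

    // Total ints allocated so far, including unused tail space.
    int allocatedCapacity() {
        int total = 0;
        for (int[] slab : slabs) {
            total += slab.length;
        }
        return total;
    }

    public static void main(String[] args) {
        GrowingIntList list = new GrowingIntList();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        // Ten values fit in slabs of 4 and 8.
        System.out.println(list.slabCount());         // prints 2
        System.out.println(list.allocatedCapacity()); // prints 12
    }
}
```

With fixed-size slabs, ten values would cost one full slab regardless of its size. Under the doubling policy assumed here, the overhead stays proportional to the data actually written.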

    New Feature

    • PARQUET-225 - INT64 support for Delta Encoding
    • PARQUET-382 - Add a way to append encoded blocks in ParquetFileWriter
    • PARQUET-429 - Enable predicates to collect the columns they reference
    • PARQUET-548 - Add Java metadata for PageEncodingStats
    • PARQUET-669 - Allow reading file footers from input streams when writing metadata files

    Task

    Test

    • PARQUET-355 - Create integration tests to validate statistics
    • PARQUET-378 - Add thorough tests for Parquet encodings