Changelog History

  • v1.12.0 Changes

    Release Notes - Parquet - Version 1.12.0

    Sub-task

    Bug

    • PARQUET-1438 - [C++] corrupted files produced on 32-bit architecture (i686)
    • PARQUET-1493 - maven protobuf plugin does not work properly
    • PARQUET-1455 - [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
    • PARQUET-1554 - Compilation error when upgrading Scrooge version
    • PARQUET-1599 - Fix to-avro to respect the overwrite option
    • PARQUET-1684 - [parquet-protobuf] default protobuf field values are stored as nulls
    • PARQUET-1699 - Could not resolve org.apache.yetus:audience-annotations:0.11.0
    • PARQUET-1741 - APIs backward compatibility issues cause master branch build failure
    • PARQUET-1765 - Invalid filteredRowCount in InternalParquetRecordReader
    • PARQUET-1794 - Random data generation may cause flaky tests
    • PARQUET-1803 - Could not find FilleInputSplit in ParquetInputSplit
    • PARQUET-1808 - SimpleGroup.toString() uses String += and so has poor performance
    • PARQUET-1818 - Fix collision of encryption and bloom filters in format-structure Util
    • PARQUET-1850 - toParquetMetadata method in ParquetMetadataConverter does not set dictionary page offset bit
    • PARQUET-1851 - ParquetMetadataConverter throws NPE in an Iceberg unit test
    • PARQUET-1868 - Parquet reader options toggle for bloom filter toggles dictionary filtering
    • PARQUET-1879 - Apache Arrow can not read a Parquet File written with Parquet-Avro 1.11.0 with a Map field
    • PARQUET-1893 - H2SeekableInputStream readFully() doesn't respect start and len
    • PARQUET-1894 - Please fix the related Shaded Jackson Databind CVEs
    • PARQUET-1896 - [Maven] parquet-tools build is broken
    • PARQUET-1910 - Parquet-cli is broken after TransCompressionCommand was added
    • PARQUET-1917 - [parquet-proto] default values are stored in oneOf fields that aren't set
    • PARQUET-1920 - Fix issue with reading parquet files with too large column chunks
    • PARQUET-1923 - parquet-tools 1.11.0: TestSimpleRecordConverter fails with ExceptionInInitializerError on openjdk 15
    • PARQUET-1928 - Interpret Parquet INT96 type as FIXED[12] AVRO Schema
    • PARQUET-1944 - Unable to download transitive dependency hadoop-lzo
    • PARQUET-1947 - DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data
    • PARQUET-1949 - Mark Parquet-1872 with not support bloom filter yet
    • PARQUET-1954 - TCP connection leak in parquet dump
    • PARQUET-1963 - DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty
    • PARQUET-1966 - Fix build with JDK11 for JDK8
    • PARQUET-1970 - Make minor releases source compatible
    • PARQUET-1971 - Flaky test in github action
    • PARQUET-1975 - Test failure on ARM64 CPU architecture
    • PARQUET-1977 - Invalid data_page_offset
    • PARQUET-1979 - Optional bloom_filter_offset is filled if no bloom filter is present
    • PARQUET-1984 - Some tests fail on windows
    • PARQUET-1992 - Cannot build from tarball because of git submodules
    • PARQUET-1999 - NPE might occur if OutputFile is implemented by the client

    New Feature

    Improvement

    • PARQUET-313 - Implement 3 level list writing rule for Parquet-Thrift
    • PARQUET-1528 - Add JSON support to `parquet-tools head`
    • PARQUET-1593 - Replace the example usage in parquet-cli's help message with an actually existent subcommand
    • PARQUET-1660 - [java] Align Bloom filter implementation with format
    • PARQUET-1666 - Remove Unused Modules
    • PARQUET-1696 - Remove unused hadoop-1 profile
    • PARQUET-1710 - Use Objects.requireNonNull
    • PARQUET-1723 - Read From Maps Without Using Contains
    • PARQUET-1724 - Use ConcurrentHashMap for Cache in DictionaryPageReader
    • PARQUET-1725 - Replace Usage of Strings.join with JDK Functionality in ColumnPath Class
    • PARQUET-1726 - Use Java 8 Multi Exception Handling
    • PARQUET-1727 - Do Not Swallow InterruptedException in ParquetLoader
    • PARQUET-1728 - Simplify NullPointerException Handling in AvroWriteSupport
    • PARQUET-1729 - Avoid AutoBoxing in EncodingStats
    • PARQUET-1730 - Use switch Statement in AvroIndexedRecordConverter for Enums
    • PARQUET-1731 - Use JDK 8 Facilities to Simplify FilteringRecordMaterializer
    • PARQUET-1732 - Call toArray With Empty Array
    • PARQUET-1735 - Clean Up parquet-columns Module
    • PARQUET-1736 - Use StringBuilder instead of StringBuffer
    • PARQUET-1737 - Replace Test Class RandomStr with Apache Commons Lang
    • PARQUET-1738 - Remove unused imports in parquet-column
    • PARQUET-1743 - Add equals to BlockSplitBloomFilter
    • PARQUET-1749 - Use Java 8 Streams for Empty PrimitiveIterator
    • PARQUET-1750 - Reduce Memory Usage of RowRanges Class
    • PARQUET-1751 - Fix Protobuf Build Warning
    • PARQUET-1756 - Remove Dependency on Maven Plugin semantic-versioning
    • PARQUET-1759 - InternalParquetRecordReader Use Singleton Set
    • PARQUET-1763 - Add SLF4J to TestCircularReferences
    • PARQUET-1764 - The ParquetProperties constructor parameter list is so long
    • PARQUET-1775 - Deprecate AvroParquetWriter Builder Hadoop Path
    • PARQUET-1778 - Do Not Consider Class for Avro Generic Record Reader
    • PARQUET-1782 - Use Switch Statement in AvroRecordConverter
    • PARQUET-1790 - ParquetFileWriter missing Api for DataPageV2
    • PARQUET-1791 - Add 'prune' command to parquet-tools
    • PARQUET-1801 - Add column index support for 'prune' command in Parquet-tools/cli
    • PARQUET-1802 - CompressionCodec class not found if the codec class is not in the same defining classloader as the CodecFactory class
    • PARQUET-1805 - Refactor the configuration for bloom filters
    • PARQUET-1821 - Add 'column-size' command to parquet-cli and parquet-tools
    • PARQUET-1826 - Document hadoop configuration options
    • PARQUET-1827 - UUID type currently not supported by parquet-mr
    • PARQUET-1853 - Minimize the parquet-avro fastutil shaded jar
    • PARQUET-1863 - Remove use of add-test-source mojo in parquet-protobuf
    • PARQUET-1866 - Replace Hadoop ZSTD with JNI-ZSTD
    • PARQUET-1890 - Upgrade to Avro 1.10.0
    • PARQUET-1891 - Encryption-related light fixes
    • PARQUET-1914 - Allow ProtoParquetReader To Support InputFile
    • PARQUET-1924 - Do not Instantiate a New LongHashFunction
    • PARQUET-1926 - Add LogicalType support to ThriftType.I64Type
    • PARQUET-1929 - Bump Snappy to 1.1.8
    • PARQUET-1930 - Bump Apache Thrift to 0.13.0
    • PARQUET-1931 - Bump Junit 4.13.1
    • PARQUET-1932 - Bump Fastutil to 8.4.2
    • PARQUET-1938 - Option to get KMS details from key material (in key rotation)
    • PARQUET-1939 - Fix RemoteKmsClient API ambiguity
    • PARQUET-1940 - Make KeyEncryptionKey length configurable
    • PARQUET-1941 - Bump Commons CLI from 1.3.1 to 1.4
    • PARQUET-1951 - Allow different strategies to combine key values when merging parquet files
    • PARQUET-1952 - Upgrade Avro to 1.10.1
    • PARQUET-1961 - Bump Jackson to 2.11.4
    • PARQUET-1964 - Properly handle missing/null filter
    • PARQUET-1967 - Upgrade Zstd-jni to 1.4.8-3
    • PARQUET-1969 - Test by GithubAction
    • PARQUET-1973 - Support ZSTD JNI BufferPool
    • PARQUET-1988 - Upgrade to ZSTD 1.4.8-6
    • PARQUET-1994 - Upgrade ZSTD JNI to 1.4.9-1

    Test

    • PARQUET-1832 - Travis fails with too long output
    • PARQUET-1980 - Build and test Apache Parquet on ARM64 CPU architecture

    Wish

    • PARQUET-1717 - parquet-thrift converts Thrift i16 to parquet INT32 instead of INT_16

    Task

  • v1.11.0 Changes

    December 06, 2019

    Release Notes - Parquet - Version 1.11.0

    Bug

    • PARQUET-138 - Parquet should allow a merge between required and optional schemas
    • PARQUET-952 - Avro union with single type fails with 'is not a group'
    • PARQUET-1128 - [Java] Upgrade the Apache Arrow version to 0.8.0 for SchemaConverter
    • PARQUET-1281 - Jackson dependency
    • PARQUET-1285 - [Java] SchemaConverter should not convert from TimeUnit.SECOND AND TimeUnit.NANOSECOND of Arrow
    • PARQUET-1293 - Build failure when using Java 8 lambda expressions
    • PARQUET-1296 - Travis kills build after 10 minutes, because "no output was received"
    • PARQUET-1297 - [Java] SchemaConverter should not convert from Timestamp(TimeUnit.SECOND) and Timestamp(TimeUnit.NANOSECOND) of Arrow
    • PARQUET-1303 - Avro reflect @Stringable field write error if field not instanceof CharSequence
    • PARQUET-1304 - Release 1.10 contains breaking changes for Hive
    • PARQUET-1305 - Backward incompatible change introduced in 1.8
    • PARQUET-1309 - Parquet Java uses incorrect stats and dictionary filter properties
    • PARQUET-1311 - Update README.md
    • PARQUET-1317 - ParquetMetadataConverter throw NPE
    • PARQUET-1341 - Null count is suppressed when columns have no min or max and use unsigned sort order
    • PARQUET-1344 - Type builders don't honor new logical types
    • PARQUET-1368 - ParquetFileReader should close its input stream for the failure in constructor
    • PARQUET-1371 - Time/Timestamp UTC normalization parameter doesn't work
    • PARQUET-1407 - Data loss on duplicate values with AvroParquetWriter/Reader
    • PARQUET-1417 - BINARY_AS_SIGNED_INTEGER_COMPARATOR fails with IOBE for the same arrays with the different length
    • PARQUET-1421 - InternalParquetRecordWriter logs debug messages at the INFO level
    • PARQUET-1440 - Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale
    • PARQUET-1441 - SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
    • PARQUET-1456 - Use page index, ParquetFileReader throw ArrayIndexOutOfBoundsException
    • PARQUET-1460 - Fix javadoc errors and include javadoc checking in Travis checks
    • PARQUET-1461 - Third party code does not compile after parquet-mr minor version update
    • PARQUET-1470 - Inputstream leakage in ParquetFileWriter.appendFile
    • PARQUET-1472 - Dictionary filter fails on FIXED_LEN_BYTE_ARRAY
    • PARQUET-1475 - DirectCodecFactory's ParquetCompressionCodecException drops a passed in cause in one constructor
    • PARQUET-1478 - Can't read spec compliant, 3-level lists via parquet-proto
    • PARQUET-1480 - INT96 to avro not yet implemented error should mention deprecation
    • PARQUET-1485 - Snappy Decompressor/Compressor may cause direct memory leak
    • PARQUET-1488 - UserDefinedPredicate throw NPE
    • PARQUET-1496 - [Java] Update Scala for JDK 11 compatibility
    • PARQUET-1497 - [Java] javax annotations dependency missing for Java 11
    • PARQUET-1498 - [Java] Add instructions to install thrift via homebrew
    • PARQUET-1510 - Dictionary filter skips null values when evaluating not-equals.
    • PARQUET-1514 - ParquetFileWriter Records Compressed Bytes instead of Uncompressed Bytes
    • PARQUET-1527 - [parquet-tools] cat command throw java.lang.ClassCastException
    • PARQUET-1529 - Shade fastutil in all modules where used
    • PARQUET-1531 - Page row count limit causes empty pages to be written from MessageColumnIO
    • PARQUET-1533 - TestSnappy() throws OOM exception with Parquet-1485 change
    • PARQUET-1534 - [parquet-cli] Argument error: Illegal character in opaque part at index 2 on Windows
    • PARQUET-1544 - Possible over-shading of modules
    • PARQUET-1550 - CleanUtil does not work in Java 11
    • PARQUET-1555 - Bump snappy-java to 1.1.7.3
    • PARQUET-1596 - PARQUET-1375 broke parquet-cli's to-avro command
    • PARQUET-1600 - Fix shebang in parquet-benchmarks/run.sh
    • PARQUET-1615 - getRecordWriter shouldn't hardcode CREAT mode when new ParquetFileWriter
    • PARQUET-1637 - Builds are failing because default jdk changed to openjdk11 on Travis
    • PARQUET-1644 - Clean up some benchmark code and docs.
    • PARQUET-1691 - Build fails due to missing hadoop-lzo

    New Feature

    Improvement

    • PARQUET-1135 - upgrade thrift and protobuf dependencies
    • PARQUET-1280 - [parquet-protobuf] Use maven protoc plugin
    • PARQUET-1321 - LogicalTypeAnnotation.LogicalTypeAnnotationVisitor#visit methods should have a return value
    • PARQUET-1335 - Logical type names in parquet-mr are not consistent with parquet-format
    • PARQUET-1336 - PrimitiveComparator should implement Serializable
    • PARQUET-1365 - Don't write page level statistics
    • PARQUET-1375 - Upgrade to supported version of Jackson
    • PARQUET-1383 - Parquet tools should indicate UTC parameter for time/timestamp types
    • PARQUET-1390 - [Java] Upgrade to Arrow 0.10.0
    • PARQUET-1399 - Move parquet-mr related code from parquet-format
    • PARQUET-1410 - Refactor modules to use the new logical type API
    • PARQUET-1414 - Limit page size based on maximum row count
    • PARQUET-1418 - Run integration tests in Travis
    • PARQUET-1435 - Benchmark filtering column-indexes
    • PARQUET-1444 - Prefer ArrayList over LinkedList
    • PARQUET-1445 - Remove Files.java
    • PARQUET-1462 - Allow specifying new development version in prepare-release.sh
    • PARQUET-1466 - Upgrade to the latest guava 27.0-jre
    • PARQUET-1474 - Less verbose and lower level logging for missing column/offset indexes
    • PARQUET-1476 - Don't emit a warning message for files without new logical type
    • PARQUET-1487 - Do not write original type for timezone-agnostic timestamps
    • PARQUET-1489 - Insufficient documentation for UserDefinedPredicate.keep(T)
    • PARQUET-1490 - Add branch-specific Travis steps
    • PARQUET-1492 - Remove protobuf install in travis build
    • PARQUET-1499 - [parquet-mr] Add Java 11 to Travis
    • PARQUET-1500 - Remove the Closables
    • PARQUET-1502 - Convert FIXED_LEN_BYTE_ARRAY to arrow type in logicalTypeAnnotation if it is not null
    • PARQUET-1503 - Remove Ints Utility Class
    • PARQUET-1504 - Add an option to convert Parquet Int96 to Arrow Timestamp
    • PARQUET-1505 - Use Java 7 NIO StandardCharsets
    • PARQUET-1506 - Migrate from maven-thrift-plugin to thrift-maven-plugin
    • PARQUET-1507 - Bump Apache Thrift to 0.12.0
    • PARQUET-1509 - Update Docs for Hive Deprecation
    • PARQUET-1513 - HiddenFileFilter Streamline
    • PARQUET-1518 - Bump Jackson2 version of parquet-cli
    • PARQUET-1530 - Remove Dependency on commons-codec
    • PARQUET-1542 - Merge multiple I/O to one time I/O when read footer
    • PARQUET-1557 - Replace deprecated Apache Avro methods
    • PARQUET-1558 - Use try-with-resource in Apache Avro tests
    • PARQUET-1576 - Upgrade to Avro 1.9.0
    • PARQUET-1577 - Remove duplicate license
    • PARQUET-1578 - Introduce Lambdas
    • PARQUET-1579 - Add Github PR template
    • PARQUET-1580 - Page-level CRC checksum verification for DataPageV1
    • PARQUET-1601 - Add zstd support to parquet-cli to-avro
    • PARQUET-1604 - Bump fastutil from 7.0.13 to 8.2.3
    • PARQUET-1605 - Bump maven-javadoc-plugin from 2.9 to 3.1.0
    • PARQUET-1606 - Fix invalid tests scope
    • PARQUET-1607 - Remove duplicate maven-enforcer-plugin
    • PARQUET-1616 - Enable Maven batch mode
    • PARQUET-1650 - Implement unit test to validate column/offset indexes
    • PARQUET-1654 - Remove unnecessary options when building thrift
    • PARQUET-1661 - Upgrade to Avro 1.9.1
    • PARQUET-1662 - Upgrade Jackson to version 2.9.10
    • PARQUET-1665 - Upgrade zstd-jni to 1.4.0-1
    • PARQUET-1669 - Disable compiling all libraries when building thrift
    • PARQUET-1671 - Upgrade Yetus to 0.11.0
    • PARQUET-1682 - Maintain forward compatibility for TIME/TIMESTAMP
    • PARQUET-1683 - Remove unnecessary string converting in readFooter method
    • PARQUET-1685 - Truncate the stored min and max for String statistics to reduce the footer size

    Test

    • PARQUET-1536 - [parquet-cli] Add simple tests for each command

    Wish

    • PARQUET-1552 - upgrade protoc-jar-maven-plugin to 3.8.0
    • PARQUET-1673 - Upgrade parquet-mr format version to 2.7.0

    Task

    • PARQUET-968 - Add Hive/Presto support in ProtoParquet
    • PARQUET-1294 - Update release scripts for the new Apache policy
    • PARQUET-1434 - Release parquet-mr 1.11.0
    • PARQUET-1436 - TimestampMicrosStringifier shows wrong microseconds for timestamps before 1970
    • PARQUET-1452 - Deprecate old logical types API
    • PARQUET-1551 - Support Java 11 - top-level JIRA
    • PARQUET-1570 - Publish 1.11.0 to maven central
    • PARQUET-1585 - Update old external links in the code base
    • PARQUET-1645 - Bump Apache Avro to 1.9.1
    • PARQUET-1649 - Bump Jackson Databind to 2.9.9.3
    • PARQUET-1687 - Update release process
  • v1.11.0-rc7

    November 13, 2019
  • v1.11.0-rc6

    March 19, 2019
  • v1.10.1 Changes

    January 28, 2019

    Release Notes - Parquet - Version 1.10.1

    Bug

    • PARQUET-1510 - Dictionary filter skips null values when evaluating not-equals.
    • PARQUET-1309 - Parquet Java uses incorrect stats and dictionary filter properties
  • v1.10.0 Changes

    April 05, 2018

    Release Notes - Parquet - Version 1.10.0

    Bug

    • PARQUET-196 - parquet-tools command to get rowcount & size
    • PARQUET-357 - Parquet-thrift generates wrong schema for Thrift binary fields
    • PARQUET-765 - Upgrade Avro to 1.8.1
    • PARQUET-783 - H2SeekableInputStream does not close its underlying FSDataInputStream, leading to connection leaks
    • PARQUET-786 - parquet-tools README incorrectly has 'java jar' instead of 'java -jar'
    • PARQUET-791 - Predicate pushing down on missing columns should work on UserDefinedPredicate too
    • PARQUET-1005 - Fix DumpCommand parsing to allow column projection
    • PARQUET-1028 - [JAVA] When reading old Spark-generated files with INT96, stats are reported as valid when they aren't
    • PARQUET-1065 - Deprecate type-defined sort ordering for INT96 type
    • PARQUET-1077 - [MR] Switch to long key ids in KEYs file
    • PARQUET-1141 - IDs are dropped in metadata conversion
    • PARQUET-1152 - Parquet-thrift doesn't compile with Thrift 0.9.3
    • PARQUET-1153 - Parquet-thrift doesn't compile with Thrift 0.10.0
    • PARQUET-1156 - dev/merge_parquet_pr.py problems
    • PARQUET-1185 - TestBinary#testBinary unit test fails after PARQUET-1141
    • PARQUET-1191 - Type.hashCode() takes originalType into account but Type.equals() does not
    • PARQUET-1208 - Occasional endless loop in unit test
    • PARQUET-1217 - Incorrect handling of missing values in Statistics
    • PARQUET-1246 - Ignore float/double statistics in case of NaN
    • PARQUET-1258 - Update scm developer connection to github

    New Feature

    • PARQUET-1025 - Support new min-max statistics in parquet-mr

    Improvement

    • PARQUET-220 - Unnecessary warning in ParquetRecordReader.initialize
    • PARQUET-321 - Set the HDFS padding default to 8MB
    • PARQUET-386 - Printing out the statistics of metadata in parquet-tools
    • PARQUET-423 - Make writing Avro to Parquet less noisy
    • PARQUET-755 - create parquet-arrow module with schema converter
    • PARQUET-777 - Add new Parquet CLI tools
    • PARQUET-787 - Add a size limit for heap allocations when reading
    • PARQUET-801 - Allow UserDefinedPredicates in DictionaryFilter
    • PARQUET-852 - Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder
    • PARQUET-884 - Add support for Decimal datatype to Parquet-Pig record reader
    • PARQUET-969 - Decimal datatype support for parquet-tools output
    • PARQUET-990 - More detailed error messages in footer parsing
    • PARQUET-1024 - allow for case insensitive parquet-xxx prefix in PR title
    • PARQUET-1026 - allow unsigned binary stats when min == max
    • PARQUET-1115 - Warn users when misusing parquet-tools merge
    • PARQUET-1135 - upgrade thrift and protobuf dependencies
    • PARQUET-1142 - Avoid leaking Hadoop API to downstream libraries
    • PARQUET-1149 - Upgrade Avro dependency to 1.8.2
    • PARQUET-1170 - Logical-type-based toString for proper representation in tools/logs
    • PARQUET-1183 - AvroParquetWriter needs OutputFile based Builder
    • PARQUET-1197 - Log rat failures
    • PARQUET-1198 - Bump java source and target to java8
    • PARQUET-1215 - Add accessor for footer after a file is closed
    • PARQUET-1263 - ParquetReader's builder should use Configuration from the InputFile

    Task

  • v1.9.0 Changes

    October 19, 2016

    Bug

    • PARQUET-182 - FilteredRecordReader skips rows it shouldn't for schema with optional columns
    • PARQUET-212 - Implement nested type read rules in parquet-thrift
    • PARQUET-241 - ParquetInputFormat.getFooters() should return in the same order as what listStatus() returns
    • PARQUET-305 - Logger instantiated for package org.apache.parquet may be GC-ed
    • PARQUET-335 - Avro object model should not require MAP_KEY_VALUE
    • PARQUET-340 - totalMemoryPool is truncated to 32 bits
    • PARQUET-346 - ThriftSchemaConverter throws for unknown struct or union type
    • PARQUET-349 - VersionParser does not handle versions like "parquet-mr 1.6.0rc4"
    • PARQUET-352 - Add tags to "created by" metadata in the file footer
    • PARQUET-353 - Compressors not getting recycled while writing parquet files, causing memory leak
    • PARQUET-360 - parquet-cat json dump is broken for maps
    • PARQUET-363 - Cannot construct empty MessageType for ReadContext.requestedSchema
    • PARQUET-367 - "parquet-cat -j" doesn't show all records
    • PARQUET-372 - Parquet stats can have awkwardly large values
    • PARQUET-373 - MemoryManager tests are flaky
    • PARQUET-379 - PrimitiveType.union erases original type
    • PARQUET-380 - Cascading and scrooge builds fail when using thrift 0.9.0
    • PARQUET-385 - PrimitiveType.union accepts fixed_len_byte_array fields with different lengths when strict mode is on
    • PARQUET-387 - TwoLevelListWriter does not handle null values in array
    • PARQUET-389 - Filter predicates should work with missing columns
    • PARQUET-395 - System.out is used as logger in org.apache.parquet.Log
    • PARQUET-396 - The builder for AvroParquetReader loses the record type
    • PARQUET-400 - Error reading some files after PARQUET-77 bytebuffer read path
    • PARQUET-409 - InternalParquetRecordWriter doesn't use min/max row counts
    • PARQUET-410 - Fix subprocess hang in merge_parquet_pr.py
    • PARQUET-413 - Test failures for Java 8
    • PARQUET-415 - ByteBufferBackedBinary serialization is broken
    • PARQUET-422 - Fix a potential bug in MessageTypeParser where we ignore and overwrite the initial value of a method parameter
    • PARQUET-425 - Fix the bug when predicate contains columns not specified in projection, to prevent filtering out data improperly
    • PARQUET-426 - Throw Exception when predicate contains columns not specified in projection, to prevent filtering out data improperly
    • PARQUET-430 - Change to use Locale parameterized version of String.toUpperCase()/toLowerCase
    • PARQUET-431 - Make ParquetOutputFormat.memoryManager volatile
    • PARQUET-495 - Fix mismatches in Types class comments
    • PARQUET-509 - Incorrect number of args passed to string.format calls
    • PARQUET-511 - Integer overflow on counting values in column
    • PARQUET-528 - Fix flush() for RecordConsumer and implementations
    • PARQUET-529 - Avoid evoking job.toString() in ParquetLoader
    • PARQUET-540 - Cascading3 module doesn't build when using thrift 0.9.0
    • PARQUET-544 - ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
    • PARQUET-560 - Incorrect synchronization in SnappyCompressor
    • PARQUET-569 - ParquetMetadataConverter offset filter is broken
    • PARQUET-571 - Fix potential leak in ParquetFileReader.close()
    • PARQUET-580 - Potentially unnecessary creation of large int[] in IntList for columns that aren't used
    • PARQUET-581 - Min/max row count for page size check are conflated in some places
    • PARQUET-584 - show proper command usage when there's no arguments
    • PARQUET-612 - Add compression to FileEncodingIT tests
    • PARQUET-623 - DeltaByteArrayReader has incorrect skip behaviour
    • PARQUET-642 - Improve performance of ByteBuffer based read / write paths
    • PARQUET-645 - DictionaryFilter incorrectly handles null
    • PARQUET-651 - Parquet-avro fails to decode array of record with a single field name "element" correctly
    • PARQUET-660 - Writing Protobuf messages with extensions results in an error or data corruption.
    • PARQUET-663 - Links are Broken in README.md
    • PARQUET-674 - Add an abstraction to get the length of a stream
    • PARQUET-685 - Deprecated ParquetInputSplit constructor passes parameters in the wrong order.
    • PARQUET-726 - TestMemoryManager consistently fails
    • PARQUET-743 - DictionaryFilters can re-use StreamBytesInput when compressed

    Improvement

    • PARQUET-77 - Improvements in ByteBuffer read path
    • PARQUET-99 - Large rows cause unnecessary OOM exceptions
    • PARQUET-146 - make Parquet compile with java 7 instead of java 6
    • PARQUET-318 - Remove unnecessary objectmapper from ParquetMetadata
    • PARQUET-327 - Show statistics in the dump output
    • PARQUET-341 - Improve write performance with wide schema sparse data
    • PARQUET-343 - Caching nulls on group node to improve write performance on wide schema sparse data
    • PARQUET-358 - Add support for temporal logical types to AVRO/Parquet conversion
    • PARQUET-361 - Add prerelease logic to semantic versions
    • PARQUET-384 - Add Dictionary Based Filtering to Filter2 API
    • PARQUET-386 - Printing out the statistics of metadata in parquet-tools
    • PARQUET-397 - Pig Predicate Pushdown using Filter2 API
    • PARQUET-421 - Fix mismatch of javadoc names and method parameters in module encoding, column, and hadoop
    • PARQUET-427 - Push predicates into the whole read path
    • PARQUET-432 - Complete a todo for method ColumnDescriptor.compareTo()
    • PARQUET-460 - Parquet files concat tool
    • PARQUET-480 - Update for Cascading 3.0
    • PARQUET-484 - Warn when Decimal is stored as INT64 while could be stored as INT32
    • PARQUET-543 - Remove BoundedInt encodings
    • PARQUET-585 - Slowly ramp up sizes of int[]s in IntList to keep sizes small when data sets are small
    • PARQUET-654 - Make record-level filtering optional
    • PARQUET-668 - Provide option to disable auto crop feature in DumpCommand output
    • PARQUET-727 - Ensure correct version of thrift is used
    • PARQUET-740 - Introduce editorconfig

    New Feature

    • PARQUET-225 - INT64 support for Delta Encoding
    • PARQUET-382 - Add a way to append encoded blocks in ParquetFileWriter
    • PARQUET-429 - Enables predicates collecting their referred columns
    • PARQUET-548 - Add Java metadata for PageEncodingStats
    • PARQUET-669 - Allow reading file footers from input streams when writing metadata files

    Task

    Test

    • PARQUET-355 - Create Integration tests to validate statistics
    • PARQUET-378 - Add thoroughly parquet test encodings
  • v1.8.3

    May 03, 2018
  • v1.8.2

    January 19, 2017
  • v1.8.1 Changes

    July 17, 2015

    Bug

    • PARQUET-331 - Merge script doesn't surface stderr from failed sub processes
    • PARQUET-336 - ArrayIndexOutOfBounds in checkDeltaByteArrayProblem
    • PARQUET-337 - binary fields inside map/set/list are not handled in parquet-scrooge
    • PARQUET-338 - Readme references wrong format of pull request title

    Improvement

    • PARQUET-279 - Check empty struct in the CompatibilityChecker util

    Task