Hazelcast Jet allows you to write modern Java code that focuses purely on data transformation while it does all the heavy lifting of getting the data flowing and computation running across a cluster of nodes. It supports working with both bounded (batch) and unbounded (streaming) data.
These are some of the concerns Jet handles well:
- Scale Up and Out: Parallelize a computation across all CPU cores and cluster nodes
- Auto-Rescale: Scale out to newly added nodes and recover from nodes that left or failed
- Correctness Guarantee: At-least-once and exactly-once processing in the face of node failures

Jet integrates out of the box with many popular data storage systems such as Apache Kafka, Hadoop, relational databases, message queues and many more.
Jet supports a rich set of data transformations, such as windowed aggregations. For example, if your data is GPS location reports from millions of users, Jet can compute every user's velocity vector by using a sliding window and just a few lines of code.
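To make the sliding-window idea concrete, here is a minimal plain-Java sketch. It is not Jet code: it keeps the reports of a single user that fall inside the window and derives the velocity from the oldest and the newest one. The `SlidingVelocity` and `Report` classes, and the assumption that positions are flat x/y coordinates in metres, are invented for this illustration. A real Jet pipeline would express the same idea through its windowing API and run it for all users in parallel.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingVelocity {
    /** Hypothetical location report: time in ms, position in metres on a flat plane. */
    public static final class Report {
        final long timestampMs;
        final double x;
        final double y;

        public Report(long timestampMs, double x, double y) {
            this.timestampMs = timestampMs;
            this.x = x;
            this.y = y;
        }
    }

    private final long windowMs;
    private final Deque<Report> window = new ArrayDeque<>();

    public SlidingVelocity(long windowMs) {
        this.windowMs = windowMs;
    }

    /**
     * Adds a report (assumed to arrive in timestamp order), evicts reports that
     * have slid out of the window, and returns the velocity vector {vx, vy}
     * in metres per second.
     */
    public double[] add(Report r) {
        window.addLast(r);
        // The newest report is always in the deque, so peekFirst() is never null here.
        while (r.timestampMs - window.peekFirst().timestampMs > windowMs) {
            window.removeFirst();
        }
        Report oldest = window.peekFirst();
        double seconds = (r.timestampMs - oldest.timestampMs) / 1000.0;
        if (seconds == 0) {
            return new double[] {0.0, 0.0}; // a single report carries no velocity
        }
        return new double[] {(r.x - oldest.x) / seconds, (r.y - oldest.y) / seconds};
    }

    public static void main(String[] args) {
        SlidingVelocity v = new SlidingVelocity(10_000);
        v.add(new Report(0, 0, 0));
        double[] vel = v.add(new Report(5_000, 50, 0));
        System.out.println(vel[0] + ", " + vel[1]);
    }
}
```

In this sketch a user who moves 50 metres along x in 5 seconds gets a velocity of 10 m/s along x and 0 along y.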
Jet also comes with a fully-featured, in-memory key-value store. Use it to cache results, store reference data or as a data source itself.
Hazelcast Jet is an open-source, cloud-native, distributed stream and batch processing engine.
Jet is simple to set up. The nodes you start discover each other and form a cluster automatically. You can do the same locally, even on the same machine (your laptop, for example). This is great for quick testing.
With Jet it's easy to build fault-tolerant and elastic data processing pipelines. Jet keeps processing data without loss even if a node fails, and you can add more nodes that immediately start sharing the computation load.
You can embed Jet as a part of your application: it's just a single JAR without dependencies. You can also deploy it standalone, as a stream-processing cluster.
Jet also provides a highly available, distributed in-memory data store. You can cache your reference data and enrich the event stream with it, store the results of a computation, or even store the input data you're about to process with Jet.
Start using Jet
Add this to your pom.xml to get the latest Jet as your project dependency:

    <dependency>
        <groupId>com.hazelcast.jet</groupId>
        <artifactId>hazelcast-jet</artifactId>
        <version>4.0</version>
    </dependency>

Since Jet is embeddable, this is all you need to start your first Jet instance! Read on for a quick example of your first Jet program.
Batch Processing with Jet
Use this code to start an instance of Jet and tell it to perform some computation:

    String path = "books";

    JetInstance jet = Jet.bootstrappedInstance();

    Pipeline p = Pipeline.create();
    p.readFrom(Sources.files(path))
     .flatMap(line -> Traversers.traverseArray(line.toLowerCase().split("\\W+")))
     .filter(word -> !word.isEmpty())
     .groupingKey(word -> word)
     .aggregate(AggregateOperations.counting())
     .writeTo(Sinks.logger());

    jet.newJob(p).join();

When you run this, point the
path variable to some directory with text
files in it. Jet will analyze all the files and give you the word
frequency distribution in the log output (for each word it will say how
many times it appears in the files).
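
For intuition about what the pipeline computes, the following is a rough single-JVM equivalent written with plain Java streams instead of Jet. It is only a sketch: the class name `LocalWordCount` and the `countWords` helper are made up for this illustration, and it does none of the distribution or parallelization across nodes that Jet provides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LocalWordCount {
    /** Counts how often each lower-cased word occurs in the given lines. */
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // same tokenization as the pipeline: lower-case, split on non-word characters
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // split() yields an empty first token when a line starts with punctuation
                .filter(word -> !word.isEmpty())
                // group by the word itself and count the members of each group
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("To be, or not to be", "...that is the question");
        countWords(lines).forEach((word, count) -> System.out.println(word + "=" + count));
    }
}
```

The `filter` step matters for the same reason as in the Jet pipeline: without it, a line that begins with punctuation would contribute an empty string as a "word".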
The above was an example of processing data at rest (i.e., batch processing). It's conceptually simpler than stream processing so we used it as our first example.
Stream Processing with Jet
For stream processing you need a streaming data source. A simple example is watching a folder of text files for changes and processing each new appended line. Here's the code you can try out:

    String path = "books";

    JetInstance jet = Jet.bootstrappedInstance();

    Pipeline p = Pipeline.create();
    p.readFrom(Sources.fileWatcher(path))
     .withIngestionTimestamps()
     .setLocalParallelism(1)
     .flatMap(line -> Traversers.traverseArray(line.toLowerCase().split("\\W+")))
     .filter(word -> !word.isEmpty())
     .groupingKey(word -> word)
     .window(WindowDefinition.tumbling(1000))
     .aggregate(AggregateOperations.counting())
     .writeTo(Sinks.logger());

    jet.newJob(p).join();

Before running this, make an empty directory and point the path variable to it. While the job is running, copy some text files into it and Jet will process them right away.
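
The `WindowDefinition.tumbling(1000)` call in the pipeline groups events into consecutive, non-overlapping one-second windows. As a rough illustration of that idea only, and not of Jet's internals, this plain-Java sketch assigns timestamped words to windows by rounding each timestamp down to a multiple of the window size. The class and method names are invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

public class TumblingCount {
    private final long windowSizeMs;
    // window start timestamp -> (word -> count), sorted for readable output
    private final Map<Long, Map<String, Long>> windows = new TreeMap<>();

    public TumblingCount(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /** Start of the window that contains the given (non-negative) timestamp. */
    public long windowStart(long timestampMs) {
        return timestampMs - timestampMs % windowSizeMs;
    }

    /** Counts the word in the window its timestamp falls into. */
    public void accept(long timestampMs, String word) {
        windows.computeIfAbsent(windowStart(timestampMs), start -> new TreeMap<>())
               .merge(word, 1L, Long::sum);
    }

    public Map<Long, Map<String, Long>> results() {
        return windows;
    }

    public static void main(String[] args) {
        TumblingCount counter = new TumblingCount(1000);
        counter.accept(100, "jet");
        counter.accept(900, "jet");
        counter.accept(1000, "jet"); // falls into the next window
        System.out.println(counter.results());
    }
}
```

Running `main` prints `{0={jet=2}, 1000={jet=1}}`: the first two events share the window starting at 0, and the third opens the window starting at 1000. Unlike this sketch, a streaming engine also has to decide when a window is complete and can be emitted, which is why the pipeline assigns timestamps with `withIngestionTimestamps()`.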
Features
- Constant low latency - predictable latency is a design goal
- Zero dependencies - single JAR which is embeddable (minimum JDK 8)
- Cloud Native - with Docker images and Kubernetes support including Helm Charts
- Elastic - Jet can scale jobs up and down while running
- Fault Tolerant - At-least-once and exactly-once processing guarantees
- In-memory storage - Jet provides robust distributed in-memory storage for caching, enrichment or storing job results
- Sources and sinks for Apache Kafka, Hadoop, Hazelcast IMDG, sockets, files
- Dynamic node discovery for both on-premise and cloud deployments
You can download the distribution package which includes command-line tools from https://jet-start.sh.
Getting Started and Documentation
See the Hazelcast Jet Getting Started Guide.
See the [examples folder](examples) for some examples.
See the following architecture pages for more insight into the internals of Jet:
You can see a full list of connectors in the [Sources and Sinks](https://jet-start.sh/docs/api/sources-sinks) section of the docs. A summary is below:

|Connector|Description|
|---|---|
|Amazon S3|A connector that allows AWS S3 read/write support for Hazelcast Jet.|
|Apache Avro|Source and sink connector for Avro files.|
|Apache Hadoop|A connector that allows Apache Hadoop read/write support for Hazelcast Jet.|
|Apache Kafka|A connector that allows consuming/producing events from/to Apache Kafka.|
|Debezium|A Hazelcast Jet connector for Debezium which enables Hazelcast Jet pipelines to consume CDC events from various databases.|
|Elasticsearch|A Hazelcast Jet connector for Elasticsearch for querying/indexing objects from/to Elasticsearch.|
|Files|Connector for the local filesystem.|
|Hazelcast Cache Journal|Connector for change events on caches in local and remote Hazelcast clusters.|
|Hazelcast Cache|Connector for caches in local and remote Hazelcast clusters.|
|Hazelcast List|Connector for lists in local and remote Hazelcast clusters.|
|Hazelcast Map Journal|Connector for change events on maps in local and remote Hazelcast clusters.|
|Hazelcast Map|Connector for maps in local and remote Hazelcast clusters.|
|InfluxDb|A Hazelcast Jet connector for InfluxDb which enables pipelines to read/write data points from/to InfluxDb.|
|JDBC|Connector for relational databases via JDBC.|
|JMS|Connector for JMS topics and queues.|
|Kafka Connect|A generic Kafka Connect source that lets you plug any Kafka Connect source into Jet pipelines for data ingestion.|
|MongoDB|A Hazelcast Jet connector for MongoDB for querying/inserting objects from/to MongoDB.|
|Redis|Hazelcast Jet connectors for various Redis data structures.|
|Socket|Connector for TCP sockets.|
|Twitter|A Hazelcast Jet connector for consuming data from Twitter stream sources in Jet pipelines.|

See the hazelcast-jet-contrib repository for more detailed information on community-supported connectors and tools.
Start Developing Hazelcast Jet
Use Latest Snapshot Release
You can always use the latest snapshot release if you want to try the features currently under development.

    <repositories>
        <repository>
            <id>snapshot-repository</id>
            <name>Maven2 Snapshot Repository</name>
            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
            <snapshots>
                <enabled>true</enabled>
                <updatePolicy>daily</updatePolicy>
            </snapshots>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>com.hazelcast.jet</groupId>
            <artifactId>hazelcast-jet</artifactId>
            <version>4.1-SNAPSHOT</version>
        </dependency>
    </dependencies>

Build From Source
- JDK 8 or later
To build on Linux/MacOS X use:
./mvnw clean package -DskipTests
For Windows use:
mvnw clean package -DskipTests
We encourage pull requests and process them promptly.
- Complete the Hazelcast Contributor Agreement
- If you're not familiar with Git, see the Hazelcast Guide for Git for our Git process
You are also encouraged to join the hazelcast-jet mailing list if you are interested in community discussions.
Source code in this repository is covered by one of two licenses:
- [Apache License 2.0](licenses/apache-v2-license.txt)
- [Hazelcast Community License](licenses/hazelcast-community-license.txt).
The default license throughout the repository is Apache License 2.0 unless the
header specifies another license. Please see the Licensing section for more information.
Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
Visit www.hazelcast.com for more info.