Popularity: 6.1 (Growing)

Activity: 1.3 (Growing)

Stars: 1,090

Watchers: 79

Forks: 207

Last commit: about 2 months ago

Description

Hazelcast Jet allows you to write modern Java code that focuses purely on data transformation while it does all the heavy lifting of getting the data flowing and computation running across a cluster of nodes. It supports working with both bounded (batch) and unbounded (streaming) data.

These are some of the concerns Jet handles well:

Scale Up and Out: Parallelize a computation across all CPU cores and cluster nodes
Auto-Rescale: Scale out to newly added nodes and recover from nodes that left or failed
Correctness Guarantee: at-least-once and exactly-once processing in the face of node failures

Jet integrates out of the box with many popular data storage systems such as Apache Kafka, Hadoop, relational databases, message queues and many more.

Jet supports a rich set of data transformations, such as windowed aggregations. For example, if your data is GPS location reports from millions of users, Jet can compute every user's velocity vector by using a sliding window and just a few lines of code.

Jet also comes with a fully-featured, in-memory key-value store. Use it to cache results, store reference data or as a data source itself.

Programming language: Java

License: Apache License 2.0 (some parts under the Hazelcast Community License)

Tags: Distributed Applications Messaging Stream Processing Data Processing

Latest version: v4.3

Hazelcast Jet alternatives and similar libraries

Based on the "Distributed Applications" category.

Hystrix

9.8 2.7 L2 Hazelcast Jet VS Hystrix

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.
Redisson

9.7 9.9 L1 Hazelcast Jet VS Redisson

Redisson - Easy Redis Java client and Real-Time Data Platform. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache ...

Apache ZooKeeper

9.6 8.3 L1 Hazelcast Jet VS Apache ZooKeeper

Apache ZooKeeper
Pinpoint

9.5 9.8 L2 Hazelcast Jet VS Pinpoint

APM (Application Performance Management) tool for large-scale distributed systems.
Akka

9.4 9.4 Hazelcast Jet VS Akka

Build highly concurrent, distributed, and resilient message-driven applications on the JVM
Vert.x

9.4 9.5 L1 Hazelcast Jet VS Vert.x

Vert.x is a tool-kit for building reactive applications on the JVM
Zuul

9.4 8.9 Hazelcast Jet VS Zuul

Zuul is a gateway service that provides dynamic routing, monitoring, resiliency, security, and more.
Apache Storm

9.2 9.3 L1 Hazelcast Jet VS Apache Storm

Apache Storm
Hazelcast

8.8 9.9 L3 Hazelcast Jet VS Hazelcast

Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
Ribbon

8.7 2.3 Hazelcast Jet VS Ribbon

Ribbon is an Inter-Process Communication (remote procedure calls) library with built-in software load balancers. The primary usage model involves REST calls with support for various serialization schemes.
Quasar

8.2 0.0 L1 Hazelcast Jet VS Quasar

Fibers, Channels and Actors for the JVM
Lagom

7.7 5.7 Hazelcast Jet VS Lagom

Reactive Microservices for the JVM
Atomix

7.4 2.2 L4 Hazelcast Jet VS Atomix

A Kubernetes toolkit for building distributed applications using cloud native principles
Bt

7.3 1.1 Hazelcast Jet VS Bt

BitTorrent library and client with DHT, magnet links, encryption and more
Orbit

6.6 0.0 L5 Hazelcast Jet VS Orbit

Orbit - Virtual actor framework for building distributed systems
JGroups

6.5 9.4 L2 Hazelcast Jet VS JGroups

The JGroups project
Copycat

5.2 0.0 L4 Hazelcast Jet VS Copycat

DISCONTINUED. Fault-tolerant state machine replication framework.
ScaleCube

5.1 5.0 Hazelcast Jet VS ScaleCube

Microservices library. scalecube-services is a high-throughput, low-latency reactive microservices library built to scale. It features API gateways, service discovery and service load balancing, and its architecture supports plug-and-play service communication modules and features. It is built to provide performance and low-latency real-time stream processing.
Dropwizard Circuit Breaker

2.2 0.0 Hazelcast Jet VS Dropwizard Circuit Breaker

A circuit breaker design pattern for dropwizard
kite

1.7 0.0 Hazelcast Jet VS kite

Lightweight service-based PubSub, RPC and public APIs in Java
Axon Framework

- Hazelcast Jet VS Axon Framework

Framework for creating CQRS applications.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

Note on Hazelcast 5

With the release of Hazelcast 5.0, development of Jet has moved to the core Hazelcast repository. Please follow that repository for details on how to use Hazelcast for building data pipelines.

Hazelcast 5 also comes with extensive documentation, replacing the existing Jet docs: https://docs.hazelcast.com/hazelcast/latest/index.html

What is Jet

Jet is an open-source, in-memory, distributed batch and stream processing engine. You can use it to process large volumes of real-time events or huge batches of static datasets. To give a sense of scale, a single node of Jet has been proven to aggregate 10 million events per second with latency under 10 milliseconds.

It provides a Java API to build stream and batch processing applications through the use of a dataflow programming model. After you deploy your application to a Jet cluster, Jet will automatically use all the computational resources on the cluster to run your application.

If you add more nodes to the cluster while your application is running, Jet automatically scales your application out to run on the new nodes. If you remove nodes from the cluster, Jet scales the application down seamlessly without losing the current computational state, providing exactly-once processing guarantees.

For example, you can represent the classical word count problem, which reads some local files and outputs the frequency of each word to the console, using the following API:

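import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import static com.hazelcast.jet.Traversers.traverseArray;
import static com.hazelcast.jet.aggregate.AggregateOperations.counting;

JetInstance jet = Jet.bootstrappedInstance();

Pipeline p = Pipeline.create();
p.readFrom(Sources.files("/path/to/text-files"))                    // one item per line of text
 .flatMap(line -> traverseArray(line.toLowerCase().split("\\W+")))  // split each line into words
 .filter(word -> !word.isEmpty())                                   // drop empty tokens
 .groupingKey(word -> word)
 .aggregate(counting())                                             // count occurrences per word
 .writeTo(Sinks.logger());

jet.newJob(p).join();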

and then deploy the application to the cluster:

bin/jet submit word-count.jar
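
To make the expected result concrete, the same computation can be sketched on a single JVM with plain java.util.stream and no Jet involved. This only illustrates what the pipeline computes, not how Jet distributes it; the names LocalWordCount and countWords are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical single-JVM equivalent of the word count pipeline (not Jet API).
final class LocalWordCount {

    // Mirrors the pipeline stages: split lines into lowercase words,
    // drop empty tokens, then group by word and count occurrences.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // "hello" appears twice, "jet" and "world" once each.
        Map<String, Long> counts = countWords(List.of("Hello Jet", "hello, world"));
        System.out.println(counts);
    }
}
```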

Another application which aggregates millions of sensor readings per second with 10-millisecond resolution from Kafka looks like the following:

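import com.hazelcast.jet.kafka.KafkaSources;

import static com.hazelcast.jet.aggregate.AggregateOperations.averagingDouble;
import static com.hazelcast.jet.pipeline.WindowDefinition.sliding;

// `jet` is the JetInstance from the previous example; `kafkaProperties` holds the Kafka consumer settings.
Pipeline p = Pipeline.create();

// The Kafka source emits Map.Entry<String, Reading> items, so each stage unwraps the value.
p.readFrom(KafkaSources.<String, Reading>kafka(kafkaProperties, "sensors"))
 .withTimestamps(event -> event.getValue().timestamp(), 10) // use event timestamp, allowed lag in ms
 .groupingKey(event -> event.getValue().sensorId())
 .window(sliding(1_000, 10)) // sliding window of 1s by 10ms
 .aggregate(averagingDouble(event -> event.getValue().temperature()))
 .writeTo(Sinks.logger());

jet.newJob(p).join();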
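
A sliding window of 1 second that advances by 10 milliseconds means that a window ends every 10 ms, so each reading contributes to 100 overlapping windows. The following plain-Java sketch, which does not use the Jet API, lists the window end timestamps that a given event timestamp falls into. The names SlidingWindowSketch and windowEndsFor are hypothetical, and the convention that a window covers [end - size, end) with ends aligned to multiples of the slide is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of sliding window membership (not Jet API).
final class SlidingWindowSketch {

    // Returns the end timestamps of all windows [end - size, end) that contain ts,
    // assuming window ends are aligned to multiples of slide.
    static List<Long> windowEndsFor(long ts, long size, long slide) {
        List<Long> ends = new ArrayList<>();
        // First aligned window end that is strictly greater than ts.
        long firstEnd = Math.floorDiv(ts, slide) * slide + slide;
        // Keep adding windows while the window start is still at or before ts.
        for (long end = firstEnd; end - size <= ts; end += slide) {
            ends.add(end);
        }
        return ends;
    }

    public static void main(String[] args) {
        List<Long> ends = windowEndsFor(1_005, 1_000, 10);
        System.out.println(ends.size()); // prints 100
    }
}
```

Under these assumptions an event at timestamp 1005 belongs to the windows ending at 1010, 1020, and so on up to 2000.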

Jet comes with out-of-the-box support for many kinds of data sources and sinks, including:

Apache Kafka
Local Files (Text, Avro, JSON)
Apache Hadoop (Azure Data Lake, S3, GCS)
Apache Pulsar
Debezium
Elasticsearch
JDBC
JMS
InfluxDB
Hazelcast
Redis
MongoDB
Twitter

When Should You Use Jet

Jet is a good fit when you need to process large amounts of data in a distributed fashion. You can use it to build a variety of data-processing applications, such as:

Low-latency stateful stream processing. For example, detecting trends in 100 Hz sensor data from 100,000 devices and sending corrective feedback within 10 milliseconds.
High-throughput, large-state stream processing. For example, tracking GPS locations of millions of users, inferring their velocity vectors.
Batch processing of big data volumes. For example, analyzing a day's worth of stock trading data to update the risk exposure of a given portfolio.

Key Features

Predictable Latency Under Load

Jet uses a unique execution model with cooperative multithreading and can achieve extremely low latencies while processing millions of items per second on just a single node.

The engine is able to run anywhere from tens to thousands of jobs concurrently on a fixed number of threads.
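
The idea behind cooperative multithreading can be illustrated with a minimal sketch in which several small tasks share one thread: each task does a bounded amount of work per call and then returns control instead of blocking. This is not Jet's actual execution engine; the Tasklet interface, the counter helper and the runAll loop below are simplified, hypothetical stand-ins.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of cooperative scheduling on one thread (not Jet's engine).
final class CooperativeSketch {

    // A unit of work that never blocks: each call makes a little progress.
    interface Tasklet {
        boolean call(); // returns true once the tasklet has finished
    }

    // Counts from 0 up to limit - 1, one step per call, recording each step in a shared log.
    static Tasklet counter(String name, int limit, List<String> log) {
        int[] next = {0};
        return () -> {
            log.add(name + ":" + next[0]++);
            return next[0] >= limit;
        };
    }

    // Runs all tasklets on the calling thread, round-robin, until every one is done.
    static void runAll(List<Tasklet> tasklets) {
        Deque<Tasklet> queue = new ArrayDeque<>(tasklets);
        while (!queue.isEmpty()) {
            Tasklet t = queue.poll();
            if (!t.call()) {
                queue.add(t); // not finished yet: go to the back of the queue
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        runAll(List.of(counter("a", 2, log), counter("b", 3, log)));
        System.out.println(log); // prints [a:0, b:0, a:1, b:1, b:2]
    }
}
```

Because no tasklet ever blocks, the number of threads stays fixed no matter how many tasklets are queued, which is the property the paragraph above relies on.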

Fault Tolerance With No Infrastructure

Jet stores computational state in a distributed, replicated in-memory store. It requires neither a distributed file system nor infrastructure like ZooKeeper to provide high availability and fault tolerance.

Jet implements a version of the Chandy-Lamport algorithm to provide exactly-once processing in the face of failures. When interfacing with external transactional systems like databases, it can provide end-to-end processing guarantees using two-phase commit.
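
The effect of snapshot-based recovery can be sketched with a toy example: the processing state and the source position are saved together, and after a failure both are restored, so replayed items are not counted twice. This is a single-threaded simplification and not the distributed Chandy-Lamport algorithm itself; all names (SnapshotSketch, sumWithFailure, the snapshot interval and failure index) are hypothetical.

```java
// Hypothetical illustration of snapshot-and-replay recovery (not Jet's implementation).
final class SnapshotSketch {

    // Sums the items, saving (offset, sum) as a snapshot every snapshotInterval items.
    // One failure is simulated right before the item at index failAt is processed:
    // in-flight state is discarded and processing resumes from the last snapshot.
    static long sumWithFailure(int[] items, int snapshotInterval, int failAt) {
        int snapshotOffset = 0;  // source position stored in the last snapshot
        long snapshotSum = 0;    // computational state stored in the last snapshot
        int offset = 0;
        long sum = 0;
        boolean failed = false;

        while (offset < items.length) {
            if (!failed && offset == failAt) {
                failed = true;
                // Restore the state and rewind the source to the same snapshot.
                offset = snapshotOffset;
                sum = snapshotSum;
                continue;
            }
            sum += items[offset++];
            if (offset % snapshotInterval == 0) {
                snapshotOffset = offset;
                snapshotSum = sum;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] items = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(sumWithFailure(items, 3, 5)); // prints 28
    }
}
```

Items 4 and 5 are processed twice in this run, but because the sum is rolled back together with the offset, the result is the same as in a run without failure.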

Advanced Event Processing

Event data can often arrive out of order and Jet has first-class support for dealing with this disorder. Jet implements a technique called distributed watermarks to treat disordered events as if they were arriving in order.
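
A simplified, single-stream sketch of the watermark idea: the watermark trails the highest timestamp seen so far by an allowed lag, and an event whose timestamp is already behind the watermark is treated as late. Jet's distributed watermarks coordinate this across parallel processors, which the sketch does not attempt; WatermarkSketch and its methods are hypothetical names.

```java
// Hypothetical illustration of a watermark with allowed lag (not Jet API).
final class WatermarkSketch {
    private final long allowedLag;
    private long watermark = Long.MIN_VALUE;

    WatermarkSketch(long allowedLag) {
        this.allowedLag = allowedLag;
    }

    // Observes an event timestamp. Returns false if the event is late,
    // that is, if its timestamp is already behind the current watermark.
    boolean accept(long timestamp) {
        if (timestamp < watermark) {
            return false;
        }
        // The watermark only moves forward, trailing the newest timestamp by the lag.
        watermark = Math.max(watermark, timestamp - allowedLag);
        return true;
    }

    long watermark() {
        return watermark;
    }

    public static void main(String[] args) {
        WatermarkSketch wm = new WatermarkSketch(10);
        System.out.println(wm.accept(100)); // true, watermark becomes 90
        System.out.println(wm.accept(95));  // true, out of order but within the lag
        System.out.println(wm.accept(120)); // true, watermark becomes 110
        System.out.println(wm.accept(105)); // false, behind the watermark
    }
}
```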

How Do I Get Started

Follow the Get Started guide to start using Jet.

Download

You can download Jet from https://jet-start.sh.

Alternatively, you can use the latest docker image:

docker run -p 5701:5701 hazelcast/hazelcast-jet

Use the following Maven coordinates to add Jet to your application:

<groupId>com.hazelcast.jet</groupId>
<artifactId>hazelcast-jet</artifactId>
<version>4.2</version>

Tutorials

See the tutorials for step-by-step guides on using Jet.

Reference

Jet supports a variety of transforms and operators. These include:

Stateless transforms such as mapping and filtering.
Stateful transforms such as aggregations and stateful mapping.

Community

The Hazelcast Jet team actively answers questions on Stack Overflow and the Hazelcast Community Slack.

You are also encouraged to join the hazelcast-jet mailing list if you are interested in community discussions.

How Can I Contribute

Thanks for your interest in contributing! The easiest way is to just send a pull request. Have a look at the issues marked as good first issue for some guidance.

Building From Source

To build, use:

./mvnw clean package -DskipTests

Use Latest Snapshot Release

You can always use the latest snapshot release if you want to try the features currently under development.

Maven snippet:

<repositories>
    <repository>
        <id>snapshot-repository</id>
        <name>Maven2 Snapshot Repository</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
        </snapshots>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.hazelcast.jet</groupId>
        <artifactId>hazelcast-jet</artifactId>
        <version>4.3-SNAPSHOT</version>
    </dependency>
</dependencies>

Trigger Phrases in the Pull Request Conversation

When you create a pull request (PR), it must pass a build-and-test procedure. Maintainers will be notified about your PR, and they can trigger the build using special comments. These are the phrases you may see used in the comments on your PR:

verify - run the default PR builder, equivalent to mvn clean install
run-nightly-tests - use the settings for the nightly build (mvn clean install -Pnightly). This includes slower tests in the run, which we don't normally run on every PR
run-windows - run the tests on a Windows machine (HighFive is not supported here)
run-cdc-debezium-tests - run all tests in the extensions/cdc-debezium module
run-cdc-mysql-tests - run all tests in the extensions/cdc-mysql module
run-cdc-postgres-tests - run all tests in the extensions/cdc-postgres module

Unless otherwise indicated, the builds run on a Linux machine with Oracle JDK 8.

License

Source code in this repository is covered by one of two licenses:

[Apache License 2.0](licenses/apache-v2-license.txt)
[Hazelcast Community License](licenses/hazelcast-community-license.txt)

The default license throughout the repository is Apache License 2.0 unless the header specifies another license. Please see the Licensing section for more information.

Credits

We owe (the good parts of) our CLI tool's user experience to picocli.

Copyright

Visit www.hazelcast.com for more info.

*Note that all licence references and agreements mentioned in the Hazelcast Jet README section above are relevant to that project's source code only.
