Confluent is a data streaming platform based on Apache Kafka. It is a full-scale streaming platform, capable not only of publish-and-subscribe messaging but also of storing and processing data within the stream.

Confluent offers three flavors of its product: a free, open-source streaming platform that makes it easy to get started with real-time data streams; an enterprise-grade version with additional administration, operations and monitoring features; and a paid cloud-based version.

[Figure: Confluent Platform]

Confluent History - Roots in Apache Kafka

Confluent was founded by three LinkedIn engineers, Jay Kreps, Neha Narkhede and Jun Rao, based on Apache Kafka, the open source software they developed.

The Kafka project, started by LinkedIn in 2010, set out to develop a real-time messaging technology. Kafka acts as a central repository of streams: events are stored in Kafka before they are routed elsewhere in a data cluster for further processing and analysis. Users of Kafka include LinkedIn, Twitter, Netflix, Airbnb, Uber, Cisco, Goldman Sachs and Wal-Mart. Kafka is also widely adopted by financial services firms to reduce credit card fraud.
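
To make the "central repository of streams" idea concrete, here is a minimal in-memory sketch in Python. It is not Kafka and does not use any Kafka API; the `ToyLog` class, the group names and the sample events are all hypothetical, chosen only to illustrate that events stay stored after they are read, so several downstream consumers can each process the full stream at their own pace.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for one stream of events (illustration only, not Kafka)."""

    def __init__(self):
        self._events = []                 # append-only storage
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, event):
        """Append an event and return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, group, max_events=10):
        """Return unread events for a group and advance that group's offset."""
        start = self._offsets[group]
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


log = ToyLog()
log.publish({"card": "1234", "amount": 25})
log.publish({"card": "1234", "amount": 9000})

# Two independent readers each see the full stream, because events stay stored.
fraud_batch = log.poll("fraud-detection")
analytics_batch = log.poll("analytics")
```

Each group tracks its own position, so reading by one group never removes events for another. This is the property that separates a stored stream from a simple message queue.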

Confluent is a more complete distribution of Apache Kafka. It expands Kafka’s integration capabilities and adds tools to optimize and manage Kafka clusters, as well as methods to keep the streams secure. Confluent Platform makes it easier to build on Kafka and easier to operate it.

Confluent raised $80 million in 3 rounds, most recently $50 million in 2017.

Confluent Open Source

Confluent Open Source is a developer-optimized distribution of Apache Kafka. Its key benefits compared to the original Apache Kafka are:

  • More languages - the majority of teams use two or more languages with Kafka. Confluent Open Source supports clients for multiple programming languages and APIs, including Java, C/C++, .NET, Python and JMS.
  • Better data management - Confluent offers a schema registry that sets schemas for the data in your streams, ensuring consistency as you scale.
  • Improved reliability - Confluent Open Source is packaged by the Kafka experts at Confluent, and is the most tested, reliable distribution of Kafka.
  • Connectors - Confluent and partner vendors offer tested and secure connectors for many data systems, including Hadoop HDFS, JDBC, Elasticsearch, DataStax, Attunity and Amazon S3. Connectors make it quick and easy to start setting up reliable data pipelines with Kafka.
  • REST proxy - Confluent Open Source provides a RESTful interface to your Kafka cluster, making it easy to produce and consume messages, view the state of your cluster, and perform administrative actions.
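
As an illustration of the REST proxy item above, the following Python sketch builds an HTTP request that produces JSON records to a topic. It assumes the REST Proxy v2 API (a `POST` to `/topics/{topic}` with the `application/vnd.kafka.json.v2+json` content type) and a proxy listening on `localhost:8082`; the topic name and the record contents are hypothetical. The request is only constructed, not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build (but do not send) a REST Proxy v2 request that produces JSON records."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


# Assumed local proxy address and a hypothetical topic name.
request = build_produce_request(
    "http://localhost:8082", "page-views", [{"user": "alice", "page": "/home"}]
)

# To actually send it against a running REST Proxy:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```

Because the interface is plain HTTP and JSON, any language with an HTTP client can produce messages this way, even without a native Kafka client.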

Confluent Enterprise

Confluent Enterprise is a distribution of Apache Kafka for production environments. It simplifies the operation and administration of Kafka clusters with administration, monitoring and management tools.

In addition to the capabilities of Confluent Open Source, Confluent Enterprise provides:

  • Confluent Control Center - Monitors and manages Kafka clusters from a rich user interface. Lets you quickly scan a cluster for anomalies and track messages down to their source. Allows managing data pipelines without writing a line of code.
  • Auto Data Balancer - Optimizes resource utilization and reliability. Uses a rack-aware rebalance algorithm that optimizes for disk space utilization. Partition movements are executed with minimal impact on your traffic.
  • Multi-Datacenter Replication - An easy and reliable way to run Kafka across data centers or with multiple clusters. It is an optional, licensed feature that manages your streaming pipelines across data centers. It enables synchronizing two active data centers (active/active) and replicating to a global data center. For improved security, replication supports Kafka’s SASL authentication for Kerberos and Active Directory, and SSL encryption between data centers.
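
The idea of balancing for disk space utilization can be sketched with a toy greedy procedure in Python. This is not Confluent's Auto Data Balancer algorithm: it ignores racks, replicas and traffic, and the function name, broker names, partition sizes and tolerance are all hypothetical. It only shows the general approach of moving partitions from the fullest broker to the emptiest one until disk usage is roughly even.

```python
def rebalance_by_disk(brokers, tolerance_gb=10):
    """Greedy toy rebalancer: move partitions from the fullest to the emptiest broker.

    brokers maps broker name -> {partition name: size in GB} and is modified in place.
    Returns the list of (partition, source broker, target broker) moves.
    """
    moves = []
    while True:
        usage = {name: sum(parts.values()) for name, parts in brokers.items()}
        fullest = max(usage, key=usage.get)
        emptiest = min(usage, key=usage.get)
        gap = usage[fullest] - usage[emptiest]
        if gap <= tolerance_gb:
            return moves
        # Only a partition smaller than the gap narrows it; pick the largest such one.
        candidates = {p: s for p, s in brokers[fullest].items() if s < gap}
        if not candidates:
            return moves
        partition = max(candidates, key=candidates.get)
        brokers[emptiest][partition] = brokers[fullest].pop(partition)
        moves.append((partition, fullest, emptiest))


# Hypothetical cluster: broker-1 holds most of the data.
cluster = {
    "broker-1": {"orders-0": 40, "orders-1": 30, "clicks-0": 20},
    "broker-2": {"clicks-1": 10},
    "broker-3": {"logs-0": 20},
}
moves = rebalance_by_disk(cluster)
final_usage = {name: sum(parts.values()) for name, parts in cluster.items()}
```

A production balancer would also have to respect rack placement of replicas and throttle the data movement, which is what the "rack-aware" and "minimal impact" claims above refer to.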

Confluent Cloud

Confluent Cloud offers Apache Kafka as a Service. It is hosted and fully managed Apache Kafka in the public cloud. Deployed in minutes, it is a streaming data service for cloud-first developers or cloud-based enterprises.

Confluent Cloud enables:

  • Using the rich Kafka ecosystem - unlike proprietary streaming services from cloud providers, Confluent Cloud offers the same open-source Apache Kafka APIs. This makes it possible to leverage existing clients, connectors and tools supported by the Kafka and Confluent communities.
  • No lock-in - Confluent Cloud is vendor agnostic, so you can “lift and shift” your Kafka applications from any location into or out of Confluent Cloud.
  • Increased development velocity - Confluent Cloud takes away the operational burden of running Kafka and lets developers focus on building streaming applications, with resilience, security and performance built in.
  • No administration overhead - even as your Kafka cluster grows, there is no need to manage Kafka brokers, support the application, or handle high availability and failure scenarios.
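
Because the service exposes the standard Kafka APIs, moving an application between a self-managed cluster and a hosted one is largely a matter of client configuration. The Python sketch below builds such a configuration using librdkafka-style property names (as accepted by the `confluent-kafka` Python client). The use of `SASL_SSL` with the `PLAIN` mechanism and an API key and secret is an assumption about how a hosted cluster is typically secured, and the host name and credentials are placeholders.

```python
def cloud_client_config(bootstrap_servers, api_key, api_secret):
    """Return settings a Kafka client would need for a SASL_SSL-secured hosted cluster."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",  # assumed: TLS-encrypted, authenticated
        "sasl.mechanisms": "PLAIN",       # assumed: key/secret authentication
        "sasl.username": api_key,
        "sasl.password": api_secret,
    }


# A self-managed development cluster needs only the broker address.
local_config = {"bootstrap.servers": "localhost:9092"}

# Placeholder values; substitute the details of your own cluster.
cloud_config = cloud_client_config("<cluster-host>:9092", "<api-key>", "<api-secret>")

# With the confluent-kafka package installed, the same application code accepts either:
#     from confluent_kafka import Producer
#     producer = Producer(cloud_config)
#     producer.produce("page-views", value=b"hello")
#     producer.flush()
```

Only the configuration dictionary differs between the two environments; the producing and consuming code stays the same.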
