StreamSets

StreamSets is a cloud-native collection of products designed to control data drift: the problem of changes in data, data sources, data infrastructure, and data processing. The company calls its applications a data operations platform. Included features are a living data map, performance management indices, and smart pipelines, providing a level of control similar to that of common business operations systems.

What is StreamSets?

StreamSets provides two products, the Data Collector and the Dataflow Performance Manager (DPM). The Data Collector is an open source application that lets users build platform-agnostic data pipelines optimized for continuous ingestion with minimal data latency. Users can build batch and streaming dataflows with little to no coding. The Dataflow Performance Manager manages multiple dataflows from a visual user interface, and baselines and Key Performance Indicators (KPIs) are measured through the DPM.
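To make the pipeline idea concrete, here is a minimal, purely illustrative Python sketch of a continuous-ingestion dataflow: an origin feeding a processor and a destination in small micro-batches. This is not the StreamSets API; the origin, processor, destination, and run_pipeline functions are hypothetical stand-ins for Data Collector stages, which are normally assembled in the visual designer rather than in code.

```python
import time
from typing import Dict, Iterator, List

def origin() -> Iterator[Dict]:
    """Hypothetical origin stage: yields events as they arrive (streaming)
    or a finite set of rows (batch)."""
    for i in range(5):
        yield {"id": i, "raw": f"event-{i}"}
        time.sleep(0.1)  # simulate data trickling in continuously

def processor(record: Dict) -> Dict:
    """Hypothetical processor stage: a light transformation applied in flight."""
    record["raw"] = record["raw"].upper()
    return record

def destination(batch: List[Dict]) -> None:
    """Hypothetical destination stage: write a micro-batch downstream."""
    print(f"wrote {len(batch)} records: {batch}")

def run_pipeline(batch_size: int = 2) -> None:
    """Drain the origin continuously in small micro-batches to keep latency low."""
    batch: List[Dict] = []
    for record in origin():
        batch.append(processor(record))
        if len(batch) >= batch_size:
            destination(batch)
            batch = []
    if batch:  # flush the final partial batch
        destination(batch)

if __name__ == "__main__":
    run_pipeline()
```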

[Image: StreamSets DPM. Source: https://streamsets.com/products/]

StreamSets and ETL

As with many newer products, StreamSets’ flexibility extends beyond traditional ETL. The DPM and Data Collector are useful for a variety of data management tasks, such as real-time data mapping and maintaining corporate architecture documentation. With its real-time dataflow measurements, StreamSets also integrates easily with Agile processes.

StreamSets approaches ETL differently by providing ETL upon ingest: the ETL work happens as streaming data is ingested, rather than as a separate three-step process. Creating a pipeline does not require coding. Pipeline data is parsed into a StreamSets Data Collector Record, a common record format that removes the need for custom data transformations. StreamSets uses general-purpose connectors to avoid custom coding, and search results can also be ingested, providing further segmentation of data. StreamSets easily loads data from a relational database, flat files such as Excel exports, or a CRM.
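The hedged Python sketch below illustrates the general idea of ingest-time transformation into a common record shape. Everything here is an assumption made for illustration: the header-plus-fields record layout, the to_record, ingest_csv, ingest_json, and transform helpers, and the sample data are hypothetical and do not reflect the actual StreamSets Data Collector Record implementation.

```python
import csv
import io
import json
from typing import Any, Dict, List

def to_record(payload: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Wrap any parsed payload in one common record shape (header + fields)."""
    return {"header": {"source": source}, "fields": payload}

def ingest_csv(text: str) -> List[Dict[str, Any]]:
    """Parse CSV rows from a flat-file export into common records."""
    return [to_record(row, "csv") for row in csv.DictReader(io.StringIO(text))]

def ingest_json(text: str) -> List[Dict[str, Any]]:
    """Parse a JSON array from an API or CRM export into common records."""
    return [to_record(obj, "json") for obj in json.loads(text)]

def transform(record: Dict[str, Any]) -> Dict[str, Any]:
    """Transformation applied at ingest time, not as a separate batch step."""
    record["fields"] = {k.lower(): v for k, v in record["fields"].items()}
    return record

if __name__ == "__main__":
    csv_data = "Name,Amount\nAcme,100\nGlobex,250\n"
    json_data = '[{"Name": "Initech", "Amount": 75}]'
    records = [transform(r) for r in ingest_csv(csv_data) + ingest_json(json_data)]
    for r in records:
        print(r)  # every record has the same shape regardless of its source
```

Because both inputs end up in the same record shape, the downstream stages only ever deal with one format, which is the point of a common record format.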

Pipelines in StreamSets are not single-purpose artifacts. Each pipeline has a single data origin but can have multiple destinations, which streamlines the ETL process by letting one pipeline serve multiple applications and increases both analytics and ETL flexibility.
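As a rough illustration of the single-origin, multiple-destination pattern, the sketch below reads from one origin and fans each record out to several destination functions. The origin, warehouse_destination, and analytics_destination names are hypothetical; the code only mimics how one pipeline can feed more than one downstream application, not how StreamSets implements it.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

def origin() -> Iterable[Record]:
    """Hypothetical single origin for the pipeline."""
    return [{"order_id": i, "amount": 10 * i} for i in range(1, 4)]

def warehouse_destination(record: Record) -> None:
    """Stand-in for a data warehouse load."""
    print(f"warehouse <- {record}")

def analytics_destination(record: Record) -> None:
    """Stand-in for a real-time analytics feed."""
    print(f"analytics <- {record}")

def run_fan_out(destinations: List[Callable[[Record], None]]) -> None:
    """Read once from the origin and route every record to each destination,
    so a single pipeline serves several downstream applications."""
    for record in origin():
        for write in destinations:
            write(record)

if __name__ == "__main__":
    run_fan_out([warehouse_destination, analytics_destination])
```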

StreamSets Use Cases

  • Large corporations with multiple data silos can create ingestion frameworks to load and analyze data from subsidiaries, back office systems and suppliers.
  • Corporations can ingest data from legacy systems and cloud storage without large investments in custom code and data transformations.
