March 2, 2020

MuleSoft vs. Google Cloud Dataflow vs. Stitch

Most businesses have data stored in a variety of locations, from in-house databases to SaaS platforms. To get a full picture of their finances and operations, they pull data from all those sources into a data warehouse or data lake and run analytics against it. Many of them, however, would rather not build and maintain their own data pipelines.

Fortunately, it’s not necessary to build everything in-house. We put together this ETL tool comparison guide to help you choose the product that best fits your business.

Overview

MuleSoft, Google Cloud Dataflow, and Stitch are all popular platforms. Here's a side-by-side look at how they stack up against each other.

Transformations

MuleSoft

MuleSoft specializes in application integration — moving data from one application or platform to another. Each application has defined data structures, so MuleSoft often has to transform data from a source to fit the destination schema. MuleSoft comes with more than 20 prepackaged "transformers," and gives developers the ability to write their own custom processors in scripting languages such as JavaScript and Groovy.
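To make the idea of a transformer concrete, here is a minimal Python sketch of the kind of field-level mapping such a component performs. It is not MuleSoft code; the source field names, the destination schema, and the `to_destination` function are all hypothetical.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical record as a source application (for example a CRM) might emit it.
SOURCE_RECORD = {
    "FirstName": "Ada",
    "LastName": "Lovelace",
    "CreatedDate": "2020-03-02T09:30:00Z",
    "AnnualRevenue": "125000.50",
}


def to_destination(record: dict) -> dict:
    """Reshape one source record to fit a hypothetical destination schema."""
    created = datetime.strptime(record["CreatedDate"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        # Two source fields are merged into a single destination field.
        "full_name": f"{record['FirstName']} {record['LastName']}",
        # An ISO-8601 UTC string becomes Unix epoch seconds.
        "created_at": int(created.replace(tzinfo=timezone.utc).timestamp()),
        # A numeric string becomes integer cents; Decimal avoids float rounding.
        "annual_revenue_cents": int(Decimal(record["AnnualRevenue"]) * 100),
    }


if __name__ == "__main__":
    print(to_destination(SOURCE_RECORD))
```

Each destination field shows a different kind of change (combining fields, converting a date format, converting a type), which is the typical work a transformer does between two applications with different schemas.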

Google Cloud Dataflow

Cloud Dataflow provides a serverless architecture that can shard and process large batch datasets or high-volume data streams. Pipelines are written with the Apache Beam SDK's Java or Python APIs, so virtually any transformation that can be expressed in code is supported.
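The shard-and-process model can be illustrated in plain Python. The sketch below is not Dataflow or Apache Beam code: it hash-partitions hypothetical `(user, amount)` records into shards, sums amounts per user within each shard on separate threads, and merges the partial results. The names `shard_of`, `partition`, `process_shard`, and `run` are illustrative. In a real Beam pipeline you would express the same aggregation declaratively (for example with `beam.CombinePerKey(sum)`) and let the runner decide how to shard the work.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shard_of(key: str, num_shards: int) -> int:
    # Deterministic hash, so the same key always lands on the same shard.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(records, num_shards):
    shards = [[] for _ in range(num_shards)]
    for user, amount in records:
        shards[shard_of(user, num_shards)].append((user, amount))
    return shards


def process_shard(shard):
    # Per-shard work: sum amounts per user, independently of other shards.
    totals = Counter()
    for user, amount in shard:
        totals[user] += amount
    return totals


def run(records, num_shards=4):
    shards = partition(records, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(process_shard, shards))
    # Each key lives on exactly one shard, so merging is a simple union.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


if __name__ == "__main__":
    print(run([("alice", 3), ("bob", 5), ("alice", 2)]))
```

Hashing on the key guarantees that all records for one user are processed by the same worker, which is what lets the shards run without coordinating with each other.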

Stitch

Stitch is an ELT product. Within the pipeline, Stitch does only transformations that are required for compatibility with the destination, such as translating data types or denesting data when relevant.
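Denesting can be sketched as follows: nested objects are flattened into prefixed columns, and top-level arrays are split out into child-table rows that point back to their parent. This is a minimal illustration, not Stitch's implementation; the double-underscore separator, the `_parent_id` and `_position` columns, and the assumption that every record has an `id` field are choices made for this example.

```python
def flatten(obj: dict, prefix: str = "", sep: str = "__") -> dict:
    """Flatten nested objects into a single-level dict of columns."""
    columns = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name, sep))
        else:
            columns[name] = value
    return columns


def denest(record: dict, id_field: str = "id", sep: str = "__"):
    """Split a record into one parent row plus child-table rows for top-level arrays."""
    scalars = {k: v for k, v in record.items() if not isinstance(v, list)}
    parent_row = flatten(scalars, sep=sep)
    child_tables = {}
    for key, value in record.items():
        if isinstance(value, list):
            rows = child_tables.setdefault(key, [])
            for position, item in enumerate(value):
                # Objects in the array become columns; bare values go in a "value" column.
                item_cols = flatten(item, sep=sep) if isinstance(item, dict) else {"value": item}
                rows.append({"_parent_id": record[id_field], "_position": position, **item_cols})
    return parent_row, child_tables


if __name__ == "__main__":
    order = {
        "id": 7,
        "customer": {"name": "Ada", "address": {"city": "London"}},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
    }
    print(denest(order))
```

For simplicity, this sketch only splits out arrays at the top level of a record; arrays nested inside objects would be left as list values in a column.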

Stitch is part of Talend, which also provides tools for transforming data either within the data warehouse or via external processing engines such as Spark and MapReduce. Transformations can be defined in SQL, Python, or Java, or through a graphical user interface.
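The ELT pattern (load raw data first, transform it later inside the destination) can be shown with Python's built-in `sqlite3` module standing in for a cloud data warehouse. The table names and sample rows are hypothetical; in practice the transform step would run as SQL in the warehouse itself.

```python
import sqlite3


def run_elt():
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

    # Load: raw rows land in the destination unchanged.
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
        [(1, "acme", 1500, "paid"), (2, "acme", 2500, "paid"), (3, "globex", 900, "refunded")],
    )

    # Transform: runs afterward, inside the destination, as plain SQL.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY customer
        """
    )
    rows = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_elt())  # [('acme', 40.0)]
```

Because the raw table is kept, the transformation can be changed and rerun later without extracting the data from the source again.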

Connectors: Data sources and destinations

Each of these tools supports a variety of data sources and destinations.

MuleSoft

The MuleSoft Anypoint Platform includes several components:

  • Anypoint Design Center provides tools to build connectors and implement data and application flows, including the Anypoint Studio desktop IDE.
  • Anypoint Security defends the APIs and integrations users build.
  • Mule runtime engine powers the platform's connections to applications, data, and devices.
  • Anypoint Management Center lets users manage APIs and users, analyze traffic, monitor SLAs, and fix integration flows.

Like other application integration platforms, MuleSoft typically replicates data changes one at a time between multiple systems as events happen, rather than pushing batches of data to a single central repository.

MuleSoft supports almost 300 connectors to databases, SaaS platforms, storage resources, and network services. It supports Amazon S3 data lakes, but no cloud data warehouses.
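Event-at-a-time replication can be sketched as a handler that pushes each change to every connected system as soon as it happens. The `ChangeEvent` and `TargetSystem` classes and the upsert behavior below are hypothetical simplifications, not MuleSoft APIs; a real integration flow would also need retries, ordering guarantees, and error handling.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    entity_id: str
    changes: dict  # only the fields that changed


@dataclass
class TargetSystem:
    name: str
    records: dict = field(default_factory=dict)

    def apply(self, event: ChangeEvent) -> None:
        # Upsert: merge the changed fields into the target's copy of the entity.
        self.records.setdefault(event.entity_id, {}).update(event.changes)


def on_change(event: ChangeEvent, targets: list) -> None:
    """Push a single change to every connected system as soon as it occurs."""
    for target in targets:
        target.apply(event)


if __name__ == "__main__":
    crm, billing = TargetSystem("crm"), TargetSystem("billing")
    on_change(ChangeEvent("cust-1", {"email": "ada@example.com"}), [crm, billing])
    on_change(ChangeEvent("cust-1", {"plan": "pro"}), [crm, billing])
    print(crm.records)  # {'cust-1': {'email': 'ada@example.com', 'plan': 'pro'}}
```

By contrast, a batch-oriented tool would accumulate changes and load them together into one central repository on a schedule.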

Google Cloud Dataflow

Cloud Dataflow supports both batch and streaming ingestion. For batch, it can access both GCP-hosted and on-premises databases.

For streaming, it uses Pub/Sub. Cloud Dataflow doesn't support any SaaS data sources. It can write data to Google Cloud Storage or BigQuery.

Stitch

Stitch supports more than 100 database and SaaS integrations as data sources, and eight data warehouse and data lake destinations. Customers can contract with Stitch to build new sources, and anyone can add a new source to Stitch by developing it according to the standards laid out in Singer, an open-source toolkit for writing scripts that move data.
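A Singer tap is simply a program that writes JSON messages to stdout, one per line: a `SCHEMA` message describing a stream, `RECORD` messages carrying rows, and `STATE` messages holding a bookmark for the next run. The sketch below emits those three message types using only the standard library. The `users` stream, its sample rows, and the bookmark layout are hypothetical, and production taps usually build on the `singer-python` helper library instead.

```python
import json
import sys

# Hypothetical source rows standing in for an API response or database query.
USERS = [
    {"id": 1, "name": "Ada", "updated_at": "2020-03-01T00:00:00Z"},
    {"id": 2, "name": "Grace", "updated_at": "2020-03-02T00:00:00Z"},
]

# JSON Schema describing the stream's records.
USERS_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "updated_at": {"type": "string", "format": "date-time"},
    },
}


def tap_messages(rows):
    """Yield a SCHEMA message, one RECORD per row, then a STATE bookmark."""
    yield {"type": "SCHEMA", "stream": "users", "schema": USERS_SCHEMA, "key_properties": ["id"]}
    for row in rows:
        yield {"type": "RECORD", "stream": "users", "record": row}
    # Same-format ISO-8601 strings sort chronologically, so max() finds the newest row.
    latest = max((row["updated_at"] for row in rows), default=None)
    yield {"type": "STATE", "value": {"bookmarks": {"users": {"updated_at": latest}}}}


def main(out=sys.stdout):
    # One JSON message per line on stdout is what Singer targets read.
    for message in tap_messages(USERS):
        out.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    main()
```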

Singer integrations can be run independently, regardless of whether the user is a Stitch customer.
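Because taps and targets communicate only through newline-delimited JSON on stdout and stdin, a tap can be piped into any Singer target on an ordinary machine. Below is a minimal, hypothetical target that collects records per stream and prints the last `STATE` value to stdout so the caller can save it for the next run. The `run_target` function and the file names in the usage line are placeholders, not part of Singer itself.

```python
import json
import sys


def run_target(lines):
    """Consume Singer messages; return records grouped by stream and the last state."""
    schemas, tables, state = {}, {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
            tables.setdefault(message["stream"], [])
        elif kind == "RECORD":
            if message["stream"] not in schemas:
                raise ValueError(f"RECORD before SCHEMA for stream {message['stream']!r}")
            tables[message["stream"]].append(message["record"])
        elif kind == "STATE":
            state = message["value"]
    return tables, state


if __name__ == "__main__":
    tables, state = run_target(sys.stdin)
    for stream, rows in tables.items():
        print(f"{stream}: {len(rows)} rows", file=sys.stderr)
    # Emit the latest state on stdout so it can be saved and passed to the tap next time.
    if state is not None:
        print(json.dumps(state))
```

Usage, with hypothetical file names: `python tap_users.py | python target_sketch.py > state.json`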

Running Singer integrations on Stitch’s platform allows users to take advantage of Stitch's monitoring, scheduling, credential management, and autoscaling features.

Support, documentation, and training

Data integration tools can be complex, so vendors offer several ways to help their customers. Online documentation is the first resource users often turn to, and support teams can answer questions that aren't covered in the docs. Vendors of the more complicated tools may also offer training services.

MuleSoft

MuleSoft provides online, email, and telephone support. Documentation is comprehensive. Digital training materials are available.

Google Cloud Dataflow

Google provides several support plans for Google Cloud Platform, which Cloud Dataflow is part of. Documentation is comprehensive. Google offers both digital and in-person training.

Stitch

Stitch provides in-app chat support to all customers, and phone support is available for Enterprise customers. Support SLAs are available. Documentation is comprehensive and is open source — anyone can contribute additions and improvements or repurpose the content. Stitch does not provide training services.

Pricing

MuleSoft

MuleSoft provides a 30-day free trial. Pricing isn't disclosed.

Google Cloud Dataflow

Cloud Dataflow is priced per second for CPU, memory, and storage resources.
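Per-second billing means a job's cost is its run time multiplied by the resources it consumed. The sketch below does that arithmetic with hypothetical hourly rates (not Google's actual prices) for vCPU, memory, and disk, prorated to the second. The `job_cost` function and its parameters are illustrative.

```python
from decimal import Decimal

# Hypothetical hourly rates for illustration only; real Dataflow rates vary by region and over time.
RATE_PER_VCPU_HOUR = Decimal("0.05")
RATE_PER_GB_MEMORY_HOUR = Decimal("0.005")
RATE_PER_GB_DISK_HOUR = Decimal("0.0001")


def job_cost(seconds, workers, vcpus, memory_gb, disk_gb):
    """Estimate cost for `workers` identical workers running for `seconds` seconds."""
    hours = Decimal(seconds) / Decimal(3600)  # metered per second, rates quoted per hour
    per_worker_hour = (
        vcpus * RATE_PER_VCPU_HOUR
        + memory_gb * RATE_PER_GB_MEMORY_HOUR
        + disk_gb * RATE_PER_GB_DISK_HOUR
    )
    return (hours * workers * per_worker_hour).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 30 minutes on 4 workers, each with 4 vCPUs, 15 GB memory, and 250 GB disk.
    print(job_cost(1800, 4, 4, 15, 250))  # 0.60
```

With these placeholder rates, each worker costs $0.30 per hour, so four workers for half an hour come to $0.60; stopping the job a few seconds earlier would reduce the bill proportionally.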

Stitch

Stitch has pricing that scales to fit a wide range of budgets and company sizes. All new users get an unlimited 14-day trial. After the trial, there's a free plan for smaller organizations and nonproduction workloads. Standard plans range from $100 to $1,250 per month depending on scale, with discounts for paying annually. Enterprise plans for larger organizations and mission-critical use cases can include custom features, data volumes, and service levels, and are priced individually.