Scalable solutions start from first principles. To scale, we need easy collaboration, reliability, and low technical debt. Scaling means working together to build something great, efficiently. A scalable solution must be simple while allowing for complexity when necessary.
What’s a Node?
At Coalesce, we start from first principles, too. That’s why all of our transformations are defined as nodes—atomic, reusable pieces of code that are designed to help data teams standardize and govern how data is transformed as they build pipelines collaboratively. Nodes are the building blocks of DAGs, directed acyclic graphs, that define an analytics workflow.
Chaining together nodes and edges (relationships between nodes) in Coalesce, you can create data pipelines that are built in the same way they’re visualized: an interactive UI that shortens the learning curve to data transformation.
Deploying nodes is simple and intuitive. Here, we stage raw tables and JOIN to create a dimension in seconds.
Exploring Node Types
While nodes are the atomic components of transformation in Coalesce, node types represent the ability to build custom nodes using any data pattern. Node types allow you to define your own transformations using SQL, YAML, and Jinja, which can then be implemented by your team. Put simply, customizable node types combine the simplicity of a node and the flexibility of code.
YAML provides easy templating, Jinja variable interpolation, and SQL: the time-tested language of choice for manipulating relational data. The possibilities of creating your own node types are endless, but more than anything, they provide efficiency, governance, and accessibility.
While powerful, Coalesce node types are simple to manipulate and make your own—you can create a new pattern in three clicks.
Incremental processing and selective materialization are not new concepts. However, few, if any current solutions are optimized for efficiency in the Snowflake ecosystem. Most are wasteful and expensive, fully querying and completely rewriting tables on every run (DROP & CREATE).
Coalesce processes changes as UPDATEs allowing your team to build quicker, move faster, and run more projects. Built exclusively for the Snowflake Data cloud, Coalesce also leverages Snowflake-native optimizations, like zero copy clones, to bring the latest & greatest to your data pipelines.
Node types are also a simple way to manage table metadata. Thanks to Coalesce’s column-aware architecture, you now have the power to introspect the columns of your data model and build powerful templates at the most granular level possible.
Improved data governance means easier documentation, which improves data discoverability and enables concepts like Data Mesh at scale. It also lowers the time to recovery (TTR) for data teams—no more digging around for data that’s gone askew. With a column-level view, you can detect and triage issues in a fraction of the time, reduce the maintenance of your data systems, and make the most of the data you already have.
Difficult, cumbersome processes are the ones that are often ignored—you might notice a lack of metadata in your current transformation solution for that reason. With Coalesce, adding documentation is frictionless, simple, and collaborative.
Column-level lineage is one feature that helps to make documentation seamless.
While legacy transformation solutions have brought engineering best-practices to data, they’re often reserved for just that: engineers. Complex, code-only solutions require extensive local development environments and knowledge of a host of related technologies.
The cognitive load of configuring and pushing code to these systems detracts from the work of creating value from data and creates a barrier to entry for less technically-oriented teammates. With Coalesce, fully managed prod/dev environments are available out of the box. Our UI creates a flexible, but full-featured, environment where anyone can contribute to data pipelines.
Because nodes are no-code, only a knowledge of relational data is required—but for those who do know SQL/Jinja, the possibilities are endless. Coalesce’s node types let users configure templates and deploy complex frameworks at scale. Making use of a Data Architecture as a Service platform or Data Mesh org, a data and analytics team could simply and effectively deploy a self-service transformation framework every team is able to contribute to.
Using Different Node Types
Adding a new node type is just as simple as adding a stage.
Node types create endless possibilities. As code-first elements, node types are intended to solve any transformation problem your team might encounter. Here are a few examples of patterns you can leverage with node types
Data Architecture as a Service (DAaaS)
DAaaS is a term we associate with the democratization of data transformation. Under DAaaS, a data architect/engineer defines analytics patterns that can be implemented downstream.
Thanks to templating, node patterns can be designed to enable seamless implementation by consumers by simply selecting a few variables. Want to create stage tables from raw sources? With stage templates, you can ensure a consistent implementation. Have a common pattern for certain nodes? Use CREATE templates.
With templating, you can be sure your patterns and definitions are seamless to implement. Lowering the friction and technical barriers required to implement a standard makes it more likely to be followed. How often are shortcuts taken with difficult, cumbersome processes (like documentation in your current table-level lineage solution)?
Unlock Common Table Patterns
Coalesce node types go beyond simple stage templates. By capitalizing on particular node types, you can define complex table patterns for fact tables that allow your teams to asynchronously develop your data warehouse.
Using CREATE templates, for example, you might define a logical pattern for SCD-2 tables. Then, any teammate who knows how to fill in the proper variables for their data source can implement an SCD-2 table, once reserved for data engineering savants.
Using RUN templates, the same possibility is extended to incremental data. That means you can create patterns for the entire organization with incremental materialization logic.
With templates, node types open the possibility of democratizing complex data structures once reserved for data and analytics engineers. This reduction of complexity means that data teams will no longer be the bottleneck for slightly complex, but repetitive tasks.
Build Accessible ML Forecasts
Nodes extend beyond simple SQL. Leveraging the power of Snowflake, we can use node types to create nodes capable of machine learning—for things like anomaly detection and forecasting.
Creating a node type with one of Snowflake’s ML functions is no more complex than creating any other. Historically, forecasting has been a complex and arduous task, reserved only for data scientists. With Snowflake and Coalesce, we can forecast at scale and reduce the complexity to implementation. For a full implementation, check out this example. This functionality extends to the entire Snowpark environment. See another example here.
Coalesce unites the art and science of data transformation by making it accessible to all data stakeholders.
Using nodes and edges, we’ve reimagined the DAG to be a collaborative, friendly experience while maintaining flexibility and technical capability. User-defined nodes unlock complex data patterns and new paradigms that enable data teams to rapidly build a value-driven data warehouse, at scale.