The modern data stack (MDS) is the collection of systems, tools, and platforms that data teams use to manipulate data. In the past decade, data infrastructure has advanced tremendously, enabling step changes in productivity for data teams.
While these advances have raised productivity in some areas, the transformation and metadata components of the MDS have lagged behind. The goal of the MDS is to simplify and democratize access to insight, delivering value to the business. But to be useful, data must first be transformed from its raw state.
Unfortunately, efficient transformation workflows are few and far between. This commonly leads to bottlenecks: analytics and engineering teams appear overwhelmed by a lack of resources. In reality, the problem lies in their transformation workflows, which are paramount for accurate, timely data.
While leadership might not peek behind the scenes, it’s crucial for them to understand the transformation process. Accurate data can be a boon; inaccurate data a calamity.
Challenges in Data Transformation
Current challenges in data transformation stem from a poor developer experience (DevEx)—the workflows and friction developers face in building useful products.
Compared with data work, frontend and backend software engineering are more mature fields. Their developers have a suite of tools and best practices for building robust, production-grade systems. These tools maximize flow state while minimizing cognitive load and context switching. The result is not only higher productivity but also happier developers!
As the volume and velocity of data accelerate, many are realizing that data and analytics engineers lack such an environment. Solutions are presently limited to code-only or GUI-only (graphical user interface) platforms, which force developers to choose between no freedom and too much freedom. Additionally, most tools lack granular metadata, leading to high-maintenance systems with long recovery times and a steep learning curve.
Data transformation and the data DevEx are hindered by the sheer complexity of the MDS and limited by the lack of hybrid tooling. We need a solution that automates manual tasks, consolidates tooling, and reduces development friction.
Automating Data Transformation
Today, a typical data transformation workflow might involve developing across storage layers and switching between tools, with minimal metadata tracking or testing. Teams are frequently siloed, duplicating work and having little business context for the data they’re developing.
This makes it hard for individual contributors to learn new techniques, develop their skills, and advance toward their goals. The solution is to automate the transformation layer. That does not mean automating the act of building transformations, which still requires a human touch, but rather building the patterns and systems that let transformation workflows scale.
Data Architecture as a Service
Data Architecture as a Service (DAaaS) is the high-level solution to automation. Under DAaaS, architects and engineers create templates, or “data patterns,” that people with a range of technical skills can implement downstream.
With the requisite patterns, everyone can work in parallel to build data systems and triage issues without the need to create tickets for the data team. Under DAaaS, the most skilled practitioners only need to review solutions, not create them. This enables asynchronous development and shared resources, eliminating bottlenecks and democratizing the transformation process… and data itself.
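The following minimal Python sketch makes the idea of a data pattern concrete. It assumes a pattern is simply a parameterized SQL template that an architect writes once and others fill in. The `DataPattern` class, the `deduplicate_latest` pattern, and the table names are hypothetical illustrations, not part of any product’s API. The SQL uses Snowflake’s `QUALIFY` clause to keep the latest row per key.

```python
from dataclasses import dataclass
from string import Template


# Hypothetical "data pattern": a parameterized transformation written once
# by an architect and reused by anyone who supplies the parameters.
# Illustrative only; not a real product's pattern format.
@dataclass(frozen=True)
class DataPattern:
    name: str
    sql_template: Template
    required_params: frozenset

    def render(self, **params: str) -> str:
        # Fail fast with a readable message if a parameter is missing.
        missing = self.required_params - set(params)
        if missing:
            raise ValueError(f"{self.name}: missing parameters {sorted(missing)}")
        return self.sql_template.substitute(params)


# Architect-defined pattern: keep only the most recent row per business key
# (QUALIFY is Snowflake SQL syntax).
DEDUPLICATE_LATEST = DataPattern(
    name="deduplicate_latest",
    sql_template=Template(
        "CREATE OR REPLACE TABLE $target AS\n"
        "SELECT * FROM $source\n"
        "QUALIFY ROW_NUMBER() OVER (\n"
        "  PARTITION BY $key ORDER BY $updated_at DESC\n"
        ") = 1"
    ),
    required_params=frozenset({"target", "source", "key", "updated_at"}),
)

if __name__ == "__main__":
    # An analyst implements the pattern by naming four things.
    print(DEDUPLICATE_LATEST.render(
        target="stg_customers",
        source="raw.customers",
        key="customer_id",
        updated_at="updated_at",
    ))
```

In this sketch the pattern validates its own inputs. An analyst who omits a parameter gets an immediate error instead of a broken table, and the architect reviews the pattern once rather than every table built from it.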
Along with DAaaS, a good data DevEx requires a metadata-first approach. Imagine your data warehouse is a city, with data architects as its city planners. The builders and inhabitants want to be as productive as possible: making repairs, building cool stuff, and improving quality of life. But it’s easy to get lost along the way. The solution? A map!
A map is the key to the details of the city: its conventions, nuances, and intricacies. Imagine trying to get around without Google or Apple Maps! Most maps show streets and addresses, and that’s fine for day-to-day navigation. This is the equivalent of table-level lineage.
The problem? What happens when a building needs repair, or a new city planner joins the team? Directions to a building are a start, but they won’t help much if you’re building and evolving (i.e., transforming) the city. Knowing a building’s location isn’t enough. We need its schematics: a map of every brick and concrete slab, because the most atomic units are the most useful. Column-level lineage is that map.
Column-level metadata gives our team the most granular insight possible. It enhances their ability to triage errors, simplifies workflows, and improves productivity… and the experience of working on data.
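The difference between the street map and the schematics can be sketched as a small graph. This minimal Python illustration assumes lineage is stored as edges between (table, column) pairs. The `ColumnLineage` class and the example table names are hypothetical.

- `upstream` answers the triage question: what feeds this broken column?
- `downstream` estimates the blast radius of a change.
- `table_lineage` collapses everything to the coarser table-level view.

```python
from collections import defaultdict, deque

# A column is identified by (table, column). Names below are hypothetical.
Column = tuple[str, str]


class ColumnLineage:
    """Minimal column-level lineage graph (illustrative sketch)."""

    def __init__(self) -> None:
        self._upstream: defaultdict[Column, set[Column]] = defaultdict(set)
        self._downstream: defaultdict[Column, set[Column]] = defaultdict(set)

    def add_edge(self, source: Column, target: Column) -> None:
        # Record that `target` is derived (at least partly) from `source`.
        self._upstream[target].add(source)
        self._downstream[source].add(target)

    @staticmethod
    def _walk(start: Column, edges: dict[Column, set[Column]]) -> set[Column]:
        # Breadth-first traversal: every column reachable from `start`.
        seen: set[Column] = set()
        queue = deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, col: Column) -> set[Column]:
        """Columns that feed `col`: where to look when it breaks."""
        return self._walk(col, self._upstream)

    def downstream(self, col: Column) -> set[Column]:
        """Columns affected if `col` changes: the blast radius."""
        return self._walk(col, self._downstream)

    def table_lineage(self) -> set[tuple[str, str]]:
        """Collapse column edges to (source_table, target_table) pairs."""
        return {
            (src[0], tgt[0])
            for tgt, sources in self._upstream.items()
            for src in sources
        }


if __name__ == "__main__":
    lineage = ColumnLineage()
    lineage.add_edge(("raw.orders", "amount"), ("stg_orders", "amount_usd"))
    lineage.add_edge(("raw.fx_rates", "rate"), ("stg_orders", "amount_usd"))
    lineage.add_edge(("raw.orders", "customer_id"), ("stg_orders", "customer_id"))
    lineage.add_edge(("stg_orders", "amount_usd"), ("fct_revenue", "revenue"))

    print(sorted(lineage.table_lineage()))
    print(sorted(lineage.upstream(("fct_revenue", "revenue"))))
```

In this example, table-level lineage only says that `fct_revenue` depends on `stg_orders`, which depends on all of `raw.orders`. The column-level walk shows that `raw.orders.customer_id` plays no part in `revenue`, which narrows the search when something breaks.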
Hybrid Development Environments
We’ve already discussed the issue with purely code- or GUI-centric solutions. They either lack the rigor and structure needed to scale well (at least without tremendous investment), or they can truly solve only a narrow range of problems.
An automated transformation environment combines the friendliness and ease of a GUI with the flexibility, versioning, and best practices that only code allows. Hybrid development environments democratize the transformation process: they bring development best practices, like version control and code reuse, to all users, not just those who can configure git, install an IDE, and stand up complex development workflows.
Snowflake Data Transformation with Coalesce
Transformation and metadata are tightly linked: the best time to map something new is when you create it. By linking your object metadata to the processes that maintain it, you can be certain you have a brick-level map of your data city.
Coalesce builds lineage at the column level during the transformation step, directly at the source. This gives your city planners and data builders the most granular view of their domain possible, enabling them to do their best work.
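One way to read “the best time to map something new is when you create it” is that a transformation’s definition and its lineage should come from the same metadata. The minimal Python sketch below assumes a hypothetical node model; `Node` and `ColumnDef` are illustrative and are not Coalesce’s actual node format. Each column lists its source columns, and its SQL expression is generated from those same sources. Lineage edges therefore fall out of the definition without a separate documentation step.

```python
from dataclasses import dataclass, field

# Hypothetical node model for illustration; not Coalesce's node format.
# Assumes column expressions reference source columns only through
# positional placeholders ({0}, {1}, ...), so SQL and lineage share inputs.


@dataclass(frozen=True)
class ColumnDef:
    name: str
    sources: tuple[str, ...]   # source columns this column reads
    template: str = "{0}"      # SQL expression with placeholders for sources

    def expression(self) -> str:
        return self.template.format(*self.sources)


@dataclass
class Node:
    name: str
    source_table: str
    columns: list[ColumnDef] = field(default_factory=list)

    def to_sql(self) -> str:
        # Generate the transformation from the column metadata.
        select_list = ",\n  ".join(
            f"{col.expression()} AS {col.name}" for col in self.columns
        )
        return (
            f"CREATE OR REPLACE VIEW {self.name} AS\n"
            f"SELECT\n  {select_list}\n"
            f"FROM {self.source_table}"
        )

    def lineage_edges(self) -> list[tuple[tuple[str, str], tuple[str, str]]]:
        # Emit column-level lineage from the same metadata, at creation time.
        return [
            ((self.source_table, src), (self.name, col.name))
            for col in self.columns
            for src in col.sources
        ]


if __name__ == "__main__":
    stg_orders = Node(
        name="stg_orders",
        source_table="raw.orders",
        columns=[
            ColumnDef("order_id", ("id",)),
            ColumnDef("amount_usd", ("amount", "fx_rate"), "{0} * {1}"),
        ],
    )
    print(stg_orders.to_sql())
    for edge in stg_orders.lineage_edges():
        print(edge)
```

Column expressions here reference sources only through the placeholders, so the generated SQL and the recorded lineage describe the same inputs. A production tool would more likely parse the SQL itself, which this sketch does not attempt.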
Hybrid Development Environment
Coalesce’s hybrid development environment combines the best of code and a GUI, letting your most technical teammates write complex models while making the most of our developer-friendly UI. Our source and target configurations mean that data integration requires only a one-time setup.
Combining the flexibility of code with the responsiveness of a GUI streamlines transformation development. By using nodes as building blocks, Coalesce automates the repetitive side of data transformation.
The power of code + GUI development: rapid iteration.
Because hybrid development gives transformation frameworks a gentle learning curve, it maximizes the number of contributors and lets business users build their own logic or implement DAaaS patterns, eliminating bottlenecks and expanding what the organization can build.
Elimination of Bottlenecks
Bottlenecks are created when a centralized data team serves as a gatekeeper to data. As a result, requests accumulate and transformation stagnates.
With Coalesce, business and data teams can develop asynchronously. Data architects and engineers design organizational data patterns, while business users and analysts implement them, in parallel. This brings those with the most context closest to the data, while allowing engineers to do what they do best: build frameworks that embody efficiency and simplicity.