Modern data warehousing workloads don’t just operate on the scale of thousands of database tables; they run on hundreds of thousands, even millions, of columns. In a typical implementation, the data engineering team may be tasked with building pipelines to migrate and maintain hundreds of terabytes of data, a project that can take months to complete and significant resources to keep running.
A number of tools have emerged in recent years that enable engineering teams to automate data transformations, but they tend to fall into one of two categories: code-first tools that require deeply specialized engineering work, or GUI tools that do not perform well at scale. This isn’t an area where a truly data-driven organization should have to compromise. To ensure the strongest possible data foundation for your organization, you need an automation platform that delivers essential data to your business at scale, with both granularity and speed.
We architected Coalesce from the ground up to address this challenge in a unique way: with a column-aware architecture.
Simply put, column awareness is an approach to managing data transformation that understands columns and how they are connected. This automates column-level lineage, as well as the creation and maintenance of database objects at scale, enabling greater scalability and agility in data warehousing workloads.
There are multiple benefits to the column-aware approach; below, we cover three areas where column-aware has the biggest impact:
Data Patterns
Data patterns are essential to improving the organization and accessibility of data, especially at enterprise scale. Column-aware architecture is the key to standardizing how transformations are applied, how tables are structured, and how columns are logically connected. Coalesce provides a platform to rapidly implement these data patterns by leveraging metadata at the column level.
Take, for example, a simple Type 2 slowly changing dimension, an industry-standard technique for tracking historical data by keeping multiple records for a given natural key: a customer’s current address, what it was six years ago, and every change in between. Writing the SQL to deliver this functionality by hand can take hundreds of lines of code.
With column-aware metadata, the data architect can build a single reusable data pattern that can then be applied to the hundreds of fields that need to be tracked.
The same approach extends to other data patterns, such as data deduplication, which might otherwise take dozens of steps of complex logic to get right.
Data engineers can then apply column-aware data patterns in place of time-consuming, complex manual coding. This reduces the time to value of data and lets them focus on the big picture, applying the reusable logic across the data warehouse with confidence.
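To make the idea of a reusable, column-aware pattern concrete, here is a minimal sketch in Python. It is not Coalesce's implementation: the Column class, the role labels, the scd2_close_sql helper, and the valid_to and is_current bookkeeping columns are all hypothetical names chosen for illustration. The sketch renders only the Type 2 step that closes out rows whose tracked columns have changed, and the generic SQL it emits may need adjusting for a particular warehouse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column metadata: a name plus the role it plays in the pattern."""
    name: str
    role: str  # "key" for the natural key, "tracked" for history-tracked attributes


def scd2_close_sql(target: str, source: str, columns: list[Column]) -> str:
    """Render SQL that closes current rows whose tracked columns changed.

    Assumes the target table carries `valid_to` and `is_current` columns
    (illustrative names, not taken from any specific product).
    """
    keys = [c.name for c in columns if c.role == "key"]
    tracked = [c.name for c in columns if c.role == "tracked"]
    if not keys or not tracked:
        raise ValueError("need at least one key column and one tracked column")

    # Match source and target rows on the natural key.
    join = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    # A row has changed if any tracked column differs (NULL-safe comparison).
    changed = " OR ".join(f"t.{c} IS DISTINCT FROM s.{c}" for c in tracked)

    return (
        f"UPDATE {target} AS t\n"
        f"SET valid_to = CURRENT_TIMESTAMP, is_current = FALSE\n"
        f"FROM {source} AS s\n"
        f"WHERE {join}\n"
        f"  AND t.is_current\n"
        f"  AND ({changed})"
    )


if __name__ == "__main__":
    customer_columns = [
        Column("customer_id", "key"),
        Column("address", "tracked"),
        Column("city", "tracked"),
    ]
    print(scd2_close_sql("dim_customer", "stg_customer", customer_columns))
```

Because the pattern is driven entirely by column metadata, tracking one more field means adding one more Column entry rather than editing the SQL by hand. Inserting the new row version would be a second statement generated from the same metadata.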
Impact Analysis & Lineage
The work of a data engineer is far from done once a new data pipeline is published. Source structures change, business requirements shift, and data consumers constantly question data quality. Column awareness enables complete impact analysis and lineage at the column level. Data engineers can work with confidence and clarity by understanding the downstream effects of their changes before implementing them.
Coalesce builds tools specifically to inspect lineage and anticipate the impact of changes at the lowest granularity of the data: the column. With column awareness, data teams can perform bulk operations directly at the column level and across the data warehouse, making changes to source data or business logic straightforward to implement. For data consumers, Coalesce serves documentation that shows, again at the column level, where the data came from and how it was calculated. This is key to building trust within the organization.
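As a rough illustration of column-level impact analysis, the sketch below walks a small lineage graph to find every column downstream of a proposed change. The LINEAGE mapping, the column names, and the downstream function are hypothetical and do not describe how Coalesce stores or queries lineage. The sketch assumes lineage is available as a mapping from each column to the columns derived directly from it.

```python
from collections import deque

# Hypothetical lineage: each column maps to the columns derived directly from it.
LINEAGE = {
    "raw.orders.amount": ["stg.orders.amount_usd"],
    "raw.orders.currency": ["stg.orders.amount_usd"],
    "stg.orders.amount_usd": ["mart.revenue.total_usd", "mart.revenue.avg_order_usd"],
    "mart.revenue.total_usd": [],
    "mart.revenue.avg_order_usd": [],
}


def downstream(column: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every column affected by a change to `column`, in breadth-first order."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # visit each column once, even with shared parents
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    for affected in downstream("raw.orders.currency", LINEAGE):
        print(affected)
```

Running the same traversal over reversed edges would answer the data consumer's question instead: which source columns a given report column was calculated from.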
Scale and Governance
Being column-aware can enable efficiency and accuracy in data management across hundreds of thousands of columns, multiple data teams, multiple environments, and across a business.
Coalesce’s column-aware state management can prevent data loss and data errors by efficiently supporting in-place column-level modifications instead of recreating tables. It also provides column-level visibility into change management over time, which is a critical component of DataOps and governance.
For example, in an implementation with thousands of columns, rather than building an architecture to recreate those columns every time something changes, Coalesce handles in-place edits out of the box. That reduces costs, improves deployment visibility and planning, and builds trust with the data consumers across the organization.
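The difference between in-place edits and table recreation can be sketched as a comparison of column definitions. This is an illustrative approximation, not Coalesce's deployment logic: plan_column_changes and the example schemas are hypothetical, and the ALTER TABLE syntax shown is generic SQL that varies by warehouse (many platforms restrict which type changes are allowed in place).

```python
def plan_column_changes(
    table: str,
    current: dict[str, str],
    desired: dict[str, str],
) -> list[str]:
    """Compare two {column_name: data_type} mappings and return ALTER statements.

    Only the columns that differ are touched; the table itself is never
    dropped or recreated, so existing data in unchanged columns stays put.
    """
    statements: list[str] = []
    for name, dtype in desired.items():
        if name not in current:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {dtype}")
        elif current[name] != dtype:
            statements.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} SET DATA TYPE {dtype}"
            )
    for name in current:
        if name not in desired:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name}")
    return statements


if __name__ == "__main__":
    deployed = {"id": "INTEGER", "email": "VARCHAR(100)", "fax": "VARCHAR(20)"}
    target = {"id": "INTEGER", "email": "VARCHAR(255)", "signup_date": "DATE"}
    for statement in plan_column_changes("customers", deployed, target):
        print(statement)
```

The returned list also serves as a reviewable deployment plan, which is one way column-level change visibility could feed into governance.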
Coalesce’s column-aware architecture can be the key that lets data teams deliver data faster and with more confidence than ever before.