To Code or Not to Code – Is that really the question?
Recently there has been a train of thought suggesting we should have everything as code: infrastructure as code, data transformations as code, business rules as code. The idea is that with code you can do basically anything, and at the same time keep it governed and versioned using something like Git.
While that is certainly true, this is a classic case of "just because you can, doesn't mean you should." These ideas, and related concerns, are a direct outgrowth of the agile software development and DevOps movements. There are of course good use cases where you should use code, but when it comes to designing and engineering large-scale enterprise data and analytics platforms, everything as code may not be the right answer.
#1 – Not everyone is a coder! Or wants to be one. Or has the skills to be one. As the demand for new data and new data types scales up, organizations just can’t keep up with the engineering required to build and expand their data pipelines. There really is a skills shortage in this area. Becoming an expert coder usually takes years of experience, and there may not be time to wait for that.
#2 – Hand coding takes time, not only to develop but to properly test. And depending on the skill of the coder (see #1), the quality, supportability, and sustainability of the code may be suspect. Simple things like syntax errors can crash a system, or at least prevent a move to production from happening on time.
What that means is that organizations are seeing bottlenecks in the coding of their data pipelines. They may be able to quickly onboard data into their data lake, but then the lake quickly becomes a swamp full of unused data that is waiting to be transformed into useful information for the business.
Data analysts and architects should not have to think like software developers and coders to get value from their data. They need to be able to think in terms of business problems and solutions to deliver value faster, maintain flexibility, and be agile.
So, what is the answer?
Use a tool with a well-engineered GUI that provides a level of abstraction to allow less technical people, who are knowledgeable about the organization’s data and their analytics needs, to easily define the data pipeline, transformations, calculations, and business rules that will turn that raw data into useful information.
Some may say code is an abstraction. Sure, it is an abstraction from machine language and binary! But code, even SQL (which I love), is not a useful abstraction for non-technical users. You still have to know the proper syntax, function calls, etc. to be productive. So, the challenge is getting to the right level of abstraction. A well-designed UI can do that. It is all about finding the balance to support multiple user personas and still have the power to solve complex business problems.
What do I mean by “well-engineered?”
The GUI should not be a black box. It should generate code under the covers – code that is accessible and viewable (and versioned!). In most cases, it should generate SQL – arguably, the universal language for data. And it should generate the correct dialect of SQL for the target platform (such as Snowflake). The beauty of this is that the analyst does not have to be a SQL expert – they are a business and data expert.
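To make the idea concrete, here is a minimal sketch of what dialect-aware code generation could look like. This is an assumption for illustration, not any vendor's actual implementation: the analyst supplies business-level inputs (table and column names), and the generator emits syntactically correct SQL for the target platform.

```python
# Hypothetical sketch of dialect-aware SQL generation -- an illustration
# of the concept, not a real tool's code. The analyst describes the
# transformation; the generator picks the right syntax for the target.

def generate_dedup_sql(dialect: str, target: str, source: str,
                       key: str, order_col: str) -> str:
    """Emit deduplication SQL in the target platform's dialect."""
    if dialect == "snowflake":
        # Snowflake supports QUALIFY for filtering on window functions.
        return (f"CREATE OR REPLACE TABLE {target} AS\n"
                f"SELECT * FROM {source}\n"
                f"QUALIFY ROW_NUMBER() OVER ("
                f"PARTITION BY {key} ORDER BY {order_col} DESC) = 1")
    # Generic ANSI fallback: wrap the window function in a subquery.
    return (f"CREATE TABLE {target} AS\n"
            f"SELECT * FROM (\n"
            f"  SELECT *, ROW_NUMBER() OVER ("
            f"PARTITION BY {key} ORDER BY {order_col} DESC) AS rn\n"
            f"  FROM {source}) t\n"
            f"WHERE rn = 1")

print(generate_dedup_sql("snowflake", "staging.customers_clean",
                         "raw.customers", "customer_id", "load_ts"))
```

The point is that the analyst never types QUALIFY or a subquery; they only say "keep the latest row per customer," and the tool handles the dialect.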
The GUI should have templates for all the basic tasks and types of transformations the users might need to apply. This is one of the best ways to enforce standards and best practices, too – all the code generated will look the same, regardless of the "programmer." Templates also help reduce the learning curve for newcomers.
They don’t have to memorize some arcane coding standards document or guess which of the n possible ways something can be done is the best way. They get out-of-the-box productivity, which means faster time to value.
With a template framework, there also needs to be the ability to create new templates for user extensions. This provides the ability to define custom templates for an enterprise-specific implementation in a standard, repeatable fashion. Plus, this gives the expert coders you do have a way to help solve really complex transformations and business rule implementations in a sustainable and reusable way.
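One way to picture such an extension mechanism – purely a sketch under my own assumptions, not a description of any specific product – is a registry where expert coders contribute parameterized templates once and analysts reuse them by filling in business-level values:

```python
# Hypothetical sketch of an extensible template registry. The names
# (TemplateRegistry, "mask_pii") are invented for illustration.

from string import Template

class TemplateRegistry:
    """Holds built-in and enterprise-specific SQL templates."""

    def __init__(self) -> None:
        self._templates: dict = {}

    def register(self, name: str, body: str) -> None:
        # Expert coders add a complex, reusable transformation once...
        self._templates[name] = Template(body)

    def render(self, name: str, **params: str) -> str:
        # ...and analysts reuse it by supplying business-level parameters.
        return self._templates[name].substitute(params)

registry = TemplateRegistry()
registry.register(
    "mask_pii",
    "SELECT $key, SHA2($pii_col) AS ${pii_col}_masked FROM $source",
)
sql = registry.render("mask_pii", key="customer_id",
                      pii_col="email", source="raw.customers")
print(sql)
# -> SELECT customer_id, SHA2(email) AS email_masked FROM raw.customers
```

Every use of the template produces structurally identical SQL, which is exactly how a framework like this enforces standards while staying extensible.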
Along with templates and code generation you get quality – there are no syntax errors. The code produced is therefore easier to test. Testing can focus on the effect of the code rather than if it simply runs.
The GUI of course needs an intuitive interface for non-technical users while still allowing the more technical to view the code. Primarily, this means several types of visual diagramming that cover the entire end-to-end workflow, so that it is understandable and visible to users and their business counterparts. A good diagram can go a long way in reviewing and documenting what has happened and what is supposed to happen. As they say – a picture is worth a thousand words.
From a documentation perspective, a well-engineered GUI has a repository underneath which means it has everything documented. The interface therefore also needs an excellent search feature so everyone can see what they need to see from definitions to data lineage.
With all of these characteristics in place, the result is a GUI data engineering tool that enforces standards, embeds best practices, and provides the flexibility to engineer complex data pipelines in an agile, governed manner – without requiring everyone to have highly evolved and specialized coding skills to provide value to their organizations.
Coding has its place, but nothing really beats a well-engineered, code-generating GUI when it comes to agility at scale.