Integrating machine learning (ML) into your data pipelines often feels like more work than it’s worth: complex code, long development cycles, and endless handoffs between data teams. What if you could skip all that?
In this guide, you’ll learn how to operationalize ML forecasts and take advantage of powerful large language model (LLM) functions directly within your data pipelines using Coalesce and Snowflake Cortex. This method is fast, approachable, and production-ready. The data set for this exercise can be found here. Let’s dive in.
Laying the groundwork
To build this type of pipeline, you’ll leverage the latest advancements in the Snowflake ecosystem that enable you to operationalize AI/ML. If you’re unfamiliar with it, Cortex gives you access to pre-built functions for things such as forecasting and sentiment analysis. These are ready to go, and they run natively on the data you already have inside Snowflake. That means no extra tools, no data movement, and no complicated hand-offs.
Coalesce works directly with these functions through the use of the Cortex Package, available on Coalesce Marketplace. It enables you to leverage all of the powerful functionality of Cortex within a standardized framework that allows you to get AI/ML capabilities up and running in no time.
In your Coalesce Workspace, go to Build Settings, then Packages. Click Browse to open Coalesce Marketplace, and locate the Cortex Package first. This package gives you access to pre-built nodes for working with Snowflake Cortex, including forecasting and sentiment analysis.
Once you find the package, copy its Package ID and paste it into the install prompt back in Coalesce. Assign an alias to give it a descriptive name in your workspace. This package will unlock powerful capabilities in your pipeline, and installation takes just a few clicks.
The data set itself requires little preparation. However, in this walkthrough, you’ll add Stage Nodes to the data sources to act as a transformation layer that helps prepare the data and implement best practices.
Join the STG_ORDER_DETAIL and STG_ORDER_HEADER tables together using a Stage Node. To do this, hold the Shift key and select both nodes, then right-click either node and choose Join Nodes -> Stage. Navigate to the Join tab and configure the join condition on ORDER_ID. Name the node STG_ORDER_MASTER and remove any duplicate columns.
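For reference, the SQL Coalesce generates for this Stage Node is roughly a join of the two staging tables. The sketch below is illustrative only; the column list is an assumption based on the fields used later in this walkthrough, so adjust it to match your schema.

```sql
-- Illustrative sketch of the Stage Node join; column names are assumptions.
CREATE OR REPLACE TABLE STG_ORDER_MASTER AS
SELECT
    d.ORDER_DETAIL_ID,
    h.ORDER_ID,
    h.CUSTOMER_ID,
    h.ORDER_TS,
    d.MENU_ITEM_ID,
    d.ORDER_AMOUNT
FROM STG_ORDER_HEADER h
JOIN STG_ORDER_DETAIL d
  ON h.ORDER_ID = d.ORDER_ID;  -- the join condition configured in the Join tab
```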
Once this is done, you can jump right into modeling the data for your AI/ML use cases.
Modeling with Dimension and Fact Nodes
With your staging layer in place, you are now ready to begin modeling your data. To start, create a Dimension Node from your STG_CUSTOMER table. Dimensions store descriptive information about the key entities in your business, such as customers, products, and locations. Double-click into the node and specify CUSTOMER_ID as the business key.
With the business key configured, let’s set this table up as a Type 2 Slowly Changing Dimension (SCD). To do this, select the FIRST_NAME, LAST_NAME, EMAIL, and PHONE_NUMBER columns under the change tracking selector. This automatically enables the node as a Type 2 SCD. Once your Dimension Node is configured, click Create and Run to generate and populate the dimension table in Snowflake. Follow the same process for the STG_MENU Node, but create a Type 1 Dimension by leaving the change tracking columns empty.
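Conceptually, a Type 2 SCD keeps a full history: whenever a tracked column changes, the current row is closed out and a new version is inserted. The query below sketches the resulting table shape; the system column names are illustrative assumptions (Coalesce manages its own versioning columns), not the exact names it generates.

```sql
-- Conceptual shape of a Type 2 dimension; system column names are assumptions.
SELECT
    CUSTOMER_ID,                                 -- business key
    FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER,  -- change-tracked columns
    SYSTEM_START_DATE,   -- when this version of the row became current
    SYSTEM_END_DATE,     -- when it was superseded (open-ended if current)
    SYSTEM_CURRENT_FLAG  -- marks the active version for each business key
FROM DIM_CUSTOMER
ORDER BY CUSTOMER_ID, SYSTEM_START_DATE;
```

A Type 1 dimension, by contrast, simply overwrites changed values in place, which is why the STG_MENU Node needs no change tracking columns.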
Next, create a Fact Node from your STG_ORDER_MASTER table. Facts store metrics and numeric data that can be analyzed across different dimensions. Again, double-click into the node and set ORDER_DETAIL_ID as the business key. After configuring your Fact Node, click Create and Run. Coalesce will create the fact table in Snowflake and populate it with data.
Now you have your dimensions and facts in place and can begin working with the Cortex Package.
Adding CortexML for sentiment analysis
With your foundational data model in place, you’re ready to introduce ML into the pipeline. The first use case is sentiment analysis.
Let’s start with the STG_REVIEWS Node. To analyze the sentiment of each review, you’ll add a CortexML Function Node to the pipeline. Next, open the newly created node, duplicate the REVIEW_TEXT column, and rename the duplicate to REVIEW_SENTIMENT. This is the column that will store the sentiment score returned by Snowflake Cortex.
Inside the node’s configuration panel, locate the Sentiment toggle and enable it. Then, in the output column field, select REVIEW_SENTIMENT as the target. This tells Coalesce to send the data to the Snowflake Cortex sentiment analysis function and store the result in the REVIEW_SENTIMENT column.
Once configured, click Create to generate the node, and Run to execute the process. Coalesce calls the Cortex sentiment analysis function behind the scenes, scores each review, and writes the results back to your Snowflake object. No model training, no deployment overhead, and no custom coding required.
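Under the hood, this boils down to a call to Snowflake’s SNOWFLAKE.CORTEX.SENTIMENT function, which returns a score between -1 (negative) and 1 (positive). A minimal hand-written equivalent might look like the sketch below; the REVIEW_ID column is a hypothetical key, not a column named in this walkthrough.

```sql
-- Sketch of the generated sentiment query; REVIEW_ID is a hypothetical key.
SELECT
    REVIEW_ID,
    REVIEW_TEXT,
    SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) AS REVIEW_SENTIMENT  -- score in [-1, 1]
FROM STG_REVIEWS;
```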
Preparing data for ML forecasting
Now, let’s discover how to implement forecasting into your pipeline. To do this, you need to prepare the data first.
In this example, you’ll focus on forecasting sales volume for individual menu items. Start by joining your fact table (order data) with the menu dimension table. You can do this by selecting your FACT_ORDER and DIM_MENU Nodes, right-clicking, and choosing Join Nodes and then Stage. Rename the node STG_ORDER_ITEMS_MASTER.

Navigate to the Join tab and join the nodes on MENU_ITEM_ID.
Once the join is complete, open the new Stage Node and review the available columns. Let’s reduce this data set to the three fields required for forecasting:
- The series column, representing what you’re forecasting. In this case, that’s MENU_ITEM_NAME.
- The timestamp column, representing the time at which each event occurred. Here, it’s ORDER_TS.
- The target column, which is the numeric value you want to predict. For this example, it’s ORDER_AMOUNT.
With the three columns remaining, you need to make sure the timestamp field matches the grain you want to predict at. Currently, the field captures timestamps down to the millisecond, but you want to predict sales at the daily level, so you need to format it accordingly. You can do this by editing the column’s Transform field and using the function CAST({{SRC}} AS DATE).
Because you changed the granularity of the timestamp, you now need to obtain all of the sales at that granularity as well. To do this, use the SUM function to sum all of the sales at the daily level.
Finally, because you’re using an aggregate function, you need to use a GROUP BY. You can apply this in the Join tab by using GROUP BY ALL. With your data prepared, you can now move on to forecasting your sales.
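Putting the three steps together, the prepared Stage Node compiles to something like the query below: daily grain via the CAST, sales summed per item per day, and Snowflake’s GROUP BY ALL grouping by every non-aggregated column. This is an illustrative sketch of the generated SQL, not Coalesce’s exact output.

```sql
-- Illustrative shape of STG_ORDER_ITEMS_MASTER after preparation.
SELECT
    MENU_ITEM_NAME,                          -- series column
    CAST(ORDER_TS AS DATE) AS ORDER_TS,      -- timestamp column at daily grain
    SUM(ORDER_AMOUNT)      AS ORDER_AMOUNT   -- target column, summed per day
FROM STG_ORDER_ITEMS_MASTER
GROUP BY ALL;  -- groups by MENU_ITEM_NAME and the cast date
```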
Adding the ML Forecast Node
To add forecasting to your pipeline, right-click the stage node you just prepared and select Add Node, then choose your Cortex Package and select the Forecast Node. Coalesce adds the new node to your graph and connects it to its input. Start by double-clicking the node to open it.
The first thing you’ll do is name your model instance. A name like ORDER_ITEM_FORECAST works well if you’re forecasting future sales for menu items.
Next, map your three required columns:
- Series column: Select MENU_ITEM_NAME. This tells Cortex which series to forecast—in this case, each menu item.
- Timestamp column: Select ORDER_TS, which denotes the timeline for the forecast.
- Target column: Select ORDER_AMOUNT, which is the value you want to predict.
By default, Coalesce enables multi-series forecasting. This means you’ll get individual forecasts for each unique menu item in your data. You can leave this setting as is, or toggle it off if you only need a single series forecast.
There’s an option to include exogenous variables—additional fields that might influence the forecast. If you have promotional data or seasonal factors that impact sales, you can add them here. For this example, let’s leave it off and focus on the core three columns.
Once your configurations are set, click Create to build the node and Run to execute it. Coalesce calls Snowflake Cortex behind the scenes, which runs the forecasting model and writes the results back to the Snowflake table you just created.
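For context, the Forecast Node maps onto Snowflake’s SNOWFLAKE.ML.FORECAST class. A hand-written equivalent of the configuration above might look like the following sketch; the 14-day horizon is an example value, and Coalesce’s generated code may differ in detail.

```sql
-- Sketch of the equivalent Cortex forecast setup; the horizon is an example.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST ORDER_ITEM_FORECAST(
    INPUT_DATA        => TABLE(STG_ORDER_ITEMS_MASTER),
    SERIES_COLNAME    => 'MENU_ITEM_NAME',  -- one forecast per menu item
    TIMESTAMP_COLNAME => 'ORDER_TS',
    TARGET_COLNAME    => 'ORDER_AMOUNT'
);

-- Generate predictions for the next 14 days per series.
CALL ORDER_ITEM_FORECAST!FORECAST(FORECASTING_PERIODS => 14);
```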
The output includes three key fields for each forecasted record:
- FORECAST: the predicted value.
- FORECAST_LOWER_BOUND: the lower end of the confidence interval.
- FORECAST_UPPER_BOUND: the upper end of the confidence interval.
These outputs allow you to present not just a point estimate, but also a range of potential outcomes. That’s useful for planning, budgeting, and managing uncertainty in your forecasts.
Wrapping up and next steps
At this point, you’ve built a complete, production-ready data pipeline in Coalesce that combines traditional data modeling with ML features powered by Snowflake Cortex. You’ve applied sentiment analysis to customer reviews, forecasted future sales with time series models, and managed it all through Coalesce’s intuitive interface.
From here, you have a few options for next steps:
- Start experimenting with your own data sets. Replace the sample data with data from your business to explore how Coalesce and Cortex can drive value.
- Explore Git integration and CI/CD workflows in Coalesce to move your pipelines into production with full version control and automated deployments.
- Dive deeper into Snowflake Cortex’s ML capabilities. Beyond forecasting and sentiment analysis, Cortex offers a growing set of functions that you can leverage.
- Explore Coalesce Marketplace for additional packages that extend the platform’s functionality even further.
If you’re ready to see more, check out Coalesce’s documentation and video tutorials. And if you want to discuss how Coalesce can support your data team’s goals, reach out to us directly for a demo.