2025 Data Trends and Predictions With Cloudwerx’s John Cosgrove

Massive knowledge pipelines, a rise in semantic web metadata, and more in the year ahead

John “JC” Cosgrove is a partner at Cloudwerx and a pioneer in seamlessly embedding data into businesses. From the early days of “big data” hype to today’s cutting-edge innovations, his mission has always been clear: bring data to life, make businesses smarter, and push the boundaries.

John recently shared his thoughts with us on the trends that will define data this year and beyond. You can read all about his top 2025 data predictions—what he calls the “monumental mainstreaming of massive knowledge pipelines”—in our report, The Top Data Trends for 2025. Below are the rest of John’s predictions for the year ahead.

 

Q: What’s the most important trend in our industry, especially as we head into 2025, and why does it matter?

John Cosgrove: Hands down, my biggest prediction (and I admit it’s a bit of a cheat because I’m already working on it) is the monumental mainstreaming of what I’m calling massive knowledge pipelines. These pipelines refer to the industrialization of RAG (retrieval-augmented generation) into AI. We’re at this incredible intersection, a “perfect storm” if you will. If we were still talking about conveyor-belting unstructured data like we have been over the past five years with traditional machine learning, it wouldn’t be that remarkable. But that’s not the case anymore.

What we’re seeing now is a new type of pipeline, not concerned with traditional relational structures like tuples but with semantic structures. (Continued in full in our report, The Top Data Trends for 2025.)

 

Q: What’s something surprising you’ve noticed in 2024 that you predict will continue in the new year?

John: Honestly, what surprises me is the stubbornness of people who remain skeptical about generative AI’s future potential, even though to me it’s so clearly accelerating, moving faster than we thought. It’s also related to the fact that we’re seeing, in real time, the issue with the diffusion of information out of Silicon Valley. As some of the great luminaries from the Valley have said, “The future is here, it’s just not evenly distributed yet,” and that has been very clearly on display for me.

I don’t go more than a week, maybe two, without doing a keynote, a boardroom presentation, or a live demo about generative AI. It’s something I’m very much throwing myself into—helping educate and advocate, because I believe that we urgently need every citizen of planet Earth to level up their awareness and understanding of this technology. I think that’s the most powerful way to ensure we use it well. But it has been genuinely shocking for me to realize how big the gap is, in such a short space of time, between what I know about the reality of the capabilities of these tools today and what even people working in tech might know, let alone people more broadly in the enterprise business space. That delta is, in real time, fundamentally the most challenging thing for me in my history of communicating science and technology. I’m not used to this kind of gap being so prevalent, especially on a topic that seemed to have a certain level of awareness—AI.

The second surprising thing is how this technology is literally speeding up, not slowing down. This conversation about a plateau—what plateau? What are we talking about? I was telling people that we might see mass-scale readiness of voice-to-voice agents potentially as early as March 2025. Well, I look a bit silly now—they’re live and replacing major call center operations, and they’re only going to keep getting more accepted. The rate at which developments are coming six or even twelve months earlier than even those branded as aggressive optimists predicted is breathtaking. I think we’re still going to wrestle with this issue in 2025. If you thought keeping up over the last 18 months was hard, the next 18 months are going to blow your mind.

This is already a problem for those of us trying to sell and engage with technology in enterprise because it’s breaking down what I’ve been calling the “iPhone-ification” of tech. We’ve grown used to a comfortable, predictable progression: every year, the 14 becomes a 15, then a 16, and so on. You could kind of see that with GPT-3, GPT-3.5, GPT-4, but that’s misleading. This is not a linear progression; it’s an aggressive, explosive, and incredibly innovative frontier. That makes it a very unenviable task for someone like a CIO or CTO, who has to navigate annual or three-year budget cycles for investment.

I think this disconnect is paralyzing boards. Maybe “paralyzing” is the wrong word, but there’s a huge gap between seeing the obvious strategic benefit and figuring out how to move, steer, and change the organization underneath them. I think 2025 is going to be a deeply destabilizing year for everyone’s sense of comfort with technology. All of us in this industry are going to have to grapple with the fact that our jobs are changing. By 2026, I think everyone else will have to as well.

 

Q: What’s something you wish you’d see less of in 2025, and do you think it will happen?

John: I don’t know why, but I’m continuing to see some really silly behavior in the Modern Data Stack ecology. It’s like we’re still trying desperately to cling to certain narratives or stories from the MDS boom, or even worse, totally reject them and invent something new, and I’m like, guys, this is actually hugely validating. Like I said in the previous answer, we now need all of our infrastructure and pipelining more than we ever have before to power this incredible moment.

I think there’s a process whereby the people who are deep in the weeds of data engineering need to deepen their understanding of this new beast. The people who have gravitated to this as a new hashtag need to learn some humility. Before you start telling people that you’re an ontology expert because you saw the topic trend for the last six months, maybe go and read some books on it, or some long-form articles. Check out Juan Sequeda and the incredible body of research that he and his team have been building for decades.

This is the third great AI explosion, and we didn’t actually throw out the previous two. They were categorically necessary. Symbolic AI, Neuro 1, and Neuro 2 were all necessary to get to this point. So I’d like to think we can do a bit better than this as a technology industry, but I don’t think I’ll see it. I think the snake oil will still flow strong in 2025, all the more reason for high-performance technology teams and CTOs to stay very grounded, very focused on what they know, because it’s still going to be about managing knowledge.

 

Q: What’s the biggest misconception about the industry you hope gets put to rest in 2025?

John: Let me come out swinging here: this idea of not calling it the Modern Data Stack (MDS) is nonsense. I’m genuinely angry about this. It’s not up to any organization to decide when we abandon a label, a paradigm, or an identity that we’ve chosen to adopt.

First of all, this move is a massive destruction of equity. If I were a board member or major shareholder watching people try to dismantle it, I’d be wondering what on earth we were doing.

Secondly, it’s not our fault that capital instruments all blew up. Sorry, but a desperate attempt at rebranding isn’t going to fix the fact that we all indulged during the ZIRP era. It’ll sort itself out.

Thirdly, the most important part of MDS for me has been the community. It’s been an incredible resurgence in the data world, and it was overdue. Some of us have been in this field for decades, and we love seeing new faces, new ideas, and new perspectives coming in—even if we don’t always show it. We want to support that.

Fourthly, it is indeed the Modern Data Stack. It’s the right label. Data has been central to computing since its inception, going back 50 or 60 years. This is not a label that will come and go.

And finally, by moving away from MDS, we’ve actually proven the naysayers right. One of the criticisms was that this was a craven cash grab—unbundling and rebranding things you already have, like selling people Spark over and over for a decade. We’ve tried to argue that MDS was more than that, that it was the next evolution, a persistent shift we wanted to embrace. But instead, some are ready to throw it all away and make excuses to jump on the generative AI bandwagon as soon as funding gets tough. I hope we get over that nonsense.

Rebranding isn’t going to spontaneously improve valuations. We need the Modern Data Stack to power generative AI. If your investors can’t see that unless you rebrand as the “modern gen AI stack,” then get better investors. Next year, we’re going to see savvier investors as everyone gets more informed. And I have no intention of changing how I refer to it—it’s still the Modern Data Stack. If we’re going to change it, maybe we’ll call it the Knowledge Stack. But I’m not arrogant enough to think I can rebrand something millions of people have embraced as their community, unlike some others in this space. There’s your hot take.

 

Q: What do you see coming around the corner that’s most exciting for you, even if it won’t come to fruition in 2025?

John: This is the most exciting time I’ve ever experienced in technology. I can’t get over it. These models are astounding. For someone who has always been passionate about ontologies, natural language processing, semantics, and all this rich, qualitative stuff, these machines are breathtaking. Every new white paper, every major announcement, it just keeps blowing me away. So, what do I see coming? Another 52 weeks of 100 white papers and mind-blowing revelations each week—and I’m here for it.

 

Q: What process-oriented trends are you observing?

John: Absolutely, the meteoric rise of semantic knowledge pipelines to feed RAG. We’ve been calling them RAG pipelines, but I’m trying to broaden that perspective. People need to realize that RAG is the objective, not the process. It’s incredible to see such an explosion in what is actually a nuanced and potentially complex discipline.

 

Q: What trends are emerging in data management?

John: We’re going to see a massive rise in metadata, but not for the reasons we’ve seen previously. In the past, metadata was driven by compliance, PII, and the efficiencies unlocked by columnar metadata awareness. Now, we’re talking about metadata in a more foundational sense—semantic web metadata. This means tagging, reusable identifiers, importable identifiers, and interoperable identifiers to allow rich, many-to-many contextual labeling of different elements and entities, both within and between ontologies. That’s going to explode.
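To make the idea of reusable, interoperable identifiers a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not something John describes: the `example.com` namespaces, the `taggedWith` predicate, and the helper names are all hypothetical. Only the SKOS `exactMatch` identifier is a real, published W3C property, used here to link an entity in one ontology to its counterpart in another.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # identifier of the thing being described
    predicate: str  # identifier of the relationship
    obj: str        # identifier the relationship points to


# Hypothetical namespaces and predicate, invented for this sketch.
SALES = "https://example.com/sales#"
SUPPORT = "https://example.com/support#"
META = "https://example.com/meta#"
TAGGED_WITH = META + "taggedWith"

# Real W3C SKOS property for linking equivalent concepts across vocabularies.
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

triples = [
    Triple(SALES + "Customer", TAGGED_WITH, META + "PII"),
    Triple(SALES + "Customer", TAGGED_WITH, META + "Revenue"),
    Triple(SUPPORT + "Account", TAGGED_WITH, META + "PII"),
    Triple(SALES + "Customer", SKOS_EXACT_MATCH, SUPPORT + "Account"),
]


def build_tag_index(triples):
    """Index the many-to-many tagging relation in both directions."""
    tags_by_entity = defaultdict(set)
    entities_by_tag = defaultdict(set)
    for t in triples:
        if t.predicate == TAGGED_WITH:
            tags_by_entity[t.subject].add(t.obj)
            entities_by_tag[t.obj].add(t.subject)
    return tags_by_entity, entities_by_tag


def equivalents(entity, triples):
    """Entities linked to `entity` by exactMatch, in either direction."""
    matches = set()
    for t in triples:
        if t.predicate == SKOS_EXACT_MATCH:
            if t.subject == entity:
                matches.add(t.obj)
            elif t.obj == entity:
                matches.add(t.subject)
    return matches


tags_by_entity, entities_by_tag = build_tag_index(triples)
```

Because every entity and tag is a globally unique identifier rather than a local column name, the same `PII` tag can label entities in both the sales and support ontologies, and the `exactMatch` link lets a pipeline treat `Customer` and `Account` as the same concept.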

 

Q: What about data culture? What trends do you see in how leaders are thinking about building data teams and using data?

John: This is a really interesting one. I think we’ve reached a point where leaders are wondering, “Do I need a whole new team for this?” Before generative AI, the roles within a Modern Data Stack team (analytics engineer, data scientist, data engineer, business analyst) were working really well. But now gen AI has thrown them for a loop. I believe the answer is that we’re all going to have to do this. If you’re working in data, you’re going to have to become deeply robust in your understanding of how to leverage large language models (LLMs). And I don’t mean just dabbling with “prompt engineering.” I mean really understanding how to tweak and calibrate LLMs, why understanding the context window is critical, and how much power you can derive from it.

In data engineering, we haven’t just been about writing APIs—we’ve always cared about the content. The future of this technology is about the content, not just the API calls. We’re nearing the end of struggling with basic orchestration—that’s no longer the issue. If we have multi-million token context windows that are trivial to populate, it will all come down to how you query, structure, tag, and supply millions of tokens of knowledge. I don’t see this creating an entirely new industry of people; I see it as the beginning of the “great qualitative era” of data engineering if I could coin it that way. I could be wrong.
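As a rough illustration of what "query, structure, tag, and supply" could look like in practice, here is a small Python sketch that selects tagged knowledge chunks to fill a context window under a token budget. Everything in it is an assumption made for illustration: the `Chunk` type, the tag-overlap ranking, and especially `estimate_tokens`, which counts whitespace-separated words as a crude stand-in for a real model tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    tags: set[str] = field(default_factory=set)


def estimate_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough proxy for token count.
    # A real pipeline would use the target model's own tokenizer.
    return len(text.split())


def build_context(chunks, query_tags, token_budget):
    """Pick chunks that share tags with the query, best matches first,
    skipping any chunk that would push the total past the budget."""
    scored = [(len(c.tags & query_tags), c) for c in chunks]
    relevant = [(score, c) for score, c in scored if score > 0]
    # Stable sort: chunks with equal overlap keep their original order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)

    selected, used = [], 0
    for _, chunk in relevant:
        cost = estimate_tokens(chunk.text)
        if used + cost > token_budget:
            continue
        selected.append(chunk)
        used += cost
    return "\n\n".join(c.text for c in selected)


chunks = [
    Chunk("Refund policy: refunds are issued within 30 days.", {"policy", "refunds"}),
    Chunk("Office lunch menu for Friday.", {"facilities"}),
    Chunk("Refund escalation steps for support agents.", {"refunds", "support"}),
]

context = build_context(chunks, {"refunds", "support"}, token_budget=10)
```

With a budget of 10, only the escalation chunk fits; raising the budget admits the policy chunk as well. The point of the sketch is that the interesting work sits in how the chunks are tagged and ranked, not in the call that eventually sends the text to a model.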

For more insights and predictions for the data industry in 2025 and beyond, download our report featuring contributions from top industry executives and thought leaders.
