SPREAD Blog

The Ontology Shortcut That Costs You Tomorrow: Why Custom Schemas Create New Data Silos

Written by Eric Scholz | 03.11.2025

 

This article was written by Eric Scholz, Group Product Manager at SPREAD AI. With a background in Computational Engineering and experience leading product teams, Eric combines deep technical expertise with strategic product thinking. He drives SPREAD’s product portfolio and roadmap, ensuring that engineering teams build scalable, impactful solutions grounded in real-world customer needs.

TL;DR: Embracing an industry-standard knowledge graph requires upfront mapping, but it’s the only way to break the cycle of siloed data and unlock true interoperability for your digital twins.

Why This Topic Matters to Me

During my studies of Computational Engineering Sciences, and before joining SPREAD, I spent several years as a software developer, living in the world of relational databases. They’re great for many things: structured data, transactions, consistency. As an enthusiastic observer of the emerging LLM revolution and GenAI reasoning, I was especially drawn to the “new” promise of graph databases: embracing relationships as first-class citizens.

A year ago, I joined SPREAD as Product Manager for Data and AI to find out what it's all about. Here I learned that when you try to model the real-world complexity of an industrial product (its systems, its parts, its lifecycle), a relational database quickly starts to feel like a straitjacket.

In contrast, graph databases prove incredibly powerful for digital twins and for preparing product data for use in LLMs: products are networks of interconnected systems with physical and logical dependencies, and graphs mirror that reality. They allow you to query not just what a component is, but how it relates to everything else: requirements, functions, signals, even historical changes.
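To make that concrete, here is a toy sketch in Python using networkx. The part names and relationship types are invented for illustration; they are not SPREAD's actual data model or graph stack.

```python
# Toy product graph: nodes are engineering artifacts, edges are typed relationships.
# All names here are made up for illustration.
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("BrakeECU", "REQ-042", relation="satisfies")         # component -> requirement
g.add_edge("BrakeECU", "AntiLockBraking", relation="realizes")  # component -> function
g.add_edge("BrakeECU", "WheelSpeed", relation="reads")          # component -> signal
g.add_edge("BrakeECU_v2", "BrakeECU", relation="supersedes")    # historical change

# Ask not just *what* BrakeECU is, but how it relates to everything else.
for _, target, data in g.out_edges("BrakeECU", data=True):
    print(f"BrakeECU --{data['relation']}--> {target}")
```

In a relational schema, each of these relationship types would typically need its own join table; in the graph, they are all first-class edges you can traverse uniformly.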

This shift opened my eyes. At SPREAD, I entered the world of applied knowledge graphs, and I realized: the real challenge isn’t just storing data, it’s making it understandable and reusable across teams and domains.

The Pattern I’ve Seen Too Often

As a developer and later as a Product Manager, I noticed a recurring pattern: teams tend to prefer starting from scratch rather than understanding and reusing existing solutions. It feels faster in the moment, but in reality, it creates more fragmentation.

The same applies to ontologies. Defining a custom schema might feel like the easy way forward, but every new ontology is another silo waiting to happen. And without understanding what already exists, you can’t truly build something sustainable.

Why the Shortcut Backfires

High-quality data pipelines remain the holy grail of digitization. Everyone knows the saying “Garbage in, garbage out,” and it couldn’t be more true. The quality of your underlying data determines the success of every digital use case.

In my previous life as a software developer, I often had to map data structures from one format to another. This happens everywhere: when integrating systems, building APIs, or creating analytics pipelines. Too often, I found myself starting from scratch each time, writing custom scripts to perform the transformation “magic.” These ad-hoc solutions worked in the moment but quickly became unmaintainable for others.
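For illustration, a typical one-off script of that kind might look like the sketch below. The file names and column names are hypothetical; the point is how many undocumented assumptions hide inside a few lines:

```python
# A typical one-off mapping script: hardcoded field names, silent assumptions,
# and transformation logic no one else can reuse. (Hypothetical example.)
import csv
import json

rows = []
with open("export_from_plm.csv", newline="") as f:
    for row in csv.DictReader(f, delimiter=";"):
        rows.append({
            "id": row["Teilenummer"].strip(),                      # undocumented: German column names
            "name": row["Bezeichnung"].title(),                    # undocumented: ad-hoc normalization
            "weight_kg": float(row["Gewicht"].replace(",", ".")),  # locale fix buried in the logic
        })

with open("parts_for_dashboard.json", "w") as f:
    json.dump(rows, f, indent=2)
```

It works once, for one export, for one author. Six months later, nobody (including the author) knows why the decimal commas are being replaced.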

When I joined SPREAD’s Data Team, I was introduced to the Data Mapper, a core component of our rapid ingestion layer. To me, it felt like the missing link: a way to standardize and reuse data mapping recipes (YAML files) defining how data should be transformed and aligned with a shared ontology.

The Data Mapper is the tool that transforms big data into smart data. It takes raw, messy engineering data and maps it into SPREAD’s ontology. Along the way, it validates, enriches, and cleanses the data so that downstream applications, from analytics dashboards to AI agents, can work with consistent and trustworthy inputs.

What impressed me most was how the Data Mapper isn’t just about processing data. It’s about helping people understand and reuse the existing ontology. Recently, we even embedded an AI assistant into it that suggests mappings and preprocessing steps. That means less schema expertise is needed up front, and more focus can go into actual engineering value.

In the embedded video, you can see how the AI agent helped me map sw_modules from my source file to the correct entity type, SoftwareModule, in the SPREAD ontology. It also showed me how to set up deduplication for these entities, so the data remains unique and concise across future ingestions, including machine-readable data provenance that records where each record came from and how it was processed.

Animated gif: The agent suggests a mapping and adds the logic for deduplication
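To make the idea tangible, here is a minimal, hypothetical sketch of a recipe-driven mapping with deduplication and provenance. The recipe keys and the ingest logic are my own simplification for this post, not the Data Mapper's actual schema or behavior.

```python
# Sketch of a recipe-driven mapping with deduplication and provenance.
# The recipe keys below are hypothetical, not SPREAD's actual recipe schema.
import yaml  # requires PyYAML

recipe = yaml.safe_load("""
source_field: sw_modules
target_type: SoftwareModule
dedup_key: module_id        # records sharing this key are merged, not duplicated
record_provenance: true     # keep a machine-readable trail of origin and processing
""")

def ingest(records, existing):
    """Apply the recipe: map records onto the target type, deduplicating by key."""
    for rec in records:
        key = rec[recipe["dedup_key"]]
        entity = existing.setdefault(key, {"type": recipe["target_type"]})
        entity.update(rec)  # merge into the existing entity instead of duplicating it
        if recipe["record_provenance"]:
            entity.setdefault("_provenance", []).append(
                {"source": recipe["source_field"], "raw": rec}
            )
    return existing

graph = ingest([{"module_id": "SWM-7", "name": "ABS Controller"}], existing={})
print(graph["SWM-7"]["type"])  # SoftwareModule
```

Because the recipe is declarative, re-running an ingestion with new source data merges cleanly into the same entities, and the provenance trail always tells you where a value came from.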
 

For me, creating the YAML mapping like this was the perfect example of how smart tooling can make reuse easier than reinvention.

Closing Thoughts

Having seen both sides, the quick-and-dirty shortcuts and the investment in reusable ontologies, I’m convinced that starting fresh every time is the real time-waster.

At SPREAD, we invest in upfront mapping because it ensures that every new use case built on top of the product twin, whether in requirements engineering, simulation, or quality management, can reuse the same ontology. That’s not just efficient; it’s the only way to truly break data silos.

This is why I’m passionate about what we do. Together, the Data Mapper, the ontology, and the Engineering Intelligence Network embody a philosophy: take the “more comprehensive” road now, and you’ll find the real shortcut tomorrow. Solutions to further use cases will practically come for free.

If there’s one thing my journey from relational databases to graph-based digital twins has taught me, it’s this: understanding and reusing what exists is the most underrated superpower in tech.

That’s why we don’t sell shortcuts. We sell sustainability. Because in engineering, the fastest path to the future is building on shared knowledge, not recreating it over and over again.

If you’re ready to break the cycle of custom silos and start reusing knowledge at scale, let’s talk.