AI in Systems Engineering: Why Most Pilots Fail and What Top OEMs Do Instead
AI in Systems Engineering: Productivity lags while complexity accelerates
Automotive complexity is rising at an unsustainable pace. Software-defined architectures, electrified powertrains, and globally distributed platforms are pushing the boundaries of what engineering teams can handle. According to McKinsey’s 2025 report Software-defined hardware in the age of AI, software complexity in vehicles is growing by over 40 percent annually. Engineering productivity, by contrast, is improving at just 6 percent.
This gap creates structural pressure. Teams are expected to deliver more functionality across more configurations, with fewer resources and shorter timelines. Traditional workflows (manual reviews, siloed systems, static documentation) can no longer absorb the load without delays, cost overruns, or quality risks.
In this environment, AI is often seen as the way out. Leadership teams deploy large language models to automate tasks like test plan creation, ticket analysis, and specification reviews. Yet most pilots stall. Research from BCG and reporting from TechRadar indicate that nearly three-quarters of AI initiatives fail to move beyond isolated use cases.
This article breaks down why AI pilots in engineering consistently underperform, and what to do instead. If you’re under pressure to scale AI beyond the pilot stage, this article might be your starting point.
Why AI in Systems Engineering pilots stall (and what to do instead)
AI isn’t failing in engineering because of weak models. It’s failing because most pilots ignore how engineering actually works.
Executives invest in generative AI expecting faster specs, automated documentation, and better decision support. But engineering complexity (distributed systems, evolving requirements, and messy operational data) doesn’t yield to off-the-shelf AI. The result? Most initiatives never move beyond the lab.
Here’s where most engineering teams go wrong, repeatedly, and at scale.
1. Misusing generative models
Facing pressure to “do something with AI,” many engineering teams turned to generic language models as quick fixes. The result: isolated scripts, internal chatbots, and experimental copilots patched onto legacy workflows.
These setups often look productive at first: summarizing emails, drafting responses, querying documents. But underneath, they miss the core challenge: engineering logic isn’t just language. It’s systems, variants, specs, and dependencies that LLMs don’t understand unless structured explicitly.
When these workarounds are applied to critical tasks (requirements harmonization, test plan drafting, error analysis), they produce answers that feel plausible but carry hidden costs: wrong tolerances, broken logic, or mismatched parts.
What works: Treat LLMs as assistants, not substitutes. Use them where language dominates (summaries, overviews, translations), but never let them own decisions that rely on system integrity or traceable dependencies. For those, structure comes first.
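As a rough sketch of that division of labor (the function names here are hypothetical), language tasks go to the model while anything that touches tolerances stays deterministic:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whatever model client is in use."""
    return f"[draft summary of: {prompt[:40]}...]"

def within_tolerance(measured_mm: float, nominal_mm: float, tol_mm: float) -> bool:
    """Deterministic check; never delegated to the model."""
    return abs(measured_mm - nominal_mm) <= tol_mm

def review_change_request(description: str, measured_mm: float) -> dict:
    # The model drafts language; the rule owns the decision.
    summary = call_llm(f"Summarize this change request: {description}")
    approved = within_tolerance(measured_mm, nominal_mm=12.0, tol_mm=0.05)
    return {"summary": summary, "approved": approved}

print(review_change_request("Shift bracket hole by 0.03 mm", 12.03))
```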
2. Feeding raw documents into the model
A common misstep: uploading hundreds of PDFs, scans, Excel sheets, and diagrams into a model without preprocessing. The goal is simplicity. The result is failure.
LLMs have limited context windows. Unstructured input breaks relevance, creates hallucinations, and drives up costs. More importantly, it hides critical relationships: which part connects to which system, which spec version applies to which variant, what constraint is violated.
What works: Clean and structure the data first. Build logical links between components, functions, and requirements. Organize by product, variant, and release. The model can only reason if the input reflects how engineers think.
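A minimal sketch of what “structure first” can mean in practice, assuming networkx is available and using invented component, function, and requirement IDs:

```python
import networkx as nx

g = nx.DiGraph()

# Nodes typed by role: components, functions, requirements.
g.add_node("ECU-375R", kind="component", variant="EU-2024")
g.add_node("F-BrakeAssist", kind="function")
g.add_node("REQ-1042", kind="requirement", spec_version="v3.1")

# Edges capture the logic engineers otherwise carry in their heads.
g.add_edge("ECU-375R", "F-BrakeAssist", relation="implements")
g.add_edge("F-BrakeAssist", "REQ-1042", relation="satisfies")

# "Which requirements does this part touch?" becomes a query, not a guess.
affected = [n for n in nx.descendants(g, "ECU-375R")
            if g.nodes[n].get("kind") == "requirement"]
print(affected)  # ['REQ-1042']
```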
3. Using fine-tuning as a shortcut
When LLMs disappoint, some teams jump straight to fine-tuning on proprietary data. It feels like tailoring. But in practice, it hardcodes knowledge into opaque weights.
You lose flexibility. You lose traceability. Every change in the product or process requires expensive retraining. And when something breaks, you can’t explain why the model answered the way it did.
What works: Keep the model general-purpose and inject context dynamically. Retrieval-based methods (like RAG) pull in live data at query time, so updates are instant and the logic stays visible.
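A toy sketch of the retrieval pattern, with a word-overlap scorer standing in for a real embedding model and invented requirement snippets as the knowledge base:

```python
# Context is fetched at query time, so the knowledge base can change
# without retraining anything.
DOCS = {  # invented snippets standing in for structured engineering data
    "REQ-1042 v3.1": "Brake assist latency shall not exceed 150 ms on EU-2024.",
    "REQ-0881 v2.0": "Rear MCU supply voltage tolerance is 12 V +/- 0.5 V.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    q = set(query.lower().split())
    scored = sorted(DOCS.items(),
                    key=lambda kv: -len(q & set(kv[1].lower().split())))
    return [f"{doc_id}: {text}" for doc_id, text in scored[:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the brake assist latency limit?"))
# Updating DOCS updates answers immediately; no fine-tuning cycle required.
```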
4. Prototyping in ideal conditions
Many pilots succeed in the lab, where inputs are clean, edge cases are excluded, and performance is predictable. But real engineering isn’t a lab.
Live systems involve late-breaking change requests, multi-language documentation, vendor-specific file formats, and real-time decisions with zero margin for error. Lab models collapse under that pressure.
What works: Test in real workflows. Run actual engineering change notices, support tickets, and cross-variant diagnostics through your pilot. Capture failure cases. Measure latency and override rates. Iterate in production, not in theory.
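One lightweight way to capture those two signals in a shadow evaluation; the pilot function and ticket names below are placeholders:

```python
import time

def pilot_suggest(ticket: str) -> str:
    return "suggested-triage"  # stand-in for the pilot's actual output

def run_shadow_evaluation(tickets, engineer_decisions):
    latencies, overrides = [], 0
    for ticket, final_decision in zip(tickets, engineer_decisions):
        start = time.perf_counter()
        suggestion = pilot_suggest(ticket)
        latencies.append(time.perf_counter() - start)
        if suggestion != final_decision:  # engineer overrode the pilot
            overrides += 1
    return {
        "p50_latency_s": sorted(latencies)[len(latencies) // 2],
        "override_rate": overrides / len(tickets),
    }

print(run_shadow_evaluation(
    ["ticket-1", "ticket-2"], ["suggested-triage", "manual-escalation"]))
```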
The pattern behind the failures
These aren’t isolated mistakes. They stem from one root problem: assuming AI performance depends on model size. It doesn’t.
It depends on data structure.
Engineering data is contextual, fragmented, and full of implicit logic. No model (regardless of size) can reconstruct that logic from raw text. Without structure, even state-of-the-art AI produces unreliable output.
AI in Systems Engineering: What the Best Teams Do Differently
Engineering organizations that succeed with AI follow a different approach. They do not start with models. They start with systems. Their execution is methodical, grounded in operational priorities, and focused on durable outcomes rather than one-off pilots.
These organizations apply six key principles.
1. Start with an operational bottleneck
Rather than asking where AI could theoretically help, successful teams identify where engineers are already losing time or quality. The most effective starting points are repetitive, high-friction tasks with measurable downstream impact. These include comparing system variants, resolving conflicting specifications, or triaging field complaints. The focus is not on breakthrough innovation but on eliminating waste in day-to-day work.
2. Use existing data
Waiting for a clean, centralized dataset often leads to paralysis. High-performing teams begin with what is already available. That includes structured BOMs, requirements in Excel or REQIF, KBL wiring files, or logs from test environments. Even if the data is fragmented or inconsistent, progress starts by mapping what is already in use. Structure is prioritized over completeness.
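A small sketch of that mapping step, assuming BOMs and requirements have already been exported to CSV; the file columns and record shape are invented for illustration:

```python
import csv
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    name: str
    source: str  # keep provenance so fragments stay traceable

def load_bom(path: str) -> list[Item]:
    with open(path, newline="", encoding="utf-8") as f:
        return [Item(r["part_no"], r["description"], path)
                for r in csv.DictReader(f)]

# Requirements exported from Excel as CSV; REQIF or KBL sources would get
# their own small loaders, all emitting the same Item shape.
def load_requirements(path: str) -> list[Item]:
    with open(path, newline="", encoding="utf-8") as f:
        return [Item(r["req_id"], r["text"], path)
                for r in csv.DictReader(f)]

# items = load_bom("bom_export.csv") + load_requirements("reqs_export.csv")
```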
3. Make relationships explicit before using models
Large language models are not a substitute for structured reasoning. They need context to operate effectively. Smart teams begin by encoding relationships between components, functions, test cases, and incidents. Even a lightweight graph of these dependencies improves interpretability and performance. It also ensures that the logic remains accessible and that the model can be replaced without reengineering the system.
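Even a plain-dictionary graph makes that payoff concrete. The edges below are invented, but the query pattern is the point: the logic lives outside the model, so the model can be swapped without losing it:

```python
DEPENDS_ON = {
    "INC-77": ["ECU-375R"],                  # incident implicates a component
    "ECU-375R": ["F-BrakeAssist"],           # component implements a function
    "F-BrakeAssist": ["TC-310", "TC-311"],   # function is covered by tests
}

def reachable(node: str) -> set[str]:
    seen, stack = set(), [node]
    while stack:
        for nxt in DEPENDS_ON.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Which test cases should be rerun for incident INC-77?
print(sorted(n for n in reachable("INC-77") if n.startswith("TC-")))
```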
4. Test in production environments
AI pilots that are built in isolation rarely survive deployment. High-performing teams expose their systems to real operational complexity from the beginning. That includes change requests with missing data, inconsistent supplier inputs, or multilingual tickets from support centers. These conditions are not edge cases. They are the baseline. Testing under these conditions provides the only reliable signal of readiness.
5. Measure outcomes from the first iteration
Executives need more than model accuracy. They need evidence that AI improves how the business operates. That means tracking measurable deltas such as review cycle time, error resolution speed, or expert time saved. These metrics create the foundation for a scale-out decision. They also provide a feedback loop for refining the solution based on how it performs in live workflows.
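A minimal sketch of that kind of delta report; the numbers are placeholders for your own baseline measurements:

```python
def delta(baseline: float, with_ai: float) -> str:
    change = (with_ai - baseline) / baseline * 100
    return f"{change:+.0f}%"

metrics = {  # (baseline, with AI), measured in live workflows
    "review_cycle_time_h": (40.0, 26.0),
    "error_resolution_h": (12.0, 7.5),
    "expert_hours_per_change": (6.0, 4.0),
}
for name, (before, after) in metrics.items():
    print(f"{name}: {before} -> {after} ({delta(before, after)})")
```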
6. Build modular, replaceable systems
Embedding knowledge directly into model weights creates long-term maintenance risks. Leading teams separate data structure from model logic. They use APIs for data access, open formats for interoperability, and retrieval-based methods for flexibility. This modularity protects against vendor lock-in and enables upgrades without rewriting core workflows.
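A sketch of that separation, with an illustrative interface name; the workflow never learns which vendor sits behind it:

```python
from typing import Protocol

class ModelBackend(Protocol):
    def answer(self, question: str, context: str) -> str: ...

class StubBackend:
    def answer(self, question: str, context: str) -> str:
        return f"(stub) {question} | grounded in: {context[:30]}..."

def review_workflow(question: str, backend: ModelBackend) -> str:
    context = "REQ-1042 v3.1: latency <= 150 ms"  # from the structured layer
    return backend.answer(question, context)

# Swapping vendors means swapping the backend object, not the workflow.
print(review_workflow("Is 180 ms acceptable?", StubBackend()))
```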
This playbook does not rely on perfect data, advanced models, or high-risk investments. It relies on sequencing and discipline. The structure comes first. The models come second. What results is not a one-off pilot, but an infrastructure layer that improves continuously and scales sustainably.
How SPREAD enables scalable AI in Systems Engineering
Engineering leaders face a core paradox: AI promises exponential gains, yet most pilots stall at the prototype phase. SPREAD resolves this by focusing not on model performance, but on data structure and reusability. Its architecture embeds product knowledge once, then compounds value across use cases.
SPREAD transforms engineering data into a durable intelligence layer. It reduces the cost of launching AI use cases, improves trust in model outputs, and accelerates time-to-impact, without compromising traceability, modularity, or domain depth. That’s how engineering organizations move beyond pilots and into platform-scale AI adoption. Read our Understand the Tech document.
From isolated pilots to a shared intelligence layer
SPREAD is not a collection of tools. It is a shared infrastructure for engineering intelligence. Each deployment encodes knowledge (about products, systems, and logic) into a common graph. Every use case, from specification review to issue analysis, builds on this shared layer. There is no duplication of logic, no reinvention of schemas, no fragmentation of data pipelines.
This enables a fundamentally different trajectory: Every new workflow deployed becomes faster and cheaper to implement than the last.
A domain ontology as a strategic asset
At the center of SPREAD’s architecture is a canonical domain ontology, built specifically for the automotive and industrial context. It standardizes terminology across mechanical, electrical, software, and service domains. Ambiguities are resolved. Variant-specific IDs are mapped. Relationships are explicit.
This ontology is not a side asset. It is the backbone that aligns engineering data with AI reasoning. It ensures that “control unit 5A,” “rear MCU,” and “ECU 375R” all refer to the same system node, regardless of author, format, or lifecycle stage.
The result is consistency across use cases and teams, reducing both onboarding time and interpretation error.
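In miniature, that alias resolution looks like the sketch below; the aliases come from the example above, while the canonical node ID is invented:

```python
ALIASES = {  # many author-specific mentions, one canonical system node
    "control unit 5a": "SYS-NODE-0375",
    "rear mcu": "SYS-NODE-0375",
    "ecu 375r": "SYS-NODE-0375",
}

def canonical(mention: str) -> str:
    key = mention.strip().lower()
    return ALIASES.get(key, f"UNRESOLVED:{mention}")

assert canonical("Rear MCU") == canonical("ECU 375R")  # same system node
```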
Engineering logic for reuse across the lifecycle
Every logic structure SPREAD builds (component hierarchies, test-result mappings, fault correlations) feeds into the graph and becomes reusable across applications. A fault identified during R&D testing can inform triage suggestions in aftersales. A configuration pattern flagged during production can be traced back to conflicting specs.
This is not post-hoc reporting. It is active reuse. Engineers no longer start from a blank slate. Knowledge compounds.
Structured before and after AI
SPREAD does not rely on raw document ingestion. It indexes, maps, and structures data prior to AI invocation. This minimizes hallucination, improves model reliability, and enables precise, context-aware responses. Whatever form the AI takes (LLMs, retrieval-based systems, or hybrid agents), outputs are validated, logged, and grounded back into the structured layer for traceability.
This creates a closed loop where outputs are both actionable and auditable.
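A toy version of that loop, with an invented set of known requirement IDs standing in for the structured layer:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
KNOWN_REQUIREMENTS = {"REQ-1042", "REQ-0881"}  # the structured layer, in miniature

def accept_output(answer: str, cited_req: str) -> bool:
    # Validate against the structured layer, then log either way.
    grounded = cited_req in KNOWN_REQUIREMENTS
    logging.info(json.dumps({"answer": answer, "cite": cited_req,
                             "grounded": grounded}))
    return grounded

accept_output("Latency limit is 150 ms", "REQ-1042")  # accepted
accept_output("Voltage limit is 48 V", "REQ-9999")    # rejected, but logged
```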
Modular architecture, not model lock-in
SPREAD decouples product knowledge from model logic. Its API-first design allows organizations to plug in different models, tools, or custom agents without rewriting workflows. The ontology remains stable. Relationships remain consistent. AI becomes a service on top of a structured core, not the foundation itself.
Deep dive: real-world lessons for Systems Engineers
For a firsthand look at how top engineering organizations are applying these principles, watch the full session Purpose-built AI for Engineering Excellence. In this executive webinar, SPREAD’s Chief Product Officer Shane Connelly, Head of Solution Consulting Oliver J. Blauth, and VP Marketing & People Alexander Matthey share practical takeaways from front-line deployments across automotive and defense.
The session covers:
- Why most generative LLM pilots stall in complex engineering contexts
- What to do instead: proven practices for structured data and contextual AI
- A live demo of SPREAD’s AI-Mapper and graph-based inference in action
- How leading OEMs saw measurable gains in weeks, not quarters
Discover how leading OEMs are successfully implementing AI to overcome complex engineering challenges and deliver tangible ROI.
See it in action
Looking to move beyond pilots and apply AI where it delivers real operational impact? Our solution consultants work directly with OEMs and suppliers to design scalable, production-ready AI workflows, grounded in your data, systems, and engineering priorities. Talk to an Expert at SPREAD.
Sources & Links:
- McKinsey & Company, Software-defined hardware in the age of AI, January 2025
- Boston Consulting Group (BCG), Scaling AI: Lessons from the Leaders, 2025, https://www.bcg.com/publications/2025/scaling-ai-lessons-from-the-leaders
- TechRadar, Why 75% of AI projects never scale, 2025, https://www.techradar.com/news/why-75-percent-of-ai-projects-never-scale