The SDV Complexity Problem Is a Data Problem in Disguise
The shift to software-defined vehicles (SDVs) is one of the biggest architectural changes in automotive history. Today's vehicles rely on up to 150 electronic control units (ECUs) across distributed architectures, running more than 100 million lines of software code. The SDV transition consolidates these distributed ECUs into fewer, more powerful compute platforms: domain controllers, zone controllers, and central vehicle computers. This doesn't reduce complexity. It concentrates it. Software that used to run in isolation on dedicated hardware now shares centralized platforms, creating deep interdependencies across functions that never had to interact before, spanning infotainment, ADAS, powertrain, OTA updates, and regulatory compliance. A single software update can affect hundreds of dependent features in ways that are hard to predict without a full picture of the underlying dependencies.
The fundamental challenge this introduces is one of systemic visibility. Traditional engineering organizations were architected around specialized, domain-bounded teams, each exercising sovereign ownership over a discrete functional area. That organizational paradigm was adequate when vehicles operated as loosely coupled collections of independent subsystems. It becomes structurally untenable when a single software revision propagates through shared compute platforms, triggering cascading effects across hundreds of interdependent features and functions, spanning hardware, software, and system-of-systems boundaries, in ways that no individual team possesses the contextual breadth to fully anticipate or characterize.
Establishing true traceability and comprehending the full downstream impact of any engineering decision, whether a requirements evolution, a component-level modification, or a software deployment, demands a unified, holistic view that simultaneously spans all of these domains and their intricate interdependencies. In the vast majority of organizations today, that connected, cross-domain vantage point simply does not exist.
The problem is not a shortage of data. Automotive OEMs produce vast volumes of engineering information across every phase of the product lifecycle. The problem is fragmentation. That data resides in dozens of disconnected systems of record, each owned by a different team, each governed by its own terminology, identifiers, and data structures. No unifying model connects a component as it is described in an R&D requirements document to that same component generating repeat failures in workshops around the world, or traces a diagnostic trouble code (DTC) back to the original design decisions that produced it. The consequence is staggering inefficiency: engineers spend 50 to 70 percent of their time searching for information and manually cross-referencing data across tools. Not because they lack discipline or rigor, but because the interdependencies between systems, components, and decisions are simply not visible anywhere in the organization. The data exists. What does not exist is a connected, contextualized knowledge architecture that makes those interdependencies transparent, navigable, and actionable.
The consequences show up across the entire product lifecycle. In R&D, teams miss SOP milestones because the downstream impact of a change on dependent functions, software modules, or hardware components only becomes visible during late-stage vehicle integration testing, precisely when remediation is slowest and most expensive. In aftersales, technicians confront escalating repair complexity as software-defined features make fault diagnosis increasingly difficult without access to the underlying engineering context. Warranty costs climb. Customer satisfaction erodes.
AI can change this. It can make interdependencies visible, surface the right context at the right moment, and eliminate the overhead coordination that consumes engineering capacity today. But AI cannot solve a problem it cannot see. Without a connected, contextualized data foundation, even the most capable model operates on the same disconnected fragments engineers already struggle with. It can retrieve information that sounds relevant, but it cannot reason about how things fundamentally relate. Its output may appear plausible, but in a safety-regulated industry, plausible is not sufficient. This is precisely why so many AI pilots in automotive engineering stall at the prototype stage. The missing ingredient is not a more powerful model. It is the domain-specific data foundation that model requires to operate reliably.
This article explains why engineering-specific ontologies and knowledge graphs from SPREAD provide that foundation, how an architecture purpose-built on AWS enables enterprise scale, and what one global OEM achieved by putting it into production.
Why Generic AI Infrastructure Is Not Enough
Many organizations have responded to the data challenge by investing in general-purpose AI infrastructure: data lakes and large language models. These are necessary starting points. But they do not solve the deeper problem.
Consider a question central to automotive engineering operations: "Which vehicle variants are affected by this software update?" Answering it correctly means traversing typed relationships across requirements, functional architecture, logical design, physical components, software versions, and production configurations. A general-purpose AI system can surface relevant-sounding content. What it can't do is reason about what those relationships mean: it has no model of the domain, no understanding that a software component depends on a specific ECU, that the ECU is part of a functional chain tracing back to a set of requirements, or that a change to any one of them ripples across variants. Without that, answers are structurally unreliable regardless of how much data the model has access to.
The missing layer isn't more data or a more powerful model. It's an engineering-specific ontology: a formal, machine-readable model of the concepts and relationships that govern how automotive products are designed, built, and maintained. Building that layer on top of scalable cloud infrastructure is what the architecture below is designed to do.
A Reference Architecture for Applied AI Across the Product Lifecycle
Scaling AI from isolated pilots to enterprise-wide impact takes a layered architecture designed for the realities of industrial product development. The architecture below, informed by deployments at leading automotive OEMs, is organized from foundational infrastructure at the bottom to AI-powered workflows at the top. SPREAD's Engineering Intelligence Platform sits in the critical middle layers.
The architecture consists of four distinct layers.
Layer 1: Infrastructure Layer
AWS provides the foundation: compute, networking, and enterprise-grade security controls built for the compliance and data sovereignty requirements of global automotive manufacturers. It also can provide various services that support the layers above: object storage and data lake services for data consolidation, ETL services for ingesting and harmonizing data from heterogeneous source systems, managed graph database services for storing and querying the knowledge graph, governance and access management for data quality and compliance at scale, and foundation model access for generative AI.
Layer 2: Data Layer
The data layer covers the core systems of record teams use daily: ALM, PLM, ERP, MES, project management tools, and others. The architecture connects these systems. It doesn't displace them. Where manufacturers have already consolidated these sources into a centralized data lake, this layer also serves as the integration and normalization foundation. Not every organization has reached that point yet. Where a central data lake doesn't exist, the ontology layer can connect directly to source systems, though with more integration effort.
Layer 3: Semantic Layer
SPREAD provides this key layer through a semantic ontology built for engineering. It formally defines the concepts, attributes, and relationships that matter in the automotive domain: requirements, functions, ECUs, software components, signals, connectors, variants, failure codes, and the typed relationships between them. Data from the layer below is mapped onto this ontology, turning isolated records into a connected, contextualized knowledge graph. Built on seven years of field experience across leading manufacturers, it isn't rebuilt from scratch for each customer. Each customer's data maps onto the same proven model, cutting time to value.
This connected multi-layer foundation makes possible a class of queries and workflows that fragmented tool landscapes can't support: identifying every vehicle variant affected by a specific ECU software change, tracing a field DTC back to the requirements and design decisions that produced it, or assessing the downstream impact of a requirements change before it reaches vehicle integration testing.
The ontology isn't a closed island either. It's designed to connect and align with other semantic models in the enterprise, from supply chain to aftersales and finance. That openness is deliberate: it lets the knowledge graph grow in scope without re-architecting the foundation.
Layer 4: Use Case Layer
The top layer is where business value is realized, in two tiers. SPREAD's domain-specific apps and agents deliver productized workflows grounded in the full engineering knowledge graph, covering requirements management, change and architecture management, error and ticket management, and custom use cases organizations build themselves, delivered as SaaS applications on AWS. Above them, overarching agents operate across all domain workflows at once. Both tiers are open: third-party apps and agents can connect directly to the ontology layer, and the architecture accommodates whatever tools best fit each organization's context.
Amazon’s Bedrock service can power the generative AI across this layer: translating natural language questions into structured queries against the knowledge graph, extracting entities and relationships from unstructured engineering documents, and letting AI agents act on engineering context autonomously. The knowledge graph is what makes all of this trustworthy and auditable. Outputs trace back to verified engineering data rather than model inference alone.
From Architecture to Impact: A Global OEM's Aftermarket Transformation
A leading global automotive manufacturer deployed this architecture to modernize its aftermarket operations. As its portfolio shifted toward software-defined architectures, repair speed and quality worsened: technicians across a global workshop network were diagnosing electrical/electronic (E/E) and software-related failures using static documentation that couldn't keep pace with modern vehicle complexity. With no reliable way to trace a trouble code to the relevant ECUs, wiring diagrams, or variant-specific repair steps, repeat repairs were common and warranty costs were rising.
The solution was an engineering knowledge graph powered by SPREAD and deployed on AWS, connecting the OEM's R&D data with field service and warranty data. Technicians got VIN-exact access to the vehicle's actual architecture: the specific ECUs installed, the wiring diagram, component-level diagnosis down to the connector and pin, and repair paths ranked by that vehicle's variant configuration and historical outcomes. The impact was measurable across three dimensions:
Cost: Higher first-time-right rates take thousands of repeat repairs off the books each year. Fewer misdiagnoses mean less wasted workshop capacity and lower labor, parts, and administration costs per claim.
Speed: Fault tracing that previously required hotline escalation, taking days and tying up scarce experts, now resolves in minutes at the workshop. Technicians go from trouble code to confirmed repair path without leaving the platform.
Quality: VIN-exact guidance reduces misdiagnosis and over-repairs, improving consistency across thousands of workshops and driving higher customer satisfaction.
Three Key Takeaways for Teams From This Solution Journey
Invest in Ontology: AI output quality is bounded by the quality of the knowledge it works from. Domain-specific ontology design is the highest-leverage investment in an engineering AI program. A strong semantic layer also gives you the flexibility to swap tools over time without rebuilding the foundation or creating new lock-in.
Start with a high-value, bounded use case: Aftermarket fault diagnosis is a practical starting point: it has a clear business metric, a well-defined data scope, and a user group with genuine motivation to adopt. The ontology built for it becomes the foundation for R&D and production use cases as scope expands.
Design for open APIs from day one: The value of the knowledge graph compounds as more data sources connect to it. Every source system must expose data through open APIs. Proprietary integrations recreate the silos the architecture is designed to eliminate, and they become an increasing liability as the number of connected use cases grows.
Conclusion
The SDV is not only a software challenge. It's a data architecture challenge. OEMs that build an engineering-specific semantic layer, deploy it on scalable cloud infrastructure, and ground their AI in verified engineering knowledge will be better placed to cut warranty costs, shorten development cycles, and run aftermarket at the quality levels SDV complexity demands. That requires the right foundation: AWS for scalable, secure cloud infrastructure and AI services; SPREAD for the Engineering Intelligence layer that makes it meaningful.
To learn more about how SPREAD and AWS are helping automotive manufacturers build their Engineering Intelligence foundation, contact Alexander Matthey (alexander.matthey@spread.ai), SVP GTM at SPREAD or reach out to your AWS account team.