Bernard Sonnenschein

29.4.2026

Data architecture: What it is, which components it has and how SMEs set it up


Many companies are sitting on a paradox: they have been collecting more and more data for years — and they can barely turn it into value. According to the Bitkom Data Economy Study 2025, only six percent of German companies fully exploit the potential of their data. Around 60 percent use it little or not at all. At the same time, more and more companies expect data to become the central success factor within the next few years.

The answer to this paradox rarely lies in more tools or more data. It lies in a well-designed data architecture. Without a clear structure, even the best data assets stay what they often are: scattered, inconsistent, inaccessible. And therefore unsuitable for AI projects, data-driven decisions or any kind of scaling.

This article shows what data architecture actually is, which building blocks it consists of, and how SMEs build it pragmatically — without immediately thinking in enterprise dimensions. A modern data architecture isn't just an IT topic. It's a business decision.

What is data architecture?

Data architecture is a sub-discipline of IT architecture. It describes the overarching structure a company uses to collect, store, process, use and protect data. The English term "data architecture" is also the established term in German-speaking companies.

At its core, three layers interlock: the technical infrastructure of databases, data lakes and data warehouses; the organisational rules around responsibilities and standards; and the functional models that describe how information is structured and related. A good data architecture connects these three layers into a consistent framework and defines the principles by which data is managed across the company.

The distinction from a data model matters. A data model describes individual data objects, data structures and their relationships. Data architecture sets the overarching frame in which these models exist. The Data Management Body of Knowledge (DMBOK) describes data architecture as the plan by which an organisation's data assets are managed — data modelling is one tool within this plan.

Put differently: data architecture is the blueprint for the whole building, data modelling the detailed plan for each room. Set up properly, a data architecture lets growing data volumes deliver real value, whether for analytics, machine learning or operational processes.


The five building blocks of a modern data architecture

Every modern data architecture comes back to five core components. The specific technologies vary from company to company; the structure stays remarkably constant. Knowing these five elements helps you decide which architecture fits your own company.

Data sources and data ingestion: the start that determines everything

At the start of every data architecture sit the data sources. In an average mid-sized company, that quickly adds up to 20 to 30 different systems:

ERP, CRM, inventory management
Production and machine data, sensors
Web shop, marketing tools, analytics platforms
HR and payroll systems
And often countless Excel files and shared drives

Data is generated in a wide range of formats, structures and types — from classic databases through log files to sensor data from production.

Data ingestion integrates these sources. Classic ETL processes (Extract, Transform, Load) extract data, bring it into a unified format and load it into a central store. Modern variants work with ELT (Extract, Load, Transform), where raw data is loaded first and transformed inside the target system, or with streaming approaches that move data in near real time. For many SMEs, a clean ETL process is the first concrete step towards consolidated company data.
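To make the three ETL steps tangible, here is a minimal sketch in Python. It assumes two hypothetical CSV exports (a CRM and a web shop, with different column names and delimiters) and uses an in-memory SQLite database as a stand-in for the central store. All names and sample values are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical exports from two source systems; real ones would be files or APIs.
CRM_CSV = "customer_id,name,country\n1,Alpha GmbH,de\n2,Beta AG,DE\n"
SHOP_CSV = "id;customer;land\n7;Gamma KG;at\n"

# Mapping from unified field name to the field name used in each source.
CRM_MAPPING = {"source_id": "customer_id", "name": "name", "country": "country"}
SHOP_MAPPING = {"source_id": "id", "name": "customer", "country": "land"}


def extract(text, delimiter):
    """Extract: read rows from a CSV export as dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows, mapping, source):
    """Transform: rename fields to the unified schema and normalise values."""
    unified = []
    for row in rows:
        record = {target: row[src].strip() for target, src in mapping.items()}
        record["country"] = record["country"].upper()
        record["source"] = source
        unified.append(record)
    return unified


def load(conn, records):
    """Load: write the unified records into the central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(source TEXT, source_id TEXT, name TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (:source, :source_id, :name, :country)",
        records,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the central store
crm = transform(extract(CRM_CSV, ","), CRM_MAPPING, "crm")
shop = transform(extract(SHOP_CSV, ";"), SHOP_MAPPING, "shop")
load(conn, crm + shop)

rows = conn.execute(
    "SELECT source, name, country FROM customers ORDER BY name"
).fetchall()
print(rows)
```

The field mappings are the part that usually takes the most discussion in practice, because they force the business to agree on what a "customer" or a "country" actually is across systems.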

The key point: without clean ingestion, no reliable foundation emerges. This is exactly where most data silos form, slowing down analytics and AI projects later. The earlier an organisation tackles this in a structured way, the easier everything that follows becomes.

Data storage: data warehouse, data lake or lakehouse

The second building block is storage. Three basic models have established themselves — and in practice, they are often combined.

The data warehouse stores structured data in a clearly defined schema. It is the classic foundation for business intelligence and reporting: proven for decades and still the backbone of many companies' data architecture. The strengths: fast query performance, consistent data quality, mature tools. The weakness: a classic warehouse is inflexible when it comes to unstructured data, very large data volumes or new data types.

The data lake takes the opposite approach. It stores data in its raw format — structured, semi-structured or unstructured. That makes it particularly suited to big data, machine learning and exploratory analytics. A data lake can bring together data volumes from a wide range of sources without forcing them into a rigid schema first. The flip side: without good governance, a data lake quickly turns into a data swamp where no one finds what they need. A lake without structure is worse than no central storage at all.
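The difference between the two models can be illustrated with a small Python sketch. It is a simplification under stated assumptions: an in-memory SQLite table stands in for the warehouse, a plain list of JSON strings stands in for the lake, and the sample records are invented.

```python
import json
import sqlite3

# Warehouse style: the schema is fixed up front and enforced on write.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)
warehouse.execute("INSERT INTO orders VALUES (1, 99.5)")

try:
    # A record without an amount violates the schema and is rejected.
    warehouse.execute("INSERT INTO orders VALUES (2, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Lake style: raw events are stored as they arrive, whatever their shape.
lake = []  # stands in for object storage holding raw files
lake.append(json.dumps({"order_id": 1, "amount": 99.5}))
lake.append(json.dumps({"sensor": "press-04", "temp_c": 71.2, "tags": ["line-2"]}))
lake.append(json.dumps({"order_id": 2}))  # incomplete, but still stored


def read_orders(raw_lines):
    """Schema-on-read: interpret the raw data only when it is queried."""
    orders = []
    for line in raw_lines:
        event = json.loads(line)
        if "order_id" in event and "amount" in event:
            orders.append((event["order_id"], float(event["amount"])))
    return orders


print(rejected)
print(len(lake))
print(read_orders(lake))
```

The warehouse refuses the incomplete order at write time, while the lake accepts everything and leaves the interpretation to whoever reads it. That is where the risk of a data swamp comes from: the filtering logic in `read_orders` has to live somewhere, and without governance every consumer writes their own.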

The data lakehouse combines both worlds. It pairs the flexibility of a data lake with the structure and performance of a data warehouse. For many companies, the lakehouse is the most pragmatic choice today because it scales while staying analytics-ready. Particularly in architectures meant to support both classic reporting and AI applications, the lakehouse principle has gained significant ground in recent years.

Which option is the right one depends on the actual requirements. A B2B services company with clear reporting needs often gets along with a cloud data warehouse. An industrial company with production and sensor data tends to benefit more from a data lake or lakehouse. Many companies opt for combinations — a warehouse for structured reporting, a lake for exploratory analysis.

Data processing and analysis: from raw material to information

Stored data alone delivers no value. Only processing turns raw data into usable information. That includes transformation, cleansing, enrichment and aggregation — all the steps that turn distributed raw data sets into consistent data assets.
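As a rough illustration of these steps, the following Python sketch chains cleansing, enrichment and aggregation over a handful of invented sales records. The store names, the region lookup and the field names are assumptions made for the example only.

```python
from collections import defaultdict

# Hypothetical raw sales records as they might arrive from different systems.
raw_sales = [
    {"store": " Muenster ", "sku": "A1", "amount": "19.90"},
    {"store": "muenster", "sku": "A1", "amount": "19.90"},
    {"store": "Berlin", "sku": "B2", "amount": "n/a"},  # unusable amount
    {"store": "Berlin", "sku": "B2", "amount": "5.00"},
]
REGIONS = {"muenster": "west", "berlin": "east"}  # assumed lookup table


def cleanse(records):
    """Normalise store names and drop rows whose amount is not numeric."""
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue
        yield {
            "store": rec["store"].strip().lower(),
            "sku": rec["sku"],
            "amount": amount,
        }


def enrich(records):
    """Add the sales region to each record based on its store."""
    for rec in records:
        yield {**rec, "region": REGIONS.get(rec["store"], "unknown")}


def aggregate(records):
    """Sum revenue per region."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["region"]] += rec["amount"]
    return {region: round(total, 2) for region, total in totals.items()}


revenue_by_region = aggregate(enrich(cleanse(raw_sales)))
print(revenue_by_region)
```

Each function does one job, so a step can be tested or replaced on its own. In a real setup the same chain would typically run inside a pipeline tool or as SQL in the warehouse rather than in a standalone script.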

The actual applications sit on top of this: business intelligence for reports and dashboards, data analysis for deeper questions — and increasingly machine learning and artificial intelligence for predictions, pattern recognition and automation. How SMEs make the leap from a mountain of data to data-driven decisions is one of the central challenges in this space.

Data access and data mesh: who gets to see what?

The fourth building block governs who can access which data. That sounds trivial but is one of the most common stumbling blocks in practice. When every department maintains its own views, parallel truths emerge. When access is too restrictive, teams block each other.

Modern data architectures rely here on role-based access controls and clearly defined data products. The data mesh approach takes this logic to its conclusion. Data mesh is less a technology than an organisational principle: every business domain is responsible for its own data, provides it as a data product, and ensures its quality. The central platform delivers the technical foundation; business ownership sits in the domain teams. Data mesh shifts the question from "Who hosts the data?" to "Who is responsible for it on the business side?"
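How role-based access and data products fit together can be sketched in a few lines of Python. This is a toy model, not a real access-control system: the catalog, the role names and the product names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner_domain: str         # business domain responsible for the product
    allowed_roles: frozenset  # roles that may read this product


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


# Hypothetical catalog: each domain publishes its data as a named product.
CATALOG = {
    "sales_daily": DataProduct(
        "sales_daily", "sales", frozenset({"analyst", "sales_lead"})
    ),
    "payroll": DataProduct("payroll", "hr", frozenset({"hr_admin"})),
}


def can_read(user, product_name):
    """Grant access if the user holds at least one role the product allows."""
    product = CATALOG.get(product_name)
    if product is None:
        return False  # unknown products are denied by default
    return bool(user.roles & product.allowed_roles)


analyst = User("ana", {"analyst"})
print(can_read(analyst, "sales_daily"))  # True
print(can_read(analyst, "payroll"))      # False
```

Access is decided per role and per product, not per person and per table. The `owner_domain` field records who is accountable for the product, which is the core idea that carries over from data mesh even into a centrally run warehouse.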

Data mesh suits larger companies with complex structures and many data owners. For smaller SMEs, the approach is often oversized — a centrally managed warehouse or lakehouse usually works better. The comparison is still worth making: data mesh makes visible how important functional data ownership is, even when the architecture is ultimately built centrally.

Data governance and security: the foundation that carries everything

The fifth building block is the foundation that holds everything together. Data governance regulates responsibilities, quality standards, security measures and compliance with legal requirements. It is relevant for companies in every industry, regardless of which technical architecture has been chosen.

Without governance, even the best architecture remains a patchwork. Data quality drops, data security becomes patchy, and GDPR audits get uncomfortable. Good governance assigns data owners in the business, documents data flows and gives data models, processes and data management a binding frame. As data volumes grow, missing governance quickly becomes a bottleneck, which is why it needs to be built in from the start.
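What "assigning owners, documenting flows and setting quality standards" can mean in concrete terms is sketched below in Python. The record structure, the owner name and the email rule are assumptions chosen for illustration; real governance tooling covers far more.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GovernedDataset:
    name: str
    owner: str                    # accountable team in the business
    contains_personal_data: bool  # relevant for GDPR documentation
    upstream: tuple               # documented data flow: where the data comes from
    quality_rule: Callable[[dict], bool]


# Hypothetical entry: the owner and rule would be agreed with the business.
customers = GovernedDataset(
    name="customers",
    owner="sales-operations",
    contains_personal_data=True,
    upstream=("crm",),
    quality_rule=lambda row: bool(row.get("email")) and "@" in row["email"],
)


def quality_report(dataset, rows):
    """Count rows that break the dataset's quality rule and name the owner."""
    failed = [row for row in rows if not dataset.quality_rule(row)]
    return {
        "dataset": dataset.name,
        "owner": dataset.owner,
        "checked": len(rows),
        "failed": len(failed),
    }


sample = [{"email": "a@example.com"}, {"email": ""}, {"name": "no email"}]
report = quality_report(customers, sample)
print(report)
```

The point of the sketch is that every quality finding arrives with an owner attached. A failed check without a responsible person is just a number; with one, it becomes a task.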


Data architecture in practice: three examples from the Mittelstand

What does this look like concretely? From conversations with mid-sized companies, three typical patterns emerge.

Mechanical engineering: predictive maintenance on a lakehouse foundation

A mid-sized mechanical engineering company connects production data, ERP information and customer data in a central data lakehouse. Predictive maintenance models running on machine learning sit on top, predicting failures before they happen.

The architecture wasn't built in one big push but step by step — first the production data, then the ERP integration, then the machine learning models. Development continues. Each new application connects to the existing structure.

Retail: consistent numbers across all stores

A retail company with multiple stores had wrestled for years with inconsistent sales data. Every store, every sales channel, every tool delivered its own numbers.

Building a shared data architecture with a clearly structured data warehouse and defined data models didn't just radically simplify reporting; it also created the foundation for personalised customer engagement and data-driven product decisions. Business intelligence went from a time sink to a decision-making tool.

B2B services: deliberately lean

A B2B services company kept its data architecture deliberately small: a cloud data warehouse, clean ETL processes from CRM and billing systems, plus BI dashboards for the management team. No data mesh, no complex lakehouse, no AI labs.

The architecture fits the size and requirements of the company — and that is exactly the point. A modern data architecture doesn't have to be large to be effective.


How SMEs start pragmatically

The most common mistake when building a data architecture is taking enterprise frameworks as the reference. What SAP, IBM or AWS describe for large corporations can overwhelm mid-sized structures — and distract from the essentials.

Start with business requirements, not technology

A pragmatic approach starts with the business requirements, not the technology. Which decisions should improve with data? Which processes should be automated? Which applications are concretely on the agenda? The answers determine which data sources are relevant, which storage approach fits and which analytics components make sense.

Three steps for getting started

A sensible entry point consists of three steps:

Inventory. Which data sources exist, what data volumes are generated, which systems store what? An honest inventory often delivers surprising insights.
Define the target picture. What does the data architecture look like in two or three years? Which business and data strategy should it support?
Step-by-step build. Which use case can be implemented in the next six months? Start there — and learn from it before the next building block follows.
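The three steps can also be written down in a very lightweight form. The Python sketch below assumes an invented inventory and target picture and applies one possible selection rule: a first use case should fit within six months and rely only on sources that are already connected. Both the effort estimates and the rule itself are illustrative assumptions, not a fixed method.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    system_type: str
    connected: bool  # already integrated into the central store?


@dataclass
class UseCase:
    name: str
    needs: set          # names of the sources the use case depends on
    effort_months: int  # rough estimate, purely illustrative


# Step 1: inventory of existing data sources (hypothetical).
inventory = [
    Source("erp", "ERP", True),
    Source("crm", "CRM", True),
    Source("machines", "sensor data", False),
]

# Step 2: target picture expressed as the use cases it should support.
target_picture = [
    UseCase("management dashboard", {"erp", "crm"}, 3),
    UseCase("predictive maintenance", {"erp", "machines"}, 12),
]


def first_candidates(sources, use_cases, horizon_months=6):
    """Step 3: use cases that fit the horizon and need only connected sources."""
    connected = {s.name for s in sources if s.connected}
    return [
        uc.name
        for uc in use_cases
        if uc.effort_months <= horizon_months and uc.needs <= connected
    ]


print(first_candidates(inventory, target_picture))
```

Even a spreadsheet with these columns does the job. What matters is that the inventory and the target picture are explicit enough to argue about.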

The mindset matters: a data architecture isn't a project with an end date. It's a continuous evolution that adapts to changing business requirements and technologies. Anyone who has understood that builds differently: aiming less for perfection and more for iteration.

Data leads and AI leads from the Mittelstand share exactly these build paths at the d:u27 Festival on 13 & 14 April 2027 in Münster. In dedicated round-table formats and on the SME Stage, the question on the table is how companies build their data platforms step by step so they can carry analytics, AI models and scaling processes at the same time.


Typical pitfalls

Three mistakes keep coming up — and they all share the same root: too much technology, too little strategy.

Technology-driven build. A company picks a tool before it's clear which problems should be solved. The result: an expensive platform that no one uses because it doesn't match actual requirements.

Missing governance from the start. The architecture gets built; the rules come later — under stress, when the first data protection incidents happen or data quality starts to crumble. Governance has to be planned in before the first data pipeline runs.

Silos through the back door. Every department builds its own little data architecture because the central initiative is too slow. The result: new data silos, just with more modern technology this time. Avoiding this requires clear responsibilities and a binding framework for the whole organisation.


Conclusion: data architecture is the precondition for everything else

Data architecture isn't an IT niche topic. It's the precondition for analytics to work, for AI projects to deliver results, and for companies to make data-driven decisions. Without it, data stays what it often is: scattered, inconsistent — and therefore worthless.

For the Mittelstand, that doesn't mean burying the next two years in a foundational architecture project. It means starting pragmatically, knowing the building blocks, and step by step building a foundation that grows with you. Data sources, storage, processing, access, governance — anyone who addresses these five dimensions cleanly has the foundation for everything else.

The path forward becomes most tangible in direct exchange with companies that are already on it: at the d:u27 Festival on 13 & 14 April 2027 in Münster. Six stages, 350+ speakers, 80+ masterclasses and over 250 exhibitors — and every question on the table around data platforms, agentic AI and the scaling of data-driven products, with data and AI leads who have already been through the painful phases. Get your ticket for d:u27 now.
