
ChatGPT, Gemini, Claude: The best-known AI tools are proprietary, incur ongoing fees and process company data on external servers. At the same time, a growing open-source AI landscape offers SMEs a strategically useful alternative in many areas. Open-source AI models offer three advantages that are decisive for companies: full control over their own data, no ongoing license costs and the ability to adapt models to their own requirements.
According to an analysis by the Linux Foundation, 89 percent of organizations using AI already use open-source components in their infrastructure, and around 63 percent work directly with open models. Open-source AI is no longer a niche topic but an integral part of modern AI strategies.
This article shows which open-source AI models decision makers should know in 2026, where the differences between true open source and so-called open-weights models lie, and which model is suitable for which purpose.
The term open source originally comes from software development and describes programs whose source code is publicly available. Anyone can view, change and share the code. With classic open-source software such as Linux or Firefox, the matter is clear: The source code is under an open license and the community can further develop it.
Things get more complicated when it comes to artificial intelligence. An AI model consists not only of code, but also of trained weights, training data and a model architecture. In late 2024, the Open Source Initiative (OSI) presented the Open Source AI Definition (OSAID), the first formal standard for the term. According to it, an AI system must grant four freedoms: it must be possible to use, study, modify and share it for any purpose.
In practice, only a few AI models fully meet this strict definition. Companies therefore need to draw a distinction that directly affects their procurement and legal departments.
The open-source AI landscape can be divided into four categories, which are relevant for different applications in the company.
The following selection is based on five criteria that are decisive for enterprise use: license and commercial usability; ecosystem maturity and availability in common toolchains; operating models such as self-hosting and on-premises; language quality in German and other European languages; and the ratio of computing effort to performance.
With Mixtral, the Paris-based company Mistral AI has created a model with a strong cost-performance ratio. The model uses a Mixture of Experts (MoE) architecture: for each token, a router activates only a small subset of expert modules (two of eight in Mixtral 8x7B) instead of the entire network. The result is solid performance at significantly lower computing cost. Mixtral is released under the Apache 2.0 license and can therefore be used commercially.
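To make the cost argument behind Mixture of Experts more concrete, here is a toy sketch of top-2 routing. It assumes a simple linear router and eight experts that merely scale their input; the function names and dimensions are illustrative and not Mistral AI's implementation. The point it shows: only the selected experts are ever evaluated, so compute per token grows with k rather than with the total number of experts.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, router_weights, k=2):
    """Toy Mixture-of-Experts layer for a single token vector.

    experts: list of callables (vector -> vector), one per expert
    router_weights: one weight row per expert; the row's dot product with
    the token is that expert's routing score
    """
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the k highest-scoring experts; the others are never run
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in chosen])  # renormalise over chosen experts
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen

if __name__ == "__main__":
    # Hypothetical setup: 8 experts that scale their input, 2-dim tokens
    experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(8)]
    router = [[float(i), 0.0] for i in range(8)]
    out, chosen = moe_layer([1.0, 0.0], experts, router)
    print("experts used:", chosen, "output:", out)
```

In a real model each expert is a feed-forward network and routing happens in every MoE layer, but the saving comes from the same place: six of the eight experts are simply not computed for this token.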
Mixtral is particularly relevant for companies that want internal chatbots, text generation, code assistance or document analysis while keeping infrastructure costs manageable. The Guide to AI implementation in SMEs describes how companies can introduce such AI applications strategically.
Qwen 2.5 is a model family from Alibaba and among the most powerful open-source language models on the market. The models are available in various sizes, from compact variants to powerful versions for complex tasks. Most are released under the Apache 2.0 license, but some model sizes come with different license terms.
Qwen 2.5 offers a good balance between performance and resource requirements. Companies that do not immediately need the largest model variant will find a solid start in the smaller versions.
Microsoft's Phi models belong to the category of small language models (SLMs) and show that size isn't everything. Phi models are significantly more compact than their large counterparts but deliver good results for many practical tasks. They run on cheaper hardware, sometimes even on end devices, and are licensed under the MIT license.
Phi models are particularly interesting for companies when AI needs to run directly on a device (edge computing), for example in field service tools or production environments without a cloud connection. The blog articles on data security and AI show which data protection issues should be clarified before using AI.
OLMo 2 from the Allen Institute for AI (AI2) follows a consistently open approach. AI2 discloses not only the weights but also the complete training code, training data and evaluation methods. OLMo 2 is therefore one of the few models that actually meets the strict OSAID definition, and it is licensed under the Apache 2.0 license.
OLMo 2 is a particularly relevant option for organizations that require traceability and transparency, for example in the public sector, in research or in regulated industries.
At d:u26 on March 26 & 27 in Münster, practitioners will show how SMEs run open-source models on their own infrastructure while ensuring data sovereignty. You can find an overview of all speakers here.
Any company that wants to build an AI system that can access its own documents or knowledge bases needs an embedding model. BGE-M3 from the Beijing Academy of Artificial Intelligence (BAAI) is one of the most powerful multilingual embedding models and is licensed under the MIT license.
In addition to English, BGE-M3 supports German and is therefore suitable for companies in the DACH region that work with mixed-language databases. It converts texts into numerical vectors so that semantically similar passages can be found even when they use different words or languages. This provides the technical basis for AI to access corporate knowledge in a targeted way.
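The retrieval step behind such a document search can be sketched in a few lines: every document is stored as a vector, and a question is answered by finding the vectors closest to the question's vector, here via cosine similarity. This is a minimal sketch under stated assumptions: the three-dimensional vectors and file names are hypothetical toy values; in a real system an embedding model such as BGE-M3 would produce the vectors (with far more dimensions), and a vector database would usually replace the plain dictionary.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs whose vectors are closest to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy index: 3-dim vectors stand in for real embeddings.
# A mixed-language corpus works because a multilingual model maps
# similar meanings to nearby vectors regardless of language.
index = {
    "urlaubsantrag.pdf": [0.9, 0.1, 0.0],
    "travel_policy.docx": [0.8, 0.3, 0.1],
    "server_manual.txt": [0.0, 0.2, 0.95],
}
query = [0.9, 0.15, 0.0]  # toy vector for a question like "How do I request leave?"

for doc_id, score in top_k(query, index):
    print(f"{doc_id}: {score:.3f}")
```

In a RAG setup, the top-ranked passages would then be handed to a language model as context for its answer.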
OpenAI is primarily known for ChatGPT, but has also published an open-source model with Whisper, which is of immediate benefit to many companies. Whisper transcribes spoken language in over 90 languages and is licensed under the MIT license. Self-hosting keeps all audio data in the company, which can be relevant for meeting minutes, compliance-relevant recordings or the processing of customer calls.
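A minimal sketch of self-hosted transcription for meeting minutes follows. It assumes the open-source `openai-whisper` package and ffmpeg are installed locally; `whisper.load_model` and `model.transcribe` come from that package, whose result is a dict with a `"segments"` list carrying `start` and `text` fields. The helper names `transcribe_locally`, `fmt_ts` and `format_minutes` and the file name `meeting.mp3` are hypothetical.

```python
def transcribe_locally(audio_path, model_name="base"):
    """Run Whisper on local hardware so the audio never leaves the company.

    Assumes `pip install openai-whisper` and an ffmpeg installation.
    """
    import whisper  # imported lazily so the helpers below work without it
    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)

def fmt_ts(seconds):
    """Format seconds as HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def format_minutes(result):
    """Turn a Whisper-style result dict into timestamped transcript lines."""
    lines = []
    for seg in result.get("segments", []):
        lines.append(f"[{fmt_ts(seg['start'])}] {seg['text'].strip()}")
    return "\n".join(lines)

# Usage (requires the whisper package and an audio file):
# print(format_minutes(transcribe_locally("meeting.mp3")))
```

Larger Whisper models generally transcribe more accurately but need more memory and time, so the model size is a trade-off to test on the company's own hardware.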
For many teams, automatic transcription of meetings is the easiest way to get started productively using AI.
Stable Diffusion XL is one of the most widely used models for AI-based image generation. The model is released under the CreativeML Open RAIL++-M License, which generally allows commercial use but contains certain usage restrictions. It is therefore not open source in the traditional sense, but an open model with high practical relevance.
Stable Diffusion XL is particularly interesting for companies in marketing and product visualization. Questions about copyright and trademark risks should be clarified in advance, as the legal framework for image AI is still in flux. An overview of the current AI landscape shows which AI tools are available on the market overall.
Anyone who wants to experiment with semantic search or RAG systems will almost inevitably come across all-MiniLM-L6-v2. The model from the Sentence-Transformers project is one of the most frequently used classic embedding models and runs reliably even on moderate hardware. It is licensed under the Apache 2.0 license and is suitable as a baseline for initial experiments before switching to specialized models such as BGE-M3 later.
In addition to the models mentioned, there are three other names that come up in any discussion about open AI models and that decision makers should know — even if, strictly speaking, they do not fall under the open-source definition.
Llama (for example in version 3.1) is one of the most widely used open language models worldwide. However, Llama is released under its own community license, which requires companies whose products exceed 700 million monthly active users to request a separate license from Meta. The OSI expressly does not classify Llama as open source.
Google's Gemma models are marketed as open models, but are subject to their own terms of use, which do not fully comply with classic open-source freedoms. They can still be used for many business scenarios — but the license conditions should be checked before productive use.
DeepSeek attracted attention with the R1 model series because the models were able to keep up with significantly more expensive proprietary systems in certain reasoning benchmarks. However, Chinese AI models bring with them additional questions, including data processing and regulatory aspects, which companies in the DACH region should clarify before deployment.
Anyone using a model under Apache 2.0 or MIT has significantly more legal leeway than with a model with a community license. For legal departments and procurement, it makes a significant difference whether a model uses permissive open source licenses or is subject to its own terms of use.
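One way procurement could operationalize this distinction is a simple triage rule: permissive licenses go through a standard check, anything with its own terms goes to full legal review. The sketch below is hypothetical; the license entries mirror what this article states, the review tracks are invented for illustration, and the actual license texts should always be checked before productive use.

```python
# License data as stated in this article; verify the current license text before use
MODEL_LICENSES = {
    "Mixtral": "Apache-2.0",
    "OLMo 2": "Apache-2.0",
    "all-MiniLM-L6-v2": "Apache-2.0",
    "Phi": "MIT",
    "BGE-M3": "MIT",
    "Whisper": "MIT",
    "Stable Diffusion XL": "CreativeML Open RAIL++-M",
    "Llama 3.1": "Llama 3.1 Community License",
}

# Permissive open-source licenses named in the article
PERMISSIVE = {"Apache-2.0", "MIT"}

def review_track(model):
    """Hypothetical triage rule for procurement, not legal advice."""
    license_id = MODEL_LICENSES.get(model)
    if license_id is None:
        # e.g. model families whose sizes use different licenses
        return "unknown: clarify license first"
    if license_id in PERMISSIVE:
        return "standard review"
    return "full legal review"

if __name__ == "__main__":
    for name in ["Mixtral", "Stable Diffusion XL", "Qwen 2.5"]:
        print(name, "->", review_track(name))
```

Qwen 2.5 is deliberately left out of the table because, as noted above, its license depends on the model size.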
Choosing the right open-source AI model depends on four practical questions.
Tools such as Ollama or vLLM make getting started with self-hosting much easier today than it was two years ago.
Local AI infrastructure, edge AI and hybrid architectures: at d:u26, data experts share their experiences from specific projects. Find out why d:u26 is the right event for you and your team.
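As an illustration of how little code self-hosting requires, the sketch below sends a prompt to a locally running Ollama server using only the Python standard library. It assumes Ollama's default endpoint `http://localhost:11434/api/generate` and a model pulled beforehand with `ollama pull mixtral`; the helper names `build_request` and `ask_local_model` are hypothetical. vLLM instead exposes an OpenAI-compatible HTTP API, so the request shape would differ there.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumption: default port, no proxy)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="mixtral"):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_local_model(prompt, model="mixtral"):
    """Send the prompt and return the generated text; data stays on the machine."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server with the model pulled):
# print(ask_local_model("Summarize our travel policy in three bullet points."))
```

Because the request never leaves `localhost`, prompts and internal documents stay on company hardware, which is the main argument for self-hosting made throughout this article.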
Open-source AI models have reached a level in 2026 that not only enables productive use by SMEs but also makes them the most economically and strategically viable option in many scenarios. The models are powerful, the open-source licenses are mostly business-friendly and the toolchains for deployment and operation are mature.
The key difference from proprietary AI solutions lies in control. With open source, companies decide for themselves where their data is processed, which models are used and how to adapt them to their own requirements.
Getting started requires only manageable effort: a pilot with Whisper for meeting transcriptions, an initial document search with BGE-M3 or an internal chatbot based on Mixtral. Each of these steps provides concrete insights into what open-source AI can do in the company's own context.
You can find out how other SMEs have integrated open-source AI models into their processes at the data:unplugged festival 2026 on March 26 & 27 in Münster. On the Mittelstand Blazers Stage, companies share their experiences. The master classes get specific: Which open-source AI models are suitable for getting started? How does self-hosting work in medium-sized companies?
Open-source AI affects not only the IT department but all areas, from marketing to product development to the legal department. data:unplugged stands for practical, cross-departmental knowledge transfer that benefits the entire team. Get a group ticket for your entire business team now!