Bernard Sonnenschein

2.3.2026

Open-source AI for companies: The 8 most important models at a glance

ChatGPT, Gemini, Claude: The best-known AI tools are proprietary, charge ongoing fees and process company data on external servers. At the same time, a growing open-source AI landscape offers SMEs a strategically useful alternative in many areas. Open-source AI models offer three advantages that are decisive for companies: full control over their own data, no ongoing license fees and the ability to adapt models to their own requirements.

According to an analysis by the Linux Foundation, 89 percent of organizations that use AI already rely on open-source components in their infrastructure, and around 63 percent work directly with open models. Open-source AI is no longer a niche topic but an integral part of modern AI strategies.

This article shows which open-source AI models decision makers should know in 2026, how true open source differs from so-called open-weights models, and which model suits which purpose.

What is open-source AI?

The term open source comes from software development and describes programs whose source code is publicly available: anyone can view, change and share the code. With classic open-source software such as Linux or Firefox, the situation is clear: the source code is published under an open license, and the community can develop it further.

Why AI makes the definition more complicated

Things get more complicated when it comes to artificial intelligence. An AI model consists not only of code, but also of trained weights, training data and a model architecture. At the end of 2024, the Open Source Initiative (OSI) presented the Open Source AI Definition (OSAID), the first formal standard for open-source AI. According to it, an AI system must grant four freedoms: it must be possible to use it for any purpose, study how it works, modify it and share it.

In practice, only a few AI models fully meet this strict definition. For companies, this makes one distinction important, because it directly affects procurement and the legal department.

Permissive open source licenses vs. open weights

Permissive licenses (Apache 2.0, MIT): Allow commercial use, modification and redistribution with almost no restrictions. They offer the greatest flexibility for enterprise use and are generally straightforward from a legal standpoint.
Open-weights/open-model approaches: Often marketed as open source, but they only publish their weights. They come with their own license terms, which may restrict commercial use or attach conditions to it. Meta's Llama and Google's Gemma fall into this category. The license should be checked before use, preferably together with the legal department.

What open-source AI is there?

The open-source AI landscape can be divided into four categories, each relevant to different business applications.

Four categories at a glance

Language models for text and code: The basis for chatbots, text generation, summaries, code assistance and knowledge management. This is the area in which open source is most mature and offers the widest choice of models.
Embedding models: Make texts searchable for machines. They convert documents, emails or knowledge bases into numeric vectors so that AI systems can search them by meaning (semantic search). Neither a RAG (retrieval-augmented generation) system nor an intelligent company search works without embeddings.
Speech-to-text models: Transcribe spoken language into text. Relevant for meeting minutes, call center analyses, compliance documentation, or preparing interviews and podcasts.
Image generation models: Create or edit images based on text descriptions. In a corporate context, they are particularly relevant for marketing, product visualization and internal creative processes.
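To make the embedding idea more concrete, here is a minimal sketch of semantic search in plain Python. It assumes the documents have already been turned into vectors by an embedding model; the three-dimensional vectors and document names below are invented for illustration (real embedding models produce hundreds of dimensions). Documents are ranked by cosine similarity, a common way to compare embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2):
    """Return the top_k (score, doc_id) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would get these from an embedding model.
index = {
    "travel_expense_policy": [0.9, 0.1, 0.0],
    "vacation_request_form": [0.2, 0.9, 0.1],
    "server_maintenance_log": [0.0, 0.1, 0.95],
}

query = [0.85, 0.2, 0.05]  # hypothetical vector for "How do I claim hotel costs?"
for score, doc_id in semantic_search(query, index):
    print(f"{doc_id}: {score:.3f}")
```

The query matches the expense policy even though no keyword is compared: only the direction of the vectors matters, which is what "searching by meaning" amounts to technically.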

Open-source AI models: The 8 most important picks for companies

The following selection is based on five criteria that are decisive for enterprise use: license and commercial usability; ecosystem maturity and availability in common toolchains; operating models such as self-hosting and on-premises; language quality in German and other European languages; and the ratio of computing effort to performance.

Mixtral 8x7B from Mistral AI: The efficiency champion

With Mixtral, Paris-based Mistral AI has created a model with a strong cost-performance ratio. The model uses a Mixture of Experts (MoE) architecture: for each token, a router activates only two of the eight expert modules in each layer instead of the entire network. The result is solid performance at significantly lower computing cost than a dense model of the same total size. Mixtral is released under the Apache 2.0 license and can therefore be used commercially without significant restrictions.
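The routing idea behind Mixture of Experts can be sketched in a few lines. This is a toy illustration, not Mistral AI's implementation: it assumes a hypothetical layer of eight experts, each reduced to a simple scalar function, and invented router scores. For one token, the router keeps the two highest-scoring experts (top-2 routing), evaluates only those, and mixes their outputs with softmax weights computed over the selected scores.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token_value: float, router_logits: list[float], experts, top_k: int = 2):
    """Route one token to the top_k experts and combine their outputs.

    Only the selected experts are evaluated; the others cost no compute.
    """
    # Indices of the top_k highest router scores
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Normalize the router scores over the chosen experts only
    weights = softmax([router_logits[i] for i in chosen])
    output = sum(w * experts[i](token_value) for w, i in zip(weights, chosen))
    return output, chosen


# Eight hypothetical experts; in a real model each one is a feed-forward network.
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # invented scores for one token

out, used = moe_layer(1.0, router_logits, experts)
print(f"experts used: {used}, output: {out:.3f}")
```

Six of the eight experts are never called for this token, which is where the savings in computing effort come from. All eight still have to be held in memory, however.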

Mixtral is particularly relevant for internal chatbots, text generation, code assistance or document analysis in cases where infrastructure costs should remain manageable. The guide to AI implementation in SMEs describes how companies can introduce such AI applications strategically.

Qwen 2.5 from Alibaba Cloud: The all-rounder

Qwen 2.5 is a model family from Alibaba Cloud and one of the most powerful open-source language models on the market. The models come in various sizes, from compact variants to large versions for complex tasks. Most are released under the Apache 2.0 license, but individual model sizes have different license terms.

Qwen 2.5 offers a good balance between performance and resource requirements. Companies that do not immediately need the largest model variant will find a solid start in the smaller versions.

Phi from Microsoft: Small models, big benefits

Microsoft's Phi models belong to the category of small language models (SLMs) and show that size isn't everything. Phi models are significantly more compact than their large counterparts, but deliver good results for many practical tasks. They run on cheaper hardware, sometimes even on end devices, and are licensed under the MIT license.

Phi models are particularly interesting for companies when AI is to run directly on a device (edge computing), for example in field service tools or in production environments without a cloud connection. The blog articles on data security and AI show which data protection issues should be clarified before using AI.

OLMo 2 from AI2: Completely open, fully traceable

OLMo 2 from the Allen Institute for AI (AI2) follows a consistently open approach. AI2 discloses not only the weights, but also the complete training code, training data and evaluation methods. OLMo 2 is therefore one of the few models that actually meets the strict OSAID criteria, and it is licensed under the Apache 2.0 license.

OLMo 2 is a particularly relevant option for organizations that require traceability and transparency, for example in the public sector, in research or in regulated industries.

At d:u27 on 13 & 14 April 2027 in Münster, practitioners will show how SMEs run open-source models on their own infrastructure while ensuring data sovereignty.

BGE-M3 from BAAI: The backbone of intelligent search

Any company that wants an AI system to access its own documents or knowledge bases needs an embedding model. BGE-M3 from the Beijing Academy of Artificial Intelligence (BAAI) is one of the most powerful multilingual embedding models and is licensed under the MIT license.

In addition to English, BGE-M3 also supports German and is therefore suitable for companies in the DACH region that work with mixed-language knowledge bases. It provides the technical foundation for AI to retrieve corporate knowledge in a targeted way.
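Before an embedding model such as BGE-M3 can index corporate documents, long texts are usually split into smaller, overlapping chunks, so that each vector represents one focused passage. The sketch below is a simple word-based variant; the function name and the default chunk size and overlap are illustrative assumptions, and production pipelines often split by tokens, sentences or headings instead.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words, which reduces the risk that
    related content is cut apart at a chunk boundary.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Tiny example: ten "words", chunks of four with one word of overlap
doc = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(doc, chunk_size=4, overlap=1):
    print(chunk)
```

Each resulting chunk would then be passed to the embedding model and stored together with a reference to its source document, so that search results can point back to the original text.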

Whisper from OpenAI: speech transcription that works

OpenAI is best known for ChatGPT, but with Whisper it has also released an open-source model that is of immediate benefit to many companies. Whisper transcribes spoken language in over 90 languages and is licensed under the MIT license. Self-hosting keeps all audio data within the company, which can matter for meeting minutes, compliance-relevant recordings or the processing of customer calls.

For many teams, automatic transcription of meetings is the easiest way to get started productively using AI.
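For the meeting-minutes use case, the open-source `openai-whisper` Python package returns, besides the full text, a list of segments with start and end times in seconds. The sketch below assumes that output shape and turns it into timestamped minutes. The transcription call itself appears only as a comment, because it needs the package, a downloaded model and an audio file (the file name `meeting.mp3` is hypothetical); the sample segments are invented.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_minutes(segments: list[dict]) -> str:
    """Turn Whisper-style segments into one timestamped line per segment."""
    lines = [f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments]
    return "\n".join(lines)


# With openai-whisper installed, the segments would come from a local run, e.g.:
#   import whisper
#   model = whisper.load_model("base")
#   segments = model.transcribe("meeting.mp3")["segments"]
# Invented sample segments in the same shape:
segments = [
    {"start": 0.0, "end": 4.2, "text": " Welcome to the weekly project meeting."},
    {"start": 4.2, "end": 9.8, "text": " First item: the rollout plan for the new CRM."},
    {"start": 3725.5, "end": 3730.0, "text": " Thanks everyone, see you next week."},
]
print(segments_to_minutes(segments))
```

Because both the model and the formatting run locally, the audio and the resulting minutes never leave the company's infrastructure.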

Stable Diffusion XL from Stability AI: images from text

Stable Diffusion XL is one of the most widely used models for AI-based image generation. It is released under the CreativeML Open RAIL++-M license, which generally allows commercial use but contains certain usage restrictions. It is therefore not open source in the traditional sense, but an open model with high practical relevance.

Stable Diffusion XL is particularly interesting for companies in marketing and product visualization. Questions about copyright and trademark risks should be clarified in advance, as the legal framework for image AI is still in flux. An overview of the current AI landscape shows which AI tools are available on the market overall.

all-MiniLM-L6-v2: Getting started with embeddings

Anyone who wants to experiment with semantic search or RAG systems will almost inevitably come across all-MiniLM-L6-v2. The model from the Sentence-Transformers project is one of the most frequently used embedding classics and runs reliably even on modest hardware. It is licensed under the Apache 2.0 license and is suitable as a baseline for initial experiments before switching to specialized models such as BGE-M3 later.

Open weights are not open source: classifying Llama, Gemma and DeepSeek correctly

In addition to the models mentioned, there are three other names that come up in any discussion about open AI models and that decision makers should know — even if, strictly speaking, they do not fall under the open-source definition.

Llama by Meta

Llama (for example in version 3.1) is one of the most widely used open language models worldwide. However, Llama is subject to its own community license, under which companies whose products have more than 700 million monthly active users must request a separate license from Meta. The OSI explicitly does not classify Llama as open source.

Gemma from Google

Google's Gemma models are marketed as open models, but are subject to their own terms of use, which do not fully grant the classic open-source freedoms. They can still be used for many business scenarios, but the license conditions should be checked before production use.

DeepSeek from China

DeepSeek attracted attention with the R1 model series because the models were able to keep up with significantly more expensive proprietary systems in certain reasoning benchmarks. However, Chinese AI models bring with them additional questions, including data processing and regulatory aspects, which companies in the DACH region should clarify before deployment.

Why differentiation is important for companies

Anyone using a model under Apache 2.0 or MIT has significantly more legal leeway than with a model under a community license. For legal departments and procurement, it therefore makes a real difference whether a model uses a permissive open-source license or is subject to its own terms of use.
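This distinction can be encoded as a simple pre-screening step in an internal model catalog. The sketch below is a hypothetical helper: it uses the licenses named in this article (SPDX identifiers for Apache 2.0 and MIT, free-text labels for custom licenses) and only flags which entries need an individual legal review. It is not a substitute for that review, and license terms should be re-checked for the exact model version in use.

```python
PERMISSIVE = {"Apache-2.0", "MIT"}

# License per model as stated in this article; custom licenses are free-text labels.
MODEL_LICENSES = {
    "Mixtral 8x7B": "Apache-2.0",
    "Phi": "MIT",
    "OLMo 2": "Apache-2.0",
    "BGE-M3": "MIT",
    "Whisper": "MIT",
    "all-MiniLM-L6-v2": "Apache-2.0",
    "Stable Diffusion XL": "CreativeML Open RAIL++-M",
    "Llama 3.1": "Llama 3.1 Community License",
    "Gemma": "Gemma Terms of Use",
}


def needs_legal_review(model: str) -> bool:
    """True unless the model's license is a known permissive one.

    Models missing from the catalog also return True, so nothing slips through unchecked.
    """
    return MODEL_LICENSES.get(model) not in PERMISSIVE


for name, license_name in MODEL_LICENSES.items():
    status = "legal review" if needs_legal_review(name) else "permissive"
    print(f"{name:22} {license_name:30} {status}")
```

Treating unknown models as "needs review" is a deliberate design choice: a new model added by a team defaults to the cautious path until someone records its license.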

Decision-making aid: Which model is suitable for which application?

Choosing the right open-source AI model depends on four practical questions.

Four questions about model selection

What exactly should AI do? Mixtral or Qwen 2.5 are suitable for text generation, chatbots and code assistance. Embedding models such as BGE-M3 are needed for company search and RAG. For transcription, Whisper is the most pragmatic solution.
What hardware is available? Large models like Mixtral 8x7B require powerful GPUs. Microsoft's Phi models, on the other hand, also run on consumer hardware or edge devices.
How strict are the data protection and compliance requirements? When data must not leave the company, self-hosting is mandatory. All models mentioned in this article can be operated on-premises.
How important is license clarity? For the most legally secure route, models licensed under Apache 2.0 or MIT are the best choice. For open-weights models such as Llama or Gemma, an individual license check is advisable.
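The hardware question can be roughed out with simple arithmetic: the memory needed just to hold a model's weights is roughly the number of parameters times the bytes stored per parameter. The sketch below uses Mistral AI's published figure of about 46.7 billion total parameters for Mixtral 8x7B (all experts must be loaded, even though only about 12.9 billion are active per token) and Phi-3-mini's 3.8 billion parameters as a small-model example. It deliberately ignores activations, the KV cache and runtime overhead, so real requirements are higher.

```python
# Approximate storage per parameter at common precisions
BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16-bit floating point
    "int8": 1.0,  # 8-bit quantization
    "int4": 0.5,  # 4-bit quantization
}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    """
    return params_billion * BYTES_PER_PARAM[precision]


models = {"Mixtral 8x7B": 46.7, "Phi-3-mini": 3.8}  # total parameters in billions
for name, params in models.items():
    row = ", ".join(f"{p}: {weight_memory_gb(params, p):.1f} GB" for p in BYTES_PER_PARAM)
    print(f"{name}: {row}")
```

At 16-bit precision, Mixtral's weights alone need more than 90 GB, which is why it calls for one or more data-center GPUs, whereas a Phi-sized model fits in a few gigabytes. Quantization narrows this gap but does not close it.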

Tools such as Ollama or vLLM make getting started with self-hosting much easier today than it was two years ago.
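To show how little code a self-hosted model needs, here is a sketch that sends a prompt to a locally running Ollama server through its REST endpoint `/api/generate`, using only Python's standard library. It assumes Ollama is running on its default port 11434 and that a model has been pulled beforehand (for example with `ollama pull mixtral`); the model name and the helper function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        # With stream=False, Ollama answers with a single JSON object
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `generate("mixtral", "Summarize the benefits of self-hosting in two sentences.")` returns the model's answer as a string, and the request never leaves the machine. vLLM follows a similar pattern but exposes an OpenAI-compatible API instead.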

At our events, data experts share experiences from specific projects on local AI infrastructure, edge AI and hybrid architectures.

Conclusion: Open source makes AI controllable

By 2026, open-source AI models have reached a level that not only enables SMEs to use them in production, but in many scenarios also makes them the most economically and strategically viable option. The models are powerful, the open-source licenses are mostly business-friendly, and the toolchains for deployment and operation are mature.

The key difference to proprietary AI solutions lies in control. With open source, companies decide for themselves where their data is processed, which models are used and how to adapt them to their own requirements.

Getting started requires only manageable effort: a pilot with Whisper for meeting transcriptions, an initial document search with BGE-M3, or an internal chatbot based on Mixtral. Each of these steps provides concrete insight into what open-source AI can do in a company's own context.

Anyone looking to roll out open-source AI inside their own company will find the direct exchange they need at the d:u27 Festival on 13 & 14 April 2027 in Münster. Six stages, 350+ speakers, 80+ masterclasses and over 250 exhibitors, covering data platforms, self-hosting and AI governance — with IT leads and AI heads who've already worked through the rough patches. Get your ticket for d:u27 now.
