On January 27, 2025, the technology stock market experienced an unexpected crash: the NVIDIA share price plummeted by over 17%, temporarily wiping out nearly $600 billion in market value, the largest single-day loss of market capitalization in stock market history. Many other US tech companies also faced massive losses that day; in total, about $1 trillion in market capitalization evaporated within hours [1]. The trigger? The release of DeepSeek-R1, an open-source language model by the Chinese AI startup DeepSeek, sent the entire industry into a frenzy. According to DeepSeek's claims, this language model not only rivals OpenAI-o1, OpenAI's flagship model, but was also trained cost-effectively with comparatively little hardware and is, in contrast to proprietary models like those from OpenAI, a true "open AI".
In this article, we will take a closer look at what makes DeepSeek-R1 so special, how it was trained, and why it is equally interesting for developers and companies. We will explain the technology behind the model and its impact on the fiercely competitive AI market, describe our practical experiences with it, and take a look at future implications.
Disclaimer: This article was originally written in German. Only LLMs (mainly DeepSeek-R1 and a bit of GPT-4o) were used for the translation 😉
DeepSeek Logo
What is DeepSeek-R1?
DeepSeek-R1 is an open-source Large Language Model (LLM), released under the MIT license, which allows free use for both private and commercial purposes. Like OpenAI-o1, DeepSeek-R1 is a "reasoning" model: it is trained to arrive at the correct answer through a Chain-of-Thought (CoT) process. Specifically, it not only generates the answer to the user's request but also describes its path to that answer. This significantly improves response quality, especially for mathematical problem solving or programming tasks, compared to traditional LLMs. However, this ability comes with drawbacks: training and inference become more expensive and resource-intensive. In the case of o1, this means up to 30 times longer generation times compared to GPT-4o and significantly higher token costs. DeepSeek-R1 challenges this paradigm by delivering comparable performance at far lower reported costs.
Brief Technical Overview
Overview of the training pipeline of DeepSeek-R1-Zero and DeepSeek-R1
The DeepSeek team describes the training process of DeepSeek-R1 in their paper [2]. Initially, a model called DeepSeek-R1-Zero was trained through Reinforcement Learning (RL) directly on the pre-trained base model DeepSeek-V3-Base, without any Supervised Fine-Tuning (SFT). The training process began with a simple template prompting the base model to generate a thought process followed by a final answer. This template specified only a structural format, without any content-specific guidelines, so that the model's natural development during the RL process could be observed. During training, the model showed continuous performance improvement: it learned to spend more time on complex thought processes and to reconsider its approach, resulting in increasingly longer responses. However, several issues were identified during the training of DeepSeek-R1-Zero, ultimately leading to the development of DeepSeek-R1. Two major problems were the poor readability of the generated content and the frequent mixing of different languages.
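To make the structural template concrete, here is a minimal Python sketch of how such a prompt might be assembled. The wording is paraphrased from the paper [2]; the exact template text, tag names, and spacing shown here are assumptions for illustration, not the verbatim training template.

```python
# Sketch of the structural training template described in the paper [2]
# (paraphrased; see the paper for the exact wording). It prescribes only
# the format -- think first, then answer -- not the content.
TEMPLATE = (
    "A conversation between User and Assistant. The User asks a question, "
    "and the Assistant solves it. The Assistant first thinks about the "
    "reasoning process and then provides the answer. The reasoning process "
    "and answer are enclosed within <think></think> and <answer></answer> "
    "tags, respectively. User: {prompt}. Assistant:"
)

def build_prompt(user_question: str) -> str:
    """Fill the template with a user question; the content itself is unconstrained."""
    return TEMPLATE.format(prompt=user_question)

print(build_prompt("What is 7 * 8?"))
```

During RL training, the model's completion of this prompt is rewarded both for producing the correct answer and for adhering to the tag structure, which is what lets the reasoning behaviour emerge without content-specific supervision.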
The researchers then developed a new model to address these issues. They began training with a "Cold Start" phase, using a small amount of high-quality Chain-of-Thought data to fine-tune the model. This approach aimed to bypass the initial unstable phase of RL training, drastically improving the readability of responses compared to the DeepSeek-R1-Zero model. After fine-tuning, the model was trained through the same large-scale RL process as DeepSeek-R1-Zero. With each subsequent SFT and RL step, the reasoning capabilities and response quality were further improved. Additionally, smaller Llama and Qwen models were fine-tuned using DeepSeek-R1 to teach them the reasoning abilities of DeepSeek-R1 and thus improve those models.
Performance and Training Cost
DeepSeek-R1 demonstrated very impressive performance in benchmark tests. In text comprehension and education-oriented knowledge benchmarks, it significantly outperformed earlier versions like DeepSeek-V3 and was only slightly behind, or even on par with, OpenAI-o1. In mathematical tasks, DeepSeek-R1 achieved scores surpassing those of o1, and it showed similarly strong results on programming benchmarks.
Benchmark results of DeepSeek-R1 compared to OpenAI-o1 and others [2]
Particularly noteworthy is the development efficiency: According to DeepSeek, the model was developed within only two months and with a budget of under $10 million, a fraction of the billions invested by competitors like OpenAI. If these numbers hold true, they challenge the assumption that AI advancement requires exorbitant investments, potentially disrupting the revenue models of chip giants like NVIDIA.
Self-Hosting and API
The open-source DeepSeek-R1 model can be downloaded and used free of charge. Both the large 671B model and various distilled Llama and Qwen versions, ranging from 1.5B to 70B parameters, are available. These models can be run on suitable hardware with tools like Ollama or vLLM. Anyone with sufficient computing power can run the 671B model on their own or rented hardware and thereby operate a personal state-of-the-art LLM with full control over it. Alternatively, DeepSeek-R1 can be used via the API provided by DeepSeek, and DeepSeek also offers a chat on their company website. In both cases, whether using the API or the web chat, all interactions run through the AI startup's servers in China, which should be considered with regard to privacy, among other things. The remarkable price difference compared to OpenAI-o1's API should be noted, however: for one million generated output tokens, OpenAI charges nearly 30 times more than DeepSeek.
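When self-hosting, the raw model output contains the reasoning wrapped in `<think>` tags, as in the example at the end of this article. A minimal Python sketch for separating the chain of thought from the final answer might look like this; the tag format is what we observed in the model's output, not a guaranteed contract, so treat it as an assumption:

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate the chain of thought from the final answer in R1 output.

    DeepSeek-R1 wraps its reasoning in <think>...</think>; everything
    after the closing tag is the answer intended for the user.
    """
    match = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

thought, answer = split_reasoning(
    "<think>A lion cannot survive 3 years without food.</think> Exit C."
)
print(answer)  # Exit C.
```

Whether you hide, summarize, or display the reasoning is then a UI decision; for chat applications it is common to show only the answer and make the thought process expandable.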
Pricing Comparison of DeepSeek-R1 and OpenAI-o1 via their respective APIs [3]
Our Experience with DeepSeek-R1
At codecentric, we have already experimented with DeepSeek-R1 and achieved good results overall. For instance, the model successfully solved several complex programming tasks on which even OpenAI-o1 failed, and it consistently handled more complex inquiries without issues. The model's output clearly shows how the reasoning process works, enabling it to answer complex queries correctly. We also noted that although the smaller distilled models had some problems with the German language, the large 671B model was very good at understanding and generating German texts. Additionally, we are integrating a self-hosted DeepSeek-R1 model into our own CompanyGPT solution for use within our company.
The Future of AI Models
This development is not entirely unexpected. Some experts have long anticipated that open-source models would eventually match or surpass the performance and quality of proprietary models like those from OpenAI. There is also the view that pay-as-you-go pricing models, like the paid APIs from OpenAI or the hyperscalers, will not survive in the long term. The release of DeepSeek-R1 and its implications show that not only could these predictions become reality, but also that the US no longer dominates the AI world unchallenged. Of course, OpenAI, Google, and others will not stand still: with Project Stargate and further investments, the United States will do everything it can to stay at the top of the AI market.
Example request with DeepSeek-R1 (671B)
User Input
>>> You're escaping a labyrinth, and there are three exits. Exit A leads to an inferno. Exit B leads to an assassin. Exit C leads to a lion that hasn't eaten in 3 years. Which exit do you pick?
Model Output
<think> Okay, so I need to figure out which exit to choose to escape the labyrinth. There are three exits: A, B, and C. Each one has a different danger. Let me break down each option.

Exit A leads to an inferno. An inferno is like a huge fire, right? So if I go through there, I'd probably be burned alive. That sounds immediately dangerous. Fire spreads quickly, and there's not much you can do once you're in it without proper protection. So maybe Exit A is not the best choice unless there's some hidden trick.

Exit B leads to an assassin. Assassins are trained killers. If I go through here, I might get attacked by someone who's skilled in killing. But maybe there's a chance to negotiate or fight back? It depends on the situation. Assassins might be waiting to ambush, but they are human, so maybe there's a way to outsmart them or find a weakness. Still, facing a professional killer is super risky.

Exit C leads to a lion that hasn't eaten in 3 years. Wait a second, a lion that hasn't eaten in three years? That seems impossible. Lions can't survive that long without food. A lion would die of starvation in a matter of weeks, not years. So if the lion hasn't eaten in three years, it's probably dead. Therefore, Exit C might actually be safe because the lion is no longer alive. That could be a trick question where the danger is already gone.

So putting it all together: A is fire (instant death), B is an assassin (high risk but maybe manageable), and C is a lion that's likely dead. The correct choice would be Exit C because the lion couldn't have survived three years without food. Therefore, the path is safe. </think>

The correct exit to choose is Exit C. The lion is dead, so it’s the safest path. 🔍🦁⚰️
References
[2] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. DeepSeek-AI - https://arxiv.org/pdf/2501.12948
Blog author
Daniel Ladischenski
AI Consultant & Engineer
Do you still have questions? Just send me a message.