Python on an M1 chip: Running smoothly using Docker

14.2.2022 | 6 minutes reading time

I have been working as a data scientist at codecentric for several years, so my language of choice is Python and I use it in several projects every day. Last year, I got pretty excited about the announcement of the new versions of the Apple M1 chip because they promised much higher performance. Usually, I don’t need to run long neural network trainings on my laptop. But for small experiments, and of course for debugging, I hoped to save a lot of time. Last December, I was privileged enough to choose a new business laptop, so I took the opportunity to get a 16-inch MacBook Pro with the M1 Pro chip. The setup went smoothly until I wanted to run my Python projects. Then I ran into …

Problems

Apple’s M1 chip is built on the ARM architecture, in contrast to the Intel x86-64 (x64) chips used in earlier MacBooks. On the one hand, this made Apple independent from Intel, so they could design their own chip. On the other hand, all software either needs to be emulated (with Rosetta 2) or recompiled for arm64 (M1) instead of x86-64 … Unfortunately, recompilation often does not work out of the box, and code changes are needed. Although Python is an interpreted language, this also applies to it because the interpreter is written in C. Furthermore, many major packages like NumPy and pandas use C/C++ extensions for better performance. In short: with pip install, I was not able to get a running environment with all necessary packages installed. An alternative is Miniforge, a variant of Conda. But this approach did not give me the reproducible environments I needed and therefore was not an option.
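
To see which architecture a given Python interpreter actually runs on (native arm64, or x86-64 under emulation), a quick check like the following can help. This is a minimal sketch: the function name and the labels are my own, and the only library call it relies on is platform.machine() from the standard library.

```python
import platform


def describe_architecture(machine: str) -> str:
    """Map a raw machine string to a short human-readable label."""
    normalized = machine.lower()
    if normalized in ("arm64", "aarch64"):
        return "ARM 64-bit (native on Apple silicon)"
    if normalized in ("x86_64", "amd64"):
        return "Intel/AMD 64-bit (emulated when run on Apple silicon)"
    return f"other ({machine})"


if __name__ == "__main__":
    # platform.machine() reports the architecture the running interpreter sees.
    print(describe_architecture(platform.machine()))
```

Running the same script on the host and inside a container is an easy way to confirm which of the two worlds a given shell lives in.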

Goal

In a Python project with multiple people involved, it is crucial that the software environment is consistent across different platforms and systems. One way to achieve this is a package manager like Poetry (https://python-poetry.org/) for Python. It stores all package dependencies and their exact versions in files tracked with Git, which makes it possible to rerun the installation anywhere and get the same environment. Besides reproducibility, another mandatory feature is the ability to debug code easily. Everyone who develops on a regular basis knows how much time it saves to inspect variables and behavior step by step in an interactive manner. And since this is already possible with IDEs like PyCharm or VS Code, I didn’t want to miss out on this feature when changing to a new architecture.
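
As an illustration of what "the same environment" means in practice, the following sketch compares a set of pinned versions against what is installed in the current interpreter. It is not part of Poetry: the function name and the example pins are hypothetical, and in a real project the pins would come from poetry.lock. It only uses importlib.metadata from the standard library (Python 3.8+).

```python
from importlib import metadata
from typing import Dict, List


def find_mismatches(pinned: Dict[str, str]) -> List[str]:
    """Return one message per package whose installed version differs from its pin."""
    problems = []
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, expected {wanted}")
    return problems


if __name__ == "__main__":
    # Hypothetical pins for illustration only.
    example_pins = {"numpy": "1.22.2", "pandas": "1.4.1"}
    for line in find_mismatches(example_pins):
        print(line)
```

An empty result means the interpreter matches the pins exactly, which is the state every team member should end up in.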

What didn’t work

The first thing I tried was installing Miniforge and running poetry install, which installs the specified dependencies into a virtualenv. Every time a package couldn’t be installed via pip (which Poetry uses in the background), I tried to install it with conda install. This soon became very complicated, and because of package version restrictions I abandoned this approach. My second attempt was running the terminal in x64 emulation with Rosetta 2. The idea was to install only x64 Python packages this way. Unfortunately, I didn’t find out how to set the compiler correctly, and it was unclear to me which compilers and which version of Homebrew were being used. Not seeing any way to succeed here, I looked for a different approach.

Final approach

The final approach, and the one that led to success, was using Docker and running the required environments as containers. Each container is emulated as x64, which makes it possible to install every package just as on earlier MacBooks. All commands and steps are provided in the Git repository: https://github.com/JohnDenis/py-poetry-m1 . For brevity, I will only refer to Makefile targets in the description; each target stands for the corresponding command in the Makefile. You can find it at the end of this post or in the GitHub repository. Additionally, Docker Desktop must be installed to get everything running.

As a first step, run a container from your desired Python base image (here 3.8): make run_raw
Now you are in a shell inside the container and can install whatever is necessary. The project directory is mounted in the container, so changes made via Poetry are stored directly in the correct pyproject.toml and poetry.lock files. For the first installation, a script is provided which is triggered by make install.
After the initial installation, the goal is to persist the current container as an image so you can use it for every run of your code. Open a second terminal without stopping the container and run make commit_raw there.
Now you can use the m1-built:latest image in combination with your favorite IDE to run and debug your Python scripts. Or you can run them from a plain shell by starting a container with a specific command.
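
The steps above can be sketched as plain docker invocations. Note that this is my own approximation, not the content of the repository's Makefile (which may use docker-compose instead): the container name m1-raw, the python:3.8 image tag, the mount target /opt/project, and the main.py entry point are assumptions made for illustration. The flags used (--platform, --name, -v, -w, -it) and docker commit are standard Docker CLI features.

```python
import shlex
from typing import List

# Illustrative assumptions, not taken from the repository's Makefile.
BASE_IMAGE = "python:3.8"
CONTAINER_NAME = "m1-raw"
PROJECT_MOUNT = "/opt/project"
# Image tag as named in the article.
BUILT_IMAGE = "m1-built:latest"


def run_raw_command(project_dir: str) -> List[str]:
    """Start an interactive x64 container with the project directory mounted."""
    return [
        "docker", "run", "-it",
        "--platform", "linux/amd64",           # force x64 emulation on Apple silicon
        "--name", CONTAINER_NAME,
        "-v", f"{project_dir}:{PROJECT_MOUNT}",
        "-w", PROJECT_MOUNT,
        BASE_IMAGE, "bash",
    ]


def commit_command() -> List[str]:
    """Persist the running container's file system as a reusable image."""
    return ["docker", "commit", CONTAINER_NAME, BUILT_IMAGE]


def run_script_command(project_dir: str, script: str = "main.py") -> List[str]:
    """Run a single script in a fresh, disposable container of the built image."""
    return [
        "docker", "run", "--rm",
        "--platform", "linux/amd64",
        "-v", f"{project_dir}:{PROJECT_MOUNT}",
        "-w", PROJECT_MOUNT,
        BUILT_IMAGE, "python", script,
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed first.
    for command in (run_raw_command("."), commit_command(), run_script_command(".")):
        print(shlex.join(command))
```

Building the argument lists separately from executing them keeps the sketch testable; passing one of the lists to subprocess.run would actually execute it.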

Updating the environment

In some projects it happens rarely, in others more often: you need to change the packages in your environment. To save time, you don’t want to start from scratch and reinstall all packages with every change. This is why this solution includes a way of updating the built Docker image:

Start a Docker container from m1-built:latest by running make run_built.
Interact with the Python environment as desired (e.g. poetry add, poetry update).
Open a second terminal without stopping the current container and run make commit_raw there.
Now a new version is stored under the m1-built:latest tag.

Flexibility of the solution

This solution works not only for Poetry-based environments, but for all environments where packages are installed via pip. The version of your Python environment can be selected in the docker-compose.yml (e.g. 3.8, 3.9), and everything else can be done on the command line once you are connected to the container.

Debugging with PyCharm

With the Docker image m1-built:latest being committed, PyCharm offers a convenient way to run and debug your project scripts.

Add a new interpreter (Project Settings -> Python Interpreter).
Add a path mapping from your project root to /opt/project.
Add a run configuration for the main.py file.

With these steps you create a run configuration that works the same way as in a normal environment.

Remarks on TensorFlow

The only package for which the original PyPI wheels do not work is TensorFlow. The reason seems to be that the AVX instructions they rely on for speed-ups are not available under emulation. Unfortunately, the workaround isn’t as clean as the plain solution, but currently I don’t know a different one:

Follow the instructions above and create your environment in a container.
Compile or download a version of TensorFlow where the AVX instructions are deactivated.
Replace the original installations with pip install /path/to/tensorflow_wheel.
Go ahead and commit the container.
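
To check whether the CPU visible inside the container advertises AVX at all, one option is to look at the feature flags in /proc/cpuinfo, which Linux exposes on x86 systems. The following is a small sketch with helper names of my own choosing; it only parses text and makes no claim about how TensorFlow itself detects AVX.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    """True if the plain 'avx' flag is present (avx2 alone does not count)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except FileNotFoundError:
        # macOS hosts have no /proc/cpuinfo; run this inside the Linux container.
        print("No /proc/cpuinfo found on this system.")
    else:
        print("AVX advertised:", has_avx(text))
```

If the script reports that AVX is not advertised, a standard TensorFlow wheel is likely to fail on import, and a build without AVX is needed as described in the steps above.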

Outlook

As long as not all Python packages are compatible with Apple M1 silicon, this solution gives you a practical way to run any Python environment on your Apple computer. And even once it is possible to install all packages via pip, I would recommend sticking to this approach because it keeps the software stack encapsulated and reproducible, which saves a lot of time and nerves in the long run. One further enhancement is building the Docker image in a CI pipeline. Then not every team member needs to carry out the described steps; instead, everyone can pull the image from your private container registry right away.

Blog author

Denis Stalz-John

Machine Learning Specialist
