To solve our performance problems with Gnocchi and the whole OpenStack telemetry stack, we first tried Gnocchi with Ceph as the storage backend, starting with OpenStack-Ansible Newton. The experience wasn't good. Sooner or later we experienced slow requests and stuck PGs in our Ceph cluster. In one case, only deleting the Gnocchi pool saved our cluster.
As a result, we switched back to MongoDB as the storage backend for Ceilometer. It was not performing well, but at least it did not put our whole storage cluster at risk.
This still left us with our performance problems, but then we stumbled upon published performance tests for Gnocchi, one of them done by Julien Danjou, the developer of Gnocchi. They got us thinking about what had gone wrong with our setup.
So with OpenStack-Ansible Pike and a new cloud, we gave Gnocchi another try. After our earlier experience with Gnocchi and Ceph, we didn't want to take the published performance tests at face value, and as every setup is a bit different, we set up a simple performance test of our own: we started 700 VMs over time and then got a cup of coffee. OK, more than one cup. After some days we ran into the same Ceph problems we already knew, with more and more slow requests.
As we use OpenStack-Ansible for our cloud with a three-controller setup, we deployed Gnocchi on each of our controllers. The OpenStack-Ansible defaults use file as the storage backend and MySQL as the coordination backend. We changed the storage backend to Ceph and kept the rest of the default settings:
gnocchi_storage_driver: ceph
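Under the hood this switches Gnocchi's [storage] section to the Ceph driver. If you need to pin the pool or the Ceph client Gnocchi should use, that can be done through the same override mechanism OpenStack-Ansible provides. A minimal sketch, assuming a pool and cephx user both called gnocchi (adjust these to your cluster):

gnocchi_conf_overrides:
  storage:
    driver: ceph
    ceph_pool: gnocchi        # assumed pool name, must exist in your Ceph cluster
    ceph_username: gnocchi    # assumed cephx user with access to that pool
    ceph_conffile: /etc/ceph/ceph.conf

Note that gnocchi_conf_overrides is a single dictionary, so if you use it for other sections as well (as we do further down for the incoming driver), merge everything into one definition.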
MySQL is not a recommended coordination backend according to tooz (https://docs.openstack.org/tooz/latest/user/drivers.html), so we used Zookeeper instead. As OpenStack-Ansible cannot ship a role for everything, we had to integrate the Zookeeper role (https://github.com/openstack/ansible-role-zookeeper.git) into our setup ourselves:
conf.d:

zookeeper_hosts:
{% for server in groups['control_nodes'] %}
  {{ server }}:
    ip: {{ hostvars[server]['ansible_default_ipv4']['address'] }}
{% endfor %}
env.d:

component_skel:
  zookeeper_server:
    belongs_to:
      - zookeeper_all

container_skel:
  zookeeper_container:
    belongs_to:
      - infra_containers
      - shared-infra_containers
    contains:
      - zookeeper_server
    properties:
      service_name: zookeeper
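OpenStack-Ansible itself does not ship a playbook that applies this role, so you need a small one of your own targeting the new zookeeper_all group (after the containers have been created, e.g. with the lxc-containers-create.yml playbook). A minimal sketch, assuming the role has been checked out under the name ansible-role-zookeeper:

---
# playbook to apply the external Zookeeper role to the new containers
- name: Install Zookeeper
  hosts: zookeeper_all
  user: root
  roles:
    - role: "ansible-role-zookeeper"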
Now we could set up Zookeeper as the coordination backend for Gnocchi:
gnocchi_coordination_url: "zookeeper://{% for host in groups['zookeeper_all'] %}{{ hostvars[host]['container_address'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}"

gnocchi_pip_packages:
  - cryptography
  - redis
  - gnocchiclient
  # this is what we want:
  # - "gnocchi[mysql,ceph,ceph_alternative_lib,redis]"
  # but as there is no librados >= 12.2 pip package, we have to install gnocchi without the alternative lib support first.
  # After adding the Ceph repo to the gnocchi container, python-rados >= 12.2.0 is installed and linked automatically,
  # and gnocchi will pick up the features present in the used rados lib.
  - "gnocchi[mysql,ceph,redis]"
  - keystonemiddleware
  - python-memcached
  # additional pip packages needed for the zookeeper coordination backend
  - tooz
  - lz4
  - kazoo
A word of caution: the name of the Ceph alternative lib implementation (ceph_alternative_lib) varies between Gnocchi versions.
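Just to illustrate what the Jinja expression in the overrides above renders to: with three controllers, the coordination URL is a plain comma-separated Zookeeper connection string, for example (container addresses are placeholders):

gnocchi_coordination_url: "zookeeper://172.29.236.11:2181,172.29.236.12:2181,172.29.236.13:2181"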
Using Zookeeper as the coordination backend helps distribute the work across all metric processors (metricd) on all controllers.
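How much each controller can take on also depends on the number of metricd workers it runs. If you want to tune that, a hedged sketch of an override (the worker count is just an example, size it to your CPUs, and merge it into the same gnocchi_conf_overrides dictionary):

gnocchi_conf_overrides:
  metricd:
    workers: 4  # example value, adjust to the cores you can spare per controller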
But that didn't solve our problem either; the bottleneck still seemed to be our Ceph cluster. Searching the web, we found plenty of bug tickets showing that other people had run into the same problem, and those tickets put us on the right track: newer versions of Gnocchi can split their storage, so you can use one driver for incoming, short-lived measures and a different one for long-term storage.
The next step was to set up the storage layer for our incoming data. We chose Redis, as recommended, from the list of supported incoming backends. To set up the Redis cluster we used an existing Ansible role. Next, we had to configure Gnocchi via OpenStack-Ansible to use the Redis cluster as incoming storage:
gnocchi_conf_overrides:
  incoming:
    driver: redis
    redis_url: redis://{{ hostvars[groups['redis-master'][0]]['ansible_default_ipv4']['address'] }}:{{ hostvars[groups['redis-master'][0]]['redis_sentinel_port'] }}?sentinel=master01{% for host in groups['redis-slave'] %}&sentinel_fallback={{ hostvars[host]['ansible_default_ipv4']['address'] }}:{{ hostvars[host]['redis_sentinel_port'] }}{% endfor %}

gnocchi_distro_packages:
  - apache2
  - apache2-utils
  - libapache2-mod-wsgi
  - git
  - build-essential
  - python-dev
  - libpq-dev
  - python-rados
  # additional package for the python redis client
  - python-redis
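Rendered, the redis_url is simply a tooz Redis Sentinel URL pointing at the master and its fallbacks. Purely as an illustration, with placeholder addresses and the default Sentinel port, it looks roughly like this:

redis_url: redis://172.29.236.11:26379?sentinel=master01&sentinel_fallback=172.29.236.12:26379&sentinel_fallback=172.29.236.13:26379

Once this is deployed, the gnocchiclient's gnocchi status command is a handy way to keep an eye on the backlog of measures waiting to be processed.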
We ran our performance test again and, eureka, no more slow requests in Ceph. The test again covered 700 VMs with one vCPU and one GB of RAM each; we weren't interested in the VMs themselves, only in the telemetry data they generate. We assume it will take some time for our cloud to grow beyond 700 VMs. In the meantime our setup will keep evolving: our Ceph cluster currently only has SSD journals and no SSD storage, Gnocchi will improve, and so will our knowledge about Gnocchi and Ceph. So we expect the current setup to cope with the upcoming load, which gives us enough time to experiment with more hints from this talk while aiming for 10,000 VMs. We hope this article helps other people integrate Gnocchi into their OpenStack setup.