In the context of the Cloud Native Applications (CNA) Seed Project we are migrating an open source CRM application to the cloud. After enabling horizontal scalability of the original application and moving it onto our OpenStack cloud with the help of CoreOS (etcd, fleet) and Docker, we have just finished adding the monitoring, logging and log-collection functionality (a blog post describing this process in detail will follow), which is needed for the next part of the project: enabling automatic scaling. Along the way we learnt some lessons about process management in Docker containers, which we’d like to share in this post.
Containers and Processes
If you use Docker to run an application (e.g. a webserver), you just write a Dockerfile that installs the webserver and defines the command Docker should run when the container starts. So far there is no problem: if you stop the container, the process is stopped too, and if the process in the container exits, the container stops as well. This gives a 1:1 mapping: one container starts and stops one process. Docker containers are intended to be used like that.
Pseudo-Dockerfile of a webserver:
    # Use base image
    FROM ubuntu:14.04

    # install the service (e.g. webserver)
    RUN apt-get update && apt-get install -y webserver

    # add some configuration/application
    ADD webserver_config.cfg /etc/webserver/webserver_config.cfg
    ADD webserver_app /app

    # process executed when container starts (starting webserver)
    CMD /bin/start-webserver --foreground
Problems arise
But very soon you start adding more things to your container, and very likely you start more processes inside it. An easy way to achieve this is a bash script that starts all the processes you need:
Pseudo-Dockerfile:
    [...]
    # install the service (e.g. webserver)
    RUN apt-get update && apt-get install -y webserver
    # install more needed applications
    RUN apt-get install -y debugging-tool monitoring-tool service-discovery
    [...]
    ADD start.sh /start.sh
    # process executed when container starts (start-script)
    CMD /start.sh
Start-script (start.sh):
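    #!/bin/bash
    # start all helper processes and the webserver in the background
    /bin/monitoring-tool &
    /bin/service-discovery &
    /bin/start-webserver &
    /bin/debugging-tool &
    # keep the script (and therefore the container) alive
    tail -f /var/log/webserver_logfile.log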
This works well, although the last line of the script, which tails a logfile, is a bit hacky and may make you feel uncomfortable. The line is needed to keep the script running and prevent it from exiting. The script must not terminate because it is the process started when you run the Docker container. If you omit the last line, the container stops right after starting all the processes, and none of them keeps running because the container they live in has stopped!
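To make the lifecycle coupling explicit, here is a minimal Python sketch of a launcher that ties its own lifetime to its children instead of tailing a logfile. It is not part of our project; the function name `run_until_first_exit` is purely illustrative. It starts every command, waits until the first one exits, stops the rest and returns that exit code, so the container would stop as soon as any process dies.

```python
import subprocess
import time


def run_until_first_exit(commands, poll_interval=0.1):
    """Start every command, block until one exits, then stop the rest.

    Returns the exit code of the first process that terminated.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Stop whatever is still running so nothing outlives the launcher.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
```

Used as the container command, such a launcher would be called with the command lines of the webserver and its helper tools. It still restarts nothing and handles no logs, which is where the remaining problems come from.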
Another problem is logging. With this simple script only the webserver’s log is written to stdout. That means that if you issue the ‘docker logs’ command, only the webserver’s logfile is shown. It would be much more convenient if this command let you read the logs of all the processes inside the container.
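What "all the logs on stdout" means can be sketched in a few lines of Python. This is an illustration under our own assumptions, not what any particular tool does internally: the names `pump` and `run_with_merged_logs` are made up, and each child's stdout and stderr are simply copied line by line to one stream with the process name as a prefix.

```python
import subprocess
import sys
import threading


def pump(name, stream, sink, lock):
    """Copy each line of a child's output to the sink, tagged with its name."""
    for line in stream:
        with lock:
            sink.write("[%s] %s" % (name, line))
            sink.flush()


def run_with_merged_logs(named_commands, sink=None):
    """Run all commands and interleave their output on a single stream."""
    sink = sink if sink is not None else sys.stdout
    lock = threading.Lock()
    procs, threads = [], []
    for name, cmd in named_commands.items():
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        thread = threading.Thread(target=pump, args=(name, proc.stdout, sink, lock))
        thread.start()
        procs.append(proc)
        threads.append(thread)
    # The pump threads finish when their child closes its output.
    for thread in threads:
        thread.join()
    return [proc.wait() for proc in procs]
```

If a script like this were the container's main process, ‘docker logs’ would show the prefixed output of every child, because everything ends up on the main process's stdout.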
A typical use case is running an application together with dynamic configuration management. That means the application runs inside the container together with another process that is responsible for keeping the service running with the latest configuration. This could be a tool like confd that watches some keys in a distributed key-value store like etcd. If a change of the desired configuration is detected, the tool has to take care of restarting the service (e.g. apache) inside the container. If the service is the process the container was started with, this is not possible, because stopping that process from within the container stops the container.
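The following Python sketch shows why the service has to be a child of some longer-lived process for this to work. It is a simplified stand-in, not how confd is implemented: `ManagedService`, `watch` and the `fetch_config` callable are hypothetical names, and `fetch_config` represents whatever reads the watched keys from the key-value store.

```python
import subprocess
import time


class ManagedService:
    """Wraps a child process so it can be restarted without ending the parent."""

    def __init__(self, command):
        self.command = command
        self.proc = None
        self.restarts = 0

    def start(self):
        self.proc = subprocess.Popen(self.command)

    def stop(self):
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            self.proc.wait()

    def restart(self):
        self.stop()
        self.start()
        self.restarts += 1


def watch(service, fetch_config, rounds, interval=0.0):
    """Poll fetch_config() and restart the service whenever the value changes."""
    current = fetch_config()
    service.start()
    for _ in range(rounds):
        time.sleep(interval)
        latest = fetch_config()
        if latest != current:
            current = latest
            service.restart()
    service.stop()
    return current
```

Because the watcher, not the service, is the long-running process, restarting the service does not end the container. A real watcher would loop forever instead of taking a `rounds` argument, which is only there to keep the sketch finite.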
You could set up your apache as a daemon using an init system like Upstart or systemd. This solves one problem but introduces even more, because you would be using the container like a virtual machine. What would happen if the apache service in the container fails? The container would still be running and you wouldn’t notice that the apache process isn’t running anymore.
Using process supervision – supervisord
The solution is relatively easy: use an init process for the container that handles the lifecycle of the other processes you want to run in it. Applications that can do this are called process supervision tools, for example s6, runit or supervisord.
All you need to do now is install one of these tools in the Docker image, write the configuration (basically a definition of which processes you want to start) and finally set the supervision tool as the init process in the Dockerfile.
For Supervisord this can be done as follows:
Install supervisord (in Ubuntu based image):
Dockerfile:
    [...]
    RUN apt-get install -yq supervisor
    [...]
Write the configuration (/etc/supervisor/conf.d/webserver_supervisord.conf):
    [program:webserver]
    command=/bin/start-webserver
    numprocs=1
    stdout_logfile=/dev/fd/1
    stdout_logfile_maxbytes=0

    [program:monitoring-tool]
    command=/bin/start-monitoring
    numprocs=1
    stdout_logfile=/dev/fd/1
    stdout_logfile_maxbytes=0

    [program:service-discovery]
    command=/bin/start-servicediscovery
    numprocs=1
    stdout_logfile=/dev/fd/1
    stdout_logfile_maxbytes=0

    [program:debugging-tool]
    command=/bin/start-debugging
    numprocs=1
    stdout_logfile=/dev/fd/1
    stdout_logfile_maxbytes=0
Set supervisor as init process:
Dockerfile:
    [...]
    # -n keeps supervisord in the foreground so the container keeps running
    CMD ["/usr/bin/supervisord", "-n"]
    [...]
Supervisord is started as the initial process when the container is launched. It then starts and supervises the processes you defined in the configuration file. The stdout_logfile lines in the configuration tell supervisord to log to stdout, so the output of all processes in the container can be shown by ‘docker logs’. Supervisord offers many more options. For example, it can be configured to restart a process if it fails.
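Since the four program sections above differ only in name and command, they could also be generated. The sketch below is one possible way to do that with Python's standard `configparser`; the helper name `render_supervisor_conf` is ours, and we add supervisord's `autorestart` option as an example of the restart behaviour mentioned above.

```python
import configparser
import io


def render_supervisor_conf(programs):
    """Build [program:x] sections that log to stdout and restart on failure."""
    # interpolation=None so that a '%' in a command is written literally
    parser = configparser.ConfigParser(interpolation=None)
    for name, command in programs.items():
        parser["program:" + name] = {
            "command": command,
            "numprocs": "1",
            "autorestart": "true",
            "stdout_logfile": "/dev/fd/1",
            "stdout_logfile_maxbytes": "0",
        }
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

Calling it with a mapping such as `{"webserver": "/bin/start-webserver"}` returns text that can be written to a file under /etc/supervisor/conf.d/ while building the image.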
From within the container you can now talk to supervisord with the supervisorctl client. This way you can easily start, stop and restart the processes defined in the configuration file, very much like with Upstart or systemd. For example, to start the debugging-tool the command would be:
$ supervisorctl start debugging-tool
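The same operations are available programmatically through supervisord's XML-RPC interface. The sketch below assumes that an `[inet_http_server]` section listening on port 9001 has been added to the supervisord configuration, which is not part of the setup described in this post; `restart_process` and `connect` are illustrative helper names.

```python
from xmlrpc.client import ServerProxy


def restart_process(rpc, name):
    """Stop (if running) and start a supervised process; return its new state."""
    info = rpc.supervisor.getProcessInfo(name)
    if info["statename"] == "RUNNING":
        rpc.supervisor.stopProcess(name)
    rpc.supervisor.startProcess(name)
    return rpc.supervisor.getProcessInfo(name)["statename"]


def connect(url="http://localhost:9001/RPC2"):
    # Assumption: supervisord exposes its HTTP server on localhost:9001.
    return ServerProxy(url)
```

With such a server enabled, `restart_process(connect(), "debugging-tool")` would do roughly what `supervisorctl restart debugging-tool` does.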
The eternal process
There is still a problem with this approach. Supervisord keeps running no matter what happens to the processes it supervises. For your container’s life cycle that means that as long as supervisord is running, your container is running. This may not be what you want, and it can confuse other applications that monitor the containers’ status as a kind of health management.
For instance, if you use fleet to start Docker containers in a cluster environment, this can really break the expected behavior. A fleet service can be restarted automatically if it fails, so a container started as a fleet service would be restarted if the container stops. But with supervisord the container would run forever, even if the webserver inside the container had crashed. Fortunately it is possible to shut down supervisord when a process exits, but this requires writing an additional script and extending the configuration with an event listener.
The event listener is called whenever an event it subscribes to occurs. In this example it subscribes to PROCESS_STATE_FATAL, so when the webserver process enters the FATAL state (supervisord has given up trying to start it), the kill_supervisor.py script is triggered.
Configuration file (/etc/supervisor/conf.d/webserver_supervisord.conf):
    [...]
    [eventlistener:webserver_exit]
    command=/usr/bin/kill_supervisor.py
    process_name=webserver
    events=PROCESS_STATE_FATAL
    [...]
kill_supervisor.py script:
    #!/usr/bin/env python
    import os
    import signal
    import sys


    def write_stdout(s):
        # stdout is reserved for the event listener protocol
        sys.stdout.write(s)
        sys.stdout.flush()


    def write_stderr(s):
        # diagnostics go to stderr so they do not disturb the protocol
        sys.stderr.write(s)
        sys.stderr.flush()


    def main():
        while True:
            # tell supervisord that we are ready to receive an event
            write_stdout('READY\n')
            # blocks until supervisord sends an event header
            line = sys.stdin.readline()
            write_stderr('This line kills supervisor: ' + line)
            try:
                with open('/var/run/supervisord.pid', 'r') as pidfile:
                    pid = int(pidfile.readline())
                os.kill(pid, signal.SIGQUIT)
            except Exception as e:
                write_stderr('Could not kill supervisor: ' + str(e) + '\n')
            # acknowledge the event
            write_stdout('RESULT 2\nOK')


    if __name__ == '__main__':
        main()
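The kill_supervisor.py script above reacts to the first event line it receives without looking at its content. As far as we understand the event listener protocol, a listener is notified about every supervised process that enters a subscribed state, so a listener that should react only to the webserver has to inspect the event itself. The sketch below shows how the header line and payload could be parsed. The function names are illustrative, and it assumes ASCII content so that the `len` value (given in bytes) equals the number of characters to read.

```python
def parse_tokens(text):
    """Turn 'key:value key:value' into a dict."""
    return dict(token.split(":", 1) for token in text.split())


def read_event(stream):
    """Read one event (header line plus payload) from a text stream."""
    headers = parse_tokens(stream.readline())
    # The header announces how long the payload that follows is.
    payload = parse_tokens(stream.read(int(headers["len"])))
    return headers, payload


def should_shut_down(headers, payload, watched="webserver"):
    """True only when the watched process has entered the FATAL state."""
    return (
        headers.get("eventname") == "PROCESS_STATE_FATAL"
        and payload.get("processname") == watched
    )
```

In the listener loop, `read_event(sys.stdin)` would replace the bare `readline()` call, and the SIGQUIT would be sent only if `should_shut_down(...)` returns true.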
Conclusion
Process supervision tools like supervisord simplify process handling in Docker containers, especially if you need to run multiple processes or restart processes inside containers.
Supervisord is one of these process supervision tools. It is easy to install and configure, and it provides many features. More advanced tasks, such as stopping the container as soon as a supervised process stops, involve writing additional scripts for event handling.
Word of caution: supervisord does not solve every process management problem in Docker containers. One that remains is the PID 1 zombie reaping problem: usually an init system is responsible for cleaning up zombie processes, and supervisord does not do this. To tackle this problem, you can use one of the other supervision tools mentioned above or a base image that ships a proper init process.
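For readers unfamiliar with zombie reaping, the core of what an init process does can be sketched in a few lines of Python (POSIX only). This is a simplified illustration, not the implementation of any particular init system, and the function name `reap_zombies` is ours: it collects the exit status of every child that has already terminated, which is what removes the zombie entries from the process table.

```python
import os


def reap_zombies():
    """Collect every child that has already exited; return their PIDs."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call non-blocking.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A process running as PID 1 would call something like this periodically or whenever it receives SIGCHLD, because orphaned processes inside the container are re-parented to it.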
Sources
Links
- Cloud-Native Applications Initiative Page
- CNA Seed Project: Migration Process Part 1
- CNA Seed Project: Github Repository