Nagios OpenStack Installer – Automated monitoring of your OpenStack VMs

There are many tools available for monitoring the operation of the OpenStack infrastructure, but as an OpenStack user you might not be interested in monitoring OpenStack itself. Your primary interest is likely the operation of the VMs hosted on OpenStack. Nagios OpenStack Installer is a tool for exactly that purpose: it uses a Nagios VM inside the OpenStack environment and configures it to monitor all VMs that you own.

Nagios OpenStack Installer configures your OpenStack monitoring environment remotely from your desktop PC or laptop. In order to use it, you need to fulfil the following prerequisites.

  • You must have an SSH key for securely accessing the Nagios VM and the VMs you own, and you must know the SSH credentials to access the VMs.
  • You must know your OpenStack user account (name and id), your OpenStack password, the OpenStack Keystone authentication URL and the OpenStack tenant (“project”) (name and id) you work with.
  • You must be able to create a VM that serves as Nagios VM and you must own a publicly available IP (“floating IP”) to make the Nagios dashboard accessible to the outside world.
  • Nagios OpenStack Installer is a Python tool and requires some Python packages. Make sure to install Python 2.7 on your desktop. Additionally, you need the following packages:
    • pip: The package manager used to install Python packages from the PyPI repository (Windows users should refer to the pip developers’ “get pip” manual to install pip; Cygwin users can follow the guidelines in the atbrox blog).
    • fabric: This package is used to access OpenStack VMs via SSH and remotely execute tasks on the VMs.
    • python-keystoneclient: To access the OpenStack Keystone API and authenticate to your OpenStack environment.
    • python-novaclient: To manage VMs which are hosted on OpenStack.
    • cuisine: This is a configuration management tool and lightweight alternative to configuration managers like Puppet or Chef. cuisine is required to manage the packages and configuration files on the Nagios VM and the monitored VMs.
    • pickle: An object serialization module (part of the Python standard library) that can store objects and their current state in a file dump. Object serialization is used to store and retrieve the list of VMs which should be monitored.
    • We recommend using pip to install the required packages, since pip automatically installs package dependencies (see the example command after this list).
  • You must have Git downloaded and installed.
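
For example, the third-party packages can be installed in one go with pip (pickle is part of the Python 2.7 standard library and needs no separate installation); the PyPI package names below are assumed to match the packages listed above:

    $ pip install fabric python-keystoneclient python-novaclient cuisine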

After installing the prerequisites on your local PC or laptop, you can use Nagios OpenStack Installer by performing the following steps.

  1. Create a new directory and clone the Nagios OpenStack Installer Github repository in it.
    $ git clone https://github.com/icclab/kobe6661-nagios-openstack-installer.git
  2. Edit the credentials in install_autoconfig.py, remote.py, remote_server_config.py and vm_list_extractor.py to match your OpenStack and SSH credentials.
  3. Run remote_server_config.py from the Python console. This installs and configures the Nagios server on your Nagios VM. After installation you should be able to access the Nagios dashboard by pointing your web browser to “http://<your_nagios_public_ip>/nagios” and providing your Nagios login credentials.
  4. Run vm_list_extractor.py from the Python console. This will extract the list of VMs on OpenStack that should be monitored and save the list as a pickle file dump on your computer.
  5. Run install_autoconfig.py from the Python console. This will upload the Python scripts required to automatically update the Nagios configuration when the OpenStack VM environment changes (nagios_config_updater.py, config_transporter.py, config_generator.py, vm_list_extractor.py). Additionally, it will run these scripts on the Nagios VM to let Nagios capture the VMs which should be monitored, install and run the required Nagios and NRPE plugins on these VMs, and reconfigure and restart the Nagios server to monitor these VMs remotely. A summary of the commands for steps 3–5 is shown after this list.
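
Assuming the scripts are executed with the Python interpreter from the cloned repository directory (after the credentials have been edited in step 2), steps 3 to 5 boil down to the following commands:

    $ python remote_server_config.py
    $ python vm_list_extractor.py
    $ python install_autoconfig.py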

Now the Nagios environment is installed and you should be able to monitor your VMs. Nagios OpenStack Installer is available on ICCLab’s Github repository. Feel free to try it out and give feedback about future improvements.

Deploy Ceph and start using it: end to end tutorial – Installation (part 1/3)

Ceph is one of the most interesting distributed storage systems available, with very active development and a complete set of features that make it a valuable candidate for cloud storage services. This tutorial goes through the steps (and some related troubleshooting) required to set up a Ceph cluster and access it with a simple client using librados. Please refer to the Ceph documentation for detailed insights on Ceph components.

(Part 2/3 – Troubleshooting – Part 3/3 – librados client)

Assumptions

  • Ceph version: 0.79
  • Installation with ceph-deploy
  • Operating system for the Ceph nodes: Ubuntu 14.04

Cluster architecture

In a minimal deployment, a Ceph cluster includes one Ceph monitor (MON) and a number of Object Storage Devices (OSDs).

Administrative and control operations are issued from an admin node, which does not necessarily have to be separate from the Ceph cluster (e.g., the monitor node can also act as the admin node). Metadata server nodes (MDS) are required only for the Ceph Filesystem (Ceph Block Devices and Ceph Object Storage do not use MDS).
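
For reference, the example cluster used in the remainder of this tutorial consists of three nodes; the IP addresses match the /etc/hosts example further below, and mon0 doubles as the admin node:

    mon0    192.168.58.2    monitor (and admin) node
    osd0    192.168.58.3    OSD node
    osd1    192.168.58.4    OSD node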

Preparing the storage

WARNING: preparing the storage for Ceph means deleting a disk’s partition table and losing all its data. Proceed only if you know exactly what you are doing!

Ceph will need some physical storage to be used as Object Storage Devices (OSDs) and Journal. As the project documentation recommends, for better performance the Journal should be on a separate drive from the OSD. Ceph supports ext4, btrfs and xfs. I tried setting up clusters with both btrfs and xfs; however, I could achieve stable results only with xfs, so I will refer to the latter.

  1. Prepare a GPT partition table (I have observed stability issues when using a DOS partition table)
    $ sudo parted /dev/sd<x>
    (parted) mklabel gpt
    (parted) mkpart primary xfs 0 100%
    (parted) quit

    If parted complains about alignment issues (“Warning: The resulting partition is not properly aligned for best performance”), check these two links to find a solution: 1 and 2.

  2. Format the partition with xfs (you might need to install the xfs tools with sudo apt-get install xfsprogs)
    $ sudo mkfs.xfs /dev/sd<x>1
  3. Create a Journal partition (raw/unformatted)
    $ sudo parted /dev/sd<y>
    (parted) mklabel gpt
    (parted) mkpart primary 0 100%

Install ceph-deploy

The ceph-deploy tool must only be installed on the admin node. Access to the other nodes for configuration purposes will be handled by ceph-deploy over SSH (with keys).

  1. Add the Ceph repository to your apt configuration, replacing {ceph-stable-release} with the name of the Ceph release that you want to install (e.g., emperor, firefly, …)
    $ echo deb http://ceph.com/debian-{ceph-stable-release}/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
  2. Install the trusted key with
    $ wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
  3. If there is no repository for your Ubuntu version, you can try to select the newest one available by manually editing the file /etc/apt/sources.list.d/ceph.list and changing the Ubuntu codename (e.g., trusty -> raring)
    deb http://ceph.com/debian-emperor raring main
  4. Install ceph-deploy
    $ sudo apt-get update
    $ sudo apt-get install ceph-deploy

Setup the admin node

Each Ceph node will be set up with a user having passwordless sudo permissions, and each node will store the public key of the admin node to allow passwordless SSH access. With this configuration, ceph-deploy will be able to install and configure every node of the cluster.

NOTE: the hostnames (i.e., the output of hostname -s) must match the Ceph node names!

  1. [optional] Create a dedicated user for cluster administration (this is particularly useful if the admin node is part of the Ceph cluster)
    $ sudo useradd -d /home/cluster-admin -m cluster-admin -s /bin/bash

    then set a password and switch to the new user

    $ sudo passwd cluster-admin
    $ su cluster-admin
  2. Install SSH server on all the cluster nodes (even if a cluster node is also an admin node)
    $ sudo apt-get install openssh-server
  3. Add a ceph user on each Ceph cluster node (even if a cluster node is also an admin node) and give it passwordless sudo permissions
    $ sudo useradd -d /home/ceph -m ceph -s /bin/bash
    $ sudo passwd ceph
    <Enter password>
    $ echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
    $ sudo chmod 0440 /etc/sudoers.d/ceph
  4. Edit the /etc/hosts file to add mappings to the cluster nodes. Example:
    $ cat /etc/hosts
    127.0.0.1       localhost
    192.168.58.2    mon0
    192.168.58.3    osd0
    192.168.58.4    osd1

    To enable DNS resolution with the hosts file, install dnsmasq

    $ sudo apt-get install dnsmasq
  5. Generate an SSH key pair for the admin user and install the public key on every Ceph node
    $ ssh-keygen
    $ ssh-copy-id ceph@mon0
    $ ssh-copy-id ceph@osd0
    $ ssh-copy-id ceph@osd1
  6. Set up an SSH access configuration by editing the .ssh/config file. Example:
    Host osd0
       Hostname osd0
       User ceph
    Host osd1
       Hostname osd1
       User ceph
    Host mon0
       Hostname mon0
       User ceph
  7. Before proceeding, check that ping and host commands work for each node
    $ ping mon0
    $ ping osd0
    ...
    $ host osd0
    $ host osd1

Setup the cluster

Administration of the cluster is done entirely from the admin node.

  1. Move to a dedicated directory to collect the files that ceph-deploy will generate. This will be the working directory for any further use of ceph-deploy
    $ mkdir ceph-cluster
    $ cd ceph-cluster
  2. Deploy the monitor node(s) – replace mon0 with the list of hostnames of the initial monitor nodes
    $ ceph-deploy new mon0
    [ceph_deploy.cli][INFO  ] Invoked (1.4.0): /usr/bin/ceph-deploy new mon0
    [ceph_deploy.new][DEBUG ] Creating new cluster named ceph
    [ceph_deploy.new][DEBUG ] Resolving host mon0
    [ceph_deploy.new][DEBUG ] Monitor mon0 at 192.168.58.2
    [ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
    [ceph_deploy.new][DEBUG ] Monitor initial members are ['mon0']
    [ceph_deploy.new][DEBUG ] Monitor addrs are ['192.168.58.2']
    [ceph_deploy.new][DEBUG ] Creating a random mon key...
    [ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
    [ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
  3. Add a public network entry in the ceph.conf file if you have separate public and cluster networks (check the network configuration reference; a sketch of the resulting file for the example cluster is shown after this list)
    public network = {ip-address}/{netmask}
  4. Install Ceph on all the nodes of the cluster. Use the --no-adjust-repos option if you are using different apt configurations for Ceph. NOTE: you may need to confirm the authenticity of the hosts if you are accessing them over SSH for the first time!
    Example (replace mon0 osd0 osd1 with your node names):

    $ ceph-deploy install --no-adjust-repos mon0 osd0 osd1
  5. Create monitor and gather keys
    $ ceph-deploy mon create-initial
  6. The content of the working directory after this step should look like
    cadm@mon0:~/my-cluster$ ls
    ceph.bootstrap-mds.keyring  ceph.bootstrap-osd.keyring  ceph.client.admin.keyring  ceph.conf  ceph.log  ceph.mon.keyring  release.asc
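
For reference, after adding the public network entry, a ceph.conf for the example cluster of this tutorial would look roughly like the following sketch (the fsid and any additional entries are generated by ceph-deploy new, so your file will differ; the 192.168.58.0/24 network is taken from the /etc/hosts example above):

    [global]
    fsid = <generated by ceph-deploy new>
    mon_initial_members = mon0
    mon_host = 192.168.58.2
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    public network = 192.168.58.0/24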

Prepare OSDs and OSD Daemons

When deploying OSDs, consider that a single node can run multiple OSD daemons and that the journal partition should be on a separate drive from the OSD for better performance.

  1. List disks on a node (replace osd0 with the name of your storage node(s))
    $ ceph-deploy disk list osd0

    This command is also useful for diagnostics: when an OSD is correctly mounted on Ceph, you should see entries similar to this one in the output:

    [ceph-osd1][DEBUG ] /dev/sdb :
    [ceph-osd1][DEBUG ] /dev/sdb1 other, xfs, mounted on /var/lib/ceph/osd/ceph-0
  2. If you haven’t already prepared your storage, or if you want to reformat a partition, use the zap command (WARNING: this will erase the partition)
    $ ceph-deploy disk zap --fs-type xfs osd0:/dev/sd<x>1
  3. Prepare and activate the disks (ceph-deploy also has a create command that should combine these two operations, but for some reason it was not working for me). In this example, we are using /dev/sd<x>1 as OSD and /dev/sd<y>2 as journal on two different nodes, osd0 and osd1
    $ ceph-deploy osd prepare osd0:/dev/sd<x>1:/dev/sd<y>2 osd1:/dev/sd<x>1:/dev/sd<y>2
    $ ceph-deploy osd activate osd0:/dev/sd<x>1:/dev/sd<y>2 osd1:/dev/sd<x>1:/dev/sd<y>2

Final steps

Now we need to copy the cluster configuration to all nodes and check the operational status of our Ceph deployment.

  1. Copy the keys and configuration files to all nodes (replace mon0 osd0 osd1 with the names of your Ceph nodes)
    $ ceph-deploy admin mon0 osd0 osd1
  2. Ensure proper permissions for admin keyring
    $ sudo chmod +r /etc/ceph/ceph.client.admin.keyring
  3. Check the Ceph status and health
    $ ceph health
    $ ceph status

    If, at this point, the reported health of your cluster is HEALTH_OK, then most of the work is done. Otherwise, try to check the troubleshooting part of this tutorial.

Revert installation

There are useful commands to purge the Ceph installation and configuration from every node so that one can start over again from a clean state.

This will remove Ceph configuration and keys

ceph-deploy purgedata {ceph-node} [{ceph-node}]
ceph-deploy forgetkeys

This will also remove Ceph packages

ceph-deploy purge {ceph-node} [{ceph-node}]
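
For the example cluster used in this tutorial, a complete cleanup would therefore be

ceph-deploy purgedata mon0 osd0 osd1
ceph-deploy forgetkeys
ceph-deploy purge mon0 osd0 osd1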

Before getting a healthy Ceph cluster I had to purge and reinstall several times, cycling through the “Setup the cluster”, “Prepare OSDs and OSD Daemons” and “Final steps” parts while resolving every warning that ceph-deploy reported.


Automated Vagrant installation of MySQL HA using DRBD, Corosync and Pacemaker

Fig. 1: Redundant MySQL Server nodes using Pacemaker, Corosync and DRBD.

If automation is required, Vagrant and Puppet seem to be the most suitable tools to implement it. What about automatic installation of High Availability database servers? As part of our Cloud Dependability efforts, the ICCLab works on automatic installation of High Availability systems. One such HA system is a MySQL server combined with DRBD, Corosync and Pacemaker.

In this system, the server logic of the MySQL server runs locally on the different virtual machine nodes, while all database files are stored on a DRBD device that is replicated across the nodes. The DRBD resource is managed by Pacemaker, which uses Corosync as its cluster messaging layer. If one of the nodes fails, Pacemaker automagically restarts the MySQL server on another node and the data is synchronized via the DRBD device. This combined DRBD and Pacemaker approach is best practice in the IT industry.

At ICCLab we have developed an automatic installation script which creates 2 virtual machines and configures MySQL, DRBD, Corosync and Pacemaker on both machines. The automated installation script can be downloaded from Github.
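
A minimal way to try it out, assuming Vagrant and a supported provider such as VirtualBox are installed locally (the repository URL, directory and node names depend on the repository and its Vagrantfile, so the placeholders below have to be replaced accordingly):

    $ git clone <repository-url>
    $ cd <repository-directory>
    $ vagrant up
    $ vagrant ssh <node-name>
    $ sudo crm_mon -1    # print the current Pacemaker resource status once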

OpenStack Grizzly installation for the lazy

As a kind of advertisement for the new OpenStack Grizzly release, we have created an automated single-node OpenStack Grizzly installation which uses Vagrant and Puppet. It can be downloaded from Github using the following URL: https://github.com/kobe6661/vagrant_grizzly_install.git
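
A minimal sequence to bring the single-node installation up, assuming Vagrant and a supported provider such as VirtualBox are already installed locally:

    $ git clone https://github.com/kobe6661/vagrant_grizzly_install.git
    $ cd vagrant_grizzly_install
    $ vagrant up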

Please feel free to install it on your machine and test the new release.