Managing hosts in a running OpenStack environment

How does one remove a faulty/un/re-provisioned physical machine from the list of managed physical nodes in OpenStack nova? Recently we had to remove a compute node in our cluster for management reasons (read, it went dead on us). But nova perpetually maintains the host entry hoping at some point in time, it will come back online and start reporting its willingness to host new jobs.

Normally, things will not break if you simply leave the dead node entry in place. But it will mess up the overall view of the cluster if you wish to do some capacity planning. The resources once reported by the dead node will continue to show up in the statistics and things will look all ”blue” when in fact they should be ”red”.

There is no straight forward command to fix this problem, so here is a quick and dirty fix.

  1. log on as administrator on the controller node
  2. locate the nova configuration file, typically found at /etc/nova/nova.conf
  3. location the ”connection” parameter – this will tell you the database nova service uses

Depending on whether the database is mysql or sqlite endpoint, modify your queries. The one shown next are for mysql endpoint.

# mysql -u root
mysql> use nova;
mysql> show tables;

The tables of interest to us are ”compute_nodes” and ”services”. Next find the ”host” entry of the dead node from ”services” table.

mysql> select * from services;
+---------------------+---------------------+------------+----+-------------------+------------------+-------------+--------------+----------+---------+-----------------+
| created_at          | updated_at          | deleted_at | id | host              | binary           | topic       | report_count | disabled | deleted | disabled_reason |
+---------------------+---------------------+------------+----+-------------------+------------------+-------------+--------------+----------+---------+-----------------+
| 2013-11-15 14:25:48 | 2014-04-29 06:20:10 | NULL       |  1 | stable-controller | nova-consoleauth | consoleauth |      1421475 |        0 |       0 | NULL            |
| 2013-11-15 14:25:49 | 2014-04-29 06:20:05 | NULL       |  2 | stable-controller | nova-scheduler   | scheduler   |      1421421 |        0 |       0 | NULL            |
| 2013-11-15 14:25:49 | 2014-04-29 06:20:06 | NULL       |  3 | stable-controller | nova-conductor   | conductor   |      1422189 |        0 |       0 | NULL            |
| 2013-11-15 14:25:52 | 2014-04-29 06:20:05 | NULL       |  4 | stable-compute-1  | nova-compute     | compute     |      1393171 |        0 |       0 | NULL            |
| 2013-11-15 14:25:54 | 2014-04-29 06:20:06 | NULL       |  5 | stable-compute-2  | nova-compute     | compute     |      1393167 |        0 |       0 | NULL            |
| 2013-11-15 14:25:56 | 2014-04-29 06:20:05 | NULL       |  6 | stable-compute-4  | nova-compute     | compute     |      1392495 |        0 |       0 | NULL            |
| 2013-11-15 14:26:34 | 2013-11-15 15:06:09 | NULL       |  7 | 002590628c0c      | nova-compute     | compute     |          219 |        0 |       0 | NULL            |
| 2013-11-15 14:27:14 | 2014-04-29 06:20:10 | NULL       |  8 | stable-controller | nova-cert        | cert        |      1421467 |        0 |       0 | NULL            |
| 2013-11-15 15:48:53 | 2014-04-29 06:20:05 | NULL       |  9 | stable-compute-3  | nova-compute     | compute     |      1392736 |        0 |       0 | NULL            |
+---------------------+---------------------+------------+----+-------------------+------------------+-------------+--------------+----------+---------+-----------------+

The output for one of our test cloud is shown above, clearly the node that we want to remove is ”002590628c0c”. Note down the corresponding id for the erring host entry. This ”id” value will be used for ”service_id” in the following queries. Modify the example case with your own specific data. It is important that you first remove the corresponding entry from the ”compute_nodes” table and then in the ”services” table, otherwise due to foreign_key dependencies, the deletion will fail.

mysql> delete from compute_nodes where service_id=7;
mysql> delete from services where host='002590628c0c';

Change the values above with corresponding values in your case. Voila! The erring compute entries are gone in the dashboard view and also from the resource consumed metrics.

One thought on “Managing hosts in a running OpenStack environment

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>