Migration of Ceilometer Energy Consumption Data from Havana/MySQL to Icehouse/MongoDB

We are working on upgrading some of our servers from Havana to Icehouse. One of the reasons for doing this is that the performance of Ceilometer on Havana is underwhelming. Part of the move involves moving to a mongodb backend (default in Mirantis Openstack 5.0) but we also want to keep our energy consumption data from our Havana system. Consequently, we have a data migration issue from Havana/mysql to Icehouse/mongo. Here, we describe how we did this migration. (Note that as this is just a small experimental system, we don’t have important user and account data to transfer).

While there is lots of information out there on moving from mysql to mongo the Havana to Icehouse move also brings specific issues with changes in data structures which had to be taken into account here. The data structures are shown below:
mysql-mongo

In havana/mysql, the energy consumption data is captured in three tables: the resource, the source association and the meter. The sample data appears in the meter table. In icehouse/mongo, the source association table has been dropped and there are only two tables – the resources and the meters; again the sample data is stored in the meter table. The resource table has been extended slightly to include more information describing the meter (the unit it uses and the meter type) and the names of some of the fields have been modified a little. The source – which was previously in a separate sourceassoc – has now been captured in both the resource and meter tables.

While the differences are not large, migrating the data was not as simple as exporting a csv and importing it into mongo; we wrote a short python script to make the data transformations necessary for this import. We used the MySQLdb, pymongo and json libraries.

First, we had to import relevant libraries and set up the database connection.


#!/usr/bin/python
import MySQLdb
import MySQLdb.cursors
import pymongo
from pymongo import MongoClient
import json
import datetime
connection = MySQLdb.connect (host = "host.ip.address", user = "user", passwd = "passw", db = "ceilometer", cursorclass = MySQLdb.cursors.DictCursor)

Then, we have to extract the resource information from the mysql database. We need to extract that subset of the resources that are connected to energy meters via the source association and then we need to change the names of the fields in each of the records – id needs to be changed to _id, source_id is changed to source, resource_metadata is changed to metadata and change the metadata from string type to dictionary.(Note that in our system, we had three energy meters named ‘node-41.domain.tld’, ‘node-42.domain.tld’ and ‘node-43.domain.tld’). Also, here we add a source field which comes from the sourceassoc directly to the resource record (in the field source).


cursor = connection.cursor()
cursor.execute ("select distinct r.id, r.resource_metadata, r.project_id, r.user_id, s.source_id from resource as r inner join sourceassoc as s on s.resource_id = r.id where  r.id = 'node-41.domain.tld'  or r.id = 'node-42.domain.tld' or r.id = 'node-43.domain.tld'")
resource_data = cursor.fetchall()
cursor.close()

for resource in resource_data:
    resource['_id'] = resource.pop('id')
    resource['source'] = resource.pop('source_id')
    resource['metadata'] = resource.pop('resource_metadata')
    resource['metadata'] = json.loads(resource['metadata'])

Next, we need to add basic meter information to the resource table. This had to be extracted from the havana/mysql meter tables and inserted into the icehouse/mongo resource tables. It involved obtaining the set of distinct counter names, units, types and resource id from havana/mysql and matching these with resource ids in icehouse/mongo.


cursor = connection.cursor()
# assumes that the counter_name, counter_unit and counter_type are the
# same in all samples for a particular meter
cursor.execute ("select distinct counter_name, counter_unit, counter_type, resource.id from meter, resource where resource.id = meter.resource_id and meter.counter_name = 'energy'")
meter_resource_data = cursor.fetchall()
cursor.close()

try:
    for meter_resource in meter_resource_data:
        for resource in resource_data:
            if resource['_id'] == meter_resource['id'] :
                del meter_resource['id']
                meter_list = []
                meter_list.append(meter_resource)
                resource['meter'] = meter_list
                break
except:
    print 'Error iterating over resource samples' 

Having made the modifications to the resource table, we can now write it to the mongo database.


client = MongoClient()
# the database name is usually ceilometer, so the next line usually looks like
# db = client.ceilometer
db = client.databasename
# the collection name is usually resource, so the next line usually looks like
# collection = db.resource
collection = db.collectioname
for row in resource_data:
    collection.insert(row)

Only a couple of small modifications are required to the meter table – we need to remove the id and modify the timestamp as the timestamp in the havana/mysql model is represented as standard unix time, but the icehouse/mongo timestamp is represented using an ISO format. The code below does this.


cursor = connection.cursor()
cursor.execute ("select m.*, s.source_id from meter as m inner join sourceassoc as s on s.meter_id = m.id and m.counter_name = 'energy';")
meters = cursor.fetchall()
connection.close()
try:
    for meter in meters:
        meter['resource_metadata'] = json.loads(meter['resource_metadata'])
        meter['source'] = meter.pop('source_id')
        meter['timestamp'] = datetime.datetime.fromtimestamp(meter['timestamp'])
        meter.pop('id')
except:
    print 'Error iterating over meter samples'
# and then we put it into the mongo db
# the collection name meter is assumed below - this is default in ceilometer
collection_meter = db.meter
for row in meters:
    collection_meter.insert(row)

Note: Initial tests indicate that Icehouse/mongo is much faster than Havana/mysql, the difference between the responses was approximately 30 sec (45 sec in mysql and 16 in mongo).

The full code is below in case you’d like to copy and paste.


#!/usr/bin/python
#-*- coding: utf-8 -*-
import MySQLdb
import MySQLdb.cursors
import pymongo
from pymongo import MongoClient
import json
import datetime

connection = MySQLdb.connect(host = "host", user = "user", passwd = "passw", db = "ceilometer", cursorclass = MySQLdb.cursors.DictCursor)
cursor = connection.cursor()
cursor.execute ("select distinct r.id, r.resource_metadata, r.project_id, r.user_id, s.source_id from resource as r inner join sourceassoc as s on s.resource_id = r.id where  r.id = 'node-41.domain.tld'  or r.id = 'node-42.domain.tld' or r.id = 'node-43.domain.tld'")
resource_data = cursor.fetchall()
cursor.close()

for resource in resource_data:
    resource['_id'] = resource.pop('id')
    resource['source'] = resource.pop('source_id')
    resource['metadata'] = resource.pop('resource_metadata')
    resource['metadata'] = json.loads(resource['metadata'])

cursor = connection.cursor()
cursor.execute ("select distinct counter_name, counter_unit, counter_type, resource.id from meter, resource where resource.id = meter.resource_id")
meter_resource_data = cursor.fetchall()
cursor.close()

try:
    for meter_resource in meter_resource_data:
        for resource in resource_data:
            if resource['_id'] == meter_resource['id'] :
                del meter_resource['id']
                meter_list = []
                meter_list.append(meter_resource)
                resource['meter'] = meter_list
                break
except:
    print 'Error iterating over resource samples'

cursor = connection.cursor()
cursor.execute ("select m.*, s.source_id from meter as m inner join sourceassoc as s on s.meter_id = m.id and m.counter_name = 'energy';")
meters = cursor.fetchall()
connection.close()

try:
    for meter in meters:
        meter['resource_metadata'] = json.loads(meter['resource_metadata'])
        meter['source'] = meter.pop('source_id')
        meter['timestamp'] = datetime.datetime.fromtimestamp(meter['timestamp'])
        meter.pop('id')
except:
    print 'Error iterating over meter samples'

client = MongoClient()
db = client.ceilometer

collection_resource = db.resource
for row in resource_data:
    collection_resource.insert(row)

collection_meter = db.meter
for row in meters:
    collection_meter.insert(row)

Thanks Seán for all the help.

If you’re interested in this topic, you may be interested in some of our other blog posts:


Leave a Reply

Your email address will not be published. Required fields are marked *