Setting up Live Migration in OpenStack Icehouse [Juno]

[Update 8.12.2014] Since OpenStack’s Juno release hasn’t introduced any changes regarding live migration, Juno users should be able to follow this tutorial just as well as Icehouse users. If you experience any issues, let us know. The same setup can be used for newer versions of QEMU and Libvirt as well; currently we are using QEMU 2.1.5 with Libvirt 1.2.11.

The Green IT theme here in ICCLab is working on monitoring and reducing datacenter energy consumption by leveraging OpenStack’s live migration feature. We had already experimented a little with live migration in the Havana release (mostly with no luck), but since live migration is touted as one of the stable features of the Icehouse release, we decided to investigate how it has evolved. This blog post, largely based on the official OpenStack documentation, provides a step-by-step walkthrough of how to set up and perform virtual machine live migration on servers running the OpenStack Icehouse release with the KVM/QEMU hypervisor and libvirt.

Virtual machine (VM) live migration is the process of moving a VM instance, including its state, memory and emulated devices, from one hypervisor to another with ideally no downtime. It comes in handy in many situations, such as basic system maintenance, VM consolidation, and more complex load management systems designed to reduce data center energy consumption. The following system configuration was used for our testing:

  • 3 nodes: 1 control node (node-1), 2 compute nodes (node-2, node-3)
  • Mirantis Openstack 5.0 (which contains a set of sensible deployment options and the Fuel deployment tool)
  • Openstack Icehouse release
  • Nova 2.18.1
  • QEMU 1.2.1
  • Libvirt 0.10.2

The default OpenStack live migration process requires a shared file system (e.g. NFS, GlusterFS) across the source and destination compute hosts, so that the VM disk is accessible from both hosts. OpenStack also supports "block live migration", where the VM disk is copied over TCP and hence no shared file system is needed. A shared file system usually ensures better migration performance, while the block migration approach provides better security due to file system separation.

[Figure: migration]
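
A minimal sketch of how such shared storage could be provided with NFS (the exporting host, network and options below are only an example; adapt them to your deployment and make sure the instance path is identical on all compute nodes):

# /etc/exports on the NFS server (e.g. the controller node-1)
/var/lib/nova/instances 192.168.0.0/24(rw,sync,no_root_squash)

# on the NFS server: apply the export
$ exportfs -ra

# on every compute node: mount the share at the default instance path
$ mount -t nfs 192.168.0.2:/var/lib/nova/instances /var/lib/nova/instances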

System setup

1. Network configuration

Make sure all hosts (hypervisors) run in the same network/subnet.

1.1. DNS configuration

Check the configuration and consistency of the /etc/hosts file across all hosts.

192.168.0.2     node-1  node-1.domain.tld
192.168.0.3     node-2  node-2.domain.tld
192.168.0.4     node-3  node-3.domain.tld

1.2. Firewall configuration

Configure the /etc/sysconfig/iptables file to allow libvirt to listen on TCP port 16509, and don’t forget to add a rule accepting KVM migration traffic on TCP ports in the range 49152 to 49216.

-A INPUT -p tcp -m multiport --ports 16509 -m comment --comment "libvirt" -j ACCEPT
-A INPUT -p tcp -m multiport --ports 49152:49216 -m comment --comment "migration" -j ACCEPT
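
After editing the rules, reload the firewall so they take effect; a sketch assuming the iptables service of a RHEL/CentOS-based node (adapt the command to your distribution):

$ service iptables restart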

2. Libvirt configuration

Enable the libvirt listen flag in the /etc/sysconfig/libvirtd file.

LIBVIRTD_ARGS="--listen"

Configure the /etc/libvirt/libvirtd.conf file to make the hypervisor listen for TCP connections without authentication. Since authentication is set to "none", it is strongly recommended to use SSH keys for authentication.

listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
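
Restart the libvirt daemon afterwards so it picks up the new listen settings; a sketch assuming a RHEL/CentOS-style service name (on Ubuntu the service is typically called libvirt-bin):

$ service libvirtd restart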

3. Nova configuration

By default, OpenStack does not use a true live migration mechanism, because there is no guarantee that the migration will succeed. An example of a migration that never ends is one in which memory pages are dirtied faster than they can be transferred to the destination host.

To enable true live migration, set the live_migration_flag option in the /etc/nova/nova.conf file as follows:

live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
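
Then restart the nova-compute service on each compute node so the new flag is applied; a sketch assuming the service name used on RHEL/CentOS-based deployments (on Ubuntu it is typically nova-compute):

$ service openstack-nova-compute restart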

Once these settings are configured, it should be possible to perform a live migration.

Live Migration Execution

First, list available VMs:

$ nova list
+-------------------------+------+--------+-------------+---------------+
| ID                      | Name | Status | Power State | Networks      |
+-------------------------+------+--------+-------------+---------------+
| 4cfe0dfb-f28f-43e9-.... | vm   | ACTIVE | Running     | 10.0.0.2      |
+-------------------------+------+--------+-------------+---------------+

Next, show the VM details to determine which host the instance is running on:

$ nova show <VM-ID>
+--------------------------------------+--------------------------------+
| Property                             | Value                          |
+--------------------------------------+--------------------------------+
| OS-EXT-SRV-ATTR:host                 | node-2.domain.tld              |
+--------------------------------------+--------------------------------+

After that, list the available compute hosts and choose the host you want to migrate the instance to:

$ nova host-list
+-------------------+-------------+----------+
| host_name         | service     | zone     |
+-------------------+-------------+----------+
| node-2.domain.tld | compute     | nova     |
| node-3.domain.tld | compute     | nova     |
+-------------------+-------------+----------+

Then, migrate the instance to the new host.

For live migration using shared file system use:

$ nova live-migration <VM-ID> <DEST-HOST-NAME>

For block live migration, use the same command with the block_migrate flag enabled:

$ nova live-migration --block_migrate <VM-ID> <DEST-HOST-NAME>

Finally, show the VM details and check if it has been migrated successfully:

$ nova show <VM-ID>
+--------------------------------------+----------------------------+
| Property                             | Value                      |
+--------------------------------------+----------------------------+
| OS-EXT-SRV-ATTR:host                 | node-3.domain.tld          |
+--------------------------------------+----------------------------+

Note: If you don’t specify the target compute node explicitly, nova-scheduler automatically chooses a suitable one from the available nodes.
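
For example, omitting the destination host should let the scheduler pick one:

$ nova live-migration <VM-ID>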

Congratulations, you’ve just migrated your VM.

[UPDATE]
If you are more interested in live migration performance in OpenStack you can check out our newer blog posts:

 


29 Comments

  • Hi,

    I have tried to set up OpenStack Icehouse with three different nodes, where one node acts as the controller node and the other two nodes act as compute nodes. I have followed the instructions mentioned here to enable live block migration, but I am getting the following error:

    NoLiveMigrationForConfigDriveInLibVirt: Live migration of instances with config drives is not supported in libvirt unless libvirt instance path and drive data is shared across compute nodes.

    Thanks for any kind of suggestion.

    • Hello,

      first of all please note that there are 2 types of live migration in OpenStack: live migration (LM) and block live migration (BLM).
      LM requires a shared file system between the compute nodes, and the instance path should be the same in both cases (the default is /var/lib/nova/instances).
      If you don’t use a shared file system, make sure that you are running the live migration with the --block-migrate parameter (or check the block migration option in Horizon). We also experienced some issues trying to BLOCK live migrate an instance on SHARED storage, so I would suggest trying BLM only on a NON-SHARED file system first.

      Otherwise you might be experiencing the bug described here and can apply released fix – https://bugs.launchpad.net/nova/+bug/1351002/

      • Hi,
        Thanks for replying. I am trying live migration with the block migrate parameter, without sharing any storage, but libvirt is complaining about this "NoLiveMigrationForConfigDriveInLibVirt" error. I am wondering whether there is some configuration missing in libvirt or qemu?

          • Here are our libvirt and QEMU settings:
            /etc/libvirt/libvirtd.conf:
            listen_tls = 0
            listen_tcp = 1
            auth_tcp = "none"

            /etc/libvirt/qemu.conf:
            security_driver = "none"

          Hope it helps.

            • Thanks a lot for the config file, but I already have these parameters. However, I found the problem in nova.conf: the parameter should be "force_config_drive = None".

              I have one more query. As you already did some tests with live migration, I am wondering how you keep track of the total migration time. How is the migration traffic different from other traffic? The log file generates lots of messages, so how can I pick the right time?

              Do you have any scripts for doing the measurements, e.g. for calculating the downtime and total data transferred? If possible, could you please share them with me?

            Thanks again for your cordial response.

          • Hi,
            Thanks for the reply. It was a problem in the nova.conf file; the parameter force_config_drive needs to be false.

            I was wondering how I can keep track of the migration time, as nova produces lots of lines in the log file and it’s hard to pick the correct time?

    • Glad that you’ve solved your problem.

      Concerning the testing scripts: since all these tests were quite complex and not successful in every case, the best (fastest) option was to keep track of the results manually. Here are the methods we used.

      Migration time:
      Migration initiation generates the following record in /var/log/nova-all.log on the DESTINATION host: "Oct 10 12:49:04 node-x nova-nova.virt.libvirt.driver INFO: Instance launched has CPU info:…"
      The END of the migration is recorded in the same file on the SOURCE node by the record "Oct 10 12:49:28 node-x nova-nova.compute.manager INFO: Migrating instance to node-y.domain.tld finished successfully."
      You can get the migration time by simply subtracting these two timestamps, but please note that the "Instance launched" record is generated every time an instance is spawned on the node, not only in the case of a VM migration.
      You can GREP for "Migrating" or "Instance" to get only these records.

      Downtime:
      Downtime is calculated easily: just multiply the number of lost packets by the interval between two consecutive packets.
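
      For example, a rough sketch of this approach, assuming you ping the instance at a fixed interval during the migration (the address and interval are placeholders):

      $ ping -i 0.1 10.0.0.2 | tee ping.log
      # after the migration, read ping's summary line, e.g. "250 packets transmitted, 247 received"
      # downtime ≈ lost packets * interval = (250 - 247) * 0.1 s = 0.3 s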

      Anyway, the idea of having these tests scripted is really tempting and can save some time in our future work. So I guess I’ll code it down. I’ll keep you updated.

  • Hi, thanks for the article, this is helpful. I tried to follow your instructions, but I got a problem; this is the log from /var/log/libvirt/libvirtd.log:
    2014-12-16 04:36:08.505+0000: 1935: info : libvirt version: 1.2.2
    2014-12-16 04:36:08.505+0000: 1935: error : virNetSocketNewConnectTCP:484 : unable to connect to server at 'compute2:16509': No route to host
    2014-12-16 04:36:08.506+0000: 1935: error : doPeer2PeerMigrate:4040 : operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute2/system: unable to connect to server at 'compute2:16509': No route to host

    Can you help me? thanks before.

    • Hi, it seems there is no route to the destination machine. Can you ping the destination host from the source (‘ping compute2’)? If not, check whether /etc/hosts contains a record ‘<ip-address> compute2’ (this file should contain the IPs and hostnames of every host), or try to use the host’s IP address directly instead of the hostname in the configuration files (/etc/nova/nova.conf). Also check the routing configuration on your hosts with the ‘route’ command. Hope it helps.

  • Hi,
    I tried live migration
    I get an error like
    $ nova live-migration bf441c35-d7e4-4ffa-926e-523690bd815d celestial7

    ERROR (ClientException): Live migration of instance bf441c35-d7e4-4ffa-926e-523690bd815d to host celestial7 failed (HTTP 500) (Request-ID: req-69bd07e6-3318-4f50-8663-217b98ba7564)

    Overview of instance :

    Message
    Remote error: libvirtError Requested operation is not valid: no CPU model specified [u’Traceback (most recent call last):\n’, u’ File “/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py”, line 142, in _dispatch_and_reply\n executo
    Code
    500
    Details
    File “/opt/stack/nova/nova/conductor/manager.py”, line 606, in _live_migrate block_migration, disk_over_commit) File “/opt/stack/nova/nova/conductor/tasks/live_migrate.py”, line 194, in execute return task.execute() File “/opt/stack/nova/nova/conductor/tasks/live_migrate.py”, line 62, in execute self._check_requested_destination() File “/opt/stack/nova/nova/conductor/tasks/live_migrate.py”, line 100, in _check_requested_destination self._call_livem_checks_on_host(self.destination) File “/opt/stack/nova/nova/conductor/tasks/live_migrate.py”, line 142, in _call_livem_checks_on_host destination, self.block_migration, self.disk_over_commit) File “/opt/stack/nova/nova/compute/rpcapi.py”, line 391, in check_can_live_migrate_destination disk_over_commit=disk_over_commit) File “/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py”, line 156, in call retry=self.retry) File “/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py”, line 90, in _send timeout=timeout, retry=retry) File “/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py”, line 417, in send retry=retry) File “/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py”, line 408, in _send raise result
    Created
    Feb. 17, 2015, 8:20 a.m.

    Can anybody help me with this.
    Thanks

    • Hello,
      you can try to specify the libvirt_cpu_mode and libvirt_cpu_model flags in the nova.conf file (https://wiki.openstack.org/wiki/LibvirtXMLCPUModel) as follows:
      libvirt_cpu_mode = custom
      libvirt_cpu_model = cpu64_rhel6

      Don’t forget to restart nova-compute service afterwards.
      You can also find more error related messages directly in the libvirtd.log on the source/destination hosts.
      Hope it helps.
      Cheers!

      • Thanks a lot for the suggestion
        I found my cpu model in cpu_max.xml and did changes accordingly
        Still giving same error.

        libvirtd.log after error:

        2015-02-17 10:36:37.293+0000: 17166: warning : qemuOpenVhostNet:522 : Unable to open vhost-net. Opened so far 0, requested 1

        How do I solve this ?
        Thanks in advance.

        • and this
          libvirtd.log

          error : netcfStateCleanup:109 : internal error: Attempt to close netcf state driver with open connections

    • Hello again, unfortunately I have never seen that behavior before. You could try using the model “cpu64_rhel6” or “kvm64” directly and see if it fits your configuration; otherwise I would suggest looking at the logs from nova (nova-all.log and nova/nova-compute.log), libvirt (see the previous post) and QEMU (libvirt/qemu/instance/[instance.id].log) from both the source and the destination. Could you please paste the relevant parts of those logs somewhere online, or send them to me directly (cima[at]zhaw.ch) together with your nova.conf file and setup configuration?

  • Hi,

    Please clarify…
    I do not have the directory /etc/sysconfig on ubuntu 14.04, is this something I create manually?
    If the answer is yes, should I then create the file /etc/sysconfig/libvirtd manually as well?

    Thanks

    • Hi Bobby, /etc/sysconfig is normally present in Red Hat based Linux distributions (RHEL, CentOS, Fedora), but you are using Ubuntu, which is a different operating system.

      libvirt is configured differently in Ubuntu.
      In Ubuntu the libvirtd configuration file is found under /etc/default and it is called “libvirt-bin” (instead of “libvirtd”). So your file should be “/etc/default/libvirt-bin” (instead of “/etc/sysconfig/libvirtd”).
      In this file you should add:
      libvirtd_opts="-d -l"
      (Instead of:
      LIBVIRTD_ARGS="--listen")

      The rest of the process should be similar.

  • Hi Cima,

    I am running Devstack multinode setup(one controller,2 compute) from master branch.
    I am trying to do block migration with the steps provided by you, but I am not seeing the instance migrate.
    Can you please help me resolve this?
    I am observing that my nova-compute service on the destination node goes offline after triggering block migration.
    Here are my setup details.
    One VM running controller+network+compute node (i had disabled nova-compute on this node).
    Two other VMs are running as compute nodes.
    I bring up a VM on the 1st compute node and try to migrate it to the other compute node.
    Interestingly, I am not seeing an error when I trigger block live migration from the OpenStack dashboard, but after some time my destination compute node goes offline.

    below is my nova.conf on compute node.

    [DEFAULT]
    vif_plugging_timeout = 300
    vif_plugging_is_fatal = True
    linuxnet_interface_driver =
    security_group_api = neutron
    network_api_class = nova.network.neutronv2.api.API
    firewall_driver = nova.virt.firewall.NoopFirewallDriver
    compute_driver = libvirt.LibvirtDriver
    default_ephemeral_format = ext4
    metadata_workers = 8
    ec2_workers = 8
    osapi_compute_workers = 8
    rpc_backend = rabbit
    keystone_ec2_url = http://10.212.24.106:5000/v2.0/ec2tokens
    ec2_dmz_host = 10.212.24.106
    xvpvncproxy_host = 0.0.0.0
    novncproxy_host = 0.0.0.0
    vncserver_proxyclient_address = 127.0.0.1
    vncserver_listen = 127.0.0.1
    vnc_enabled = true
    xvpvncproxy_base_url = http://10.212.24.106:6081/console
    novncproxy_base_url = http://10.212.24.106:6080/vnc_auto.html
    logging_exception_prefix = %(color)s%(asctime)s.%(msecs)03d TRACE %(name)s ^[[01;35m%(instance)s^[[00m
    logging_debug_format_suffix = ^[[00;33mfrom (pid=%(process)d) %(funcName)s %(pathname)s:%(lineno)d^[[00m
    logging_default_format_string = %(asctime)s.%(msecs)03d %(color)s%(levelname)s %(name)s [^[[00;36m-%(color)s] ^[[01;35m%(instance)s%(color)s%(message)s^[[00m
    logging_context_format_string = %(asctime)s.%(msecs)03d %(color)s%(levelname)s %(name)s [^[[01;36m%(request_id)s ^[[00;36m%(user_name)s %(project_name)s%(color)s] ^[[01;35m%(instance)s%(color)s%(message)s^[[00m
    #force_config_drive = True
    force_config_drive = False
    send_arp_for_ha = True
    multi_host = True
    instances_path = /opt/stack/data/nova/instances
    state_path = /opt/stack/data/nova
    s3_listen = 0.0.0.0
    metadata_listen = 0.0.0.0
    ec2_listen = 0.0.0.0
    osapi_compute_listen = 0.0.0.0
    instance_name_template = instance-%08x
    my_ip = 10.212.24.108
    s3_port = 3333
    s3_host = 10.212.24.106
    default_floating_pool = public
    force_dhcp_release = True
    dhcpbridge_flagfile = /etc/nova/nova.conf
    scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler
    rootwrap_config = /etc/nova/rootwrap.conf
    api_paste_config = /etc/nova/api-paste.ini
    allow_resize_to_same_host = True
    debug = True
    verbose = True

    [database]
    connection =

    [api_database]
    connection =

    [oslo_concurrency]
    lock_path = /opt/stack/data/nova

    [spice]
    enabled = false
    html5proxy_base_url = http://10.212.24.106:6082/spice_auto.html

    [oslo_messaging_rabbit]
    rabbit_userid = stackrabbit
    rabbit_password = cloud
    rabbit_hosts = 10.212.24.106
    [glance]
    api_servers = http://10.212.24.106:9292

    [cinder]
    os_region_name = RegionOne

    [libvirt]
    vif_driver = nova.virt.libvirt.vif.LibvirtGenericVIFDriver
    inject_partition = -2
    live_migration_uri = qemu+ssh://stack@%s/system
    #live_migration_uri = qemu+tcp://%s/system
    use_usb_tablet = False
    cpu_mode = none
    virt_type = qemu
    live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED

    [neutron]
    url = http://10.212.24.106:9696
    region_name = RegionOne
    admin_tenant_name = service
    auth_strategy = keystone
    admin_auth_url = http://10.212.24.106:35357/v2.0
    admin_password = cloud
    admin_username = neutron

    [keymgr]
    fixed_key = f1a0b6c1ce848e4ed2dad31abb3e7fd49183dc558e34931c22eb4884a9096ddc
    I am able to ssh without password through stack user from controller node to all compute nodes.

    • Correction in above comment:
      nova-compute on the SOURCE compute node is going offline, not the destination.

  • Hi Cima,

    Below is the output from libvirt.log
    2015-08-25 06:34:32.323+0000: 4216: debug : virConnectGetLibVersion:1590 : conn=0x7f7b14002460, libVir=0x7f7b3bde1b90
    2015-08-25 06:34:32.325+0000: 4212: debug : virDomainLookupByName:2121 : conn=0x7f7b14002460, name=instance-00000014
    2015-08-25 06:34:32.325+0000: 4212: debug : qemuDomainLookupByName:1402 : Domain not found: no domain with matching name ‘instance-00000014’
    2015-08-25 06:34:32.327+0000: 4213: debug : virConnectGetLibVersion:1590 : conn=0x7f7b14002460, libVir=0x7f7b3d5e4b90
    2015-08-25 06:34:32.328+0000: 4214: debug : virDomainLookupByName:2121 : conn=0x7f7b14002460, name=instance-00000014
    2015-08-25 06:34:32.328+0000: 4214: debug : qemuDomainLookupByName:1402 : Domain not found: no domain with matching name ‘instance-00000014’
    2015-08-25 06:34:32.612+0000: 4218: debug : virConnectGetLibVersion:1590 : conn=0x7f7b14002460, libVir=0x7f7b3addfb90
    2015-08-25 06:34:32.614+0000: 4216: debug : virDomainLookupByName:2121 : conn=0x7f7b14002460, name=instance-00000014
    2015-08-25 06:34:32.614+0000: 4216: debug : qemuDomainLookupByName:1402 : Domain not found: no domain with matching name ‘instance-00000014’

    • The above is a libvirtd.log snapshot from the destination host, but I didn’t see any issue on the source compute node where the instance is running.

      • Hello, I would suggest taking a look “one level up” and checking nova-compute.log on both the source and the destination. The QEMU logs could also be useful (note that QEMU records logs per instance; the default location is /var/log/libvirt/qemu/[instance_name].log, and the log is created after the instance is spawned). You can get the instance name using the “virsh list” command on the host it runs on. The virsh command also gives you insight into the instance state and whether the instance is being created on the destination or removed from the source.

        • Hi Cima,

          There was an SSH key issue. I replaced ssh with tcp in the nova configuration and now it is working fine.

          Thanks,
          Vasu.

  • Hello,

    I am trying block migration in Juno and running into the below error:

    Starting monitoring of live migration _live_migration /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5798
    2015-09-09 14:32:03.939 12766 DEBUG nova.virt.libvirt.driver [-] [instance: 9bb19e79-a8a2-481f-aefb-e05d0567c198] Operation thread is still running _live_migration_monitor /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5652
    2015-09-09 14:32:03.940 12766 DEBUG nova.virt.libvirt.driver [-] [instance: 9bb19e79-a8a2-481f-aefb-e05d0567c198] Migration not running yet _live_migration_monitor /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5683
    2015-09-09 14:32:03.948 12766 ERROR nova.virt.libvirt.driver [-] [instance: 9bb19e79-a8a2-481f-aefb-e05d0567c198] Live Migration failure: Unable to pre-create chardev file ‘/var/lib/nova/instances/9bb19e79-a8a2-481f-aefb-e05d0567c198/console.log’: No such file or directory
    2015-09-09 14:32:03.948 12766 DEBUG nova.virt.libvirt.driver [-] [instance: 9bb19e79-a8a2-481f-aefb-e05d0567c198] Migration operation thread notification thread_finished /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5789
    2015-09-09 14:32:04.441 12766 DEBUG nova.virt.libvirt.driver [-] [instance: 9bb19e79-a8a2-481f-aefb-e05d0567c198] VM running on src, migration failed _live_migration_monitor /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5658
    2015-09-09 14:32:04.442 12766 DEBUG nova.virt.libvirt.driver [-] [instance: 9bb19e79-a8a2-481f-aefb-e05d0567c198] Fixed incorrect job type to be 4 _live_migration_monitor /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5678
    2015-09-09 14:32:04.442 12766 ERROR nova.virt.libvirt.driver [-] [instance: 9bb19e79-a8a2-481f-aefb-e05d0567c198] Migration operation has aborted

    It used to work well for me with the Icehouse version. Not sure what’s wrong here. Am I hitting the bug mentioned below, any idea?

    https://bugs.launchpad.net/nova/+bug/1392773

