
Openstack - (Manually) migrating (KVM) Nova compute virtual machines

Published: 13-06-2015 | Author: Remy van Elst


This guide shows you how to migrate KVM virtual machines with the Openstack Nova compute service, either manually or with the Openstack tooling.

Migrating compute instances is very useful. It allows an administrator to free up a compute node for maintenance or updates, and to better distribute resources between compute nodes.

Openstack provides a few different ways to migrate virtual machines from one compute node to another. Each option has different requirements and restrictions; later on in this guide, I'll list the most common limitations per situation.

This article describes the most common migration scenarios, including live migration and manual migration using native Linux tools.

You can see all my Openstack related articles here. For example, how to build a highly available cluster with Ansible and Openstack.

If you like this article, consider sponsoring me by trying out a Digital Ocean VPS. With this link you'll get $100 credit for 60 days (referral link).

It is tested on an Openstack cloud running Icehouse with KVM as the only hypervisor. If you run Xen or something else, the manual process is mostly the same, but it might require specific adaptations.

Configure (live) migration

There are a few things you need to configure to allow (live) migrations on an Openstack cloud. The Openstack documentation describes this very well, so read that first.
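For reference, the settings involved look roughly like this. This is only a sketch for the KVM/libvirt driver; option names and their config section differ per release, so follow the documentation for your exact version:

# /etc/nova/nova.conf on every compute node
[libvirt]
# Migrate over SSH between the compute nodes (assumes key based SSH access
# for the user doing the migration):
live_migration_uri = qemu+ssh://%s/system
# Enable true live migration instead of suspend-and-copy:
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE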

If you have a cloud configured for live migration, read on.

Migration limitations

Openstack has two commands specific to virtual machine migration:

The nova migrate command shuts down an instance to move it to another hypervisor.

The nova live-migration command has almost no instance downtime.

As a rule of thumb: use nova live-migration when the cloud is configured for it and you want (almost) no downtime, add --block-migrate when the instance is not on shared storage and not volume backed, and use nova migrate when a shutdown of the instance is acceptable. All these options are described below in detail.

Hypervisor Capacity

Before you do a migration, check if the hypervisor host has enough free capacity for the VM you want to migrate:

nova host-describe compute-30

Example output:

+-------------+----------------------------------+-----+-----------+---------+
| HOST        | PROJECT                          | cpu | memory_mb | disk_gb |
+-------------+----------------------------------+-----+-----------+---------+
| compute-30  | (total)                          | 64  | 512880    | 5928    |
| compute-30  | (used_now)                       | 44  | 211104    | 892     |
| compute-30  | (used_max)                       | 44  | 315568    | 1392    |
| compute-30  | 4[...]0288                       | 1   | 512       | 20      |
| compute-30  | 4[...]0194                       | 20  | 4506      | 62      |
+-------------+----------------------------------+-----+-----------+---------+

In this table, the first row shows the total amount of resources available on the physical server. The second row shows the currently used resources and the third row shows the maximum used resources. The fourth row and below show the resources used by each project.

If the VM flavor fits on this hypervisor, continue on with the migration. If not, free up some resources or choose another compute server.
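If you're unsure what the instance needs, look up its flavor; m1.medium below is just an example flavor name:

nova show $VM_UUID | grep flavor

nova flavor-show m1.medium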

If the hypervisor node lacks enough capacity, the migration will fail.

(Live) migration with nova live-migration

The live-migration command works with shared storage, with volume backed instances and, using the --block-migrate option, with local (image based) storage.

The live-migration command requires the same CPU on both hypervisors. It is possible to set a generic CPU model for the VMs, or a generic set of CPU features. However, on versions lower than Kilo this does not work due to a bug where Nova compares the actual host CPU instead of the virtual CPU; this is fixed in Kilo and later. In my case, all the hypervisor machines are the same, lucky me.
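Setting a generic CPU model is done in nova.conf on the compute nodes, roughly like this (a sketch; pick a model that all your hosts support, SandyBridge is just an example):

[libvirt]
cpu_mode = custom
cpu_model = SandyBridge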

On versions older than Kilo, the Compute service does not use libvirt's live migration functionality by default, therefore guests are suspended before migration and might experience several minutes of downtime. This is because there is a risk that the migration process will never end. This can happen if the guest operating system uses blocks on the disk faster than they can be migrated.

To enable true live migration using libvirt's migrate functionality, see the Openstack documentation linked below.

Shared storage / Volume backed instances

A live-migration is very simple. Use the following command with an instance UUID and the name of the target compute host:

nova live-migration $UUID $COMPUTE-HOST

If you have shared storage, or if the instance is volume backed, this will send the instance's memory (RAM) content over to the destination host. The source hypervisor keeps track of which memory pages are modified on the source while the transfer is in progress. Once the initial bulk transfer is complete, pages changed in the meantime are transferred again. This is done repeatedly with (ideally) ever smaller increments.

As long as the differences can be transferred faster than the source VM dirties memory pages, at some point the source VM gets suspended. The final differences are sent to the target host and an identical machine is started there. At the same time the virtual network infrastructure takes care of directing all traffic to the new virtual machine. Once the replacement machine is running, the suspended source instance is deleted. Usually the actual handover takes place so quickly and seamlessly that only very time sensitive applications notice anything.

You can check this by starting a ping to the VM you are live-migrating. It will stay online, and when the VM is suspended and resumed on the target hypervisor, the ping responses will take a bit longer.
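For example (the IP address is just a placeholder), in one terminal:

ping $VM_IP_ADDRESS

and in another terminal, follow the migration state:

watch -n 1 "nova show $VM_UUID | egrep 'status|task_state|hypervisor'"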

Block based storage (--block-migrate)

If you don't have shared storage and the VM is not backed by a volume as root disk (image based VMs), a live-migration requires an extra parameter:

nova live-migration --block-migrate $UUID $COMPUTE-HOST

The process is almost exactly the same as described above. There is one extra step, however. Before the memory content is sent, the disk content is copied over, without downtime. When the VM is suspended, both the memory contents and the disk contents (the difference to the earlier copy) are sent over. The suspend action takes longer and might be noticeable as downtime.

The --block-migrate option is incompatible with read-only devices such as ISO CD/DVD drives and the Config Drive.
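You can check whether an instance has a config drive before attempting a block migration:

nova show $VM_UUID | grep config_drive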

Migration with nova migrate

The nova migrate command shuts down an instance, copies over the disk to a hypervisor with enough free resources, starts it up there and removes it from the source hypervisor. The VM is shut down and will be down for as long as the copying takes. With a migrate, the Openstack cluster chooses a compute-service enabled hypervisor with the most resources available. This works with any type of instance, with any type of backend storage.

A migrate is even simpler than a live-migration. Here's the syntax:

nova migrate $UUID

This is perfect for instances that are part of a clustered service, or when you have scheduled and communicated downtime for that specific VM. The downtime is dependent on the size of the disk and the speed of the (storage) network.
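Note that on the versions I used, a migrated instance ends up in the VERIFY_RESIZE state and needs to be confirmed before it is back to ACTIVE; check and confirm like this:

nova show $VM_UUID | grep status

nova resize-confirm $VM_UUID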

rsync over ssh is used to copy the actual disk. You can test the speed yourself with a few regular rsync tests, and combine that with the disk size to get an indication of the migration downtime.
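A rough speed test could look like this (the file path is just an example, any large file copied between the two hypervisors will do):

rsync --progress -e ssh /path/to/a/large/testfile compute-34:/tmp/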

Migrating to a specific compute node, the dirty way

As seen above, we cannot migrate virtual machines to a specific compute node if the compute node does not have shared storage and the virtual machine has a config drive enabled. You can force the Openstack cluster to choose a specific hypervisor by disabling the nova-compute service on all the other hypervisors. The VMs will keep running there, only new virtual machines and migrations are not possible on those hypervisors.

If you have a lot of creating and removing of machines in your Openstack cloud, this might be a bad idea. If you use (Anti) Affinity Groups, VMs created in there will also fail to start, depending on the type of Affinity Group. See my article on Affinity Groups for more info on those.

Therefore, use this option with caution. If we have 5 compute nodes, compute-30 to compute-34, and we want to migrate the machine to compute-34, we need to disable the nova-compute service on all other hypervisors.

First check the state of the cluster:

nova service-list --binary nova-compute # or nova-conductor, nova-cert, nova-consoleauth, nova-scheduler

Example output:

+----+--------------+--------------+------+----------+-------+----------------------------+-----------------+
| Id | Binary       | Host         | Zone | Status   | State | Updated_at                 | Disabled Reason |
+----+--------------+--------------+------+----------+-------+----------------------------+-----------------+
| 7  | nova-compute | compute-30   | OS1  | enabled  | up    | 2015-06-13T17:04:27.000000 | -               |
| 8  | nova-compute | compute-31   | OS2  | enabled  | up    | 2015-06-13T17:02:49.000000 | -               |
| 9  | nova-compute | compute-32   | OS2  | enabled  | up    | 2015-06-13T17:02:50.000000 | None            |
| 10 | nova-compute | compute-33   | OS2  | enabled  | up    | 2015-06-13T17:02:50.000000 | -               |
| 11 | nova-compute | compute-34   | OS1  | disabled | up    | 2015-06-13T17:02:49.000000 | Migrations Only |
+----+--------------+--------------+------+----------+-------+----------------------------+-----------------+

In this example we have 5 compute nodes, of which one is disabled with reason Migrations Only. In our case, before we started migrating, we first enabled nova-compute on that hypervisor and disabled it on all the other hypervisors:

nova service-disable compute-30 nova-compute --reason 'migration to specific hypervisor the dirty way'
nova service-disable compute-31 nova-compute --reason 'migration to specific hypervisor the dirty way'
etc...
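Or, as a small loop over all nodes except the target (adjust the host names to your cluster):

for host in compute-30 compute-31 compute-32 compute-33; do
    nova service-disable $host nova-compute --reason 'migration to specific hypervisor the dirty way'
done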

Now execute the nova migrate command. Since you've disabled all compute hypervisors except the target hypervisor, that one will be used as the migration target.

All new virtual machines created during the migration will also be spawned on that specific hypervisor.

When the migration is finished, enable all the other compute nodes:

nova service-enable compute-30 nova-compute
nova service-enable compute-31 nova-compute
etc...

In our case, we would then disable compute-34 again, because it is for migrations only.

This is a bit dirty and might cause problems if you have monitoring on the cluster state or spawn a lot of machines all the time.

Manual migration to a specific compute node

As seen above, we cannot migrate virtual machines to a specific compute node if the compute node does not have shared storage and the virtual machine has a config drive enabled. Since Openstack is just a bunch of wrappers around native Linux tools, we can manually migrate the machine and update the Nova database afterwards.

Do note that this part is specific to the storage you use. In this example we use local storage (or, a local folder on an NFS mount not shared with other compute nodes) and image-backed instances.

In my case, I needed to migrate an image-backed block storage instance to a non-shared storage node, but the instance had a config drive enabled. Disabling the compute service everywhere is not an option, since the cluster was getting about a hundred new VMs every 5 minutes and that would overload the hypervisor node.

This example manually migrates a VM from compute-30 to compute-34. These nodes are in the same network and can access one another via SSH keys based on their hostname.
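From the source node, first verify that key based SSH to the target node actually works:

ssh compute-34 hostname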

Shut down the VM first:

nova stop $VM_UUID
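Check which volumes are attached to the instance (the field below is from nova show's output):

nova show $VM_UUID | grep volumes_attached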

Also detach any volumes:

nova volume-detach $VM_UUID $VOLUME_UUID

Use the nova show command to see the specific hypervisor the VM is running on:

nova show $VM_UUID | grep hypervisor

Example output:

| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-30    |

Log in to that hypervisor via SSH. Navigate to the folder where this instance is located, in our case /var/lib/nova-compute/instances/$VM_UUID.

The instance is booted from an image based root disk, named disk. qemu, in our case, stores that root disk as a copy-on-write diff against the image the VM was created from. Therefore the new hypervisor also needs that backing image. Find out which file is the backing image:

cd /var/lib/nova-compute/instances/$VM_UUID/
qemu-img info disk # disk is the filename of the instance root disk

Example output:

  image: disk
  file format: qcow2
  virtual size: 32G (34359738368 bytes)
  disk size: 1.3G
  cluster_size: 65536
  backing file: /var/lib/nova-compute/instances/_base/d00[...]61
  Format specific information:
      compat: 1.1
      lazy refcounts: false

The file /var/lib/nova-compute/instances/_base/d004f7f8d3f79a053fad5f9e54a4aed9e2864561 is the backing disk. Note that the long filename is not a UUID but a checksum of the specific image version. In my case it is a raw disk:

qemu-img info /var/lib/nova-compute/instances/_base/d00[...]61

Example output:

image: /var/lib/nova-compute/instances/_base/d00[...]61
file format: raw
virtual size: 8.0G (8589934592 bytes)
disk size: 344M

Check whether that backing image already exists on the target hypervisor.
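A quick check, using the same SSH access as the rsync below:

ssh compute-34 'ls -l /var/lib/nova-compute/instances/_base/'

If the backing file is not there yet, copy it to the target hypervisor first: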

rsync -r --progress /var/lib/nova-compute/instances/_base/d00[...]61 -e ssh compute-34:/var/lib/nova-compute/instances/_base/d00[...]61

On the target hypervisor, set the correct permissions:

chown nova:nova /var/lib/nova-compute/instances/_base/d00[...]61

Copy the instance folder to the new hypervisor:

cd /var/lib/nova-compute/instances/
rsync -r --progress $VM_UUID -e ssh compute-34:/var/lib/nova-compute/instances/

Set the correct permissions on the folder on the target hypervisor:

chown nova:nova /var/lib/nova-compute/instances/$VM_UUID
chown nova:nova /var/lib/nova-compute/instances/$VM_UUID/disk.info
chown nova:nova /var/lib/nova-compute/instances/$VM_UUID/libvirt.xml
chown libvirt:kvm /var/lib/nova-compute/instances/$VM_UUID/console.log
chown libvirt:kvm /var/lib/nova-compute/instances/$VM_UUID/disk
chown libvirt:kvm /var/lib/nova-compute/instances/$VM_UUID/disk.config

If you use other usernames and groups, change those in the command.

Log in to your database server. In my case that is a MySQL Galera cluster. Start up a MySQL command prompt in the nova database:

mysql nova

Execute the following command to update the nova database with the new hypervisor for this VM:

update instances set node='compute-34', host=node where uuid='$VM_UUID';

This was tested on an Icehouse database schema; other versions might require other queries.
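You can verify the change before starting the instance (again, the column names may differ between versions):

select host, node from instances where uuid='$VM_UUID';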

Use the nova show command to see if the new hypervisor is set. If so, start the VM:

nova start $VM_UUID

Attach any volumes that were detached earlier:

nova volume-attach $VM_UUID $VOLUME_UUID

Use the console to check if it all works:

nova get-vnc-console $VM_UUID novnc

Do note that you must check the free capacity yourself. The VM will still start even if there is not enough capacity, but you do run into weird issues on the hypervisor like bad performance or killed processes (OOM kills).

Conclusion

Openstack offers many ways to migrate machines from one compute node to another. Each way is applicable in certain scenarios, and if all else fails you can manually migrate machines using the underlying Linux tools. This article gives you an overview of the most common migration methods and the scenarios in which they are applicable. Happy migrating.

Further reading

Tags: articles, cloud, cluster, compute, kvm, migrate, nova, openstack, qemu