Fix inconsistent Openstack volumes and instances from Cinder and Nova via the database
Published: 22-12-2014 | Author: Remy van Elst
When running Openstack, the state of a volume or an instance can sometimes be inconsistent across the cluster. Nova might see a volume as attached while Cinder says the volume is detached, or the other way around. Sometimes a volume deletion hangs, or a detach does not work. If you've found and fixed the underlying issue (lvm, iscsi, ceph, nfs etc.) you need to bring the database up to date with the new consistent state. Most of the time a reset-state works; sometimes you need to manually edit the database to correct the state. These snippets show you how.
Please note that it is important to find and fix the underlying issue. If you, for example, have a volume which hangs on detaching, resetting the database is a quick hack and not a real fix. Make sure you first fix the underlying issue and cause before you update the database.
These examples were tested with all components on Juno and on Icehouse, with MySQL as the backing database.
Please be extremely careful with these examples.
Delete an instance
Your NFS backing storage might have crashed halfway through a VM delete. You've manually deleted all the related files (disk, config etc) and removed the VM domain from the backing hypervisor (virsh, esxi etc). However, nova show still sees the VM as active (or error). A nova reset-state --active doesn't fix the delete part. The following query can be used to set an instance as deleted:
$ mysql nova_db> update instances set deleted='1', vm_state='deleted', deleted_at=now() where uuid='$vm_uuid' and project_id='$project_uuid';
nova delete $uuid is the correct way to delete a VM.
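Because a wrong where clause can touch far more rows than intended, it can help to run such an update from a small script that commits only when exactly one row was changed. The following is a minimal sketch of that idea, not a tested production tool. The helper name run_guarded and the demo function are hypothetical, and the demo uses an in-memory SQLite table as a stand-in so it runs without a MySQL server (SQLite uses ? placeholders and datetime('now'), where a MySQL driver would typically use %s and now()).

```python
import sqlite3


def run_guarded(conn, sql, params, expected_rows=1):
    """Run one write statement; commit only if it changed expected_rows rows.

    Works with any DB-API connection. Returns True when committed,
    False when the row count did not match and the change was rolled back.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        if cur.rowcount != expected_rows:
            conn.rollback()
            return False
        conn.commit()
        return True
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


def demo():
    # Stand-in for the nova database: only the columns the update touches.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table instances (uuid text, project_id text, "
        "deleted integer default 0, vm_state text, deleted_at text)"
    )
    conn.execute(
        "insert into instances (uuid, project_id, vm_state) "
        "values ('vm-1', 'proj-1', 'error')"
    )
    conn.commit()

    sql = (
        "update instances set deleted=1, vm_state='deleted', "
        "deleted_at=datetime('now') where uuid=? and project_id=?"
    )
    ok = run_guarded(conn, sql, ("vm-1", "proj-1"))
    # A wrong project id matches no rows, so nothing is committed.
    missed = run_guarded(conn, sql, ("vm-1", "wrong-project"))
    state = conn.execute(
        "select vm_state, deleted from instances where uuid='vm-1'"
    ).fetchone()
    return ok, missed, state
```

Passing the UUIDs as parameters instead of pasting them into the query string also avoids quoting mistakes.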
If you want to actually delete an instance from the database instead of marking it as deleted, the following queries should do that:
$ mysql nova_db
> delete from instance_faults where instance_faults.instance_uuid = '$vm_uuid';
> delete from instance_id_mappings where instance_id_mappings.uuid = '$vm_uuid';
> delete from instance_info_caches where instance_info_caches.instance_uuid = '$vm_uuid';
> delete from instance_system_metadata where instance_system_metadata.instance_uuid = '$vm_uuid';
> delete from security_group_instance_association where security_group_instance_association.instance_uuid = '$vm_uuid';
> delete from block_device_mapping where block_device_mapping.instance_uuid = '$vm_uuid';
> delete from fixed_ips where fixed_ips.instance_uuid = '$vm_uuid';
> delete from instance_actions_events where instance_actions_events.action_id in (select id from instance_actions where instance_actions.instance_uuid = '$vm_uuid');
> delete from instance_actions where instance_actions.instance_uuid = '$vm_uuid';
> delete from virtual_interfaces where virtual_interfaces.instance_uuid = '$vm_uuid';
> delete from instances where instances.uuid = '$vm_uuid';
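The order of these deletes matters: rows that reference the instance go first, instance_actions_events goes before instance_actions (its subquery needs the actions to still exist), and the instances row goes last. A minimal sketch of generating the same statements from a script is shown below. The function name purge_statements is hypothetical, the table and column names are taken from the queries above, and the %s placeholder style is an assumption that fits common MySQL drivers.

```python
import uuid

# Tables keyed directly by the instance UUID, in the order used above.
CHILD_TABLES = [
    ("instance_faults", "instance_uuid"),
    ("instance_id_mappings", "uuid"),
    ("instance_info_caches", "instance_uuid"),
    ("instance_system_metadata", "instance_uuid"),
    ("security_group_instance_association", "instance_uuid"),
    ("block_device_mapping", "instance_uuid"),
    ("fixed_ips", "instance_uuid"),
]


def purge_statements(vm_uuid):
    """Return (sql, params) pairs that remove one instance, children first.

    Raises ValueError if vm_uuid is not a well-formed UUID, so a typo or a
    stray quote never reaches the database.
    """
    vm_uuid = str(uuid.UUID(vm_uuid))
    params = (vm_uuid,)
    stmts = [
        (f"delete from {table} where {column} = %s", params)
        for table, column in CHILD_TABLES
    ]
    # Events reference actions, so they must go before the actions themselves.
    stmts.append((
        "delete from instance_actions_events where action_id in "
        "(select id from instance_actions where instance_uuid = %s)",
        params,
    ))
    stmts.append(("delete from instance_actions where instance_uuid = %s", params))
    stmts.append(("delete from virtual_interfaces where instance_uuid = %s", params))
    # The instance row itself is removed last.
    stmts.append(("delete from instances where uuid = %s", params))
    return stmts
```

Executing the returned statements on one connection and committing once at the end keeps the purge all-or-nothing.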
Change the compute host of a VM
A nova migrate or nova resize might have failed. The disk could already be migrated, or still be on your shared storage, but nova is confused. Make sure the VM domain is on only one compute node (preferably the one it came from; use nova migration-list to find that out) and the backing disk/config files are also on only one hypervisor node (lsof and tgt-adm are your friends here). The following query changes the VM hypervisor host for nova:
$ mysql nova_db> update instances set host='compute-hostname.domain',node='compute-hostname.domain' where uuid='$vm_uuid' and project_id='$project_uuid';
Normally a nova migrate $vm_uuid or a nova resize $vm_uuid $flavor should be enough.
Set a volume as detached in Cinder
Your backing cinder storage might have issues or bugs which cause nova volume-detach $vm_uuid $volume_uuid to fail sometimes. The volume might be detached in Nova but still have the state Detaching in Cinder. Make sure the VM domain has the actual disk removed. Also check your backing storage (ceph, lvm, iscsi etc.) to make sure it is actually detached and not in use anymore.
Try a cinder reset-state --state available $volume_uuid first. If that fails, the following MySQL query on the cinder database sets the Cinder state to available:
$ mysql cinder_db> update cinder.volumes set attach_status='detached',status='available' where id ='$volume_uuid';
Make absolutely sure that no data is being read from or written to the volume; it might cause data loss otherwise.
Do note that the cinder python api (import cinderclient.v2) also has the cinder.volumes.detach(volume_id) call. You do need to write some tooling around that (http://docs.openstack.org/developer/cinder/devref/volume.html?highlight=detach_volume).
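As an illustration of what such tooling could look like, here is a minimal sketch that only calls detach for volumes still in the stuck state. The function name detach_stuck_volumes is hypothetical. It assumes an already authenticated client object exposing volumes.get() and volumes.detach() as python-cinderclient does; creating that client is left out because credentials and endpoints are site-specific.

```python
def detach_stuck_volumes(cinder, volume_ids, stuck_status="detaching"):
    """Request a detach for every listed volume that is still stuck.

    cinder is expected to be a client object with volumes.get(id) and
    volumes.detach(id). Volumes in any other status are left untouched.
    Returns a dict mapping each volume id to what was done.
    """
    results = {}
    for volume_id in volume_ids:
        volume = cinder.volumes.get(volume_id)
        if volume.status != stuck_status:
            # Never touch a volume that is not in the state we expect.
            results[volume_id] = f"skipped ({volume.status})"
            continue
        cinder.volumes.detach(volume_id)
        results[volume_id] = "detach requested"
    return results
```

Checking the status first keeps the script from detaching a volume that is legitimately in use. The same check of the backing storage described above still applies before running it.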
Detach a volume from Nova
Sometimes the volume is detached in Cinder but Nova still shows it as attached. The same warnings as above apply: check your backing storage first to see if the volume is actually detached and not in use, otherwise you risk data loss.
The following query removes the nova block device mapping:
$ mysql nova_db> delete from block_device_mapping where not deleted and volume_id='$volume_uuid' and project_id='$project_uuid';
The correct way is, of course, nova volume-detach $vm_uuid $volume_uuid.
If you use virsh, make sure you also nova reboot --hard $vm_uuid to rebuild the virsh domain. If you don't do that, the volume might fail to attach because virsh can't attach it at the mount point (/dev/vdX) since it thinks it is already in use.
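The two volume cases above differ only in which side holds the stale state. A small sketch of that decision is shown below. The function name pick_fix and the returned strings are hypothetical, and the status values are assumed to be the lowercase ones Cinder reports (detaching, available, in-use).

```python
def pick_fix(cinder_status, nova_has_mapping):
    """Map the Cinder and Nova views of one volume to the matching fix.

    cinder_status: the volume status as reported by Cinder.
    nova_has_mapping: True if Nova still has a block device mapping for it.
    """
    if cinder_status == "detaching" and not nova_has_mapping:
        return "set the volume as detached in Cinder"
    if cinder_status == "available" and nova_has_mapping:
        return "remove the block device mapping from Nova"
    if cinder_status == "in-use" and nova_has_mapping:
        return "consistent: attached"
    if cinder_status == "available" and not nova_has_mapping:
        return "consistent: detached"
    # Anything else needs a look at the backing storage before any change.
    return "unknown combination: investigate the backing storage first"
```

Anything that does not match a known combination deliberately falls through to manual investigation rather than a database change.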
Delete a volume from Cinder
It might be that a volume runs into an error while deleting. It ends up in the Error_deleting state. Try a cinder reset-state --state available $volume_uuid first. If all fails, check your backing storage to see what happened and whether the volume is actually removed or not. If not, remove it. Then you can update the cinder database to set it as deleted:
$ mysql cinder_db> update volumes set deleted=1,status='deleted',deleted_at=now(),updated_at=now() where deleted=0 and id='$volume_uuid';
The correct way is cinder delete $volume_uuid.
Word of caution
If you have these inconsistencies you have bigger problems you need to fix instead of manually setting state and updating components. Openstack should make that part easier, remember?
If you execute these queries wrong you can cause serious data loss!
Check your logging, set it to debug everywhere and get a reproducible scenario. Then find a solution, report a bug, test the fix and deploy it in your test, acceptance and then production environment.
Tags: articles, cinder, cloud, compute, iscsi, lvm, mysql, nfs, nova, openstack, volume, zfs