Solving problems invented by others...
Duplicate disk UUIDs and how to get rid of them… (hopefully)

Occurrence of the error

Like most IT stories, this one started with an error message.

While trying to install a monitoring solution in our vSphere environment, we became aware that we had duplicate disk UUIDs.
(NetApp kb article)

A first query with PowerCLI showed that we had hundreds of disks with duplicate UUIDs.

(Get-VM | Get-Harddisk).ExtensionData.Backing.UUID

Looking at the vms that shared a disk UUID, we soon realized that they had all been cloned from a template. So somehow, when cloning from a template, the disk UUID does not seem to be newly generated.
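The one-liner above prints every UUID, so the duplicates still have to be picked out. A minimal sketch of that filtering step is shown here on made-up sample objects (the vm names and the second UUID are hypothetical); with a live connection, the same pipeline could be fed from Get-VM | Get-HardDisk instead.

```powershell
# Hypothetical sample data standing in for the output of:
#   Get-VM | Get-HardDisk | Select-Object Parent, Name, @{N='Uuid';E={$_.ExtensionData.Backing.Uuid}}
$disks = @(
    [pscustomobject]@{ Parent = 'template01'; Name = 'Hard disk 1'; Uuid = '6000C291-1792-fe42-ad44-795f656d9388' }
    [pscustomobject]@{ Parent = 'vm-a';       Name = 'Hard disk 1'; Uuid = '6000C291-1792-fe42-ad44-795f656d9388' }
    [pscustomobject]@{ Parent = 'vm-b';       Name = 'Hard disk 1'; Uuid = '6000C29f-0a1b-2c3d-4e5f-60718293a4b5' }
)

# group by UUID and keep only the groups with more than one member
$duplicates = $disks |
    Group-Object -Property Uuid |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Group

# show which vms share a UUID
$duplicates | Format-Table Parent, Name, Uuid
```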

It turns out that this is not really a new problem as this kb article describes:
https://kb.vmware.com/s/article/2006865

Although this kb article only mentions ESXi versions up to 6.0.x, I could easily reproduce the error in a 6.7 and a 7.0 environment.

At that point we called VMware support and asked for help. They confirmed that the error still exists, even in the newest version 7.0. We got some information about what VMware already knew about the problem and added our own observations. We came to the conclusion that we needed to set up a test scenario in our vSphere 7.0 environment, where we tested every combination we could think of. We found out that the error only occurs under one specific condition: the host you want to deploy the vm to has no access to the datastore where the template is stored.

Let’s assume the following scenario:

  • You have a datastore where you store your templates. -> “template”-datastore
  • Only some of your hosts have access to this datastore. -> “template”-hosts
  • You want to deploy a vm from a template in the “template”-datastore to a host which has no access to the “template”-datastore. -> “deployment”-host

This will result in a vm which has the same disk UUIDs as the template.
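To see whether an environment matches this scenario, it helps to list the hosts that cannot see the “template”-datastore. The following is only a sketch on a made-up inventory: the host names, the datastore names and the helper Get-HostWithoutDatastore are all hypothetical. In PowerCLI the mapping could presumably be built per host with something like $vmhost | Get-Datastore.

```powershell
# Hypothetical inventory: host name -> names of the datastores it can access
$hostDatastores = [ordered]@{
    'esx01' = @('template', 'prod-01')
    'esx02' = @('template', 'prod-01')
    'esx03' = @('prod-01')
}

# hypothetical helper: return the hosts that have no access to the given datastore
function Get-HostWithoutDatastore {
    param(
        [System.Collections.IDictionary]$Inventory,
        [string]$DatastoreName
    )
    foreach ($name in $Inventory.Keys) {
        if ($Inventory[$name] -notcontains $DatastoreName) { $name }
    }
}

# these are the "deployment"-hosts on which a clone would inherit the template's disk UUIDs
$deploymentHosts = @(Get-HostWithoutDatastore -Inventory $hostDatastores -DatastoreName 'template')
$deploymentHosts
```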

As long as the above conditions are met, our tests revealed that the following parameters have no effect on the occurrence of the error:

  • Deployment to a host within the same host-cluster as the “template”-hosts
  • Deployment to a host in a different host-cluster than the “template”-hosts
  • Usage of a content library. (As long as the library is stored in the “template”-datastore which only the “template”-hosts have access to.)

Based on our current understanding of the error, it only occurs if there are two ESXi hosts involved in the creation of the new vm and the “deployment”-host has no access to the “template”-datastore as shown in this picture:

Change the disk UUID

As the kb article mentions, disk UUIDs can be changed with the vmkfstools command on the ESXi host. This even works while the vm is powered on.
https://kb.vmware.com/s/article/2006865

This looks kinda uncool if you have hundreds of disks to change. Therefore I searched for a way to do this with PowerCLI. I found a very good hint in this blog article:
http://blog.chrischua.net/2015/06/23/change-vmdk-uuid-using-powercli/

With this information I was able to change the disk UUID of a vm by using these PowerCLI commands:

# get the hard disk of the vm (this example assumes the vm has exactly one disk)
$mydisk = Get-VM -Name "<insert-vm-name-here>" | Get-Harddisk

# get a reference to the VirtualDiskManager service
$vdm = Get-View -Id (Get-View ServiceInstance).Content.VirtualDiskManager

# get a reference to the datacenter object
# (assumes exactly one datacenter; with several, use Get-Datacenter -VM <vm object> instead)
$mydc = Get-Datacenter

# query the actual disk uuid
$vdm.QueryVirtualDiskUuid($mydisk.ExtensionData.Backing.FileName,$mydc.Id)

# set a new disk uuid (example guid)
$vdm.SetVirtualDiskUuid($mydisk.ExtensionData.Backing.FileName,$mydc.Id,"60 00 C2 91 17 92 fe 42-ad 44 79 5f 65 6d 93 88")

These commands even work while the vm is running. They change the disk UUID parameter in the vmdk descriptor file in the datastore. There was only one thing that surprised me.
If I query the same disk with the Get-Harddisk command, I get the disk information as vCenter sees it. Even after I had changed the disk UUID with the above commands and verified that the vmdk file in the datastore had been changed, I still got the old UUID.

(Get-VM -Name "<insert-vm-name-here>" | Get-Harddisk).ExtensionData.Backing.Uuid

This only changes when I stop and start the vm. VMware support mentioned that the disk UUID is only read when the vmx process on the ESXi host starts. Thinking about that, it occurred to me that vMotion also starts a new vmx process when the vm is moved to another host. And I was right! vMotion really does make sure that the new disk UUID also shows up in the vCenter database.
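One practical detail when comparing the two views: the VirtualDiskManager works with the spaced descriptor notation (as in the SetVirtualDiskUuid example above), while Backing.Uuid uses the dashed GUID-like notation. A small helper can normalize both before comparing. This is a sketch; ConvertTo-CompactUuid is a made-up name, and the vCenter-style value is simply the example UUID from above rewritten by hand.

```powershell
# hypothetical helper: reduce either notation to 32 lower-case hex characters
function ConvertTo-CompactUuid {
    param([Parameter(Mandatory)][string]$Uuid)
    # strip spaces and dashes, then lower-case the rest
    ($Uuid -replace '[\s-]', '').ToLowerInvariant()
}

# notation used by QueryVirtualDiskUuid / SetVirtualDiskUuid (descriptor file)
$fromDescriptor = '60 00 C2 91 17 92 fe 42-ad 44 79 5f 65 6d 93 88'
# notation of ExtensionData.Backing.Uuid (what vCenter currently reports)
$fromVCenter = '6000C291-1792-fe42-ad44-795f656d9388'

# $true means vCenter has already picked up the value from the descriptor file
$inSync = (ConvertTo-CompactUuid $fromDescriptor) -eq (ConvertTo-CompactUuid $fromVCenter)
$inSync
```

If the two values differ after a change, the vm most likely still needs a vMotion or a power cycle, as described above.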

So up to this point, I could change the disk UUID of a running vm without downtime. The next step was to take a look at the OS inside the vm.

The disk UUID inside the vm

First I learned that this information only finds its way into the operating system of the vm if the vm has the advanced setting disk.enableUUID set to TRUE.
The following kb article describes how to add this setting to a vm:
https://kb.vmware.com/s/article/52815

If this is not set, the operating system inside the vm cannot read the disk UUID. In that case, changing the disk UUID cannot have any effect on the operating system or the applications inside the vm.

If the setting is set, you can see the disk UUID inside the operating system here:

  • Windows
(Get-WmiObject -Class Win32_DiskDrive).SerialNumber
  • Linux
/dev/disk/by-id/wwn-0x<diskUUID>

After a few tests, I found out that changing the disk UUID in VMware did not change the information in the operating system inside the vm. The only way I found to get the current disk UUID into the guest was to reboot the operating system.

Repair all the disks

With everything I had learned, I wrote a quick script to change the disk UUIDs of running vms.

The script takes these parameters:

  • filtervmnames
    A filter for the names of the vms I want to change. The reason is that at first I wanted to change only a few vms at a time, just in case something goes terribly wrong when changing disk UUIDs.
  • withenableuuid
    Without this switch parameter, the script only changes vms which do not have the advanced setting disk.enableUUID set. In that case the risk that something happens to the operating system or the applications is very low.
  • makechanges
    Without this switch parameter, the script only does a dry run and displays what it would do.

In case something really does happen to the operating system or the application inside the vm because of this change, the script also appends the disk name and the old disk UUID to the notes field of the vm in vCenter. That way I can revert the change by powering off the vm and manually editing the .vmdk descriptor file back to the old value.
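For a revert, the old values first have to be read back out of the notes field. A minimal sketch of that parsing step, assuming the entry format the script writes ("<disk name>: Old UUID = <uuid>", one entry per line); the function name Get-OldDiskUuidFromNotes and the sample notes text are made up for illustration.

```powershell
# hypothetical helper: extract "<disk name>: Old UUID = <uuid>" entries from a notes text
function Get-OldDiskUuidFromNotes {
    param([string]$Notes)
    foreach ($line in ($Notes -split "`r?`n")) {
        # other notes lines simply do not match and are skipped
        if ($line -match '^(?<Disk>.+): Old UUID = (?<Uuid>[0-9A-Fa-f -]+)\s*$') {
            [pscustomobject]@{ Disk = $Matches.Disk; OldUuid = $Matches.Uuid.Trim() }
        }
    }
}

# made-up notes text; with PowerCLI this would come from (Get-VM -Name ...).Notes
$notes = "Web server, owner: team-x`n" +
         "Hard disk 1: Old UUID = 60 00 C2 91 17 92 fe 42-ad 44 79 5f 65 6d 93 88`n" +
         "Hard disk 2: Old UUID = 60 00 C2 9a 00 11 22 33-44 55 66 77 88 99 aa bb"

$entries = @(Get-OldDiskUuidFromNotes -Notes $notes)
$entries | Format-Table Disk, OldUuid
```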

Finally, do not forget that this is not an official solution from VMware. Run the script at your own risk. I’ve tested it as well as I could and found no negative effects on the changed vms. You have been warned!

# definition of parameters
Param(
    [string]$filtervmnames,
    [switch]$withenableuuid,
    [switch]$makechanges
)

# get all hard disks
Write-Host "Get all hard disks...  " -NoNewline
$alldisks = Get-VM | Get-HardDisk
Write-Host ("Found " + $alldisks.Count)

# filter disks for duplicate uuids and show the "disk.enableUUID" parameter
Write-Host "Filter for duplicate uuids...  " -NoNewline
$diskswithduplicateuuids = $alldisks | Select-Object -Property Parent,Name,@{N="Disk-Uuid";E={$_.ExtensionData.Backing.Uuid}} | Group-Object -Property Disk-Uuid | Where-Object -FilterScript { $_.Count -ne 1 } | Select-Object -ExpandProperty Group
Write-Host ("Found " + $diskswithduplicateuuids.Count)

#########################
# Filter for name of vm #
#########################

if ( $filtervmnames -ne '' )
{

    # filter disks with duplicate uuid for specific vm name
    Write-Host "Filter for wanted vm names"
    $diskswithwantedvmnames = $diskswithduplicateuuids | Where-Object -FilterScript { $_.Parent.Name -like $filtervmnames }

}
else
{

    # use all found duplicate uuid disks
    $diskswithwantedvmnames = $diskswithduplicateuuids

}

#########################
#########################

# get the status of the disk.enableUUID AdvancedSetting of these vms
$diskswithwantedvmnamesanduuidsetting = $diskswithwantedvmnames | Select-Object -Property Parent,Name,Disk-Uuid,@{N="diskenableUUID";E={(Get-AdvancedSetting -Name "disk.enableUUID" -Entity $_.Parent).Value}}

##############################
# Filter for disk.enableUUID #
##############################

if ( $withenableuuid )
{

    # use all disks regardless of the disk.enableUUID parameter
    $diskstoprocess = $diskswithwantedvmnamesanduuidsetting

}
else
{

    # keep only disks whose vms do not have the "disk.enableUUID" parameter set
    Write-Host "Filter for parameter"
    $diskstoprocess = $diskswithwantedvmnamesanduuidsetting | Where-Object -FilterScript { $null -eq $_.diskenableUUID }

}

##############################
##############################

# line break for a nice looking output
Write-Host

# get reference to the virtual disk manager
$vdm = Get-View -Id (Get-View ServiceInstance).Content.VirtualDiskManager

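# get reference to the datacenter
# (assumes exactly one datacenter; with several, query it per vm inside the loop: Get-Datacenter -VM $myvm)
$mydc = Get-Datacenter

# loop through all disks to process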
foreach ($singledisk in $diskstoprocess)
{
    
    # get the vm object of this disk
    $myvm = $singledisk.Parent

    # get the disk object
    $mydisk = $myvm | Get-HardDisk -Name $singledisk.Name
    Write-Host ("Process " + $myvm.Name + " Disk: " + $singledisk.Name) -ForegroundColor Cyan

    # get the old uuid and split it by "-"
    $oldguid = $mydisk.ExtensionData.Backing.Uuid
    Write-Host "Oldguid is: $oldguid"
    $oldguidsplit = $oldguid.Split("-")

    # create new guid and split it by "-"
    $tempguid = ([guid]::NewGuid()).Guid
    $tempguidsplit = $tempguid.Split("-")

    # fill the tempguid with the first 3 values of the oldguid
    $tempguidsplit[0] = $oldguidsplit[0]
    $tempguidsplit[1] = $oldguidsplit[1]
    $tempguidsplit[2] = $oldguidsplit[2]

    # concatenate to a complete guid
    $tempguid = $tempguidsplit -join "-"

    # check if this tempguid is already in use by another disk
    if ( $alldisks.ExtensionData.Backing.Uuid -contains $tempguid )
    {

        Write-Host "Newly generated GUID is already in use!!!"
        exit

    }

    # convert guid to uuid
    # first cut out all "-". Then insert a " " after every pair of characters. Then trim the trailing " ".
    $tempuuid = ($tempguid.Replace("-",'') -replace '(..)','$1 ').Trim(" ")
    
    # add a "-" in the middle
    $tempuuid = $tempuuid.Substring(0,23) + "-" + $tempuuid.Substring(24,23)

    # get the old uuid
    $olduuid =  $vdm.QueryVirtualDiskUuid($mydisk.ExtensionData.Backing.FileName,$mydc.Id)
    
    # get a fresh vm object, otherwise the notes field would be overwritten if multiple disks have to be changed
    $myvm = Get-VM -Id $myvm.Id

    ########################
    # change the disk uuid #
    ########################
    if ( $makechanges )
    {

        Write-Host "Will set this diskUUID: $tempuuid"
        Set-VM -VM $myvm -Notes ($myvm.Notes + "`n" + $mydisk.Name + ": Old UUID = " + $olduuid) -Confirm:$false
        $vdm.SetVirtualDiskUuid($mydisk.ExtensionData.Backing.FileName,$mydc.Id,$tempuuid)

    }

    ########################
    ########################

    # check if the vm runs in a host cluster and the cluster has more than one host
    if ( $myvm.VMHost.Parent.GetType().Name -eq "ClusterImpl" -and $myvm.VMHost.Parent.ExtensionData.Host.Count -ne 1 )
    {

        # get another host in the cluster, but not the host the vm is currently running on
        $targetvmhost = $myvm.VMHost.Parent | Get-VMHost | Where-Object -FilterScript { $_.Name -ne $myvm.VMHost.Name } | Get-Random

        if ( $makechanges )
        {

            # move the vm to the target host
            Write-Host ("Move VM " + $myvm.Name + " to host " + $targetvmhost.Name)
            Move-VM -VM $myvm -Destination $targetvmhost

            # wait 30 seconds to give DRS time to rebalance the cluster
            Write-Host "wait 30 seconds to give DRS time to rebalance the cluster..."
            Start-Sleep -Seconds 30

        }

    }

    # write empty line
    Write-Host
}
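The UUID handling inside the loop can also be tried out without a vCenter connection. The sketch below pulls it into a standalone function under a made-up name (New-DiskUuid): it keeps the first three groups of the old value, takes the last two groups from a fresh GUID and returns both the dashed notation and the spaced notation that SetVirtualDiskUuid is called with in the script.

```powershell
# hypothetical helper mirroring the UUID logic of the loop above
function New-DiskUuid {
    param([Parameter(Mandatory)][string]$OldGuid)

    $old = $OldGuid.Split('-')
    $new = [guid]::NewGuid().ToString().Split('-')

    # keep the first three groups of the old value, randomize the last two
    $guid = ($old[0..2] + $new[3..4]) -join '-'

    # remove the dashes and insert a space after every pair of characters
    $spaced = ($guid.Replace('-', '') -replace '(..)', '$1 ').Trim()

    [pscustomobject]@{
        Guid           = $guid
        # put a "-" between the 8th and 9th byte, as in the descriptor file
        DescriptorUuid = $spaced.Substring(0, 23) + '-' + $spaced.Substring(24, 23)
    }
}

$result = New-DiskUuid -OldGuid '6000C291-1792-fe42-ad44-795f656d9388'
$result | Format-List
```

Keeping the first three groups is simply what the script does; whether those groups carry a special meaning for VMware is not something this sketch relies on.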

2 Comments

  1. Great post and just what I was looking for!

    I do have 1 problem with running the script – Line 121 setting the $olduuid variable gets the following error

    Cannot convert argument “datacenter”, with value: “System.Object[]”, for “QueryVirtualDiskUuid” to type
    “VMware.Vim.ManagedObjectReference”: “Cannot convert the “System.Object[]” value of type “System.Object[]” to type
    “VMware.Vim.ManagedObjectReference”.”
    At C:\scripts\Duplicate-Disk-UID.ps1:122 char:5
    + $olduuid = $vdm.QueryVirtualDiskUuid($mydisk.ExtensionData.Backin …
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [], MethodException
    + FullyQualifiedErrorId : MethodArgumentConversionInvalidCastArgument

    SysAdminKC
    1. I think the problem is that you have more than one datacenter in your environment. The error says that the variable $mydc holds an array (System.Object[]) instead of a single object reference.

      Look into line 74 where $mydc is created. The command “Get-Datacenter” most likely will give you two or more objects of type datacenter in your environment.

      In my script I did not consider that there can be more than one datacenter. You will have to query the datacenter for each vm you process.

      My suggestion is to insert the following in line 82:
      $mydc = Get-Datacenter -VM $myvm

      milla
