How to replace a disk in vSAN with the error: «General vSAN error»

During the holidays a disk failed in our lab. That is a great opportunity to review what needs to be done when a disk fails in vSAN, and how to fix an error that can appear when we try to remove the disk group.

Our platform:

  • Version: vSphere 8.0
  • Hosts: 3
  • Disk groups: 1 per host
  • Disks: 1 NVMe cache disk and 2 NVMe capacity disks

Some of the details covered in this article apply only to the traditional vSAN OSA (Original Storage Architecture).

Prerequisites

First we have to check whether we can replace only the faulty disk or whether we need to recreate the disk group completely. To decide, remember:

  • If we need to replace a cache disk, we always have to recreate the complete disk group.
  • If encryption or deduplication is enabled, we also need to recreate the disk group, regardless of whether the faulty disk is a cache disk or a capacity disk.
  • In all other cases, where the faulty disk is a capacity disk, we can replace only the faulty disk without recreating the disk group.
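The three rules above can be written down as a small decision function. This is only a sketch in Python to make the logic explicit; the `FailedDisk` class and its field names are hypothetical and not part of any VMware tool.

```python
from dataclasses import dataclass


@dataclass
class FailedDisk:
    """Hypothetical description of the failed disk and its disk group."""
    is_cache_tier: bool        # True if the failed disk is the cache disk
    dedup_enabled: bool        # deduplication enabled on the disk group
    encryption_enabled: bool   # encryption enabled on the disk group


def must_recreate_disk_group(disk: FailedDisk) -> bool:
    """Return True when the whole disk group has to be recreated."""
    if disk.is_cache_tier:
        # Rule 1: a failed cache disk always means a full rebuild.
        return True
    if disk.dedup_enabled or disk.encryption_enabled:
        # Rule 2: with deduplication or encryption the tier does not matter.
        return True
    # Rule 3: a plain capacity disk can be replaced on its own.
    return False


# The lab case from this article: cache disk failed, no dedup, no encryption.
lab_case = FailedDisk(is_cache_tier=True, dedup_enabled=False, encryption_enabled=False)
print(must_recreate_disk_group(lab_case))  # prints True
```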

Let’s review some further requirements and recommendations:

  • We need to be extra careful when replacing a disk that is configured as RAID 0 in the controller; it’s recommended to check the vendor’s instructions before physically replacing the disk. Don’t forget that the recommended setup is to configure the disks in the controller as passthrough.
  • When replacing a capacity disk it’s recommended to use the same model and size. If we cannot get the same size, it’s recommended to use the same model in the next larger size available. We need to be careful with balancing when using different disk sizes.
  • When we replace any kind of disk (capacity or cache) it’s recommended to use disks with the same or better “endurance” and “performance” ratings.
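As a rough illustration of the recommendations above, the following Python sketch compares a replacement disk against the original one and returns a list of warnings. Everything here is an assumption made for the example: the `DiskSpec` fields, the use of TBW as the endurance metric and write IOPS as the performance metric, and the sample values are illustrative, not taken from any vendor data sheet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskSpec:
    """Hypothetical disk description; metrics are illustrative assumptions."""
    model: str
    size_gb: int
    endurance_tbw: int   # assumed endurance metric (terabytes written)
    write_iops: int      # assumed performance metric


def replacement_warnings(old: DiskSpec, new: DiskSpec) -> List[str]:
    """Compare a replacement disk with the original and list any concerns."""
    warnings = []
    if new.model != old.model:
        warnings.append("different model: the same model is recommended")
    if new.size_gb < old.size_gb:
        warnings.append("smaller than the original disk")
    elif new.size_gb > old.size_gb:
        warnings.append("larger size: watch capacity balancing")
    if new.endurance_tbw < old.endurance_tbw:
        warnings.append("lower endurance than the original disk")
    if new.write_iops < old.write_iops:
        warnings.append("lower performance than the original disk")
    return warnings


# Made-up values: same model family in the next larger size.
old_disk = DiskSpec("MODEL-A", 2000, 900, 500_000)
new_disk = DiskSpec("MODEL-A", 4000, 1800, 500_000)
for warning in replacement_warnings(old_disk, new_disk):
    print(warning)  # prints "larger size: watch capacity balancing"
```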

Disk group rebuild

The first thing we need to do is find the faulty disk. To do that, open the cluster configuration and go to vSAN Disk Management; there we can check which host has the disk with errors.

Now we click on “Disks” in the host with the alarm.

In our case the missing disk is the cache disk, which means we need to remove the disk group and recreate it.

Click on the three-dot menu next to the disk group label to open the options.

Click on “Remove”

If the disk group is removed without errors, we can skip ahead to the point where the new disk group is created. In our case we got the error “General vSAN Error”.

Log in to the host with the faulty disk over SSH. There we can double-check which disk is missing and find the UUID of the disk group.

Here we can see that only the capacity disks appear in the list, which is the same information we got from the GUI. Write down the disk group UUID for the next step.

[root@ast-esxi01:~] esxcli vsan storage list
t10.NVMe____WDC_WDS200T2B0C2D00PXH0__________________D97806418B441B00
   Device: t10.NVMe____WDC_WDS200T2B0C2D00PXH0__________________D97806418B441B00
   Display Name: t10.NVMe____WDC_WDS200T2B0C2D00PXH0__________________D97806418B
   Is SSD: true
   VSAN UUID: 52389c3f-fa52-6905-39a7-c5adbfabcd9d
   VSAN Disk Group UUID: 52adc0bc-6971-fe30-4490-c89de109565e
   VSAN Disk Group Name:
   Used by this host: true
   In CMMDS: false
   On-disk format version: 17
   Deduplication: false
   Compression: false
   Checksum: 8683338899751340819
   Checksum OK: true
   Is Capacity Tier: true
   Encryption Metadata Checksum OK: true
   Encryption: false
   DiskKeyLoaded: false
   Is Mounted: true
   Creation Time: Mon Feb  7 13:20:14 2022

t10.NVMe____WDC_WDS200T2B0C2D00PXH0__________________53D306418B441B00
   Device: t10.NVMe____WDC_WDS200T2B0C2D00PXH0__________________53D306418B441B00
   Display Name: t10.NVMe____WDC_WDS200T2B0C2D00PXH0__________________53D306418B
   Is SSD: true
   VSAN UUID: 52e1b7f0-74c6-3ccb-c441-09d7faed25bf
   VSAN Disk Group UUID: 52adc0bc-6971-fe30-4490-c89de109565e
   VSAN Disk Group Name:
   Used by this host: true
   In CMMDS: false
   On-disk format version: 17
   Deduplication: false
   Compression: false
   Checksum: 3520512375592328882
   Checksum OK: true
   Is Capacity Tier: true
   Encryption Metadata Checksum OK: true
   Encryption: false
   DiskKeyLoaded: false
   Is Mounted: true
   Creation Time: Mon Feb  7 13:20:14 2022
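If the listing is long, the relevant fields can be pulled out with a short script. The following Python sketch parses text in the format shown above and counts cache and capacity disks per disk group; the function names are hypothetical, and the sample is an abbreviated copy of the output from this article. A group with zero cache disks matches our situation, where the cache disk is missing.

```python
from typing import Dict, List

# Abbreviated copy of the "esxcli vsan storage list" output shown above.
SAMPLE = """\
t10.NVMe____WDC_WDS200T2B0C2D00PXH0__________________D97806418B441B00
   VSAN UUID: 52389c3f-fa52-6905-39a7-c5adbfabcd9d
   VSAN Disk Group UUID: 52adc0bc-6971-fe30-4490-c89de109565e
   Is Capacity Tier: true

t10.NVMe____WDC_WDS200T2B0C2D00PXH0__________________53D306418B441B00
   VSAN UUID: 52e1b7f0-74c6-3ccb-c441-09d7faed25bf
   VSAN Disk Group UUID: 52adc0bc-6971-fe30-4490-c89de109565e
   Is Capacity Tier: true
"""


def parse_storage_list(text: str) -> List[Dict[str, str]]:
    """Turn the listing into one dict per disk (unindented line = new disk)."""
    disks: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = {"Name": line.strip()}
            disks.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return disks


def disks_per_group(disks: List[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count cache and capacity disks for every disk group UUID."""
    groups: Dict[str, Dict[str, int]] = {}
    for disk in disks:
        group_uuid = disk.get("VSAN Disk Group UUID")
        if group_uuid is None:
            continue
        counts = groups.setdefault(group_uuid, {"capacity": 0, "cache": 0})
        tier = "capacity" if disk.get("Is Capacity Tier") == "true" else "cache"
        counts[tier] += 1
    return groups


for uuid, counts in disks_per_group(parse_storage_list(SAMPLE)).items():
    print(f"{uuid}: {counts['cache']} cache, {counts['capacity']} capacity")
```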

Remove the disk group using the disk group UUID noted above.

[root@ast-esxi01:~] esxcli vsan storage remove -u 52adc0bc-6971-fe30-4490-c89de109565e
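Since this command destroys the disk group, it can be worth validating the UUID before running it, for example when the step is scripted. The sketch below is a hypothetical Python helper that only checks the UUID format and builds the same command line used above; it does not run anything on the host.

```python
import re
from typing import List

# Format of the UUIDs printed by "esxcli vsan storage list".
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def build_remove_command(disk_group_uuid: str) -> List[str]:
    """Return the argument list for the removal command, or raise on a bad UUID."""
    uuid = disk_group_uuid.strip().lower()
    if not UUID_RE.fullmatch(uuid):
        raise ValueError(f"not a valid vSAN UUID: {disk_group_uuid!r}")
    return ["esxcli", "vsan", "storage", "remove", "-u", uuid]


# Builds the command from this article; printing it lets us review it first.
print(" ".join(build_remove_command("52adc0bc-6971-fe30-4490-c89de109565e")))
```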

Using the GUI, check that the disk group has been removed. We should see that there are no disks in use.

Now we need to create a new disk group. Click on “Disks” and then “Create disk group”.

Double-check that the disk group has been created correctly and that it is healthy.

Now we should have 3 disks in use, shown in green.

The last step is to take the host out of maintenance mode. Objects with broken components should start repairing automatically (the 60-minute object repair timer has probably already expired).
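To make the timer remark concrete: repairs are delayed until the components have been absent for 60 minutes. The following Python sketch only illustrates that arithmetic; the function name and the timestamps are made up for the example, and the 60-minute value is the one mentioned above.

```python
from datetime import datetime, timedelta

# Repair delay mentioned in this article.
REPAIR_DELAY = timedelta(minutes=60)


def repair_delay_expired(absent_since: datetime, now: datetime,
                         delay: timedelta = REPAIR_DELAY) -> bool:
    """Return True once the components have been absent for the full delay."""
    return now - absent_since >= delay


# Hypothetical timestamps: the disk failed at 09:00 and we finish at 11:30.
failed_at = datetime(2024, 1, 2, 9, 0)
finished_at = datetime(2024, 1, 2, 11, 30)
print(repair_delay_expired(failed_at, finished_at))  # prints True
```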

Object rebuild

We have a new working disk group, so now we should check that all objects are “healthy”.

Open the virtual objects view in the “Monitor” section of the cluster. Any broken objects still waiting to be repaired will appear in red.

It’s recommended to wait until all objects are rebuilt automatically. This can be checked in the “Resyncing objects” section. If there are rebuild tasks still to complete, we will see there how many GB remain and the estimated time to complete them.

If not all objects are rebuilt automatically, or if we want to force the resync immediately, we can do it in the “Skyline Health” section. Run a retest first to have updated information and select “vSAN objects status”; there we can click “Repair Objects Immediately”.

We should have all components in green now.

Extra: Recreate performance service

Sometimes, if we wait too long to recover the performance service, we cannot recover its missing components even if we force the object repair using Skyline Health. In this case the only thing we need to do is deactivate the service and activate it again to rebuild the database.

To do it, go to vSAN Services in the cluster configuration and edit the performance service. We can deactivate it or change the storage policy there.

There are versions where we cannot deactivate it using the GUI; in these cases we need to do it using RVC (Ruby vSphere Console).
