Veeam: Getting a list of VMs which are being backed up

If you need to audit your VMware backups, this script will output a list of successes and failures for your Veeam backup jobs.

Any VM that hasn't had a successful backup in the last 24 hours shows up in red, and any VM with a recent successful backup shows up in green. The script uses the local database on your Veeam server and doesn’t touch vCenter.

Here’s the script:

asnp "VeeamPSSnapIn" -ErrorAction SilentlyContinue

####################################################################
# Configuration
# vCenter server
$vcenter = "<vCenter Server>"
# To Exclude VMs from report add VM names to be excluded as follows
# $excludevms=@("vm1","vm2")
$excludevms=@()
####################################################################

$vcenterobj = Get-VBRServer -Name $vcenter

# Build hash table with excluded VMs
$excludedvms=@{}
foreach ($vm in $excludevms) {
    $excludedvms.Add($vm, "Excluded")
}

# Get a list of all VMs from vCenter and add to hash table, assume Unprotected
$vms=@{}
foreach ($vm in (Find-VBRObject -Server $vcenterobj | Where-Object {$_.Type -eq "VirtualMachine"})) {
    if (!$excludedvms.ContainsKey($vm.Name)) {
        write-host -foregroundcolor white "Adding $vm"
        if (!$vms.ContainsKey($vm.Name)) {
            $vms.Add($vm.Name, "Unprotected")
        }
    }
}

# Find all backup job sessions that have ended in the last 24 hours
$vbrsessions = Get-VBRBackupSession | Where-Object {$_.JobType -eq "Backup" -and $_.EndTime -ge (Get-Date).addhours(-24)}

# Find all successfully backed up VMs in selected sessions (i.e. VMs not ending in failure) and update status to "Protected"
foreach ($session in $vbrsessions) {
    foreach ($vm in ($session.GetTaskSessions() | Where-Object {$_.Status -ne "Failed"})) {
        if ($vms.ContainsKey($vm.Name)) {
            $vms[$vm.Name] = "Protected"
        }
    }
}

# Output VMs in color coded format based on status.
foreach ($vm in $vms.Keys)
{
  if ($vms[$vm] -eq "Protected") {
      write-host -foregroundcolor green "$vm is backed up"
  }
}

foreach ($vm in $vms.Keys)
{
  if ($vms[$vm] -eq "Unprotected") {
      write-host -foregroundcolor red "$vm is NOT backed up"
  }
}
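
If you'd rather have a report you can save or email than coloured console output, one option is to dump the hash table at the end of the script. This is just a sketch, not part of the original script, and the output path is only an example:

# Optional: export the results to CSV as well as (or instead of) writing to the console
$vms.GetEnumerator() | ForEach-Object {
    [PSCustomObject]@{ Name = $_.Key; Status = $_.Value }
} | Export-Csv -Path "C:\Reports\veeam-backup-status.csv" -NoTypeInformation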




Out with the old, in with the new. Upgrading to Nimble.

We had a pair of Dell EqualLogic SANs that were getting quite long in the tooth. They worked well enough, but performance was starting to become an issue; the ever-increasing number of VMs and their workload was proving too much. We eventually settled on a Nimble CS220 with two 10GigE interfaces, plumbed into a pair of Nexus 5548UPs for a total of 8 iSCSI paths from the VMware cluster.

Dell PS4000 and PS6000

And the shiny new Nimble CS220G, serving storage for the blades in the C7000:

Nimble CS220G with some spare disks

As with most SANs you can add a couple of disk shelves for extra storage, up to 3 for the CS220.

In terms of performance, we did some benchmarks with Iometer and managed to get around 22,000 IOPS out of this SAN, with throughput around 300-400 MB/sec, using round robin across the two 10GigE interfaces. We don’t need anywhere near this performance, but it’s nice when vMotioning or cloning. We enabled jumbo frames; latency dropped once they were on, but the overall performance boost wasn’t massive.

We also did a fair amount of failure testing: yanking out controllers, disks, cables and so on. It takes about 11 seconds to fail over if you remove a controller from the back. Nothing we did managed to break any VMs or make the file systems go read-only (RHEL 5.x and 6.x guests). The only thing I didn’t do was disable auto-support, which meant a bunch of tickets got created with Nimble while we were testing. Oops.

It’s also worth mentioning that if the management NICs lose connectivity, the SAN will fail over to the standby controller. It will also shut down if it loses more than two disks, to prevent data loss.

For data migration, all we really had to do was add the Nimble to the storage cluster and put the EqualLogics into maintenance mode. Storage DRS took care of the rest. The only issue was a single ghost VM on one of the Dell SANs due to old snapshots, which I cover further down.
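
If you don't have Storage DRS available, or just want to drive the migration by hand, the same thing can be done with Storage vMotion from PowerCLI. A rough sketch, with placeholder datastore names:

# Storage vMotion everything off the old EqualLogic datastore onto the Nimble volume
# "EQL-DS01" and "Nimble-DS01" are placeholder datastore names
Connect-VIServer -Server <vCenter Server>
Get-VM -Datastore "EQL-DS01" | Move-VM -Datastore (Get-Datastore "Nimble-DS01")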

The management GUI is nice and clean and doesn’t require Java. It uses flash for the performance graphs:

Failover takes about 11 seconds

Barely taxing the Nimble

For performance, Nimble recommends round robin for all volumes with IOPS per path set to 0. Rather than configuring every volume by hand, you can set the default for new volumes and fix up existing ones like this:

# Set a rule to enable round-robin for all new Nimble volumes. This is persistent.
esxcli storage nmp satp rule add --psp=VMW_PSP_RR --satp=VMW_SATP_ALUA --vendor=Nimble

# Set multipathing policy to round robin for all Nimble storage and set IOPS per path to 0 
for i in `esxcli storage nmp device list | awk '/Nimble iSCSI Disk/{print $7}' | sed -e 's/(//' -e 's/)//'`; do
    esxcli storage nmp device set --device=$i --psp=VMW_PSP_RR
    esxcli storage nmp psp roundrobin deviceconfig set --iops=0 --type=iops --device=$i
done

# List eui for Nimble Volumes
esxcli storage nmp device list | awk '/Nimble iSCSI Disk/{print $7}'

# List the PSP/RR setting for each Nimble Volume - an extension of the command above, allows you to easily check if the IOPS=0 option has been applied to all Nimble Volumes.
for i in `esxcli storage nmp device list | awk '/Nimble iSCSI Disk/{print $7}' | sed -e 's/(//' -e 's/)//'`; do
    esxcli storage nmp psp roundrobin deviceconfig get --device=$i
done
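
And to double-check that the claim rule for new Nimble volumes actually took, something like this should do it:

# Confirm the Nimble claim rule exists
esxcli storage nmp satp rule list | grep -i nimble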

VMware Ghost VM on Datastore

We had an issue recently where we’d migrated a bunch of virtual machines off an old EqualLogic SAN, but still had one VM showing up on the old datastore, even though it didn’t even have a folder there.

After some research it turned out to be an issue with snapshots. The VM had a single snapshot that had originally been created on this datastore. Once the snapshot had been deleted, the VM no longer showed up and we were able to unmount the datastore.

We were running VMware 5.0 and, although Storage vMotion of a VM with snapshots is supposed to be supported there, it clearly didn’t move everything in our case.
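
If you run into the same thing, a quick way to find the culprit is to list every VM the datastore still thinks it hosts, along with its snapshot count. A rough PowerCLI sketch, assuming you already have a session connected to vCenter and substituting your own datastore name:

# List VMs registered against the old datastore with their snapshot counts
# "EQL-DS01" is a placeholder datastore name
Get-VM -Datastore "EQL-DS01" |
    Select-Object Name, @{N="Snapshots"; E={ (Get-Snapshot -VM $_ | Measure-Object).Count }}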

Patching Done Right

We moved offices recently, which I’ll do a write-up for soon, and had the opportunity to redo the patching in our server room. My first choice was to use Neat Patch, but the lead time for delivery was far too long and we had to get this done as soon as possible.

I ended up going with a staggered switch layout: start with a patch panel, then a switch, then another patch panel, and so on. This means you can patch with short 0.3m cables and don’t need any cable management taking up valuable rack space. Below is the horror we dealt with previously; you can see we tried to keep it somewhat neat, but it just doesn’t scale.

Before

After

With this config, we pre-patch every single port and put it on a guest VLAN with limited access.

Sysadmin Christmas Tree

Had some spare kit sitting around so I built a tree:

Sysadmin Christmas tree