Cleanup Failed Tasks in SDDC Manager

I was chatting with my colleague Paudie O’Riordan yesterday about PowerVCF. He was doing some testing internally and mentioned that a great addition would be the ability to find and clean up failed tasks in SDDC Manager. Some use cases for this would be cleaning up an environment before handing it off to a customer, or before recording a demo.

Currently there isn’t a supported public API to delete a failed task, so you have to run a curl command on SDDC Manager with the task ID. Getting a list of failed tasks and then running a command to delete each one can take time. See Martin Gustafson’s post on how to do it manually here.

I took a look at our existing code for retrieving tasks (and discovered a bug in the logic that is now fixed in PowerVCF 2.1.5!) and we have the ability to specify -status, so requesting a list of tasks with -status “failed” returns a list. I put the script below together to retrieve a list of failed tasks, loop through them and delete them. The script requires the following inputs:

  • SDDC Manager FQDN. This is the target that is queried for failed tasks
  • SDDC Manager API User. This is the user that is used to query for failed tasks. Must have the SDDC Manager ADMIN role
  • Password for the above user
  • Password for the SDDC Manager appliance vcf user. This is used to run the task deletion. This is not tracked in the credentials DB so we need to pass it.

Once the above variables are populated the script does the following:

  • Checks for PowerVCF (minimum version 2.1.5) and installs if not present
  • Requests an API token from SDDC Manager
  • Queries SDDC Manager for the management domain vCenter Server details
  • Uses the management domain vCenter Server details to retrieve the SDDC Manager VM name
  • Queries SDDC Manager for a list of tasks in a failed state
  • Loops through the list of failed tasks and deletes them from SDDC Manager
  • Verifies the task is no longer present

Here is the script. It is also published here if you would like to enhance it

# Script to cleanup failed tasks in SDDC Manager
# Written by Brian O'Connell - Staff Solutions Architect @ VMware

#User Variables
# SDDC Manager FQDN. This is the target that is queried for failed tasks
$sddcManagerFQDN = "lax-vcf01.lax.rainpole.io"
# SDDC Manager API User. This is the user that is used to query for failed tasks. Must have the SDDC Manager ADMIN role
$sddcManagerAPIUser = "administrator@vsphere.local"
$sddcManagerAPIPassword = "VMw@re1!"
# Password for the SDDC Manager appliance vcf user. This is used to run the task deletion
$sddcManagerVCFPassword = "VMw@re1!"



# DO NOT CHANGE ANYTHING BELOW THIS LINE
#########################################

# Set TLS to 1.2 to avoid certificate mismatch errors
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

# Install PowerVCF if not already installed
if (!(Get-InstalledModule -name PowerVCF -MinimumVersion 2.1.5 -ErrorAction SilentlyContinue)) {
    Install-Module -Name PowerVCF -MinimumVersion 2.1.5 -Force
}

# Request a VCF Token using PowerVCF
Request-VCFToken -fqdn $sddcManagerFQDN -username $sddcManagerAPIUser -password $sddcManagerAPIPassword

# Disconnect all connected vCenters to ensure only the desired vCenter is available
if ($defaultviservers) {
    foreach ($server in $defaultviservers) {
        Disconnect-VIServer -Server $server -Confirm:$False
    }
}

# Retrieve the Management Domain vCenter Server FQDN
$vcenterFQDN = ((Get-VCFWorkloadDomain | where-object {$_.type -eq "MANAGEMENT"}).vcenters.fqdn)
$vcenterUser = (Get-VCFCredential -resourceType "PSC").username
$vcenterPassword = (Get-VCFCredential -resourceType "PSC").password

# Retrieve SDDC Manager VM Name
if ($vcenterFQDN) {
    Write-Output "Getting SDDC Manager VM Name"
    Connect-VIServer -server $vcenterFQDN -user $vcenterUser -password $vcenterPassword | Out-Null
    $sddcmVMName = ((Get-VM * | Where-Object {$_.Guest.Hostname -eq $sddcManagerFQDN}).Name)
}

# Retrieve a list of failed tasks
$failedTaskIDs = @()
$ids = (Get-VCFTask -status "Failed").id
Foreach ($id in $ids) {
    $failedTaskIDs += ,$id
}
# Cleanup the failed tasks
Foreach ($taskID in $failedTaskIDs) {
    $scriptCommand = "curl -X DELETE 127.0.0.1/tasks/registrations/$taskID"
    Write-Output "Deleting Failed Task ID $taskID"
    $output = Invoke-VMScript -ScriptText $scriptCommand -vm $sddcmVMName -GuestUser "vcf" -GuestPassword $sddcManagerVCFPassword

    # Verify the task was deleted
    Try {
        $verifyTaskDeleted = (Get-VCFTask -id $taskID)
        if ($verifyTaskDeleted -eq "Task ID Not Found") {
            Write-Output "Task ID $taskID Deleted Successfully"
        }
    }
    catch {
        Write-Error "Something went wrong. Please check your SDDC Manager state"
    }
}
Disconnect-VIServer -server $vcenterFQDN -Confirm:$False
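
Before deleting anything it can be handy to preview what would be run inside the appliance. The following is only a sketch: the helper name Get-TaskDeleteCommand is hypothetical (it is not part of PowerVCF) and the task IDs are placeholders. It just builds the same curl command string the script above uses, without connecting to anything.

```powershell
# Hypothetical helper: returns the curl command the script would run for each task ID
function Get-TaskDeleteCommand {
    param (
        [Parameter(Mandatory = $true)] [String[]]$TaskId
    )
    foreach ($id in $TaskId) {
        "curl -X DELETE 127.0.0.1/tasks/registrations/$id"
    }
}

# Placeholder IDs for illustration; in practice pass (Get-VCFTask -status "Failed").id
$preview = @(Get-TaskDeleteCommand -TaskId "11111111-aaaa", "22222222-bbbb")
$preview
```

Reviewing the list first makes it easier to spot a task you would rather keep for troubleshooting.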

As always, comments/feedback welcome!

Part 1: Working With the SRM VAMI API : Retrieving a Session ID

I’ve recently been doing a lot of work with VMware Site Recovery Manager (SRM) and vSphere Replication (vSR) with VMware Cloud Foundation. Earlier this year we (Ken Gould & I) published an early access design for Site Protection & Recovery for VMware Cloud Foundation 4.2. We have been working to refresh & enhance this design for a new release. Part of this effort includes trying to add some automation to assist with the manual steps to speed up time to deploy. SRM & vSR do not have publicly documented VAMI APIs so we set about trying to automate the configuration with a little bit of reverse engineering.

As with most APIs, whether public or private, you must authenticate before you can run an API workflow, so the first task is figuring out how authentication works. Typically, if you hit F12 in your browser you will get a developer console that exposes what goes on behind the scenes in a browser session. To inspect the process, use the browser to perform a manual login and review the header & response tabs in the developer view. This exposes the Request URL to use, the method (POST) and the required headers (accept: application/json).

The Response tab shows a sessionId which can be used for further configuration API calls in the headers as dr.config.service.sessionid

So with the above information you can use an API client like Postman to retrieve a sessionId with the URL & headers like this

And your VAMI admin user and password in JSON format in the body payload

You can also use the information to retrieve a sessionId using PowerShell

$headers = @{"Content-Type" = "application/json"}
$uri = "https://sfo-m01-srm01.sfo.rainpole.io:5480/configure/requestHandlers/login"
$body = '{"username": "admin","password": "mypassword"}'

$request = Invoke-RestMethod -Method POST -Uri $uri -Headers $headers -body $body

$sessionID = $request.data.sessionId

$sessionId
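
As a minimal sketch of the next step, the session ID can be placed in a headers hashtable ready for later calls. The header name dr.config.service.sessionid comes from the Response tab described above; the helper name New-SrmVamiHeader and the placeholder session ID are hypothetical, and no follow-on endpoint is assumed here.

```powershell
# Hypothetical helper: builds the headers for later VAMI calls from a session ID
function New-SrmVamiHeader {
    param (
        [Parameter(Mandatory = $true)] [ValidateNotNullOrEmpty()] [String]$SessionId
    )
    @{
        "Content-Type"                = "application/json"
        "dr.config.service.sessionid" = $SessionId
    }
}

# Example with a placeholder value; in practice pass $request.data.sessionId
$vamiHeaders = New-SrmVamiHeader -SessionId "00000000-0000-0000-0000-000000000000"
$vamiHeaders
```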

Keep an eye out for additional posts where we will use the sessionId to perform API based tasks

Checking Password Expiry For VMware Cloud Foundation Management Components

Within a VMware Cloud Foundation instance, SDDC Manager is used to manage the lifecycle of passwords (or credentials). While we provide the ability to rotate (either scheduled or manually) currently there is no easy way to check when a particular password is due to expire, which can lead to appliance root passwords expiring, which will cause all sorts of issues. The ability to monitor expiry is something that is being worked on, but as a stop gap I put together the script below which leverages PowerVCF and also a currently undocumented API for validating credentials.

The script has a function called Get-VCFPasswordExpiry that accepts the following parameters

  • -fqdn (FQDN of the SDDC Manager)
  • -username (SDDC Manager Username – Must have the ADMIN role)
  • -password (SDDC Manager password)
  • -resourceType (Optional parameter to specify a resourceType. If not passed, all resources will be checked. If passed (e.g. VCENTER) then only that resourceType will be checked. Supported resource types are VCENTER, PSC, ESXI, BACKUP, NSXT_MANAGER, NSXT_EDGE, VRSLCM, WSA, VROPS, VRLI and VRA)

PowerVCF is a requirement. If you don’t already have it, run the following

Install-Module -Name PowerVCF

The code takes a while to run as it needs to do the following to check password expiry

  • Connect to SDDC Manager to retrieve an API token
  • Retrieve a list of all credentials
  • Using the resourceID of each credential
    • Perform a credential validation
    • Wait for the validation to complete
    • Parse the results for the expiry details
    • Add all the results to an array and present in a table (Kudos to Ken Gould for assistance with the presentation of this piece!)

In this example script I am returning all non SERVICE user accounts regardless of expiry (SERVICE account passwords are system managed). You could get more granular by adding something like this to only display accounts with passwords due to expire in less than 14 days

if ($validationTaskResponse.validationChecks.passwordDetails.numberOfDaysToExpiry -lt 14) {
    Write-Output "Password for username $($validationTaskResponse.validationChecks.username) expires in $($validationTaskResponse.validationChecks.passwordDetails.numberOfDaysToExpiry) days"
}
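
The same threshold could also be applied afterwards to whatever the function returns. This is a sketch only: the sample objects are made up for illustration and simply mirror the 'Resource Name', 'Username' and 'Number Of Days To Expiry' properties that the script builds. Because the function also emits its progress messages with Write-Output, the filter skips plain strings.

```powershell
# Sample objects standing in for the function's output (values are made up for illustration)
$results = @(
    "Checking Password Expiry for username root from resource vcenter-example"
    [pscustomobject]@{ 'Resource Name' = 'vcenter-example'; 'Username' = 'root'; 'Number Of Days To Expiry' = 7 }
    [pscustomobject]@{ 'Resource Name' = 'esxi-example'; 'Username' = 'root'; 'Number Of Days To Expiry' = 80 }
)

# Skip the progress strings and keep only accounts that expire in less than 14 days
$expiringSoon = @($results | Where-Object {
    $_ -isnot [string] -and [int]$_.'Number Of Days To Expiry' -lt 14
})
$expiringSoon | Format-Table -AutoSize
```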

Here is the script content. As always feedback is welcome. Also posted in Github here if anyone wants to fork and improve https://github.com/LifeOfBrianOC/Get-VCFPasswordExpiry

# Script to check the password expiry of VMware Cloud Foundation Credentials
# Written by Brian O'Connell - VMware

#User Variables
$sddcManagerFQDN = "sfo-vcf01.sfo.rainpole.io"
$sddcManagerAdminUser = "administrator@vsphere.local"
$sddcManagerAdminPassword = "VMw@re1!"

# Requires PowerVCF Module
#Requires -Modules PowerVCF

Function Get-VCFPasswordExpiry
{

    Param (
        [Parameter (Mandatory = $true)] [ValidateNotNullOrEmpty()] [String]$fqdn,
        [Parameter (Mandatory = $true)] [ValidateNotNullOrEmpty()] [String]$username,
        [Parameter (Mandatory = $true)] [ValidateNotNullOrEmpty()] [String]$password,
        [Parameter (Mandatory = $false)] [ValidateSet("VCENTER", "PSC", "ESXI", "BACKUP", "NSXT_MANAGER", "NSXT_EDGE", "VRSLCM", "WSA", "VROPS", "VRLI", "VRA")] [ValidateNotNullOrEmpty()] [String]$resourceType
    )
    # Request an SDDC Manager token
    Request-VCFToken -fqdn $fqdn -username $username -password $password
    # Build the required headers
    $credentialHeaders = @{"Content-Type" = "application/json"}
    $credentialHeaders.Add("Authorization", "Bearer $accessToken")
    # Get all credential objects that are not type SERVICE
    if (!$PsBoundParameters.ContainsKey("resourceType")) {
        $credentials = Get-VCFCredential | Where-Object {$_.accountType -ne "SERVICE"}
    }
    else {
        $credentials = Get-VCFCredential -resourceType $resourceType | Where-Object {$_.accountType -ne "SERVICE"}
    }
    $validationArray = @()
    Foreach ($credential in $credentials) {
        # Use separate variable names so the validated function parameters are not overwritten
        $credResourceType = $credential.resource.resourceType
        $credResourceId = $credential.resource.resourceId
        $credUsername = $credential.username
        $credType = $credential.credentialType
        $body = '[
    {
        "resourceType": "'+$credResourceType+'",
        "resourceId": "'+$credResourceId+'",
        "credentials": [
            {
                "username": "'+$credUsername+'",
                "credentialType": "'+$credType+'"
            }
        ]
    }
]'
        # Use the -fqdn parameter rather than the script-level variable
        $uri = "https://$fqdn/v1/credentials/validations"
        # Submit a credential validation request
        $response = Invoke-RestMethod -Method POST -URI $uri -headers $credentialHeaders -body $body
        $validationTaskId = $response.id
        $validationTaskUri = "https://$fqdn/v1/credentials/validations/$validationTaskId"

        Do {
            # Keep checking until executionStatus is not IN_PROGRESS, pausing between polls
            Start-Sleep -Seconds 2
            $validationTaskResponse = Invoke-RestMethod -Method GET -URI $validationTaskUri -headers $credentialHeaders
        }
        While ($validationTaskResponse.executionStatus -eq "IN_PROGRESS")

        # Build the output
        $validationObject = New-Object -TypeName psobject
        $validationObject | Add-Member -notepropertyname 'Resource Name' -notepropertyvalue $validationTaskResponse.validationChecks.resourceName
        $validationObject | Add-Member -notepropertyname 'Username' -notepropertyvalue $validationTaskResponse.validationChecks.username
        $validationObject | Add-Member -notepropertyname 'Number Of Days To Expiry' -notepropertyvalue $validationTaskResponse.validationChecks.passwordDetails.numberOfDaysToExpiry

        Write-Output "Checking Password Expiry for username $($validationTaskResponse.validationChecks.username) from resource $($validationTaskResponse.validationChecks.resourceName)"
        # Add each credential result to the array
        $validationArray += $validationObject
    }
    # Print the array
    $validationArray
}

# Run the function
Get-VCFPasswordExpiry -fqdn $sddcManagerFQDN -username $sddcManagerAdminUser -password $sddcManagerAdminPassword

# Run the function with resourceType VCENTER
# Get-VCFPasswordExpiry -fqdn $sddcManagerFQDN -username $sddcManagerAdminUser -password $sddcManagerAdminPassword -resourceType VCENTER

Here is a screenshot of the result

Where Are My VMware Cloud Foundation Logs?

From time to time we all need to look at logs, whether it’s a failed operation or to trace who did what when. In VMware Cloud Foundation there are many different logs, each one serving a different purpose. It’s not always clear which log you should look at for each operation, so here is a useful reference table.

| Log Type | VM Location | Log Location |
|---|---|---|
| BringUp | Cloud Builder | JSON Generator – /opt/vmware/sddc-support/cloud_admin_tools/logs/JsonGenerator.log |
| | | Platform Audit – /opt/vmware/sddc-support/cloud_admin_tools/logs/PlatformAudit.log |
| | | Bringup – /var/log/vmware/vcf/bringup/vcf-bringup-debug.log |
| Licensing | SDDC Manager | /var/log/vmware/vcf/operationsmanager/operationsmanager.log |
| Network Pool | SDDC Manager | /var/log/vmware/vcf/commonsvcs/vcf-commonsvcs.log |
| Host Commission/Decommission | SDDC Manager | /var/log/vmware/vcf/operationsmanager/operationsmanager.log |
| VI (WLD domain) | SDDC Manager | /var/log/vmware/vcf/domainmanager/domainmanager.log |
| vRLI | SDDC Manager | /var/log/vmware/vcf/domainmanager/domainmanager.log |
| | | /var/log/vmware/vcf/commonsvcs/vcf-commonsvcs.log |
| vROPS | SDDC Manager | /var/log/vmware/vcf/domainmanager/domainmanager.log |
| | | /var/log/vmware/vcf/commonsvcs/vcf-commonsvcs.log |
| vRA | SDDC Manager | /var/log/vmware/vcf/domainmanager/domainmanager.log |
| | | /var/log/vmware/vcf/commonsvcs/vcf-commonsvcs.log |
| vRSLCM | SDDC Manager | /var/log/vmware/vcf/domainmanager/domainmanager.log |
| | | /var/log/vmware/vcf/commonsvcs/vcf-commonsvcs.log |
| | | Upgrade: /var/log/vmware/vcf/lcm/lcm.log |
| | vRSLCM | /var/log/vrlcm/vmware_vrlcm.log |
| LCM | SDDC Manager | /var/log/vmware/vcf/lcm/lcm.log |
| API Login | SDDC Manager | /var/log/vmware/vcf/commonsvcs/vcf-commonsvcs.log |
| SoS | SDDC Manager | /var/log/vmware/vcf/sddc-support/vcf-sos-svcs.log |
| Certificate Operations | SDDC Manager | /var/log/vmware/vcf/operationsmanager/operationsmanager.log |

vRealize Suite Lifecycle Manager Logs: The Easy Way

vRealize Suite Lifecycle Manager (vRSLCM) is a one stop shop for lifecycle management (LCM) of your VMware vRealize Suite (vRA, vRB, vROps, vRLI). VMware Validated Designs leverage this via Cloud Builder for initial SDDC deployment, but it also covers upgrades, reducing the need to jump between interfaces by bringing all LCM tasks into a single UI. This doesn’t come without its challenges however, as vRSLCM is now responsible for aggregating all the install/upgrade logs and presenting them in a coherent manner to the user…which isn’t always the case. vRSLCM logs activity in /var/log/vlcm/vrlcm-server.log but at best you get something like this

GET http://localhost:8080/suite/status/1c4a2929-e09c-4a22-b9f1-2834ec1bd65c: 200 null

Which let’s face it isn’t very helpful…or is it? At first glance it’s just a job ID, but thanks to @leahy_s in VMware CMBU I can now make this job ID give me more information in a much more structured way, similar to tail -f. Here’s how

And now you should have some readable JSON, hopefully with some more info on the error you are hitting
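
If you want to pull the job ID out of a log line in a script before looking it up, a regular expression is enough. This is a small sketch that only parses the sample line shown earlier; the pattern assumes the ID is a standard GUID following /suite/status/, which matches the example but has not been checked against other log entries.

```powershell
# Log line copied from the example above
$logLine = 'GET http://localhost:8080/suite/status/1c4a2929-e09c-4a22-b9f1-2834ec1bd65c: 200 null'

# Match a GUID that follows /suite/status/
$pattern = '/suite/status/([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})'
if ($logLine -match $pattern) {
    $jobId = $Matches[1]
    Write-Output "Job ID: $jobId"
}
```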

 

Managing VMs via the ESXi command line

From time to time a host may be unmanageable from vCenter / web client or you may only have console access. In my case I was bringing up a Dell EMC VxRail. During initial bringup the ESXi hosts do not get a mgmt IP if you do not have DHCP available, so management with the web client is not possible. I do have iDRAC access though, so I can access the console. I needed to see where the VxRail Manager VM was running as it comes up during an election process between the hosts. With console access it is still possible to manage VMs using vim-cmd.

To discover all VMs on a host run the following

  • vim-cmd vmsvc/getallvms

Once you have the output you can use the Vmid to check and change the power state of a VM

  • vim-cmd vmsvc/power.get 2

In my case the VM I wanted was powered off. You can run the following to power it on

  • vim-cmd vmsvc/power.on 2

 

And there you have it. Simple VM management using vim-cmd. Explore what else you can leverage it for here

Quick Fix: The trust relationship between this workstation and the primary domain failed…

We’ve all been there…attempt to open an RDP session to a VM you haven’t connected to in a while and you see the message above! Traditionally the fix for this was to log on as a local admin user, remove the VM from the AD domain (add to workgroup), reboot, log in again, add to AD domain, reboot….well here is a quicker way of resolving the issue with PowerShell.

Modify the username, password and domain controller FQDN and save the following as ResetDomainMembership.ps1 and run on the affected VM as a local administrator


$password = "Password123!" | ConvertTo-SecureString -asPlainText -Force
$username = "domain\administrator"
$credential = New-Object System.Management.Automation.PSCredential($username,$password)
Reset-ComputerMachinePassword -Server dc01.domain.local -credential $credential
shutdown -r -t 0

Tip: If you don’t want to include the password in the script, as this is a security concern, you can use a Read-Host command to prompt the user for the password


$password = Read-Host -asSecureString "Please enter the password"
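
Putting the tip together with the script, a minimal sketch might look like the following. The helper name New-DomainCredential is hypothetical; it only wraps the same PSCredential constructor used in the script above, and the interactive lines are commented out so nothing runs until you are on the affected VM.

```powershell
# Hypothetical helper: wraps a user name and a SecureString password in a PSCredential
function New-DomainCredential {
    param (
        [Parameter(Mandatory = $true)] [String]$Username,
        [Parameter(Mandatory = $true)] [System.Security.SecureString]$Password
    )
    New-Object System.Management.Automation.PSCredential($Username, $Password)
}

# Interactive use (uncomment on the affected VM):
# $password = Read-Host -AsSecureString "Please enter the password"
# $credential = New-DomainCredential -Username "domain\administrator" -Password $password
# Reset-ComputerMachinePassword -Server dc01.domain.local -Credential $credential
```

This way the password never appears in the saved .ps1 file.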

Cleanup failed requests in vRA UI

From time to time a request in vRA will fail for whatever reason. When this happens you will see the request status as failed on the requests tab. There is a greyed out delete button that for whatever reason cannot be used to delete the failed request even when logged in as a full tenant/iaas/cloud admin.

 

There are several reasons you may want to remove failed requests…maybe you may need to deliver a demo to the CIO on some new functionality and failures in the UI never look good…or maybe you just have mild OCD like me and like to cleanup any failures to restore the illusion of all being good with the world! 🙂 Whatever your reasons here is a procedure that you can use.

Disclaimer: I don’t believe this procedure is fully supported by VMware so please proceed with caution.

  • SSH to your primary vRA appliance
  • Run the following to view the contents of /etc/vcac/server.xml
    • less /etc/vcac/server.xml
  • Look for the line with password= and copy everything between the “”. This password will allow you to connect to the vRA PostGres DB

  • Run the following command with the password from the above step
    • vcac-config prop-util -d --p "s2enc~K6RsAv5WGpoAt+qsnZPrKErxZ0kU1npeK/G5iMzyaWI="
  • Next change to the postgres user
    • su postgres
  • Change to the postgres directory
    • cd /opt/vmware/vpostgres/current/bin
  • Connect to the vcac database
    • ./psql vcac -W
  • Enter the password from server.xml
  • vRA requests are stored in the cat_request table. To enable us to delete a request we first need the request id. Query the cat_request table for your request ID using the requestnumber (in my case the offending failed requestnumber is 63, as seen in the first column in the screenshot above; replace it with your requestnumber)
    • SELECT id,requestnumber FROM cat_request where requestnumber = '63';

vRA XaaS blueprint requests are referenced in one further table, cat_requestevent. This entry must be deleted before you can delete the request.

  • Run the following commands to delete the request.
  • delete from cat_requestevent where request_id = '4dc74fc2-f855-4eb1-94d6-65481b702acd';
  • delete from cat_request where id = '4dc74fc2-f855-4eb1-94d6-65481b702acd';

The offending failed request should now be gone from the requests list in vRA!

Add “Press any key to continue..” to a PowerShell script

From time to time it is nice to have a “Press any key to continue..” break point in a script to allow the user to review the status of an operation or just to add a user interaction to acknowledge the completion of an operation. This is especially useful when using a menu based script (see here) where the script will revert back to the menu once an operation is complete making it difficult to see the status of an operation when it completes or any Write-Host messages that may have been displayed. To get around this I use the following PowerShell Function to insert a “Press any key to continue..” break point that will wait for the user to…you guessed it…press the any key! 🙂

I use this when using a PowerShell Menu (See more about that here). You can edit the text in the quotes on line 3 to suit your use case. In my case I am calling the Menu function on line 5 so that when a user presses a key it will revert to the script menu. Simples!


Function anyKey
{
Write-Host -NoNewline -Object 'Press any key to return to the main menu...' -ForegroundColor Yellow
$null = $Host.UI.RawUI.ReadKey('NoEcho,IncludeKeyDown')
Menu
}


Error restoring SRM placeholder VM

I’ve been doing some lab work this week staging a vSphere 6.0U1b with SRM 6.0 environment for some upgrade scenario testing and I hit an issue with SRM 6.0 that I had not seen before. When trying to restore the SRM placeholder VM for a protected VM I was getting the following error

No hosts with hardware version ‘7’ and datastore(s) “NFS02” which are powered on and not in maintenance mode are available

.srm-placeholder

It seemed like a pretty odd error given that my target host is 6.0 and it has the NFS02 datastore mounted. I checked all the obvious things to ensure there were no host issues and then went on the KB hunt. I tried the solution outlined here to no avail https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2079084

I also tried this, again no joy. http://pubs.vmware.com/srm-55/index.jsp?topic=%2Fcom.vmware.srm.admin.doc%2FGUID-FE6A85EC-B44E-415A-9C5F-1E17BC846119.html

As a last ditch effort I tried rebooting the target ESXi host and that fixed the issue. I was then able to restore the placeholder VM and continue testing. I’m not sure of the root cause. This is a fully nested environment, using vSphere Replication & a VNX File appliance, so it may just be environmental. I will update this post if I figure it out!

Onwards with testing!