HashiCorp Terraform has become an industry-standard infrastructure-as-code and desired-state configuration tool for managing on-premises and cloud-based entities. If you are not familiar with Terraform, I’ve covered some early general learnings in some posts here & here. The internal engineering team is working on a Terraform provider for VCF, so I decided to give it a spin to review its capabilities & test drive it in the lab.
First off, which VCF operations does the provider support today?
Deploying a new VCF instance (bring-up)
Commissioning hosts
Creating network pools
Deploying a new VI Workload domain
Creating clusters
Expanding clusters
Adding users
New functionality is being added every week, and as with all new initiatives like this, customer consumption and adoption will drive innovation and progress.
The GitHub repo contains some great example files to get you started. I am going to do a few blog posts on what I’ve learned so far, but for now, here are the important links you need if you would like to take a look at the provider.
Before you can deploy a vSphere Lifecycle Manager (vLCM) image-based cluster in VMware Cloud Foundation, you must first import an image into the Image Management Inventory in SDDC Manager. You can do this via the SDDC Manager UI for a pre-existing cluster.
Or you can now use PowerVCF to import the image thanks to the addition of New-VCFPersonality (vLCM images are known as personalities in VCF hence the name of the cmdlet).
The sequence of events to be able to import an image is as follows:
Extract a vLCM image from a host that you wish to use in the workload domain. The host doesn’t need to be in the vCenter or SDDC Manager inventory.
Create a temporary cluster in vCenter (must be created in a VCF workload domain) and assign the image from the previous step.
Import the image from the source cluster into SDDC Manager.
To achieve step 1 we can use PowerCLI. The remaining steps follow in the same script, finishing with PowerVCF for the import:
# Variables
$sourceHostUrl = "https://sfo01-w01-esx01.sfo.rainpole.io"
$sourceHostBuild = "21495797"
$sourceHostRootPassword = "VMw@re1!"
$vcenterFQDN = "sfo-m01-vc01.sfo.rainpole.io"
$ssoUsername = "administrator@vsphere.local"
$ssoPassword = "VMw@re1!"
$vcenterDC = "sfo-m01-dc01"
$sddcManagerFQDN = "sfo-vcf01.sfo.rainpole.io"
# Retrieve the source host thumbprint
$response = [System.Net.WebRequest]::Create($sourceHostUrl)
$response.GetResponse()
$cert = $response.ServicePoint.Certificate
$sourceHostThumbprint = $cert.GetCertHashString() -replace '(..(?!$))','$1:'
# Connect to vCenter and import the image from the source host to the depot
Connect-VIServer -Server $vcenterFQDN -User $ssoUsername -Password $ssoPassword
$OfflineHostCredentials = Initialize-SettingsDepotsOfflineHostCredentials -HostName $sourceHostUrl -UserName "root" -Password $sourceHostRootPassword -Port 443 -SslThumbPrint $sourceHostThumbprint
$OfflineConnectionSpec = Initialize-SettingsDepotsOfflineConnectionSpec -AuthType "USERNAME_PASSWORD" -HostCredential $OfflineHostCredentials
Invoke-CreateFromHostDepotsOfflineAsync -SettingsDepotsOfflineConnectionSpec $OfflineConnectionSpec
# Create a temporary cluster and assign the image
$LcmImage = Get-LcmImage -Type BaseImage | where {$_.Version -match $sourceHostBuild}
$clusterID = (New-Cluster -Location $vcenterDC -Name 'vLCM-Cluster' -HAEnabled -DrsEnabled -BaseImage $LcmImage).ExtensionData.MoRef.Value
# Import the image to SDDC Manager
Request-VCFToken -fqdn $sddcManagerFQDN -username $ssoUsername -password $ssoPassword
$vCenterID = (Get-VCFvCenter | where {$_.fqdn -match $vcenterFQDN}).id
New-VCFPersonality -name "21495797" -vCenterId $vCenterID -clusterId $clusterID
That should import the new image into the SDDC Manager image repository, ready for use when creating a vLCM image-based workload domain.
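The thumbprint step in the script above relies on a small regex to turn the raw certificate hash into the colon-separated form. Here is a minimal sketch of that one step in isolation, so you can see what it does to a string. Format-Thumbprint is a hypothetical helper name (it is not part of PowerCLI or PowerVCF), and the sample value is made up and shortened.

```powershell
# Hypothetical helper (not part of PowerCLI or PowerVCF): format a hex hash
# string as a colon-separated thumbprint, as the script above does inline.
function Format-Thumbprint {
    param (
        [Parameter(Mandatory = $true)]
        [string]$HexString
    )
    # Match every pair of characters that is NOT the final pair and append a
    # colon to it. The negative lookahead (?!$) leaves the last pair untouched.
    return ($HexString -replace '(..(?!$))', '$1:')
}

# Made-up sample value, shortened for readability (a real SHA-1 hash has 40 hex characters)
$sampleThumbprint = Format-Thumbprint -HexString 'A1B2C3D4'
Write-Output $sampleThumbprint
```

Wrapping the regex in a function is just a convenience for trying it out; the inline version in the script does the same thing.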
Since the introduction of subscription-based licensing for VMware Cloud Foundation (VCF+) there are now 2 licensing modes in VCF (Perpetual or Subscription). To make it easier to identify the subscription status of the system and each workload domain, we have added Get-VCFLicenseMode to the latest release of PowerVCF, 2.3.0.1004.
Historically with VMware Cloud Foundation (VCF), the vRealize Suite (or the following list of products) has been part of the VCF bill of materials (BOM), with product versions tightly linked to a specific VCF release:
vRealize Suite Lifecycle Manager (vRSLCM)
Workspace ONE Access (WS1A)
vRealize Operations (vROPs)
vRealize Log Insight (vRLI)
vRealize Automation (vRA)
All of the above products were deployed by SDDC Manager using VCF install bundles that were downloaded via SDDC Manager. Their versions were static entries on the bill of materials (BOM) for each VCF release and you couldn’t upgrade to a higher version (to pick up a new feature etc) until the next major VCF release that included the version you were looking for. This clearly wasn’t working for our customer base who need to be more agile with how they manage their private & hybrid cloud infrastructure.
So to help with this we introduced Flexible vRealize Suite product upgrades. See the snippet from the VCF 4.4 release notes below
So what does this actually mean? Well for one, vRSLCM is now the only bundle that is distributed via SDDC Manager. All other products are downloaded natively in vRSLCM. vRSLCM product support packs (PSPAKs) then enable you to upgrade to newer versions of the vRealize Suite. vRSLCM manages the supported matrix to ensure you can't deploy an untested combination. However, to determine which support pack is required to enable the various upgrade paths, you need to review multiple sets of release notes. To help drive clarity, a new VMware KB article, KB88829, has been created. You can use it to work out the vRealize Suite Lifecycle Manager upgrade path you need before you can deploy a later release of a vRealize Suite product on top of VMware Cloud Foundation.
I was chatting with my colleague Paudie O’Riordan yesterday about PowerVCF as he was doing some testing internally, and he mentioned that a great addition would be the ability to find and clean up failed tasks in SDDC Manager. Some use cases for this would be cleaning up an environment before handing it off to a customer, or before recording a demo.
Currently there isn't a supported public API to delete a failed task, so you have to run a curl command on SDDC Manager with the task ID. Getting a list of failed tasks and then running a command to delete each one can take time. See Martin Gustafson’s post on how to do it manually here.
I took a look at our existing code for retrieving tasks (and discovered a bug in the logic that is now fixed in PowerVCF 2.1.5!) and we have the ability to specify -status. So requesting a list of tasks with -status “failed” returns a list. So I put the script below together to retrieve a list of failed tasks, loop through them and delete them. The script requires the following inputs:
SDDC Manager FQDN. This is the target that is queried for failed tasks
SDDC Manager API User. This is the user that is used to query for failed tasks. Must have the SDDC Manager ADMIN role
Password for the above user
Password for the SDDC Manager appliance vcf user. This is used to run the task deletion. This is not tracked in the credentials DB so we need to pass it.
Once the above variables are populated the script does the following:
Checks for PowerVCF (minimum version 2.1.5) and installs if not present
Requests an API token from SDDC Manager
Queries SDDC Manager for the management domain vCenter Server details
Uses the management domain vCenter Server details to retrieve the SDDC Manager VM name
Queries SDDC Manager for a list of tasks in a failed state
Loops through the list of failed tasks and deletes them from SDDC Manager
Verifies the task is no longer present
Here is the script. It is also published here if you would like to enhance it
# Script to cleanup failed tasks in SDDC Manager
# Written by Brian O'Connell - Staff Solutions Architect @ VMware
#User Variables
# SDDC Manager FQDN. This is the target that is queried for failed tasks
$sddcManagerFQDN = "lax-vcf01.lax.rainpole.io"
# SDDC Manager API User. This is the user that is used to query for failed tasks. Must have the SDDC Manager ADMIN role
$sddcManagerAPIUser = "administrator@vsphere.local"
$sddcManagerAPIPassword = "VMw@re1!"
# Password for the SDDC Manager appliance vcf user. This is used to run the task deletion
$sddcManagerVCFPassword = "VMw@re1!"
# DO NOT CHANGE ANYTHING BELOW THIS LINE
#########################################
# Set TLS to 1.2 to avoid certificate mismatch errors
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
# Install PowerVCF if not already installed
if (!(Get-InstalledModule -name PowerVCF -MinimumVersion 2.1.5 -ErrorAction SilentlyContinue)) {
    Install-Module -Name PowerVCF -MinimumVersion 2.1.5 -Force
}
# Request a VCF Token using PowerVCF
Request-VCFToken -fqdn $sddcManagerFQDN -username $sddcManagerAPIUser -password $sddcManagerAPIPassword
# Disconnect all connected vCenters to ensure only the desired vCenter is available
if ($defaultviservers) {
    # Disconnect every open session in one call rather than looping over a collection that changes as servers disconnect
    Disconnect-VIServer -Server * -Force -Confirm:$False
}
# Retrieve the Management Domain vCenter Server FQDN
$vcenterFQDN = ((Get-VCFWorkloadDomain | where-object {$_.type -eq "MANAGEMENT"}).vcenters.fqdn)
$vcenterUser = (Get-VCFCredential -resourceType "PSC").username
$vcenterPassword = (Get-VCFCredential -resourceType "PSC").password
# Retrieve SDDC Manager VM Name
if ($vcenterFQDN) {
    Write-Output "Getting SDDC Manager VM Name"
    Connect-VIServer -server $vcenterFQDN -user $vcenterUser -password $vcenterPassword | Out-Null
    $sddcmVMName = ((Get-VM * | Where-Object {$_.Guest.Hostname -eq $sddcManagerFQDN}).Name)
}
# Retrieve a list of failed tasks
$failedTaskIDs = @()
$ids = (Get-VCFTask -status "Failed").id
Foreach ($id in $ids) {
    $failedTaskIDs += ,$id
}
# Cleanup the failed tasks
Foreach ($taskID in $failedTaskIDs) {
    $scriptCommand = "curl -X DELETE 127.0.0.1/tasks/registrations/$taskID"
    Write-Output "Deleting Failed Task ID $taskID"
    $output = Invoke-VMScript -ScriptText $scriptCommand -vm $sddcmVMName -GuestUser "vcf" -GuestPassword $sddcManagerVCFPassword
    # Verify the task was deleted
    Try {
        $verifyTaskDeleted = (Get-VCFTask -id $taskID)
        if ($verifyTaskDeleted -eq "Task ID Not Found") {
            Write-Output "Task ID $taskID Deleted Successfully"
        }
    }
    catch {
        Write-Error "Something went wrong. Please check your SDDC Manager state"
    }
}
Disconnect-VIServer -server $vcenterFQDN -Confirm:$False
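Since the deletion is not reversible, it can be handy to preview what would be run before letting the script loose. Below is a minimal sketch of a dry run that only builds the command strings, using the same curl command as the script above. Get-TaskDeleteCommand is a hypothetical helper name of my own and the task IDs are made up; in practice you would pass in the IDs returned by Get-VCFTask.

```powershell
# Hypothetical helper: build the delete command for each failed task ID
# without running anything, so the list can be reviewed first.
function Get-TaskDeleteCommand {
    param (
        [string[]]$TaskId
    )
    foreach ($id in $TaskId) {
        # Skip empty entries so we never emit a DELETE against the bare endpoint
        if ([string]::IsNullOrWhiteSpace($id)) { continue }
        "curl -X DELETE 127.0.0.1/tasks/registrations/$id"
    }
}

# Made-up task IDs for illustration only (the middle entry is deliberately empty)
$sampleIds = @('11111111-aaaa-bbbb-cccc-000000000001', '', '11111111-aaaa-bbbb-cccc-000000000002')
$commands = @(Get-TaskDeleteCommand -TaskId $sampleIds)
$commands | ForEach-Object { Write-Output $_ }
```

Nothing here touches SDDC Manager, so it is safe to run anywhere while you sanity check the list.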
Along with the release of VMware Cloud Foundation 4.3.1, we are excited to announce the general availability of the Site Protection & Disaster Recovery for VMware Cloud Foundation Validated Solution. The solution documentation, intro and other associated collateral can be found on the Cloud Platform Tech Zone here.
The move from VMware Validated Designs to VMware Validated Solutions has been covered by my team mate Gary Blake in detail here, so I won't go into that detail in this post. Instead I will concentrate on the work Ken Gould and I (along with a supporting team) have been working to deliver for the past few months.
The Site Protection & Disaster Recovery for VMware Cloud Foundation Validated Solution includes the following to deliver an end-to-end validated way to protect your mission critical applications. You get a set of documentation that is tailored to the solution that includes: design objectives, a detailed design including not just design decisions but the justifications & implications of those decisions, detailed implementation steps with PowerShell alternatives for some steps to speed up time to deploy, operational guidance on how to use the solution once it's deployed, solution interoperability between it and other Validated Solutions, an appendix containing all the solution design decisions in one easy place for review, and finally, a set of frequently asked questions that will be updated for each release.
Disaster recovery is a huge topic for everyone lately. Everything from power outages to natural disasters to ransomware and beyond can be classed as a disaster, and regardless of the type of disaster you must be prepared. To adequately plan for business continuity in the event of a disaster you must protect your mission critical applications so that they may be recovered. In a VMware Cloud Foundation environment, cloud operations and automation services are delivered by vRealize Suite Lifecycle Manager, vRealize Operations Manager & vRealize Automation, with authentication services delivered by Workspace ONE Access.
To provide DR for our mission critical apps we leverage 2 VCF instances with NSX-T federation between them. The primary VCF instance runs the active NSX-T global manager and the recovery VCF instance runs the standby NSX-T global manager. All load balancing services are served from the protected instance, with a standby load balancer (disconnected from the recovery site NSX Tier-1 until required, to avoid IP conflicts) in the recovery instance. Using our included PowerShell cmdlets you can quickly create and configure the standby load balancer to mimic your active load balancer, saving you a ton of manual UI clicks.
In the (hopefully never) event that you need to fail over the cloud management applications, you can easily bring the standby load balancer online to enable networking services for the failed over applications.
Using Site Recovery Manager (SRM) you can run planned migrations or disaster recovery migrations. With a single set of SRM recovery plans, regardless of the scenario, you will be guided through the recovery process. In this post I will cover what happens in the event of a disaster.
When a disaster occurs on the protected site (once the panic subsides) there are a series of tasks you need to perform to bring those mission critical apps back online.
First? Fix the network! Log into the passive NSX Global Manager (GM) on the recovery site and promote the GM to Active. (Note: This can take about 10-15 mins)
To cover the case of an accidental “Force Active” click, we’ve built in the “Are you absolutely sure this is what you want to do?” prompt!
Once the promotion operation completes our standby NSX GM is now active, and can be used to manage the surviving site NSX Local Manager (LM)
Once the recovery site GM is active we need to ensure that the cross-instance NSX Tier-1 is now directing the egress traffic via the recovery site. To do this we must update the locations on the Tier-1. Navigate to GM> Tier-1 gateways > Cross Instance Tier-1. Under Locations, make the recovery location Primary.
The next step is to ensure we have an active load balancer running in the recovery site to ensure our protected applications come up correctly. To do this log into what is now our active GM, select the recovery site NSX Local Manager (LM), and navigate to Networking > Load Balancing. Edit the load balancer and attach it to the recovery site standalone Tier-1.
At this point we are ready to run our SRM recovery plans. The recommended order for running the recovery plans (assuming you have all of the protected components listed below) is as follows. This ensures lifecycle & authentication services (vRSLCM & WSA) are up before the applications that depend on them (vROPS & vRA).
vRSLCM – WSA – RP
Intelligent Operations Management RP
Private Cloud Automation RP
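If you script or document your own run order, a quick sanity check can catch a plan being run before the services it depends on. The sketch below is only an illustration of that idea: the plan names come from the list above (written with plain hyphens), the dependency map is my assumption based on the reasoning given (vROPS & vRA need vRSLCM & WSA first), and Test-RecoveryOrder is a hypothetical helper, not something shipped with SRM or the Validated Solution.

```powershell
# Recovery plan names taken from the list above. The dependency map is an
# ASSUMPTION based on the stated reasoning, not an official definition.
$recoveryOrder = @(
    'vRSLCM - WSA - RP',
    'Intelligent Operations Management RP',
    'Private Cloud Automation RP'
)
$dependsOn = @{
    'Intelligent Operations Management RP' = @('vRSLCM - WSA - RP')
    'Private Cloud Automation RP'          = @('vRSLCM - WSA - RP')
}

# Hypothetical helper: returns $true only if every plan appears after all of
# the plans it depends on.
function Test-RecoveryOrder {
    param (
        [string[]]$Order,
        [hashtable]$DependsOn
    )
    $seen = @()
    foreach ($plan in $Order) {
        if ($DependsOn.ContainsKey($plan)) {
            foreach ($dependency in $DependsOn[$plan]) {
                # A dependency that has not run yet means the order is wrong
                if ($seen -notcontains $dependency) { return $false }
            }
        }
        $seen += $plan
    }
    return $true
}

$orderIsValid = Test-RecoveryOrder -Order $recoveryOrder -DependsOn $dependsOn
Write-Output "Recommended order valid: $orderIsValid"
```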
I’m not going to go through each recovery plan in detail here. They are documented in the Site Protection and Disaster Recovery Validated Solution. In some you will be prompted to verify this or that along the way to ensure successful failover.
The main thing in a DR situation is, DO NOT PANIC. And what is the best way to get to a place where you DO NOT PANIC? Test your DR plans…so when you see this…
Your reaction is this…
Trust the plan…test the plan…relax…you have a plan!
Hopefully this post was useful. If you want to learn more, please reach out in the comments. If you’re attending VMworld and would like to learn more or ask some questions, please drop into our Meet The Experts session on Thursday.
PowerVCF 2.1.1 is now available on the PowerShell Gallery here or from GitHub here. This release includes support for VMware Cloud Foundation 4.1 with some new and updated cmdlets. Highlights below
| Category | cmdlet Name | Description | Comment |
|---|---|---|---|
| Users and Groups | Get-VCFCredential | Retrieves a list of credentials. | UPDATE: Added support for the vRealize Suite credentials |
| Bundles | Start-VCFBundleUpload | Starts upload of bundle to SDDC Manager | UPDATE: Allows the import of a bundle based on offline download. |
| Federation | New-VCFFederationInvite | Invite new member to VCF Federation | UPDATE: Added support to specify if the new system is a MEMBER or CONTROLLER. |
| SDDC | Start-CloudBuilderSDDCValidation | Starts validation on VMware Cloud Builder | UPDATE: Added support for individual validation tasks. |
| Workspace ONE Access | Get-VCFWSA | Get details of the existing Workspace ONE Access | NEW |
| vRealize Automation | Get-VCFvRA | Get details of the existing vRealize Automation | NEW |
| vRealize Operations | Get-VCFvROPs | Get details of the existing vRealize Operations Manager | NEW |
| vRealize Operations | Set-VCFvROPs | Connect or disconnect Workload Domains to vRealize Operations Manager | NEW |
| vRealize Log Insight | Get-VCFvRLI | Get details of the existing vRealize Log Insight | NEW |
| vRealize Log Insight | Set-VCFvRLI | Connect or disconnect Workload Domains to vRealize Log Insight | NEW |
| vRealize Suite Lifecycle | Get-VCFvRSLCM | Get details of the existing vRealize Suite Lifecycle Manager | UPDATE: Fixed an issue with the API URI and addressed response output |
Traditionally VMware Cloud Foundation (VCF) has followed the hybrid approach when it comes to SSL certificate management. Hybrid mode essentially means using CA-signed certs for the vCenter Server machine SSL cert, and VMCA-signed certs for the solution user certs. In this mode, ESXi host certs are also VMCA managed. You then have the option to integrate with an external Microsoft CA or continue to use VMCA for all certs. If you decide to integrate with a Microsoft CA, ESXi host certs remain VMCA managed. This is not always ideal, as some customers require all components on the network to be signed by a known & trusted CA. Up until the recent VCF 4.1 release it was not possible to use custom CA-signed certs on your ESXi hosts, as hybrid mode would overwrite your CA-signed ESXi certs with VMCA-signed certs. There is a great blog post here on how to manually enable CA-signed certs, but with VCF 4.1 it is now supported to do this via the API during bringup. The procedure is as follows:
Install the ESXi hosts that will be used for bringup with the ESXi version on the Bill Of Materials for 4.1
Install your custom CA signed certs on each host that will be used for the management domain
Log in to the ESXi Shell, either directly from the DCUI or from an SSH client, as a user with administrator privileges.
In the directory /etc/vmware/ssl, rename the existing certificates using the following commands.
mv rui.crt orig.rui.crt
mv rui.key orig.rui.key
Copy the certificates that you want to use to /etc/vmware/ssl.
Rename the new certificate and key to rui.crt and rui.key.
Restart the host management agents by running the following commands
Repeat the above steps for all management domain hosts
To ensure that SDDC Manager is aware that you are using custom certs you need to add a flag in the bringup json along with the PEM encoded signing chain certificate, so that it is added to the SDDC Manager keystore. This will ensure the certificates are trusted. The API guide for 4.1 provides an example json spec here. Pay particular attention to this section
You can then follow the steps outlined in the API guide to deploy the management domain using the Cloud Builder API. Note that once custom mode is enabled, all future workload domains that you create must also use signed certs.
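To give a rough idea of the shape of the custom certificate flag and signing chain described above, here is a minimal sketch that builds the fragment in PowerShell and converts it to JSON. Treat it as an illustration only: the property names (securitySpec, esxiCertsMode, rootCaCerts, alias, certChain) and the "Custom" value are written from my recollection of the 4.1 API guide example and are assumptions until you have checked them against the guide. The alias and certificate body are placeholders.

```powershell
# ASSUMPTION: property names and the 'Custom' value are recalled from the
# VCF 4.1 API guide example. Verify them against the guide before use.
# The alias and the certificate body below are placeholders, not real values.
$securityFragment = [ordered]@{
    securitySpec = [ordered]@{
        esxiCertsMode = 'Custom'
        rootCaCerts   = @(
            [ordered]@{
                alias     = 'Example-Root-CA'
                certChain = @('PEM-ENCODED-SIGNING-CHAIN-GOES-HERE')
            }
        )
    }
}

# -Depth must be high enough to serialise the nested certChain array
$json = $securityFragment | ConvertTo-Json -Depth 5
Write-Output $json
```

Building the fragment as an object and converting it, rather than hand editing a long JSON string, makes it harder to break the quoting around a multi-line PEM chain.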
From time to time we all need to look at logs, whether it's a failed operation or to trace who did what when. In VMware Cloud Foundation there are many different logs, each one serving a different purpose. It's not always clear which log you should look at for each operation, so here is a useful reference table.