We have a single on-premises VCSA 6.5 instance that recently ran into the certificate expiration issue described in this KB:
https://kb.vmware.com/s/article/76719
All the certificates have been regenerated with the certificate-tool from the CLI. They now show as valid when checked with the one-liner from the KB above (a week ago, they had all expired):
STORE MACHINE_SSL_CERT
Alias : __MACHINE_CERT
Not After : Aug 18 19:56:50 2022 GMT
STORE TRUSTED_ROOTS
Alias : 9bd7b30bcb1dcecfe2491a3e91fcd3dd756f347f
Not After : Aug 1 13:58:01 2028 GMT
Alias : c0af9d76ae9fab214298c6b11d4efb72f64b6c13
Not After : Aug 13 18:18:55 2030 GMT
Alias : ac50bb369ff7dce7e8c372b9b3e50f6e3aaaa528
Not After : Aug 13 18:20:03 2030 GMT
Alias : 3e816060d6322a45114eac30798edbf1a4a1397d
Not After : Aug 13 18:28:26 2030 GMT
Alias : 074ddc83baeea4c6588f3f11837ed4fc77b25220
Not After : Aug 13 19:21:38 2030 GMT
Alias : 4bbaf83d23a818f2e8122b60ca0edc6dabf76d7d
Not After : Aug 13 19:33:49 2030 GMT
STORE TRUSTED_ROOT_CRLS
Alias : a45f284d7b9325005381b1b14d3ac3c823e104c9
Alias : 4b3b32cf9bb0d212aa6551bdd97dd3aaf029dde5
Alias : 02c60981250d68d94e1fcd31c93d0c50ae26d531
Alias : c4df908ec94dc3b1b774ca4a8768acfdbee90e59
Alias : f65b7ab274c5d949e8e914101797260d9e40fd70
Alias : 84d8635a51db3a011bab257873555c6776381d37
STORE machine
Alias : machine
Not After : Aug 18 19:12:42 2022 GMT
STORE vsphere-webclient
Alias : vsphere-webclient
Not After : Aug 18 19:12:43 2022 GMT
STORE vpxd
Alias : vpxd
Not After : Aug 18 19:12:43 2022 GMT
STORE vpxd-extension
Alias : vpxd-extension
Not After : Aug 18 19:12:44 2022 GMT
STORE SMS
Alias : sms_self_signed
Not After : Aug 7 14:06:21 2028 GMT
STORE BACKUP_STORE
Alias : bkp___MACHINE_CERT
Not After : Aug 18 19:11:39 2022 GMT
Alias : bkp_machine
Not After : Aug 18 19:12:42 2022 GMT
Alias : bkp_vsphere-webclient
Not After : Aug 18 19:12:43 2022 GMT
Alias : bkp_vpxd
Not After : Aug 18 19:12:43 2022 GMT
Alias : bkp_vpxd-extension
Not After : Aug 18 19:12:44 2022 GMT
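To double-check that nothing in the listing above is still expired or close to expiring, the output can be pasted into a small parser. This is a minimal sketch that assumes the format shown above: a STORE line followed by Alias / Not After pairs, with timestamps in GMT. Entries with no Not After line, such as the TRUSTED_ROOT_CRLS aliases, are kept with no date. The names parse_listing and expiring are hypothetical helpers for illustration only. They are not part of any VMware tool.

```python
from datetime import datetime, timedelta, timezone


def parse_listing(text):
    """Parse 'STORE / Alias / Not After' output into (store, alias, not_after) tuples.

    Assumes the format pasted above. Aliases without a 'Not After' line
    (e.g. CRL entries) keep not_after=None.
    """
    entries = []
    store = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("STORE "):
            store = line[len("STORE "):].strip()
        elif line.startswith("Alias") and ":" in line:
            alias = line.split(":", 1)[1].strip()
            entries.append([store, alias, None])
        elif line.startswith("Not After") and ":" in line and entries:
            # e.g. "Aug 18 19:56:50 2022 GMT"; drop the GMT suffix and parse as UTC
            stamp = line.split(":", 1)[1].split()
            if stamp and stamp[-1] == "GMT":
                stamp = stamp[:-1]
            entries[-1][2] = datetime.strptime(
                " ".join(stamp), "%b %d %H:%M:%S %Y"
            ).replace(tzinfo=timezone.utc)
    return [tuple(e) for e in entries]


def expiring(entries, now, margin_days=0):
    """Return entries whose Not After is on or before now + margin_days."""
    cutoff = now + timedelta(days=margin_days)
    return [e for e in entries if e[2] is not None and e[2] <= cutoff]


# A few lines copied from the listing above, used as sample input.
sample = """STORE MACHINE_SSL_CERT
Alias : __MACHINE_CERT
Not After : Aug 18 19:56:50 2022 GMT
STORE TRUSTED_ROOT_CRLS
Alias : a45f284d7b9325005381b1b14d3ac3c823e104c9
STORE SMS
Alias : sms_self_signed
Not After : Aug 7 14:06:21 2028 GMT
"""

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for store, alias, not_after in expiring(parse_listing(sample), now, margin_days=30):
        print(f"{store}/{alias} expires {not_after:%Y-%m-%d %H:%M} UTC")
```

Passing a margin such as 30 days also flags certificates that are about to expire, not just ones that already have.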
When I now try to start all services, service-control returns the following after about 5 minutes:
Service-control failed. Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start vpxd-svcs, vapi-endpoint services. Error: Operation timed out
Starting only the vpxd-svcs service with service-control returns this error:
Perform start operation. vmon_profile=None, svc_names=['vmware-vpxd-svcs'], include_coreossvcs=False, include_leafossvcs=False
2020-08-18T21:10:50.484Z Service vpxd-svcs state STOPPED
Error executing start on service vpxd-svcs. Details {
"resolution": null,
"detail": [
{
"args": [
"vpxd-svcs"
],
"id": "install.ciscommon.service.failstart",
"localized": "An error occurred while starting service 'vpxd-svcs'",
"translatable": "An error occurred while starting service '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
Service-control failed. Error {
"resolution": null,
"detail": [
{
"args": [
"vpxd-svcs"
],
"id": "install.ciscommon.service.failstart",
"localized": "An error occurred while starting service 'vpxd-svcs'",
"translatable": "An error occurred while starting service '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
The web UI returns the following 503 error, as it has since the certificates expired:
503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000056033c080640] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)
Can anyone point me to the specific log files I should look at to diagnose this and find out what is keeping the service from starting? I've already ruled out the following:
- It's not a disk space or log rotation issue.
- It's not the PostgreSQL database (I found a few threads pointing to it, but it starts properly in our instance).
Our last resort is to wipe and reinstall the VCSA, but I'd like to avoid that if this can be fixed.