Some time ago, I received a message from users that they couldn’t create new services on TKG clusters. The new services were stuck in an unusual “pending” status. We hadn’t made any major changes to the infrastructure, no upgrades and so on. It was the classic situation: it worked before, and now it doesn’t. The users deployed objects on the cluster the same way as before. So, I started looking into this case…
1. Overview of this problem
Let’s start with a simple YAML definition of a LoadBalancer service.
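For readers who want to reproduce the test, here is a minimal sketch of such a service definition, built in Python and printed as JSON (kubectl accepts JSON manifests as well as YAML). The service name, namespace, selector label and ports are hypothetical placeholders, not the values from the affected cluster.

```python
import json

# All values below (name, namespace, selector, ports) are hypothetical
# placeholders; replace them with the ones used in your environment.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "demo-lb", "namespace": "demo"},
    "spec": {
        "type": "LoadBalancer",          # asks the platform for an External-IP
        "selector": {"app": "demo"},     # pods that receive the traffic
        "ports": [
            {"name": "http", "protocol": "TCP", "port": 80, "targetPort": 8080}
        ],
    },
}


def render_manifest(manifest: dict) -> str:
    """Serialize the manifest as JSON so it can be piped to kubectl."""
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(render_manifest(service))
```

Saved as a script, the output can be piped straight to the cluster, for example with `python service.py | kubectl apply -f -`.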
The service is created, but it stays in a <pending> state and doesn’t have an External-IP. The External-IP is an IP address from the Frontend/VIP network. As you can see, a different service running in the same namespace has an IP attached. Services in other namespaces have one too. The problem appeared a few days ago. Moreover, developers can’t access their applications using the exposed IPs or FQDNs.
After running the “kubectl describe” command on this service, we can see two Events.
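To see quickly how many services are affected, the check can be scripted. The sketch below is only an illustration: it assumes the input is the parsed output of `kubectl get svc -A -o json`, and the function name is my own. It lists every LoadBalancer service that has no entry under `status.loadBalancer.ingress`, which is what kubectl shows as <pending>.

```python
def pending_load_balancers(service_list: dict) -> list:
    """Return "namespace/name" for LoadBalancer services without an external address.

    service_list is expected to be the parsed JSON of
    `kubectl get svc -A -o json` (a ServiceList object).
    """
    pending = []
    for item in service_list.get("items", []):
        if item.get("spec", {}).get("type") != "LoadBalancer":
            continue  # ClusterIP, NodePort etc. never get an External-IP
        ingress = item.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        if not ingress:
            meta = item["metadata"]
            pending.append(f'{meta["namespace"]}/{meta["name"]}')
    return pending
```

One way to feed it is to save the kubectl output to a file and load it with `json.load` before calling the function.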
Look at the Warning event:
Warning SyncLoadBalancerFailed 6s (x5 over 81s) service-controller Error syncing load balancer: failed to ensure load balancer: VirtualMachineService IP not found
What does it mean? There are a few possibilities:
– wrong definition of the service,
– network/firewall rules problem,
– Frontend IP pool has been exhausted,
– sync problem in AKO between the gateway and the service object,
– Avi version incompatible with the existing vSphere with Tanzu infrastructure,
– and maybe something more…;)
I checked all these things. Everything looked fine: no network problems, plenty of free IPs to use, and the Avi version is compatible. After that, I looked in the most obvious place: the Workload Management status.
There is a 99.99% chance that the answer to the problem is right here: the certificate is invalid. Let’s see what it looks like in Avi.
That’s correct, the certificate expired on 10.10.2023. This date matches the error from Workload Management and the information from the developers. So, the fix should be easy. We need to create a new certificate and replace it in Avi and in the Supervisor configuration.
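An expired certificate is easy to catch in advance. As a rough sketch, the function below takes the certificate's "notAfter" value in its usual text form (for example as printed by `openssl x509 -noout -enddate`) and returns how many seconds are left. The function name is my own, and the timestamp in the usage note is illustrative: I only know the expiry date, not the exact time of day.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Seconds left before the certificate expires (negative once it has expired).

    not_after uses the certificate text format, e.g. "Oct 10 00:00:00 2023 GMT".
    now is a Unix timestamp; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    # ssl.cert_time_to_seconds converts the notAfter string to a Unix timestamp.
    return ssl.cert_time_to_seconds(not_after) - now
```

For example, `seconds_until_expiry("Oct 10 00:00:00 2023 GMT") < 0` is true for any moment after that date, which is exactly the state the Avi certificate was in.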
2. Create a new SSL/TLS certificate in Avi
1. Go to Administration->Settings->Access Settings. Click the pencil icon and find SSL/TLS Certificate. Delete it with the “X” and save the changes. After a while and a few “F5” clicks, new certs will be generated.
Before we proceed with the next steps, delete the existing, expired cert from the Templates->Security->SSL/Certificates section.
2. Then, go back to the “Access Settings” section.
Click the pencil icon again and delete the 2 newly generated certs. In the blank SSL/TLS field, choose the “Create Certificate” option.
3. You need to fill in three fields:
– Name -> it can be the same as in previous cert
– Common name -> it can be the same as in previous cert
– Subject Alternate Name (SAN) -> here you need to type the IP address of the Avi LB
4. Go to Templates->Security->SSL/Certificates. The new cert is visible. Now, download (copy to clipboard) the certificate data.
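Since the SAN field is the easiest one to get wrong, it may be worth verifying it once the new certificate is in place. The sketch below assumes the certificate has already been decoded into the dictionary format that Python's `ssl.SSLSocket.getpeercert()` returns on a validated connection; the function name and the sample address are hypothetical.

```python
import ipaddress


def san_has_ip(cert: dict, ip: str) -> bool:
    """Check whether the decoded certificate lists the given IP in its SAN.

    cert follows the ssl.SSLSocket.getpeercert() layout, where
    "subjectAltName" is a sequence of (type, value) pairs and IP entries
    use the type "IP Address".
    """
    wanted = ipaddress.ip_address(ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue  # skip DNS names and other SAN types
        try:
            if ipaddress.ip_address(value.strip()) == wanted:
                return True
        except ValueError:
            continue  # ignore entries that do not parse as an IP address
    return False
```

For instance, `san_has_ip({"subjectAltName": (("IP Address", "192.0.2.10"),)}, "192.0.2.10")` returns True, while a certificate whose SAN holds only DNS names returns False.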
3. Replace certificate in the Supervisor
1. Log in to vCenter. Go to Workload Management->choose Supervisor->Configure
Under the Supervisor section, choose Network and expand the Load Balancer settings.
Click Edit next to the Server Certificate field, paste the new cert data and save the changes.
2. The Supervisor switches to the Configuring state and, after a while, you should see a green Running status.
3. That’s it! Now we can create new services and the problem is fixed.
4. Wrapping up
Finally, I want to mention that this solution worked in my case. In your situation, the problem may lie in a different part of the system. But always look at the simplest and most obvious places first, before digging deeper. In this case, the answer was in the error message and the fix was quick and easy.