Cloud Run / GKE & Istio: network latency comparison

guillaume blaquiere
Google Cloud - Community
6 min read · Jul 2, 2023


Managed services come with tradeoffs. You gain agility, speed and scalability, but you lose control over the infrastructure layer and the ability to optimize it.
Sometimes it is worse: you inherit additional layers that are useless from your point of view and that can reduce the efficiency of the infrastructure.

Indeed, going serverless is new for many architecture teams and it raises many questions. My company, Carrefour, is no exception: GKE is mostly recommended because it is less disruptive than serverless technology.

In that context, the architecture team asked me what the latency metrics are for service-to-service communication in Cloud Run compared to GKE.

So, let’s try to compare them!

Disclaimer: I’m not a Kubernetes expert and I could have missed some pieces, added useless modules or misconfigured things. Any feedback is welcome!

The network features

Cloud Run is a managed service on top of Google Cloud infrastructure and best practices. It offers:

To implement the same layers with GKE, I chose to use Istio with

And finally, to simplify the deployment and management, and because I am not a Kubernetes expert, I chose to use GKE Autopilot with ASM (Anthos Service Mesh), which installs Istio on GKE for me!

I chose not to use Knative in this test. It is not compatible with GKE Autopilot and it adds complexity for little difference, even though Knative routing could introduce additional (small) latency.

The test app and protocol

The app design is composed of 3 services, all based on the same container in Go. You can find the code here

  • The ping service receives the target URL (service) to invoke and a number of iterations. It logs the round trip duration of each call to the target service (pong).
    The ping service has a service account, ping-sa
  • The pong services do nothing and immediately return an HTTP 200
  • The Pong Secured service only allows ping-sa to reach it
  • The Pong Unsecured service allows anyone to reach it (no authentication check)

There are 2 relevant metrics to extract during the test:

  • The latency of the first request, which should be the slowest: it acquires the identity token and performs the first HTTPS handshake.
  • The average latency over 1000 requests, to get enough samples and to dilute the slow first request.
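The two metrics can be sketched in a few lines of Go. This is only an illustration and not the code from the repository: the `measure` and `measureLocalPong` names are hypothetical, and a local `httptest` server stands in for the pong service, so the numbers it prints say nothing about Cloud Run or GKE.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// measure calls target n times with the same client and returns the
// duration of the first call and the average duration over all n calls.
func measure(client *http.Client, target string, n int) (time.Duration, time.Duration, error) {
	if n <= 0 {
		return 0, 0, fmt.Errorf("n must be positive, got %d", n)
	}
	var first, total time.Duration
	for i := 0; i < n; i++ {
		start := time.Now()
		resp, getErr := client.Get(target)
		if getErr != nil {
			return 0, 0, getErr
		}
		// Drain and close the body so the connection can be reused.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		elapsed := time.Since(start)
		if i == 0 {
			first = elapsed
		}
		total += elapsed
	}
	return first, total / time.Duration(n), nil
}

// measureLocalPong runs measure against a throwaway local server that
// always answers HTTP 200 (a stand-in for the pong service).
func measureLocalPong(n int) (time.Duration, time.Duration, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	return measure(srv.Client(), srv.URL, n)
}

func main() {
	first, avg, err := measureLocalPong(1000)
	if err != nil {
		fmt.Println("measurement failed:", err)
		return
	}
	fmt.Println("first request:", first)
	fmt.Println("average over 1000 requests:", avg)
}
```

Reusing one client across the loop matters here: the first call pays for the connection setup, and the following calls reuse the connection, which is why the first request is reported separately from the average.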

Before the test, you have to build the single container that all the services will use. You can use Cloud Build for that:

gcloud builds submit --tag=gcr.io/<projectid>/latency

Replace <projectid> with your own project ID.

Cloud Run test

Cloud Run is a fully managed service. There is nothing to set up in your project except enabling the API.

gcloud services enable run.googleapis.com

Deployment

Use the following commands to deploy the services on Cloud Run:

# Create the service account for the Ping service
gcloud iam service-accounts create ping-sa

# Deploy the Ping service with its service account
gcloud run deploy latency-ping \
--service-account=ping-sa@<projectid>.iam.gserviceaccount.com \
--image=gcr.io/<projectid>/latency \
--allow-unauthenticated --region=us-central1 --platform=managed

# Deploy the Pong secured services
gcloud run deploy latency-pong-sec \
--image=gcr.io/<projectid>/latency \
--region=us-central1 --platform=managed

# Authorize only Ping SA to invoke Pong secured service
gcloud run services add-iam-policy-binding latency-pong-sec \
--member=serviceAccount:ping-sa@<projectid>.iam.gserviceaccount.com \
--role=roles/run.invoker --region=us-central1 --platform=managed

# Deploy the Pong unsecured services (allow allUsers)
gcloud run deploy latency-pong-unsec \
--image=gcr.io/<projectid>/latency \
--allow-unauthenticated --region=us-central1 --platform=managed

To simplify the test, I made the ping service publicly accessible. You can make it private if you prefer.
Replace <projectid> with your own project ID.

Running the test

Use the following commands to run the test on Cloud Run:

# Invoke Ping with Pong secured URL. 
# The Google HTTP client is used to manage the security
curl "https://latency-ping-<projectHash>-uc.a.run.app/ping?\
url=https://latency-pong-sec-<projectHash>-uc.a.run.app/pong\
&useGoogleClient=true&nbcall=10"

# Invoke Ping with Pong unsecured URL
curl "https://latency-ping-<projectHash>-uc.a.run.app/ping?\
url=https://latency-pong-unsec-<projectHash>-uc.a.run.app/pong\
&useGoogleClient=false&nbcall=10"

Cloud Run URLs are not deterministic: replace <projectHash> with the value from the URL provided when you deployed your Cloud Run services.
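If you prefer to script the calls, here is a minimal Go sketch that builds the same ping URL from its parts. The query parameter names (`url`, `useGoogleClient`, `nbcall`) come from the curl commands above; the `buildPingURL` helper and the `example.com` hosts are hypothetical placeholders. Encoding the target through `url.Values` is an assumption on my side: it only becomes necessary when the target URL contains its own `?` or `&` characters.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildPingURL returns the /ping URL of the ping service with the target
// URL, the client flag and the number of calls encoded as query parameters.
func buildPingURL(pingBase, target string, useGoogleClient bool, nbCall int) (string, error) {
	u, err := url.Parse(pingBase)
	if err != nil {
		return "", err
	}
	u.Path = "/ping"

	q := url.Values{}
	q.Set("url", target)
	q.Set("useGoogleClient", strconv.FormatBool(useGoogleClient))
	q.Set("nbcall", strconv.Itoa(nbCall))
	// Encode escapes the nested target URL and sorts the keys.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder hosts: use the URLs printed by your own deployments.
	pingURL, err := buildPingURL(
		"https://ping.example.com",
		"https://pong.example.com/pong",
		true,
		10,
	)
	if err != nil {
		fmt.Println("invalid ping base URL:", err)
		return
	}
	fmt.Println(pingURL)
}
```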

GKE and Istio test

GKE Autopilot is a managed service on Google Cloud. You have to enable the API to use it.

gcloud services enable container.googleapis.com 

Deployment

Before deploying the container on GKE Autopilot, you have to create a GKE Autopilot cluster in your project and install ASM on it.
You will need two additional binaries: asmcli and jq

# Create the GKE autopilot cluster
gcloud container clusters create-auto latency --region=us-central1 \
--network=projects/<projectid>/global/networks/default

# Register the cluster on Anthos Service Mesh
asmcli install \
--project_id <projectid> \
--cluster_name latency \
--cluster_location us-central1 \
--enable_all \
--managed \
--ca mesh_ca

# Get the cluster credential
gcloud container clusters get-credentials latency --location us-central1

# Set the sidecar auto injection
kubectl label namespace default istio-injection- \
istio.io/rev=asm-managed --overwrite

# Deploy the yaml files
kubectl apply -f ./gke

Replace <projectid> with your own project ID.
The /gke folder is in the GitHub repository here

The GKE YAML files are the following

  • deploy-ping.yaml: Create the service account, create the deployment, create the Ping service
  • deploy-pong.yaml: create the deployment and create the Pong services (secured and unsecured)
  • istio-policies.yaml: Force the mTLS communication between Ping and Pong services
  • istio-auth-policies.yaml: Define communication authorization. ping-sa required to call Pong Secured service. No authorization required for Pong unsecured service.
    You can try to change the configuration to validate the correct behavior!

Running the test

Use the following commands to run the test on GKE:

# Get the loadbalancer IP and put it in variable
export LB_IP=$(kubectl get svc ping-service -o json | \
jq -r ".status.loadBalancer.ingress[0].ip")

# Invoke Ping with Pong secured URL
curl "http://${LB_IP}/ping?\
url=http://pong-sec-service.default.svc.cluster.local/pong\
&useGoogleClient=false&nbcall=10"

# Invoke Ping with Pong unsecured URL
curl "http://${LB_IP}/ping?\
url=http://pong-unsec-service.default.svc.cluster.local/pong\
&useGoogleClient=false&nbcall=10"

It can take a few minutes to provision the load balancer for the Ping service.
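While the load balancer is still being provisioned, the Service has no ingress entry yet, so the jq filter above prints null and LB_IP is not usable. The Go sketch below reads the same field from the JSON returned by kubectl and makes that case explicit. The `loadBalancerIP` helper and the sample payloads are hypothetical; only the `status.loadBalancer.ingress[0].ip` path is taken from the command above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// service mirrors only the fields of a Kubernetes Service that are
// needed to read status.loadBalancer.ingress[0].ip.
type service struct {
	Status struct {
		LoadBalancer struct {
			Ingress []struct {
				IP string `json:"ip"`
			} `json:"ingress"`
		} `json:"loadBalancer"`
	} `json:"status"`
}

var errNotProvisioned = errors.New("load balancer has no ingress IP yet")

// loadBalancerIP extracts the first ingress IP from the JSON output of
// `kubectl get svc <name> -o json`, or reports that it is not ready.
func loadBalancerIP(raw []byte) (string, error) {
	var svc service
	if err := json.Unmarshal(raw, &svc); err != nil {
		return "", err
	}
	ingress := svc.Status.LoadBalancer.Ingress
	if len(ingress) == 0 || ingress[0].IP == "" {
		return "", errNotProvisioned
	}
	return ingress[0].IP, nil
}

func main() {
	// Hand-written sample payloads, not real kubectl output.
	pending := []byte(`{"status":{"loadBalancer":{}}}`)
	ready := []byte(`{"status":{"loadBalancer":{"ingress":[{"ip":"203.0.113.10"}]}}}`)

	for _, raw := range [][]byte{pending, ready} {
		ip, err := loadBalancerIP(raw)
		if err != nil {
			fmt.Println("not ready:", err)
			continue
		}
		fmt.Println("load balancer IP:", ip)
	}
}
```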

Don’t forget to delete your cluster after the tests!!

gcloud container clusters delete latency --region=us-central1

Conclusion

I ran the test over 1000 requests for the secured and unsecured services in the 2 environments. Here is a summary of the results:

  • The first request latency is roughly the same, about 100ms to acquire the credential and perform the TLS handshake
  • The latency is not really impacted if you request secured or unsecured services.
  • There is a latency difference of 7ms between GKE and Cloud Run.

With both environments deployed, I also chose to test cross-service latency:

  • Call Cloud Run unsecured service from GKE ping service
  • Call GKE exposed service (ping in this case) from Cloud Run ping service

Here are the commands:

# GKE Ping -> Cloud Run unsecured
curl "http://${LB_IP}/ping?\
url=https://latency-pong-unsec-<projectHash>-uc.a.run.app/pong\
&useGoogleClient=false&nbcall=10"

# Cloud Run -> GKE Ping service
curl "https://latency-ping-<projectHash>-uc.a.run.app/ping?\
url=http://${LB_IP}/pong&useGoogleClient=false&nbcall=10"

And the result table

  • The Cloud Run service latency is the same and does not depend on the source of the call
  • Accessing GKE from outside increases the latency.
    Indeed, unlike in the previous test, the request does not come from inside the GKE cluster but from outside, from Cloud Run. This time the request goes through the load balancer that exposes the GKE ping service, which increases the latency by 0.5ms.

I also ran a side test to check the latency of an external website (https://tranlate.google.fr in my case). The latency is the same from both environments (around 80ms).

Finally, in the context of service-to-service calls, there is a real difference in terms of latency.

But, is 7ms latency important?

As always, it depends on your use case!

  • For near real time (trading, ad services, gaming,…), 7ms is an eternity and is not acceptable
  • For websites, it’s invisible to the user, as long as your backend doesn’t have to call dozens of services in the same transaction/user action.
  • For asynchronous use cases, it’s totally fine.
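To make the "dozens of services" caveat concrete, here is a small Go sketch of the arithmetic. It assumes the calls are made one after the other and that each one pays the full 7ms difference, which is a simplification; the `addedLatency` helper and the call counts are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// addedLatency returns the extra time spent by a transaction that makes
// `calls` sequential service-to-service calls, each one paying perCall.
func addedLatency(perCall time.Duration, calls int) time.Duration {
	return perCall * time.Duration(calls)
}

func main() {
	perCall := 7 * time.Millisecond // difference measured in this article
	for _, calls := range []int{1, 5, 24} {
		fmt.Printf("%2d sequential calls: +%v\n", calls, addedLatency(perCall, calls))
	}
}
```

With two dozen sequential calls, the difference adds up to 168ms, which starts to be noticeable on a web page. Calls made in parallel would not accumulate this way.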

In my current professional context, 7ms is totally fine and I will continue to promote and use Cloud Run.
But if latency matters for your use case, a Cloud Native solution is still the best choice!


guillaume blaquiere
Google Cloud - Community

GDE cloud platform, Group Data Architect @Carrefour, speaker, writer and polyglot developer, Google Cloud platform 3x certified, serverless addict and Go fan.