Google Cloud VertexAI Agent: the chatbot, easier - First overview

guillaume blaquiere
Google Cloud - Community
10 min read · Apr 21, 2024


The LLM (Large Language Model) ecosystem evolves very fast, and each actor focuses its efforts on different areas: fine-tuning, specialized models, and other innovations.

Weeks before Cloud Next, I tested a new feature that I had previously only seen in videos: VertexAI App Builder conversation.

I immediately found this product totally crazy!!

At Cloud Next 2024, it was rebranded as VertexAI Agent Builder, and agents were a large part of the opening keynote.

I would like to share a first overview of this incredible service and show you why it is so revolutionary!

A conversation service

First of all, Agent is an evolution of Dialogflow, the Google Cloud service to build conversational chatbots. You can notice this relationship in the documentation URL, for instance:

https://cloud.google.com/dialogflow/vertex/docs/concept/agents

That’s why the integration options are the same as Dialogflow’s, and I won’t explore that part here, only the agent creation and configuration.

With Agent Builder, the way to define a chatbot is totally different from Dialogflow. You simply create playbooks!

A playbook to play a bot

A playbook describes the tasks that your agent performs. It’s very simple!! In natural “English” language, tell your agent what it has to do, what it can’t do, and how it should behave with the users.

In short, a playbook is a prompt that defines your agent.

When you have defined your first playbook, you can immediately test it. It’s very intuitive. You can test a conversation, change your playbook, try it again, undo entries, view debug information, and so on.

Here is a dummy example, written in 5 minutes:

- You are an always happy chatbot
- you greet nicely the user and ask what you can do for them
- your favorite quote is "Hello World", and it's what you prefer to say by default.
- In your eyes, the weather is always sunny, because it's the source of your happiness
- You can't speak about your family, it's a forbidden topic

And here is the result in a conversation:

Up to now, nothing really revolutionary: you can reproduce the same thing with any other LLM.

Things become exciting when you start to use tools and other playbooks.

Reference other playbooks

One useful capability is re-routing the user to a dedicated agent to achieve a specific task, like an operator who redirects the call to colleagues in the right department: purchase, order follow-up, claims, and so on.

For this, a redirect is written like this:

- if the user wants to talk with the claims service, redirect to ${agent: claims}

Really easy!! If you are familiar with Dialogflow, it’s like creating a flow.

The current downside

However, I was unable to transfer context and variables to the next agent. For instance, I would like to identify the user with an “Authenticator Agent” and then forward the user identity to the other agents. But it seems impossible, maybe because the service is too new.

The consequence is that all the agents must implement the same prompt to be able to authenticate the user.

Or maybe I just don’t have the correct pattern to implement this factorization!
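
To give an idea, each playbook would have to repeat an authentication fragment like this sketch (check-identity is a hypothetical tool, not one I built):

- Step 0: ask the user for their email and verify it with the ${TOOL: check-identity}
- if the identity can't be verified, politely refuse to handle the request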

The tool configuration

The tools are, for me, the real game changer. Most of the time, when you write an agent to interact with a user, a prompted conversation is cool, but you would like to extend it:

  • Ground the agent knowledge with your own documentation/knowledge base, something similar to RAG (Retrieval Augmented Generation) but often complex to achieve
  • Perform mathematical operations or transformations
  • Interact with the external world through APIs, to get data or to achieve tasks

The datastore configuration

For the first item, you can create a datastore. It’s based on VertexAI Enterprise Search, which indexes multiple sources and types of documents and lets you search them at the speed of Google Search.

From there, you can add this datastore to your agent, and the answers will be automatically enriched from the datastore index. There is nothing special to do to use this feature, except defining the correct datastore in your agent!

An out-of-the-box RAG feature in a few easy steps!

The code-based extension

You can use the code-based extension to ask the agent to generate Python code and execute it for you. This time, you have to mention the ${TOOL: <>} special token to indicate to the agent when to use the tool.

You can ask for a mathematical operation

Or for a transformation

Don’t forget to activate the code interpreter feature and to add it to your prompt.

In my opinion, this tool is useful for managed operations, meaning operations defined in the prompt and not directly offered to the user; for instance, calculating the tax amount on a product, as sketched below.
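
As a sketch, such a managed operation could be a playbook step like this (assuming you named the code interpreter tool code-interpreter in your agent):

- Step 4: compute the price including a 20% tax with the ${TOOL: code-interpreter} and present the calculation detail to the user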

Opening the code interpreter tool to the user can lead to unexpected behaviors and responses, or simply not work at all!

The API control

When the built-in RAG and code interpreter are not enough, you can build and interact with APIs, yours or existing ones. I chose to experiment with the Cloud Run APIs.

I detail the experimentation below.

The Cloud Run playbook

This playbook has been my playground. It has 3 steps:

  • Get the user’s project ID, listing the projects if needed to help the user
  • Get the user’s preferred region, listing the regions if needed
  • List the Cloud Run services in that region for the selected project.

In 3 bullet points, my prompt is almost done!

- greet the user and be friendly to handle its request
- Step 1: Ask the user for providing the project ID.
- If the user wants to know the available project-id, use the ${TOOL: list-project} to list them
- Confirm the projectId with the user
- Step 2: Ask the user for providing Cloud Run service location
- If the user wants to know the available location, use the ${TOOL: list-cloudrun-location}.
- Confirm the location with the user
- Step 3: Ask the user for providing Cloud Run service name
- If the user wants to know the existing Cloud Run service name, use the ${TOOL: List-cloudrun-services} . List the service name

You can notice 3 custom tools mentioned here. Let’s explore them.

The tool creation

You can create custom tools of 3 types:

  • Datastore: not used and not explored (it’s only a first overview). I suppose it’s for using a datastore dedicated to a specific intent.
  • Function: I tried it, but I never understood how it works or when to use it. It needs more experimentation.
  • OpenAPI: Set your API spec, and use it. Ridiculously easy! It’s the only one I used!

Here are the 3 OpenAPI tools I created. I only used Google Cloud APIs, and Google Cloud does not provide OpenAPI specs, so I wrote them manually, before a friend, Ivan Beauvais, shared this GitHub repository with me.

list-project

openapi: 3.0.3
info:
  title: List Google Cloud Project
  description: List all the projects in the current Google Cloud context
  version: 1.0.11
servers:
  - url: https://cloudresourcemanager.googleapis.com/v1/projects
paths:
  /:
    get:
      summary: List all the projects
      operationId: listProject
      responses:
        '200':
          description: list of projects.
          content:
            application/json:
              schema:
                type: object
                properties:
                  projects:
                    type: array
                    items:
                      $ref: '#/components/schemas/Project'
        '400':
          description: Invalid status value
components:
  schemas:
    Project:
      type: object
      properties:
        projectNumber:
          type: string
        projectId:
          type: string
        lifecycleState:
          type: string
        name:
          type: string
        createTime:
          type: string
          format: date-time
        parent:
          type: object
          properties:
            type:
              type: string
            id:
              type: string
        labels:
          type: object
          additionalProperties:
            type: string
list-cloudrun-location

openapi: 3.0.0
info:
  title: Cloud Run Locations API
  version: 1.0.0
  description: The Cloud Run Locations API.
servers:
  - url: https://run.googleapis.com
paths:
  /v1/projects/{project}/locations:
    get:
      summary: Lists Cloud Run locations for a project.
      operationId: listLocations
      description: Returns a list of locations that serve resources for the specified project.
      parameters:
        - name: project
          in: path
          required: true
          description: the projectId.
          schema:
            type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object
                properties:
                  locations:
                    type: array
                    items:
                      $ref: '#/components/schemas/Location'
                  nextPageToken:
                    type: string
                    description: The standard List next-page token.
        '403':
          description: Forbidden
        '404':
          description: Not Found
components:
  schemas:
    Location:
      type: object
      properties:
        name:
          type: string
          description: The location name.
        locationId:
          type: string
          description: The canonical ID for this location.
        labels:
          type: object
          description: Cross-service attributes for the location.
          additionalProperties:
            type: string
        metadata:
          type: object
          description: Service-specific metadata. For example the available capacity at the given location.
          additionalProperties:
            type: object

list-cloudrun-services

openapi: 3.0.0
info:
  title: Cloud Run Services API
  version: v2
servers:
  - url: https://run.googleapis.com
paths:
  /v2/projects/{project}/locations/{location}/services:
    get:
      summary: List Cloud Run services
      operationId: listServices
      description: Lists Cloud Run Services resources.
      parameters:
        - name: project
          in: path
          required: true
          description: the projectId.
          schema:
            type: string
        - name: location
          in: path
          required: true
          description: the region of the Cloud Run service.
          schema:
            type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object

You can notice the parameter definitions. It’s very important to describe them correctly, to allow the LLM to match data from the conversation context with those parameters and fill in the correct values.
And then the magic happens! It works amazingly well!
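
For instance, a more verbose version of the location parameter could guide the LLM even further (a sketch, more detailed than the one I actually deployed):

- name: location
  in: path
  required: true
  description: >-
    The Google Cloud region of the Cloud Run services,
    such as us-central1 or europe-west1. Use the region
    confirmed with the user during the conversation.
  schema:
    type: string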

In addition, and again to help the LLM, you have to take care with the description of the tool itself, which will also belong to the context.

You can also notice that I focused my attention on the request definition. The response part can be a simple object or an incomplete response definition.

The magic of the LLM is its understanding of the JSON response, as long as your fields are correctly named. You can also override the automatic inference by describing only the required/mandatory fields, to help the LLM.
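
For instance, assuming you only care about the service names and URLs, a trimmed response definition for the list-cloudrun-services tool could look like this sketch (name and uri are fields of the Cloud Run v2 Service resource):

responses:
  '200':
    description: OK
    content:
      application/json:
        schema:
          type: object
          properties:
            services:
              type: array
              items:
                type: object
                properties:
                  name:
                    type: string
                    description: The full resource name of the Cloud Run service.
                  uri:
                    type: string
                    description: The main URI where the service is served.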

The authentication mechanism

Of course, to let the agent access the APIs, you must configure the correct authentication for it.

In this case, I used the service agent identity (again, the legacy Dialogflow service agent). Remember:

  • Use an access token when you reach Google APIs
  • Use an identity token when you reach YOUR APIs (on Cloud Run or Cloud Functions, for instance!)

I granted it the following roles at the organization level:

  • Project Viewer, to list all my projects
  • Cloud Run Admin, to access the Cloud Run data in all my projects

Try it out

Now you can play with the agent and try different conversations.

Here is a step-by-step conversation:

But you can also set all the variables in a single prompt and ask it to list the services.

It can detect all the required parameters from the context and correctly invoke the tools.

The mind-blowing features and downsides

I already mentioned one of the mind-blowing features:
As long as your JSON response is self-sufficient, the LLM can infer your expectations and pick the useful fields, without a detailed response description.
Super useful when you want to use APIs that have no official OpenAPI spec!

Another incredible thing is the capacity to achieve actions that were never prompted. For instance, in the list of locations, you can ask to display only the US locations, or the European ones.
It can filter them on its own! Magic!

You can also ask for a specific formatting.

You can also ask to change a value in the context.

Finally, you get a multi-language agent with no effort, not even a configuration! Here is a sample in French (my native language):

In my opinion, a “wrong good idea” would be to define the prompt in a non-English language, as here in French.

OK, it works well. But because most of the APIs, JSON field names, and other information are in English, I personally fear misinterpretations. It might be a wrong assumption on my part, but I prefer to recommend prompts in English.

However, I faced a hard-to-debug error: Failed to generate response.

The authorization was OK and the API spec worked great. So why???
After 2 days of investigation, I finally understood that the API JSON response was too large and the agent crashed… Pure deduction, with no other clues or help…
Again, it’s only the beginning for Agent Builder, but you can discover flaws like that and stupidly lose time!
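
If you hit the same wall, a possible mitigation, assuming the API supports pagination (the Cloud Run list APIs do), is to cap the response size directly in the spec with a pageSize query parameter, so the agent never receives an oversized payload:

- name: pageSize
  in: query
  required: false
  description: The maximum number of items to return; keep it small (50 for instance).
  schema:
    type: integer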

Go to the next level

After this first discovery, I think I would need 1 or 2 hours to build an agent similar to the Cloud Run one, instead of 1 week with Dialogflow.
It’s totally crazy how simple and powerful it is, and it can achieve much more than you expect!

So now, I no longer want to code; I only want to create an agent for every operation or organizational process that I have to achieve!!

And you, how do you envision using the agents?
