Chapter 5: Tool Use



sit-xinli

May 2, 2026


ai agent llm openai gemini


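# Chapter 5: Tool Use

## Tool Use Pattern Overview

So far, we have discussed agentic patterns that mainly concern orchestrating interactions between language models and managing the flow of information inside an agent's internal workflow (Chaining, Routing, Parallelization, Reflection). For an agent to be truly useful and to interact with the real world or with external systems, however, it needs the ability to interact with external APIs, databases, and services, or to execute code.

The Tool Use pattern, often implemented through a mechanism called function calling, gives an agent exactly this ability. It places the LLM at the core of the agent and lets it decide when and how to use a specific external function, based on the user's request or the current state of the task.

The process typically involves:

1. **Tool definition:** External functions or capabilities are defined and described to the LLM in a form it can understand. The description includes the function's purpose, its name, and the parameters it accepts, along with their types and descriptions.
2. **LLM decision:** The LLM receives the user's request and the available tool definitions. Based on its understanding of the request and the tools, it decides whether calling a tool is needed to fulfill the request.
3. **Function call generation:** If the LLM decides to use a tool, it generates structured output (often a JSON object) that specifies the name of the tool to call and the arguments (parameters) to pass, extracted from the user's request.
4. **Tool execution:** The agentic framework or orchestration layer intercepts this structured output, identifies the requested tool, and executes the actual external function with the provided arguments.
5. **Observation/result:** The output or result of the tool execution is returned to the agent.
6. **LLM processing (optional but common):** The LLM receives the tool's output as context and uses it to formulate the final response to the user or to decide the next step in the workflow (which may involve calling another tool, reflecting, or providing a final answer).

This pattern is fundamental because it breaks through the limits of the LLM's training data: the agent can access up-to-date information, perform calculations it cannot carry out internally, interact with user-specific data, and trigger real-world actions. Function calling is the technical mechanism that bridges the gap between the LLM's reasoning capabilities and the vast array of external functionality available.

While "function calling" aptly describes invoking specific, predefined code functions, it is useful to consider the broader concept of "tool calling". This wider term acknowledges that an agent's capabilities can extend far beyond simple function execution. A "tool" can be a traditional function, but it can also be a complex API endpoint, a request to a database, or an instruction directed at another specialized agent. This perspective lets us envision more sophisticated systems in which, for example, a primary agent delegates a complex data analysis task to a dedicated "analyst agent" or queries an external knowledge base through its API. Thinking in terms of "tool calling" better captures the full potential of agents acting as orchestrators across a diverse ecosystem of digital resources and other intelligent entities.

Frameworks such as LangChain, LangGraph, and Google Agent Developer Kit (ADK) provide support for defining tools and integrating them into agent workflows, often leveraging the native function calling capabilities of modern LLMs such as the Gemini or OpenAI series. On the "canvas" of these frameworks, you define the tools and then configure agents (typically LLM agents) to be aware of those tools and able to use them.

Tool use is a cornerstone pattern for building powerful, interactive, and externally aware agents.

## Practical Applications and Use Cases

The Tool Use pattern applies to almost any scenario in which an agent needs to go beyond text generation to perform an action or retrieve specific, dynamic information:

1. **Retrieving information from external sources:** Access real-time data or information that is not present in the LLM's training data.
   - **Use case:** A weather agent.
     - **Tool:** A weather API that takes a location and returns the current weather conditions.
     - **Agent flow:** The user asks "What's the weather in London?", the LLM identifies the need for the weather tool and calls it with "London", the tool returns the data, and the LLM formats the data into a user-friendly response.
2. **Interacting with databases and APIs:** Run queries, updates, or other operations on structured data.
   - **Use case:** An e-commerce agent.
     - **Tool:** API calls that check product inventory, retrieve order status, or process payments.
     - **Agent flow:** The user asks "Is product X in stock?", the LLM calls the inventory API, the tool returns the stock count, and the LLM tells the user the stock status.
3. **Performing calculations and data analysis:** Use external calculators, data analysis libraries, or statistical tools.
   - **Use case:** A financial agent.
     - **Tool:** A calculator function, a stock market data API, a spreadsheet tool.
     - **Agent flow:** The user asks "What is the current price of AAPL, and what is the potential profit if I bought 100 shares at $150?", the LLM calls the stock API to get the current price, calls the calculator tool, gets the result, and formats the response.
4. **Sending communications:** Send emails or messages, or make API calls to external communication services.
   - **Use case:** A personal assistant agent.
     - **Tool:** An email-sending API.
     - **Agent flow:** The user says "Send an email to John about tomorrow's meeting", and the LLM calls the email tool with the recipient, subject, and body extracted from the request.
5. **Executing code:** Run code snippets in a safe environment to perform specific tasks.
   - **Use case:** A coding assistant agent.
     - **Tool:** A code interpreter.
     - **Agent flow:** The user provides a Python snippet and asks "What does this code do?", and the LLM uses the interpreter tool to run the code and analyze its output.
6. **Controlling other systems or devices:** Interact with smart home devices, IoT platforms, or other connected systems.
   - **Use case:** A smart home agent.
     - **Tool:** An API that controls smart lights.
     - **Agent flow:** The user says "Turn off the living room lights", and the LLM calls the smart home tool with the command and the target device.

Tool use transforms a language model from a text generator into an agent capable of sensing, reasoning, and acting in the digital or physical world (see Fig.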
1).

![](../media/images/image13.jpg)

> Fig.1: Some examples of an agent using tools

## Hands-On Code Example (LangChain)

Implementing tool use in the LangChain framework is a two-stage process. First, one or more tools are defined, typically by encapsulating existing Python functions or other executable components. The tools are then bound to a language model, which gives the model the ability to generate a structured tool-use request whenever it determines that an external function call is needed to fulfill the user's query.

The following implementation demonstrates this principle by first defining a simple function that simulates an information retrieval tool. An agent is then configured to leverage this tool in response to user input. Running the example requires the core LangChain libraries and a model-specific provider package to be installed. Proper authentication with the chosen language model service, configured with an API key, is also a prerequisite.

```python
import os, getpass
import asyncio
import nest_asyncio
from typing import List
from dotenv import load_dotenv
import logging

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool as langchain_tool
from langchain.agents import create_tool_calling_agent, AgentExecutor

# Prompt the user securely and set the API keys as environment variables.
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google API key: ")
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

try:
    # A model with function/tool calling capabilities is required.
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
    print(f"✅ Language model initialized: {llm.model}")
except Exception as e:
    print(f"🛑 Error initializing language model: {e}")
    llm = None


# --- Define a Tool ---
@langchain_tool
def search_information(query: str) -> str:
    """
    Provides factual information on a given topic. Use this tool to find answers
    to phrases like 'capital of France' or 'weather in London?'.
    """
    print(f"\n--- 🛠 Tool Called: search_information with query: '{query}' ---")
    # Simulate a search tool with a dictionary of predefined results.
    simulated_results = {
        "weather in london": "The weather in London is currently cloudy with a temperature of 15°C.",
        "capital of france": "The capital of France is Paris.",
        "population of earth": "The estimated population of Earth is around 8 billion people.",
        "tallest mountain": "Mount Everest is the tallest mountain above sea level.",
        "default": f"Simulated search result for '{query}': No specific information found, but the topic seems interesting.",
    }
    result = simulated_results.get(query.lower(), simulated_results["default"])
    print(f"--- TOOL RESULT: {result} ---")
    return result


tools = [search_information]

# --- Create a Tool-Calling Agent ---
if llm:
    # This prompt template requires an `agent_scratchpad` placeholder for the agent's internal steps.
    agent_prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])

    # Create the agent, binding the LLM, tools, and prompt together.
    agent = create_tool_calling_agent(llm, tools, agent_prompt)

    # AgentExecutor is the runtime that invokes the agent and executes the chosen tools.
    # It is given the same tool list so that it can run the calls the agent requests.
    agent_executor = AgentExecutor(agent=agent, verbose=True, tools=tools)


async def run_agent_with_tool(query: str):
    """Invokes the agent executor with a query and prints the final response."""
    print(f"\n--- 🏃 Running Agent with Query: '{query}' ---")
    try:
        response = await agent_executor.ainvoke({"input": query})
        print("\n--- ✅ Final Agent Response ---")
        print(response["output"])
    except Exception as e:
        print(f"\n🛑 An error occurred during agent execution: {e}")


async def main():
    """Runs all agent queries concurrently."""
    tasks = [
        run_agent_with_tool("What is the capital of France?"),
        run_agent_with_tool("What's the weather like in London?"),
        run_agent_with_tool("Tell me something about dogs."),  # Should trigger the default tool response
    ]
    await asyncio.gather(*tasks)


nest_asyncio.apply()
asyncio.run(main())
```

The code sets up a tool-calling agent using the LangChain library and a Google Gemini model. It defines a `search_information` tool that simulates providing factual answers to specific queries. The tool has predefined responses for "weather in london", "capital of france", "population of earth", and "tallest mountain", plus a default response for any other query. A `ChatGoogleGenerativeAI` model is initialized; as the code comment notes, a model with tool-calling capabilities is required. A `ChatPromptTemplate` is created to guide the agent's interactions. The `create_tool_calling_agent` function combines the language model, tools, and prompt into an agent. An `AgentExecutor` is set up to manage the agent's execution and its tool calls. The asynchronous function `run_agent_with_tool` is defined to simplify invoking the agent with a query. The asynchronous `main` function prepares several queries to run concurrently. These queries are designed to exercise both the specific and the default responses of the `search_information` tool. Finally, the `asyncio.run(main())` call executes all the agent tasks. The code checks that LLM initialization succeeded before it sets up the agent.

## Hands-On Code Example (CrewAI)

This code provides a practical example of implementing function calling (tools) within the CrewAI framework. It sets up a simple scenario in which an agent is equipped with a tool for looking up information. Specifically, the example demonstrates retrieving a simulated stock price with this agent and tool.

```python
# pip install crewai langchain-openai
import os
from crewai import Agent, Task, Crew
from crewai.tools import tool
import logging

# --- Best Practice: Configure Logging ---
# A basic logging setup helps in debugging and tracking the crew's execution.
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# --- Set up your API Key ---
# For production, it's recommended to use a more secure method for key management
# like environment variables loaded at runtime or a secret manager.
#
# Set the environment variable for your chosen LLM provider (e.g., OPENAI_API_KEY)
# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
# os.environ["OPENAI_MODEL_NAME"] = "gpt-4o"


# --- 1. Refactored Tool: Returns Clean Data ---
# The tool now returns raw data (a float) or raises a standard Python error.
# This makes it more reusable and forces the agent to handle outcomes properly.
@tool("Stock Price Lookup Tool")
def get_stock_price(ticker: str) -> float:
    """
    Fetches the latest simulated stock price for a given stock ticker symbol.
    Returns the price as a float. Raises a ValueError if the ticker is not found.
    """
    logging.info(f"Tool Call: get_stock_price for ticker '{ticker}'")
    simulated_prices = {
        "AAPL": 178.15,
        "GOOGL": 1750.30,
        "MSFT": 425.50,
    }
    price = simulated_prices.get(ticker.upper())

    if price is not None:
        return price
    else:
        # Raising a specific error is better than returning a string.
        # The agent is equipped to handle exceptions and can decide on the next action.
        raise ValueError(f"Simulated price for ticker '{ticker.upper()}' not found.")


# --- 2. Define the Agent ---
# The agent definition remains the same, but it will now leverage the improved tool.
financial_analyst_agent = Agent(
    role='Senior Financial Analyst',
    goal='Analyze stock data using provided tools and report key prices.',
    backstory="You are an experienced financial analyst adept at using data sources to find stock information. You provide clear, direct answers.",
    verbose=True,
    tools=[get_stock_price],
    # Allowing delegation can be useful, but is not necessary for this simple task.
    allow_delegation=False,
)

# --- 3. Refined Task: Clearer Instructions and Error Handling ---
# The task description is more specific and guides the agent on how to react
# to both successful data retrieval and potential errors.
analyze_aapl_task = Task(
    description=(
        "What is the current simulated stock price for Apple (ticker: AAPL)? "
        "Use the 'Stock Price Lookup Tool' to find it. "
        "If the ticker is not found, you must report that you were unable to retrieve the price."
    ),
    expected_output=(
        "A single, clear sentence stating the simulated stock price for AAPL. "
        "For example: 'The simulated stock price for AAPL is $178.15.' "
        "If the price cannot be found, state that clearly."
    ),
    agent=financial_analyst_agent,
)

# --- 4. Formulate the Crew ---
# The crew orchestrates how the agent and task work together.
financial_crew = Crew(
    agents=[financial_analyst_agent],
    tasks=[analyze_aapl_task],
    verbose=True  # Set to False for less detailed logs in production
)


# --- 5. Run the Crew within a Main Execution Block ---
# Using a __name__ == "__main__": block is a standard Python best practice.
def main():
    """Main function to run the crew."""
    # Check for API key before starting to avoid runtime errors.
    if not os.environ.get("OPENAI_API_KEY"):
        print("ERROR: The OPENAI_API_KEY environment variable is not set.")
        print("Please set it before running the script.")
        return

    print("\n## Starting the Financial Crew...")
    print("---------------------------------")

    # The kickoff method starts the execution.
    result = financial_crew.kickoff()

    print("\n---------------------------------")
    print("## Crew execution finished.")
    print("\nFinal Result:\n", result)


if __name__ == "__main__":
    main()
```

This code demonstrates a simple application that uses the CrewAI library to simulate a financial analysis task. It defines a custom tool, `get_stock_price`, which simulates looking up the stock price for a set of predefined tickers. The tool is designed to return a floating-point number for valid tickers and to raise a `ValueError` for invalid ones. A CrewAI agent, `financial_analyst_agent`, is created with the role of senior financial analyst and is given the `get_stock_price` tool. A task, `analyze_aapl_task`, is defined to instruct the agent to find the simulated stock price of AAPL using the tool. The task description includes clear instructions on how to handle both success and failure when using the tool. A `Crew` is assembled containing `financial_analyst_agent` and `analyze_aapl_task`. The `verbose` setting is enabled on both the agent and the crew to produce detailed logs during execution. The main part of the script runs the crew's task with the `kickoff()` method inside a standard `if __name__ == "__main__":` block. Before starting, it checks whether the `OPENAI_API_KEY` environment variable, which the agent needs in order to function, is set. The result of the crew's execution is printed to the console. The code also includes a logging configuration for better execution tracking. It manages the API key through an environment variable, although more secure methods are recommended for production. In short, the core logic demonstrates how to define a tool, an agent, and a task in CrewAI to create a collaborative workflow.

## Hands-On Code (Google ADK)

Google Agent Developer Kit (ADK) includes a library of natively integrated tools that can be incorporated directly into an agent's capabilities.

**Google Search:** A primary example of such a component is the Google Search tool. This tool provides a direct interface to the Google search engine, equipping the agent to perform web searches and retrieve external information.

```python
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import google_search
from google.genai import types
import nest_asyncio
import asyncio

# Define variables required for Session setup and Agent execution
APP_NAME="Google Search_agent"
USER_ID="user1234"
SESSION_ID="1234"

# Define Agent with access to search tool
root_agent = Agent(
    name="basic_search_agent",
    model="gemini-2.0-flash-exp",
    description="Agent to answer questions using Google Search.",
    instruction="I can answer your questions by searching the internet. Just ask me anything!",
    tools=[google_search],  # Google Search is a pre-built tool to perform Google searches.
)


# Agent Interaction
async def call_agent(query):
    """
    Helper function to call the agent with a query.
    """
    # Session and Runner
    session_service = InMemorySessionService()
    session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
    runner = Runner(agent=root_agent, app_name=APP_NAME, session_service=session_service)

    content = types.Content(role='user', parts=[types.Part(text=query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
            print("Agent Response: ", final_response)


nest_asyncio.apply()
asyncio.run(call_agent("what's the latest ai news?"))
```

This code demonstrates how to create and use a basic agent that uses Google Search as a tool with Google ADK. First, the required components are imported from `google.adk` and `google.genai`, along with `asyncio` and `nest_asyncio`. Constants for the application name, user ID, and session ID are defined. An agent instance named "basic_search_agent" is created with a description and an instruction that state its purpose. It is configured to use the Google Search tool, a pre-built tool provided by Google ADK. An `InMemorySessionService` (see Chapter 8) is initialized to manage the agent's sessions, and a new session is created for the specified application, user, and session IDs. A `Runner` is instantiated to link the agent with the session service; it is responsible for executing agent interactions. A helper function, `call_agent`, is defined to simplify sending a query to the agent and handling the response. Inside `call_agent`, the user's query is formatted as a `types.Content` object with the role "user". The `runner.run` method is called with the user ID, the session ID, and the new message content, and it returns an iterable of events that represent the agent's actions and responses. The code iterates over these events to find the final response. When an event is identified as the final response, its text content is extracted and printed to the console. Finally, `call_agent` is invoked with the query "what's the latest ai news?" to demonstrate the agent in action.

**Code execution:** Google ADK provides integrated components for specialized tasks. The `built_in_code_execution` tool (configured in the example through the `BuiltInCodeExecutor` class) gives the agent a sandboxed Python interpreter. This allows the model to write and run code that performs computational tasks, manipulates data structures, and executes procedural scripts. Such a capability is important for problems that require deterministic logic and exact computation, which lie outside the scope of probabilistic language generation.

```python
import os, getpass
import asyncio
import nest_asyncio
from typing import List
from dotenv import load_dotenv
import logging

from google.adk.agents import Agent as ADKAgent, LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import google_search
from google.adk.code_executors import BuiltInCodeExecutor
from google.genai import types

# Define variables required for Session setup and Agent execution
APP_NAME="calculator"
USER_ID="user1234"
SESSION_ID="session_code_exec_async"

# Agent Definition
code_agent = LlmAgent(
    name="calculator_agent",
    model="gemini-2.0-flash",
    code_executor=BuiltInCodeExecutor(),
    instruction="""You are a calculator agent.
    When given a mathematical expression, write and execute Python code to calculate the result.
    Return only the final numerical result as plain text, without markdown or code blocks.
    """,
    description="Executes Python code to perform calculations.",
)


# Agent Interaction (Async)
async def call_agent_async(query):
    # Session and Runner
    session_service = InMemorySessionService()
    session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
    runner = Runner(agent=code_agent, app_name=APP_NAME, session_service=session_service)

    content = types.Content(role='user', parts=[types.Part(text=query)])
    print(f"\n--- Running Query: {query} ---")
    final_response_text = "No final text response captured."
    try:
        # Use run_async
        async for event in runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content):
            print(f"Event ID: {event.id}, Author: {event.author}")

            # --- Check for specific parts FIRST ---
            if event.content and event.content.parts and event.is_final_response():
                for part in event.content.parts:  # Iterate through all parts
                    if part.executable_code:
                        # Access the actual code string via .code
                        print(f"  Debug: Agent generated code:\n```python\n{part.executable_code.code}\n```")
                    elif part.code_execution_result:
                        # Access outcome and output correctly
                        print(f"  Debug: Code Execution Result: {part.code_execution_result.outcome} - Output:\n{part.code_execution_result.output}")
                    # Also print any text parts found in the event for debugging
                    elif part.text and not part.text.isspace():
                        print(f"  Text: '{part.text.strip()}'")

                # --- Check for final response AFTER specific parts ---
                text_parts = [part.text for part in event.content.parts if part.text]
                final_result = "".join(text_parts)
                print(f"==> Final Agent Response: {final_result}")

    except Exception as e:
        print(f"ERROR during agent run: {e}")
    print("-" * 30)


# Main async function to run the examples
async def main():
    await call_agent_async("Calculate the value of (5 + 7) * 3")
    await call_agent_async("What is 10 factorial?")


# Execute the main async function
try:
    nest_asyncio.apply()
    asyncio.run(main())
except RuntimeError as e:
    # Handle specific error when running asyncio.run in an already running loop (like Jupyter/Colab)
    if "cannot be called from a running event loop" in str(e):
        print("\nRunning in an existing event loop (like Colab/Jupyter).")
        print("Please run `await main()` in a notebook cell instead.")
        # If in an interactive environment like a notebook, you might need to run:
        # await main()
    else:
        raise e  # Re-raise other runtime errors
```
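The two example queries have exact answers, which makes it easy to check the agent's output. The snippet below is a minimal sketch of the kind of Python the agent might write and run in its sandbox. It is an illustration under that assumption, not the code the model actually generates, and `evaluate_example_queries` is a hypothetical helper name.

```python
import math


def evaluate_example_queries() -> dict:
    """Compute the two example expressions deterministically (hypothetical helper)."""
    return {
        "(5 + 7) * 3": (5 + 7) * 3,
        "10 factorial": math.factorial(10),
    }


if __name__ == "__main__":
    # Print each expression next to its exact value.
    for expression, value in evaluate_example_queries().items():
        print(f"{expression} = {value}")
```

A correct agent run should therefore report 36 for the first query and 3628800 for the second.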
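The calculator script uses Google ADK to create an agent that acts as a calculator, solving mathematical problems by writing and executing Python code. It defines an `LlmAgent` that is specifically instructed to act as a calculator and is equipped with the built-in code execution tool. The primary logic lives in the `call_agent_async` function, which sends the user's query to the agent's runner and processes the resulting events. Inside this function, an asynchronous loop iterates over the events and prints the generated Python code and its execution result for debugging. The code carefully distinguishes these intermediate steps from the final event, which contains the numerical answer. Finally, the `main` function runs the agent on two different mathematical expressions to demonstrate its ability to perform calculations.

**Enterprise search:** This code defines a Google ADK application in Python using the `google.adk` library. It specifically uses a `VSearchAgent`, which is designed to answer questions by querying a specified Vertex AI Search datastore. The code initializes a `VSearchAgent` named "q2_strategy_vsearch_agent" and provides a description, the model to use ("gemini-2.0-flash-exp"), and the ID of the Vertex AI Search datastore. `DATASTORE_ID` is expected to be set as an environment variable. The code then sets up a `Runner` for the agent, using an `InMemorySessionService` to manage conversation history. An asynchronous function, `call_vsearch_agent_async`, is defined to interact with the agent. This function takes a query, constructs a message content object, and calls the runner's `run_async` method to send the query to the agent. The function then streams the agent's response back to the console. It also prints information about the final response, including any source attributions from the datastore. Error handling is included to catch exceptions during agent execution and to provide informative messages about potential problems, such as an incorrect datastore ID or missing permissions. Another asynchronous function, `run_vsearch_example`, is provided to demonstrate how to call the agent with example queries. The main execution block checks whether `DATASTORE_ID` is set and then runs the example using `asyncio`. It also handles the case in which an event loop is already running, as in environments such as Jupyter notebooks.

```python
import asyncio
from google.genai import types
from google.adk import agents
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
import os

# --- Configuration ---
# Ensure you have set your GOOGLE_API_KEY and DATASTORE_ID environment variables
# For example:
# os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"
# os.environ["DATASTORE_ID"] = "YOUR_DATASTORE_ID"
DATASTORE_ID = os.environ.get("DATASTORE_ID")

# --- Application Constants ---
APP_NAME = "vsearch_app"
USER_ID = "user_123"  # Example User ID
SESSION_ID = "session_456"  # Example Session ID

# --- Agent Definition (Updated with the newer model from the guide) ---
vsearch_agent = agents.VSearchAgent(
    name="q2_strategy_vsearch_agent",
    description="Answers questions about Q2 strategy documents using Vertex AI Search.",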
    model="gemini-2.0-flash-exp",  # Updated model based on the guide's examples
    datastore_id=DATASTORE_ID,
    model_parameters={"temperature": 0.0}
)

# --- Runner and Session Initialization ---
runner = Runner(
    agent=vsearch_agent,
    app_name=APP_NAME,
    session_service=InMemorySessionService(),
)


# --- Agent Invocation Logic ---
async def call_vsearch_agent_async(query: str):
    """Initializes a session and streams the agent's response."""
    print(f"User: {query}")
    print("Agent: ", end="", flush=True)

    try:
        # Construct the message content correctly
        content = types.Content(role='user', parts=[types.Part(text=query)])

        # Process events as they arrive from the asynchronous runner
        async for event in runner.run_async(
            user_id=USER_ID,
            session_id=SESSION_ID,
            new_message=content
        ):
            # For token-by-token streaming of the response text
            if hasattr(event, 'content_part_delta') and event.content_part_delta:
                print(event.content_part_delta.text, end="", flush=True)

            # Process the final response and its associated metadata
            if event.is_final_response():
                print()  # Newline after the streaming response
                if event.grounding_metadata:
                    print(f"  (Source Attributions: {len(event.grounding_metadata.grounding_attributions)} sources found)")
                else:
                    print("  (No grounding metadata found)")
                print("-" * 30)

    except Exception as e:
        print(f"\nAn error occurred: {e}")
        print("Please ensure your datastore ID is correct and that the service account has the necessary permissions.")
        print("-" * 30)


# --- Run Example ---
async def run_vsearch_example():
    # Replace with a question relevant to YOUR datastore content
    await call_vsearch_agent_async("Summarize the main points about the Q2 strategy document.")
    await call_vsearch_agent_async("What safety procedures are mentioned for lab X?")


# --- Execution ---
if __name__ == "__main__":
    if not DATASTORE_ID:
        print("Error: DATASTORE_ID environment variable is not set.")
    else:
        try:
            asyncio.run(run_vsearch_example())
        except RuntimeError as e:
            # This handles cases where asyncio.run is
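            # called in an environment that already has a running
            # event loop (like a Jupyter notebook).
            if "cannot be called from a running event loop" in str(e):
                print("Skipping execution in a running event loop. Please run this script directly.")
            else:
                raise e
```

Overall, this code provides a basic framework for building a conversational AI application that leverages Vertex AI Search to answer questions based on information stored in a specified datastore. It demonstrates how to define the agent, set up the runner, and interact with the agent asynchronously while streaming the response. The focus is on retrieving information from a specific datastore and synthesizing it to answer user queries.

**Vertex Extensions:** A Vertex AI extension is a structured API wrapper that enables a model to connect to external APIs for real-time data processing and action execution. Extensions offer enterprise-grade security, data privacy, and performance guarantees. They can be used for tasks such as generating and running code, querying websites, and analyzing information from private datastores. Google provides prebuilt extensions for common use cases, such as Code Interpreter and Vertex AI Search, along with the option to create custom ones. The main benefits of extensions include strong enterprise controls and integration with other Google products. The key difference between extensions and function calling lies in execution: Vertex AI executes extensions automatically, whereas function calls require manual execution by the user or client.

## At a Glance

**What:** LLMs are powerful text generators, but they are fundamentally disconnected from the outside world. Their knowledge is static and limited to their training data, and they lack the ability to perform actions or retrieve real-time information. This inherent limitation prevents them from completing tasks that require interaction with external APIs, databases, or services. Without a bridge to these external systems, their usefulness for solving real-world problems is severely constrained.

**Why:** The Tool Use pattern, often implemented through function calling, provides a standardized solution to this problem. It works by describing the available external functions, or tools, in a way the LLM can understand. Based on the user's request, an agentic LLM can decide whether a tool is needed and generate a structured data object (such as JSON) that specifies which tool to call and with which arguments. The orchestration layer executes this function call, retrieves the result, and feeds it back to the LLM. This lets the LLM incorporate up-to-date external information or the result of an action into its final response, effectively giving it the ability to act.

**Rule of thumb:** Use the Tool Use pattern whenever an agent needs to go beyond the LLM's internal knowledge and interact with the outside world. It is essential for tasks that require real-time data (e.g., checking the weather or stock prices), access to private or proprietary information (e.g., querying a company database), precise calculations, code execution, or triggering actions in other systems (e.g., sending an email or controlling a smart device).

**Visual summary:**

![](../media/images/image14.jpg)

Fig.2: Tool use design pattern

## Key Takeaways

- Tool use (function calling) lets agents interact with external systems and access dynamic information.
- It involves defining tools with clear descriptions and parameters that the LLM can understand.
- The LLM decides when to use a tool and generates a structured function call.
- The agentic framework executes the actual tool call and returns the result to the LLM.
- Tool use is essential for building agents that can perform real-world actions and provide up-to-date information.
- LangChain simplifies tool definition with the `@tool` decorator and provides `create_tool_calling_agent` and `AgentExecutor` for building tool-using agents.
- Google ADK has many useful prebuilt tools, such as Google Search, Code Execution, and the Vertex AI Search Tool.

## Conclusion

The Tool Use pattern is a key architectural principle for extending the functional scope of large language models beyond their inherent text generation capabilities. By equipping an agent with the ability to interact with external software and data sources, this paradigm enables the agent to perform actions, run computations, and retrieve information from other systems. The process involves the model generating a structured request for an external tool call whenever it determines that one is needed to fulfill the user's query. Frameworks such as LangChain, Google ADK, and CrewAI provide structured abstractions and components that facilitate the integration of these external tools. They expose tool specifications to the model and manage the parsing of the tool-use requests that follow. This simplifies the development of sophisticated agentic systems that can interact with and act within external digital environments.

## References

1. LangChain Documentation (Tools): https://python.langchain.com/docs/integrations/tools/
2. Google Agent Developer Kit (ADK) Documentation (Tools): https://google.github.io/adk-docs/tools/
3. OpenAI Function Calling Documentation: https://platform.openai.com/docs/guides/function-calling
4. CrewAI Documentation (Tools): https://docs.crewai.com/concepts/tools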
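As a framework-independent recap, the six-step tool-use process described in the overview can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how LangChain, CrewAI, or ADK implement it: `get_weather`, `TOOLS`, `fake_llm`, `execute_tool_call`, and `answer` are hypothetical names, and `fake_llm` is a keyword-matching stub standing in for a real model's decision.

```python
import json
from typing import Any, Dict, Optional


# Step 1: tool definition. The function plus a description the model could read.
def get_weather(city: str) -> str:
    """Return simulated weather for a city (hypothetical stand-in for a real API)."""
    simulated = {"london": "cloudy, 15°C"}
    return simulated.get(city.lower(), "unknown")


TOOLS: Dict[str, Dict[str, Any]] = {
    "get_weather": {
        "description": "Get the current weather for a city.",
        "parameters": {"city": "string"},
        "function": get_weather,
    }
}


def fake_llm(user_request: str) -> str:
    """Stand-in for steps 2 and 3: decide on a tool and emit a JSON call.

    A real model would make this decision; this stub only matches a keyword
    and always passes "London" as the city.
    """
    if "weather" in user_request.lower():
        return json.dumps({"tool": "get_weather", "arguments": {"city": "London"}})
    return json.dumps({"tool": None, "arguments": {}})


def execute_tool_call(raw_call: str) -> Optional[str]:
    """Steps 4 and 5: parse the structured output, run the tool, return the observation."""
    call = json.loads(raw_call)
    name = call.get("tool")
    if name is None:
        return None  # The model decided that no tool is needed.
    if name not in TOOLS:
        raise ValueError(f"Unknown tool requested: {name}")
    return TOOLS[name]["function"](**call["arguments"])


def answer(user_request: str) -> str:
    """Step 6: fold the observation into a final response."""
    observation = execute_tool_call(fake_llm(user_request))
    if observation is None:
        return "No tool was needed for this request."
    return f"Tool result: {observation}"


if __name__ == "__main__":
    print(answer("What's the weather in London?"))
```

The orchestration layer, not the model, validates the tool name and runs the function, so an unknown tool name can be rejected before anything executes.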
