Create a ReAct Agent Using OpenVINO and LangChain#

This Jupyter notebook can be launched only after a local installation.

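An LLM is limited to the knowledge it was trained on and any additional knowledge provided as context. As a result, if a useful piece of information is missing from the provided knowledge, the model cannot "go around" and look for it in other sources. This is why the concept of an agent is introduced.

The core idea of an agent is to use a language model to choose a sequence of actions to take. The language model serves as a reasoning engine that determines which actions to take and in which order. An agent can be seen as an application powered by an LLM and integrated with a set of tools such as search engines, databases, and websites. Within the agent, the LLM plans and executes, based on the user input, the set of actions needed to fulfill the request.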
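The loop described above can be sketched in plain Python. This is only an illustration under stated assumptions, not how LangChain implements agents: `scripted_policy` is a hypothetical stand-in for the LLM, and `TOOLS` and `run_agent` are names invented for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool table: tool name -> callable.
TOOLS: Dict[str, Callable[..., int]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def scripted_policy(history: List[Tuple[str, int]]) -> Tuple[str, tuple]:
    """Stand-in for the LLM: pick the next action from what has been observed so far."""
    if not history:
        return "add", (12, 3)                    # first step: compute the sum
    if len(history) == 1:
        return "multiply", (history[-1][1], 2)   # second step: double the sum
    return "final", (history[-1][1],)            # enough information: answer


def run_agent(max_steps: int = 5) -> int:
    history: List[Tuple[str, int]] = []
    for _ in range(max_steps):
        action, args = scripted_policy(history)
        if action == "final":
            return args[0]
        observation = TOOLS[action](*args)       # execute the chosen tool
        history.append((action, observation))    # feed the result back to the policy
    raise RuntimeError("agent did not finish within max_steps")


result = run_agent()
print(result)
```

In a real agent the policy is the LLM itself, which reads the tool descriptions and the history as text and emits the next action as text.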

LangChain is a framework for developing applications powered by language models. It ships with a number of built-in agents optimized for different use cases.

This notebook explains step by step how to create an AI agent using OpenVINO and LangChain.

Table of contents:

Prerequisites
Create tools
Create prompt template
Create LLM
- Download model
- Select inference device for LLM
Create agent
Run the agent
Interactive demo

Prerequisites#

import os 

os.environ["GIT_CLONE_PROTECTION_ACTIVE"] = "false" 

%pip install -Uq pip 
%pip uninstall -q -y optimum optimum-intel 
%pip install --pre -Uq openvino openvino-tokenizers[transformers] --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly 
%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu \
"git+https://github.com/huggingface/optimum-intel.git" \
"git+https://github.com/openvinotoolkit/nncf.git" \
"torch>=2.1" \
"datasets" \
"accelerate" \
"gradio>=4.19" \
"transformers>=4.38.1" "langchain>=0.2.3" "langchain-community>=0.2.4" "wikipedia"

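Create tools#

First, we need to create some tools to call. In this example, we create three custom functions that perform basic calculations. See the LangChain documentation for more information on creating custom tools.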

from langchain_core.tools import tool 

@tool 
def multiply(first_int: int, second_int: int) -> int: 
    """Multiply two integers together.""" 
    return first_int * second_int 

@tool 
def add(first_int: int, second_int: int) -> int:
    "Add two integers." 
    return first_int + second_int 

@tool 
def exponentiate(base: int, exponent: int) -> int:
    "Exponentiate the base to the exponent power." 
    return base**exponent

print(f"name of `multiply` tool: {multiply.name}") 
print(f"description of `multiply` tool: {multiply.description}")

name of multiply tool: multiply 
description of multiply tool: Multiply two integers together.

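Tools are interfaces that an agent, chain, or LLM can use to interact with the world. They combine a few things:

The name of the tool
A description of what the tool does
A JSON schema of the inputs to the tool
The function to call
Whether the result of the tool should be returned directly to the user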
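The five components listed above can be made concrete with a small plain-Python record. This is a sketch under assumptions, not the LangChain `@tool` implementation: `SimpleTool` and `make_tool` are hypothetical names, and the schema is derived from type annotations in a deliberately simplified way.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Simplified mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


@dataclass
class SimpleTool:
    name: str                      # the name of the tool
    description: str               # what the tool does
    args_schema: Dict[str, Any]    # JSON schema of the inputs
    func: Callable[..., Any]       # the function to call
    return_direct: bool = False    # return the result straight to the user?


def make_tool(func: Callable[..., Any], return_direct: bool = False) -> SimpleTool:
    """Build a tool record from a function's name, docstring, and signature."""
    properties = {
        pname: {"type": _JSON_TYPES.get(param.annotation, "string")}
        for pname, param in inspect.signature(func).parameters.items()
    }
    schema = {"type": "object", "properties": properties, "required": list(properties)}
    return SimpleTool(func.__name__, inspect.getdoc(func) or "", schema, func, return_direct)


def multiply(first_int: int, second_int: int) -> int:
    """Multiply two integers together."""
    return first_int * second_int


tool_record = make_tool(multiply)
print(tool_record.name, tool_record.args_schema)
```

The name, description, and schema are what the LLM sees in its prompt; the function is only ever called by the runtime.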

Now that everything is created, we can build a list of tools to use downstream.

tools = [multiply, add, exponentiate]

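Create prompt template#

A prompt for a language model is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent language-based output, such as answering questions, completing sentences, or engaging in a conversation.

Different agents use different prompting styles for reasoning. In this example, we use a ReAct agent with a typical prompt template. For a full list of built-in agents, see the agent types documentation.

A ReAct prompt consists of few-shot task-solving trajectories, with human-written text reasoning traces and actions, as well as environment observations in response to those actions. ReAct prompting is intuitive and flexible to design, and achieves state-of-the-art few-shot performance across a variety of tasks, from question answering to online shopping.

In the agent's prompt template, input is the user's query, and agent_scratchpad should be a sequence of messages containing the previous agent tool invocations and the corresponding tool outputs.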

PREFIX = """[INST]Respond to the human as helpfully and accurately as possible. You have access to the following tools:""" 

FORMAT_INSTRUCTIONS = """Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input). 

Valid "action" values: "Final Answer" or {tool_names} 

Provide only ONE action per $JSON_BLOB, as shown:

```
{{{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}}}
```

Follow this format: 

Question: input question to answer 
Thought: consider previous and subsequent steps 
Action: 
``` 
$JSON_BLOB 
``` 
Observation: action result 
...(repeat Thought/Action/Observation N times) 
Thought: I know what to respond 
Action:
```
{{{{ 
  "action": "Final Answer", 
  "action_input": "Final response to human" 
}}}} 
```[/INST]"""
 
SUFFIX = """Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:. Thought:[INST]""" 

HUMAN_MESSAGE_TEMPLATE = "{input}\n\n{agent_scratchpad}"
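To make the role of `agent_scratchpad` concrete, the following sketch fills the human message template with one earlier step. It is a simplified illustration, not LangChain's internal formatting: `build_scratchpad` is a hypothetical helper, and the exact layout of the scratchpad text is an assumption.

```python
from typing import List, Tuple

HUMAN_MESSAGE_TEMPLATE = "{input}\n\n{agent_scratchpad}"


def build_scratchpad(steps: List[Tuple[str, str]]) -> str:
    """Join (agent_log, observation) pairs into one text block for the next prompt."""
    text = ""
    for agent_log, observation in steps:
        # Each earlier step contributes the model's own text plus the tool result,
        # then cues the model to continue with a new thought.
        text += f"{agent_log}\nObservation: {observation}\nThought:"
    return text


steps = [
    (
        'Thought: compute 3 to the fifth power\n'
        'Action:\n{"action": "exponentiate", "action_input": {"base": 3, "exponent": 5}}',
        "243",
    ),
]

human_message = HUMAN_MESSAGE_TEMPLATE.format(
    input="Take 3 to the fifth power",
    agent_scratchpad=build_scratchpad(steps),
)
print(human_message)
```

On the first call the scratchpad is empty, so the model sees only the question; on each later call it also sees every action it has taken and what came back.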

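Create LLM#

Large language models (LLMs) are a core component of LangChain. LangChain does not serve its own LLMs; instead, it provides a standard interface for interacting with many different LLMs. In this example, we select Mistral-7B-Instruct-v0.3 as the LLM in the agent pipeline.

Mistral-7B-Instruct-v0.3: The Mistral-7B-Instruct-v0.3 large language model (LLM) is a fine-tuned version of Mistral-7B-v0.3. For more details about the model, see the model card, the paper, and the release blog post.

Note: Running the model in this demo requires accepting its license agreement. You must be a registered user of the Hugging Face Hub. Visit the Hugging Face model card, read the terms of use carefully, and click the agree button. You will need an access token to run the code below. For more information on access tokens, refer to the corresponding section of the Hugging Face Hub documentation. You can log in to the Hugging Face Hub from the notebook environment with the following code: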

# Log in to the Hugging Face Hub to get access to the pretrained model

from huggingface_hub import notebook_login, whoami 

try: 
    whoami() 
    print('Authorization token already provided') 
except OSError: 
    notebook_login()

Download model#

To run the LLM locally, the first step is to download the model. The model can be exported to the OpenVINO IR format from the CLI and then loaded from a local folder.

from pathlib import Path 

model_id = "mistralai/Mistral-7B-Instruct-v0.3" 
model_path = "Mistral-7B-Instruct-v0.3-ov-int4" 

if not Path(model_path).exists():
    !optimum-cli export openvino --model {model_id} --task text-generation-with-past --trust-remote-code --weight-format int4 {model_path}
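The `!` shell escape above works only inside a notebook. As a hedged sketch for plain Python scripts, the same command line can be assembled as an argument list. `build_export_command` is a hypothetical helper; the flags are exactly the ones used in the cell above, and the command is printed here rather than executed.

```python
import shlex
from pathlib import Path
from typing import List


def build_export_command(model_id: str, output_dir: str, weight_format: str = "int4") -> List[str]:
    """Assemble the optimum-cli export command used in this notebook as an argument list."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--task", "text-generation-with-past",
        "--trust-remote-code",
        "--weight-format", weight_format,
        output_dir,
    ]


model_id = "mistralai/Mistral-7B-Instruct-v0.3"
model_path = "Mistral-7B-Instruct-v0.3-ov-int4"
command = build_export_command(model_id, model_path)

if not Path(model_path).exists():
    # Only printed here; to run the export, pass the list to
    # subprocess.run(command, check=True) in an environment with optimum-intel installed.
    print(shlex.join(command))
```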

Select inference device for LLM#

import openvino as ov 
import ipywidgets as widgets 

core = ov.Core() 

support_devices = core.available_devices 
if "NPU" in support_devices: 
    support_devices.remove("NPU") 

device = widgets.Dropdown( 
    options=support_devices + ["AUTO"], 
    value="CPU", 
    description="Device:", 
    disabled=False, 
) 

device

Dropdown(description='Device:', options=('CPU', 'GPU', 'AUTO'), value='CPU')

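OpenVINO models can be run locally through LangChain's HuggingFacePipeline class. To deploy a model with OpenVINO, specify the backend="openvino" parameter to trigger OpenVINO as the backend inference framework. For details, see the LangChain documentation on the OpenVINO integration.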

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline 
from transformers.generation.stopping_criteria import StoppingCriteriaList, StoppingCriteria 

class StopSequenceCriteria(StoppingCriteria): 
    """ 
    This class can be used to stop generation whenever a sequence of tokens is encountered.

    Args: 
        stop_sequences (`str` or `List[str]`):
             The sequence (or list of sequences) on which to stop execution. 
        tokenizer: 
             The tokenizer used to decode the model outputs.
     """ 

    def __init__(self, stop_sequences, tokenizer): 
        if isinstance(stop_sequences, str): 
            stop_sequences = [stop_sequences] 
        self.stop_sequences = stop_sequences 
        self.tokenizer = tokenizer 

    def __call__(self, input_ids, scores, **kwargs) -> bool: 
        decoded_output = self.tokenizer.decode(input_ids.tolist()[0]) 
        return any(decoded_output.endswith(stop_sequence) for stop_sequence in self.stop_sequences) 

ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""} 
stop_tokens = ["Observation:"] 

ov_llm = HuggingFacePipeline.from_model_id( 
    model_id=model_path, 
    task="text-generation", 
    backend="openvino", 
    model_kwargs={ 
        "device": device.value, 
        "ov_config": ov_config, 
        "trust_remote_code": True, 
    }, 
    pipeline_kwargs={"max_new_tokens": 2048}, 
) 
ov_llm = ov_llm.bind(skip_prompt=True, stop=["Observation:"]) 

tokenizer = ov_llm.pipeline.tokenizer 
ov_llm.pipeline._forward_params["stopping_criteria"] = StoppingCriteriaList([StopSequenceCriteria(stop_tokens, tokenizer)])
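The stopping criterion above ends generation as soon as the decoded text ends with "Observation:", so that the tool result is supplied by the runtime and not invented by the model. The same check can be sketched without a tokenizer or model. `generate_until_stop` and the chunk list are hypothetical stand-ins for token-by-token decoding.

```python
from typing import Iterable, List, Tuple


def generate_until_stop(chunks: Iterable[str], stop_sequences: List[str]) -> Tuple[str, bool]:
    """Accumulate generated text and stop once it ends with any stop sequence."""
    text = ""
    for chunk in chunks:
        text += chunk
        # Same test as StopSequenceCriteria.__call__: compare the end of the decoded output.
        if any(text.endswith(stop) for stop in stop_sequences):
            return text, True
    return text, False


# Hypothetical model output, split into pieces as if it were streamed.
fake_chunks = [
    "Thought: add the numbers\n",
    "Action: ...\n",
    "Observation:",
    " 999 (made up by the model)",
]

text, stopped = generate_until_stop(fake_chunks, ["Observation:"])
print(repr(text), stopped)
```

Note that the check is on the end of the text, so a stop sequence is detected only at the step where it has just been completed.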

You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
The argument trust_remote_code is to be used along with export=True. It will be ignored.
Compiling the model to GPU ...

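Dynamic quantization of activations and KV-cache quantization on CPU can further improve inference speed. These options can be enabled through ov_config as follows. To take effect, this ov_config must be passed in model_kwargs when the pipeline is created: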

ov_config = { 
    "KV_CACHE_PRECISION": "u8", 
    "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32", 
    "PERFORMANCE_HINT": "LATENCY", 
    "NUM_STREAMS": "1", 
    "CACHE_DIR": "", 
}

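Create agent#

Now that the tools, prompt template, and LLM are defined, we can create the agent_executor.

The agent executor is the runtime for an agent. It is what actually calls the agent, executes the actions it chooses, passes the action outputs back to the agent, and repeats.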

from langchain.agents import AgentExecutor, StructuredChatAgent 

agent = StructuredChatAgent.from_llm_and_tools( 
    ov_llm, 
    tools, 
    prefix=PREFIX, 
    suffix=SUFFIX, 
    human_message_template=HUMAN_MESSAGE_TEMPLATE, 
    format_instructions=FORMAT_INSTRUCTIONS, 
) 
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
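One step of the executor cycle described above can be sketched as follows: parse the JSON blob the model emitted, then either finish or call the named tool. This is a simplified illustration, not the `AgentExecutor` source. `parse_action` and `run_step` are hypothetical helpers, and the parser assumes the blob sits between the first pair of triple-backtick fences.

```python
import json
from typing import Any, Callable, Dict, Tuple

FENCE = "`" * 3  # the triple-backtick marker that surrounds the JSON blob


def parse_action(llm_text: str) -> Tuple[str, Any]:
    """Extract (action, action_input) from the first fenced JSON blob in the model output."""
    parts = llm_text.split(FENCE)
    if len(parts) < 3:
        raise ValueError("no fenced JSON blob found")
    blob = json.loads(parts[1])
    return blob["action"], blob["action_input"]


def run_step(llm_text: str, tools: Dict[str, Callable[..., Any]]) -> Tuple[bool, Any]:
    """Return (done, value): the final answer, or the observation from the chosen tool."""
    action, action_input = parse_action(llm_text)
    if action == "Final Answer":
        return True, action_input
    if isinstance(action_input, dict):
        return False, tools[action](**action_input)   # keyword arguments by name
    return False, tools[action](action_input)         # single positional argument


tools = {"exponentiate": lambda base, exponent: base**exponent}
llm_text = (
    "Thought: raise 3 to the fifth power\nAction:\n"
    + FENCE
    + '\n{"action": "exponentiate", "action_input": {"base": 3, "exponent": 5}}\n'
    + FENCE
)

done, value = run_step(llm_text, tools)
print(done, value)
```

The executor repeats this step, appending each observation to the scratchpad, until the model emits the "Final Answer" action.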

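Run the agent#

We can now run the agent with a math query. Before producing the final answer, the agent executor also generates intermediate steps of reasoning and actions. The format of these messages follows the prompt template.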

agent_executor.invoke({"input": "Take 3 to the fifth power and multiply that by the sum of twelve and three, then square the whole result"})

> Entering new AgentExecutor chain...
Thought: I can use the exponentiate and add tools to solve the first part, and then use the multiply tool for the second part, and finally the exponentiate tool again to square the result.

Action: 
` 
{ 
  "action": "exponentiate", 
  "action_input": {"base": 3, "exponent": 5} 
} 
` 

Observation: Observation: 243 
Thought: Now I need to add twelve and three 

Action: 
` 
{ 
  "action": "add", 
  "action_input": {"first_int": 12, "second_int": 3} 
} 
` 

Observation: Observation: 15 
Thought: Now I need to multiply the result by 243 

Action: 
` 
{ 
  "action": "multiply", 
  "action_input": {"first_int": 243, "second_int": 15} 
} 
`

Observation: Observation: 3645 
Thought: Finally, I need to square the result 

Action: 
` 
{ 
  "action": "exponentiate", 
  "action_input": {"base": 3645, "exponent": 2} 
} 
` 

Observation: Observation: 13286025 
Thought: I know what to respond 

Action: 
` 
{ 
  "action": "Final Answer", 
  "action_input": "The final answer is 13286025" 
} 
` 

> Finished chain.

{'input': 'Take 3 to the fifth power and multiply that by the sum of twelve and three, then square the whole result', 'output': 'The final answer is 13286025'}
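The four tool calls in the trace above can be checked by hand with ordinary Python arithmetic. The variable names are only labels for the steps in the trace.

```python
step1 = 3 ** 5          # exponentiate(base=3, exponent=5)
step2 = 12 + 3          # add(first_int=12, second_int=3)
step3 = step1 * step2   # multiply(first_int=243, second_int=15)
step4 = step3 ** 2      # exponentiate(base=3645, exponent=2)

print(step1, step2, step3, step4)  # 243 15 3645 13286025
```

Each intermediate value matches the corresponding observation, so the agent's final answer of 13286025 is correct.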

Interactive demo#

Let's create an interactive agent using Gradio.

Use built-in tools#

LangChain provides a list of all built-in tools. In this example, we use the Wikipedia Python package to query keywords generated by the agent.

from langchain_community.tools import WikipediaQueryRun 
from langchain_community.utilities import WikipediaAPIWrapper 
from langchain_core.pydantic_v1 import BaseModel, Field 
from langchain_core.callbacks import CallbackManagerForToolRun 
from typing import Optional 

class WikipediaQueryRunWrapper(WikipediaQueryRun): 
    def _run( 
        self, 
        text: str, 
        run_manager: Optional[CallbackManagerForToolRun] = None, 
    ) -> str: 
        """Use the Wikipedia tool.""" 
        return self.api_wrapper.run(text) 

api_wrapper = WikipediaAPIWrapper(top_k_results=2, doc_content_chars_max=1000) 

class WikiInputs(BaseModel): 
    """inputs to the wikipedia tool.""" 

    text: str = Field(description="query to look up on wikipedia.") 

wikipedia = WikipediaQueryRunWrapper( 
    description="A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.", 
    args_schema=WikiInputs, 
    api_wrapper=api_wrapper, 
)

wikipedia.invoke({"text": "OpenVINO"})

'Page: OpenVINO\nSummary: OpenVINO is an open-source software toolkit for optimizing and deploying deep learning models. It enables programmers to develop scalable and efficient AI solutions with relatively few lines of code. It supports several popular model formats and categories, such as large language models, computer vision, and generative AI.\nActively developed by Intel, it prioritizes high-performance inference on Intel hardware but also supports ARM/ARM64 processors and encourages contributors to add new devices to the portfolio.\nBased in C++, it offers the following APIs: C/C++, Python, and Node.js (an early preview).\nOpenVINO is cross-platform and free for use under Apache License 2.0.\n\nPage: Stable Diffusion\nSummary: Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. It is considered to be a part of the ongoing artificial intelligence boom.\nIt is primarily used to generate detailed images conditioned on text descriptions, t'

Create custom tools#

In this example, we create two custom tools: one for image generation and one for weather queries.

import urllib.parse 
import json5 

@tool 
def painting(prompt: str) -> str: 
    """ 
    AI painting (image generation) service, input text description, and return the image URL drawn based on text information.
    """ 
    prompt = urllib.parse.quote(prompt) 
    return json5.dumps({"image_url": f"https://image.pollinations.ai/prompt/{prompt}"}, ensure_ascii=False) 

painting.invoke({"prompt": "a cat"})

'{image_url: "https://image.pollinations.ai/prompt/a%20cat"}'
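The painting tool does no image generation itself; it only percent-encodes the prompt into a URL. The sketch below reproduces that step with the standard library alone. `painting_url` is a hypothetical helper, and `json.dumps` is swapped in for `json5.dumps` to show the strict-JSON form of the same payload.

```python
import json
import urllib.parse


def painting_url(prompt: str) -> str:
    """Percent-encode the prompt and append it to the image service URL used above."""
    return "https://image.pollinations.ai/prompt/" + urllib.parse.quote(prompt)


url = painting_url("Big Ben, sunny day")
payload = json.dumps({"image_url": url}, ensure_ascii=False)

print(url)
print(payload)
```

The output shown above has an unquoted key (`{image_url: ...}`), which is valid JSON5 but not strict JSON. If a downstream consumer parses the tool result with a strict JSON parser, the `json.dumps` form is the safer choice.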

@tool 
def weather( 
    city_name: str, 
) -> str: 
    """ 
    Get the current weather for `city_name` 
    """ 

    if not isinstance(city_name, str): 
        raise TypeError("City name must be a string") 

    key_selection = { 
        "current_condition": [ 
            "temp_C", 
            "FeelsLikeC", 
            "humidity", 
            "weatherDesc", 
            "observation_time", 
        ], 
    } 
    import requests 

    resp = requests.get(f"https://wttr.in/{city_name}?format=j1") 
    resp.raise_for_status() 
    resp = resp.json() 
    ret = {k: {_v: resp[k][0][_v] for _v in v} for k, v in key_selection.items()} 

    return str(ret) 

weather.invoke({"city_name": "London"})

"{'current_condition': {'temp_C': '9', 'FeelsLikeC': '8', 'humidity': '93', 'weatherDesc': [{'value': 'Sunny'}], 'observation_time': '04:39 AM'}}"

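The nested dictionary comprehension in the weather tool keeps only a handful of fields from the service response. The sketch below applies the same selection to a local dictionary so that it runs without network access. `select_fields` is a hypothetical helper, and `sample_response` is a made-up, trimmed payload that only imitates the shape the tool expects.

```python
from typing import Any, Dict, List

KEY_SELECTION: Dict[str, List[str]] = {
    "current_condition": ["temp_C", "FeelsLikeC", "humidity", "weatherDesc", "observation_time"],
}


def select_fields(resp: Dict[str, Any], key_selection: Dict[str, List[str]]) -> Dict[str, Dict[str, Any]]:
    """For each section, take its first entry and keep only the listed fields."""
    return {
        section: {field: resp[section][0][field] for field in fields}
        for section, fields in key_selection.items()
    }


# Hypothetical payload: extra keys and sections are present and should be dropped.
sample_response = {
    "current_condition": [
        {
            "temp_C": "9",
            "FeelsLikeC": "8",
            "humidity": "93",
            "weatherDesc": [{"value": "Sunny"}],
            "observation_time": "04:39 AM",
            "windspeedKmph": "11",
        }
    ],
    "nearest_area": [{"country": [{"value": "United Kingdom"}]}],
}

selected = select_fields(sample_response, KEY_SELECTION)
print(selected)
```

Trimming the response this way keeps the observation short, which matters because every observation is fed back into the LLM prompt.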
Create an AI agent demo with the Gradio UI#

tools = [wikipedia, painting, weather] 

agent = StructuredChatAgent.from_llm_and_tools( 
    ov_llm, 
    tools, 
    prefix=PREFIX, 
    suffix=SUFFIX, 
    human_message_template=HUMAN_MESSAGE_TEMPLATE, 
    format_instructions=FORMAT_INSTRUCTIONS, 
) 
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

import gradio as gr 

examples = [ 
    ["Based on current weather in London, show me a picture of Big Ben through its URL"], 
    ["What is OpenVINO ?"], 
    ["Create an image of pink cat and return its URL"], 
    ["How many people live in Canada ?"], 
    ["What is the weather like in New York now ?"], 
] 

def partial_text_processor(partial_text, new_text): 
    """ 
    helper for updating partially generated answer, used by default 

    Params: 
        partial_text: text buffer for storing previously generated text
        new_text: text update for the current step 
    Returns: 
        updated text string 

    """ 
    partial_text += new_text 
    return partial_text 

def user(message, history): 
    """ 
    callback function for updating user messages in interface on submit button click 

    Params: 
        message: current message 
        history: conversation history 
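    Returns:
        an empty string to clear the message box, and the updated conversation history
    """
    # Append the user's message to the conversation history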
    return "", history + [[message, ""]] 

def bot(history): 
    """ 
    callback function for running chatbot on submit button click 

    Params: 
        history: conversation history 

    """ 
    partial_text = "" 

    for new_text in agent_executor.stream( 
        {"input": history[-1][0]}, 
    ): 
        if "output" in new_text.keys(): 
            partial_text = partial_text_processor(partial_text, new_text["output"]) 
            history[-1][1] = partial_text 
            yield history 

def request_cancel(): 
    ov_llm.pipeline.model.request.cancel() 

with gr.Blocks( 
    theme=gr.themes.Soft(), 
    css=".disclaimer {font-variant-caps: all-small-caps;}", 
) as demo: 
    names = [tool.name for tool in tools] 
    gr.Markdown(f"""<h1><center>OpenVINO Agent for {str(names)}</center></h1>""") 
    chatbot = gr.Chatbot(height=500) 
    with gr.Row(): 
        with gr.Column(): 
            msg = gr.Textbox( 
                label="Chat Message Box", 
                placeholder="Chat Message Box", 
                show_label=False, 
                container=False, 
            ) 
        with gr.Column():
            with gr.Row(): 
                submit = gr.Button("Submit") 
                stop = gr.Button("Stop") 
                clear = gr.Button("Clear") 
    gr.Examples(examples, inputs=msg, label="Click on any example and press the 'Submit' button") 

    # Event listeners must be registered inside the gr.Blocks context
    submit_event = msg.submit(
        fn=user,
        inputs=[msg, chatbot],
        outputs=[msg, chatbot],
        queue=False,
    ).then(
        fn=bot,
        inputs=[
            chatbot,
        ],
        outputs=chatbot,
        queue=True,
    )
    submit_click_event = submit.click(
        fn=user,
        inputs=[msg, chatbot],
        outputs=[msg, chatbot],
        queue=False,
    ).then(
        fn=bot,
        inputs=[
            chatbot,
        ],
        outputs=chatbot,
        queue=True,
    )
    stop.click(
        fn=request_cancel,
        inputs=None,
        outputs=None,
        cancels=[submit_event, submit_click_event],
        queue=False,
    )
    clear.click(lambda: None, None, chatbot, queue=False)

# If you are launching remotely, specify server_name and server_port
# demo.launch(server_name='your server name', server_port='server port in int')
# If you have any issue launching on your platform, you can pass share=True to the launch method:
# demo.launch(share=True)
# This creates a publicly shareable link for the interface. Read more in the docs: https://gradio.app/docs/
demo.launch()
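The `bot` callback above is a generator: it walks the agent's stream, ignores intermediate chunks, and yields the updated history whenever a chunk carries an "output" key. The pattern can be exercised without Gradio or a model. `fake_agent_stream` is a hypothetical stand-in for `agent_executor.stream`, and the chunk keys other than "output" are assumptions for illustration.

```python
from typing import Dict, Iterator, List


def fake_agent_stream(question: str) -> Iterator[Dict[str, str]]:
    """Hypothetical stream: intermediate chunks first, then the final output."""
    yield {"actions": "weather(London)"}
    yield {"steps": "Observation: Sunny"}
    yield {"output": "It is sunny in London."}


def bot(history: List[List[str]]) -> Iterator[List[List[str]]]:
    """Fill in the last assistant slot of the history as output chunks arrive."""
    partial_text = ""
    for chunk in fake_agent_stream(history[-1][0]):
        if "output" in chunk:                 # skip intermediate action/step chunks
            partial_text += chunk["output"]
            history[-1][1] = partial_text
            yield history                     # each yield refreshes the chat window


history = [["What is the weather like in London?", ""]]
updates = list(bot(history))
print(history)
```

Because only "output" chunks are yielded, the chat window shows the final answer but not the intermediate reasoning, which is printed to the notebook log by `verbose=True` instead.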

> Entering new AgentExecutor chain...
Thought: I need to use the weather tool to get the current weather in London, then use the painting tool to generate a picture of Big Ben based on the weather information.

Action: 
` 
{ 
  "action": "weather", 
  "action_input": "London" 
} 
`
 
Observation: Observation: {'current_condition': {'temp_C': '9', 'FeelsLikeC': '8', 'humidity': '93', 'weatherDesc': [{'value': 'Sunny'}], 'observation_time': '04:39 AM'}} 
Thought: I have the current weather in London. Now I can use the painting tool to generate a picture of Big Ben based on the weather information. 

Action: 
` 
{ 
  "action": "painting", 
  "action_input": "Big Ben, sunny day" 
} 
`
 
Observation: Observation: {image_url: "https://image.pollinations.ai/prompt/Big%20Ben%2C%20sunny%20day"} 
Thought: I have the image URL of Big Ben on a sunny day. Now I can respond to the human with the image URL. 

Action: 
` 
{ 
  "action": "Final Answer", 
  "action_input": "Here is the image of Big Ben on a sunny day: 
https://image.pollinations.ai/prompt/Big%20Ben%2C%20sunny%20day" 
} 
` 
Observation: > Finished chain.

# Run this cell to stop the Gradio interface
demo.close()