Sentiment Analysis with OpenVINO™

This Jupyter notebook can be launched online, which opens an interactive environment in a browser window. You can also install it locally. Choose one of the following options:

Binder Google Colab GitHub

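Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. This notebook demonstrates how to convert and run a sequence classification model using OpenVINO.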

Imports

%pip install "openvino>=2023.1.0" transformers --extra-index-url https://download.pytorch.org/whl/cpu
import warnings
from pathlib import Path
import time
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import numpy as np
import openvino as ov

Initializing the Model

We will use the transformer-based DistilBERT base uncased finetuned SST-2 model from Hugging Face.

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path=checkpoint
)

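Initializing the Tokenizer

Text preprocessing cleans the text-based input data so that it can be fed into the model. Tokenization splits paragraphs and sentences into smaller units that can more easily be assigned meaning. It involves cleaning the data and assigning tokens or IDs to the words, so that they are represented in a vector space where similar words have similar vectors. This helps the model understand the context of a sentence. Here, we will use AutoTokenizer, a pre-trained tokenizer from Hugging Face: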

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=checkpoint
)
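
To make the idea of assigning IDs to tokens concrete, here is a minimal pure-Python sketch. It is not how AutoTokenizer works internally (DistilBERT uses WordPiece subword tokenization). The vocabulary `TOY_VOCAB`, the helper `toy_encode`, and all ID values are hypothetical and chosen only for illustration.

```python
# Toy illustration only: a whitespace tokenizer with a hypothetical vocabulary.
# The real DistilBERT tokenizer uses WordPiece subwords and a much larger vocabulary.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "i": 4, "had": 5, "a": 6, "wonderful": 7, "day": 8}


def toy_encode(text, max_length=8):
    """Lowercase, split on whitespace, map tokens to IDs, then pad."""
    tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
    tokens = tokens[:max_length]  # naive truncation (may drop the final [SEP])
    ids = [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in tokens]
    mask = [1] * len(ids)  # 1 marks a real token, 0 marks padding
    padding = max_length - len(ids)
    return {
        "input_ids": ids + [TOY_VOCAB["[PAD]"]] * padding,
        "attention_mask": mask + [0] * padding,
    }


encoded = toy_encode("I had a wonderful day")
print(encoded["input_ids"])       # [2, 4, 5, 6, 7, 8, 3, 0]
print(encoded["attention_mask"])  # [1, 1, 1, 1, 1, 1, 1, 0]
```

The two keys mirror the `input_ids` and `attention_mask` inputs that the converted model expects later in this notebook. Words missing from the toy vocabulary map to `[UNK]`.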

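Convert Model to OpenVINO Intermediate Representation Format

The model conversion API facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.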

import torch

ir_xml_name = checkpoint + ".xml"
MODEL_DIR = "model/"
ir_xml_path = Path(MODEL_DIR) / ir_xml_name

MAX_SEQ_LENGTH = 128
input_info = [(ov.PartialShape([1, -1]), ov.Type.i64), (ov.PartialShape([1, -1]), ov.Type.i64)]
default_input = torch.ones(1, MAX_SEQ_LENGTH, dtype=torch.int64)
inputs = {
    "input_ids": default_input,
    "attention_mask": default_input,
}

ov_model = ov.convert_model(model, input=input_info, example_input=inputs)
ov.save_model(ov_model, ir_xml_path)
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py:246: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask, torch.tensor(torch.finfo(scores.dtype).min)

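OpenVINO™ Runtime uses the infer request mechanism, which enables running models on different devices in an asynchronous or synchronous manner. The model graph is passed as an argument to the OpenVINO API and an inference request is created. The default inference mode is AUTO, but it can be changed according to your requirements and the available hardware. You can explore the different inference modes and their usage in the OpenVINO documentation.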

core = ov.Core()

Select Inference Device

Select a device from the dropdown list for running inference using OpenVINO.

import ipywidgets as widgets

device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='AUTO',
    description='Device:',
    disabled=False,
)

device
Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')
warnings.filterwarnings("ignore")
compiled_model = core.compile_model(ov_model, device.value)
infer_request = compiled_model.create_infer_request()
def softmax(x):
    """
    Defining a softmax function to extract
    the prediction from the output of the IR format
    Parameters: Logits array
    Returns: Probabilities
    """

    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
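
Subtracting `np.max(x)` in `softmax` does not change the result, but it guards against overflow for large logits. The sketch below illustrates this with the standard library only. `naive_softmax` and `stable_softmax` are hypothetical names introduced here and are not part of the notebook.

```python
import math


def naive_softmax(logits):
    """Direct definition: exp(x_i) / sum_j exp(x_j). Overflows for large x."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(logits):
    """Same function, shifted by max(logits) so every exponent is <= 0."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


small = [-2.0, 3.0]
# For moderate logits the two versions agree.
assert all(
    math.isclose(a, b)
    for a, b in zip(naive_softmax(small), stable_softmax(small))
)

large = [1000.0, 1001.0]
try:
    naive_softmax(large)
    overflowed = False
except OverflowError:  # math.exp(1000.0) exceeds the float range
    overflowed = True

probs = stable_softmax(large)
print(overflowed)            # True
print(round(sum(probs), 6))  # 1.0
```

With NumPy, the unshifted version would typically produce `inf` and `nan` values instead of raising an exception, which is why the notebook's `softmax` applies the same shift.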

Inference

def infer(input_text):
    """
    Creating a generic inference function
    to read the input and infer the result
    into 2 classes: Positive or Negative.
    Parameters: Text to be processed
    Returns: Label: Positive or Negative.
    """

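    # Tokenize the text into NumPy arrays (input_ids and attention_mask).
    encoded = tokenizer(
        input_text,
        truncation=True,
        return_tensors="np",
    )
    inputs = dict(encoded)
    label = {0: "NEGATIVE", 1: "POSITIVE"}
    result = infer_request.infer(inputs=inputs)
    # The converted model is expected to expose one output (the logits);
    # pick the index of the most probable class.
    for logits in result.values():
        class_id = np.argmax(softmax(logits))
    return label[class_id]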

For a Single Input Sentence

input_text = "I had a wonderful day"
start_time = time.perf_counter()
result = infer(input_text)
end_time = time.perf_counter()
total_time = end_time - start_time
print("Label: ", result)
print("Total Time: ", "%.2f" % total_time, " seconds")
Label:  POSITIVE
Total Time:  0.02  seconds
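
A single `perf_counter` measurement includes one-off costs and can be noisy. As a sketch, a small helper could warm up first and then report the median over several runs. `time_call` is a hypothetical helper, not part of the notebook, and `fake_infer` is a stand-in so that the snippet runs on its own. In the notebook you would pass `infer` instead.

```python
import statistics
import time


def time_call(fn, *args, warmup=1, repeats=5):
    """Return (last_result, median_seconds) for fn(*args).

    Hypothetical helper: runs `warmup` untimed calls, then `repeats` timed ones.
    """
    for _ in range(warmup):
        fn(*args)
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)


def fake_infer(text):
    """Stand-in for the notebook's infer(); not a real classifier."""
    return "POSITIVE" if "wonderful" in text else "NEGATIVE"


label, median_s = time_call(fake_infer, "I had a wonderful day")
print("Label: ", label)
print("Median Time: ", "%.6f" % median_s, " seconds")
```

The median is used rather than the mean so that one slow run has less influence on the reported figure.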

Read from a Text File

# Fetch `notebook_utils` module
import urllib.request
urllib.request.urlretrieve(
    url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py',
    filename='notebook_utils.py'
)
from notebook_utils import download_file

# Download the text from the openvino_notebooks storage
vocab_file_path = download_file(
    "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/text/food_reviews.txt",
    directory="data"
)
start_time = time.perf_counter()
with vocab_file_path.open(mode='r') as f:
    input_text = f.readlines()
    for lines in input_text:
        print("User Input: ", lines)
        result = infer(lines)
        print("Label: ", result, "\n")
end_time = time.perf_counter()
total_time = end_time - start_time
print("Total Time: ", "%.2f" % total_time, " seconds")
User Input:  The food was horrible.

Label:  NEGATIVE

User Input:  We went because the restaurant had good reviews.
Label:  POSITIVE

Total Time:  0.03  seconds