Quantize Wav2Vec Speech Recognition Model using NNCF PTQ API

This Jupyter notebook can be launched online, opening an interactive environment in a browser window. You can also make a local installation.

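This tutorial demonstrates how to apply INT8 quantization to the speech recognition model known as Wav2Vec2, using NNCF (Neural Network Compression Framework) 8-bit quantization in post-training mode (without the fine-tuning pipeline). This notebook uses a Wav2Vec2-Base-960h PyTorch model trained and fine-tuned on the LibriSpeech ASR corpus. The tutorial is designed to be extendable to custom models and datasets. It consists of the following steps: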

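  • Download and prepare the Wav2Vec2 model and the LibriSpeech dataset.

  • Define data loading and accuracy validation functionality.

  • Quantize the model.

  • Compare the accuracy of the original PyTorch model and the OpenVINO FP16 and INT8 models.

  • Compare the performance of the original and quantized models.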


%pip install -q "openvino>=2023.3.0" "nncf>=2.7" 
%pip install datasets "torchmetrics>=0.11.0" "torch>=2.1.0" --extra-index-url https://download.pytorch.org/whl/cpu 
%pip install -q soundfile librosa "transformers>=4.36.2" --extra-index-url https://download.pytorch.org/whl/cpu

Imports

import numpy as np 
import openvino as ov 
import torch 
import IPython.display as ipd 

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

Settings

from pathlib import Path 

# Create the directory in which the converted models are stored
MODEL_DIR = Path("model") 
MODEL_DIR.mkdir(exist_ok=True)

Prepare the Model

Perform the following:

  • Download and unpack the pre-trained Wav2Vec2 model.

  • Run the model conversion API to convert the model from the PyTorch representation to the OpenVINO Intermediate Representation (OpenVINO IR).

torch_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h", ctc_loss_reduction="mean") 
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-727/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0.Downloads always resume when possible.If you want to force a new download, use force_download=True. 
  warnings.warn( 
Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v'] 
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). 
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). 
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed'] 
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
BATCH_SIZE = 1 
MAX_SEQ_LENGTH = 30480 
ov_model = ov.convert_model(torch_model, example_input=torch.zeros([1, MAX_SEQ_LENGTH], dtype=torch.float)) 

ir_model_path = MODEL_DIR / "wav2vec2_base.xml" 
ov.save_model(ov_model, ir_model_path)
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-727/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/modeling_utils.py:4371: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0.Please use model.hf_quantizer.is_trainable instead 
  warnings.warn( 
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-727/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py:588: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect.We can't record the data flow of Python values, so this value will be treated as a constant in the future.This means that the trace might not generalize to other inputs! 
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len): 
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-727/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py:627: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect.We can't record the data flow of Python values, so this value will be treated as a constant in the future.This means that the trace might not generalize to other inputs! 
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
['input_values']
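The benchmark output later in this notebook reports a logits shape of [1,95,32] for the MAX_SEQ_LENGTH of 30480 samples used for conversion. The sketch below shows where the 95 can come from, assuming the convolutional feature encoder uses the default kernel sizes and strides of the Hugging Face Wav2Vec2Config; the helper name feature_encoder_frames is hypothetical. The last dimension, 32, is the size of the character vocabulary.

```python
# Sketch: number of logit frames produced by Wav2Vec2's convolutional
# feature encoder for a given number of 16 kHz audio samples.
# ASSUMPTION: kernel sizes/strides are the Hugging Face Wav2Vec2Config
# defaults (conv_kernel, conv_stride); feature_encoder_frames is hypothetical.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def feature_encoder_frames(num_samples: int) -> int:
    """Apply the unpadded 1-D convolution length formula layer by layer."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


max_seq_length = 30480  # MAX_SEQ_LENGTH used for conversion and benchmarking
sampling_rate = 16000
print(f"{max_seq_length / sampling_rate:.3f} s of audio")  # 1.905 s of audio
print(feature_encoder_frames(max_seq_length), "logit frames")  # 95 logit frames
```

With a total stride of 5 x 2^6 = 320 samples, the model emits roughly one logit frame per 20 ms of 16 kHz audio under these assumptions.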

Prepare the LibriSpeech Dataset

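To speed up model evaluation in this demo, use a short dummy version of the LibriSpeech dataset (patrickvonplaten/librispeech_asr_dummy). As a result, the model accuracy may differ from the values reported in the paper. To reproduce the original accuracy, use the librispeech_asr dataset.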

from datasets import load_dataset 

dataset = load_dataset(
    "patrickvonplaten/librispeech_asr_dummy",
    "clean",
    split="validation",
    trust_remote_code=True,
)
test_sample = dataset[0]["audio"] 

# Define a preprocessing function that converts audio into model input values
def map_to_input(batch): 
    preprocessed_signal = processor( 
        batch["audio"]["array"], 
        return_tensors="pt", 
        padding="longest", 
        sampling_rate=batch["audio"]["sampling_rate"], 
    ) 
    input_values = preprocessed_signal.input_values 
    batch["input_values"] = input_values 
    return batch 

# Apply the preprocessing function to the dataset and drop the audio column, which is no longer needed, to save memory
dataset = dataset.map(map_to_input, batched=False, remove_columns=["audio"])
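The processor call in map_to_input does more than wrap the waveform in a tensor. Assuming normalization is enabled in this checkpoint's feature extractor configuration (do_normalize=True), each utterance is rescaled to zero mean and unit variance. The pure-Python sketch below mirrors that step; the helper name zero_mean_unit_var and the epsilon value are assumptions for illustration, not the transformers implementation.

```python
# Sketch of per-utterance zero-mean / unit-variance normalization.
# ASSUMPTION: the checkpoint's feature extractor has do_normalize=True;
# the helper name and eps value are illustrative, and the real processor
# operates on NumPy arrays rather than Python lists.
import math
from typing import List


def zero_mean_unit_var(signal: List[float], eps: float = 1e-7) -> List[float]:
    mean = sum(signal) / len(signal)
    variance = sum((x - mean) ** 2 for x in signal) / len(signal)  # population variance
    scale = math.sqrt(variance + eps)  # eps avoids division by zero on pure silence
    return [(x - mean) / scale for x in signal]


waveform = [0.1, -0.2, 0.3, 0.0]  # hypothetical 4-sample "waveform"
normalized = zero_mean_unit_var(waveform)
print([round(x, 3) for x in normalized])
```

Normalizing per utterance makes the model input independent of the recording's overall loudness and DC offset.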

Run Quantization

NNCF provides a suite of advanced algorithms for optimizing neural network inference in OpenVINO with minimal accuracy drop.

Create a quantized model from the pre-trained FP16 model and the calibration dataset. The optimization process contains the following steps:

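  1. Create a dataset for quantization.

  2. Run nncf.quantize to obtain an optimized model. The nncf.quantize function provides an interface for model quantization. It requires an instance of the OpenVINO model and a quantization dataset. Optionally, additional parameters for the quantization process (number of samples for quantization, preset, ignored scope, etc.) can be provided. For more accurate results, keep the operations in the post-processing subgraph in floating-point precision by using the ignored_scope parameter. For more details, see Tune quantization parameters. For this model, the ignored scope was selected experimentally, based on the results of quantization with accuracy control. To understand how it works, check the following notebook.

  3. Serialize the OpenVINO IR model with the ov.save_model function.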
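The arithmetic at the heart of 8-bit quantization can be illustrated in a few lines. The sketch below is a deliberately simplified, symmetric per-tensor scheme: it derives a scale from calibration data, then rounds and clamps values to the INT8 range. It is not how NNCF places quantizers or estimates ranges, and every name in it is hypothetical.

```python
# Simplified symmetric per-tensor INT8 quantization (illustration only).
# NOT NNCF's implementation: NNCF decides where to insert quantizers and
# estimates ranges with its own statistics. All names here are hypothetical.
from typing import List


def calibrate_scale(calibration_values: List[float], num_bits: int = 8) -> float:
    """Map the largest absolute calibration value to the top of the integer range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float, num_bits: int = 8) -> List[int]:
    """Round to the nearest integer step and clamp to [-128, 127] for 8 bits."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values: List[int], scale: float) -> List[float]:
    """Map integers back to approximate floating-point values."""
    return [q * scale for q in q_values]


calibration = [-1.27, 0.5, 0.9, 1.0]  # hypothetical activation values
scale = calibrate_scale(calibration)  # about 1.27 / 127 = 0.01
q = quantize([0.5, -2.0, 0.123], scale)
print(q)  # [50, -128, 12]: -2.0 lies outside the calibrated range and is clipped
print(dequantize(q, scale))
```

Clipping and rounding errors of this kind are one plausible reason why quantizing certain layers hurts accuracy more than others, which is what the ignored_scope parameter lets you work around.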

import nncf 
from nncf.parameters import ModelType 

def transform_fn(data_item): 
    """ 
    Extract the model's input from the data item. 
    The data item here is the data item that is returned from the data source per iteration. 
    This function should be passed when the data item cannot be used as model's input.
    """ 
    return np.array(data_item["input_values"]) 

calibration_dataset = nncf.Dataset(dataset, transform_fn) 

quantized_model = nncf.quantize( 
    ov_model, 
    calibration_dataset, 
    model_type=ModelType.TRANSFORMER,  # Specify additional transformer patterns for the model
    ignored_scope=nncf.IgnoredScope( 
        names=[ 
            "__module.wav2vec2.feature_extractor.conv_layers.1.conv/aten::_convolution/Convolution", 
            "__module.wav2vec2.feature_extractor.conv_layers.2.conv/aten::_convolution/Convolution", 
            "__module.wav2vec2.feature_extractor.conv_layers.3.conv/aten::_convolution/Convolution", 
            "__module.wav2vec2.feature_extractor.conv_layers.0.conv/aten::_convolution/Convolution",
        ], 
    ), 
)
INFO:nncf:NNCF initialized successfully.Supported frameworks detected: torch, tensorflow, onnx, openvino
2024-07-13 03:28:09.157115: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on.You may see slightly different numerical results due to floating-point round-off errors from different computation orders.
To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-07-13 03:28:09.189633: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-13 03:28:09.765303: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO:nncf:4 ignored nodes were found by names in the NNCFGraph 
INFO:nncf:36 ignored nodes were found by names in the NNCFGraph 
INFO:nncf:50 ignored nodes were found by names in the NNCFGraph 
INFO:nncf:Not adding activation input quantizer for operation: 2 
__module.wav2vec2.feature_extractor.conv_layers.0.conv/aten::_convolution/Convolution 
INFO:nncf:Not adding activation input quantizer for operation: 134 
__module.wav2vec2.feature_extractor.conv_layers.1.conv/aten::_convolution/Convolution 
INFO:nncf:Not adding activation input quantizer for operation: 160 
__module.wav2vec2.feature_extractor.conv_layers.2.conv/aten::_convolution/Convolution 
INFO:nncf:Not adding activation input quantizer for operation: 186 
__module.wav2vec2.feature_extractor.conv_layers.3.conv/aten::_convolution/Convolution
MODEL_NAME = "quantized_wav2vec2_base" 
quantized_model_path = Path(f"{MODEL_NAME}_openvino_model/{MODEL_NAME}_quantized.xml") 
ov.save_model(quantized_model, quantized_model_path)

Model Usage Example with Inference Pipeline

Both the initial (FP16) and the quantized (INT8) models are used in exactly the same way.

First, take one example from the dataset to demonstrate the inference steps.

ipd.Audio(test_sample["array"], rate=16000)

To run the model with OpenVINO, you need to select an inference device. Select one of the available devices from the drop-down list:

import ipywidgets as widgets 

core = ov.Core() 
device = widgets.Dropdown( 
    options=core.available_devices + ["AUTO"], 
    value="AUTO", 
    description="Device:", 
    disabled=False, 
) 

device
Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')

Next, load the quantized model into the inference pipeline.

compiled_model = core.compile_model(model=quantized_model, device_name=device.value) 

input_data = np.expand_dims(test_sample["array"], axis=0)

Then, run inference.

predictions = compiled_model(input_data)[0] 
predicted_ids = np.argmax(predictions, axis=-1) 
transcription = processor.batch_decode(torch.from_numpy(predicted_ids)) 
print(transcription)
['A MAN SAID TO THE UNIVERSE SIR I EXIST']

Validate Model Accuracy on the Dataset

The Word Error Rate metric can be used to evaluate model accuracy. Word Error Rate (WER) is the ratio of errors in a transcript to the total number of words spoken. In speech-to-text, a lower WER means more accurate speech recognition.

Use the torchmetrics library to calculate WER.
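To see what the metric counts, the sketch below computes WER directly as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. It sums errors and reference words over all utterances before dividing; that this matches how torchmetrics' WordErrorRate accumulates results across update() calls is an assumption. The function names are hypothetical.

```python
# Sketch of Word Error Rate as word-level Levenshtein distance divided by the
# number of reference words, aggregated over a corpus.
# ASSUMPTION: corpus-level aggregation mirrors torchmetrics' WordErrorRate.
from typing import List, Tuple


def word_edit_distance(hypothesis: str, reference: str) -> int:
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] = edit distance between the first i-1 hypothesis words and first j reference words
    prev = list(range(len(ref) + 1))
    for i in range(1, len(hyp) + 1):
        curr = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # extra hypothesis word (insertion)
                curr[j - 1] + 1,     # missing reference word (deletion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """pairs: (hypothesis, reference) transcripts."""
    errors = sum(word_edit_distance(h, r) for h, r in pairs)
    total_words = sum(len(r.split()) for _, r in pairs)
    return errors / total_words


pairs = [  # hypothetical transcripts
    ("A MAN SAID TO THE UNIVERSE SIR I EXIST", "A MAN SAID TO THE UNIVERSE SIR I EXIST"),
    ("THE CAT SAT", "THE CAT SAT DOWN"),  # one deleted word
]
print(f"{corpus_wer(pairs):.4f}")  # 1 error / 13 reference words = 0.0769
```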

from torchmetrics.text import WordErrorRate
from tqdm.notebook import tqdm

# Inference function for PyTorch
def torch_infer(model, sample):
    with torch.no_grad():  # gradients are not needed for evaluation
        logits = model(torch.Tensor(sample["input_values"])).logits
    # Take the argmax over the vocabulary and decode the token IDs
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)
    return transcription

# Inference function for OpenVINO
def ov_infer(model, sample):
    logits = model(np.array(sample["input_values"]))[0]
    predicted_ids = np.argmax(logits, axis=-1)
    transcription = processor.batch_decode(torch.from_numpy(predicted_ids))
    return transcription

def compute_wer(dataset, model, infer_fn):
    wer = WordErrorRate()
    for sample in tqdm(dataset):
        # Run the inference function on the sample
        transcription = infer_fn(model, sample)
        # Update the metric with the sample result
        wer.update(transcription, [sample["text"]])
    # Finalize the metric calculation
    result = wer.compute()
    return result

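The inference functions above take the argmax over the predicted logits and decode the resulting token IDs into text with the built-in Wav2Vec2Processor tokenizer from the transformers package (processor.batch_decode).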
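Greedy CTC decoding of this kind can be sketched in pure Python: take the most likely token for each frame, merge consecutive repeats, drop blank tokens, and turn the word delimiter into spaces. The toy vocabulary and helper names are hypothetical; that the real Wav2Vec2 tokenizer uses <pad> as the CTC blank and | as the word delimiter is stated here as an assumption.

```python
# Sketch of greedy CTC decoding, approximating argmax + processor.batch_decode.
# ASSUMPTIONS: toy vocabulary; "<pad>" acts as the CTC blank, "|" as word delimiter.
from itertools import groupby
from typing import List

TOY_VOCAB = ["<pad>", "|", "A", "H", "I", "S"]  # hypothetical id -> token table
BLANK_ID = 0
WORD_DELIMITER = "|"


def argmax(row: List[float]) -> int:
    return max(range(len(row)), key=row.__getitem__)


def greedy_ctc_decode(logits: List[List[float]]) -> str:
    """logits: one row of vocabulary scores per audio frame."""
    frame_ids = [argmax(row) for row in logits]                  # best token per frame
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]  # merge repeats
    tokens = [TOY_VOCAB[i] for i in collapsed if i != BLANK_ID]   # drop blanks
    text = "".join(tokens).replace(WORD_DELIMITER, " ")
    return " ".join(text.split())  # tidy leading/trailing/double spaces


def one_hot(token_id: int, size: int = len(TOY_VOCAB)) -> List[float]:
    return [1.0 if i == token_id else 0.0 for i in range(size)]


# Frames: H H <pad> I | | A <pad> S S
frames = [one_hot(i) for i in [3, 3, 0, 4, 1, 1, 2, 0, 5, 5]]
print(greedy_ctc_decode(frames))  # HI AS
```

Because repeats are merged before blanks are removed, a doubled letter such as SS can only be produced when a blank frame separates the two S frames.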

Now, compute the WER for the original PyTorch model, the OpenVINO IR model, and the quantized model.

compiled_fp32_ov_model = core.compile_model(ov_model, device.value) 

pt_result = compute_wer(dataset, torch_model, torch_infer) 
ov_result = compute_wer(dataset, compiled_fp32_ov_model, ov_infer) 
int8_ov_result = compute_wer(dataset, compiled_model, ov_infer) 
print(f"[PyTorch] Word Error Rate: {pt_result:.4f}") 
print(f"[OpenVINO FP16] Word Error Rate: {ov_result:.4f}")
print(f"[OpenVINO INT8] Word Error Rate: {int8_ov_result:.4f}")
[PyTorch] Word Error Rate: 0.0530 
[OpenVINO FP16] Word Error Rate: 0.0530 
[OpenVINO INT8] Word Error Rate: 0.0548

Compare Performance of the Original and Quantized Models

Finally, use the Benchmark Tool to measure the inference performance of the FP16 and INT8 models.

Note: For more accurate performance, it is recommended to run benchmark_app in a terminal/command prompt after closing other applications. Run benchmark_app -m model.xml -d CPU to benchmark async inference on CPU for one minute. Change CPU to GPU to benchmark on GPU. Run benchmark_app --help to see an overview of all command-line options.

# Benchmark the FP16 model (OpenVINO IR)
! benchmark_app -m $ir_model_path -shape [1,30480] -d $device.value -api async
[Step 1/11] Parsing and validating input arguments 
[ INFO ] Parsing input parameters 
[Step 2/11] Loading OpenVINO Runtime 
[ WARNING ] Default duration 120 seconds is used for unknown device AUTO 
[ INFO ] OpenVINO: 
[ INFO ] Build .................................2024.4.0-16028-fe423b97163 
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] AUTO 
[ INFO ] Build .................................2024.4.0-16028-fe423b97163 
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration 
[ WARNING ] Performance hint was not explicitly specified in command line.Device(AUTO) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files 
[ INFO ] Loading model files 
[ INFO ] Read model took 19.41 ms 
[ INFO ] Original model I/O parameters: 
[ INFO ] Model inputs: 
[ INFO ] 46 , input_values (node: input_values) : f32 / [...] / [?,?]
[ INFO ] Model outputs: 
[ INFO ]     logits (node: __module.lm_head/aten::linear/Add) : f32 / [...] / [?,?,32] 
[Step 5/11] Resizing model to match image sizes and given batch 
[ INFO ] Model batch size: 1 
[ INFO ] Reshaping model: '46': [1,30480] 
[ INFO ] Reshape model took 5.86 ms 
[Step 6/11] Configuring input of the model 
[ INFO ] Model inputs: 
[ INFO ] 46 , input_values (node: input_values) : f32 / [...] / [1,30480] 
[ INFO ] Model outputs: 
[ INFO ]     logits (node: __module.lm_head/aten::linear/Add) : f32 / [...]/ [1,95,32] 
[Step 7/11] Loading the model to the device 
[ INFO ] Compile model took 514.50 ms 
[Step 8/11] Querying optimal runtime parameters 
[ INFO ] Model: 
[ INFO ]     NETWORK_NAME: Model0 
[ INFO ]     EXECUTION_DEVICES: ['CPU'] 
[ INFO ]     PERFORMANCE_HINT: PerformanceMode.THROUGHPUT 
[ INFO ]     OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6 
[ INFO ]     MULTI_DEVICE_PRIORITIES: CPU 
[ INFO ]     CPU: 
[ INFO ]       AFFINITY: Affinity.CORE 
[ INFO ]       CPU_DENORMALS_OPTIMIZATION: False 
[ INFO ]       CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0 
[ INFO ]       DYNAMIC_QUANTIZATION_GROUP_SIZE: 32 
[ INFO ]       ENABLE_CPU_PINNING: True 
[ INFO ]       ENABLE_HYPER_THREADING: True 
[ INFO ]       EXECUTION_DEVICES: ['CPU'] 
[ INFO ]       EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE 
[ INFO ]       INFERENCE_NUM_THREADS: 24 
[ INFO ]       INFERENCE_PRECISION_HINT: <Type: 'float32'> 
[ INFO ]       KV_CACHE_PRECISION: <Type: 'float16'> 
[ INFO ]       LOG_LEVEL: Level.NO 
[ INFO ]       MODEL_DISTRIBUTION_POLICY: set() 
[ INFO ]       NETWORK_NAME: Model0 
[ INFO ]       NUM_STREAMS: 6 
[ INFO ]       OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6 
[ INFO ]       PERFORMANCE_HINT: THROUGHPUT 
[ INFO ]       PERFORMANCE_HINT_NUM_REQUESTS: 0 
[ INFO ]       PERF_COUNT: NO 
[ INFO ]       SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE 
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM 
[ INFO ] LOADED_FROM_CACHE: False 
[ INFO ] PERF_COUNT: False 
[Step 9/11] Creating infer requests and preparing input tensors 
[ WARNING ] No input files were given for input '46'!. This input will be filled with random values! 
[ INFO ] Fill input '46' with random values 
[Step 10/11] Measuring performance (Start inference asynchronously, 6 inference requests, limits: 120000 ms duration) 
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 72.27 ms 
[Step 11/11] Dumping statistics report 
[ INFO ] Execution Devices:['CPU'] 
[ INFO ] Count: 5466 iterations 
[ INFO ] Duration: 120184.09 ms 
[ INFO ]     Latency: 
[ INFO ]     Median: 131.47 ms 
[ INFO ]     Average: 131.76 ms 
[ INFO ]     Min: 62.30 ms 
[ INFO ]     Max: 297.95 ms 
[ INFO ] Throughput: 45.48 FPS
# Benchmark the INT8 model (OpenVINO IR)
! benchmark_app -m $quantized_model_path -shape [1,30480] -d $device.value -api async
[Step 1/11] Parsing and validating input arguments 
[ INFO ] Parsing input parameters 
[Step 2/11] Loading OpenVINO Runtime 
[ WARNING ] Default duration 120 seconds is used for unknown device AUTO 
[ INFO ] OpenVINO: 
[ INFO ] Build .................................2024.4.0-16028-fe423b97163 
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] AUTO 
[ INFO ] Build .................................2024.4.0-16028-fe423b97163 
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration 
[ WARNING ] Performance hint was not explicitly specified in command line.Device(AUTO) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files 
[ INFO ] Loading model files 
[ INFO ] Read model took 25.29 ms 
[ INFO ] Original model I/O parameters: 
[ INFO ] Model inputs: 
[ INFO ] 46 , input_values (node: input_values) : f32 / [...]/ [?,?]
[ INFO ] Model outputs: 
[ INFO ]     logits (node: __module.lm_head/aten::linear/Add) : f32 / [...]/ [?,?,32] 
[Step 5/11] Resizing model to match image sizes and given batch 
[ INFO ] Model batch size: 1 
[ INFO ] Reshaping model: '46': [1,30480] 
[ INFO ] Reshape model took 7.64 ms 
[Step 6/11] Configuring input of the model 
[ INFO ] Model inputs: 
[ INFO ] 46 , input_values (node: input_values) : f32 / [...]/ [1,30480] 
[ INFO ] Model outputs: 
[ INFO ]     logits (node: __module.lm_head/aten::linear/Add) : f32 / [...]/ [1,95,32] 
[Step 7/11] Loading the model to the device 
[ INFO ] Compile model took 1086.09 ms 
[Step 8/11] Querying optimal runtime parameters 
[ INFO ] Model: 
[ INFO ]     NETWORK_NAME: Model0 
[ INFO ]     EXECUTION_DEVICES: ['CPU'] 
[ INFO ]     PERFORMANCE_HINT: PerformanceMode.THROUGHPUT 
[ INFO ]     OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6 
[ INFO ]     MULTI_DEVICE_PRIORITIES: CPU 
[ INFO ]     CPU:
[ INFO ]       AFFINITY: Affinity.CORE 
[ INFO ]       CPU_DENORMALS_OPTIMIZATION: False 
[ INFO ]       CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0 
[ INFO ]       DYNAMIC_QUANTIZATION_GROUP_SIZE: 32 
[ INFO ]       ENABLE_CPU_PINNING: True 
[ INFO ]       ENABLE_HYPER_THREADING: True 
[ INFO ]       EXECUTION_DEVICES: ['CPU'] 
[ INFO ]       EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE 
[ INFO ]       INFERENCE_NUM_THREADS: 24 
[ INFO ]       INFERENCE_PRECISION_HINT: <Type: 'float32'> 
[ INFO ]       KV_CACHE_PRECISION: <Type: 'float16'> 
[ INFO ]       LOG_LEVEL: Level.NO 
[ INFO ]       MODEL_DISTRIBUTION_POLICY: set() 
[ INFO ]       NETWORK_NAME: Model0 
[ INFO ]       NUM_STREAMS: 6 
[ INFO ]       OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6 
[ INFO ]       PERFORMANCE_HINT: THROUGHPUT 
[ INFO ]       PERFORMANCE_HINT_NUM_REQUESTS: 0 
[ INFO ]       PERF_COUNT: NO 
[ INFO ]       SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE 
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM 
[ INFO ] LOADED_FROM_CACHE: False 
[ INFO ] PERF_COUNT: False 
[Step 9/11] Creating infer requests and preparing input tensors 
[ WARNING ] No input files were given for input '46'!.This input will be filled with random values! 
[ INFO ] Fill input '46' with random values 
[Step 10/11] Measuring performance (Start inference asynchronously, 6 inference requests, limits: 120000 ms duration) 
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 54.98 ms 
[Step 11/11] Dumping statistics report 
[ INFO ] Execution Devices:['CPU'] 
[ INFO ] Count: 8232 iterations 
[ INFO ] Duration: 120070.65 ms 
[ INFO ] Latency: 
[ INFO ]     Median: 87.27 ms 
[ INFO ]     Average: 87.36 ms 
[ INFO ]     Min: 65.95 ms 
[ INFO ]     Max: 107.03 ms 
[ INFO ] Throughput: 68.56 FPS
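The benchmark reports above can be cross-checked with simple arithmetic. The sketch below recomputes throughput as completed iterations divided by duration, and compares it with a rough estimate of in-flight requests divided by average latency (Little's law). Treating the second formula as a good approximation for benchmark_app's asynchronous mode is an assumption; the numbers are copied from the logs above, and the helper names are hypothetical.

```python
# Cross-check of the benchmark_app reports above (numbers copied from the logs).
# ASSUMPTION: throughput ~= in-flight requests / average latency (Little's law)
# is only an approximation of what benchmark_app measures in async mode.
from typing import Dict

REPORTS: Dict[str, Dict[str, float]] = {
    "FP16": {"iterations": 5466, "duration_ms": 120184.09, "requests": 6,
             "avg_latency_ms": 131.76, "reported_fps": 45.48},
    "INT8": {"iterations": 8232, "duration_ms": 120070.65, "requests": 6,
             "avg_latency_ms": 87.36, "reported_fps": 68.56},
}


def measured_fps(iterations: float, duration_ms: float) -> float:
    """Completed inferences per second over the whole run."""
    return iterations / (duration_ms / 1000.0)


def littles_law_fps(requests: float, avg_latency_ms: float) -> float:
    """Rough estimate: requests in flight divided by time per request."""
    return requests / (avg_latency_ms / 1000.0)


for name, r in REPORTS.items():
    fps = measured_fps(r["iterations"], r["duration_ms"])
    est = littles_law_fps(r["requests"], r["avg_latency_ms"])
    print(f"{name}: reported {r['reported_fps']:.2f}, "
          f"iterations/duration {fps:.2f}, requests/latency {est:.2f} FPS")

speedup = REPORTS["INT8"]["reported_fps"] / REPORTS["FP16"]["reported_fps"]
print(f"INT8 vs FP16 throughput: {speedup:.2f}x")
```

On this run, the INT8 model completes roughly 1.5 times as many inferences per second as the FP16 model, consistent with its lower median latency (87.27 ms vs 131.47 ms).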