Post-Training Quantization of ImageBind model with NNCF

This Jupyter notebook can be launched after a local installation only.

GitHub

This tutorial demonstrates how to apply 8-bit post-training quantization from NNCF (Neural Network Compression Framework) to speed up the model, and how to infer the quantized model via the OpenVINO™ toolkit.

The optimization process contains the following steps:

  1. Quantize the OpenVINO model converted in the previous notebook with NNCF.
  2. Compare the probability matrices between the converted and quantized models on example input data.
  3. Compare the model size of the converted and quantized models.
  4. Compare the performance of the converted and quantized models.

Note: You should run the 239-image-bind-convert notebook first to generate the OpenVINO IR models that are used for quantization.


Prerequisites

%pip install -q datasets librosa soundfile "openvino>=2023.1.0" "nncf"
DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.
from pathlib import Path

repo_dir = Path("ImageBind")

if not repo_dir.exists():
    raise RuntimeError('This notebook should be run after 239-image-bind-convert notebook')

%cd {repo_dir}
/home/ea/work/openvino_notebooks/notebooks/239-image-bind/ImageBind

Create and initialize quantization

NNCF enables post-training quantization by adding quantization layers into the model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. The framework is designed so that modifications to your original training code are minor. Quantization is the simplest scenario and requires only a few modifications.

The optimization process contains the following steps (a condensed sketch of the corresponding calls follows the list):

  1. Create a dataset for quantization.
  2. Run nncf.quantize to obtain a quantized model.
  3. Serialize the INT8 model using the openvino.save_model function.
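
Condensed, these three steps map onto a handful of NNCF and OpenVINO calls. The sketch below is only a preview, assuming calibration samples have already been collected into a list named calibration_samples (a hypothetical placeholder); the quantize_openvino_model helper defined later in this notebook wraps the same logic for each modality:

import nncf
import openvino as ov

core = ov.Core()

# 1. Wrap the collected calibration samples (a list of model inputs) in an nncf.Dataset
calibration_dataset = nncf.Dataset(calibration_samples)

# 2. Quantize the FP16 OpenVINO model using the calibration data
model = core.read_model("image-bind-vision/image-bind-vision.xml")
quantized_model = nncf.quantize(model=model, calibration_dataset=calibration_dataset)

# 3. Serialize the INT8 model
ov.save_model(quantized_model, "image-bind-vision/image-bind-vision_int8.xml")
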
from pathlib import Path
from imagebind.models.imagebind_model import ModalityType

modalities = [ModalityType.TEXT, ModalityType.VISION, ModalityType.AUDIO]
fp_model_paths = {modality: Path(f"image-bind-{modality}") / f"image-bind-{modality}.xml" for modality in modalities}
int8_model_paths = {modality: Path(f"image-bind-{modality}") / f"image-bind-{modality}_int8.xml" for modality in modalities}
/home/ea/work/ov_venv/lib/python3.8/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
/home/ea/work/ov_venv/lib/python3.8/site-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
  warnings.warn(
/home/ea/work/ov_venv/lib/python3.8/site-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
  warnings.warn(

Prepare datasets

The Conceptual Captions dataset, consisting of about 3.3 million images annotated with captions, is used to quantize the image and text models.

import imagebind.data as data
import os
import requests
import tempfile

from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def check_text_data(data):
    """
    Check if the given data is text-based.
    """
    if isinstance(data, str):
        return True
    if isinstance(data, list):
        return all(isinstance(x, str) for x in data)
    return False

def collate_fn(examples, image_column="image_url", text_column="caption"):
    """
    Collates examples into a batch for processing.
    Preprocesses each example by loading and transforming image and text data.
    Checks if the text data in the example is valid by calling the `check_text_data` function.
    Downloads the image specified by the URL in the image_column of the example dictionary.
    Constructs and returns a dictionary representing the collated batch with the following keys:
       - "pixel_values": The pixel values of the preprocessed example.
       - "input_ids": The transformed text data of the preprocessed example.
    """
    assert len(examples) == 1
    example = examples[0]
    if not check_text_data(example[text_column]):
        raise ValueError("Text data is not valid")

    url = example[image_column]
    with tempfile.TemporaryDirectory() as tempdir:
        f_name = os.path.join(tempdir, 'image.jpg')
        try:
            response = requests.get(url, verify=False, timeout=20)
            with open(f_name, "wb") as file:
                file.write(response.content)
            pixel_values = data.load_and_transform_vision_data([f_name], "cpu")
        except Exception:
            print(f"Can't load image from url: {url}")
            return None

    text = data.load_and_transform_text([example[text_column]], "cpu")

    return {
        "pixel_values": pixel_values,
        "input_ids": text
    }
from datasets import load_dataset
import itertools
import torch
from tqdm.notebook import tqdm

def collect_vision_text_data(dataloader, init_steps):
    """
    This function collects vision and text data from a dataloader for a specified number of initialization steps.
    It iterates over the dataloader, fetching batches and storing the relevant vision and text data.
    Returns a tuple containing the collected vision_data and text_data lists.
    """
    text_data = []
    vision_data = []
    print(f"Fetching {init_steps} for the initialization...")
    counter = 0
    for batch in tqdm(dataloader):
        if counter == init_steps:
            break
        with torch.no_grad():
            if batch:
                counter += 1
                text_data.append(batch["input_ids"].to("cpu"))
                vision_data.append(batch["pixel_values"].to("cpu"))
    return vision_data, text_data

def prepare_vision_text_dataset(opt_init_steps=300):
    """
    Prepares a vision-text dataset for quantization by collecting vision and text data.
    """
    dataset = load_dataset("conceptual_captions", streaming=True)
    train_dataset = dataset["train"].shuffle(seed=0)
    dataloader = torch.utils.data.DataLoader(train_dataset, collate_fn=collate_fn, batch_size=1)
    vision_data, text_data = collect_vision_text_data(dataloader, opt_init_steps)
    return vision_data, text_data

The ESC-50 dataset is used to quantize the audio modality of the ImageBind model. The dataset is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification. The dataset consists of 5-second-long recordings organized into 50 semantic classes.

import numpy as np
import torchaudio

def collect_audio_data(dataloader, init_steps=300):
    """
    This function collects audio data from a dataloader for a specified number of initialization steps.
    It iterates over the dataloader, fetching batches and storing them in a list.
    """
    audio_data = []
    for _, batch in tqdm(zip(range(init_steps), itertools.islice(dataloader, 0, init_steps))):
        with torch.no_grad():
            audio_data.append(batch)
    return audio_data

def prepare_audio_dataset():
    """
    Prepares an "ashraq/esc50" audio dataset for quantization by collecting audio data.
    Collects audio data from the dataloader by calling the `collect_audio_data` function.
    Returns a list containing the collected calibration audio data batches.
    """
    audio_dataset = load_dataset("ashraq/esc50", streaming=True)
    train_dataset = audio_dataset["train"].shuffle(seed=42, buffer_size=1000)

    def collate_fn(examples):
        assert len(examples) == 1
        with tempfile.TemporaryDirectory() as tempdir:
            f_name = os.path.join(tempdir, 'audio.wav')
            audio_data = examples[0]['audio']['array']
            sample_rate = examples[0]['audio']["sampling_rate"]
            audio_data = torch.from_numpy(audio_data).to(torch.float32).unsqueeze(0)
            torchaudio.save(f_name, audio_data, sample_rate)
            return data.load_and_transform_audio_data([f_name], "cpu")

    dataloader = torch.utils.data.DataLoader(train_dataset, collate_fn=collate_fn, batch_size=1)
    calibration_data = collect_audio_data(dataloader)
    return calibration_data

Create quantized models from the pre-trained FP16 models.

vision_data, text_data = prepare_vision_text_dataset()
Fetching 300 for the initialization...
0it [00:00, ?it/s]
Can't load image from url: http://homeklondike.org/wp-content/uploads/2015/06/2-Bright-living-room-in-the-attic1.jpg
Can't load image from url: http://www.lovemeinitaly.com/wp-content/uploads/cache/images/2018/01/4A-e1491723576743/4A-e1491723576743-1964759082.jpg
Can't load image from url: https://i0.wp.com/childphotocompetition.com/wp-content/uploads/2016/02/Agnieszka_He%E2%80%8E_childphotocompetition.jpg
Can't load image from url: https://magankonoski.com/wp-content/uploads/2016/05/MaganKonoskiFineArtWeddingandLifestylePhotographer-25-683x1024.jpg
Can't load image from url: http://www.huahin-home-property.com/wp-content/uploads/2016/11/2immobilier-real-eatate-huahin-maison-a-vendre-condo-for-salerent-The-Autumm-Khao-takibe.jpg
Can't load image from url: http://www.americanclassichomes.com/blog/wp-content/uploads/2015/04/Alki_SB_Kitchen_internet.jpg
Can't load image from url: http://assets.nydailynews.com/polopoly_fs/1.110031.1313943805!/img/httpImage/image.jpg_gen/derivatives/article_750/alg-fencer-sara-harvey-browne-2-jpg.jpg
Can't load image from url: http://static.panoramio.com/photos/large/34107183.jpg
Can't load image from url: https://odis.homeaway.com/odis/listing/2f9f1d46-0559-4811-95ed-c97cc8608793.c10.jpg
Can't load image from url: https://odis.homeaway.com/odis/listing/75953842-3278-42a1-91ef-2bb2be2ecb05.c10.jpg
Can't load image from url: https://ak6.picdn.net/shutterstock/videos/2504486/thumb/1.jpg
Can't load image from url: http://www.buro247.my/thumb/625x960_0/galleries/2017/10/lady-dior-art-2-19.jpg
Can't load image from url: http://oneindiaonepeople.com/wp-content/uploads/2014/02/13.jpg
Can't load image from url: http://www.johnsoncitypress.com/image/2016/10/27/640x_cCM_q30/XC-Region-A-AA-JPG.jpg
Can't load image from url: http://fromthedeckchair.com/wp-content/uploads/2013/06/ftdc_norwegianpearl-0737.jpg
Can't load image from url: http://thedailyquotes.com/wp-content/uploads/2015/04/could-be-another-broken-heart-love-daily-quotes-sayings-pictures.jpg
Can't load image from url: https://www.popsci.com/sites/popsci.com/files/styles/1000_1x_/public/vizdata_map_key.jpg?itok=7myhqx2P
Can't load image from url: https://www.interlatesystems.com/img/1166/183.jpg
Can't load image from url: https://i1.wp.com/dailynexus.com/wp-content/uploads/2016/10/HalloweenWeekend_KennethSong-4-1024x671.jpg?resize=1024%2C671
Can't load image from url: https://odis.homeaway.com/odis/listing/d81ed29b-f448-444a-9048-ed9cc9fe666a.c10.jpg
Can't load image from url: http://exploresrilanka.lk/wp-content/uploads/2016/04/BTI37666.jpg
Can't load image from url: http://www.tampabay.com/storyimage/HI/20170528/ARTICLE/305289727/AR/0/AR-305289727.jpg
Can't load image from url: http://wewegombel.me/photo/558689/IMG_7994.jpg
Can't load image from url: http://www.thedonkeysanctuary.ie/sites/ireland/files/styles/large/public/press/259-1445414098.jpg?itok=dwa9kRh_
Can't load image from url: https://thumb1.shutterstock.com/display_pic_with_logo/3816881/478955293/stock-vector-abstract-pattern-in-the-memphis-style-of-large-white-spots-and-little-green-with-black-dots-on-a-478955293.jpg
Can't load image from url: http://media.santabanta.com/images/picsms/2016/sms-16401.jpg
Can't load image from url: https://lookaside.fbsbx.com/lookaside/crawler/media/?media_id=657209177718359
Can't load image from url: http://www.blogbeen.com/wp-content/uploads/2017/09/-mesmerizing-bathroom-tiles-11-jpg-bathroom-full-version-helulis-.jpg
Can't load image from url: https://6e58e2e225bb143c019e-e234a4d870c026b5f56b4446f6e62d64.ssl.cf1.rackcdn.com/a9ad7fa8-cf6c-4d2b-bbc6-591e0fd0cb2f.jpg
Can't load image from url: http://wewegombel.me/photo/487654/img_8173.jpg
Can't load image from url: http://s1.ibtimes.com/sites/www.ibtimes.com/files/styles/lg/public/2011/06/04/109074-an-african-giant-pouch-rat-is-watched-by-his-handler-at-a-laboratory-i.jpg
Can't load image from url: http://nnimgt-a.akamaihd.net/transform/v1/crop/frm/w9qsSAVumVxqyCiyw3G2iR/d9d78dda-7d5d-4420-9f3d-a1d44813c251.jpg/r0_64_960_604_w1200_h678_fmax.jpg
Can't load image from url: https://www.thenational.ae/image/policy:1.197226:1499310330/image/jpeg.jpg?f=16x9&w=1024&$p$f$w=2589da4
Can't load image from url: https://ak4.picdn.net/shutterstock/videos/14101994/thumb/1.jpg?i10c=img.resize(height:160)
Can't load image from url: http://sanpancholife.com/photos/home/2386/super/5005683111355530342.jpeg
Can't load image from url: https://media.gettyimages.com/photos/two-bottles-of-pills-one-knocked-over-with-contents-spilling-out-and-picture-id73740799?s=612x612
Can't load image from url: https://www.thestar.com/content/dam/thestar/entertainment/music/2017/04/17/prince-was-prescribed-oxycodone-under-another-name-court-document/prince-07.jpg.size.custom.crop.891x650.jpg
Can't load image from url: http://photos.mycapture.com/TWCM/1473481/41921058E.jpg
Can't load image from url: http://xboxhut.com/wp-content/uploads/2016/05/simple-bathroom-designs-grey-modern-double-sink-bathroom-vanities60-37.jpg
Can't load image from url: http://seanverret.com/wp-content/uploads/2012/07/20120710_104349.jpg
Can't load image from url: http://neveradulldayinpoland.com/wp-content/uploads/2014/04/DSC_3434-1024x682.jpg
Can't load image from url: http://wewegombel.me/photo/687156/watercolor-christmas-tree-isolated-white-background-texture-paper-new-year-christmas-card-template-62641882.jpg
Can't load image from url: http://expatedna.com/wp-content/uploads/2015/06/City-in-the-sky-by-Expat-Edna.jpg
Can't load image from url: https://lookaside.fbsbx.com/lookaside/crawler/media/?media_id=1291121264312721
Can't load image from url: https://i0.wp.com/cindi-keller.com/wp-content/uploads/2014/09/cindi-keller_2014-08-15_15.07.29_ronda-spain.jpg?w=400&h=533&crop&ssl=1
Can't load image from url: http://www.robinhoodshow.com/clients/17668/8642054_org.jpg
Can't load image from url: https://www.101india.com/sites/default/files/image-upload/blogs/TravelandFood/29NovSecretDevkundWaterfalls/Inline%204%20%3C%20Sunrise%20at%20the%20river%20behind%20the%20farmhouse%20%3E.jpg
Can't load image from url: http://www.nextavenue.org/wp-content/uploads/2017/05/image-3-w1024-750x485.jpg
Can't load image from url: http://nnimgt-a.akamaihd.net/transform/v1/crop/frm/342N54ExNnUCDyWzghgYbSC/cd538c73-466c-4e05-8202-0892dceb8a44.jpg/r401_321_5388_3369_w1200_h678_fmax.jpg
Can't load image from url: https://www.universetoday.com/wp-content/uploads/2016/05/Earth-magnetosphere-ESA-Medialab.jpg
Can't load image from url: https://c5eeb468edc90bcfda59-8477d1500ace5389b08f6bb1cc2fee82.ssl.cf5.rackcdn.com/837712-residential-x722qn-o.jpg
Can't load image from url: https://ak3.picdn.net/shutterstock/videos/7414963/thumb/1.jpg
import logging
import nncf
import openvino as ov

nncf.set_log_level(logging.ERROR)

core = ov.Core()

def quantize_openvino_model(modality, calibration_data):
    model_path = fp_model_paths[modality]
    if not os.path.exists(model_path):
        raise RuntimeError(f"Model: {model_path} not found. \
                             First run 239-image-bind-convert notebook to convert model to OpenVINO IR.")

    model = core.read_model(model_path)
    quantized_model = nncf.quantize(
        model=model,
        calibration_dataset=calibration_data,
        model_type=nncf.ModelType.TRANSFORMER,
        # remove ignored_scope for nncf>=2.6.0 (PR with fix https://github.com/openvinotoolkit/nncf/pull/1953)
        ignored_scope=nncf.IgnoredScope(types=["ReduceL2"])
    )
    ov.save_model(quantized_model, int8_model_paths[modality])
    return quantized_model
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino

Quantize the ImageBind model for the vision modality.

Note: Quantization is a time- and memory-consuming operation. Running the quantization code below may take a long time.

if len(vision_data) == 0:
    raise RuntimeError(
        'Calibration dataset is empty. Please check internet connection and try to download images manually from the URLs above.'
    )

vision_dataset = nncf.Dataset(vision_data)
vision_quantized_model = quantize_openvino_model(modality=ModalityType.VISION, calibration_data=vision_dataset)
2023-10-26 13:34:25.166422: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-10-26 13:34:25.203294: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-26 13:34:26.097309: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Statistics collection: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [01:18<00:00,  3.81it/s]
Applying Smooth Quant: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 129/129 [00:13<00:00,  9.69it/s]
Statistics collection: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [03:03<00:00,  1.64it/s]
Applying Fast Bias correction: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:23<00:00,  5.54it/s]

Quantize the ImageBind model for the text modality.

text_dataset = nncf.Dataset(text_data)
text_quantized_model = quantize_openvino_model(modality=ModalityType.TEXT, calibration_data=text_dataset)
Statistics collection: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:17<00:00, 16.82it/s]
Applying Smooth Quant: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 97/97 [00:06<00:00, 15.41it/s]
Statistics collection: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:50<00:00,  5.97it/s]
Applying Fast Bias correction: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [00:10<00:00, 11.06it/s]

Quantize the ImageBind model for the audio modality.

audio_calibration_data = prepare_audio_dataset()
audio_dataset = nncf.Dataset(audio_calibration_data)
audio_quantized_model = quantize_openvino_model(modality=ModalityType.AUDIO, calibration_data=audio_dataset)
Repo card metadata block was not found. Setting CardData to empty.
0it [00:00, ?it/s]
Statistics collection: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [01:10<00:00,  4.26it/s]
Applying Smooth Quant: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 49/49 [00:01<00:00, 27.79it/s]
Statistics collection: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [01:21<00:00,  3.69it/s]
Applying Fast Bias correction: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 48/48 [00:03<00:00, 12.18it/s]

NNCF also supports quantization-aware training and algorithms other than quantization. See the NNCF documentation in the NNCF repository for more information.
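
For reference, quantization-aware training in NNCF follows a different workflow from nncf.quantize. The following is a minimal, illustrative sketch of the config-based PyTorch API; it is not needed for this notebook, MyModel and make_train_loader are hypothetical placeholders, and the exact API may differ between NNCF versions:

import torch
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Hypothetical floating-point PyTorch model and training DataLoader
model = MyModel()
train_loader = make_train_loader()

# Describe the model input shape and select the quantization algorithm
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
})
# Use training data to initialize the parameters of the inserted quantizers
nncf_config = register_default_init_args(nncf_config, train_loader)

# Wrap the model with fake-quantization operations
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# ... fine-tune compressed_model with the usual training loop ...

# Export the fine-tuned model to ONNX for conversion to OpenVINO IR
compression_ctrl.export_model("quantized_model.onnx")
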

Compare results for the OpenVINO FP16 model and the quantized model

Compare the probability matrices between the FP16 and INT8 models on example inputs. For more details about probability matrices, refer to the 239-image-bind-convert notebook.

# Prepare inputs
from imagebind.models.imagebind_model import ModalityType

text_list = ["A car", "A bird", "A dog"]
image_paths = [".assets/dog_image.jpg", ".assets/car_image.jpg", ".assets/bird_image.jpg"]
audio_paths = [".assets/dog_audio.wav", ".assets/bird_audio.wav", ".assets/car_audio.wav"]

inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, "cpu"),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, "cpu"),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, "cpu"),
}

Select inference device

Select a device from the dropdown list to run inference with OpenVINO.

import ipywidgets as widgets

device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='AUTO',
    description='Device:',
    disabled=False,
)

device
Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO')

Get embeddings from the original models.

embeddings = {}
for modality in modalities:
    ov_model = core.compile_model(fp_model_paths[modality], device.value)
    embeddings[modality] = ov_model(inputs[modality])[ov_model.output(0)]

Get embeddings from the quantized models.

quantized_embeddings = {}
for modality in modalities:
    model = core.compile_model(int8_model_paths[modality], device.value)
    quantized_embeddings[modality] = model(inputs[modality])[model.output(0)]
import matplotlib.pyplot as plt
from scipy.special import softmax

def visualize_prob_matrices(fp_matrix, int_matrix, x_label, y_label):
    fig, ax = plt.subplots(1, 2)
    for i, matrix in enumerate([fp_matrix, int_matrix]):
        ax[i].matshow(matrix, cmap='winter')

        for (k, j), z in np.ndenumerate(matrix):
            ax[i].title.set_text('FP16 probs' if i == 0 else 'INT8 probs')
            ax[i].text(j, k, '{:0.3f}'.format(z), ha='center', va='center')
            ax[i].set_xticks(range(len(x_label)), x_label)
            ax[i].set_yticks(range(len(y_label)), y_label)
    fig.tight_layout()

image_list = [img.split('/')[-1] for img in image_paths]
audio_list = [audio.split('/')[-1] for audio in audio_paths]
fp_text_vision_scores = softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, axis=-1)
int_text_vision_scores = softmax(quantized_embeddings[ModalityType.VISION] @ quantized_embeddings[ModalityType.TEXT].T, axis=-1)

visualize_prob_matrices(fp_text_vision_scores, int_text_vision_scores, text_list, image_list)
[Figure: FP16 vs. INT8 text-vision probability matrices]
fp_text_audio_scores = softmax(embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.TEXT].T, axis=-1)
int_text_audio_scores = softmax(quantized_embeddings[ModalityType.AUDIO] @ quantized_embeddings[ModalityType.TEXT].T, axis=-1)

visualize_prob_matrices(fp_text_audio_scores, int_text_audio_scores, text_list, audio_list)
[Figure: FP16 vs. INT8 text-audio probability matrices]
fp_audio_vision_scores = softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T, axis=-1)
int_audio_vision_scores = softmax(quantized_embeddings[ModalityType.VISION] @ quantized_embeddings[ModalityType.AUDIO].T, axis=-1)

visualize_prob_matrices(fp_audio_vision_scores, int_audio_vision_scores, audio_list, image_list)
[Figure: FP16 vs. INT8 audio-vision probability matrices]

Compare file sizes

def calculate_compression_rate(modality):
    fp16_ir_model_size = Path(fp_model_paths[modality]).with_suffix(".bin").stat().st_size / 1024
    quantized_model_size = Path(int8_model_paths[modality]).with_suffix(".bin").stat().st_size / 1024
    print(f'Modality: {modality}')
    print(f"    * FP16 IR model size: {fp16_ir_model_size:.2f} KB")
    print(f"    * INT8 model size: {quantized_model_size:.2f} KB")
    print(f"    * Model compression rate: {fp16_ir_model_size / quantized_model_size:.3f}")
for modality in modalities:
    calculate_compression_rate(modality)
Modality: text
    * FP16 IR model size: 691481.77 KB
    * INT8 model size: 347007.35 KB
    * Model compression rate: 1.993
Modality: vision
    * FP16 IR model size: 1235995.26 KB
    * INT8 model size: 620133.72 KB
    * Model compression rate: 1.993
Modality: audio
    * FP16 IR model size: 168429.22 KB
    * INT8 model size: 84818.78 KB
    * Model compression rate: 1.986

Compare inference time of the FP16 IR and quantized models

To measure the inference performance of the FP16 and INT8 models, we use the median inference time over the calibration dataset. This way we can approximately estimate the speed-up of the quantized models.

Note: For the most accurate performance estimation, it is recommended to run benchmark_app in a terminal/command prompt after closing other applications.
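
For reference, a typical benchmark_app invocation on one of the produced INT8 models could look like the cell below (illustrative only; adjust the model path, target device, and duration for your setup):

# Benchmark the INT8 vision model on CPU for 15 seconds in asynchronous mode
!benchmark_app -m image-bind-vision/image-bind-vision_int8.xml -d CPU -api async -t 15
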

import time

def calculate_inference_time(model_path, calibration_data):
    model = core.compile_model(model_path)
    output_layer = model.output(0)
    inference_time = []
    for batch in calibration_data:
        start = time.perf_counter()
        _ = model(batch)[output_layer]
        end = time.perf_counter()
        delta = end - start
        inference_time.append(delta)
    return np.median(inference_time)

Vision model

fp16_latency = calculate_inference_time(fp_model_paths[ModalityType.VISION], vision_data)
int8_latency = calculate_inference_time(int8_model_paths[ModalityType.VISION], vision_data)
print(f"Performance speed up: {fp16_latency / int8_latency:.3f}")
Performance speed up: 2.040

Text model

fp16_latency = calculate_inference_time(fp_model_paths[ModalityType.TEXT], text_data)
int8_latency = calculate_inference_time(int8_model_paths[ModalityType.TEXT], text_data)
print(f"Performance speed up: {fp16_latency / int8_latency:.3f}")
Performance speed up: 1.404

Audio model

fp16_latency = calculate_inference_time(fp_model_paths[ModalityType.AUDIO], audio_calibration_data)
int8_latency = calculate_inference_time(int8_model_paths[ModalityType.AUDIO], audio_calibration_data)
print(f"Performance speed up: {fp16_latency / int8_latency:.3f}")
Performance speed up: 5.713