Classification with ConvNeXt and OpenVINO#

This Jupyter notebook can be launched only after a local installation.

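The torchvision.models subpackage contains definitions of models for a range of tasks, including image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection, video classification, and optical flow. This notebook shows how to use one of them.

The ConvNeXt model is based on the paper A ConvNet for the 2020s, which introduces a family of pure ConvNet models called ConvNeXt. Built entirely from standard ConvNet modules, ConvNeXt models compete favorably with Transformers in accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while keeping the simplicity and efficiency of standard ConvNets. The torchvision.models subpackage contains several pretrained ConvNeXt models. This tutorial uses the ConvNeXt Tiny model.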

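Table of contents:

Prerequisites
Get a test image
Get a pretrained model
Define the preprocessing and prepare the input data
Run inference with the original model
Convert the model to OpenVINO Intermediate Representation format
Run inference with the OpenVINO IR model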

Prerequisites#

%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu torch torchvision 
%pip install -q "openvino>=2023.1.0"

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.

Get a test image#

First, get a test image from an open dataset.

import requests 

from torchvision.io import read_image 
import torchvision.transforms as transforms 

img_path = "cats_image.jpeg" 
r = requests.get("https://huggingface.co/datasets/huggingface/cats-image/resolve/main/cats_image.jpeg") 

with open(img_path, "wb") as f: 
    f.write(r.content) 
image = read_image(img_path) 
display(transforms.ToPILImage()(image))

[Image output: the downloaded test image]

Get a pretrained model#

Torchvision provides a mechanism for listing and retrieving the available models.

import torchvision.models as models 

# List all available models
all_models = models.list_models()
# List models by type: classification models live in the parent module
classification_models = models.list_models(module=models)

print(classification_models)

['alexnet', 'convnext_base', 'convnext_large', 'convnext_small', 'convnext_tiny', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'efficientnet_b3', 'efficientnet_b4', 'efficientnet_b5', 'efficientnet_b6', 'efficientnet_b7', 'efficientnet_v2_l', 'efficientnet_v2_m', 'efficientnet_v2_s', 'googlenet', 'inception_v3', 'maxvit_t', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3', 'mobilenet_v2', 'mobilenet_v3_large', 'mobilenet_v3_small', 'regnet_x_16gf', 'regnet_x_1_6gf', 'regnet_x_32gf', 'regnet_x_3_2gf', 'regnet_x_400mf', 'regnet_x_800mf', 'regnet_x_8gf', 'regnet_y_128gf', 'regnet_y_16gf', 'regnet_y_1_6gf', 'regnet_y_32gf', 'regnet_y_3_2gf', 'regnet_y_400mf', 'regnet_y_800mf', 'regnet_y_8gf', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext101_64x4d', 'resnext50_32x4d', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0', 'squeezenet1_0', 'squeezenet1_1', 'swin_b', 'swin_s', 'swin_t', 'swin_v2_b', 'swin_v2_s', 'swin_v2_t', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn', 'vit_b_16', 'vit_b_32', 'vit_h_14', 'vit_l_16', 'vit_l_32', 'wide_resnet101_2', 'wide_resnet50_2']

This tutorial uses convnext_tiny. To get a pretrained model, call models.get_model("convnext_tiny", weights='DEFAULT'), or call the model-specific builder in torchvision.models with the default weights, which are equivalent to ConvNeXt_Tiny_Weights.IMAGENET1K_V1. If you omit weights or pass weights=None, the model is randomly initialized. To list all weights available for the model, call weights_enum = models.get_model_weights("convnext_tiny"); this model has only one set. For more details on initializing pretrained models, see the torchvision models documentation.

model = models.convnext_tiny(weights=models.ConvNeXt_Tiny_Weights.DEFAULT)

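Define the preprocessing and prepare the input data#

You can build the preprocessing with torchvision.transforms, or use the preprocessing transforms that ship with the model weights.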

import torch 

preprocess = models.ConvNeXt_Tiny_Weights.DEFAULT.transforms() 

input_data = preprocess(image) 
input_data = torch.stack([input_data], dim=0)

Run inference with the original model#

outputs = model(input_data)
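Note that torchvision model builders return modules in training mode, and the ConvNeXt blocks in torchvision include a stochastic depth layer that is active only in that mode. A common pattern for inference is to switch to evaluation mode and disable gradient tracking. The sketch below shows the pattern on a hypothetical toy network, where Dropout stands in for any layer that is random during training.

```python
import torch
from torch import nn

# Hypothetical toy network; Dropout stands in for train-only randomness.
toy_model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 4))
x = torch.randn(1, 8)

toy_model.eval()  # disable dropout and similar train-only behavior
with torch.no_grad():  # skip autograd bookkeeping during inference
    first = toy_model(x)
    second = toy_model(x)

# In evaluation mode, repeated runs on the same input give the same result.
repeatable = torch.equal(first, second)
print(repeatable)
```

Applied to this notebook, the same pattern would be model.eval() followed by running model(input_data) inside torch.no_grad().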

Print the results.

# Download the mapping from class indices to class labels
imagenet_classes_file_path = "imagenet_2012.txt"
r = requests.get(
    url="https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/datasets/imagenet/imagenet_2012.txt",
)

with open(imagenet_classes_file_path, "w") as f:
    f.write(r.text)

# Read the labels back, one per line, closing the file afterwards
with open(imagenet_classes_file_path) as f:
    imagenet_classes = f.read().splitlines()

def print_results(outputs: torch.Tensor):
    _, predicted_class = outputs.max(1) 
    predicted_probability = torch.softmax(outputs, dim=1)[0, predicted_class].item() 

    print(f"Predicted Class: {predicted_class.item()}") 
    print(f"Predicted Label: {imagenet_classes[predicted_class.item()]}") 
    print(f"Predicted Probability: {predicted_probability}")

print_results(outputs)

Predicted Class: 281
Predicted Label: n02123045 tabby, tabby cat
Predicted Probability: 0.5351971983909607
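print_results reports only the single most likely class. When several classes are plausible, it can help to look at the top few. The helper below is a sketch of that idea using torch.topk; the function name, logits, and labels are made up for illustration.

```python
import torch


def top_k_predictions(logits: torch.Tensor, labels: list[str], k: int = 5):
    """Return (label, probability) pairs for the k most likely classes of one image."""
    probabilities = torch.softmax(logits, dim=1)[0]
    top_probs, top_ids = torch.topk(probabilities, k)
    return [(labels[i], p) for i, p in zip(top_ids.tolist(), top_probs.tolist())]


# Made-up logits and labels, used only to exercise the helper.
toy_logits = torch.tensor([[2.0, 0.5, 1.0, -1.0]])
toy_labels = ["cat", "dog", "fox", "owl"]
for label, prob in top_k_predictions(toy_logits, toy_labels, k=3):
    print(f"{label}: {prob:.3f}")
```

In the notebook, the equivalent call would be top_k_predictions(outputs, imagenet_classes).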

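Convert the model to OpenVINO Intermediate Representation format#

OpenVINO supports PyTorch models through conversion to the OpenVINO Intermediate Representation (IR) format. To take advantage of OpenVINO optimization tools and features, convert the model with the OpenVINO Converter tool (OVC). The openvino.convert_model function provides a Python API for OVC. It returns an instance of the OpenVINO Model class, which is ready to use from the Python interface. The model can also be saved to disk with openvino.save_model for later runs.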

from pathlib import Path 

import openvino as ov 

ov_model_xml_path = Path("models/ov_convnext_model.xml") 

if not ov_model_xml_path.exists(): 
    ov_model_xml_path.parent.mkdir(parents=True, exist_ok=True) 
    converted_model = ov.convert_model(model, example_input=torch.randn(1, 3, 224, 224)) 
    # Save the converted model to disk as OpenVINO IR (.xml and .bin files)
    ov.save_model(converted_model, ov_model_xml_path) 
else: 
    print(f"IR model {ov_model_xml_path} already exists.")


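The openvino.save_model function serializes the OpenVINO model to the file system as two files, with the .xml and .bin extensions. This pair of files is called the OpenVINO Intermediate Representation format (OpenVINO IR, or just IR) and is well suited to efficient model deployment. Another application can load the OpenVINO IR for inference with the openvino.Core.read_model function.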

Select the device for running OpenVINO inference from the dropdown list.

import ipywidgets as widgets 

core = ov.Core() 
device = widgets.Dropdown( 
    options=core.available_devices + ["AUTO"], 
    value="AUTO", 
    description="Device:", 
    disabled=False, 
) 

device

Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')

# Compile the IR for the selected device, reusing the Core instance created above
compiled_model = core.compile_model(ov_model_xml_path, device_name=device.value)

Run inference with the OpenVINO IR model#

outputs = compiled_model(input_data)[0] 
print_results(torch.from_numpy(outputs))

Predicted Class: 281
Predicted Label: n02123045 tabby, tabby cat
Predicted Probability: 0.5664422512054443
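Both runs predict class 281, with slightly different probabilities. One simple way to quantify how closely two sets of outputs agree is to compare the top-1 class and the largest element-wise difference. The helper below is a sketch with a hypothetical name, exercised on made-up logits rather than the notebook's real outputs.

```python
import numpy as np


def compare_logits(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Summarize how closely two (batch, classes) logit arrays agree."""
    return {
        "same_top1": bool((reference.argmax(axis=1) == candidate.argmax(axis=1)).all()),
        "max_abs_diff": float(np.abs(reference - candidate).max()),
    }


# Made-up logits standing in for the PyTorch and OpenVINO outputs.
reference = np.array([[2.0, 0.5, 1.0]], dtype=np.float32)
candidate = np.array([[1.9, 0.6, 1.0]], dtype=np.float32)
report = compare_logits(reference, candidate)
print(report)
```

In the notebook, the two arguments would be the PyTorch output converted to NumPy (for example with .detach().numpy()) and the array returned by compiled_model(input_data)[0].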