Classification with ConvNeXt and OpenVINO¶

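The torchvision.models subpackage contains definitions of models for a range of tasks, including image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection, video classification, and optical flow. This notebook shows how to use one of them.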

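The ConvNeXt model is based on the paper A ConvNet for the 2020s, which gradually modernizes a standard ResNet toward the design of a vision Transformer. The outcome of this exploration is a family of pure ConvNet models called ConvNeXt. Built entirely from standard ConvNet modules, ConvNeXt models compete favorably with Transformers in accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while keeping the simplicity and efficiency of standard ConvNets. The torchvision.models subpackage includes several pretrained ConvNeXt models. This tutorial uses the ConvNeXt Tiny model.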

Prerequisites¶

%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu torch torchvision
%pip install -q "openvino>=2023.1.0"

Note: you may need to restart the kernel to use updated packages.

                                    

Get a test image¶

First, get a test image from an open dataset.

import urllib.request

from torchvision.io import read_image
import torchvision.transforms as transforms


img_path = 'cats_image.jpeg'
urllib.request.urlretrieve(
    url='https://huggingface.co/datasets/huggingface/cats-image/resolve/main/cats_image.jpeg',
    filename=img_path
)
image = read_image(img_path)
display(transforms.ToPILImage()(image))

                                    

../_images/125-convnext-classification-with-output_4_0.png
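The download above runs every time the cell is executed. A minimal sketch of a helper that skips the download when the file is already on disk is shown here. `fetch_once` is a hypothetical name, not part of the notebook or of any library, and it relies only on the standard library.

```python
import urllib.request
from pathlib import Path


def fetch_once(url: str, filename: str) -> Path:
    """Download url to filename unless the file already exists (hypothetical helper)."""
    path = Path(filename)
    if not path.exists():
        # create the target directory first so urlretrieve can write the file
        path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(path))
    return path
```

With this helper, the image cell could call `fetch_once(...)` with the same URL and 'cats_image.jpeg' as arguments and pass the returned path to read_image. Note that an interrupted download would leave a partial file that the existence check does not detect.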

Get a pretrained model¶

Torchvision provides a mechanism for listing and retrieving the available models.

import torchvision.models as models

# List available models
all_models = models.list_models()
# List of models by type. Classification models are in the parent module.
classification_models = models.list_models(module=models)

print(classification_models)

                                    

                                        ['alexnet', 'convnext_base', 'convnext_large', 'convnext_small', 'convnext_tiny', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'efficientnet_b3', 'efficientnet_b4', 'efficientnet_b5', 'efficientnet_b6', 'efficientnet_b7', 'efficientnet_v2_l', 'efficientnet_v2_m', 'efficientnet_v2_s', 'googlenet', 'inception_v3', 'maxvit_t', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3', 'mobilenet_v2', 'mobilenet_v3_large', 'mobilenet_v3_small', 'regnet_x_16gf', 'regnet_x_1_6gf', 'regnet_x_32gf', 'regnet_x_3_2gf', 'regnet_x_400mf', 'regnet_x_800mf', 'regnet_x_8gf', 'regnet_y_128gf', 'regnet_y_16gf', 'regnet_y_1_6gf', 'regnet_y_32gf', 'regnet_y_3_2gf', 'regnet_y_400mf', 'regnet_y_800mf', 'regnet_y_8gf', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext101_64x4d', 'resnext50_32x4d', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0', 'squeezenet1_0', 'squeezenet1_1', 'swin_b', 'swin_s', 'swin_t', 'swin_v2_b', 'swin_v2_s', 'swin_v2_t', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn', 'vit_b_16', 'vit_b_32', 'vit_h_14', 'vit_l_16', 'vit_l_32', 'wide_resnet101_2', 'wide_resnet50_2']

                                    

We will use convnext_tiny. To get a pretrained model, either call models.get_model("convnext_tiny", weights='DEFAULT') or use the model-specific builder in torchvision.models with the default weights, which are equivalent to ConvNeXt_Tiny_Weights.IMAGENET1K_V1. If you omit weights or pass weights=None, the model is randomly initialized. To list all weights available for a model, call weights_enum = models.get_model_weights("convnext_tiny"); this model has only one set of weights. For more details on initializing pretrained models, see the torchvision models documentation.

model = models.convnext_tiny(weights=models.ConvNeXt_Tiny_Weights.DEFAULT)

                                    

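Define preprocessing and prepare the input data¶

You can build the preprocessing yourself with torchvision.transforms, or use the preprocessing transforms that come with the model weights.

import torch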

preprocess = models.ConvNeXt_Tiny_Weights.DEFAULT.transforms()

input_data = preprocess(image)
input_data = torch.stack([input_data], dim=0)

                                        /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
  warnings.warn(
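The transform returned by the weights bundles several steps (resize, center crop, rescale, normalize). As a rough illustration in plain Python, the sketch below reproduces only the per-pixel arithmetic of the last two steps: rescaling to [0, 1] and normalizing per channel. The mean and standard deviation are the commonly used ImageNet statistics and are assumed here to match this preset; `normalize_pixel` is a hypothetical helper. Printing `preprocess` in the notebook should show the values actually in use.

```python
# Commonly used ImageNet channel statistics (assumed, not read from the weights)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the normalized values a model would see."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

The normalized values are no longer confined to [0, 1], which is why the random example input used later for conversion can be drawn from a normal distribution.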

Run inference with the original model¶

outputs = model(input_data)

                                    

Print the results

import urllib.request


# download class number to class label mapping
imagenet_classes_file_path = "imagenet_2012.txt"
urllib.request.urlretrieve(
    url="https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/datasets/imagenet/imagenet_2012.txt",
    filename=imagenet_classes_file_path
)
# read the labels with a context manager so the file handle is closed
with open(imagenet_classes_file_path) as f:
    imagenet_classes = f.read().splitlines()


def print_results(outputs: torch.Tensor):
    _, predicted_class = outputs.max(1)
    predicted_probability = torch.softmax(outputs, dim=1)[0, predicted_class].item()

    print(f"Predicted Class: {predicted_class.item()}")
    print(f"Predicted Label: {imagenet_classes[predicted_class.item()]}")
    print(f"Predicted Probability: {predicted_probability}")
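In the label file, the line index is the class number, and each line holds a WordNet ID followed by one or more comma-separated names (for example `n02123045 tabby, tabby cat`). If only the readable names are wanted, a line can be split as sketched here; `split_label` is a hypothetical helper, and the format is inferred from that single example line rather than from a specification.

```python
def split_label(line: str):
    """Split 'n02123045 tabby, tabby cat' into (wordnet_id, [names])."""
    wordnet_id, _, names = line.partition(" ")
    return wordnet_id, [name.strip() for name in names.split(",")]


wordnet_id, names = split_label("n02123045 tabby, tabby cat")
print(wordnet_id, names)
```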

                                    

print_results(outputs)

Predicted Class: 281
Predicted Label: n02123045 tabby, tabby cat
Predicted Probability: 0.5800774693489075
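print_results reports only the top prediction. To show how a probability is derived from raw logits, here is a small sketch in plain Python (no torch) of a numerically stable softmax followed by a top-k selection. `softmax` and `top_k` are illustrative helpers, and the logits are made-up numbers, not outputs of the model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k=5):
    """Return (class_index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


toy_logits = [1.0, 3.0, 0.5, 2.0]  # made-up values for illustration
probs = softmax(toy_logits)
for index, prob in top_k(probs, k=2):
    print(f"class {index}: {prob:.3f}")
```

On the real tensor, `torch.topk(torch.softmax(outputs, dim=1), k=5)` should give the equivalent values and indices in one call.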

                                    

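Convert the model to OpenVINO Intermediate Representation format¶

OpenVINO supports PyTorch models through conversion to the OpenVINO Intermediate Representation (IR) format. To take advantage of OpenVINO optimization tools and features, the model should be converted with the OpenVINO Converter tool (OVC). The openvino.convert_model function provides a Python API for OVC. It returns an instance of the OpenVINO Model class, which is ready to use in the Python interface and can also be saved to disk with openvino.save_model for later use.

from pathlib import Path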

import openvino as ov


ov_model_xml_path = Path('models/ov_convnext_model.xml')

if not ov_model_xml_path.exists():
    ov_model_xml_path.parent.mkdir(parents=True, exist_ok=True)
    converted_model = ov.convert_model(model, example_input=torch.randn(1, 3, 224, 224))
    # save the converted model as OpenVINO IR (.xml and .bin files)
    ov.save_model(converted_model, ov_model_xml_path)
else:
    print(f"IR model {ov_model_xml_path} already exists.")

                                    

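When openvino.save_model is used, the OpenVINO model is serialized to the file system as two files with the .xml and .bin extensions. This pair of files is called the OpenVINO Intermediate Representation format (OpenVINO IR, or just IR) and is useful for efficient model deployment. OpenVINO IR can be loaded into another application for inference with the openvino.Core.read_model function.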
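Because an IR model is a pair of files, it can be useful to check that both are present before reusing a cached conversion. The sketch below derives the .bin path from the .xml path with pathlib. `ir_files` and `ir_is_complete` are hypothetical helpers, and the sketch assumes the .bin file sits next to the .xml file with the same stem.

```python
from pathlib import Path


def ir_files(xml_path):
    """Return the (.xml, .bin) pair that makes up one IR model."""
    xml_path = Path(xml_path)
    return xml_path, xml_path.with_suffix(".bin")


def ir_is_complete(xml_path) -> bool:
    """True only if both files of the pair exist on disk."""
    return all(path.exists() for path in ir_files(xml_path))


xml_file, bin_file = ir_files("models/ov_convnext_model.xml")
print(xml_file.name, bin_file.name)
```

A check like `ir_is_complete(ov_model_xml_path)` could stand in for `ov_model_xml_path.exists()` in the conversion cell, so that a missing .bin file triggers a fresh conversion.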

Select the device on which to run OpenVINO inference from the dropdown list.

import ipywidgets as widgets

core = ov.Core()
device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='AUTO',
    description='Device:',
    disabled=False,
)

device

                                    

Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')
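The dropdown needs an interactive session. When the same code runs as a plain script, a small function can make the choice instead. A minimal sketch: `choose_device` is a hypothetical helper that takes the list returned by core.available_devices and falls back to "AUTO" when the requested device was not reported.

```python
def choose_device(available_devices, requested="AUTO"):
    """Pick requested if OpenVINO reported it, otherwise fall back to AUTO."""
    options = list(available_devices) + ["AUTO"]
    return requested if requested in options else "AUTO"


# ['CPU'] mirrors the device list shown in the dropdown output above
print(choose_device(["CPU"], requested="GPU"))
print(choose_device(["CPU"], requested="CPU"))
```

The returned string would then be passed as device_name to core.compile_model in place of device.value.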

                                    

core = ov.Core()

compiled_model = core.compile_model(ov_model_xml_path, device_name=device.value)

Run inference with the OpenVINO IR model¶

outputs = compiled_model(input_data)[0]
print_results(torch.from_numpy(outputs))

Predicted Class: 281
Predicted Label: n02123045 tabby, tabby cat
Predicted Probability: 0.6132654547691345
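Both runs predict class 281, but the reported probabilities differ (about 0.580 versus 0.613). A small sketch for quantifying such a gap in plain Python follows; `compare_top1` is a hypothetical helper, and the inputs are the values printed above. The notebook does not say why the numbers differ. One possibility, offered only as an assumption, is that the PyTorch model was called without model.eval(), so training-time behaviour such as stochastic depth may have been active during that run.

```python
def compare_top1(reference, candidate):
    """Compare two (class_index, probability) predictions."""
    same_class = reference[0] == candidate[0]
    probability_gap = abs(reference[1] - candidate[1])
    return same_class, probability_gap


pytorch_result = (281, 0.5800774693489075)   # printed by the original model
openvino_result = (281, 0.6132654547691345)  # printed by the IR model
same_class, gap = compare_top1(pytorch_result, openvino_result)
print(f"same class: {same_class}, probability gap: {gap:.4f}")
```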