Semantic segmentation with OpenVINO™ using Segmenter

This Jupyter notebook can be launched online, opening an interactive environment in a browser window. It can also be installed locally.

Semantic segmentation is a difficult computer vision problem with many applications, such as autonomous driving, robotics, and augmented reality. Its goal is to assign a label to each pixel according to the object it belongs to, creating a so-called segmentation mask. To assign these labels properly, the model needs to consider both the local and the global context of the image. This is where transformers offer an advantage, since they capture global context well.

Segmenter is based on a Vision Transformer working as an encoder and a mask transformer working as a decoder. With this configuration, it achieves good results on different datasets, such as ADE20K, Pascal Context, and Cityscapes. As shown in the diagram below, it works by taking an image, splitting it into patches, and encoding these patches. The mask transformer then combines the encoded patches with class masks and decodes them into a segmentation map as the output, where each pixel is assigned a label.

Segmenter diagram > Credit for this image goes to the original authors of Segmenter.

For more details about the model, refer to the paper Segmenter: Transformer for Semantic Segmentation or to the repository.
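As a rough, shapes-only sketch of this data flow (illustrative only, not the repository's actual implementation; the values assume Seg-T-Mask/16 on ADE20K, i.e. a 512x512 input, 16x16 patches, embedding dimension 192, and 150 classes):

import torch

B, C, H, W, P, D, N_CLS = 1, 3, 512, 512, 16, 192, 150
n_patches = (H // P) * (W // P)                  # 32 * 32 = 1024 patch tokens

tokens = torch.randn(B, n_patches, D)            # ViT encoder output (placeholder values)
cls_emb = torch.randn(B, N_CLS, D)               # learnable class embeddings (placeholder values)
masks = tokens @ cls_emb.transpose(1, 2)         # patch-class scores: [B, n_patches, N_CLS]
masks = masks.transpose(1, 2).reshape(B, N_CLS, H // P, W // P)
seg_map = torch.nn.functional.interpolate(masks, size=(H, W), mode="bilinear")
print(seg_map.shape)                             # torch.Size([1, 150, 512, 512]): per-pixel class scores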

Table of contents

To demonstrate how to convert and use Segmenter with OpenVINO, this notebook consists of the following steps:

  • Preparing the PyTorch Segmenter model

  • Preparing preprocessing and visualization functions

  • Validating inference of the original model

  • Converting the PyTorch model to OpenVINO IR

  • Validating inference of the converted model

  • Benchmarking performance of the converted model

Get and prepare the PyTorch model

The first thing to do is clone the repository containing the model and helper functions. We will use the Tiny model with the mask transformer, that is, Seg-T-Mask/16. The linked repository also provides better but much larger models. This model is pre-trained on the ADE20K dataset used for segmentation.

The repository code already contains functions that create the model and load the weights, but we still need to download the configuration and the trained weights (checkpoint) file, and add some helper functions.

Prerequisites

import sys
from pathlib import Path

# clone Segmenter repo
if not Path("segmenter").exists():
    !git clone https://github.com/rstrudel/segmenter
else:
    print("Segmenter repo already cloned")

# include path to Segmenter repo to use its functions
sys.path.append("./segmenter")
Cloning into 'segmenter'...
remote: Enumerating objects: 268, done.
Receiving objects: 100% (268/268), 15.34 MiB | 3.73 MiB/s, done.
Resolving deltas: 100% (117/117), done.
# Installing requirements
%pip install -q "openvino>=2023.1.0"
%pip install -r segmenter/requirements.txt
Note: you may need to restart the kernel to use updated packages.
Installing collected packages: python-hostlist, tomli, einops, yapf, mmsegmentation, mmcv, timm
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
black 21.7b0 requires tomli<2.0.0,>=0.2.6, but you have tomli 2.0.1 which is incompatible.
Successfully installed einops-0.7.0 mmcv-1.3.8 mmsegmentation-0.14.1 python-hostlist-1.23.0 timm-0.4.12 tomli-2.0.1 yapf-0.40.2
Note: you may need to restart the kernel to use updated packages.
import numpy as np
import yaml

# Fetch the notebook utils script from the openvino_notebooks repo
import urllib.request
urllib.request.urlretrieve(
    url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py',
    filename='notebook_utils.py'
)
from notebook_utils import download_file, load_image

To use the functions of the Segmenter repository, we need timm, mmsegmentation, einops, and mmcv.

With the Segmenter repository cloned, we now download the model weights and configuration.

# download config and pretrained model weights
# here we use the tiny model; better but larger models are also available in the repository
WEIGHTS_LINK = "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/models/segmenter/checkpoints/ade20k/seg_tiny_mask/checkpoint.pth"
CONFIG_LINK = "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/models/segmenter/checkpoints/ade20k/seg_tiny_mask/variant.yml"

MODEL_DIR = Path("model/")
MODEL_DIR.mkdir(exist_ok=True)

download_file(WEIGHTS_LINK, directory=MODEL_DIR, show_progress=True)
download_file(CONFIG_LINK, directory=MODEL_DIR, show_progress=True)

WEIGHT_PATH = MODEL_DIR / "checkpoint.pth"
CONFIG_PATH = MODEL_DIR / "variant.yml"
model/checkpoint.pth:   0%|          | 0.00/26.4M [00:00<?, ?B/s]
model/variant.yml:   0%|          | 0.00/940 [00:00<?, ?B/s]

Load the PyTorch model

A PyTorch model is typically an instance of the torch.nn.Module class, initialized by a state dictionary containing the model weights. The general steps to obtain the model, sketched after the list below, are therefore:

  1. Create an instance of the model class.

  2. Load the checkpoint state dict, which contains the pre-trained model weights.

  3. Switch the model to evaluation mode, to set some operations to inference mode.
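As a minimal, self-contained sketch of this generic pattern (ToyModel and toy_checkpoint.pth are illustrative placeholders, not part of Segmenter):

import torch
import torch.nn as nn

# hypothetical toy model standing in for the real model class
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return self.conv(x)

# 1. create an instance of the model class
model = ToyModel()
# 2. load a checkpoint state dict with the (pre)trained weights
torch.save({"model": model.state_dict()}, "toy_checkpoint.pth")
state = torch.load("toy_checkpoint.pth", map_location="cpu")
model.load_state_dict(state["model"])
# 3. switch the model to evaluation mode for inference
model.eval()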

Here, we initialize the model with a helper function already provided by the repository.

from segmenter.segm.model.factory import load_model

pytorch_model, config = load_model(WEIGHT_PATH)
# put model into eval mode, to set it for inference
pytorch_model.eval()
print("PyTorch model loaded and ready for inference.")
PyTorch model loaded and ready for inference.

Load the normalization settings from the config file.

from segmenter.segm.data.utils import STATS
# load normalization name, in our case "vit" since we are using a transformer
normalization_name = config["dataset_kwargs"]["normalization"]
# load normalization params, mean and std from STATS
normalization = STATS[normalization_name]
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'

Prepare preprocessing and visualization functions

Now we define utility functions for preprocessing and for visualizing the results.

Preprocessing

The inference input is a tensor with shape [1, 3, H, W] in [B, C, H, W] format, where:

  • B - batch size (1 in our case; we simply add it with unsqueeze)

  • C - image channels (RGB, so 3)

  • H - image height

  • W - image width

We do not need to resize or split the image during preprocessing, since resizing to the correct scale and splitting into batches is done inside the inference function.

The model expects images in RGB channel format, scaled to the [0, 1] range and normalized with the mean and standard deviation specified in the variant.yml config file.

from PIL import Image
import torch
import torchvision.transforms.functional as F


def preprocess(im: Image, normalization: dict) -> torch.Tensor:
    """
    Preprocess image: scale, normalize and unsqueeze

    :param im: input image
    :param normalization: dictionary containing normalization data from config file
    :return:
            im: processed (scaled and normalized) image
    """
    # change PIL image to tensor and scale to [0, 1]
    im = F.pil_to_tensor(im).float() / 255
    # normalize by given mean and standard deviation
    im = F.normalize(im, normalization["mean"], normalization["std"])
    # change dim from [C, H, W] to [1, C, H, W]
    im = im.unsqueeze(0)

    return im
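For example, a quick sanity check with a blank dummy image (illustrative only) confirms the expected output shape:

# a 640x480 RGB image becomes a [1, 3, 480, 640] tensor after preprocessing
dummy = Image.new("RGB", (640, 480))
print(preprocess(dummy, normalization).shape)  # torch.Size([1, 3, 480, 640])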

Visualization

The inference output contains the labels assigned to each pixel, so in our case the output shape is [150, H, W] in [CL, H, W] format, where:

  • CL - number of label classes (150 in our case)

  • H - image height

  • W - image width

To visualize this output, we reduce the dimensions to [1, H, W], keeping for each pixel only the class with the highest value, which is the predicted label. Then we blend the original image with the colors corresponding to the inferred labels.

from segmenter.segm.data.utils import dataset_cat_description, seg_to_rgb
from segmenter.segm.data.ade20k import ADE20K_CATS_PATH


def apply_segmentation_mask(pil_im: Image, results: torch.Tensor) -> Image:
    """
    Combine segmentation masks with the image

    :param pil_im: original input image
    :param results: tensor containing segmentation masks for each pixel
    :return:
            pil_blend: image with colored segmentation masks overlay
    """
    cat_names, cat_colors = dataset_cat_description(ADE20K_CATS_PATH)

    # 3D array, where each pixel has values for all classes, take index of max as label
    seg_map = results.argmax(0, keepdim=True)
    # transform label id to colors
    seg_rgb = seg_to_rgb(seg_map, cat_colors)
    seg_rgb = (255 * seg_rgb.cpu().numpy()).astype(np.uint8)
    pil_seg = Image.fromarray(seg_rgb[0])

    # overlay segmentation mask over original image
    pil_blend = Image.blend(pil_im, pil_seg, 0.5).convert("RGB")

    return pil_blend

Validate inference of the original model

Now that everything is prepared, we can perform segmentation on the sample image coco_hollywood.jpg.

from segmenter.segm.model.utils import inference

# load image with PIL
image = load_image("https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco_hollywood.jpg")
# load_image reads the image in BGR format; the [:, :, ::-1] slice converts it to RGB
pil_image = Image.fromarray(image[:,:,::-1])

# preprocess image with normalization params loaded in previous steps
image = preprocess(pil_image, normalization)

# inference function needs some meta parameters, where we specify that we don't flip images in inference mode
im_meta = dict(flip=False)
# perform inference with function from repository
original_results = inference(model=pytorch_model,
                             ims=[image],
                             ims_metas=[im_meta],
                             ori_shape=image.shape[2:4],
                             window_size=config["inference_kwargs"]["window_size"],
                             window_stride=config["inference_kwargs"]["window_stride"],
                             batch_size=2)

After inference is done, we need to transform the output into a segmentation mask with each class shown in its assigned color, using the helper function from the previous steps.

# combine segmentation mask with image
blended_image = apply_segmentation_mask(pil_image, original_results)

# show image with segmentation mask overlay
blended_image
../_images/204-segmenter-semantic-segmentation-with-output_21_0.png

We can see that the model segments the image into meaningful parts. Since we are using the very small model, the results are not as good as with larger models, but they already show good segmentation performance.

Convert the PyTorch model to OpenVINO Intermediate Representation (IR)

Now that we have verified that inference with the PyTorch model works, we will convert it to the OpenVINO IR format.

To do this, we take the input dimensions from the model configuration file and create a torch dummy input. The input dimensions are [2, 3, 512, 512], in [B, C, H, W] format, where:

  • B - batch size

  • C - image channels (RGB, so 3)

  • H - model input image height

  • W - model input image width

Note that H and W are fixed to 512 here, since the model requires it. Resizing is performed inside the inference function from the original repository.

After that, we convert the PyTorch model to an OpenVINO model with the ov.convert_model function. The resulting model is ready to use in the Python interface right away, and it can also be serialized to the OpenVINO IR format with ov.save_model for future execution. This process may generate some warnings, which are not a problem.

import openvino as ov

# get input sizes from config file
batch_size = 2
channels = 3
image_size = config["dataset_kwargs"]["image_size"]

# make dummy input with correct shapes obtained from config file
dummy_input = torch.randn(batch_size, channels, image_size, image_size)

model = ov.convert_model(pytorch_model, example_input=dummy_input, input=([batch_size, channels, image_size, image_size], ))
# serialize model for saving IR
ov.save_model(model, MODEL_DIR / "segmenter.xml")
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/notebooks/204-segmenter-semantic-segmentation/./segmenter/segm/model/utils.py:69: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if H % patch_size > 0:
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/notebooks/204-segmenter-semantic-segmentation/./segmenter/segm/model/utils.py:71: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if W % patch_size > 0:
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/notebooks/204-segmenter-semantic-segmentation/./segmenter/segm/model/vit.py:122: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if x.shape[1] != pos_embed.shape[1]:
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/notebooks/204-segmenter-semantic-segmentation/./segmenter/segm/model/decoder.py:100: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  masks = rearrange(masks, "b (h w) n -> b n h w", h=int(GS))
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/notebooks/204-segmenter-semantic-segmentation/./segmenter/segm/model/utils.py:85: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if extra_h > 0:
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/notebooks/204-segmenter-semantic-segmentation/./segmenter/segm/model/utils.py:87: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if extra_w > 0:

Validate inference of the converted model

To verify that the model was converted successfully, we can use the same inference function as for the original repository, but we need to create a custom class first.

The SegmenterOV class wraps the OpenVINO model with all the attributes and methods required by the inference function. This way, we do not need to write additional custom code to process the inputs.

class SegmenterOV:
    """
    Class containing OpenVINO model with all attributes required to work with inference function.

    :param model: compiled OpenVINO model
    :type model: CompiledModel
    :param output_blob: output blob used in inference
    :type output_blob: ConstOutput
    :param config: config file containing data about model and its requirements
    :type config: dict
    :param n_cls: number of classes to be predicted
    :type n_cls: int
    :param normalization: normalization parameters (mean and std) for the model's dataset
    :type normalization: dict

    """

    def __init__(self, model_path: Path, device: str = "CPU"):
        """
        Constructor method.
        Initializes OpenVINO model and sets all required attributes

        :param model_path: path to model's .xml file, also containing variant.yml
        :param device: device string for selecting inference device
        """
        # init OpenVINO core
        core = ov.Core()
        # read model
        model_xml = core.read_model(model_path)
        self.model = core.compile_model(model_xml, device)
        self.output_blob = self.model.output(0)

        # load model configs
        variant_path = Path(model_path).parent / "variant.yml"
        with open(variant_path, "r") as f:
            self.config = yaml.load(f, Loader=yaml.FullLoader)

        # load normalization specs from config
        normalization_name = self.config["dataset_kwargs"]["normalization"]
        self.normalization = STATS[normalization_name]

        # load number of classes from config
        self.n_cls = self.config["net_kwargs"]["n_cls"]

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        """
        Perform inference on data and return the result in Tensor format

        :param data: input data to model
        :return: data inferred by model
        """
        return torch.from_numpy(self.model(data)[self.output_blob])

Now that we have created the SegmenterOV helper class, we can use it in the inference function.

Select inference device

Select a device from the dropdown list to run inference with OpenVINO.

import ipywidgets as widgets

core = ov.Core()
device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='AUTO',
    description='Device:',
    disabled=False,
)

device
Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')
# load model into SegmenterOV class
model = SegmenterOV(MODEL_DIR / "segmenter.xml", device.value)
# perform inference with same function as in case of PyTorch model from repository
results = inference(model=model,
                    ims=[image],
                    ims_metas=[im_meta],
                    ori_shape=image.shape[2:4],
                    window_size=model.config["inference_kwargs"]["window_size"],
                    window_stride=model.config["inference_kwargs"]["window_stride"],
                    batch_size=2)
# combine segmentation mask with image
converted_blend = apply_segmentation_mask(pil_image, results)

# show image with segmentation mask overlay
converted_blend
../_images/204-segmenter-semantic-segmentation-with-output_32_0.png

As you can see, we get the same results as with the original model.

Benchmark performance of the converted model

Finally, we measure the inference performance of the model with the OpenVINO Benchmark Tool.

NOTE: For more accurate performance, it is recommended to close other applications and run benchmark_app in a terminal/command prompt. Run benchmark_app -m model.xml -d CPU to benchmark async inference on CPU for one minute. Change CPU to GPU to benchmark on GPU. Run benchmark_app --help to see all command-line options.
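For instance, a one-minute latency-oriented run on GPU could look like this (assuming a GPU device is available and the model path matches your setup):

benchmark_app -m model/segmenter.xml -d GPU -hint latency -t 60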

Keep in mind that the authors of the paper used a V100 GPU, which is significantly more powerful than a CPU, to obtain their throughput numbers, so the FPS values cannot be compared directly.

device
Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')
# Inference FP32 model (OpenVINO IR)
!benchmark_app -m ./model/segmenter.xml -d $device.value -api async
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ WARNING ] Default duration 120 seconds is used for unknown device AUTO
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2023.3.0-13775-ceeafaf64f3-releases/2023/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] AUTO
[ INFO ] Build ................................. 2023.3.0-13775-ceeafaf64f3-releases/2023/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(AUTO) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 23.09 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     im (node: im) : f32 / [...] / [2,3,512,512]
[ INFO ] Model outputs:
[ INFO ]     y (node: aten::upsample_bilinear2d/Interpolate) : f32 / [...] / [2,150,512,512]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 2
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     im (node: im) : u8 / [N,C,H,W] / [2,3,512,512]
[ INFO ] Model outputs:
[ INFO ]     y (node: aten::upsample_bilinear2d/Interpolate) : f32 / [...] / [2,150,512,512]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 385.39 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: Model0
[ INFO ]   EXECUTION_DEVICES: ['CPU']
[ INFO ]   PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6
[ INFO ]   MULTI_DEVICE_PRIORITIES: CPU
[ INFO ]   CPU:
[ INFO ]     AFFINITY: Affinity.CORE
[ INFO ]     CPU_DENORMALS_OPTIMIZATION: False
[ INFO ]     CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ]     ENABLE_CPU_PINNING: True
[ INFO ]     ENABLE_HYPER_THREADING: True
[ INFO ]     EXECUTION_DEVICES: ['CPU']
[ INFO ]     EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ]     INFERENCE_NUM_THREADS: 24
[ INFO ]     INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]     NETWORK_NAME: Model0
[ INFO ]     NUM_STREAMS: 6
[ INFO ]     OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6
[ INFO ]     PERFORMANCE_HINT: THROUGHPUT
[ INFO ]     PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]     PERF_COUNT: NO
[ INFO ]     SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ]   MODEL_PRIORITY: Priority.MEDIUM
[ INFO ]   LOADED_FROM_CACHE: False
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'im'!. This input will be filled with random values!
[ INFO ] Fill input 'im' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 6 inference requests, limits: 120000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 210.45 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['CPU']
[ INFO ] Count:            1686 iterations
[ INFO ] Duration:         120531.12 ms
[ INFO ] Latency:
[ INFO ]    Median:        429.25 ms
[ INFO ]    Average:       428.34 ms
[ INFO ]    Min:           354.96 ms
[ INFO ]    Max:           506.55 ms
[ INFO ] Throughput:   27.98 FPS