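Universal Segmentation with OneFormer and OpenVINO¶

This Jupyter notebook can be launched only after a local installation.

This tutorial demonstrates how to use the OneFormer model from Hugging Face with OpenVINO. It shows how to download the weights and create a PyTorch model with the Hugging Face Transformers library, convert the model to the OpenVINO Intermediate Representation (IR) format with the OpenVINO model conversion API (openvino.convert_model), and run model inference. In addition, NNCF quantization is applied to speed up OneFormer segmentation.

OneFormer is a follow-up to Mask2Former. Mask2Former still has to be trained separately on instance, semantic, and panoptic datasets to reach state-of-the-art results.

OneFormer adds a text module to the Mask2Former framework to condition the model on the respective subtask (instance, semantic, or panoptic). This gives even more accurate results, at the cost of increased latency.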

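Table of contents¶

Install required libraries
Prepare the environment
Load OneFormer fine-tuned on COCO for universal segmentation
Convert the model to OpenVINO IR format
Select inference device
Choose a segmentation task
Inference
Quantization
Interactive demo

Install required libraries¶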

%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu "transformers>=4.26.0" "openvino>=2023.1.0" "nncf>=2.6.0" gradio torch scipy ipywidgets Pillow matplotlib

                                    

                                        Note: you may need to restart the kernel to use updated packages.

                                    

Prepare the environment¶

Import all required packages and set the model path and constant variables.

import warnings
from collections import defaultdict
from pathlib import Path
import sys

from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from transformers.models.oneformer.modeling_oneformer import OneFormerForUniversalSegmentationOutput
import torch
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from PIL import Image
from PIL import ImageOps

import openvino

sys.path.append("../utils")
from notebook_utils import download_file

                                    

IR_PATH = Path("oneformer.xml")
OUTPUT_NAMES = ['class_queries_logits', 'masks_queries_logits']

Load OneFormer fine-tuned on COCO for universal segmentation¶

Here we use the from_pretrained method of OneFormerForUniversalSegmentation to load the Hugging Face OneFormer model with a Swin-L backbone, trained on the COCO dataset.

We also use the Hugging Face processor to prepare the model inputs from images and to post-process the model outputs for visualization.

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_coco_swin_large",
)
id2label = model.config.id2label

                                    

2023-10-06 14:00:53.306851: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-10-06 14:00:53.342792: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-06 14:00:53.913248: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/home/nsavel/venvs/ov_notebooks_tmp/lib/python3.8/site-packages/transformers/models/oneformer/image_processing_oneformer.py:427: FutureWarning: The reduce_labels argument is deprecated and will be removed in v4.27. Please use do_reduce_labels instead.
  warnings.warn(

task_seq_length = processor.task_seq_length
shape = (800, 800)
dummy_input = {
    "pixel_values": torch.randn(1, 3, *shape),
    "task_inputs": torch.randn(1, task_seq_length)
}

                                    

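Convert the model to OpenVINO IR format¶

Convert the PyTorch model to IR format to take advantage of OpenVINO optimization tools and features. The openvino.convert_model Python function of the OpenVINO Converter performs the conversion and returns an instance of the OpenVINO Model class, which is ready to use from the Python interface. The model can also be serialized to OpenVINO IR format for future runs with the save_model function. PyTorch to OpenVINO conversion is based on TorchScript tracing. Hugging Face models have a dedicated configuration parameter, torchscript, that makes the model suitable for tracing. To prepare the model, pass the PyTorch model instance and an example input to openvino.convert_model.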

model.config.torchscript = True

if not IR_PATH.exists():
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        model = openvino.convert_model(model, example_input=dummy_input)
    openvino.save_model(model, IR_PATH, compress_to_fp16=False)

                                    

                                        WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.base has been moved to tensorflow.python.trackable.base. The old module will be deleted in version 2.11.

                                    

                                        [ WARNING ]  Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s.
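
The conversion cell above runs only when oneformer.xml is missing. The following is a minimal sketch of that convert-once pattern, assuming an OpenVINO IR is stored as an .xml topology file with a matching .bin weights file. The helpers ir_file_pair and convert_once and the two stand-in callables are hypothetical and are not part of OpenVINO. In the notebook, the roles of convert and save are played by openvino.convert_model and openvino.save_model. Unlike the cell above, the sketch also checks that the .bin file exists.

```python
import tempfile
from pathlib import Path


def ir_file_pair(xml_path):
    """Return the (.xml, .bin) paths that together make up one IR model."""
    xml_path = Path(xml_path)
    return xml_path, xml_path.with_suffix(".bin")


def convert_once(xml_path, convert, save):
    """Call convert() and save() only if the IR files are not on disk yet.

    Returns True when a conversion was performed, False when it was skipped.
    """
    xml_file, bin_file = ir_file_pair(xml_path)
    if xml_file.exists() and bin_file.exists():
        return False
    save(convert(), xml_file)
    return True


# Hypothetical stand-ins so the sketch runs without OpenVINO installed.
def fake_convert():
    return "converted-model"


def fake_save(model, xml_file):
    xml_file.write_text(model)
    xml_file.with_suffix(".bin").write_bytes(b"weights")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "oneformer.xml"
    first_run = convert_once(target, fake_convert, fake_save)
    second_run = convert_once(target, fake_convert, fake_save)
    print(first_run, second_run)
```

The second call is skipped because both files already exist, which is why re-running the notebook does not repeat the slow tracing step.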

                                    

Select inference device¶

Select the device to run inference on with OpenVINO from the dropdown list.

import ipywidgets as widgets

core = openvino.Core()

device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='AUTO',
    description='Device:',
    disabled=False,
)

device

                                    

                                        Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')

                                    

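The image can be prepared with the Hugging Face processor. Internally, the OneFormer processor combines an image processor (for the image modality) and a tokenizer (for the text modality). OneFormer is in fact a multimodal model, because it takes both an image and text to solve image segmentation.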

def prepare_inputs(image: Image.Image, task: str):
    """Convert image to model input"""
    image = ImageOps.pad(image, shape)
    inputs = processor(image, [task], return_tensors="pt")
    converted = {
        'pixel_values': inputs['pixel_values'],
        'task_inputs': inputs['task_inputs']
    }
    return converted
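
As a rough illustration of what ImageOps.pad(image, shape) does to the image geometry in prepare_inputs, the sketch below computes the resized size and the paste offset for an aspect-preserving, centered pad. The helper pad_geometry is hypothetical and only approximates Pillow's behavior (resize to fit while keeping the aspect ratio, then center on the target canvas); Pillow's exact rounding may differ by a pixel. The 640x480 input size is an assumed example, not the size of sample.jpg.

```python
def pad_geometry(image_size, target_size):
    """Return (resized_size, paste_offset) for an aspect-preserving, centered pad.

    Both sizes are (width, height) tuples, following the PIL convention.
    """
    width, height = image_size
    target_w, target_h = target_size
    # The smaller ratio is the largest scale that still fits inside the target.
    scale = min(target_w / width, target_h / height)
    new_w = min(target_w, round(width * scale))
    new_h = min(target_h, round(height * scale))
    # Center the resized image; the remaining area is padding.
    offset = ((target_w - new_w) // 2, (target_h - new_h) // 2)
    return (new_w, new_h), offset


resized, offset = pad_geometry((640, 480), (800, 800))
print(resized, offset)  # (800, 600) (0, 100)
```

A landscape image therefore gets padding bars at the top and bottom, and a portrait image gets them on the left and right.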

                                    

def process_output(d):
    """Convert OpenVINO model output to Hugging Face representation for visualization"""
    hf_kwargs = {
        output_name: torch.tensor(d[output_name]) for output_name in OUTPUT_NAMES
    }

    return OneFormerForUniversalSegmentationOutput(**hf_kwargs)

                                    

# Read the model from files.
model = core.read_model(model=IR_PATH)
# Compile the model.
compiled_model = core.compile_model(model=model, device_name=device.value)

                                    

The model predicts class_queries_logits of shape (batch_size, num_queries, num_labels + 1), where the extra entry is the null class, and masks_queries_logits of shape (batch_size, num_queries, height, width).
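
To give an intuition for how the two outputs relate, here is a simplified pure-Python sketch that combines per-query class scores and per-query masks into a semantic label map. It assumes that each query carries one score per label plus a trailing null class, and that the label score at a pixel is the sum over queries of class probability times mask probability. The function semantic_map and the toy inputs are hypothetical; the Hugging Face post-processing used later in this notebook also resizes the masks to the target size and is not reproduced here.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def semantic_map(class_logits, mask_logits):
    """Combine per-query outputs into a per-pixel label map.

    class_logits: [num_queries][num_labels + 1], last entry is the null class.
    mask_logits:  [num_queries][height][width].
    """
    # Class probabilities per query, with the trailing null class dropped.
    class_probs = [softmax(row)[:-1] for row in class_logits]
    num_queries = len(class_logits)
    num_labels = len(class_probs[0])
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    label_map = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of each label at this pixel: sum over queries of
            # class probability times mask probability.
            scores = [
                sum(
                    class_probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries)
                )
                for c in range(num_labels)
            ]
            row.append(scores.index(max(scores)))
        label_map.append(row)
    return label_map


# Two queries, two labels plus the null class, and a 1x2 "image".
class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
mask_logits = [[[4.0, -4.0]], [[-4.0, 4.0]]]
label_map = semantic_map(class_logits, mask_logits)
print(label_map)  # [[0, 1]]
```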

Here we define functions that visualize the network outputs to show the inference results.

class Visualizer:
    @staticmethod
    def extract_legend(handles):
        fig = plt.figure()
        fig.legend(handles=handles, ncol=len(handles) // 20 + 1, loc='center')
        fig.tight_layout()
        return fig

    @staticmethod
    def predicted_semantic_map_to_figure(predicted_map):
        segmentation = predicted_map[0]
        # get the used color map
        viridis = plt.get_cmap('viridis', max(1, torch.max(segmentation)))
        # get all the unique numbers
        labels_ids = torch.unique(segmentation).tolist()
        fig, ax = plt.subplots()
        ax.imshow(segmentation)
        ax.set_axis_off()
        handles = []
        for label_id in labels_ids:
            label = id2label[label_id]
            color = viridis(label_id)
            handles.append(mpatches.Patch(color=color, label=label))
        fig_legend = Visualizer.extract_legend(handles=handles)
        fig.tight_layout()
        return fig, fig_legend

    @staticmethod
    def predicted_instance_map_to_figure(predicted_map):
        segmentation = predicted_map[0]['segmentation']
        segments_info = predicted_map[0]['segments_info']
        # get the used color map
        viridis = plt.get_cmap('viridis', max(torch.max(segmentation), 1))
        fig, ax = plt.subplots()
        ax.imshow(segmentation)
        ax.set_axis_off()
        instances_counter = defaultdict(int)
        handles = []
        # for each segment, draw its legend
        for segment in segments_info:
            segment_id = segment['id']
            segment_label_id = segment['label_id']
            segment_label = id2label[segment_label_id]
            label = f"{segment_label}-{instances_counter[segment_label_id]}"
            instances_counter[segment_label_id] += 1
            color = viridis(segment_id)
            handles.append(mpatches.Patch(color=color, label=label))

        fig_legend = Visualizer.extract_legend(handles)
        fig.tight_layout()
        return fig, fig_legend

    @staticmethod
    def predicted_panoptic_map_to_figure(predicted_map):
        segmentation = predicted_map[0]['segmentation']
        segments_info = predicted_map[0]['segments_info']
        # get the used color map
        viridis = plt.get_cmap('viridis', max(torch.max(segmentation), 1))
        fig, ax = plt.subplots()
        ax.imshow(segmentation)
        ax.set_axis_off()
        instances_counter = defaultdict(int)
        handles = []
        # for each segment, draw its legend
        for segment in segments_info:
            segment_id = segment['id']
            segment_label_id = segment['label_id']
            segment_label = id2label[segment_label_id]
            label = f"{segment_label}-{instances_counter[segment_label_id]}"
            instances_counter[segment_label_id] += 1
            color = viridis(segment_id)
            handles.append(mpatches.Patch(color=color, label=label))

        fig_legend = Visualizer.extract_legend(handles)
        fig.tight_layout()
        return fig, fig_legend

    @staticmethod
    def figures_to_images(fig, fig_legend, name_suffix=""):
        seg_filename, leg_filename = f"segmentation{name_suffix}.png", f"legend{name_suffix}.png"
        fig.savefig(seg_filename, bbox_inches="tight")
        fig_legend.savefig(leg_filename, bbox_inches="tight")
        segmentation = Image.open(seg_filename)
        legend = Image.open(leg_filename)
        return segmentation, legend
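
The instance and panoptic visualizers name each legend entry by its class label plus a running per-class counter. The sketch below isolates that naming logic so it can be tried without matplotlib. The function legend_labels and the sample segments_info and label mapping are made up for illustration.

```python
from collections import defaultdict


def legend_labels(segments_info, id2label):
    """Build legend names such as 'person-0', 'person-1' in segment order."""
    counter = defaultdict(int)
    labels = []
    for segment in segments_info:
        label_id = segment["label_id"]
        labels.append(f"{id2label[label_id]}-{counter[label_id]}")
        counter[label_id] += 1
    return labels


# Made-up sample data in the same layout the visualizer reads.
segments = [
    {"id": 1, "label_id": 0},
    {"id": 2, "label_id": 0},
    {"id": 3, "label_id": 2},
]
names = {0: "person", 2: "car"}
labels = legend_labels(segments, names)
print(labels)  # ['person-0', 'person-1', 'car-0']
```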

                                    

import gradio as gr  # gr.Error is raised in segment() before the demo cell imports gradio


def segment(model, img: Image.Image, task: str):
    """
    Apply segmentation on an image.

    Args:
        model: Compiled OpenVINO model used for inference.
        img: Input image. It will be padded and resized to 800x800.
        task: String describing the segmentation task. Supported values are: "semantic", "instance" and "panoptic".
    Returns:
        Tuple[Figure, Figure]: Segmentation map and legend charts.
    """
    if img is None:
        raise gr.Error("Please load the image or use one from the examples list")
    inputs = prepare_inputs(img, task)
    outputs = model(inputs)
    hf_output = process_output(outputs)
    predicted_map = getattr(processor, f"post_process_{task}_segmentation")(
        hf_output, target_sizes=[img.size[::-1]]
    )
    return getattr(Visualizer, f"predicted_{task}_map_to_figure")(predicted_map)
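
segment picks the post-processing method by name from the task string and passes the image size in (height, width) order, while PIL reports (width, height). The sketch below shows the same dispatch with an explicit task check. StandInProcessor is a hypothetical replacement for the Hugging Face processor, used only so the example runs on its own; the 640x427 size is an assumed example.

```python
SUPPORTED_TASKS = ("semantic", "instance", "panoptic")


class StandInProcessor:
    """Hypothetical stand-in for the Hugging Face processor, for illustration only."""

    def post_process_semantic_segmentation(self, outputs, target_sizes):
        return "semantic", target_sizes

    def post_process_instance_segmentation(self, outputs, target_sizes):
        return "instance", target_sizes

    def post_process_panoptic_segmentation(self, outputs, target_sizes):
        return "panoptic", target_sizes


def post_process(processor, task, outputs, pil_size):
    """Dispatch to post_process_<task>_segmentation with a (height, width) size."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported task: {task!r}")
    width, height = pil_size  # PIL reports (width, height)
    method = getattr(processor, f"post_process_{task}_segmentation")
    return method(outputs, target_sizes=[(height, width)])


result = post_process(StandInProcessor(), "panoptic", None, (640, 427))
print(result)  # ('panoptic', [(427, 640)])
```

Checking the task up front turns a typo into a clear ValueError instead of an AttributeError from getattr.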

                                    

image = download_file("http://images.cocodataset.org/val2017/000000439180.jpg", "sample.jpg")
image = Image.open("sample.jpg")
image

                                    

sample.jpg:   0%|          | 0.00/194k [00:00<?, ?B/s]

../_images/249-oneformer-segmentation-with-output_23_1.png

Choose a segmentation task¶

from ipywidgets import Dropdown

task = Dropdown(options=["semantic", "instance", "panoptic"], value="semantic")
task

                                        Dropdown(options=('semantic', 'instance', 'panoptic'), value='semantic')

                                    

Inference¶

import matplotlib
matplotlib.use("Agg")  # disable showing figures

def stack_images_horizontally(img1: Image.Image, img2: Image.Image):
    # Paste both images side by side on a white canvas.
    res = Image.new("RGB", (img1.width + img2.width, max(img1.height, img2.height)), (255, 255, 255))
    res.paste(img1, (0, 0))
    res.paste(img2, (img1.width, 0))
    return res

segmentation_fig, legend_fig = segment(compiled_model, image, task.value)
segmentation_image, legend_image = Visualizer.figures_to_images(segmentation_fig, legend_fig)
plt.close("all")
prediction = stack_images_horizontally(segmentation_image, legend_image)
prediction

                                    

../_images/249-oneformer-segmentation-with-output_27_0.png

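Quantization¶

NNCF enables post-training quantization by adding quantization layers to the model graph and then using a subset of the training dataset to initialize the parameters of these additional layers. Quantized operations are executed in INT8 instead of FP32/FP16, which makes model inference faster.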
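
To make the INT8 idea concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is an illustration only: the functions quantize_int8 and dequantize and the sample weights are hypothetical, and the scheme NNCF actually applies (for example per-channel or asymmetric quantization) may differ.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.3, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error per value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-9)
```

Each value is stored as one byte plus a shared scale, which is where the smaller footprint of an INT8 model comes from.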

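The optimization process contains the following steps:

1. Create a calibration dataset for quantization.
2. Run nncf.quantize() to obtain a quantized model.
3. Serialize the INT8 model with the openvino.save_model() function.

Note: Quantization is a time- and memory-consuming operation. Running the quantization code below may take some time.

Select below whether to run quantization to improve the model inference speed.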

compiled_quantized_model = None

to_quantize = widgets.Checkbox(
    value=True,
    description='Quantization',
    disabled=False,
)

to_quantize

                                    

                                        Checkbox(value=True, description='Quantization')

                                    

Load the skip magic extension so that quantization is skipped when to_quantize is not selected.

import sys
sys.path.append("../utils")

%load_ext skip_kernel_extension

Prepare the calibration dataset¶

We use images from the COCO128 dataset as calibration samples.

%%skip not $to_quantize.value

import nncf
import torch.utils.data as data

from zipfile import ZipFile

DATA_URL = "https://ultralytics.com/assets/coco128.zip"
OUT_DIR = Path('.')


class COCOLoader(data.Dataset):
    def __init__(self, images_path):
        self.images = list(Path(images_path).iterdir())

    def __getitem__(self, index):
        image = Image.open(self.images[index])
        if image.mode != "RGB":
            # Convert grayscale (and any other non-RGB) images to three channels.
            image = image.convert("RGB")
        return image

    def __len__(self):
        return len(self.images)


def download_coco128_dataset():
    download_file(DATA_URL, directory=OUT_DIR, show_progress=True)
    if not (OUT_DIR / "coco128/images/train2017").exists():
        with ZipFile('coco128.zip' , "r") as zip_ref:
            zip_ref.extractall(OUT_DIR)
    coco_dataset = COCOLoader(OUT_DIR / 'coco128/images/train2017')
    return coco_dataset


def transform_fn(image):
    # We quantize model in panoptic mode because it produces optimal results for both semantic and instance segmentation tasks
    inputs = prepare_inputs(image, "panoptic")
    return inputs


coco_dataset = download_coco128_dataset()
calibration_dataset = nncf.Dataset(coco_dataset, transform_fn)

                                        

                                            INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino

                                        

coco128.zip:   0%|          | 0.00/6.66M [00:00<?, ?B/s]
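
The cell above pairs a data source with a transform function that turns each item into model inputs. The toy class below mimics only that idea so the flow can be followed without NNCF installed. ToyCalibrationDataset, its method iter_model_inputs, and the string placeholders are hypothetical and do not reproduce the nncf.Dataset implementation.

```python
class ToyCalibrationDataset:
    """Hypothetical stand-in: applies transform_fn to each item on the fly."""

    def __init__(self, data_source, transform_fn=None):
        self.data_source = data_source
        self.transform_fn = transform_fn or (lambda item: item)

    def iter_model_inputs(self):
        # Items are transformed lazily, one at a time, as they are requested.
        for item in self.data_source:
            yield self.transform_fn(item)


def toy_transform_fn(image_name):
    # Placeholder values; the notebook returns real tensors for both keys.
    return {"pixel_values": f"pixels({image_name})", "task_inputs": "panoptic"}


dataset = ToyCalibrationDataset(["a.jpg", "b.jpg"], toy_transform_fn)
samples = list(dataset.iter_model_inputs())
print(samples[0])
```

The keys of each sample match the model input names, which is how the calibration data is routed to the right inputs in this sketch.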

Run quantization¶

Below we call nncf.quantize() to apply quantization to the OneFormer model.

%%skip not $to_quantize.value

INT8_IR_PATH = Path(str(IR_PATH).replace(".xml", "_int8.xml"))

if not INT8_IR_PATH.exists():
    quantized_model = nncf.quantize(
        model,
        calibration_dataset,
        model_type=nncf.parameters.ModelType.TRANSFORMER,
        preset=nncf.QuantizationPreset.MIXED,
        subset_size=len(coco_dataset),
        # smooth_quant_alpha value of 0.5 was selected based on prediction quality visual examination
        advanced_parameters=nncf.AdvancedQuantizationParameters(smooth_quant_alpha=0.5))
    openvino.save_model(quantized_model, INT8_IR_PATH)
else:
    quantized_model = core.read_model(INT8_IR_PATH)
compiled_quantized_model = core.compile_model(model=quantized_model, device_name=device.value)

                                        

                                            Statistics collection: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [03:55<00:00,  1.84s/it]
Applying Smooth Quant: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 216/216 [00:18<00:00, 11.89it/s]

                                            INFO:nncf:105 ignored nodes was found by name in the NNCFGraph

                                        

                                            Statistics collection: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [09:24<00:00,  4.41s/it]
Applying Fast Bias correction: 100%|██████████████████████████████████████████████████████████████████████████████████████| 338/338 [03:20<00:00,  1.68it/s]
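
The quantization call sets smooth_quant_alpha=0.5. As a hedged illustration of what such an alpha controls, the sketch below follows the commonly described SmoothQuant formulation, in which each input channel j is divided by a scale s_j = max|X_j|^alpha / max|W_j|^(1 - alpha) on the activation side and multiplied by it on the weight side. This is an assumption about the general technique, not a reproduction of NNCF's implementation; the function names and the toy matrices are made up, and non-zero channel maxima are assumed.

```python
def smoothing_scales(activations, weights, alpha=0.5):
    """Per-input-channel scales s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    scales = []
    for j in range(len(weights)):
        act_max = max(abs(row[j]) for row in activations)
        weight_max = max(abs(w) for w in weights[j])
        scales.append(act_max ** alpha / weight_max ** (1 - alpha))
    return scales


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


activations = [[1.0, 100.0], [2.0, -50.0]]  # rows: tokens, columns: input channels
weights = [[0.5, -0.2], [0.01, 0.03]]       # rows: input channels, columns: outputs

scales = smoothing_scales(activations, weights, alpha=0.5)
smoothed_acts = [[x / s for x, s in zip(row, scales)] for row in activations]
smoothed_weights = [[w * s for w in row] for row, s in zip(weights, scales)]

# The product is unchanged, but the activation outlier is much smaller.
original = matmul(activations, weights)
smoothed = matmul(smoothed_acts, smoothed_weights)
largest_before = max(abs(x) for row in activations for x in row)
largest_after = max(abs(x) for row in smoothed_acts for x in row)
print(largest_before, round(largest_after, 3))
```

Moving part of the activation range into the weights keeps the layer output the same while making the activations easier to represent in INT8.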

Let's look at the quantized model prediction next to the original model prediction.

%%skip not $to_quantize.value

from IPython.display import display

image = Image.open("sample.jpg")
segmentation_fig, legend_fig = segment(compiled_quantized_model, image, task.value)
segmentation_image, legend_image = Visualizer.figures_to_images(segmentation_fig, legend_fig, name_suffix="_int8")
plt.close("all")
prediction_int8 = stack_images_horizontally(segmentation_image, legend_image)
print("Original model prediction:")
display(prediction)
print("Quantized model prediction:")
display(prediction_int8)

                                        

                                            Original model prediction:

                                        

../_images/249-oneformer-segmentation-with-output_39_1.png

                                            Quantized model prediction:

                                        

../_images/249-oneformer-segmentation-with-output_39_3.png

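Compare model size and performance¶

Below we compare the footprint and inference speed of the original and quantized models. The timing wraps the whole segment() call, so it includes pre- and post-processing as well as model inference.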

%%skip not $to_quantize.value

import time
import numpy as np
from tqdm.auto import tqdm

INFERENCE_TIME_DATASET_SIZE = 30

def calculate_compression_rate(model_path_ov, model_path_ov_int8):
    model_size_fp32 = model_path_ov.with_suffix(".bin").stat().st_size / 1024
    model_size_int8 = model_path_ov_int8.with_suffix(".bin").stat().st_size / 1024
    print("Model footprint comparison:")
    print(f"    * FP32 IR model size: {model_size_fp32:.2f} KB")
    print(f"    * INT8 IR model size: {model_size_int8:.2f} KB")
    return model_size_fp32, model_size_int8


def calculate_call_inference_time(model):
    inference_time = []
    for i in tqdm(range(INFERENCE_TIME_DATASET_SIZE), desc="Measuring performance"):
        image = coco_dataset[i]
        start = time.perf_counter()
        segment(model, image, task.value)
        end = time.perf_counter()
        delta = end - start
        inference_time.append(delta)
    return np.median(inference_time)


time_fp32 = calculate_call_inference_time(compiled_model)
time_int8 = calculate_call_inference_time(compiled_quantized_model)

model_size_fp32, model_size_int8 = calculate_compression_rate(IR_PATH, INT8_IR_PATH)

print(f"Model footprint reduction: {model_size_fp32 / model_size_int8:.3f}")
print(f"Performance speedup: {time_fp32 / time_int8:.3f}")

                                        

Measuring performance:   0%|          | 0/30 [00:00<?, ?it/s]

Measuring performance:   0%|          | 0/30 [00:00<?, ?it/s]

                                            Model footprint comparison:
    * FP32 IR model size: 899385.45 KB
    * INT8 IR model size: 237545.83 KB
Model footprint reduction: 3.786
Performance speedup: 1.260
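
The two ratios above can be reproduced with a few lines of standard-library Python. The sketch below recomputes the footprint reduction from the sizes printed above and shows a median-based timing helper similar in spirit to calculate_call_inference_time. The helper names are hypothetical, and the timed workload is a placeholder rather than a model call.

```python
import statistics
import time


def median_latency(fn, runs=30):
    """Median wall-clock time of fn() over several runs, in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Sizes in KB as printed in the output above.
size_fp32_kb = 899385.45
size_int8_kb = 237545.83
footprint_reduction = size_fp32_kb / size_int8_kb
print(f"Model footprint reduction: {footprint_reduction:.3f}")  # 3.786

# Placeholder workload standing in for segment(model, image, task).
latency = median_latency(lambda: sum(range(1000)), runs=5)
print(f"Median latency: {latency:.6f} s")
```

The median is less sensitive than the mean to a few slow runs, such as the first call after compilation.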

                                        

Interactive demo¶

import time
import gradio as gr

quantized_model_present = compiled_quantized_model is not None


def compile_model(device):
    global compiled_model
    global compiled_quantized_model
    compiled_model = core.compile_model(model=model, device_name=device)
    if quantized_model_present:
        compiled_quantized_model = core.compile_model(model=quantized_model, device_name=device)

def segment_wrapper(image, task, run_quantized=False):
    current_model = compiled_quantized_model if run_quantized else compiled_model

    start_time = time.perf_counter()
    segmentation_fig, legend_fig = segment(current_model, image, task)
    end_time = time.perf_counter()

    name_suffix = "" if not quantized_model_present else "_int8" if run_quantized else "_fp32"
    segmentation_image, legend_image = Visualizer.figures_to_images(segmentation_fig, legend_fig, name_suffix=name_suffix)
    plt.close("all")
    result = stack_images_horizontally(segmentation_image, legend_image)
    return result, f"{end_time - start_time:.2f}"


with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            inp_img = gr.Image(label="Image", type="pil")
            inp_task = gr.Radio(
                ["semantic", "instance", "panoptic"], label="Task", value="semantic"
            )
            inp_device = gr.Dropdown(
                label="Device", choices=core.available_devices + ["AUTO"], value="AUTO"
            )
        with gr.Column():
            out_result = gr.Image(label="Result (Original)" if quantized_model_present else "Result")
            inference_time = gr.Textbox(label="Time (seconds)")
            out_result_quantized = gr.Image(label="Result (Quantized)", visible=quantized_model_present)
            inference_time_quantized = gr.Textbox(label="Time (seconds)", visible=quantized_model_present)
    run_button = gr.Button(value="Run")
    run_button.click(segment_wrapper, [inp_img, inp_task, gr.Number(0, visible=False)], [out_result, inference_time])
    run_quantized_button = gr.Button(value="Run quantized", visible=quantized_model_present)
    run_quantized_button.click(segment_wrapper, [inp_img, inp_task, gr.Number(1, visible=False)], [out_result_quantized, inference_time_quantized])
    gr.Examples(
        examples=[["sample.jpg", "semantic"]], inputs=[inp_img, inp_task]
    )


    def on_device_change_begin():
        return (
            run_button.update(value="Changing device...", interactive=False),
            run_quantized_button.update(value="Changing device...", interactive=False),
            inp_device.update(interactive=False)
        )

    def on_device_change_end():
        return (
            run_button.update(value="Run", interactive=True),
            run_quantized_button.update(value="Run quantized", interactive=True),
            inp_device.update(interactive=True)
        )

    inp_device.change(on_device_change_begin, outputs=[run_button, run_quantized_button, inp_device]).then(
        compile_model, inp_device
    ).then(on_device_change_end, outputs=[run_button, run_quantized_button, inp_device])

try:
    demo.launch(debug=False)
except Exception:
    demo.launch(share=True, debug=False)
# if you are launching remotely, specify server_name and server_port
# demo.launch(server_name='your server name', server_port='server port in int')
# Read more in the docs: https://gradio.app/docs/

                                    

Running on local URL:  http://127.0.0.1:7860

To create a public link, set share=True in launch().