Convert a TensorFlow Lite Model to OpenVINO™#

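TensorFlow Lite, often referred to as TFLite, is an open-source library developed for deploying machine learning models to edge devices.

This tutorial shows how to convert the TensorFlow Lite EfficientNet-Lite-B0 image classification model to the OpenVINO Intermediate Representation (OpenVINO IR) format using the model converter. After the OpenVINO IR is created, the model is loaded into OpenVINO Runtime and inference is run on a sample image.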

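Table of contents:

Preparation
- Install requirements
- Imports
Download the TFLite model
Convert the model to OpenVINO IR format
Load the model using the OpenVINO TensorFlow Lite frontend
Run OpenVINO model inference
- Select an inference device
Estimate model performance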

Preparation#

Install requirements#

%pip install -q "openvino>=2023.1.0"
%pip install -q opencv-python requests tqdm kagglehub Pillow

# Fetch `notebook_utils` module
import requests

r = requests.get(
    url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py",
)

open("notebook_utils.py", "w").write(r.text)

Note: you may need to restart the kernel to use updated packages.

Imports#

from pathlib import Path 
import numpy as np 
from PIL import Image 
import openvino as ov 

from notebook_utils import download_file, load_image

Download the TFLite model#

import kagglehub 

model_dir = kagglehub.model_download("tensorflow/efficientnet/tfLite/lite0-fp32")
tflite_model_path = Path(model_dir) / "2.tflite"

ov_model_path = tflite_model_path.with_suffix(".xml")
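
An OpenVINO IR model is stored as a pair of files: an .xml file that describes the topology and a .bin file that holds the weights. The following minimal sketch shows how both paths can be derived from the TFLite path with pathlib. The directory name is hypothetical and stands in for whatever kagglehub returns at runtime, and ov_weights_path is a name introduced here for illustration only.

```python
from pathlib import Path

# Hypothetical download directory; kagglehub returns the real one at runtime.
model_dir = Path("models/efficientnet-lite0")
tflite_model_path = model_dir / "2.tflite"

# ov.save_model receives the .xml path and writes the .bin weights file next to it.
ov_model_path = tflite_model_path.with_suffix(".xml")
ov_weights_path = tflite_model_path.with_suffix(".bin")

print(ov_model_path.name)    # 2.xml
print(ov_weights_path.name)  # 2.bin
```

Keeping the IR next to the downloaded model, as the tutorial does, means both files must be moved together if the model is relocated.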

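Convert the model to OpenVINO IR format#

To convert the TFLite model to OpenVINO IR, you can use the model conversion Python API. The ov.convert_model function accepts a path to the TFLite model and returns an OpenVINO Model class instance that represents this model. The obtained model is ready to use: it can be loaded on a device with ov.compile_model, or saved to disk with ov.save_model to reduce loading time on the next run. By default, model weights are compressed to FP16 during serialization by ov.save_model. For more information about model conversion, see this page. For TensorFlow Lite model support, see this tutorial.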

ov_model = ov.convert_model(tflite_model_path) 
ov.save_model(ov_model, ov_model_path) 
print(f"Model {tflite_model_path} successfully converted and saved to {ov_model_path}")

Model /opt/home/k8sworker/.cache/kagglehub/models/tensorflow/efficientnet/tfLite/lite0-fp32/2/2.tflite successfully converted and saved to /opt/home/k8sworker/.cache/kagglehub/models/tensorflow/efficientnet/tfLite/lite0-fp32/2/2.xml
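
The effect of the default FP16 weight compression can be illustrated without OpenVINO. The sketch below uses NumPy and a random array as a stand-in for real model weights (an assumption for illustration only). It shows that casting from float32 to float16 halves the storage size and introduces a small rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a float32 weight tensor (random values, not real model weights).
weights_fp32 = rng.uniform(-1.0, 1.0, size=10_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# float16 uses 2 bytes per value instead of 4.
size_ratio = weights_fp32.nbytes / weights_fp16.nbytes
# Rounding error introduced by the narrower format.
max_abs_error = float(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))

print(f"size ratio: {size_ratio:.1f}")
print(f"max abs rounding error: {max_abs_error:.2e}")
```

If the rounding error matters for a particular model, ov.save_model accepts compress_to_fp16=False to keep the weights in their original precision.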

Load the model using the OpenVINO TensorFlow Lite frontend#

TensorFlow Lite models are supported via the FrontEnd API. You can skip conversion to IR and read the model directly with the OpenVINO Runtime API. For more examples of formats that can be read via the frontend API, see this tutorial.

core = ov.Core()

# Read the .tflite file directly; no IR conversion is needed
ov_model = core.read_model(tflite_model_path)

Run OpenVINO model inference#

Information about preprocessing the model input can be found in the model description on TensorFlow Hub.

image = load_image("https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco_bricks.png")
# load_image reads the image in BGR format; the [:, :, ::-1] slice converts it to RGB
image = Image.fromarray(image[:, :, ::-1])
resized_image = image.resize((224, 224))
input_tensor = np.expand_dims((np.array(resized_image).astype(np.float32) - 127) / 128, 0)
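
The two array operations in the cell above can be checked on a tiny synthetic image. This is a minimal sketch that assumes only NumPy. The 2x2 image and the normalize helper are hypothetical and exist only to show the channel reversal, the (x - 127) / 128 scaling, and the added batch axis.

```python
import numpy as np

# Synthetic 2x2 BGR image with one distinct value per channel (B=10, G=20, R=30).
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0], bgr[..., 1], bgr[..., 2] = 10, 20, 30

# Reversing the last axis swaps the channel order from BGR to RGB.
rgb = bgr[:, :, ::-1]

def normalize(pixels: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels to roughly [-1, 1] and add a leading batch axis."""
    return np.expand_dims((pixels.astype(np.float32) - 127) / 128, 0)

batch = normalize(rgb)
print(rgb[0, 0].tolist())  # [30, 20, 10]
print(batch.shape)         # (1, 2, 2, 3)

# The extremes of the uint8 range map to -127/128 and 1.0.
extremes = normalize(np.array([[[0, 255, 127]]], dtype=np.uint8))
print(extremes.ravel().tolist())  # [-0.9921875, 1.0, 0.0]
```

For the real 224x224 input, the same steps produce a tensor of shape (1, 224, 224, 3), which matches the NHWC input layout reported in the benchmark log later in this tutorial.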

Select an inference device#

Select a device from the dropdown list to run inference with OpenVINO.

import ipywidgets as widgets 

device = widgets.Dropdown( 
    options=core.available_devices + ["AUTO"], 
    value="AUTO", 
    description="Device:", 
    disabled=False, 
) 

device

Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')

compiled_model = core.compile_model(ov_model, device.value) 
predicted_scores = compiled_model(input_tensor)[0]

imagenet_classes_file_path = download_file("https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/datasets/imagenet/imagenet_2012.txt")
imagenet_classes = open(imagenet_classes_file_path).read().splitlines() 

top1_predicted_cls_id = np.argmax(predicted_scores) 
top1_predicted_score = predicted_scores[0][top1_predicted_cls_id] 
predicted_label = imagenet_classes[top1_predicted_cls_id] 

display(image.resize((640, 512))) 
print(f"Predicted label: {predicted_label} with probability {top1_predicted_score :2f}")

imagenet_2012.txt: 0%|          | 0.00/30.9k [00:00<?, ?B/s]

../_images/tflite-to-openvino-with-output_16_1.png

Predicted label: n02109047 Great Dane with probability 0.715318
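
The cell above reports only the top-1 class. A top-k listing is often more informative, and it could be sketched as follows. The top_k helper, the scores, and the labels here are made up for illustration. In the tutorial, the inputs would be predicted_scores and imagenet_classes.

```python
import numpy as np

def top_k(scores: np.ndarray, labels: list, k: int = 5) -> list:
    """Return the k highest-scoring (label, score) pairs, best first."""
    flat = np.asarray(scores).reshape(-1)
    # argsort is ascending, so reverse it and keep the first k indices.
    best = np.argsort(flat)[::-1][:k]
    return [(labels[i], float(flat[i])) for i in best]

# Made-up scores and labels; real values come from the model output
# and from imagenet_2012.txt.
demo_scores = np.array([[0.05, 0.70, 0.10, 0.15]], dtype=np.float32)
demo_labels = ["cat", "dog", "fox", "owl"]

for label, score in top_k(demo_scores, demo_labels, k=3):
    print(f"{label}: {score:.2f}")
```

Flattening the scores first makes the helper work for both the (1, 1000) output used in this tutorial and a plain one-dimensional score vector.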

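Estimate model performance#

The Benchmark Tool is used to measure the inference performance of the model on CPU and GPU.

NOTE: For more accurate performance results, it is recommended to close other applications and run benchmark_app in a terminal/command prompt. Run benchmark_app -m model.xml -d CPU to benchmark asynchronous inference on CPU for one minute. To benchmark on GPU, change CPU to GPU. Run benchmark_app --help to see an overview of all command-line options.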
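
When the benchmark is launched from a script rather than a notebook cell, the command line can be assembled as an argument list. The sketch below is one possible way to do that. The benchmark_command helper is hypothetical, and the -hint option is an assumption about the installed benchmark_app version, so confirm it with benchmark_app --help before relying on it.

```python
import shlex
from typing import Optional

def benchmark_command(model_path: str, device: str = "CPU", seconds: int = 15,
                      hint: Optional[str] = None) -> list:
    """Build the argument list for a benchmark_app run."""
    command = ["benchmark_app", "-m", model_path, "-d", device, "-t", str(seconds)]
    if hint is not None:
        # Assumption: the installed benchmark_app accepts -hint (check --help).
        command += ["-hint", hint]
    return command

cmd = benchmark_command("model.xml", device="CPU", seconds=15, hint="latency")
print(shlex.join(cmd))
# The list can be passed to subprocess.run(cmd, check=True) to start the benchmark.
```

Passing a list instead of a single string avoids shell quoting problems when the model path contains spaces.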

print(f"Benchmark model inference on {device.value}") 
!benchmark_app -m $ov_model_path -d $device.value -t 15

Benchmark model inference on AUTO 
[Step 1/11] Parsing and validating input arguments 
[ INFO ] Parsing input parameters 
[Step 2/11] Loading OpenVINO Runtime 
[ INFO ] OpenVINO: 
[ INFO ] Build .................................2024.4.0-16028-fe423b97163 
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] AUTO 
[ INFO ] Build .................................2024.4.0-16028-fe423b97163 
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration 
[ WARNING ] Performance hint was not explicitly specified in command line. Device(AUTO) performance hint will be set to PerformanceMode.THROUGHPUT. 
[Step 4/11] Reading model files 
[ INFO ] Loading model files 
[ INFO ] Read model took 9.14 ms 
[ INFO ] Original model I/O parameters: 
[ INFO ] Model inputs: 
[ INFO ]     images (node: images) : f32 / [...]/ [1,224,224,3] 
[ INFO ] Model outputs: 
[ INFO ]     Softmax (node: 61) : f32 / [...] / [1,1000] 
[Step 5/11] Resizing model to match image sizes and given batch 
[ INFO ] Model batch size: 1 
[Step 6/11] Configuring input of the model 
[ INFO ] Model inputs: 
[ INFO ]     images (node: images) : u8 / [N,H,W,C] / [1,224,224,3] 
[ INFO ] Model outputs: 
[ INFO ]     Softmax (node: 61) : f32 / [...]/ [1,1000] 
[Step 7/11] Loading the model to the device 
[ INFO ] Compile model took 146.63 ms 
[Step 8/11] Querying optimal runtime parameters 
[ INFO ] Model: 
[ INFO ]     NETWORK_NAME: TensorFlow_Lite_Frontend_IR 
[ INFO ]     EXECUTION_DEVICES: ['CPU'] 
[ INFO ]     PERFORMANCE_HINT: PerformanceMode.THROUGHPUT 
[ INFO ]     OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6 
[ INFO ]     MULTI_DEVICE_PRIORITIES: CPU 
[ INFO ]     CPU: 
[ INFO ]       AFFINITY: Affinity.CORE 
[ INFO ]       CPU_DENORMALS_OPTIMIZATION: False 
[ INFO ]       CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0 
[ INFO ]       DYNAMIC_QUANTIZATION_GROUP_SIZE: 32 
[ INFO ]       ENABLE_CPU_PINNING: True 
[ INFO ]       ENABLE_HYPER_THREADING: True 
[ INFO ]       EXECUTION_DEVICES: ['CPU'] 
[ INFO ]       EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE 
[ INFO ]       INFERENCE_NUM_THREADS: 24 
[ INFO ]       INFERENCE_PRECISION_HINT: <Type: 'float32'> 
[ INFO ]       KV_CACHE_PRECISION: <Type: 'float16'> 
[ INFO ]       LOG_LEVEL: Level.NO 
[ INFO ]       MODEL_DISTRIBUTION_POLICY: set() 
[ INFO ]       NETWORK_NAME: TensorFlow_Lite_Frontend_IR 
[ INFO ]       NUM_STREAMS: 6 
[ INFO ]       OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6 
[ INFO ]       PERFORMANCE_HINT: THROUGHPUT 
[ INFO ]       PERFORMANCE_HINT_NUM_REQUESTS: 0 
[ INFO ]       PERF_COUNT: NO 
[ INFO ]       SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE 
[ INFO ]     MODEL_PRIORITY: Priority.MEDIUM 
[ INFO ]     LOADED_FROM_CACHE: False 
[ INFO ]     PERF_COUNT: False 
[Step 9/11] Creating infer requests and preparing input tensors 
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values! 
[ INFO ] Fill input 'images' with random values 
[Step 10/11] Measuring performance (Start inference asynchronously, 6 inference requests, limits: 15000 ms duration) 
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 6.99 ms 
[Step 11/11] Dumping statistics report 
[ INFO ] Execution Devices:['CPU'] 
[ INFO ] Count: 17430 iterations 
[ INFO ] Duration: 15007.81 ms 
[ INFO ] Latency: 
[ INFO ]     Median: 5.03 ms 
[ INFO ]     Average: 5.03 ms 
[ INFO ]     Min: 3.10 ms 
[ INFO ]     Max: 13.40 ms 
[ INFO ] Throughput: 1161.40 FPS
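
The summary at the end of the report can be read programmatically, for example to compare runs. The following sketch parses a few lines copied from the report above with a regular expression. The read_metric helper is hypothetical. For a batch size of 1, the reported throughput is consistent with the iteration count divided by the elapsed time.

```python
import re

# A few lines copied from the report above.
report = """\
[ INFO ] Count: 17430 iterations
[ INFO ] Duration: 15007.81 ms
[ INFO ]     Median: 5.03 ms
[ INFO ] Throughput: 1161.40 FPS
"""

def read_metric(text: str, name: str) -> float:
    """Return the first number that follows 'name:' in the report text."""
    match = re.search(rf"{re.escape(name)}:\s*([0-9.]+)", text)
    if match is None:
        raise ValueError(f"metric {name!r} not found")
    return float(match.group(1))

count = read_metric(report, "Count")
duration_s = read_metric(report, "Duration") / 1000  # milliseconds to seconds
throughput = read_metric(report, "Throughput")

# Iterations divided by elapsed seconds should agree with the reported FPS.
print(f"reported: {throughput:.2f} FPS, recomputed: {count / duration_s:.2f} FPS")
```

Note that the benchmark fills the input with random values, so these figures describe inference speed only and say nothing about accuracy.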