CLIP image classification

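A demo of image classification that uses the multimodal CLIP model for inference and Python code for preprocessing and postprocessing. The client sends a request containing an image and input labels to the graph and receives the label with the highest probability. The preprocessing Python node runs first and prepares the input vectors from the user input in the request. These inputs are then passed to the CLIP model, whose inference produces a similarity matrix. Finally, the postprocessing Python node runs, extracts the highest-scoring label among the input labels, and returns it to the user.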
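The postprocessing step described above can be sketched in a few lines of plain Python. This is only an illustration of the idea (softmax over similarity scores, then pick the best label), not the demo's actual postprocess.py; the function names and the scores are made up for the example.

```python
import math


def softmax(scores):
    """Convert raw similarity scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(labels, scores):
    """Return the label with the highest probability, and that probability."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical similarity scores for one image against three labels.
labels = ["cat", "dog", "wolf"]
scores = [18.2, 24.7, 21.3]

label, prob = pick_label(labels, scores)
print(label)  # dog
```

Softmax does not change which label wins; it only turns the scores into values that can be read as probabilities.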

The demo is based on this CLIP notebook.

The figure below shows the execution flow of the graph.

[Figure: MediaPipe graph]

Build the image

git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
make python_image

Install client packages

cd demos/python_demos/clip_image_classification/ 
virtualenv .venv
. .venv/bin/activate
pip3 install -r requirements.txt

Download and convert the model

pip3 install -r download_model_requirements.txt

python3 download_model.py

Deploy OpenVINO Model Server with the CLIP graph

Prerequisites:

An OVMS image with Python support and Optimum installed

A mounted ./servable directory containing:

postprocess.py and preprocess.py - Python scripts required to execute and use the CLIP model
config.json - defines which servables are loaded
graph.pbtxt - defines the MediaPipe graph containing the Python nodes
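Before starting the container, it can help to confirm that the directory to be mounted really contains the files listed above. The following is a minimal sketch; it assumes the four files sit directly under ./servable, and the helper name is hypothetical.

```python
from pathlib import Path

# Files the list above says must be present in the mounted directory.
REQUIRED_FILES = ("preprocess.py", "postprocess.py", "config.json", "graph.pbtxt")


def missing_servable_files(servable_dir):
    """Return the required file names that are not present in servable_dir."""
    root = Path(servable_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_servable_files("./servable")
if missing:
    print("Missing from ./servable:", ", ".join(missing))
else:
    print("./servable contains all required files")
```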

docker run -d --rm -p 9000:9000 -p 8000:8000 -v ${PWD}/servable:/workspace -v ${PWD}/model:/model/ openvino/model_server:py --config_path /workspace/config.json --port 9000 --rest_port 8000
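Because the container starts in detached mode, the server may need a few seconds before it accepts requests. The sketch below polls the KServe-style readiness endpoint over REST. It assumes the deployment exposes `/v2/health/ready` on the REST port mapped above (8000); the function names and retry values are illustrative.

```python
import time
import urllib.request


def server_ready(base_url, timeout_s=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def wait_until_ready(base_url, attempts=30, interval_s=1.0):
    """Poll the server until it reports ready or the attempts run out."""
    for _ in range(attempts):
        if server_ready(base_url):
            return True
        time.sleep(interval_s)
    return False
```

For the deployment above, a call such as `wait_until_ready("http://localhost:8000")` would return True once the server is ready and False if it never comes up within the given attempts.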

Request the detection name with a gRPC request

Run the gRPC client script:

python3 grpc_client.py --url localhost:9000

Expected output:

Server Ready: True 
Using image_url: https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco.jpg 

Using input_labels: 
['cat', 'dog', 'wolf', 'tiger', 'man', 'horse', 'frog', 'tree', 'house', 'computer']

Iteration 0 
Detection: 
dog 

processing time for all iterations 
average time: 90.00 ms; average speed: 11.11 fps 
median time: 90.00 ms; median speed: 11.11 fps 
max time: 90.00 ms; min speed: 11.11 fps 
min time: 90.00 ms; max speed: 11.11 fps 
time percentile 90: 90.00 ms; speed percentile 90: 11.11 fps 
time percentile 50: 90.00 ms; speed percentile 50: 11.11 fps 
time standard deviation: 0.00 
time variance: 0.00
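The speed figures in this summary follow directly from the latencies: a latency of t milliseconds corresponds to 1000 / t frames per second. The sketch below reproduces the first line of the summary for a single 90 ms iteration. It is not the client's own code, and the use of population standard deviation and variance is an assumption chosen so that a single sample yields 0.00.

```python
import statistics


def summarize(times_ms):
    """Compute latency statistics and the matching speeds in frames per second."""
    avg = statistics.mean(times_ms)
    return {
        "average_ms": avg,
        "average_fps": 1000.0 / avg,
        "median_ms": statistics.median(times_ms),
        "max_ms": max(times_ms),
        "min_fps": 1000.0 / max(times_ms),  # slowest request gives lowest speed
        "stdev": statistics.pstdev(times_ms),
        "variance": statistics.pvariance(times_ms),
    }


stats = summarize([90.0])
print(f"average time: {stats['average_ms']:.2f} ms; "
      f"average speed: {stats['average_fps']:.2f} fps")
# average time: 90.00 ms; average speed: 11.11 fps
```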

Request the detection name with a REST request

Run the REST client script:

python3 rest_client.py --url localhost:8000

Expected output:

Using image_url: https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco.jpg 

Using input_labels: 
['cat', 'dog', 'wolf', 'tiger', 'man', 'horse', 'frog', 'tree', 'house', 'computer']

Iteration 0 
Detection: 
dog 

processing time for all iterations 
average time: 93.00 ms; average speed: 10.75 fps 
median time: 93.00 ms; median speed: 10.75 fps 
max time: 93.00 ms; min speed: 10.75 fps 
min time: 93.00 ms; max speed: 10.75 fps 
time percentile 90: 93.00 ms; speed percentile 90: 10.75 fps 
time percentile 50: 93.00 ms; speed percentile 50: 10.75 fps 
time standard deviation: 0.00 
time variance: 0.00