Benchmark Tool#

This page explains how to use the Benchmark Tool to estimate deep learning inference performance on supported devices.

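Note

The Python version is recommended for benchmarking models that will be used in Python applications, and the C++ version is recommended for benchmarking models that will be used in C++ applications. Both tools have a similar command-line interface and backend.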

Basic Usage#

Python

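The Python benchmark_app is installed automatically when you install OpenVINO using PyPI. Before running benchmark_app, make sure the openvino_env virtual environment is activated, and navigate to the directory where your model is located.

The benchmark application works with models in the OpenVINO IR (model.xml and model.bin) and ONNX (model.onnx) formats. Convert your model if necessary.

To run a benchmark with default options on a model, use the following command: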

benchmark_app -m model.xml

C++

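To use the C++ benchmark_app, you must first build it by following the Build the Sample Applications instructions, and then set up paths and environment variables by following the Get Ready for Running the Sample Applications instructions. Navigate to the directory where the benchmark_app C++ sample binary was built.

Note

If you installed OpenVINO Runtime using PyPI or Anaconda Cloud, only the Python Benchmark Tool is available, so follow the Python usage instructions on this page.

The benchmark application works with models in the OpenVINO IR, TensorFlow, TensorFlow Lite, PaddlePaddle, PyTorch, and ONNX formats. If needed, you can also convert models using OpenVINO.

To run a benchmark with default options on a model, use the following command: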

./benchmark_app -m model.xml

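By default, the application loads the specified model onto the CPU and runs inference on batches of randomly generated input data for 60 seconds. As it loads, it prints information about the benchmark parameters. When the benchmark completes, it reports the minimum, average, and maximum inference latency and the average throughput.

You may be able to improve benchmark results beyond the default configuration by reconfiguring some of the model's execution parameters. For example, you can use the throughput or latency performance hints to optimize the runtime for higher FPS or shorter inference time. See below for details on the configuration options available in benchmark_app.

Configuration Options#

The benchmark app provides various options for configuring execution parameters. This section covers the key configuration options for easily tuning a benchmark to improve performance on your device. A list of all configuration options is given in the Advanced Usage section.

Performance hints: latency and throughput#

The benchmark app allows users to provide high-level "performance hints" that set a latency-focused or throughput-focused inference mode. The hint lets the runtime automatically adjust runtime parameters, such as the number of processing streams and the inference batch size, to prioritize either reduced latency or high throughput.

Performance hints do not require any device-specific settings and are fully portable between devices. Parameters are set automatically based on the device in use. This lets users easily port applications between hardware targets without having to re-determine the best runtime parameters for the new device.

If no hint is specified, throughput is used as the default. To set the hint explicitly, use -hint latency or -hint throughput when running benchmark_app: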

Python

benchmark_app -m model.xml -hint latency
benchmark_app -m model.xml -hint throughput

C++

./benchmark_app -m model.xml -hint latency
./benchmark_app -m model.xml -hint throughput

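Note

It is the user's responsibility to ensure that the environment in which the benchmark runs is optimized for maximum performance. Otherwise, results may differ when the application is used under different environment settings (power optimization settings, processor overclocking, thermal throttling, and so on). When a single option is specified multiple times, only the last value is applied. For example, with the -m flag: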

Python

benchmark_app -m model.xml -m model2.xml

C++

./benchmark_app -m model.xml -m model2.xml
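
The last-value-wins rule can be illustrated with Python's standard argparse module, whose default store action behaves the same way. This is only a sketch of the rule described in the note: the parser below is a hypothetical stand-in with a single option, not benchmark_app's actual argument parser.

```python
import argparse

# Hypothetical stand-in parser with a single -m option, as in the note above.
parser = argparse.ArgumentParser(prog="benchmark_app_sketch")
parser.add_argument("-m", "--path_to_model", required=True)

# Passing -m twice: the default "store" action keeps only the last value.
args = parser.parse_args(["-m", "model.xml", "-m", "model2.xml"])
print(args.path_to_model)  # prints "model2.xml"
```

In both commands above, the benchmark would therefore run on model2.xml only.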

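Latency#

Latency is the amount of time it takes to process a single inference request. Low latency is desirable in applications where data must be inferred and acted on as quickly as possible (such as autonomous driving). On conventional devices, lower latency is achieved by reducing the number of parallel processing streams so that the system can use as many resources as possible to compute each inference request quickly. However, advanced devices such as multi-socket CPUs and modern GPUs can run multiple inference requests while delivering the same latency.

When benchmark_app is run with -hint latency, it determines the optimal number of parallel inference requests for minimizing latency while still making the most of the hardware's parallelization capabilities. It automatically sets the number of processing streams and the inference batch size to achieve the best latency.

Throughput#

Throughput is the amount of data an inference pipeline can process at once, and it is usually measured in frames per second (FPS) or inferences per second. High throughput is needed in applications that run inference on large amounts of data simultaneously (such as multi-camera video streams). To achieve high throughput, the runtime focuses on fully saturating the device by feeding it enough data to process. It uses as much memory and as many parallel streams as possible to maximize the amount of data processed simultaneously.

When benchmark_app is run with -hint throughput, it maximizes the number of parallel inference requests by using all the threads available on the device. On GPU, it automatically sets the inference batch size to fill the available GPU memory.

For more information on performance hints, see the High-level Performance Hints page. For more details on optimal runtime configurations and how they are determined automatically using performance hints, see Runtime Inference Optimizations.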

Device#

To specify the device on which to run the benchmark, use the -d <device> argument. This tells benchmark_app to run the benchmark on that specific device. The benchmark app supports CPU and GPU devices. To use a GPU, the system must have the appropriate drivers installed. If no device is specified, benchmark_app uses the CPU by default.

For example, to run the benchmark on GPU, use:

Python

benchmark_app -m model.xml -d GPU

C++

./benchmark_app -m model.xml -d GPU

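You can also specify AUTO as the device, in which case benchmark_app automatically selects the best device for benchmarking and uses the CPU to assist during the model loading stage. This may improve performance, so it is worth testing. For more information, see the Automatic Device Selection page.

Note

When a latency or throughput hint is set, the number of streams and the batch size are configured automatically for optimal performance on the specified device.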
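
When benchmarks are launched from a script, the -m, -d, and -hint flags covered so far can be assembled programmatically. The following is a minimal sketch; build_benchmark_command is a hypothetical helper (not part of OpenVINO), and the accepted hint values are taken from the usage message in the All configuration options section.

```python
import shlex
from typing import List, Optional

# Hint values listed in the benchmark_app usage message.
VALID_HINTS = ("latency", "throughput", "cumulative_throughput", "none")


def build_benchmark_command(model: str,
                            device: Optional[str] = None,
                            hint: Optional[str] = None,
                            executable: str = "benchmark_app") -> List[str]:
    """Hypothetical helper: build an argument list for benchmark_app."""
    if hint is not None and hint not in VALID_HINTS:
        raise ValueError(f"unsupported hint: {hint!r}")
    cmd = [executable, "-m", model]
    if device is not None:
        cmd += ["-d", device]   # omitted -> the tool defaults to CPU
    if hint is not None:
        cmd += ["-hint", hint]  # omitted -> the tool defaults to throughput
    return cmd


gpu_cmd = build_benchmark_command("model.xml", device="GPU", hint="throughput")
print(shlex.join(gpu_cmd))  # benchmark_app -m model.xml -d GPU -hint throughput
```

The resulting list can be passed to subprocess.run as-is; for the C++ binary, pass executable="./benchmark_app".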

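Number of iterations#

By default, the benchmark app runs for a predefined duration, repeatedly performing inference with the model and measuring the resulting inference speed. There are several options for setting the number of inference iterations:

Use the -niter <number_of_iterations> option to explicitly specify the number of iterations the model runs.
Use the -t <seconds> option to set how long the app runs.
If both are set, execution continues until both conditions are met.
If neither -niter nor -t is specified, the app runs for a predefined duration that depends on the device.

The more times the model runs, the more accurate the statistics used to determine average latency and throughput become.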
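
The stopping rule described in the list above can be sketched as a small predicate. This only illustrates the rule as stated here and is not the tool's source code; should_continue and default_seconds are hypothetical names, and the 60-second default is taken from the Basic Usage section (the actual default depends on the device).

```python
from typing import Optional


def should_continue(iterations_done: int,
                    elapsed_s: float,
                    niter: Optional[int] = None,
                    seconds: Optional[float] = None,
                    default_seconds: float = 60.0) -> bool:
    """Return True while the benchmark loop should keep running."""
    if niter is None and seconds is None:
        # Neither -niter nor -t given: run for the predefined duration.
        return elapsed_s < default_seconds
    niter_pending = niter is not None and iterations_done < niter
    time_pending = seconds is not None and elapsed_s < seconds
    # With both limits set, keep going until both conditions are met.
    return niter_pending or time_pending


print(should_continue(10, 1.0, niter=10, seconds=5))  # True: time limit not reached
print(should_continue(10, 6.0, niter=10, seconds=5))  # False: both limits reached
```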

Inputs#

The benchmark tool runs benchmarks on user-provided input images in .jpg, .bmp, or .png format. Use -i <PATH_TO_INPUT> to specify the path to an image or a folder of images. For example, to run the benchmark on an image named test1.jpg, use:

Python

benchmark_app -m model.xml -i test1.jpg

C++

./benchmark_app -m model.xml -i test1.jpg

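The tool loops over the specified inputs repeatedly, running inference until the specified time or number of iterations is reached. If the -i flag is not specified, the tool automatically generates random data that fits the model's input shape.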
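
The input handling described above (accept a single image or a folder, then loop over the files until the run ends) can be sketched as follows. The helpers collect_images and input_stream are hypothetical and only mirror the behaviour described in this section; they are not the tool's implementation.

```python
import itertools
import tempfile
from pathlib import Path
from typing import Iterator, List

# Image formats named in this section.
IMAGE_SUFFIXES = {".jpg", ".bmp", ".png"}


def collect_images(path_to_input: str) -> List[Path]:
    """Return the image files behind a file path or a folder path."""
    path = Path(path_to_input)
    if path.is_dir():
        candidates = sorted(p for p in path.iterdir() if p.is_file())
    else:
        candidates = [path]
    return [p for p in candidates if p.suffix.lower() in IMAGE_SUFFIXES]


def input_stream(images: List[Path]) -> Iterator[Path]:
    """Loop over the inputs endlessly; the caller decides when to stop."""
    return itertools.cycle(images)


# Demo on a temporary folder holding two images and one unrelated file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.jpg", "b.png", "notes.txt"):
        (Path(folder) / name).touch()
    images = collect_images(folder)
    first_five = [p.name for p in itertools.islice(input_stream(images), 5)]

print(first_five)  # ['a.jpg', 'b.png', 'a.jpg', 'b.png', 'a.jpg']
```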

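Examples#

For more usage examples (and detailed instructions on how to set up a model for benchmarking), see the Examples of Running the Tool section.

Advanced Usage#

Note

By default, OpenVINO™ samples, tools, and demos expect input with BGR channel order. If you trained your model to work with RGB order, you need to manually rearrange the default channel order in the sample or demo application, or reconvert your model using the model transformation API with the reverse_input_channels argument specified. For more information about the argument, see the "When to Reverse Input Channels" section of "Converting a Model to Intermediate Representation (IR)".

Per-layer performance and logging#

If you enable statistics dumping by setting the -report_type parameter to one of the following values, the application also collects per-layer Performance Measurement (PM) counters for each executed inference request: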

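The no_counters report includes the specified configuration options and the resulting FPS and latency.
The average_counters report extends the no_counters report and adds average PM counter values for each layer of the network.
The detailed_counters report extends the average_counters report and adds per-layer PM counters and latency for each executed inference request.

Depending on the type, the report is saved to the benchmark_no_counters_report.csv, benchmark_average_counters_report.csv, or benchmark_detailed_counters_report.csv file in the path specified by -report_folder. If you specify a path with the -exec_graph_path parameter, the application also saves the executable graph information serialized to an XML file.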
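
The naming scheme above makes it easy to locate a report from a script. A minimal sketch follows; report_path is a hypothetical helper, and it assumes the default CSV output (the usage message also lists a -json_stats option, which this sketch does not cover).

```python
from pathlib import Path

# Report types accepted by -report_type.
REPORT_TYPES = ("no_counters", "average_counters", "detailed_counters")


def report_path(report_folder: str, report_type: str) -> Path:
    """Return the expected CSV report path for a given -report_type."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type!r}")
    return Path(report_folder) / f"benchmark_{report_type}_report.csv"


print(report_path("reports", "average_counters").as_posix())
# reports/benchmark_average_counters_report.csv
```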

All configuration options#

Running the application with the -h or --help option prints the usage message:

Python

[Step 1/11] Parsing and validating input arguments 
[ INFO ] Parsing input parameters 
usage: benchmark_app.py [-h [HELP]] [-i PATHS_TO_INPUT [PATHS_TO_INPUT ...]] -m PATH_TO_MODEL [-d TARGET_DEVICE]
                        [-hint {throughput,cumulative_throughput,latency,none}] [-niter NUMBER_ITERATIONS] [-t TIME] [-b BATCH_SIZE] [-shape SHAPE] 
                        [-data_shape DATA_SHAPE] [-layout LAYOUT] [-extensions EXTENSIONS] [-c PATH_TO_CLDNN_CONFIG] [-cdir CACHE_DIR] [-lfile [LOAD_FROM_FILE]] 
                        [-api {sync,async}] [-nireq NUMBER_INFER_REQUESTS] [-nstreams NUMBER_STREAMS] [-inference_only [INFERENCE_ONLY]] 
                        [-infer_precision INFER_PRECISION] [-ip {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}] 
                        [-op {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}] [-iop INPUT_OUTPUT_PRECISION]
                        [--mean_values [R,G,B]] [--scale_values [R,G,B]] 
                        [-nthreads NUMBER_THREADS] [-pin {YES,NO,NUMA,HYBRID_AWARE}] [-latency_percentile LATENCY_PERCENTILE] 
                        [-report_type {no_counters,average_counters,detailed_counters}] [-report_folder REPORT_FOLDER] [-pc [PERF_COUNTS]] 
                        [-pcsort {no_sort,sort,simple_sort}] [-pcseq [PCSEQ]] [-exec_graph_path EXEC_GRAPH_PATH] [-dump_config DUMP_CONFIG] [-load_config LOAD_CONFIG] 

Options:
  -h [HELP], --help [HELP]
                        Show this help message and exit.
  -i PATHS_TO_INPUT [PATHS_TO_INPUT ...], --paths_to_input PATHS_TO_INPUT [PATHS_TO_INPUT ...]
                        Optional. Path to a folder with images and/or binaries or to specific image or binary file. It is also allowed to map files to model inputs: input_1:file_1/dir1,file_2/dir2,input_4:file_4/dir4 input_2:file_3/dir3 Currently supported data types: bin, npy. If OPENCV is enabled, this functionality is extended with the following data types: bmp, dib, jpeg, jpg, jpe, jp2, png, pbm, pgm, ppm, sr, ras, tiff, tif.
  -m PATH_TO_MODEL, --path_to_model PATH_TO_MODEL
                        Required. Path to an .xml/.onnx file with a trained model or to a .blob file with a trained compiled model.
  -d TARGET_DEVICE, --target_device TARGET_DEVICE
                        Optional. Specify a target device to infer on (the list of available devices is shown below). Default value is CPU. Use '-d HETERO:<comma separated devices list>' format to specify HETERO plugin. Use '-d MULTI:<comma separated devices list>' format to specify MULTI plugin. The application looks for a suitable plugin for the specified device.
  -hint {throughput,cumulative_throughput,latency,none}, --perf_hint {throughput,cumulative_throughput,latency,none}
                        Optional. Performance hint (latency or throughput or cumulative_throughput or none). Performance hint allows the OpenVINO device to select the right model-specific settings.
                        'throughput': device performance mode will be set to THROUGHPUT.
                        'cumulative_throughput': device performance mode will be set to CUMULATIVE_THROUGHPUT.
                        'latency': device performance mode will be set to LATENCY.
                        'none': no device performance mode will be set.
                        Using explicit 'nstreams' or other device-specific options, please set hint to 'none'
  -niter NUMBER_ITERATIONS, --number_iterations NUMBER_ITERATIONS
                        Optional. Number of iterations. If not specified, the number of iterations is calculated depending on a device.
  -t TIME, --time TIME
                        Optional. Time in seconds to execute topology.
  -api {sync,async}, --api_type {sync,async}
                        Optional. Enable using sync/async API. Default value is async.

Input shapes:
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Optional. Batch size value. If not specified, the batch size value is determined from Intermediate Representation
  -shape SHAPE
                        Optional. Set shape for input. For example, "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]" in case of one input size. This parameter affect model Parameter shape, can be dynamic. For dynamic dimesions use symbol `?`, `-1` or range `low.. up`.
  -data_shape DATA_SHAPE
                        Optional. Optional if model shapes are all static (original ones or set by -shape). Required if at least one input shape is dynamic and input images are not provided. Set shape for input tensors. For example, "input1[1,3,224,224][1,3,448,448],input2[1,4][1,8]" or "[1,3,224,224][1,3,448,448] in case of one input size.
  -layout LAYOUT
                        Optional. Prompts how model layouts should be treated by application. For example, "input1[NCHW],input2[NC]" or "[NCHW]" in case of one input size.

Advanced options:
  -extensions EXTENSIONS, --extensions EXTENSIONS
                        Optional. Path or a comma-separated list of paths to libraries (.so or .dll) with extensions.
  -c PATH_TO_CLDNN_CONFIG, --path_to_cldnn_config PATH_TO_CLDNN_CONFIG
                        Optional. Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
  -cdir CACHE_DIR, --cache_dir CACHE_DIR
                        Optional. Enable model caching to specified directory
  -lfile [LOAD_FROM_FILE], --load_from_file [LOAD_FROM_FILE]
                        Optional. Loads model from file directly without read_model.
  -nireq NUMBER_INFER_REQUESTS, --number_infer_requests NUMBER_INFER_REQUESTS
                        Optional. Number of infer requests. Default value is determined automatically for device.
  -nstreams NUMBER_STREAMS, --number_streams NUMBER_STREAMS
                        Optional. Number of streams to use for inference on the CPU/GPU (for HETERO and MULTI device cases use format <device1>:<nstreams1>,<device2>:<nstreams2> or just <nstreams>). Default value is determined automatically for a device. Please note that although the automatic selection usually provides a reasonable performance, it still may be non - optimal for some cases, especially for very small models. Also, using nstreams>1 is inherently throughput-oriented option, while for the best-latency estimations the number of streams should be set to 1. See samples README for more details.
  -inference_only [INFERENCE_ONLY], --inference_only [INFERENCE_ONLY]
                        Optional. If true inputs filling only once before measurements (default for static models), else inputs filling is included into loop measurement (default for dynamic models)
  -infer_precision INFER_PRECISION
                        Optional. Specifies the inference precision. Example #1: '-infer_precision bf16'. Example #2: '-infer_precision CPU:bf16,GPU:f32'
  -exec_graph_path EXEC_GRAPH_PATH, --exec_graph_path EXEC_GRAPH_PATH
                        Optional. Path to a file where to store executable graph information serialized.

Preprocessing options:
  -ip {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}, --input_precision {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}
                        Optional. Specifies precision for all input layers of the model.
  -op {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}, --output_precision {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}
                        Optional. Specifies precision for all output layers of the model.
  -iop INPUT_OUTPUT_PRECISION, --input_output_precision INPUT_OUTPUT_PRECISION
                        Optional. Specifies precision for input and output layers by name. Example: -iop "input:f16, output:f16". Notice that quotes are required. Overwrites precision from ip and op options for specified layers.
  --mean_values [R,G,B]
                        Optional. Mean values to be used for the input image per channel. Values to be provided in the [R,G,B] format. Can be defined for desired input of the model, for example: "--mean_values data[255,255,255],info[255,255,255]". The exact meaning and order of channels depend on how the original model was trained. Applying the values affects performance and may cause type conversion
  --scale_values [R,G,B]
                        Optional. Scale values to be used for the input image per channel. Values are provided in the [R,G,B] format. Can be defined for desired input of the model, for example: "--scale_values data[255,255,255],info[255,255,255]". The exact meaning and order of channels depend on how the original model was trained. If both --mean_values and --scale_values are specified, the mean is subtracted first and then scale is applied regardless of the order of options in command line. Applying the values affects performance and may cause type conversion

Device-specific performance options:
  -nthreads NUMBER_THREADS, --number_threads NUMBER_THREADS
                        Number of threads to use for inference on the CPU (including HETERO and MULTI cases).
  -pin {YES,NO,NUMA,HYBRID_AWARE}, --infer_threads_pinning {YES,NO,NUMA,HYBRID_AWARE}
                        Optional. Enable threads->cores ('YES' which is OpenVINO runtime's default for conventional CPUs), threads->(NUMA)nodes ('NUMA'), threads->appropriate core types ('HYBRID_AWARE', which is OpenVINO runtime's default for Hybrid CPUs) or completely disable ('NO') CPU threads pinning for CPU-involved inference.

Statistics dumping options:
  -latency_percentile LATENCY_PERCENTILE, --latency_percentile LATENCY_PERCENTILE
                        Optional. Defines the percentile to be reported in latency metric. The valid range is [1, 100]. The default value is 50 (median).
  -report_type {no_counters,average_counters,detailed_counters}, --report_type {no_counters,average_counters,detailed_counters}
                        Optional. Enable collecting statistics report. "no_counters" report contains configuration options specified, resulting FPS and latency. "average_counters" report extends "no_counters" report and additionally includes average PM counters values for each layer from the model. "detailed_counters" report extends "average_counters" report and additionally includes per-layer PM counters and latency for each executed infer request.
  -report_folder REPORT_FOLDER, --report_folder REPORT_FOLDER
                        Optional. Path to a folder where statistics report is stored.
  -json_stats [JSON_STATS], --json_stats [JSON_STATS]
                        Optional. Enables JSON-based statistics output (by default reporting system will use CSV format). Should be used together with -report_folder option.
  -pc [PERF_COUNTS], --perf_counts [PERF_COUNTS]
                        Optional. Report performance counters.
  -pcsort {no_sort,sort,simple_sort}, --perf_counts_sort {no_sort,sort,simple_sort}
                        Optional. Report performance counters and analysis the sort hotpoint opts.
                        sort: Analysis opts time cost, print by hotpoint order
                        no_sort: Analysis opts time cost, print by normal order
                        simple_sort: Analysis opts time cost, only print EXECUTED opts by normal order
  -pcseq [PCSEQ], --pcseq [PCSEQ]
                        Optional. Report latencies for each shape in -data_shape sequence.
  -dump_config DUMP_CONFIG
                        Optional. Path to JSON file to dump OpenVINO parameters, which were set by application.
  -load_config LOAD_CONFIG
                        Optional. Path to JSON file to load custom OpenVINO parameters. Please note, command line parameters have higher priority then parameters from configuration file.
                        Example 1: a simple JSON file for HW device with primary properties.
                        { "CPU": {"NUM_STREAMS": "3", "PERF_COUNT": "NO"} }
                        Example 2: a simple JSON file for meta device(AUTO/MULTI) with HW device properties.
                        { "AUTO": { "PERFORMANCE_HINT": "THROUGHPUT", "PERF_COUNT": "NO", "DEVICE_PROPERTIES": "{CPU:{INFERENCE_PRECISION_HINT:f32,NUM_STREAMS:3},GPU:{INFERENCE_PRECISION_HINT:f32,NUM_STREAMS:5}}" } }

C++

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
usage: benchmark_app [OPTION]

Options:
    -h, --help  Print the usage message
    -m <path>  Required. Path to an .xml/.onnx file with a trained model or to a .blob files with a trained compiled model.
    -i <path>  Optional. Path to a folder with images and/or binaries or to specific image or binary file. In case of dynamic shapes models with several inputs provide the same number of files for each input (except cases with single file for any input) :"input1:1.jpg input2:1.bin", "input1:1.bin,2.bin input2:3.bin input3:4.bin,5.bin ". Also you can pass specific keys for inputs: "random" - for fillling input with random data, "image_info" - for filling input with image size. You should specify either one files set to be used for all inputs (without providing input names) or separate files sets for every input of model (providing inputs names). Currently supported data types: bmp, bin, npy. If OPENCV is enabled, this functionality is extended with the following data types: dib, jpeg, jpg, jpe, jp2, png, pbm, pgm, ppm, sr, ras, tiff, tif.
    -d <device>  Optional. Specify a target device to infer on (the list of available devices is shown below). Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin. Use "-d MULTI:<comma-separated_devices_list>" format to specify MULTI plugin. The application looks for a suitable plugin for the specified device.
    -hint <performance hint> (latency or throughput or cumulative_throughput or none)  Optional. Performance hint allows the OpenVINO device to select the right model-specific settings.
        'throughput' or 'tput': device performance mode will be set to THROUGHPUT.
        'cumulative_throughput' or 'ctput': device performance mode will be set to CUMULATIVE_THROUGHPUT.
        'latency': device performance mode will be set to LATENCY.
        'none': no device performance mode will be set.
        Using explicit 'nstreams' or other device-specific options, please set hint to 'none'
    -niter <integer>  Optional. Number of iterations. If not specified, the number of iterations is calculated depending on a device.
    -t  Optional. Time in seconds to execute topology.

Input shapes
    -b <integer>  Optional. Batch size value. If not specified, the batch size value is determined from Intermediate Representation.
    -shape  Optional. Set shape for model input. For example, "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]" in case of one input size. This parameter affect model input shape and can be dynamic. For dynamic dimensions use symbol `?` or '-1'. Ex. [?,3,?,?]. For bounded dimensions specify range 'min..max'. Ex. [1..10,3,?,?].
    -data_shape  Required for models with dynamic shapes. Set shape for input blobs. In case of one input size: "[1,3,224,224]" or "input1[1,3,224,224],input2[1,4] ". In case of several input sizes provide the same number for each input (except cases with single shape for any input): "[1,3,128,128][3,3,128,128][1,3,320,320]", "input1[1,1, 128,128][1,1,256,256],input2[80,1]" or "input1[1,192][1,384],input2[1,192][1,384],input3[1,192][1,384],input4[1,192][1,384]". If model shapes are all static specifying the option will cause an exception.
    -layout  Optional. Prompts how model layouts should be treated by application. For example, "input1[NCHW],input2[NC]" or "[NCHW]" in case of one input size.

Advanced options
    -extensions <absolute_path>  Required for custom layers (extensions). Absolute path to a shared library with the kernels implementations.
    -c <absolute_path>  Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
    -cache_dir <path>  Optional. Enables caching of loaded models to specified directory. List of devices which support caching is shown at the end of this message.
    -load_from_file  Optional. Loads model from file directly without read_model. All CNNNetwork options (like re-shape) will be ignored
    -api <sync/async>  Optional. Enable Sync/Async API. Default value is "async".
    -nireq <integer>  Optional. Number of infer requests. Default value is determined automatically for device.
    -nstreams <integer>  Optional. Number of streams to use for inference on the CPU or GPU devices (for HETERO and MULTI device cases use format <dev1>:<nstreams1>, <dev2>:<nstreams2> or just <nstreams>). Default value is determined automatically for a device. Please note that although the automatic selection usually provides a reasonable performance, it still may be non - optimal for some cases, especially for very small models. See sample's README for more details. Also, using nstreams>1 is inherently throughput-oriented option, while for the best-latency estimations the number of streams should be set to 1.
    -inference_only  Optional. Measure only inference stage. Default option for static models. Dynamic models are measured in full mode which includes inputs setup stage, inference only mode available for them with single input data shape only. To enable full mode for static models pass "false" value to this argument: ex. "-inference_only=false".
    -infer_precision  Optional. Specifies the inference precision. Example #1: '-infer_precision bf16'. Example #2: '-infer_precision CPU:bf16,GPU:f32'

Preprocessing options:
    -ip <value>  Optional. Specifies precision for all input layers of the model.
    -op <value>  Optional. Specifies precision for all output layers of the model.
    -iop <value>  Optional. Specifies precision for input and output layers by name. Example: -iop "input:f16, output:f16". Notice that quotes are required. Overwrites precision from ip and op options for specified layers.
    -mean_values [R,G,B]  Optional. Mean values to be used for the input image per channel. Values to be provided in the [R,G,B] format. Can be defined for desired input of the model, for example: "--mean_values data[255,255,255],info[255,255,255]". The exact meaning and order of channels depend on how the original model was trained. Applying the values affects performance and may cause type conversion
    -scale_values [R,G,B]  Optional. Scale values to be used for the input image per channel. Values are provided in the [R,G,B] format. Can be defined for desired input of the model, for example: "--scale_values data[255,255,255],info[255,255,255]". The exact meaning and order of channels depend on how the original model was trained. If both --mean_values and --scale_values are specified, the mean is subtracted first and then scale is applied regardless of the order of options in command line. Applying the values affects performance and may cause type conversion

Device-specific performance options:
    -nthreads <integer>  Optional. Number of threads to use for inference on the CPU (including HETERO and MULTI cases).
    -pin <string> ("YES"|"CORE") / "HYBRID_AWARE" / ("NO"|"NONE") / "NUMA"  Optional. Explicit inference threads binding options (leave empty to let the OpenVINO make a choice): enabling threads->cores pinning("YES", which is already default for any conventional CPU), letting the runtime to decide on the threads->different core types("HYBRID_AWARE", which is default on the hybrid CPUs) threads->(NUMA)nodes("NUMA") or completely disable("NO") CPU inference threads pinning

Statistics dumping options:
    -latency_percentile  Optional. Defines the percentile to be reported in latency metric. The valid range is [1, 100]. The default value is 50 (median).
    -report_type <type>  Optional. Enable collecting statistics report. "no_counters" report contains configuration options specified, resulting FPS and latency. "average_counters" report extends "no_counters" report and additionally includes average PM counters values for each layer from the model. "detailed_counters" report extends "average_counters" report and additionally includes per-layer PM counters and latency for each executed infer request.
    -report_folder  Optional. Path to a folder where statistics report is stored.
    -json_stats  Optional. Enables JSON-based statistics output (by default reporting system will use CSV format). Should be used together with -report_folder option.
    -pc  Optional. Report performance counters.
    -pcsort  Optional. Report performance counters and analysis the sort hotpoint opts.
        "sort" Analysis opts time cost, print by hotpoint order
        "no_sort" Analysis opts time cost, print by normal order
        "simple_sort" Analysis opts time cost, only print EXECUTED opts by normal order
    -pcseq  Optional. Report latencies for each shape in -data_shape sequence.
    -exec_graph_path  Optional. Path to a file where to store executable graph information serialized.
    -dump_config  Optional. Path to JSON file to dump device properties, which were set by application.
    -load_config  Optional. Path to JSON file to load custom device properties. Please note, command line parameters have higher priority then parameters from configuration file.
        Example 1: a simple JSON file for HW device with primary properties.
        { "CPU": {"NUM_STREAMS": "3", "PERF_COUNT": "NO"} }
        Example 2: a simple JSON file for meta device(AUTO/MULTI) with HW device properties.
        { "AUTO": { "PERFORMANCE_HINT": "THROUGHPUT", "PERF_COUNT": "NO", "DEVICE_PROPERTIES": "{CPU:{INFERENCE_PRECISION_HINT:f32,NUM_STREAMS:3},GPU:{INFERENCE_PRECISION_HINT:f32,NUM_STREAMS:5}}" } }

Running the application with an empty list of options prints the usage message shown above together with an error message.
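
The -load_config option in the usage message takes a JSON file of device properties. The sketch below writes "Example 1" from the usage message to a file and builds a matching command line. The file name config.json and the temporary directory are arbitrary choices made for illustration only.

```python
import json
import os
import tempfile

# "Example 1" from the usage message: primary properties for the CPU device.
config = {"CPU": {"NUM_STREAMS": "3", "PERF_COUNT": "NO"}}

# Write the configuration to an arbitrarily named file for this sketch.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# Read it back to confirm that the file holds valid JSON.
with open(config_path, encoding="utf-8") as f:
    loaded = json.load(f)

# Argument list that points benchmark_app at the file.
command = ["benchmark_app", "-m", "model.xml", "-load_config", config_path]
print(loaded)
```

As the usage message notes, parameters given on the command line take priority over those in the configuration file.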

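More information on inputs#

The benchmark tool supports topologies with one or more inputs. If a topology is not data-sensitive, you can skip the input parameter, and the inputs will be filled with random values. If a model has only image inputs, provide a folder with images or a path to an image. If a model has specific (non-image) inputs, prepare binary files or numpy arrays filled with data of the appropriate precision and provide paths to them. If a model has mixed input types, the input folder must contain all the required files. Image inputs are filled with image files one by one. Binary inputs are filled with binary files one by one.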
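
For a non-image input, a binary file can be prepared with the Python standard library alone. The sketch below writes random 32-bit floats for a given shape; it assumes the tool reads a .bin file as the raw bytes of the input tensor in the model's precision (f32 here), which should be checked against the model and tool version in use. The helper write_random_f32_bin and the shape are hypothetical.

```python
import array
import math
import os
import random
import tempfile
from typing import Sequence


def write_random_f32_bin(path: str, shape: Sequence[int], seed: int = 0) -> int:
    """Write prod(shape) random float32 values to path; return bytes written."""
    rng = random.Random(seed)
    count = math.prod(shape)
    # Type code "f" is a C float (4 bytes on common platforms), native byte order.
    data = array.array("f", (rng.random() for _ in range(count)))
    with open(path, "wb") as f:
        data.tofile(f)
    return count * data.itemsize


# Small hypothetical shape to keep the demo file tiny.
bin_path = os.path.join(tempfile.mkdtemp(), "input.bin")
written = write_random_f32_bin(bin_path, (1, 3, 4, 4))
print(written, os.path.getsize(bin_path))
```

The file can then be passed with -i, optionally mapped to a named input as described for the -i option in the usage message.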

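Examples of Running the Tool#

This section provides step-by-step instructions for running the Benchmark Tool with the asl-recognition Intel model on CPU or GPU devices. Random data is used as input.

Note

Internet access is required to execute the following steps. If you can access the Internet only through a proxy server, make sure the proxy is configured in your OS environment.

Run the tool, specifying the location of the OpenVINO Intermediate Representation (IR) model .xml file, the device to run inference on, and a performance hint. The following commands run the Benchmark Tool in latency mode on CPU and in throughput mode on GPU devices:

CPU (latency mode)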
Python
benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency
C++
./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency
GPU (throughput mode)
Python
benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d GPU -hint throughput
C++
./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d GPU -hint throughput

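The application outputs the number of executed iterations, the total duration of execution, the latency, and the throughput. If you set the -report_type parameter, the application also outputs a statistics report. If you set the -pc parameter, the application outputs performance counters. If you set -exec_graph_path, the application reports the serialized executable graph information. All measurements, including per-layer PM counters, are reported in milliseconds.

An example of the information output when running benchmark_app on CPU in latency mode is shown below:

Python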

benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency

[Step 1/11] Parsing and validating input arguments 
[ INFO ] Parsing input parameters 
[ INFO ] Input command: /home/openvino/tools/benchmark_tool/benchmark_app.py -m omz_models/intel/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency 
[Step 2/11] Loading OpenVINO Runtime 
[ INFO ] OpenVINO:
[ INFO ] Build .................................2022.3.0-7750-c1109a7317e-feature/py_cpp_align
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build .................................2022.3.0-7750-c1109a7317e-feature/py_cpp_align 
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration 
[Step 4/11] Reading model files 
[ INFO ] Loading model files 
[ INFO ] Read model took 147.82 ms 
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] input (node: input) : f32 / [N,C,D,H,W] / {1,3,16,224,224}
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / {1,100}
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] input (node: input) : f32 / [N,C,D,H,W] / {1,3,16,224,224}
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / {1,100}
[Step 7/11] Loading the model to the device 
[ INFO ] Compile model took 974.64 ms 
[Step 8/11] Querying optimal runtime parameters 
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch-jit-export
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 2 
[ INFO ] NUM_STREAMS: 2 
[ INFO ] AFFINITY: Affinity.CORE 
[ INFO ] INFERENCE_NUM_THREADS: 0 
[ INFO ] PERF_COUNT: False 
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'> 
[ INFO ] PERFORMANCE_HINT: PerformanceMode.LATENCY 
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0 
[Step 9/11] Creating infer requests and preparing input tensors 
[ WARNING ] No input files were given for input 'input'!. This input will be filled with random values!
[ INFO ] Fill input 'input' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 2 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 38.41 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 5380 iterations
[ INFO ] Duration: 60036.78 ms
[ INFO ] Latency:
[ INFO ] Median: 22.04 ms
[ INFO ] Average: 22.09 ms 
[ INFO ] Min: 20.78 ms 
[ INFO ] Max: 33.51 ms 
[ INFO ] Throughput: 89.61 FPS
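
To compare runs in a script, the summary at the end of this output can be extracted with regular expressions. This is a minimal sketch that assumes the line format shown in the log above; parse_summary is a hypothetical helper, and the format may differ between tool versions.

```python
import re
from typing import Dict

# Final lines of the Python log above, one entry per line.
SAMPLE = """\
[ INFO ] Count: 5380 iterations
[ INFO ] Duration: 60036.78 ms
[ INFO ] Latency:
[ INFO ] Median: 22.04 ms
[ INFO ] Average: 22.09 ms
[ INFO ] Min: 20.78 ms
[ INFO ] Max: 33.51 ms
[ INFO ] Throughput: 89.61 FPS
"""

PATTERNS = {
    "count": r"Count:\s+(\d+) iterations",
    "duration_ms": r"Duration:\s+([\d.]+) ms",
    "median_ms": r"Median:\s+([\d.]+) ms",
    "average_ms": r"Average:\s+([\d.]+) ms",
    "min_ms": r"Min:\s+([\d.]+) ms",
    "max_ms": r"Max:\s+([\d.]+) ms",
    "throughput_fps": r"Throughput:\s+([\d.]+) FPS",
}


def parse_summary(log_text: str) -> Dict[str, float]:
    """Collect the numeric summary fields found in benchmark_app output."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, log_text)
        if match:
            summary[key] = float(match.group(1))
    return summary


summary = parse_summary(SAMPLE)
print(summary["average_ms"], summary["throughput_fps"])  # 22.09 89.61
```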

C++

./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency

[Step 1/11] Parsing and validating input arguments 
[ INFO ] Parsing input parameters 
[ INFO ] Input command: /home/openvino/bin/intel64/DEBUG/benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency 
[Step 2/11] Loading OpenVINO Runtime 
[ INFO ] OpenVINO:
[ INFO ] Build .................................2022.3.0-7750-c1109a7317e-feature/py_cpp_align
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build .................................2022.3.0-7750-c1109a7317e-feature/py_cpp_align 
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration 
[ WARNING ] Device(CPU) performance hint is set to LATENCY 
[Step 4/11] Reading model files 
[ INFO ] Loading model files 
[ INFO ] Read model took 141.11 ms 
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ] input (node: input) : f32 / [N,C,D,H,W] / {1,3,16,224,224}
[ INFO ] Network outputs:
[ INFO ] output (node: output) : f32 / [...] / {1,100}
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 0
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ] input (node: input) : f32 / [N,C,D,H,W] / {1,3,16,224,224}
[ INFO ] Network outputs:
[ INFO ] output (node: output) : f32 / [...] / {1,100}
[Step 7/11] Loading the model to the device 
[ INFO ] Compile model took 989.62 ms 
[Step 8/11] Querying optimal runtime parameters 
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch-jit-export
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 2 
[ INFO ] NUM_STREAMS: 2 
[ INFO ] AFFINITY: CORE 
[ INFO ] INFERENCE_NUM_THREADS: 0 
[ INFO ] PERF_COUNT: NO 
[ INFO ] INFERENCE_PRECISION_HINT: f32 
[ INFO ] PERFORMANCE_HINT: LATENCY 
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0 
[Step 9/11] Creating infer requests and preparing input tensors 
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] input ([N,C,D,H,W], f32, {1, 3, 16, 224, 224}, static): random (binary data is expected) 
[Step 10/11] Measuring performance (Start inference asynchronously, 2 inference requests, limits: 60000 ms duration) 
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 37.27 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 5470 iterations 
[ INFO ] Duration: 60028.56 ms 
[ INFO ] Latency:
[ INFO ] Median: 21.79 ms
[ INFO ] Average: 21.92 ms 
[ INFO ] Min: 20.60 ms 
[ INFO ] Max: 37.19 ms 
[ INFO ] Throughput: 91.12 FPS
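
The statistics report at the end of a run is regular enough to parse with a few lines of Python, which helps when comparing several runs. The following is a minimal sketch, not part of benchmark_app: the `parse_report` helper and its regular expression are hypothetical, and the final check assumes that, at batch size 1, throughput equals iterations divided by the run duration in seconds. That relation is not stated on this page, although the numbers from the C++ run above are consistent with it.

```python
import re

# Tail of the C++ report shown above, one log entry per line.
REPORT = """\
[ INFO ] Count: 5470 iterations
[ INFO ] Duration: 60028.56 ms
[ INFO ] Latency:
[ INFO ] Median: 21.79 ms
[ INFO ] Average: 21.92 ms
[ INFO ] Min: 20.60 ms
[ INFO ] Max: 37.19 ms
[ INFO ] Throughput: 91.12 FPS
"""

# Matches "[ INFO ] <Name>: <number> <unit>".
# The bare "Latency:" header carries no number, so it is skipped.
LINE_RE = re.compile(r"\[ INFO \]\s+(\w+):\s+([\d.]+)\s+(\w+)")


def parse_report(text):
    """Return {metric name: float value} for every numeric report line."""
    return {m.group(1): float(m.group(2)) for m in LINE_RE.finditer(text)}


stats = parse_report(REPORT)

# Assumption: with batch size 1, FPS = iterations / duration in seconds.
derived_fps = stats["Count"] / (stats["Duration"] / 1000.0)
print(f"reported {stats['Throughput']:.2f} FPS, derived {derived_fps:.2f} FPS")
```

The same helper can be pointed at captured output from any run, as long as the report keeps the `Name: value unit` layout shown here.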

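The benchmark tool can also be used with dynamically shaped networks to measure the expected inference time for different input data shapes. For details on using dynamic shapes, see the descriptions of the -shape and -data_shape arguments in the All Configuration Options section. Below is an example command that runs benchmark_app on a dynamic network, followed by part of the resulting output: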

Python

benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -shape [-1,3,16,224,224] -data_shape [1,3,16,224,224][2,3,16,224,224][4,3,16,224,224] -pcseq

[Step 9/11] Creating infer requests and preparing input tensors 
[ WARNING ] No input files were given for input 'input'!. This input will be filled with random values!
[ INFO ] Fill input 'input' with random values
[ INFO ] Defined 3 tensor groups:
[ INFO ] input: {1, 3, 16, 224, 224}
[ INFO ] input: {2, 3, 16, 224, 224} 
[ INFO ] input: {4, 3, 16, 224, 224} 
[Step 10/11] Measuring performance (Start inference asynchronously, 11 inference requests, limits: 60000 ms duration) 
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 201.15 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 2811 iterations 
[ INFO ] Duration: 60271.71 ms 
[ INFO ] Latency:
[ INFO ] Median: 207.70 ms
[ INFO ] Average: 234.56 ms 
[ INFO ] Min: 85.73 ms 
[ INFO ] Max: 773.55 ms 
[ INFO ] Latency for each data shape group:
[ INFO ] 1. input: {1, 3, 16, 224, 224}
[ INFO ] Median: 118.08 ms 
[ INFO ] Average: 115.05 ms 
[ INFO ] Min: 85.73 ms 
[ INFO ] Max: 339.25 ms 
[ INFO ] 2. input: {2, 3, 16, 224, 224} 
[ INFO ] Median: 207.25 ms 
[ INFO ] Average: 205.16 ms 
[ INFO ] Min: 166.98 ms 
[ INFO ] Max: 545.55 ms 
[ INFO ] 3. input: {4, 3, 16, 224, 224} 
[ INFO ] Median: 384.16 ms 
[ INFO ] Average: 383.48 ms 
[ INFO ] Min: 305.51 ms 
[ INFO ] Max: 773.55 ms 
[ INFO ] Throughput: 108.82 FPS
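
One way to read the per-shape latencies is to divide each figure by the batch dimension of its shape, which gives the time spent per sample. The sketch below does that arithmetic on the median values from the Python report above. It is only an illustration of how to interpret this particular run; the variable names are hypothetical and benchmark_app does not print this quantity itself.

```python
# Median latency per data shape group, copied from the Python report above.
# Keys are the batch dimension N of each -data_shape entry.
median_latency_ms = {1: 118.08, 2: 207.25, 4: 384.16}

# Latency per sample = latency of one request / number of samples in it.
per_sample_ms = {n: ms / n for n, ms in median_latency_ms.items()}

for n, ms in per_sample_ms.items():
    print(f"batch {n}: {median_latency_ms[n]:.2f} ms per request, {ms:.2f} ms per sample")
```

In this run the per-request latency grows with the batch size while the per-sample latency shrinks, which is the usual trade-off between latency and throughput.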

C++

./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -shape [-1,3,16,224,224] -data_shape [1,3,16,224,224][2,3,16,224,224][4,3,16,224,224] -pcseq

[Step 9/11] Creating infer requests and preparing input tensors 
[ INFO ] Test Config 0 
[ INFO ] input ([N,C,D,H,W], f32, {1, 3, 16, 224, 224}, dyn:{?,3,16,224,224}): random (binary data is expected) 
[ INFO ] Test Config 1 
[ INFO ] input ([N,C,D,H,W], f32, {2, 3, 16, 224, 224}, dyn:{?,3,16,224,224}): random (binary data is expected) 
[ INFO ] Test Config 2 
[ INFO ] input ([N,C,D,H,W], f32, {4, 3, 16, 224, 224}, dyn:{?,3,16,224,224}): random (binary data is expected) 
[Step 10/11] Measuring performance (Start inference asynchronously, 11 inference requests, limits: 60000 ms duration) 
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 204.40 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 2783 iterations 
[ INFO ] Duration: 60326.29 ms 
[ INFO ] Latency:
[ INFO ] Median: 208.20 ms
[ INFO ] Average: 237.47 ms 
[ INFO ] Min: 85.06 ms 
[ INFO ] Max: 743.46 ms 
[ INFO ] Latency for each data shape group:
[ INFO ] 1. input: {1, 3, 16, 224, 224}
[ INFO ] Median: 120.36 ms 
[ INFO ] Average: 117.19 ms 
[ INFO ] Min: 85.06 ms 
[ INFO ] Max: 348.66 ms 
[ INFO ] 2. input: {2, 3, 16, 224, 224} 
[ INFO ] Median: 207.81 ms 
[ INFO ] Average: 206.39 ms 
[ INFO ] Min: 167.19 ms 
[ INFO ] Max: 578.33 ms 
[ INFO ] 3. input: {4, 3, 16, 224, 224} 
[ INFO ] Median: 387.40 ms 
[ INFO ] Average: 388.99 ms 
[ INFO ] Min: 327.50 ms 
[ INFO ] Max: 743.46 ms 
[ INFO ] Throughput: 107.61 FPS
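
With several data shapes, the reported throughput no longer equals iterations divided by duration, because each request carries a different number of samples. The sketch below reconstructs the FPS figures from the two reports above under two assumptions that this page does not state explicitly: that FPS counts samples rather than requests, and that the shapes listed in -data_shape are used in round-robin order. The helper names are hypothetical. The numbers from both runs are consistent with these assumptions, but treat this as a reading aid rather than a description of the tool's internals.

```python
from itertools import cycle, islice


def frames_processed(iterations, batch_sizes):
    """Total samples processed when requests cycle through batch_sizes in order."""
    return sum(islice(cycle(batch_sizes), iterations))


def derived_fps(iterations, duration_ms, batch_sizes):
    """Samples per second over the whole run."""
    return frames_processed(iterations, batch_sizes) / (duration_ms / 1000.0)


BATCHES = [1, 2, 4]  # batch dimension N of each -data_shape entry

# (iterations, duration in ms, reported FPS) taken from the two reports above
runs = {
    "Python": (2811, 60271.71, 108.82),
    "C++": (2783, 60326.29, 107.61),
}

for name, (count, duration_ms, reported) in runs.items():
    fps = derived_fps(count, duration_ms, BATCHES)
    print(f"{name}: reported {reported:.2f} FPS, derived {fps:.2f} FPS")
```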
