Pedestrian Tracker C++ Demo

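This demo showcases a pedestrian tracking scenario: it reads frames from an input video sequence, detects pedestrians in the frames, and builds the movement trajectories of the pedestrians frame by frame.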

How It Works

On startup, the application reads the command-line parameters and loads the specified networks.

Upon getting a frame from the input video sequence (either a video file or a folder with images), the application performs inference of the pedestrian detection network.

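After that, the bounding boxes of the detected pedestrians are passed to an instance of the tracker class, which matches the appearance of each pedestrian against the known (already tracked) persons. In obvious cases (when the pixel-to-pixel similarity between a detected pedestrian and the latest pedestrian image from one of the known tracks is high enough), the match is made without inference of the reidentification network. In more complicated cases, the demo uses the reidentification network to decide whether a detected pedestrian is the next position of a known person or the first position of a newly tracked person.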

After that, the application displays the tracks and the latest detections on the screen and goes to the next frame.
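The two-stage decision described above (a cheap appearance check first, the reidentification network only when needed) can be sketched roughly as follows. This is a minimal illustration and not the demo's tracker code: the `Descriptor` type, the `Track` struct, the `matchDetection` function, the use of cosine similarity, and both thresholds are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical descriptor: a flat vector of floats. In the cheap stage it could
// hold downscaled pixel values; in the expensive stage, a reidentification
// embedding. Neither layout is taken from the demo sources.
using Descriptor = std::vector<float>;

// Cosine similarity in [-1, 1]; returns 0 if either vector has zero norm.
float cosineSimilarity(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        normA += static_cast<double>(a[i]) * a[i];
        normB += static_cast<double>(b[i]) * b[i];
    }
    if (normA == 0.0 || normB == 0.0) return 0.0f;
    return static_cast<float>(dot / (std::sqrt(normA) * std::sqrt(normB)));
}

// Hypothetical track record holding descriptors of its latest pedestrian image.
struct Track {
    int id;
    Descriptor lastPixels;     // cheap, pixel-based descriptor
    Descriptor lastEmbedding;  // descriptor from the reidentification network
};

// Returns the id of the track with the highest similarity that reaches the
// threshold, or -1 if no track qualifies.
template <typename Getter>
int bestTrack(const Descriptor& query, const std::vector<Track>& tracks,
              Getter descriptorOf, float threshold) {
    int bestId = -1;
    float best = threshold;
    for (const Track& track : tracks) {
        const float s = cosineSimilarity(query, descriptorOf(track));
        if (s >= best) { best = s; bestId = track.id; }
    }
    return bestId;
}

// Returns the matched track id, or -1 when the detection should start a new
// track. computeEmbedding stands in for reidentification inference and is
// called only when the cheap pixel comparison finds no match.
template <typename EmbedFn>
int matchDetection(const Descriptor& pixels, const std::vector<Track>& tracks,
                   EmbedFn computeEmbedding, float cheapThreshold, float reidThreshold) {
    // Stage 1: obvious case, decided on pixel similarity alone.
    const int cheap = bestTrack(pixels, tracks,
        [](const Track& t) -> const Descriptor& { return t.lastPixels; }, cheapThreshold);
    if (cheap != -1) return cheap;

    // Stage 2: complicated case, decided on the reidentification embedding.
    const Descriptor embedding = computeEmbedding();
    return bestTrack(embedding, tracks,
        [](const Track& t) -> const Descriptor& { return t.lastEmbedding; }, reidThreshold);
}
```

Passing the embedding computation as a callable keeps the expensive inference lazy, which mirrors the point of the two-stage scheme: most detections are resolved without it.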

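NOTE: By default, Open Model Zoo demos expect input with BGR channel order. If you trained your model to work with RGB order, you need to either manually rearrange the default channel order in the sample or demo application, or reconvert your model using the Model Optimizer tool with the --reverse_input_channels argument specified. For more information about the argument, refer to the When to Reverse Input Channels section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases).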
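For the manual route, rearranging the channel order amounts to swapping the first and third channel of every pixel. The sketch below shows this on a raw buffer; the function name `swapRedAndBlue` and the packed, interleaved 8-bit layout are assumptions for illustration. In an OpenCV-based application, `cv::cvtColor` with `cv::COLOR_BGR2RGB` performs the same conversion on a `cv::Mat`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swaps the first and third channel of every pixel in a packed, interleaved
// 3-channel 8-bit image (BGRBGR... <-> RGBRGB...). The buffer layout is an
// assumption of this sketch; the middle (green) channel is left untouched.
void swapRedAndBlue(std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 3 == 0);
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

Applying the function twice restores the original buffer, so the same routine converts in either direction.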

Preparing to Run

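For demo input image or video files, refer to the Media Files Available for Demos section of the Open Model Zoo Demos Overview. The list of models supported by the demo is in the <omz_dir>/demos/pedestrian_tracker_demo/cpp/models.lst file. This file can be passed as a parameter to the Model Downloader and Model Converter to download the models and, if necessary, convert them to the OpenVINO IR format (*.xml + *.bin).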

An example of using the Model Downloader:

omz_downloader --list models.lst

An example of using the Model Converter:

omz_converter --list models.lst

Supported Models

architecture_type = centernet
- ctdet_coco_dlav0_512
architecture_type = ssd
- efficientdet-d0-tf
- efficientdet-d1-tf
- faster-rcnn-resnet101-coco-sparse-60-0001
- pedestrian-and-vehicle-detector-adas-0001
- pedestrian-detection-adas-0002
- person-detection-0106
- person-detection-0200
- person-detection-0201
- person-detection-0202
- person-detection-0203
- person-detection-retail-0002
- person-detection-retail-0013
- person-vehicle-bike-detection-2000
- person-vehicle-bike-detection-2001
- person-vehicle-bike-detection-2002
- person-vehicle-bike-detection-2003
- person-vehicle-bike-detection-2004
- rfcn-resnet101-coco-tf
- retinanet-tf
- ssd-resnet34-1200-onnx
- ssd_mobilenet_v1_coco
- ssd_mobilenet_v1_fpn_coco
- ssdlite_mobilenet_v2
- vehicle-detection-adas-0002
architecture_type = yolo
- person-vehicle-bike-detection-crossroad-yolov3-1020
- yolo-v3-tf
- yolo-v3-tiny-tf
- yolo-v1-tiny-tf
- yolo-v2-ava-0001
- yolo-v2-ava-sparse-35-0001
- yolo-v2-ava-sparse-70-0001
- yolo-v2-tf
- yolo-v2-tiny-ava-0001
- yolo-v2-tiny-ava-sparse-30-0001
- yolo-v2-tiny-ava-sparse-60-0001
- yolo-v2-tiny-tf
- yolo-v2-tiny-vehicle-detection-0001
reidentification models
- person-reidentification-retail-0277
- person-reidentification-retail-0286
- person-reidentification-retail-0287
- person-reidentification-retail-0288

NOTE: For details on model inference support on different devices, refer to the tables Intel's Pre-Trained Models Device Support and Public Pre-Trained Models Device Support.

Running

Running the demo with the -h option yields a usage message:

pedestrian_tracker_demo [OPTION] 
Options: 

  -h                         Print a usage message.
  -i                         Required. An input to process. The input must be a single image, a folder of images, video file or camera id.
  -loop                      Optional. Enable reading the input in a loop.
  -first                     Optional. The index of the first frame of the input to process. The actual first frame captured depends on cv::VideoCapture implementation and may have slightly different number.
  -read_limit                Optional. Read length limit before stopping or restarting reading the input.
  -o "<path>"                Optional. Name of the output file(s) to save. Frames of odd width or height can be truncated. See https://github.com/opencv/opencv/pull/24086
  -limit "<num>"             Optional. Number of frames to store in output. If 0 is set, all frames are stored.
  -m_det "<path>"            Required. Path to the Pedestrian Detection Retail model (.xml) file.
  -m_reid "<path>"           Required. Path to the Pedestrian Reidentification Retail model (.xml) file.
  -d_det "<device>"          Optional. Specify the target device for pedestrian detection (the list of available devices is shown below). Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin.
  -d_reid "<device>"         Optional. Specify the target device for pedestrian reidentification (the list of available devices is shown below). Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin.
  -layout_det "<string>"     Optional. Specify inputs layouts. Ex. NCHW or input0:NCHW,input1:NC in case of more than one input.
  -r                         Optional. Output pedestrian tracking results in a raw format (compatible with MOTChallenge format).
  -no_show                   Optional. Don't show output.
  -delay                     Optional. Delay between frames used for visualization. If negative, the visualization is turned off (like with the option 'no_show'). If zero, the visualization is made frame-by-frame.
  -out "<path>"              Optional. The file name to write output log file with results of pedestrian tracking. The format of the log file is compatible with MOTChallenge format.
  -u                         Optional. List of monitors to show initially.
  -at "<type>"               Required. Architecture type for detector model: centernet, ssd or yolo.
  -t                         Optional. Probability threshold for detections.
  -auto_resize               Optional. Enables resizable input with support of ROI crop & auto resize.
  -iou_t                     Optional. Filtering intersection over union threshold for overlapping boxes.
  -yolo_af                   Optional. Use advanced postprocessing/filtering algorithm for YOLO.
  -labels "<path>"           Optional. Path to a file with labels mapping.
  -nireq "<integer>"         Optional. Number of infer requests for detector model. If this option is omitted, number of infer requests is determined automatically.
  -nstreams                  Optional. Number of streams to use for inference on the CPU or/and GPU in throughput mode for detector model (for HETERO and MULTI device cases use format <device1>:<nstreams1>,<device2>:<nstreams2> or just <nstreams>)
  -nthreads "<integer>"      Optional. Number of threads for detector model.
  -person_label              Optional. Label of class person for detector. Default -1 for tracking all objects
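The -r and -out options above refer to the MOTChallenge format. As a rough illustration of what one line in the commonly used 2D MOTChallenge text layout looks like (frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z, with the unused 3D coordinates set to -1), consider the sketch below. The function name `formatMotLine` is hypothetical, and the exact fields and number formatting written by the demo are not shown in this document, so treat this as an assumption rather than the demo's output routine.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds one comma-separated line in the 2D MOTChallenge text layout:
// frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z.
// The last three fields are world coordinates, which 2D tracking leaves at -1.
std::string formatMotLine(int frame, int trackId,
                          int left, int top, int width, int height,
                          double confidence) {
    assert(width >= 0 && height >= 0);
    std::ostringstream line;
    line << frame << ',' << trackId << ','
         << left << ',' << top << ',' << width << ',' << height << ','
         << confidence << ",-1,-1,-1";
    return line.str();
}
```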

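For example, to run the application with the OpenVINO™ toolkit pre-trained models, inferring the pedestrian detector on a GPU and the pedestrian reidentification model on a CPU, run the following command: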

./pedestrian_tracker_demo -i <path_video_file> \
    -m_det <path_to_model>/person-detection-retail-0013.xml \
    -m_reid <path_to_model>/person-reidentification-retail-0277.xml \
    -d_det GPU \
    -at ssd

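NOTE: If you provide a single image as an input, the demo processes and renders it quickly, then exits. To continuously visualize inference results on the screen, apply the -loop option, which makes the demo process the single image in a loop.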

You can save processed results to a Motion JPEG AVI file or to separate JPEG or PNG files using the -o option:

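- To save processed results in an AVI file, specify the name of the output file with the avi extension, for example: -o output.avi.
- To save processed results as images, specify the template name of the output image file with the jpg or png extension, for example: -o output_%03d.jpg. The actual file names are constructed from the template at runtime by replacing the printf-style format specifier %03d with the frame number, resulting in output_000.jpg, output_001.jpg, and so on.

To avoid disk space overrun in case of a continuous input stream, such as a camera, you can limit the amount of data stored in the output file(s) with the -limit option. The default value is 1000. To change it, apply the -limit N option, where N is the number of frames to store.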
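To make the template expansion concrete, the following sketch expands a pattern such as output_%03d.jpg with `std::snprintf`. The helper name `frameFileName` and the fixed buffer size are assumptions for illustration; how the demo and OpenCV expand the template internally is not shown in this document. The pattern is assumed to be trusted and to contain exactly one integer conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Expands a printf-style file name pattern containing a single integer
// conversion (for example "%03d") with the given frame index.
// The pattern is assumed to be trusted input.
std::string frameFileName(const std::string& pattern, int frameIndex) {
    char buffer[256];
    const int written =
        std::snprintf(buffer, sizeof(buffer), pattern.c_str(), frameIndex);
    // snprintf returns the length it needed; it must fit into the buffer.
    assert(written >= 0 && static_cast<std::size_t>(written) < sizeof(buffer));
    (void)written;  // silences the unused-variable warning when NDEBUG is set
    return std::string(buffer);
}
```

Note that %03d pads to at least three digits but does not truncate, so frame 1234 becomes output_1234.jpg.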

NOTE: Windows* systems may not have the Motion JPEG codec installed by default. If this is the case, you can download the OpenCV FFMPEG backend using the PowerShell script provided with the OpenVINO™ install package and located at <INSTALL_DIR>/opencv/ffmpeg-download.ps1. The script should be run with administrative privileges if OpenVINO™ is installed in a system-protected folder (a typical case). Alternatively, you can save results as images.

Demo Output

The demo uses OpenCV to display the resulting frame with detections rendered as bounding boxes, curves (for displaying trajectories), and text. The demo reports:

- FPS: average rate of video frame processing (frames per second).
- Latency: average time required to process one frame (from reading the frame to displaying the results).

You can use these metrics to measure application-level performance.
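As an illustration of how the two metrics differ, the sketch below accumulates per-frame timestamps and derives both values. The `FrameMetrics` class is hypothetical and is not the demo's own metrics code; it assumes each frame is reported with the time its reading started and the time its result was displayed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical accumulator for the two reported metrics.
class FrameMetrics {
public:
    using Clock = std::chrono::steady_clock;

    // Records one frame: start is when reading began, end is when the
    // result was displayed.
    void update(Clock::time_point start, Clock::time_point end) {
        assert(end >= start);
        if (frames_ == 0) firstStart_ = start;
        lastEnd_ = end;
        totalLatency_ += end - start;
        ++frames_;
    }

    // Average per-frame latency in milliseconds (0 if no frames were recorded).
    double averageLatencyMs() const {
        if (frames_ == 0) return 0.0;
        return std::chrono::duration<double, std::milli>(totalLatency_).count() / frames_;
    }

    // Average throughput: frames divided by the wall time between the first
    // start and the last end (0 if that interval is empty).
    double fps() const {
        const double seconds =
            std::chrono::duration<double>(lastEnd_ - firstStart_).count();
        return seconds > 0.0 ? frames_ / seconds : 0.0;
    }

private:
    std::size_t frames_ = 0;
    Clock::duration totalLatency_{0};
    Clock::time_point firstStart_{};
    Clock::time_point lastEnd_{};
};
```

When frames overlap in flight, for instance with several infer requests, FPS can be higher than the reciprocal of the average latency, which is why the two numbers are reported separately.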