Smart Classroom C++ Demo

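This demo shows an example of the joint use of several neural networks to detect student actions in a classroom environment (sitting, standing, raising a hand for the person-detection-action-recognition-0005 model; sitting, writing, raising a hand, standing, turning around, lying on the desk for the person-detection-action-recognition-0006 model) and to recognize people by their faces. The demo uses the Async API for the action and face detection networks. This makes it possible to parallelize face recognition and detection: while face recognition runs on one accelerator, face and action detection can run on another. The demo can use the following set of pre-trained models: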

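face-detection-adas-0001, the primary detection network for finding faces.
landmarks-regression-retail-0009, which runs on top of the results of the first network and outputs a vector of facial landmarks for each detected face.
face-reidentification-retail-0095, which runs on top of the results of the first network and outputs a vector of features for each detected face.
person-detection-action-recognition-0005, a detection network that finds persons and simultaneously predicts their current action (3 actions: sitting, standing, raising hand).
person-detection-action-recognition-0006, a detection network that finds persons and simultaneously predicts their current action (6 actions: sitting, writing, raising hand, standing, turned around, lying on the desk).
person-detection-raisinghand-recognition-0001, a detection network that finds students and simultaneously predicts their current action (in contrast to the previous models, it predicts only whether the student is raising a hand).
person-detection-action-recognition-teacher-0002, a detection network that finds persons and simultaneously predicts their current action.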

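How It Works

On startup, the application reads command-line parameters and loads four models to the OpenVINO™ Runtime plugin for execution on different devices, depending on the -m... options family. Upon getting a frame from the OpenCV VideoCapture, it runs inference of the face detection and action detection networks. After that, the ROIs obtained by the face detector are fed to the facial landmarks regression network. Next, the landmarks are used to align each face with an affine transform, and the aligned face is fed to the face recognition network. Finally, recognized faces and detected actions are matched, which yields the action of each recognized person in every frame.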
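The final matching step can be illustrated with a minimal Python sketch. It assumes, without confirmation from the description above, that a recognized face is attached to the person box that overlaps it the most. The function names and the `(x, y, width, height)` box format are hypothetical, and the demo's own matching logic may differ.

```python
def intersection_area(a, b):
    """Area of overlap between two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def match_faces_to_actions(faces, persons):
    """Pair each recognized face with the action of the best-overlapping person.

    faces:   list of (name, face_box)
    persons: list of (action, person_box)
    Returns a list of (name, action); action is None if no person box overlaps.
    """
    result = []
    for name, face_box in faces:
        best_action, best_overlap = None, 0
        for action, person_box in persons:
            overlap = intersection_area(face_box, person_box)
            if overlap > best_overlap:
                best_action, best_overlap = action, overlap
        result.append((name, best_action))
    return result


# Hypothetical detections for one frame.
faces = [("alice", (110, 50, 40, 40)), ("bob", (400, 60, 40, 40))]
persons = [("sitting", (90, 40, 100, 200)), ("raising hand", (380, 30, 110, 220))]
matches = match_faces_to_actions(faces, persons)
```

With these inputs, each face lies entirely inside exactly one person box, so every name is paired with that person's action.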

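NOTE: By default, Open Model Zoo demos expect input with BGR channel order. If you trained your model to work with RGB order, you need to manually rearrange the default channel order in the sample or demo application, or reconvert your model using the Model Optimizer tool with the --reverse_input_channels argument specified. For more information about the argument, refer to the When to Reverse Input Channels section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases).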
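As an illustration of what rearranging the channel order means, the sketch below reverses the last axis of a tiny image stored as nested lists. The helper name `reverse_channels` is hypothetical and is not part of the demo.

```python
def reverse_channels(image):
    """Return a copy of an H x W x 3 image (nested lists) with channels reversed.

    Applied to a BGR image this yields RGB, and vice versa.
    """
    return [[pixel[::-1] for pixel in row] for row in image]


# One row with two pixels in BGR order: pure blue, then pure green.
bgr_image = [[[255, 0, 0], [0, 255, 0]]]
rgb_image = reverse_channels(bgr_image)
```

For a NumPy array, such as the frames OpenCV returns in Python, the same reversal can be written as `image[..., ::-1]`.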

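Creating a Gallery for Face Recognition

To recognize faces on a frame, the demo needs a gallery of reference images. Each image must contain a tightly cropped face. You can create the gallery from an arbitrary list of images:

Put images containing tight crops of frontal-oriented faces into a separate empty folder. Each identity must have only one image. Name the images id_name0.png, id_name1.png, ...
Run the python3 <omz_dir>/demos/smart_classroom_demo/utils/create_list.py <path_to_folder_with_images> command. It creates a faces_gallery.json file that contains the list of files and identities.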
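To make the idea of a gallery file concrete, here is a minimal sketch that scans a folder and writes a JSON mapping from identity to image paths. The JSON layout, the use of the file name without extension as the identity, and the function names are all assumptions made for illustration; the real create_list.py may produce a different structure.

```python
import json
import os
import tempfile


def build_gallery(folder):
    """Map each image's file stem (used here as the ID) to a list of image paths."""
    gallery = {}
    for file_name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(file_name)
        if ext.lower() not in (".png", ".jpg", ".jpeg"):
            continue  # ignore anything that is not an image
        gallery.setdefault(stem, []).append(os.path.join(folder, file_name))
    return gallery


def write_gallery(folder, out_name="faces_gallery.json"):
    """Write the gallery of `folder` as JSON and return the path of the new file."""
    gallery = build_gallery(folder)
    out_path = os.path.join(folder, out_name)
    with open(out_path, "w") as f:
        json.dump(gallery, f, indent=2)
    return out_path


# Demonstration on a temporary folder with empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("alice0.png", "bob0.png", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    with open(write_gallery(folder)) as f:
        loaded = json.load(f)
    gallery_ids = sorted(loaded)
```

The non-image file is skipped, so only the two placeholder images end up in the resulting mapping.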

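Preparing to Run

For demo input image or video files, refer to the Media Files Available for Demos section of the Open Model Zoo Demos Overview. The list of models supported by the demo is in the <omz_dir>/demos/smart_classroom_demo/cpp/models.lst file. This file can be used as a parameter for Model Downloader and Converter to download the models and, if necessary, convert them to the OpenVINO IR format (*.xml + *.bin).

An example of using the Model Downloader: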

omz_downloader --list models.lst

An example of using the Model Converter:

omz_converter --list models.lst

Supported Models

face-detection-adas-0001
face-recognition-resnet100-arcface-onnx
face-reidentification-retail-0095
facenet-20180408-102900
landmarks-regression-retail-0009
person-detection-action-recognition-0005
person-detection-action-recognition-0006
person-detection-action-recognition-teacher-0002
person-detection-raisinghand-recognition-0001

NOTE: Refer to the tables Intel's Pre-Trained Models Device Support and Public Pre-Trained Models Device Support for details on model inference support on different devices.

Running

Running the application with the -h option yields the following usage message:

smart_classroom_demo [OPTION] 
Options: 
    -h                     Print a usage message.
    -i                     Required. An input to process. The input must be a single image, a folder of images, video file or camera id. 
    -loop                  Optional. Enable reading the input in a loop.
    -read_limit            Optional. Read length limit before stopping or restarting reading the input.
    -o "<path>"            Optional. Name of the output file(s) to save. Frames of odd width or height can be truncated. See https://github.com/opencv/opencv/pull/24086 
    -limit "<num>"         Optional. Number of frames to store in output. If 0 is set, all frames are stored.
    -m_act '<path>'        Required. Path to the Person/Action Detection Retail model (.xml) file.
    -m_fd '<path>'         Required. Path to the Face Detection model (.xml) file.
    -m_lm '<path>'         Optional. Path to the Facial Landmarks Regression Retail model (.xml) file.
    -m_reid '<path>'       Optional. Path to the Face Reidentification Retail model (.xml) file.
    -d_act '<device>'      Optional. Specify the target device for Person/Action Detection Retail (the list of available devices is shown below). Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin. The application looks for a suitable plugin for the specified device.
    -d_fd '<device>'       Optional. Specify the target device for Face Detection Retail (the list of available devices is shown below). Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin. The application looks for a suitable plugin for the specified device.
    -d_lm '<device>'       Optional. Specify the target device for Landmarks Regression Retail (the list of available devices is shown below). Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin. The application looks for a suitable plugin for the specified device.
    -d_reid '<device>'     Optional. Specify the target device for Face Reidentification Retail (the list of available devices is shown below). Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin. The application looks for a suitable plugin for the specified device.
    -greedy_reid_matching  Optional. Use faster greedy matching algorithm in face reid.
    -r                     Optional. Output Inference results as raw values.
    -ad                    Optional. Output file name to save per-person action statistics in. Requires -teacher_id and -a_top to be unset and -fg to be set 
    -t_ad                  Optional. Probability threshold for person/action detection.
    -t_ar                  Optional. Probability threshold for action recognition.
    -t_fd                  Optional. Probability threshold for face detections.
    -inh_fd                Optional. Input image height for face detector.
    -inw_fd                Optional. Input image width for face detector.
    -exp_r_fd              Optional. Expand ratio for bbox before face recognition.
    -t_reid                Optional. Cosine distance threshold between two vectors for face reidentification.
    -fg                    Optional. Path to a faces gallery in .json format.
    -teacher_id            Optional. ID of a teacher. You must also set a faces gallery parameter (-fg) to use it.
    -no_show               Optional. Don't show output.
    -min_ad                Optional. Minimum action duration in seconds.
    -d_ad                  Optional. Maximum time difference between actions in seconds.
    -student_ac            Optional. List of student actions separated by a comma.
    -top_ac                Optional. List of student actions (for top-k mode) separated by a comma.
    -teacher_ac            Optional. List of teacher actions separated by a comma.
    -top_id                Optional. Target action name.
    -a_top                 Optional. Number of first K students. If this parameter is positive, the demo detects first K persons with the action, pointed by the parameter 'top_id'
    -crop_gallery          Optional. Crop images during faces gallery creation.
    -t_reg_fd              Optional. Probability threshold for face detections during database registration.
    -min_size_fr           Optional. Minimum input size for faces during database registration.
    -al                    Optional. Output file name to save per-person action detections in.
    -ss_t                  Optional. Number of frames to smooth actions.
    -u                     Optional. List of monitors to show initially.

Running the application with an empty list of options yields an error message.
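The -t_reid option in the usage message is a cosine distance threshold between two feature vectors. The sketch below uses the textbook definition, one minus the cosine similarity, and accepts the closest gallery entry only when its distance is below the threshold. Both the exact scaling of the distance and the "Unknown" fallback label are assumptions; the demo's implementation may differ.

```python
import math


def cosine_distance(a, b):
    """Textbook cosine distance: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def identify(embedding, gallery, threshold):
    """Return the ID of the closest gallery vector, or "Unknown" if none is
    closer than `threshold`."""
    best_id, best_dist = "Unknown", threshold
    for identity, reference in gallery.items():
        dist = cosine_distance(embedding, reference)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id


# Hypothetical 2-D feature vectors; real embeddings are much longer.
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
known = identify([0.9, 0.1], gallery, threshold=0.8)
unknown = identify([-1.0, -1.0], gallery, threshold=0.8)
```

In this definition, a lower threshold makes recognition stricter, because fewer candidates fall below it.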

An example of a valid command line that runs the application with a pre-trained model for recognizing student actions:

./smart_classroom_demo \
    -i <path_to_video> \
    -m_act <path_to_model>/person-detection-action-recognition-0005.xml \
    -student_ac "sitting, standing, raising hand" \
    -m_fd <path_to_model>/face-detection-adas-0001.xml \
    -m_reid <path_to_model>/face-reidentification-retail-0095.xml \
    -m_lm <path_to_model>/landmarks-regression-retail-0009.xml \
    -t_reid 0.8 \
    -fg <path_to_faces_gallery.json>

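NOTE: To recognize student actions, use the person-detection-action-recognition-0005 model for 3 basic actions and the person-detection-action-recognition-0006 model for 6 actions. See the model descriptions for details on the list of recognized actions.

An example of a valid command line that runs the application for recognizing teacher actions: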

./smart_classroom_demo \
    -i <path_to_video> \
    -m_act <path_to_model>/person-detection-action-recognition-teacher-0002.xml \
    -m_fd <path_to_model>/face-detection-adas-0001.xml \
    -m_reid <path_to_model>/face-reidentification-retail-0095.xml \
    -m_lm <path_to_model>/landmarks-regression-retail-0009.xml \
    -fg <path to faces_gallery.json> \
    -top_id \
    -teacher_id <ID of a teacher in the face gallery>

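NOTE: To recognize teacher actions, use the person-detection-action-recognition-teacher-0002 model. See the model description for details on the list of recognized actions.

An example of a valid command line that runs the application for recognizing the first students who raise their hands: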

./smart_classroom_demo \
    -i <path_to_video> \
    -m_act <path_to_model>/person-detection-raisinghand-recognition-0001.xml \
    -a_top <number of first raised-hand students>

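NOTE: To recognize the raising-hand action of students, use the person-detection-raisinghand-recognition-0001 model.

NOTE: If you provide a single image as input, the demo processes and renders it quickly, then exits. To continuously visualize inference results on the screen, apply the -loop option, which makes the demo process the single image in a loop.

You can save processed results to a Motion JPEG AVI file or to separate JPEG or PNG files using the -o option:

To save processed results in an AVI file, specify the name of the output file with the avi extension, for example: -o output.avi.
To save processed results as images, specify a template name for the output image files with the jpg or png extension, for example: -o output_%03d.jpg. At runtime, the actual file names are constructed from the template by replacing the printf-style pattern %03d with the frame number, which results in output_000.jpg, output_001.jpg, and so on. To avoid disk space overrun with a continuous input stream, such as a camera, you can limit the amount of data stored in the output file(s) with the -limit option. The default value is 1000. To change it, apply the -limit N option, where N is the number of frames to store.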
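The interaction between the file name template and the frame limit can be sketched as follows. The helper `output_names` is hypothetical and only mimics the documented behavior: printf-style expansion of %03d, a default limit of 1000 frames, and a limit of 0 meaning that all frames are stored.

```python
def output_names(template, frame_count, limit=1000):
    """Expand a printf-style template for each stored frame.

    limit=0 means that every frame is stored, as described for -limit.
    """
    stored = frame_count if limit == 0 else min(frame_count, limit)
    return [template % index for index in range(stored)]


names = output_names("output_%03d.jpg", frame_count=5, limit=3)
all_names = output_names("output_%03d.jpg", frame_count=5, limit=0)
```

Here `names` holds the first three file names, starting with output_000.jpg, while `all_names` covers all five frames.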

NOTE: Windows* systems may not have the Motion JPEG codec installed by default. If this is the case, you can download the OpenCV FFMPEG backend using the PowerShell script provided with the OpenVINO™ install package and located at <INSTALL_DIR>/opencv/ffmpeg-download.ps1. The script must be run with administrative privileges if OpenVINO™ is installed in a system-protected folder (a typical case). Alternatively, you can save results as images.

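Demo Output

The demo uses OpenCV to display the resulting frame with labeled actions and faces. The demo reports:

FPS: the average rate of video frame processing (frames per second).
Latency: the average time required to process one frame (from reading the frame to displaying the results).

You can use these metrics to measure application-level performance.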
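One plausible way to derive both metrics from per-frame timestamps is sketched below. How the demo computes them internally is not stated here, so the function name, the input format, and the choice of averaging are assumptions.

```python
def summarize(frame_times):
    """Compute (fps, mean_latency_ms) from per-frame timestamps in seconds.

    frame_times: list of (read_started_at, shown_at), one tuple per frame,
    ordered by time.
    """
    latencies = [shown - started for started, shown in frame_times]
    mean_latency_ms = 1000.0 * sum(latencies) / len(latencies)
    # Wall-clock time from the first read to the last displayed result.
    elapsed = frame_times[-1][1] - frame_times[0][0]
    fps = len(frame_times) / elapsed
    return fps, mean_latency_ms


# Four hypothetical frames, each taking 0.5 s from read to display,
# with a new frame started every 0.25 s (overlapped processing).
frame_times = [(0.0, 0.5), (0.25, 0.75), (0.5, 1.0), (0.75, 1.25)]
fps, latency_ms = summarize(frame_times)
```

The example shows why the two numbers are reported separately: with overlapped processing, throughput (3.2 frames per second here) can be higher than the reciprocal of the per-frame latency (500 ms).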