Live Human Pose Estimation with OpenVINO™#

This Jupyter notebook can be launched online, opening an interactive environment in a browser window. You can also install it locally.

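This notebook demonstrates live pose estimation with OpenVINO, using the OpenPose human-pose-estimation-0001 model from Open Model Zoo. At the end of the notebook, you will see live inference results from your webcam. Alternatively, you can upload a video file.

Note: To use a webcam, you must run this Jupyter notebook on a computer with a webcam. If you run it on a server, the webcam will not work. However, you can still run inference on a video in the final step.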

Table of contents:

Imports
The model
- Download the model
- Load the model
Processing
Run
- Run Live Pose Estimation

%pip install -q "openvino>=2023.1.0" opencv-python tqdm

Note: you may need to restart the kernel to use updated packages.

Imports#

import collections 
import time 
from pathlib import Path 

import cv2 
import numpy as np 
from IPython import display 
from numpy.lib.stride_tricks import as_strided 
import openvino as ov 

# Fetch the `notebook_utils` module.
import requests 

r = requests.get( 
    url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py", 
) 

open("notebook_utils.py", "w").write(r.text) 
import notebook_utils as utils

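The model#

Download the model#

Use the download_file function from the notebook_utils file. It automatically creates the directory structure and downloads the selected model.

To download a different model, replace the model name and precision in the code below.

Note: A different model may require a different pose decoder.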

# Directory where the model will be downloaded.
base_model_dir = Path("model")

# Name of the model in Open Model Zoo.
model_name = "human-pose-estimation-0001" 
# Selected precision (FP32, FP16, FP16-INT8). 
precision = "FP16-INT8" 

model_path = base_model_dir / "intel" / model_name / precision / f"{model_name}.xml" 

if not model_path.exists(): 
    model_url_dir = f"https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/3/{model_name}/{precision}/"
    utils.download_file(model_url_dir + model_name + ".xml", model_path.name, model_path.parent) 
    utils.download_file( 
        model_url_dir + model_name + ".bin", 
        model_path.with_suffix(".bin").name, 
        model_path.parent, 
    )

model/intel/human-pose-estimation-0001/FP16-INT8/human-pose-estimation-0001.xml:    0%|          | 0.00/474k [0…

model/intel/human-pose-estimation-0001/FP16-INT8/human-pose-estimation-0001.bin:    0%|          |0.00/4.03M[…
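The path and URL construction above follows a fixed pattern, so switching the precision changes only one path component. A minimal sketch of that pattern as a helper; `model_files` is a hypothetical name introduced here for illustration, and the base URL is the one used in the download cell:

```python
from pathlib import Path

# Base URL copied from the download cell above.
BASE_URL = "https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/3"


def model_files(model_name, precision, base_dir=Path("model")):
    """Return (local_path, url) pairs for the .xml and .bin files (hypothetical helper)."""
    local_dir = base_dir / "intel" / model_name / precision
    remote_dir = f"{BASE_URL}/{model_name}/{precision}"
    return [
        (local_dir / f"{model_name}{suffix}", f"{remote_dir}/{model_name}{suffix}")
        for suffix in (".xml", ".bin")
    ]


files = model_files("human-pose-estimation-0001", "FP16")
for local_path, url in files:
    print(local_path.as_posix(), "<-", url)
```

Nothing is downloaded here; the helper only shows which local paths and remote URLs a given precision maps to.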

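Load the model#

Downloaded models are stored in a fixed directory structure that indicates the vendor, the model name, and the precision.

Only a few lines of code are needed to run the model. First, initialize OpenVINO Runtime. Then, read the network architecture and model weights from the .xml and .bin files and compile the model for the desired device. Select the inference device from the dropdown list.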

import ipywidgets as widgets 

core = ov.Core() 
device = widgets.Dropdown( 
    options=core.available_devices + ["AUTO"], 
    value="AUTO", 
    description="Device:", 
    disabled=False, 
) 

device

Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')

# Initialize OpenVINO Runtime.
core = ov.Core()
# Read the network from a file.
model = core.read_model(model_path)
# Compile for the selected device. AUTO lets OpenVINO choose; CPU or GPU can also be specified.
compiled_model = core.compile_model(model=model, device_name=device.value, config={"PERFORMANCE_HINT": "LATENCY"})

# Get the input and output nodes.
input_layer = compiled_model.input(0)
output_layers = compiled_model.outputs

# Get the input size.
height, width = list(input_layer.shape)[2:]
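The input shape is laid out as N, C, H, W, which is why the last two dimensions give the height and width. A small NumPy-only sketch of how an H x W x C frame is rearranged into that layout; the 256 x 456 size is an assumption taken from the Open Model Zoo description of this model, and the random array stands in for a real image:

```python
import numpy as np

# Assumed input shape (N, C, H, W) for human-pose-estimation-0001.
input_shape = (1, 3, 256, 456)
height, width = input_shape[2:]

# Stand-in for a resized BGR frame in OpenCV layout (H, W, C).
frame = np.random.randint(0, 256, size=(height, width, 3), dtype=np.uint8)

# Move channels first, then add a batch dimension of size 1.
input_img = frame.transpose((2, 0, 1))[np.newaxis, ...]

print(input_img.shape)
```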

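The input layer carries the name of the input node, and the output layers carry the names of the network's output nodes. This OpenPose model has one input and two outputs: PAFs (part affinity fields) and keypoint heatmaps.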

input_layer.any_name, [o.any_name for o in output_layers]

('data', ['Mconv7_stage2_L1', 'Mconv7_stage2_L2'])
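To make the two outputs concrete, here is a sketch with dummy arrays. The shapes (1, 38, 32, 57) for the PAFs and (1, 19, 32, 57) for the heatmaps are assumptions based on the Open Model Zoo description of this model; inspect `compiled_model.outputs` for the actual values.

```python
import numpy as np

# Dummy stand-ins for the two network outputs (assumed shapes).
pafs = np.zeros((1, 38, 32, 57), dtype=np.float32)      # Mconv7_stage2_L1
heatmaps = np.zeros((1, 19, 32, 57), dtype=np.float32)  # Mconv7_stage2_L2

# Heatmaps: one channel per keypoint type plus one background channel.
num_keypoints = heatmaps.shape[1] - 1

# PAFs: two channels (x and y vector components) per limb.
num_limbs = pafs.shape[1] // 2
limb_fields = pafs[0].reshape(num_limbs, 2, *pafs.shape[2:])

print(num_keypoints, num_limbs, limb_fields.shape)
```

Under these assumptions there are 18 keypoint types and 19 limbs, which matches the 19 entries in the decoder's skeleton definition.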

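OpenPose Decoder#

To turn the raw results from the neural network into pose estimates, you need an OpenPose decoder. It is provided in Open Model Zoo and is compatible with the human-pose-estimation-0001 model.

If you choose a model other than human-pose-estimation-0001, you will need a different decoder (for example, AssociativeEmbeddingDecoder), which is available in the demos section of Open Model Zoo.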
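At its core, an OpenPose-style decoder decides whether two keypoints belong to the same limb by sampling the part affinity field along the segment between them and checking how well the field aligns with the segment direction. A minimal sketch of that scoring for a single candidate pair on a synthetic field; `limb_affinity` is a name introduced here, and the simplifications (one pair, plain mean, no thresholds) are assumptions rather than the decoder's exact logic:

```python
import numpy as np


def limb_affinity(paf, point_a, point_b, num_samples=10):
    """Mean alignment between a PAF of shape (H, W, 2) and the segment a -> b."""
    a = np.asarray(point_a, dtype=np.float32)
    b = np.asarray(point_b, dtype=np.float32)
    vec = b - a
    unit = vec / (np.linalg.norm(vec) + 1e-6)
    # Evenly spaced sample points on the segment, rounded to pixel indices.
    t = np.linspace(0.0, 1.0, num_samples, dtype=np.float32)[:, None]
    points = np.round(a + t * vec).astype(np.int32)
    xs, ys = points[:, 0], points[:, 1]
    field = paf[ys, xs]  # (num_samples, 2) vectors read from the field
    return float((field * unit).sum(axis=1).mean())


# Synthetic field that points in the +x direction everywhere.
paf = np.zeros((32, 57, 2), dtype=np.float32)
paf[..., 0] = 1.0

horizontal = limb_affinity(paf, (5, 10), (25, 10))  # aligned with the field
vertical = limb_affinity(paf, (5, 10), (5, 25))     # perpendicular to the field
print(round(horizontal, 3), round(vertical, 3))
```

A segment aligned with the field scores close to 1, while a perpendicular one scores close to 0, so only the aligned pair would be accepted as a limb.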

# code from https://github.com/openvinotoolkit/open_model_zoo/blob/9296a3712069e688fe64ea02367466122c8e8a3b/demos/common/python/models/open_pose.py#L135 
class OpenPoseDecoder:
    BODY_PARTS_KPT_IDS = ( 
        (1, 2), 
        (1, 5), 
        (2, 3), 
        (3, 4), 
        (5, 6), 
        (6, 7), 
        (1, 8), 
        (8, 9), 
        (9, 10), 
        (1, 11), 
        (11, 12), 
        (12, 13), 
        (1, 0), 
        (0, 14), 
        (14, 16), 
        (0, 15), 
        (15, 17), 
        (2, 16), 
        (5, 17), 
    ) 
    BODY_PARTS_PAF_IDS = ( 
        12, 
        20, 
        14, 
        16, 
        22, 
        24, 
        0, 
        2, 
        4, 
        6, 
        8, 
        10, 
        28, 
        30, 
        34, 
        32, 
        36, 
        18, 
        26, 
    ) 

    def __init__( 
        self, 
        num_joints=18, 
        skeleton=BODY_PARTS_KPT_IDS, 
        paf_indices=BODY_PARTS_PAF_IDS, 
        max_points=100, 
        score_threshold=0.1, 
        min_paf_alignment_score=0.05, 
        delta=0.5, 
    ): 
        self.num_joints = num_joints 
        self.skeleton = skeleton 
        self.paf_indices = paf_indices 
        self.max_points = max_points 
        self.score_threshold = score_threshold 
        self.min_paf_alignment_score = min_paf_alignment_score 
        self.delta = delta 

        self.points_per_limb = 10 
        self.grid = np.arange(self.points_per_limb, dtype=np.float32).reshape(1, -1, 1) 

    def __call__(self, heatmaps, nms_heatmaps, pafs): 
        batch_size, _, h, w = heatmaps.shape 
        assert batch_size == 1, "Batch size of 1 only supported" 

        keypoints = self.extract_points(heatmaps, nms_heatmaps)
        pafs = np.transpose(pafs, (0, 2, 3, 1))

        if self.delta > 0: 
            for kpts in keypoints: 
                kpts[:, :2] += self.delta 
                np.clip(kpts[:, 0], 0, w - 1, out=kpts[:, 0]) 
                np.clip(kpts[:, 1], 0, h - 1, out=kpts[:, 1]) 

        pose_entries, keypoints = self.group_keypoints(keypoints, pafs, pose_entry_size=self.num_joints + 2) 
        poses, scores = self.convert_to_coco_format(pose_entries, keypoints) 
        if len(poses) > 0: 
            poses = np.asarray(poses, dtype=np.float32) 
            poses = poses.reshape((poses.shape[0], -1, 3)) 
        else: 
            poses = np.empty((0, 17, 3), dtype=np.float32) 
            scores = np.empty(0, dtype=np.float32) 

        return poses, scores 

    def extract_points(self, heatmaps, nms_heatmaps): 
        batch_size, channels_num, h, w = heatmaps.shape 
        assert batch_size == 1, "Batch size of 1 only supported" 
        assert channels_num >= self.num_joints 

        xs, ys, scores = self.top_k(nms_heatmaps) 
        masks = scores > self.score_threshold 
        all_keypoints = [] 
        keypoint_id = 0 
        for k in range(self.num_joints):
            # Filter out low-score points.
            mask = masks[0, k]
            x = xs[0, k][mask].ravel()
            y = ys[0, k][mask].ravel()
            score = scores[0, k][mask].ravel()
            n = len(x)
            if n == 0:
                all_keypoints.append(np.empty((0, 4), dtype=np.float32))
                continue
            # Apply a quarter-pixel offset to improve localization accuracy.
            x, y = self.refine(heatmaps[0, k], x, y)
            np.clip(x, 0, w - 1, out=x)
            np.clip(y, 0, h - 1, out=y)
            # Pack the resulting points.
            keypoints = np.empty((n, 4), dtype=np.float32)
            keypoints[:, 0] = x
            keypoints[:, 1] = y
            keypoints[:, 2] = score
            keypoints[:, 3] = np.arange(keypoint_id, keypoint_id + n)
            keypoint_id += n
            all_keypoints.append(keypoints)
        return all_keypoints 

    def top_k(self, heatmaps):
        N, K, _, W = heatmaps.shape 
        heatmaps = heatmaps.reshape(N, K, -1) 
        # Get positions with top scores. 
        ind = heatmaps.argpartition(-self.max_points, axis=2)[:, :, -self.max_points :] 
        scores = np.take_along_axis(heatmaps, ind, axis=2) 
        # Keep top scores sorted. 
        subind = np.argsort(-scores, axis=2) 
        ind = np.take_along_axis(ind, subind, axis=2) 
        scores = np.take_along_axis(scores, subind, axis=2) 
        y, x = np.divmod(ind, W) 
        return x, y, scores 

    @staticmethod 
    def refine(heatmap, x, y): 
        h, w = heatmap.shape[-2:] 
        valid = np.logical_and(np.logical_and(x > 0, x < w - 1), np.logical_and(y > 0, y < h - 1)) 
        xx = x[valid] 
        yy = y[valid] 
        dx = np.sign(heatmap[yy, xx + 1] - heatmap[yy, xx - 1], dtype=np.float32) * 0.25 
        dy = np.sign(heatmap[yy + 1, xx] - heatmap[yy - 1, xx], dtype=np.float32) * 0.25 
        x = x.astype(np.float32) 
        y = y.astype(np.float32) 
        x[valid] += dx 
        y[valid] += dy 
        return x, y 

    @staticmethod 
    def is_disjoint(pose_a, pose_b): 
        pose_a = pose_a[:-2] 
        pose_b = pose_b[:-2] 
        return np.all(np.logical_or.reduce((pose_a == pose_b, pose_a < 0, pose_b < 0))) 

    def update_poses( 
        self, 
        kpt_a_id, 
        kpt_b_id, 
        all_keypoints, 
        connections, 
        pose_entries, 
        pose_entry_size, 
    ): 
        for connection in connections: 
            pose_a_idx = -1 
            pose_b_idx = -1 
            for j, pose in enumerate(pose_entries): 
                if pose[kpt_a_id] == connection[0]: 
                    pose_a_idx = j 
                if pose[kpt_b_id] == connection[1]: 
                    pose_b_idx = j 
            if pose_a_idx < 0 and pose_b_idx < 0:
                # Create a new pose entry.
                pose_entry = np.full(pose_entry_size, -1, dtype=np.float32)
                pose_entry[kpt_a_id] = connection[0]
                pose_entry[kpt_b_id] = connection[1]
                pose_entry[-1] = 2
                pose_entry[-2] = np.sum(all_keypoints[connection[0:2], 2]) + connection[2]
                pose_entries.append(pose_entry)
            elif pose_a_idx >= 0 and pose_b_idx >= 0 and pose_a_idx != pose_b_idx:
                # Merge the two poses if they are disjoint; otherwise ignore the connection.
                pose_a = pose_entries[pose_a_idx]
                pose_b = pose_entries[pose_b_idx]
                if self.is_disjoint(pose_a, pose_b):
                    pose_a += pose_b
                    pose_a[:-2] += 1
                    pose_a[-2] += connection[2]
                    del pose_entries[pose_b_idx] 
            elif pose_a_idx >= 0 and pose_b_idx >= 0:
                # Adjust the score of the pose.
                pose_entries[pose_a_idx][-2] += connection[2] 
            elif pose_a_idx >= 0:
                # Add a new limb to the pose.
                pose = pose_entries[pose_a_idx] 
                if pose[kpt_b_id] < 0: 
                    pose[-2] += all_keypoints[connection[1], 2] 
                pose[kpt_b_id] = connection[1] 
                pose[-2] += connection[2] 
                pose[-1] += 1 
            elif pose_b_idx >= 0:
                # Add a new limb to the pose.
                pose = pose_entries[pose_b_idx] 
                if pose[kpt_a_id] < 0: 
                    pose[-2] += all_keypoints[connection[0], 2] 
                pose[kpt_a_id] = connection[0] 
                pose[-2] += connection[2] 
                pose[-1] += 1 
        return pose_entries 

    @staticmethod 
    def connections_nms(a_idx, b_idx, affinity_scores):
        # From all retrieved connections that share a starting or ending keypoint, keep only the top-scoring ones.
        order = affinity_scores.argsort()[::-1] 
        affinity_scores = affinity_scores[order] 
        a_idx = a_idx[order] 
        b_idx = b_idx[order] 
        idx = [] 
        has_kpt_a = set() 
        has_kpt_b = set() 
        for t, (i, j) in enumerate(zip(a_idx, b_idx)): 
            if i not in has_kpt_a and j not in has_kpt_b: 
                idx.append(t) 
                has_kpt_a.add(i) 
                has_kpt_b.add(j) 
        idx = np.asarray(idx, dtype=np.int32) 
        return a_idx[idx], b_idx[idx], affinity_scores[idx] 

    def group_keypoints(self, all_keypoints_by_type, pafs, pose_entry_size=20): 
        all_keypoints = np.concatenate(all_keypoints_by_type, axis=0) 
        pose_entries = [] 
        # For every limb.
        for part_id, paf_channel in enumerate(self.paf_indices): 
            kpt_a_id, kpt_b_id = self.skeleton[part_id] 
            kpts_a = all_keypoints_by_type[kpt_a_id] 
            kpts_b = all_keypoints_by_type[kpt_b_id] 
            n = len(kpts_a) 
            m = len(kpts_b) 
            if n == 0 or m == 0: 
                continue 

            # Get vectors between all pairs of keypoints, that is, candidate limb vectors.
            a = kpts_a[:, :2]
            a = np.broadcast_to(a[None], (m, n, 2))
            b = kpts_b[:, :2]
            vec_raw = (b[:, None, :] - a).reshape(-1, 1, 2)

            # Sample points along every candidate limb vector.
            steps = 1 / (self.points_per_limb - 1) * vec_raw 
            points = steps * self.grid + a.reshape(-1, 1, 2) 
            points = points.round().astype(dtype=np.int32) 
            x = points[..., 0].ravel() 
            y = points[..., 1].ravel() 

            # Compute the affinity score between candidate limb vectors and the part affinity field.
            part_pafs = pafs[0, :, :, paf_channel : paf_channel + 2]
            field = part_pafs[y, x].reshape(-1, self.points_per_limb, 2)
            vec_norm = np.linalg.norm(vec_raw, ord=2, axis=-1, keepdims=True)
            vec = vec_raw / (vec_norm + 1e-6)
            affinity_scores = (field * vec).sum(-1).reshape(-1, self.points_per_limb) 
            valid_affinity_scores = affinity_scores > self.min_paf_alignment_score 
            valid_num = valid_affinity_scores.sum(1) 
            affinity_scores = (affinity_scores * valid_affinity_scores).sum(1) / (valid_num + 1e-6) 
            success_ratio = valid_num / self.points_per_limb 

            # Get the list of limbs according to the obtained affinity scores.
            valid_limbs = np.where(np.logical_and(affinity_scores > 0, success_ratio > 0.8))[0] 
            if len(valid_limbs) == 0: 
                continue 
            b_idx, a_idx = np.divmod(valid_limbs, n) 
            affinity_scores = affinity_scores[valid_limbs] 

            # Suppress incompatible connections.
            a_idx, b_idx, affinity_scores = self.connections_nms(a_idx, b_idx, affinity_scores) 
            connections = list( 
                zip( 
                    kpts_a[a_idx, 3].astype(np.int32), 
                    kpts_b[b_idx, 3].astype(np.int32), 
                    affinity_scores, 
                ) 
            ) 
            if len(connections) == 0: 
                continue 

            # Update poses with the new connections.
            pose_entries = self.update_poses( 
                kpt_a_id, 
                kpt_b_id, 
                all_keypoints, 
                connections, 
                pose_entries, 
                pose_entry_size, 
            ) 

        # Remove poses that do not have enough points.
        pose_entries = np.asarray(pose_entries, dtype=np.float32).reshape(-1, pose_entry_size) 
        pose_entries = pose_entries[pose_entries[:, -1] >= 3] 
        return pose_entries, all_keypoints 

    @staticmethod 
    def convert_to_coco_format(pose_entries, all_keypoints): 
        num_joints = 17 
        coco_keypoints = [] 
        scores = [] 
        for pose in pose_entries: 
            if len(pose) == 0: 
                continue 
            keypoints = np.zeros(num_joints * 3) 
            reorder_map = [0, -1, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3] 
            person_score = pose[-2] 
            for keypoint_id, target_id in zip(pose[:-2], reorder_map): 
                if target_id < 0: 
                    continue 
                cx, cy, score = 0, 0, 0  # Keypoint not found.
                if keypoint_id != -1: 
                    cx, cy, score = all_keypoints[int(keypoint_id), 0:3] 
                keypoints[target_id * 3 + 0] = cx 
                keypoints[target_id * 3 + 1] = cy 
                keypoints[target_id * 3 + 2] = score 
            coco_keypoints.append(keypoints) 
            scores.append(person_score * max(0, (pose[-1] - 1)))  # -1 for 'neck'.
        return np.asarray(coco_keypoints), np.asarray(scores)
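The `connections_nms` step in the class above is a greedy matching: candidate connections are visited from the highest to the lowest affinity, and a connection is kept only if neither of its endpoints has already been used. A plain-Python sketch of the same idea on toy data; the function name `greedy_connections` and the sample scores are illustrative only:

```python
def greedy_connections(candidates):
    """candidates: list of (a_idx, b_idx, score). Keep the best non-conflicting ones."""
    kept = []
    used_a, used_b = set(), set()
    for a, b, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if a not in used_a and b not in used_b:
            kept.append((a, b, score))
            used_a.add(a)
            used_b.add(b)
    return kept


candidates = [(0, 0, 0.9), (0, 1, 0.8), (1, 0, 0.7), (1, 1, 0.6)]
kept = greedy_connections(candidates)
print(kept)
```

The two middle candidates are dropped because each reuses an endpoint already claimed by the top-scoring connection.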

Processing#

decoder = OpenPoseDecoder()
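Among other things, the decoder instance limits how many candidate points it considers per keypoint channel (`max_points`) by selecting the top-scoring heatmap positions. A small NumPy sketch of that top-k selection on a single toy channel; the array values are made up:

```python
import numpy as np

heatmap = np.array(
    [
        [0.1, 0.0, 0.3],
        [0.0, 0.9, 0.0],
        [0.5, 0.0, 0.2],
    ],
    dtype=np.float32,
)
k = 3
width = heatmap.shape[1]

flat = heatmap.ravel()
# argpartition places the k largest values in the last k slots (in no particular order).
ind = np.argpartition(flat, -k)[-k:]
# Sort those k candidates by descending score.
ind = ind[np.argsort(-flat[ind])]
# Convert flat indices back to (row, column) positions.
ys, xs = np.divmod(ind, width)

top = [(int(x), int(y), float(flat[i])) for x, y, i in zip(xs, ys, ind)]
print(top)
```

Partitioning first and sorting only the k survivors avoids a full sort of the heatmap.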

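Process Results#

A few functions convert the raw results into poses.

First, pool the heatmaps. Because NumPy has no built-in pooling, a simple implementation written directly in NumPy is used. Then, apply non-maximum suppression to extract keypoints from the heatmaps. After that, decode the poses with the decoder. Because the input image is larger than the network output, all pose coordinates must be multiplied by a scale factor.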
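The pooling and suppression steps described here can be tried on a tiny heatmap. This sketch uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) instead of the `as_strided` helper imported at the top of this notebook, purely to keep the example short; the heatmap values are made up:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

heatmap = np.array(
    [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.3, 0.0],
        [0.1, 0.3, 0.2, 0.6],
        [0.0, 0.0, 0.4, 0.5],
    ],
    dtype=np.float32,
)

# 3x3 max pooling with stride 1 and zero padding, so the output keeps the input size.
padded = np.pad(heatmap, 1, mode="constant")
pooled = sliding_window_view(padded, (3, 3)).max(axis=(2, 3))

# Non-maximum suppression: keep a value only where it equals its neighborhood maximum.
nms = heatmap * (heatmap == pooled)

peaks = [(int(y), int(x)) for y, x in zip(*np.nonzero(nms))]
print(peaks)
```

Only local maxima survive the comparison, so each surviving position is a keypoint candidate.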

# 2D pooling in NumPy (from: https://stackoverflow.com/a/54966908/1624463)
def pool2d(A, kernel_size, stride, padding, pool_mode="max"): 
    """ 
    2D Pooling 

    Parameters:
        A: input 2D array 
        kernel_size: int, the size of the window 
        stride: int, the stride of the window 
        padding: int, implicit zero paddings on both sides of the input 
        pool_mode: string, 'max' or 'avg' 
    """ 
    # Padding
    A = np.pad(A, padding, mode="constant")

    # Window view of A
    output_shape = ( 
        (A.shape[0] - kernel_size) // stride + 1, 
        (A.shape[1] - kernel_size) // stride + 1, 
    ) 
    kernel_size = (kernel_size, kernel_size) 
    A_w = as_strided( 
        A, 
        shape=output_shape + kernel_size, 
        strides=(stride * A.strides[0], stride * A.strides[1]) + A.strides, 
    ) 
    A_w = A_w.reshape(-1, *kernel_size) 

    # Return the result of pooling.
    if pool_mode == "max": 
        return A_w.max(axis=(1, 2)).reshape(output_shape) 
    elif pool_mode == "avg": 
        return A_w.mean(axis=(1, 2)).reshape(output_shape) 

# Non-maximum suppression
def heatmap_nms(heatmaps, pooled_heatmaps): 
    return heatmaps * (heatmaps == pooled_heatmaps) 

# Get poses from the results.
def process_results(img, pafs, heatmaps):
    # This processing comes from
    # https://github.com/openvinotoolkit/open_model_zoo/blob/master/demos/common/python/models/open_pose.py
    pooled_heatmaps = np.array([[pool2d(h, kernel_size=3, stride=1, padding=1, pool_mode="max") for h in heatmaps[0]]]) 
    nms_heatmaps = heatmap_nms(heatmaps, pooled_heatmaps) 

    # Decode the poses.
    poses, scores = decoder(heatmaps, nms_heatmaps, pafs) 
    output_shape = list(compiled_model.output(index=0).partial_shape) 
    output_scale = ( 
        img.shape[1] / output_shape[3].get_length(), 
        img.shape[0] / output_shape[2].get_length(), 
    ) 
    # Multiply the coordinates by the scaling factor.
    poses[:, :, :2] *= output_scale 
    return poses, scores
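The final scaling step maps coordinates from the coarse output grid back to frame pixels. A short NumPy sketch with assumed sizes (a 1280 x 720 frame and a 57 x 32 output grid) and made-up keypoints:

```python
import numpy as np

# Assumed sizes: a 1280x720 frame and a 57x32 (width x height) network output grid.
frame_h, frame_w = 720, 1280
out_h, out_w = 32, 57

# One pose with two keypoints given as (x, y, score) on the output grid.
poses = np.array([[[28.5, 16.0, 0.9], [57.0, 32.0, 0.8]]], dtype=np.float32)

output_scale = (frame_w / out_w, frame_h / out_h)
# Broadcasting multiplies x by the first factor and y by the second; scores are untouched.
poses[:, :, :2] *= output_scale

print(poses[0])
```

The grid center maps to the frame center and the grid corner to the frame corner, while the score column is left unchanged.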

Draw Pose Overlays#

Draw pose overlays on the image to visualize the estimated poses. Joints are drawn as circles and limbs as lines. The code is based on the Human Pose Estimation Demo from Open Model Zoo.

colors = ( 
    (255, 0, 0), 
    (255, 0, 255), 
    (170, 0, 255), 
    (255, 0, 85), 
    (255, 0, 170), 
    (85, 255, 0), 
    (255, 170, 0), 
    (0, 255, 0), 
    (255, 255, 0), 
    (0, 255, 85), 
    (170, 255, 0), 
    (0, 85, 255), 
    (0, 255, 170), 
    (0, 0, 255), 
    (0, 255, 255), 
    (85, 0, 255), 
    (0, 170, 255),
)

default_skeleton = ( 
    (15, 13), 
    (13, 11), 
    (16, 14), 
    (14, 12), 
    (11, 12), 
    (5, 11), 
    (6, 12), 
    (5, 6), 
    (5, 7), 
    (6, 8), 
    (7, 9), 
    (8, 10), 
    (1, 2), 
    (0, 1), 
    (0, 2), 
    (1, 3), 
    (2, 4), 
    (3, 5), 
    (4, 6), 
) 

def draw_poses(img, poses, point_score_threshold, skeleton=default_skeleton): 
    if poses.size == 0: 
        return img 

    img_limbs = np.copy(img) 
    for pose in poses: 
        points = pose[:, :2].astype(np.int32) 
        points_scores = pose[:, 2] 
        # Draw joints.
        for i, (p, v) in enumerate(zip(points, points_scores)): 
            if v > point_score_threshold: 
                cv2.circle(img, tuple(p), 1, colors[i], 2) 
        # Draw limbs.
        for i, j in skeleton: 
            if points_scores[i] > point_score_threshold and points_scores[j] > point_score_threshold: 
                cv2.line( 
                    img_limbs, 
                    tuple(points[i]), 
                    tuple(points[j]), 
                    color=colors[j], 
                    thickness=4, 
                ) 
    cv2.addWeighted(img, 0.4, img_limbs, 0.6, 0, dst=img) 
    return img
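The drawing rule is simple: a joint is drawn if its score exceeds the threshold, and a limb is drawn only if both of its endpoints do. A sketch of that selection logic without any drawing calls; the scores and the small skeleton are toy values:

```python
import numpy as np

# Toy per-keypoint confidence scores for one pose (indices 0..4).
points_scores = np.array([0.8, 0.05, 0.6, 0.3, 0.02], dtype=np.float32)
# Toy skeleton: pairs of keypoint indices.
skeleton = ((0, 1), (0, 2), (2, 3), (3, 4))
point_score_threshold = 0.1

visible_joints = [i for i, v in enumerate(points_scores) if v > point_score_threshold]
visible_limbs = [
    (i, j)
    for i, j in skeleton
    if points_scores[i] > point_score_threshold and points_scores[j] > point_score_threshold
]

print(visible_joints, visible_limbs)
```

Limbs that touch a low-confidence joint are skipped, which keeps partially occluded poses from producing stray lines.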

Main Processing Function#

Run pose estimation on the specified source, which can be either a webcam or a video file.

# Main processing function to run pose estimation.
def run_pose_estimation(source=0, flip=False, use_popup=False, skip_first_frames=0): 
    pafs_output_key = compiled_model.output("Mconv7_stage2_L1") 
    heatmaps_output_key = compiled_model.output("Mconv7_stage2_L2") 
    player = None 
    try:
        # Create a video player that plays at the target FPS.
        player = utils.VideoPlayer(source, flip=flip, fps=30, skip_first_frames=skip_first_frames)
        # Start capturing.
        player.start()
        if use_popup: 
            title = "Press ESC to Exit" 
            cv2.namedWindow(title, cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE) 

        processing_times = collections.deque() 

        while True:
            # Grab the frame.
            frame = player.next()
            if frame is None:
                print("Source ended")
                break
            # If the frame's longest side exceeds 1280 pixels, reduce its size to improve performance.
            scale = 1280 / max(frame.shape)
            if scale < 1:
                frame = cv2.resize(frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
            # Resize the image and change the dimension order to fit the neural network input
            # (see https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/human-pose-estimation-0001).
            input_img = cv2.resize(frame, (width, height), interpolation=cv2.INTER_AREA)
            # Create a batch of images (size = 1).
            input_img = input_img.transpose((2, 0, 1))[np.newaxis, ...]
            # Measure the processing time.
            start_time = time.time()
            # Get the results.
            results = compiled_model([input_img]) 
            stop_time = time.time() 

            pafs = results[pafs_output_key] 
            heatmaps = results[heatmaps_output_key] 
            # Get poses from the network results.
            poses, scores = process_results(frame, pafs, heatmaps)

            # Draw poses on the frame.
            frame = draw_poses(frame, poses, 0.1)

            processing_times.append(stop_time - start_time)
            # Use the processing times from the last 200 frames.
            if len(processing_times) > 200:
                processing_times.popleft()

            _, f_width = frame.shape[:2]
            # Mean processing time [ms].
            processing_time = np.mean(processing_times) * 1000 
            fps = 1000 / processing_time 
            cv2.putText( 
                frame,
                f"Inference time: {processing_time:.1f}ms ({fps:.1f} FPS)",
                (20, 40),
                cv2.FONT_HERSHEY_COMPLEX, 
                f_width / 1000, 
                (0, 0, 255), 
                1, 
                cv2.LINE_AA, 
            ) 

            # Use this workaround if there is flickering.
            if use_popup:
                cv2.imshow(title, frame)
                key = cv2.waitKey(1)
                # escape = 27
                if key == 27:
                    break
            else:
                # Encode the NumPy array to JPG.
                _, encoded_img = cv2.imencode(".jpg", frame, params=[cv2.IMWRITE_JPEG_QUALITY, 90])
                # Create an IPython image.
                i = display.Image(data=encoded_img)
                # Display the image in this notebook.
                display.clear_output(wait=True) 
                display.display(i) 
    # ctrl-c
    except KeyboardInterrupt:
        print("Interrupted")
    # Any other runtime error.
    except RuntimeError as e:
        print(e)
    finally:
        if player is not None:
            # Stop capturing.
            player.stop()
        if use_popup:
            cv2.destroyAllWindows()
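The loop above averages the inference time over the most recent 200 frames. The same rolling window can be sketched with `collections.deque(maxlen=...)`, which drops the oldest sample automatically; this is an alternative to the explicit `popleft()` call, shown here with made-up timings:

```python
import collections

window = 200
processing_times = collections.deque(maxlen=window)

# Made-up per-frame inference times in seconds: 300 frames at 20 ms, then 100 at 10 ms.
for t in [0.020] * 300 + [0.010] * 100:
    processing_times.append(t)

# Average processing time in milliseconds over the window, and the matching FPS.
processing_time = sum(processing_times) / len(processing_times) * 1000
fps = 1000 / processing_time

print(len(processing_times), round(processing_time, 1), round(fps, 1))
```

Because only the last 200 samples are kept, the reported FPS reflects recent performance rather than the whole run.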

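Run#

Run Live Pose Estimation#

Use a webcam as the video input. By default, the primary webcam is selected with source=0. If you have multiple webcams, each one is assigned a consecutive number starting at 0. Set flip=True when using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set use_popup=True.

Note: To use this notebook with a webcam, you must run it on a computer with a webcam. If you run the notebook on a server (for example, Binder), the webcam will not work. Popup mode may not work if you run this notebook on a remote computer (for example, Binder).

If you do not have a webcam, you can still run this demo with a video file. Any format supported by OpenCV will work. You can skip the first N frames to fast-forward the video.

Run the pose estimation: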

USE_WEBCAM = False 
cam_id = 0 
video_file = "https://github.com/intel-iot-devkit/sample-videos/blob/master/store-aisle-detection.mp4?raw=true" 
source = cam_id if USE_WEBCAM else video_file 

additional_options = {"skip_first_frames": 500} if not USE_WEBCAM else {} 
run_pose_estimation(source=source, flip=isinstance(source, int), use_popup=False, **additional_options)

../_images/pose-estimation-with-output_22_0.png

Source ended
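The source selection in the last cell can be wrapped in a small helper to make the webcam and video-file cases explicit. This is only a sketch: `build_run_kwargs` is a hypothetical name, `"video.mp4"` is a placeholder path, and the returned dictionary simply mirrors the keyword arguments of `run_pose_estimation` defined earlier.

```python
def build_run_kwargs(use_webcam, cam_id=0, video_file=None, skip_first_frames=0):
    """Return keyword arguments for run_pose_estimation (hypothetical helper)."""
    if use_webcam:
        # Webcam sources are integer ids; flip the image as for a front-facing camera.
        return {"source": cam_id, "flip": True, "use_popup": False}
    if video_file is None:
        raise ValueError("video_file is required when use_webcam is False")
    return {
        "source": video_file,
        "flip": False,
        "use_popup": False,
        "skip_first_frames": skip_first_frames,
    }


webcam_kwargs = build_run_kwargs(use_webcam=True)
file_kwargs = build_run_kwargs(use_webcam=False, video_file="video.mp4", skip_first_frames=500)
print(webcam_kwargs)
print(file_kwargs)
```

In the notebook, the result would be passed on as `run_pose_estimation(**file_kwargs)`.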