Plugin¶

An OpenVINO plugin usually represents a wrapper around a backend. Backends can be:

An OpenCL*-like backend (for example, the clDNN library) for GPU devices.
The oneDNN backend for Intel® CPU devices.
NVIDIA cuDNN for NVIDIA GPUs.

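An OpenVINO plugin is responsible for the following:

Initializing a backend, and throwing an exception in the Engine constructor if the backend cannot be initialized.
Providing information about the devices enabled by a particular backend, such as the number of devices and their properties.
Loading or importing compiled model objects.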

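In addition to the OpenVINO Public API, OpenVINO provides the Plugin API, a set of functions and helper classes that simplify the development of new plugins. It consists of:

Header files in the src/inference/dev_api/openvino directory.
Implementations in the src/inference/src/dev/ directory.
Symbols in the OpenVINO shared library.

To build an OpenVINO plugin with the Plugin API, see the OpenVINO Plugin Building Guide.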

Plugin Class¶

The OpenVINO Plugin API provides the helper ov::IPlugin class, which is recommended as the base class for a plugin. A plugin class declaration based on it can look as follows:

namespace ov {
namespace template_plugin {

class Plugin : public ov::IPlugin {
public:
    Plugin();
    ~Plugin();

    std::shared_ptr<ov::ICompiledModel> compile_model(const std::shared_ptr<const ov::Model>& model,
                                                      const ov::AnyMap& properties) const override;

    std::shared_ptr<ov::ICompiledModel> compile_model(const std::shared_ptr<const ov::Model>& model,
                                                      const ov::AnyMap& properties,
                                                      const ov::SoPtr<ov::IRemoteContext>& context) const override;

    void set_property(const ov::AnyMap& properties) override;

    ov::Any get_property(const std::string& name, const ov::AnyMap& arguments) const override;

    ov::SoPtr<ov::IRemoteContext> create_context(const ov::AnyMap& remote_properties) const override;

    ov::SoPtr<ov::IRemoteContext> get_default_context(const ov::AnyMap& remote_properties) const override;

    std::shared_ptr<ov::ICompiledModel> import_model(std::istream& model, const ov::AnyMap& properties) const override;

    std::shared_ptr<ov::ICompiledModel> import_model(std::istream& model,
                                                     const ov::SoPtr<ov::IRemoteContext>& context,
                                                     const ov::AnyMap& properties) const override;

    ov::SupportedOpsMap query_model(const std::shared_ptr<const ov::Model>& model,
                                    const ov::AnyMap& properties) const override;

private:
    friend class CompiledModel;
    friend class InferRequest;

    std::shared_ptr<ov::runtime::Backend> m_backend;
    Configuration m_cfg;
    std::shared_ptr<ov::threading::ITaskExecutor> m_waitExecutor;
};

}  // namespace template_plugin
}  // namespace ov

                                

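Class Fields¶

The provided plugin class has several fields:

m_backend - the backend engine that performs the actual computations for model inference. The Template plugin uses ov::runtime::Backend, which performs computations using the OpenVINO™ reference implementations.
m_waitExecutor - a task executor that waits for the device to report that its tasks have completed.
m_cfg - the plugin configuration, of type Configuration:

struct Configuration {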
    Configuration();
    Configuration(const Configuration&) = default;
    Configuration(Configuration&&) = default;
    Configuration& operator=(const Configuration&) = default;
    Configuration& operator=(Configuration&&) = default;

    explicit Configuration(const ov::AnyMap& config,
                           const Configuration& defaultCfg = {},
                           const bool throwOnUnsupported = true);

    ov::Any Get(const std::string& name) const;

    // Plugin configuration parameters

    int device_id = 0;
    bool perf_count = false;
    ov::threading::IStreamsExecutor::Config streams_executor_config;
    int streams = 1;
    int threads = 0;
    int threads_per_stream = 0;
    ov::hint::PerformanceMode performance_mode = ov::hint::PerformanceMode::LATENCY;
    uint32_t num_requests = 1;
    bool disable_transformations = false;
    bool exclusive_async_requests = false;

    // unused
    ov::element::Type inference_precision = ov::element::undefined;
    ov::hint::ExecutionMode execution_mode = ov::hint::ExecutionMode::ACCURACY;
    ov::log::Level log_level = ov::log::Level::NO;

    ov::hint::Priority model_priority = ov::hint::Priority::DEFAULT;
};

                                    

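As an example, the plugin configuration includes the following parameters:

device_id - the ID of the particular device to work with. It applies when a plugin supports more than one Template device. In this case, some plugin methods, such as set_property, query_model, and compile_model, must support the ov::device::id property.
perf_count - a Boolean value that indicates whether to collect performance counters during inference request execution.
streams_executor_config - the configuration of ov::threading::IStreamsExecutor, which handles the settings of a multi-threaded context.
performance_mode - the ov::hint::PerformanceMode value that sets the performance mode.
disable_transformations - disables the transformations that are applied during model compilation.
exclusive_async_requests - enables the use of an exclusive task executor for asynchronous inference requests.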

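Plugin Constructor¶

A plugin constructor must contain code that checks whether the plugin can work with a device of the Template type. For example, if drivers are required, the code must check that they are available. If a driver is not available (for example, the OpenCL* runtime is not installed for a GPU device, or the host machine has an unsuitable driver version), the plugin constructor must throw an exception.

A plugin must define its device name via the set_device_name() method of the base class:

ov::template_plugin::Plugin::Plugin() {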
    // TODO: fill with actual device name, backend engine
    set_device_name("TEMPLATE");

    // create backend which performs inference using openvino reference implementations
    m_backend = ov::runtime::Backend::create();

    // create default stream executor with a given name
    m_waitExecutor = get_executor_manager()->get_idle_cpu_streams_executor({wait_executor_name});
}
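
The driver check described above might look like the following minimal sketch. It is standalone C++ with no OpenVINO headers. DriverInfo, kMinDriverVersion, DriverCheckedPlugin, and construction_fails are hypothetical names introduced only for illustration; a real plugin would query its own runtime instead of receiving a DriverInfo value.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical description of what a driver probe found on the host.
struct DriverInfo {
    bool installed;
    int version;
};

// Hypothetical minimum driver version required by the backend.
constexpr int kMinDriverVersion = 3;

// Hypothetical plugin whose constructor validates the driver before use.
class DriverCheckedPlugin {
public:
    explicit DriverCheckedPlugin(const DriverInfo& driver) {
        if (!driver.installed)
            throw std::runtime_error("device runtime is not installed");
        if (driver.version < kMinDriverVersion)
            throw std::runtime_error("driver version " + std::to_string(driver.version) + " is too old");
        m_device_name = "TEMPLATE";
    }
    const std::string& device_name() const { return m_device_name; }

private:
    std::string m_device_name;
};

// Returns true when constructing the plugin with this driver throws.
bool construction_fails(const DriverInfo& driver) {
    try {
        DriverCheckedPlugin plugin(driver);
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

Throwing from the constructor means a partially initialized plugin object never exists, so callers cannot accidentally use a device that is not ready.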

                                    

Plugin Destructor¶

A plugin destructor must stop all plugin activities and clean up all allocated resources:

ov::template_plugin::Plugin::~Plugin() {
    // Plugin should remove executors from executor cache to avoid threads number growth in the whole application
    get_executor_manager()->clear(stream_executor_name);
    get_executor_manager()->clear(wait_executor_name);
}

                                    

compile_model()¶

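A plugin should implement two compile_model() methods: the first compiles a model without a remote context, and the second compiles it with a remote context, if the plugin supports one.

The most important function of the Plugin class is to create a CompiledModel instance, which holds the backend-dependent compiled model in an internal representation:

std::shared_ptr<ov::ICompiledModel> ov::template_plugin::Plugin::compile_model(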
    const std::shared_ptr<const ov::Model>& model,
    const ov::AnyMap& properties) const {
    return compile_model(model, properties, {});
}

                                    

std::shared_ptr<ov::ICompiledModel> ov::template_plugin::Plugin::compile_model(
    const std::shared_ptr<const ov::Model>& model,
    const ov::AnyMap& properties,
    const ov::SoPtr<ov::IRemoteContext>& context) const {
    OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "Plugin::compile_model");

    auto fullConfig = Configuration{properties, m_cfg};
    fullConfig.streams_executor_config = ov::threading::IStreamsExecutor::Config{stream_executor_name,
                                                                                 fullConfig.streams,
                                                                                 fullConfig.threads_per_stream};
    auto streamsExecutorConfig =
        ov::threading::IStreamsExecutor::Config::make_default_multi_threaded(fullConfig.streams_executor_config);
    fullConfig.streams = streamsExecutorConfig.get_streams();
    fullConfig.threads = streamsExecutorConfig.get_threads();
    fullConfig.threads_per_stream = streamsExecutorConfig.get_threads_per_stream();
    auto compiled_model = std::make_shared<CompiledModel>(
        model->clone(),
        shared_from_this(),
        context,
        fullConfig.exclusive_async_requests
            ? get_executor_manager()->get_executor(template_exclusive_executor)
            : get_executor_manager()->get_idle_cpu_streams_executor(streamsExecutorConfig),
        fullConfig);
    return compiled_model;
}

                                    

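Before creating a CompiledModel instance via its constructor, a plugin may, if needed, check that the provided ov::Model object is supported by the device.

The actual model compilation is done in the CompiledModel constructor. Refer to the CompiledModel Implementation Guide for details.

Note

The configuration map used in CompiledModel starts from the base plugin configuration set via Plugin::set_property, and some of its values are then overwritten by the configuration passed to Plugin::compile_model. Therefore, the configuration passed to Plugin::compile_model has higher priority.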
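
The priority rule in this note can be illustrated with a minimal standalone sketch. ConfigMap is a hypothetical stand-in for ov::AnyMap restricted to string values, and merge_config is a hypothetical helper; the key names used in the usage note are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for ov::AnyMap, limited to string values.
using ConfigMap = std::map<std::string, std::string>;

// Returns `base` with every key from `overrides` applied on top of it.
// `base` is taken by value, so the caller's plugin-level configuration is not modified.
ConfigMap merge_config(ConfigMap base, const ConfigMap& overrides) {
    for (const auto& kv : overrides)
        base[kv.first] = kv.second;
    return base;
}
```

Calling merge_config(plugin_cfg, compile_cfg) yields the per-model configuration: keys present in compile_cfg win, all other keys keep the values set at plugin level, and plugin_cfg itself stays unchanged for later compilations.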

transform_model()¶

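The function accepts a const reference to a shared pointer to an ov::Model object and applies common and device-specific transformations to the copied model to make it friendlier to hardware operations. For details on how to write custom device-specific transformations, refer to the Writing OpenVINO™ Transformations guide. See also the detailed topics on model representation.

void transform_model(const std::shared_ptr<ov::Model>& model) {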
    // Perform common optimizations and device-specific transformations
    ov::pass::Manager passManager;
    // Example: register CommonOptimizations transformation from transformations library
    passManager.register_pass<ov::pass::CommonOptimizations>();
    // Disable some transformations
    passManager.get_pass_config()->disable<ov::pass::UnrollIf>();
    // This transformation changes output name
    passManager.get_pass_config()->disable<ov::pass::ConvertReduceSumToPooling>();
    // Register any other transformations
    // ..

    const auto& pass_config = passManager.get_pass_config();

    // Allow FP16 Converts to be folded and FP16 constants to be upgraded to FP32 data type
    pass_config->disable<ov::pass::DisableDecompressionConvertConstantFolding>();
    pass_config->disable<ov::pass::ConvertCompressedOnlyToLegacy>();

    // After `run_passes`, we have the transformed function, where operations match device operations,
    // and we can create device backend-dependent graph
    passManager.run_passes(model);
}

                                    

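Note

After all these transformations, the ov::Model object contains operations that can be mapped directly to backend kernels. For example, if the backend has a kernel that computes the A + B operations at once, the transform_model function should contain a pass that fuses operations A and B into a single custom operation A + B that fits the backend kernel set.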
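
The fusion idea in this note can be sketched on a toy model. The example below is a simplification under stated assumptions: the model is a linear list of operation names rather than a graph, and fuse_pairs is a hypothetical helper, not an OpenVINO pass. A real plugin would express the same idea as a transformation registered in the pass manager.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replaces each adjacent pair (first, second) with a single fused operation name.
std::vector<std::string> fuse_pairs(const std::vector<std::string>& ops,
                                    const std::string& first,
                                    const std::string& second,
                                    const std::string& fused) {
    std::vector<std::string> result;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        if (ops[i] == first && i + 1 < ops.size() && ops[i + 1] == second) {
            result.push_back(fused);
            ++i;  // skip the second operation, it is now part of the fused one
        } else {
            result.push_back(ops[i]);
        }
    }
    return result;
}
```

An operation A that is not followed by B is left as is, so the backend still needs a standalone kernel for A in that case.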

query_model()¶

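Use this method with the HETERO mode, which distributes model execution between different devices based on the ov::Node::get_rt_info() map, which can contain the affinity key. The query_model method analyzes the operations of the provided model and returns the list of supported operations via the ov::SupportedOpsMap structure. query_model first applies the transform_model passes to the input ov::Model argument. After that, the transformed model ideally contains only operations that map 1:1 to kernels in the computational backend. In this case, it is easy to determine which operations are supported (m_backend has a kernel for the operation, or an extension for the operation is provided) and which are not (the kernel is missing in m_backend):

Store the original names of all operations in the input ov::Model.
Apply the transform_model passes. Note that operation names in the transformed model can differ, so the mapping must be restored in the following steps.
Construct the supported map, which contains the names of the original operations. Note that because inference is performed with the OpenVINO™ reference backend, an operation is treated as supported when an OpenVINO opset contains it (the code below checks opset1 through opset14).
The resulting ov::SupportedOpsMap contains only the operations that are fully supported by m_backend.

ov::SupportedOpsMap ov::template_plugin::Plugin::query_model(const std::shared_ptr<const ov::Model>& model,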
                                                             const ov::AnyMap& properties) const {
    OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "Plugin::query_model");

    Configuration fullConfig{properties, m_cfg, false};

    OPENVINO_ASSERT(model, "OpenVINO Model is empty!");

    auto supported = ov::get_supported_nodes(
        model,
        [&](std::shared_ptr<ov::Model>& model) {
            // skip transformations in case of user config
            if (fullConfig.disable_transformations)
                return;
            // 1. It is needed to apply all transformations as it is done in compile_model
            transform_model(model);
        },
        [&](std::shared_ptr<ov::Node> node) {
            // 2. Check whether node is supported
            ov::OpSet op_super_set;
#define _OPENVINO_OP_REG(NAME, NAMESPACE) op_super_set.insert<NAMESPACE::NAME>();
        // clang-format off
#include "openvino/opsets/opset1_tbl.hpp"
#include "openvino/opsets/opset2_tbl.hpp"
#include "openvino/opsets/opset3_tbl.hpp"
#include "openvino/opsets/opset4_tbl.hpp"
#include "openvino/opsets/opset5_tbl.hpp"
#include "openvino/opsets/opset6_tbl.hpp"
#include "openvino/opsets/opset7_tbl.hpp"
#include "openvino/opsets/opset8_tbl.hpp"
#include "openvino/opsets/opset9_tbl.hpp"
#include "openvino/opsets/opset10_tbl.hpp"
#include "openvino/opsets/opset11_tbl.hpp"
#include "openvino/opsets/opset12_tbl.hpp"
#include "openvino/opsets/opset13_tbl.hpp"
#include "openvino/opsets/opset14_tbl.hpp"
        // clang-format on
#undef _OPENVINO_OP_REG
            return op_super_set.contains_type(node->get_type_info());
        });

    // 3. Produce the result
    ov::SupportedOpsMap res;
    for (auto&& layerName : supported) {
        res.emplace(layerName, get_device_name() + "." + std::to_string(m_cfg.device_id));
    }

    return res;
}
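
The name-mapping steps listed before the code can also be shown on a toy model, independent of OpenVINO. In this sketch Op, transform, and query are hypothetical, the model is a linear list of operations, and each transformed operation records which original operations it came from. In the real code this bookkeeping is delegated to ov::get_supported_nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical operation: `origins` lists the original operations it was created from.
struct Op {
    std::string name;
    std::string type;
    std::vector<std::string> origins;
};

// Hypothetical transformation: fuses each adjacent Multiply + Add pair into one MultiplyAdd.
std::vector<Op> transform(const std::vector<Op>& ops) {
    std::vector<Op> out;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        if (ops[i].type == "Multiply" && i + 1 < ops.size() && ops[i + 1].type == "Add") {
            out.push_back({ops[i].name + "_fused", "MultiplyAdd", {ops[i].name, ops[i + 1].name}});
            ++i;
        } else {
            out.push_back({ops[i].name, ops[i].type, {ops[i].name}});
        }
    }
    return out;
}

// Maps each fully supported original operation name to `device`.
std::map<std::string, std::string> query(const std::vector<Op>& model,
                                         const std::set<std::string>& kernels,
                                         const std::string& device) {
    // Store the original names; all of them start out as candidates.
    std::set<std::string> supported;
    for (const auto& op : model)
        supported.insert(op.name);
    // Transform, then drop every original operation that ended up in an unsupported one.
    for (const auto& op : transform(model)) {
        if (kernels.count(op.type) == 0)
            for (const auto& origin : op.origins)
                supported.erase(origin);
    }
    // Only fully supported original operations remain.
    std::map<std::string, std::string> result;
    for (const auto& name : supported)
        result[name] = device;
    return result;
}
```

The result is keyed by original names even though the kernel check ran on transformed operations, which is what lets HETERO assign affinities on the user's untransformed model.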

                                    

set_property()¶

Sets new values for plugin property keys:

void ov::template_plugin::Plugin::set_property(const ov::AnyMap& properties) {
    m_cfg = Configuration{properties, m_cfg};
}

                                    

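In the example above, the Configuration class overrides the previous configuration values with the new ones. All these values are used during backend-specific model compilation and during the execution of inference requests.

Note

The function must throw an exception if it receives an unsupported configuration key.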
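
A minimal sketch of this rule, assuming a string-only configuration: SketchConfig and its key names are hypothetical. The throw_on_unsupported flag mirrors the throwOnUnsupported argument of the Configuration constructor shown earlier, which query_model sets to false.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical configuration holding string values for a fixed set of keys.
class SketchConfig {
public:
    // Applies `updates` on top of the current values.
    // Unknown keys throw when `throw_on_unsupported` is true and are skipped otherwise.
    void update(const std::map<std::string, std::string>& updates, bool throw_on_unsupported = true) {
        auto next = m_values;  // work on a copy so a failed update changes nothing
        for (const auto& kv : updates) {
            if (supported_keys().count(kv.first) == 0) {
                if (throw_on_unsupported)
                    throw std::invalid_argument("unsupported property: " + kv.first);
                continue;
            }
            next[kv.first] = kv.second;
        }
        m_values = next;
    }
    const std::string& get(const std::string& key) const { return m_values.at(key); }

private:
    static const std::set<std::string>& supported_keys() {
        static const std::set<std::string> keys = {"DEVICE_ID", "PERF_COUNT"};
        return keys;
    }
    std::map<std::string, std::string> m_values = {{"DEVICE_ID", "0"}, {"PERF_COUNT", "NO"}};
};
```

Building the new state in a copy and assigning it only on success keeps the configuration unchanged when an update is rejected, the same effect the Template plugin gets by constructing a new Configuration before assigning it to m_cfg.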

get_property()¶

Returns the current value for the specified property key:

ov::Any ov::template_plugin::Plugin::get_property(const std::string& name, const ov::AnyMap& arguments) const {
    const auto& default_ro_properties = []() {
        std::vector<ov::PropertyName> ro_properties{ov::available_devices,
                                                    ov::supported_properties,
                                                    ov::device::full_name,
                                                    ov::device::architecture,
                                                    ov::device::capabilities,
                                                    ov::device::type,
                                                    ov::range_for_async_infer_requests,
                                                    ov::execution_devices};
        return ro_properties;
    };
    const auto& default_rw_properties = []() {
        std::vector<ov::PropertyName> rw_properties{ov::device::id,
                                                    ov::enable_profiling,
                                                    ov::hint::performance_mode,
                                                    ov::hint::num_requests,
                                                    ov::hint::inference_precision,
                                                    ov::hint::execution_mode,
                                                    ov::num_streams,
                                                    ov::template_plugin::disable_transformations,
                                                    ov::log::level};
        return rw_properties;
    };
    if (ov::supported_properties == name) {
        auto ro_properties = default_ro_properties();
        auto rw_properties = default_rw_properties();

        std::vector<ov::PropertyName> supported_properties;
        supported_properties.reserve(ro_properties.size() + rw_properties.size());
        supported_properties.insert(supported_properties.end(), ro_properties.begin(), ro_properties.end());
        supported_properties.insert(supported_properties.end(), rw_properties.begin(), rw_properties.end());
        return decltype(ov::supported_properties)::value_type(supported_properties);
    } else if (ov::internal::supported_properties == name) {
        return decltype(ov::internal::supported_properties)::value_type{
            ov::PropertyName{ov::internal::caching_properties.name(), ov::PropertyMutability::RO},
            ov::PropertyName{ov::internal::exclusive_async_requests.name(), ov::PropertyMutability::RW}};
    } else if (ov::available_devices == name) {
        // TODO: fill list of available devices
        std::vector<std::string> available_devices = {""};
        return decltype(ov::available_devices)::value_type(available_devices);
    } else if (ov::device::full_name == name) {
        std::string device_name = "Template Device Full Name";
        return decltype(ov::device::full_name)::value_type(device_name);
    } else if (ov::device::architecture == name) {
        // TODO: return device architecture for device specified by DEVICE_ID config
        std::string arch = get_device_name();
        return decltype(ov::device::architecture)::value_type(arch);
    } else if (ov::device::type == name) {
        return decltype(ov::device::type)::value_type(ov::device::Type::INTEGRATED);
    } else if (ov::internal::caching_properties == name) {
        std::vector<ov::PropertyName> caching_properties = {ov::device::architecture};
        return decltype(ov::internal::caching_properties)::value_type(caching_properties);
    } else if (ov::device::capabilities == name) {
        // TODO: fill actual list of supported capabilities: e.g. Template device supports only FP32 and EXPORT_IMPORT
        std::vector<std::string> capabilities = {ov::device::capability::FP32, ov::device::capability::EXPORT_IMPORT};
        return decltype(ov::device::capabilities)::value_type(capabilities);
    } else if (ov::execution_devices == name) {
        std::string dev = get_device_name();
        return decltype(ov::execution_devices)::value_type{dev};
    } else if (ov::range_for_async_infer_requests == name) {
        // TODO: fill with actual values
        using uint = unsigned int;
        return decltype(ov::range_for_async_infer_requests)::value_type(std::make_tuple(uint{1}, uint{1}, uint{1}));
    } else {
        return m_cfg.Get(name);
    }
}

                                    

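For any key that is not handled explicitly, the function falls back to the Configuration::Get method, which wraps the actual configuration value in ov::Any and returns it.

Note

The function must throw an exception if it receives an unsupported configuration key.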
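
The dispatch order used by get_property (plugin-computed properties first, then the configuration, then an exception) can be sketched without OpenVINO types. Everything here is hypothetical: values are plain strings instead of ov::Any, and the property names are illustrative.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical read-only properties computed by the plugin itself.
const std::vector<std::string> kReadOnly = {"FULL_DEVICE_NAME", "SUPPORTED_PROPERTIES"};

// Returns the value of `name` as a string (a stand-in for ov::Any).
std::string get_property(const std::string& name, const std::map<std::string, std::string>& config) {
    if (name == "SUPPORTED_PROPERTIES") {
        // Report read-only names first, then every read-write key held in the configuration.
        std::string joined;
        for (const auto& ro : kReadOnly)
            joined += ro + " ";
        for (const auto& kv : config)
            joined += kv.first + " ";
        if (!joined.empty())
            joined.pop_back();  // drop the trailing space
        return joined;
    }
    if (name == "FULL_DEVICE_NAME")
        return "Template Device Full Name";
    // Fall back to the configuration, throwing for keys it does not hold.
    const auto it = config.find(name);
    if (it == config.end())
        throw std::out_of_range("unsupported property: " + name);
    return it->second;
}
```

Placing the exception in the final fallback means every name that is neither computed nor stored in the configuration is rejected in one place.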

import_model()¶

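The compiled model import mechanism makes it possible to import a previously exported backend-specific model and wrap it in a CompiledModel object. This functionality is useful when backend-specific model compilation takes significant time, when it cannot be done on the target host device for other reasons, or both.

When exporting a backend-specific model with CompiledModel::export_model, a plugin may export any information it needs to import the compiled model properly and check its correctness. For example, the exported information may include:

Compilation options (the state of the Plugin::m_cfg structure).
Information about the plugin and the device type, so that it can be checked during import and an exception thrown if the model stream contains wrong data. For example, if devices have different capabilities and a model compiled for one device cannot be used on another, this type information must be stored and checked during import.
The compiled backend-specific model itself.

std::shared_ptr<ov::ICompiledModel> ov::template_plugin::Plugin::import_model(std::istream& model,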
                                                                              const ov::AnyMap& properties) const {
    return import_model(model, {}, properties);
}

                                    

std::shared_ptr<ov::ICompiledModel> ov::template_plugin::Plugin::import_model(
    std::istream& model,
    const ov::SoPtr<ov::IRemoteContext>& context,
    const ov::AnyMap& properties) const {
    OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "Plugin::import_model");

    // check ov::loaded_from_cache property and erase it due to not needed any more.
    auto _properties = properties;
    const auto& it = _properties.find(ov::loaded_from_cache.name());
    bool loaded_from_cache = false;
    if (it != _properties.end()) {
        loaded_from_cache = it->second.as<bool>();
        _properties.erase(it);
    }

    auto fullConfig = Configuration{_properties, m_cfg};
    fullConfig.streams_executor_config = ov::threading::IStreamsExecutor::Config{stream_executor_name,
                                                                                 fullConfig.streams,
                                                                                 fullConfig.threads_per_stream};
    // read XML content
    std::string xmlString;
    std::uint64_t dataSize = 0;
    model.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    xmlString.resize(dataSize);
    // read into the string's own buffer; the pointer returned by c_str() must not be written through
    model.read(&xmlString[0], dataSize);

    // read blob content
    ov::Tensor weights;
    model.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    if (0 != dataSize) {
        weights = ov::Tensor(ov::element::from<char>(), ov::Shape{static_cast<ov::Shape::size_type>(dataSize)});
        model.read(weights.data<char>(), dataSize);
    }

    auto ov_model = get_core()->read_model(xmlString, weights);
    auto streamsExecutorConfig =
        ov::threading::IStreamsExecutor::Config::make_default_multi_threaded(fullConfig.streams_executor_config);
    fullConfig.streams = streamsExecutorConfig.get_streams();
    fullConfig.threads = streamsExecutorConfig.get_threads();
    fullConfig.threads_per_stream = streamsExecutorConfig.get_threads_per_stream();
    auto compiled_model =
        std::make_shared<CompiledModel>(ov_model,
                                        shared_from_this(),
                                        context,
                                        get_executor_manager()->get_idle_cpu_streams_executor(streamsExecutorConfig),
                                        fullConfig,
                                        loaded_from_cache);
    return compiled_model;
}
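
The import code above reads size-prefixed blocks: a 64-bit size followed by that many bytes. The sketch below shows a matching write side and adds a device-name header that is checked on import, as suggested in the list of exported information. The header is an assumption made for illustration and is not part of the stream layout read by the Template plugin code above; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Writes a 64-bit size followed by the raw bytes.
void write_block(std::ostream& out, const std::string& bytes) {
    const std::uint64_t size = bytes.size();
    out.write(reinterpret_cast<const char*>(&size), sizeof(size));
    out.write(bytes.data(), static_cast<std::streamsize>(size));
}

// Reads one size-prefixed block and fails loudly on a truncated stream.
std::string read_block(std::istream& in) {
    std::uint64_t size = 0;
    in.read(reinterpret_cast<char*>(&size), sizeof(size));
    if (!in)
        throw std::runtime_error("truncated stream");
    std::string bytes(static_cast<std::size_t>(size), '\0');
    if (size != 0)
        in.read(&bytes[0], static_cast<std::streamsize>(size));
    if (!in)
        throw std::runtime_error("truncated stream");
    return bytes;
}

// Hypothetical export: device name header, then the model and weights blocks.
void export_blob(std::ostream& out, const std::string& device, const std::string& xml, const std::string& weights) {
    write_block(out, device);
    write_block(out, xml);
    write_block(out, weights);
}

// Hypothetical import: rejects blobs exported for a different device, then returns the model block.
std::string import_xml(std::istream& in, const std::string& device) {
    if (read_block(in) != device)
        throw std::runtime_error("blob was exported for another device");
    return read_block(in);
}
```

The sizes are written in the host's native byte order, so a blob produced this way is only portable between machines with the same endianness. A production format would also bound the size field before allocating.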

                                    

create_context()¶

A plugin should implement the Plugin::create_context() method, which returns a remote context (ov::SoPtr<ov::IRemoteContext>) if the plugin supports remote contexts. Otherwise, the plugin can throw an exception indicating that the method is not implemented.

ov::SoPtr<ov::IRemoteContext> ov::template_plugin::Plugin::create_context(const ov::AnyMap& remote_properties) const {
    return std::make_shared<ov::template_plugin::RemoteContext>();
}
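
For a plugin without remote context support, the "not implemented" branch might look like the following standalone sketch. SketchContext, NotImplemented, and ContextPlugin are hypothetical types defined here for illustration; they are not OpenVINO classes.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical remote context type.
struct SketchContext {
    std::string device;
};

// Hypothetical exception signalling an unimplemented plugin method.
struct NotImplemented : std::logic_error {
    using std::logic_error::logic_error;
};

// Hypothetical plugin that may or may not support remote contexts.
class ContextPlugin {
public:
    explicit ContextPlugin(bool supports_remote_context) : m_supports(supports_remote_context) {}

    // Returns a new context, or throws when remote contexts are unsupported.
    std::shared_ptr<SketchContext> create_context() const {
        if (!m_supports)
            throw NotImplemented("create_context is not implemented by this plugin");
        return std::make_shared<SketchContext>(SketchContext{"TEMPLATE"});
    }

private:
    bool m_supports;
};
```

A dedicated exception type lets callers distinguish "this plugin has no remote contexts" from a genuine failure while creating one.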

                                    

get_default_context()¶

Plugin::get_default_context() is also required if the plugin supports remote contexts. If the plugin does not support them, this method can throw an exception indicating that the functionality is not implemented.

ov::SoPtr<ov::IRemoteContext> ov::template_plugin::Plugin::get_default_context(
    const ov::AnyMap& remote_properties) const {
    return std::make_shared<ov::template_plugin::RemoteContext>();
}

                                    

Create Instance of Plugin Class¶

An OpenVINO plugin library must export only one function, which creates a plugin instance, by using the OV_DEFINE_PLUGIN_CREATE_FUNCTION macro:

static const ov::Version version = {CI_BUILD_NUMBER, "openvino_template_plugin"};
OV_DEFINE_PLUGIN_CREATE_FUNCTION(ov::template_plugin::Plugin, version)
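
The general pattern of a library exposing a single creation entry point can be sketched as follows. This is a generic illustration, not the actual expansion of OV_DEFINE_PLUGIN_CREATE_FUNCTION: the macro SKETCH_DEFINE_CREATE_FUNCTION, the function create_plugin_sketch, and both types are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical plugin interface and implementation.
struct IPluginSketch {
    virtual ~IPluginSketch() = default;
    virtual std::string device_name() const = 0;
};

struct TemplatePluginSketch : IPluginSketch {
    std::string device_name() const override { return "TEMPLATE"; }
};

// Hypothetical macro: defines the library's single creation entry point for `PluginType`.
// C linkage keeps the symbol name unmangled so a loader can look it up by name.
#define SKETCH_DEFINE_CREATE_FUNCTION(PluginType) \
    extern "C" void create_plugin_sketch(std::shared_ptr<IPluginSketch>& plugin) { \
        plugin = std::make_shared<PluginType>(); \
    }

SKETCH_DEFINE_CREATE_FUNCTION(TemplatePluginSketch)
```

A host application would resolve create_plugin_sketch from the shared library and call it with an empty std::shared_ptr<IPluginSketch>, after which it works with the plugin through the interface only.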

The next step in the plugin library implementation is the CompiledModel class.