Table Question Answering with TAPAS and OpenVINO™

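Table Question Answering (Table QA) is the task of answering a question about the information in a given table. With a Table QA model, you can supply a table and a question and simulate the execution of an SQL query.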
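To make the SQL analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `actors` and the hand-written query are assumptions for illustration only; a Table QA model receives just the table and the natural-language question and never sees or produces SQL.

```python
import sqlite3

# Small example table (the same data as the sample table used later in this tutorial).
rows = [("Brad Pitt", "87"), ("Leonardo Di Caprio", "53"), ("George Clooney", "69")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE actors ("Actors" TEXT, "Number of movies" TEXT)')
conn.executemany("INSERT INTO actors VALUES (?, ?)", rows)

# Hand-written SQL equivalent of "how many movies does Leonardo Di Caprio have?"
sql_answer = conn.execute(
    'SELECT "Number of movies" FROM actors WHERE "Actors" = ?',
    ("Leonardo Di Caprio",),
).fetchall()
conn.close()

print(sql_answer)  # [('53',)]
```

A Table QA model aims to return the same cell without anyone having to write the query.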

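This tutorial demonstrates how to perform table question answering with OpenVINO. The example uses a TAPAS large model fine-tuned on WikiTable Questions (WTQ), the checkpoint google/tapas-large-finetuned-wtq, which is based on the paper TAPAS: Weakly Supervised Table Parsing via Pre-training.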

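Answering natural language questions over tables is usually seen as a semantic parsing task. To alleviate the collection cost of full logical forms, one popular approach focuses on weak supervision consisting of denotations instead of logical forms. However, training semantic parsers from weak supervision poses difficulties, and the generated logical forms are only used as an intermediate step before retrieving the denotation. The paper presents TAPAS, an approach to question answering over tables without generating logical forms. TAPAS trains from weak supervision and predicts the denotation by selecting table cells and optionally applying a corresponding aggregation operator to that selection. TAPAS extends BERT's architecture to encode tables as input, initializes from a joint pre-training of text segments and tables crawled from Wikipedia, and is trained end-to-end.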
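The final step described above, selecting cells and optionally aggregating them, can be sketched in plain Python. The operator names NONE, SUM, AVERAGE, and COUNT are assumed here to match the aggregation labels used for WTQ-style fine-tuning; the function `denotation` and its arguments are hypothetical and not part of any library.

```python
from statistics import mean


def denotation(table, selected, operator="NONE"):
    """Turn selected cell coordinates plus an aggregation operator into an answer.

    table    -- list of rows, each a list of cell strings
    selected -- list of (row, column) coordinates chosen by the model
    operator -- one of "NONE", "SUM", "AVERAGE", "COUNT" (assumed label set)
    """
    cells = [table[r][c] for r, c in selected]
    if operator == "NONE":
        return ", ".join(cells)
    if operator == "COUNT":
        return str(len(cells))
    numbers = [float(cell) for cell in cells]
    if operator == "SUM":
        return str(sum(numbers))
    if operator == "AVERAGE":
        return str(mean(numbers))
    raise ValueError(f"unknown aggregation operator: {operator}")


table = [["Brad Pitt", "87"], ["Leonardo Di Caprio", "53"], ["George Clooney", "69"]]
print(denotation(table, [(1, 1)]))                           # 53
print(denotation(table, [(0, 1), (1, 1)], "SUM"))            # 140.0
print(denotation(table, [(0, 0), (1, 0), (2, 0)], "COUNT"))  # 3
```

Because the answer is computed directly from the selected cells, no logical form is needed as an intermediate representation.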

Prerequisites

%pip install -q torch "transformers>=4.31.0" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q "openvino>=2023.2.0" "gradio>=4.0.2"
import torch
from transformers import TapasForQuestionAnswering
from transformers import TapasTokenizer
from transformers import pipeline
import pandas as pd

Use TapasForQuestionAnswering.from_pretrained to download the pre-trained model and TapasTokenizer.from_pretrained to get the tokenizer.

model = TapasForQuestionAnswering.from_pretrained('google/tapas-large-finetuned-wtq')
tokenizer = TapasTokenizer.from_pretrained("google/tapas-large-finetuned-wtq")

data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
question = "how many movies does Leonardo Di Caprio have?"
table
               Actors Number of movies
0           Brad Pitt               87
1  Leonardo Di Caprio               53
2      George Clooney               69

Run inference with the original model

We use the example table and question above to demonstrate how to run inference. The pipeline from the Transformers library can be used for this purpose.

tqa = pipeline(task="table-question-answering", model=model, tokenizer=tokenizer)
result = tqa(table=table, query=question)
print(f"The answer is {result['cells'][0]}")
The answer is 53

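For more details on the structure of the inference output, refer to the Transformers documentation for TableQuestionAnsweringPipeline.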
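As a rough illustration of that structure: for a single query, the pipeline returns a dictionary with the keys answer, coordinates, and cells, plus an optional aggregator key (the same keys appear in the post-processing code later in this tutorial). The dictionary below is hand-written for illustration, not captured model output, and `summarize` is a hypothetical helper.

```python
def summarize(result):
    """Format one table-QA result dictionary as a single readable line."""
    cells = ", ".join(result["cells"])
    where = ", ".join(f"(row {r}, col {c})" for r, c in result["coordinates"])
    aggregator = result.get("aggregator", "NONE")  # key may be absent
    return f"answer={result['answer']!r} cells=[{cells}] at {where} aggregator={aggregator}"


# Hand-written example with the expected keys (an assumption, not real model output).
example_result = {"answer": "53", "coordinates": [(1, 1)], "cells": ["53"]}
print(summarize(example_result))
# answer='53' cells=[53] at (row 1, col 1) aggregator=NONE
```

The coordinates are zero-based (row, column) positions in the DataFrame, which is why they can be passed to table.iat.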

Convert the original model to OpenVINO Intermediate Representation (IR) format

The original model is a PyTorch module, so it can be converted directly with the ov.convert_model function. Use the ov.save_model function to serialize the result of the conversion.

import openvino as ov
from pathlib import Path


# Define the input shape
batch_size = 1
sequence_length = 29

# Modify the input shape of the dummy_input dictionary
dummy_input = {
    "input_ids": torch.zeros((batch_size, sequence_length), dtype=torch.long),
    "attention_mask": torch.zeros((batch_size, sequence_length), dtype=torch.long),
    "token_type_ids": torch.zeros((batch_size, sequence_length, 7), dtype=torch.long),
}


ov_model_xml_path = Path('models/ov_model.xml')

if not ov_model_xml_path.exists():
    ov_model = ov.convert_model(
        model,
        example_input=dummy_input
    )
    ov.save_model(ov_model, ov_model_xml_path)
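The trailing dimension of 7 in the token_type_ids dummy input deserves a note: TAPAS attaches seven token-type indices to every token to encode the table structure. The channel names below are taken from the Transformers TAPAS tokenizer as an assumption and should be checked against your installed version. The sketch uses nested lists instead of tensors so that it runs without PyTorch; `zeros` and `shape` are hypothetical helpers.

```python
# Names of the seven token-type channels (assumed from the Transformers TAPAS tokenizer).
TOKEN_TYPE_CHANNELS = [
    "segment_ids", "column_ids", "row_ids", "prev_labels",
    "column_ranks", "inv_column_ranks", "numeric_relations",
]


def zeros(*dims):
    """Build a zero-filled nested list with the given dimensions."""
    if len(dims) == 1:
        return [0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]


def shape(nested):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


batch_size, sequence_length = 1, 29
dummy_shapes = {
    "input_ids": shape(zeros(batch_size, sequence_length)),
    "attention_mask": shape(zeros(batch_size, sequence_length)),
    "token_type_ids": shape(zeros(batch_size, sequence_length, len(TOKEN_TYPE_CHANNELS))),
}
print(dummy_shapes)
# {'input_ids': (1, 29), 'attention_mask': (1, 29), 'token_type_ids': (1, 29, 7)}
```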

Run the OpenVINO model

Select the device on which to run inference with OpenVINO from the dropdown list.

import ipywidgets as widgets

core = ov.Core()

device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='AUTO',
    description='Device:',
    disabled=False,
)

device
Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')
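Outside a notebook, where no dropdown widget is available, the same choice can be made programmatically. The sketch below is one possible approach, assuming the list of devices comes from ov.Core().available_devices as in the cell above; `pick_device` is a hypothetical helper, not an OpenVINO function.

```python
def pick_device(available_devices, preferred="AUTO"):
    """Return `preferred` if it is selectable, otherwise fall back to "AUTO".

    "AUTO" is always accepted, mirroring the dropdown options above.
    """
    options = list(available_devices) + ["AUTO"]
    return preferred if preferred in options else "AUTO"


# In the notebook, the list would come from ov.Core().available_devices.
print(pick_device(["CPU"], preferred="GPU"))  # AUTO
print(pick_device(["CPU"], preferred="CPU"))  # CPU
```

The returned string can then be passed to core.compile_model in place of device.value.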

Use core.compile_model to compile the model and load it onto the selected device. Use the original tokenizer to prepare the inputs.

inputs = tokenizer(table=table, queries=question, padding="max_length", return_tensors="pt")

compiled_model = core.compile_model(ov_model_xml_path, device.value)
result = compiled_model((inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"]))

Next, the results need to be post-processed. For this, we can reuse the relevant part of the postprocess method of TableQuestionAnsweringPipeline.

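# The compiled model returns two outputs: token-level logits for cell selection
# and logits for the aggregation operator.
logits = result[0]
logits_aggregation = result[1]  # not used in this simplified post-processing

# Convert the cell-selection logits into answer coordinates
predictions = tokenizer.convert_logits_to_predictions(inputs, torch.from_numpy(logits))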
answer_coordinates_batch = predictions[0]
aggregators = {}
aggregators_prefix = {}
answers = []
for index, coordinates in enumerate(answer_coordinates_batch):
    cells = [table.iat[coordinate] for coordinate in coordinates]
    aggregator = aggregators.get(index, "")
    aggregator_prefix = aggregators_prefix.get(index, "")
    answer = {
        "answer": aggregator_prefix + ", ".join(cells),
        "coordinates": coordinates,
        "cells": [table.iat[coordinate] for coordinate in coordinates],
    }
    if aggregator:
        answer["aggregator"] = aggregator

    answers.append(answer)

print(answers[0]["cells"][0])
53
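For intuition, the cell-selection step performed by convert_logits_to_predictions can be approximated as follows: apply a sigmoid to each token logit, average the probabilities of the tokens that belong to each table cell, and keep the cells whose average exceeds 0.5. The sketch below is a simplified reimplementation under those assumptions, not the library code; the function name and the toy logits are hypothetical. Row and column ids are assumed to be 1-based, with 0 marking tokens that are not in a data cell (question or header tokens).

```python
import math
from collections import defaultdict


def select_cells(token_logits, row_ids, column_ids, threshold=0.5):
    """Pick (row, column) coordinates whose mean token probability exceeds threshold."""
    per_cell = defaultdict(list)
    for logit, row, col in zip(token_logits, row_ids, column_ids):
        if row == 0 or col == 0:  # token is not inside a data cell
            continue
        probability = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        per_cell[(row - 1, col - 1)].append(probability)  # convert to 0-based
    return sorted(
        cell for cell, probs in per_cell.items()
        if sum(probs) / len(probs) > threshold
    )


# Toy input: two question tokens, then tokens for cells (0, 0), (0, 1), and (1, 1).
token_logits = [-9.0, -9.0, -4.0, -3.0, 5.0, 6.0]
row_ids = [0, 0, 1, 1, 2, 2]
column_ids = [0, 0, 1, 2, 2, 2]
print(select_cells(token_logits, row_ids, column_ids))  # [(1, 1)]
```

Averaging over the tokens of a cell means a multi-token cell is selected as a whole rather than token by token.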

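You can also keep using the original pipeline. To do so, create a wrapper around the TapasForQuestionAnswering class that overrides the forward method to run inference with the OpenVINO model, so that the methods and attributes of the original model class remain available to the pipeline.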

from transformers import TapasConfig


# get config for pretrained model
config = TapasConfig.from_pretrained('google/tapas-large-finetuned-wtq')



class TapasForQuestionAnswering(TapasForQuestionAnswering):  # it is better to keep the class name to avoid warnings
    def __init__(self, ov_model_path):
        super().__init__(config)  # pass config from the pretrained model
        self.tqa_model = core.compile_model(ov_model_path, device.value)

    def forward(self, input_ids, *, attention_mask, token_type_ids):
        results = self.tqa_model((input_ids, attention_mask, token_type_ids))

        return torch.from_numpy(results[0]), torch.from_numpy(results[1])


compiled_model = TapasForQuestionAnswering(ov_model_xml_path)
tqa = pipeline(task="table-question-answering", model=compiled_model, tokenizer=tokenizer)
print(tqa(table=table, query=question)["cells"][0])
53

Interactive inference

import urllib.request

import gradio as gr
import pandas as pd


urllib.request.urlretrieve(
    url="https://github.com/openvinotoolkit/openvino_notebooks/files/13215688/eu_city_population_top10.csv",
    filename="eu_city_population_top10.csv"
)


def display_table(csv_file_name):
    table = pd.read_csv(csv_file_name.name, delimiter=",")
    table = table.astype(str)

    return table


def highlight_answers(x, coordinates):
    highlighted_table = pd.DataFrame('', index=x.index, columns=x.columns)
    for coordinates_i in coordinates:
        highlighted_table.iloc[coordinates_i[0], coordinates_i[1]] = "background-color: lightgreen"

    return highlighted_table


def infer(query, csv_file_name):
    table = pd.read_csv(csv_file_name.name, delimiter=",")
    table = table.astype(str)

    result = tqa(table=table, query=query)
    table = table.style.apply(highlight_answers, axis=None, coordinates=result["coordinates"])

    return result["answer"], table


with gr.Blocks(title="TAPAS Table Question Answering") as demo:
    with gr.Row():
        with gr.Column():
            search_query = gr.Textbox(label="Search query")
            csv_file = gr.File(label="CSV file")
            infer_button = gr.Button("Submit", variant="primary")
        with gr.Column():
            answer = gr.Textbox(label="Result")
            result_csv_file = gr.Dataframe(label="All data")

    examples = [
        ["What is the city with the highest population that is not a capital?", "eu_city_population_top10.csv"],
        ["In which country is Madrid?", "eu_city_population_top10.csv"],
        ["In which cities is the population greater than 2,000,000?", "eu_city_population_top10.csv"],
    ]
    gr.Examples(examples, inputs=[search_query, csv_file])

    # Callbacks
    csv_file.upload(display_table, inputs=csv_file, outputs=result_csv_file)
    csv_file.select(display_table, inputs=csv_file, outputs=result_csv_file)
    csv_file.change(display_table, inputs=csv_file, outputs=result_csv_file)
    infer_button.click(infer, inputs=[search_query, csv_file], outputs=[answer, result_csv_file])

try:
    demo.queue().launch(debug=False)
except Exception:
    demo.queue().launch(share=True, debug=False)
Running on local URL:  http://127.0.0.1:7860

To create a public link, set share=True in launch().