Introducing LiteRT: Google's high-performance runtime for on-device AI (formerly TensorFlow Lite).

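Get started with LiteRT

This guide introduces the process of running a LiteRT model on-device to make predictions based on input data. This is done with the LiteRT interpreter, which uses a static graph ordering and a custom (less dynamic) memory allocator to keep load, initialization, and execution latency to a minimum.

LiteRT inference typically follows these steps: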

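Loading a model: Load the .tflite model into memory. The file contains the model's execution graph.
Transforming data: Transform the input data into the format and dimensions the model expects. Raw input data generally does not match the input format expected by the model. For example, you might need to resize an image or change the image format to be compatible with the model.
Running inference: Execute the LiteRT model to make predictions. This step uses the LiteRT API and involves a few sub-steps, such as building the interpreter and allocating tensors.
Interpreting output: Interpret the output tensors in a way that is meaningful and useful in your application. For example, a model might return only a list of probabilities. It is up to you to map the probabilities to relevant categories and format the output.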
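As a rough illustration of the second and fourth steps, the following pure-Python sketch resizes a tiny grayscale image with nearest-neighbor sampling, scales it to the range [0, 1], and maps a list of probabilities to labels. The function names, the 2x2 target size, and the labels are hypothetical and chosen only for illustration. A real application would use the input shape and label set its model expects, and would usually rely on an image library rather than nested lists.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D list of pixel values using nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image, max_value=255.0):
    """Scale pixel values from [0, max_value] to floats in [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def top_k(probabilities, labels, k=1):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(
        zip(labels, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


# Hypothetical 4x4 grayscale image, reduced to an assumed 2x2 model input.
raw = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [51, 51, 102, 102],
    [51, 51, 102, 102],
]
model_input = normalize(resize_nearest(raw, 2, 2))

# Hypothetical model output: one probability per label.
labels = ["cat", "dog", "bird"]
probabilities = [0.1, 0.7, 0.2]
best = top_k(probabilities, labels, k=2)
print(model_input)
print(best)
```

The model itself is deliberately left out here; only the data handling on either side of the inference call is shown.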

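This guide describes how to access the LiteRT interpreter and perform inference using C++, Java, and Python.

Supported platforms

LiteRT inference APIs are provided for the most common mobile and embedded platforms, such as Android, iOS, and Linux, in multiple programming languages.

In most cases, the API design reflects a preference for performance over ease of use. LiteRT is designed for fast inference on small devices, so the APIs avoid unnecessary copies at the expense of convenience.

Across all libraries, the LiteRT API lets you load models, feed inputs, and retrieve inference outputs.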

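Android platform

On Android, LiteRT inference can be performed using either the Java or the C++ API. The Java API is convenient and can be used directly within your Android Activity classes. The C++ API offers more flexibility and speed, but may require writing JNI wrappers to move data between the Java and C++ layers.

See the C++ and Java sections for more information, or follow the Android quickstart.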

iOS platform

On iOS, LiteRT is available in Swift and Objective-C libraries. You can also use the C API directly in Objective-C code.

See the Swift, Objective-C, and C API sections, or follow the iOS quickstart.

Linux platform

On Linux platforms, you can run inference using the LiteRT APIs available in C++.

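Load and run a model

Loading and running a LiteRT model involves the following steps:

Load the model into memory.
Build an Interpreter based on the existing model.
Set the input tensor values.
Invoke inference.
Read the output tensor values.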

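Android (Java)

The Java API for running inference with LiteRT is primarily designed for use with Android, so it is available as an Android library dependency: com.google.ai.edge.litert.

In Java, you use the Interpreter class to load a model and drive model inference. In many cases, this may be the only API you need.

You can initialize an Interpreter using a FlatBuffers (.tflite) file: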

public Interpreter(@NotNull File modelFile);

Or with a MappedByteBuffer:

public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);

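In both cases, you must provide a valid LiteRT model, or the API throws an IllegalArgumentException. If you use a MappedByteBuffer to initialize an Interpreter, the buffer must remain unchanged for the whole lifetime of the Interpreter.

The preferred way to run inference on a model is to use signatures, which are available for models converted with TensorFlow 2.5 or later: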

try (Interpreter interpreter = new Interpreter(file_of_tensorflowlite_model)) {
  Map<String, Object> inputs = new HashMap<>();
  inputs.put("input_1", input1);
  inputs.put("input_2", input2);
  Map<String, Object> outputs = new HashMap<>();
  outputs.put("output_1", output1);
  interpreter.runSignature(inputs, outputs, "mySignature");
}

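The runSignature method takes three arguments:

Inputs: a map from each input name in the signature to an input object.
Outputs: a map from each output name in the signature to the output data.
Signature Name (optional): the name of the signature. It can be omitted if the model has a single signature.

If the model does not have defined signatures, you can run inference another way: call Interpreter.run(). For example: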

try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.run(input, output);
}

The run() method takes only one input and returns only one output. If your model has multiple inputs or multiple outputs, use the following instead:

interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);

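In this case, each entry in inputs corresponds to an input tensor, and map_of_indices_to_outputs maps the indices of output tensors to the corresponding output data.

In both cases, the tensor indices should correspond to the values you gave to the LiteRT Converter when you created the model. Be aware that the order of tensors in inputs must match the order given to the LiteRT Converter.

The Interpreter class also provides convenience functions for getting the index of any model input or output using an operation name: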

public int getInputIndex(String opName);
public int getOutputIndex(String opName);

If opName is not a valid operation in the model, an IllegalArgumentException is thrown.

Also be aware that the Interpreter owns resources. To avoid memory leaks, release the resources after use:

interpreter.close();

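For an example project that uses Java, see the Android object detection example app.

Supported data types

To use LiteRT, the data types of the input and output tensors must be one of the following primitive types: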

float
int
long
byte

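String types are also supported, but they are encoded differently from the primitive types. In particular, the shape of a string tensor dictates the number and arrangement of strings in the tensor, with each element itself being a variable-length string. In this sense, the (byte) size of the tensor cannot be computed from the shape and type alone, and consequently strings cannot be provided as a single, flat ByteBuffer argument.

If other data types are used, including boxed types such as Integer and Float, an IllegalArgumentException is thrown.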
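The size argument for string tensors can be made concrete with a small sketch. For the fixed-width primitive types, the byte size follows directly from the shape and the element width; for strings, it depends on the contents. The helper names below are hypothetical, and the string calculation only sums UTF-8 payload bytes: it does not reproduce LiteRT's actual string tensor encoding.

```python
import math

# Bytes per element for the fixed-width Java primitive types listed above.
ITEM_SIZE = {"float": 4, "int": 4, "long": 8, "byte": 1}


def fixed_width_byte_size(shape, dtype):
    """Byte size of a tensor whose elements all have the same width."""
    return math.prod(shape) * ITEM_SIZE[dtype]


def utf8_payload_size(strings):
    """Total UTF-8 bytes of the string elements; depends on their contents."""
    return sum(len(s.encode("utf-8")) for s in strings)


# A float tensor of shape [2, 3] always needs 2 * 3 * 4 bytes.
float_size = fixed_width_byte_size([2, 3], "float")

# Two string tensors with the same shape [2] but different payload sizes.
short_size = utf8_payload_size(["a", "b"])
long_size = utf8_payload_size(["hello", "LiteRT runtime"])

print(float_size, short_size, long_size)
```

Because the two string tensors share a shape but not a size, no single flat buffer length can be derived from the shape and type.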

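Inputs

Each input should be an array or multi-dimensional array of the supported primitive types, or a raw ByteBuffer of the appropriate size. If the input is an array or multi-dimensional array, the associated input tensor is implicitly resized to the array's dimensions at inference time. If the input is a ByteBuffer, the caller should first manually resize the associated input tensor (via Interpreter.resizeInput()) before running inference.

When using a ByteBuffer, prefer direct byte buffers, because this lets the Interpreter avoid unnecessary copies. If the ByteBuffer is a direct byte buffer, its order must be ByteOrder.nativeOrder(). Once it has been passed to a model inference, it must remain unchanged until the inference is finished.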
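To show what a flat buffer "of the appropriate size" in native byte order looks like, here is a Python analogy using the standard struct module. It is not the Java API: the helper names are hypothetical, and the sketch only demonstrates that the buffer length equals the element count times the element width and that values round-trip when written and read in the machine's native order.

```python
import struct
import sys


def pack_floats_native(values):
    """Pack floats into flat bytes as 32-bit values in native byte order."""
    # "=" selects native byte order with standard sizes (4 bytes per "f").
    return struct.pack(f"={len(values)}f", *values)


def unpack_floats_native(buffer):
    """Read 32-bit floats back from a flat native-order buffer."""
    count = len(buffer) // 4
    return list(struct.unpack(f"={count}f", buffer))


values = [1.0, 0.5, -2.0]          # exactly representable as 32-bit floats
buffer = pack_floats_native(values)
expected_size = len(values) * 4    # element count * bytes per 32-bit float
restored = unpack_floats_native(buffer)
native_order = sys.byteorder       # "little" or "big", depending on the machine

print(len(buffer), expected_size, restored, native_order)
```

A buffer written in the wrong byte order would have the same length but decode to different values, which is why the order requirement matters.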

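Outputs

Each output should be an array or multi-dimensional array of the supported primitive types, or a ByteBuffer of the appropriate size. Note that some models have dynamic outputs, where the shape of the output tensors can vary depending on the input. There is no straightforward way to handle this with the existing Java inference API, but planned extensions will make it possible.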

iOS (Swift)

The Swift API is available in the TensorFlowLiteSwift Pod from CocoaPods.

First, import the TensorFlowLite module.

import TensorFlowLite

// Getting model path
guard
  let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")
else {
  // Error handling... (a `guard` body must exit the current scope)
  fatalError("Model file not found in the app bundle.")
}

do {
  // Initialize an interpreter with the model.
  let interpreter = try Interpreter(modelPath: modelPath)

  // Allocate memory for the model's input `Tensor`s.
  try interpreter.allocateTensors()

  let inputData: Data  // Should be initialized

  // input data preparation...

  // Copy the input data to the input `Tensor`.
  try interpreter.copy(inputData, toInputAt: 0)

  // Run inference by invoking the `Interpreter`.
  try interpreter.invoke()

  // Get the output `Tensor`
  let outputTensor = try interpreter.output(at: 0)

  // Copy output to a buffer to process the inference results.
  let outputSize = outputTensor.shape.dimensions.reduce(1, {x, y in x * y})
  let outputData =
        UnsafeMutableBufferPointer<Float32>.allocate(capacity: outputSize)
  outputTensor.data.copyBytes(to: outputData)
} catch {
  // Error handling... (the thrown error is available as `error`)
}

iOS (Objective-C)

The Objective-C API is available in the LiteRTObjC Pod from CocoaPods.

First, import the TensorFlowLiteObjC module.

@import TensorFlowLite;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];
NSError *error;

// Initialize an interpreter with the model.
TFLInterpreter *interpreter = [[TFLInterpreter alloc] initWithModelPath:modelPath
                                                                  error:&error];
if (error != nil) { /* Error handling... */ }

// Allocate memory for the model's input `TFLTensor`s.
[interpreter allocateTensorsWithError:&error];
if (error != nil) { /* Error handling... */ }

NSMutableData *inputData;  // Should be initialized
// input data preparation...

// Get the input `TFLTensor`
TFLTensor *inputTensor = [interpreter inputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy the input data to the input `TFLTensor`.
[inputTensor copyData:inputData error:&error];
if (error != nil) { /* Error handling... */ }

// Run inference by invoking the `TFLInterpreter`.
[interpreter invokeWithError:&error];
if (error != nil) { /* Error handling... */ }

// Get the output `TFLTensor`
TFLTensor *outputTensor = [interpreter outputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy output to `NSData` to process the inference results.
NSData *outputData = [outputTensor dataWithError:&error];
if (error != nil) { /* Error handling... */ }

C API in Objective-C code

The Objective-C API does not support delegates. To use delegates with Objective-C code, you need to call the underlying C API directly.

#include "tensorflow/lite/c/c_api.h"

TfLiteModel* model = TfLiteModelCreateFromFile([modelPath UTF8String]);
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();

// Create the interpreter.
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Allocate tensors and populate the input tensor data.
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteTensor* input_tensor =
    TfLiteInterpreterGetInputTensor(interpreter, 0);
TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                           input.size() * sizeof(float));

// Execute inference.
TfLiteInterpreterInvoke(interpreter);

// Extract the output tensor data.
const TfLiteTensor* output_tensor =
    TfLiteInterpreterGetOutputTensor(interpreter, 0);
TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                         output.size() * sizeof(float));

// Dispose of the model and interpreter objects.
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);

C++

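The C++ API for running inference with LiteRT is compatible with Android, iOS, and Linux platforms. The C++ API on iOS is only available when using bazel.

In C++, the model is stored in the FlatBufferModel class. It encapsulates a LiteRT model, and you can build it in a couple of different ways, depending on where the model is stored: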

class FlatBufferModel {
  // Build a model based on a file. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromFile(
      const char* filename,
      ErrorReporter* error_reporter);

  // Build a model based on a pre-loaded flatbuffer. The caller retains
  // ownership of the buffer and should keep it alive until the returned object
  // is destroyed. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromBuffer(
      const char* buffer,
      size_t buffer_size,
      ErrorReporter* error_reporter);
};

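Now that you have the model as a FlatBufferModel object, you can execute it with an Interpreter. A single FlatBufferModel can be used simultaneously by more than one Interpreter.

Note the following points about the Interpreter API:

Tensors are represented by integers, in order to avoid string comparisons (and any fixed dependency on string libraries).
An interpreter must not be accessed from concurrent threads.
Memory allocation for input and output tensors must be triggered by calling AllocateTensors() right after resizing tensors.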
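One common way to respect the rule about concurrent threads is to serialize access with a lock. The sketch below shows the pattern in Python with a hypothetical FakeInterpreter stand-in; neither class is part of LiteRT, and the same idea would be expressed with a mutex in C++.

```python
import threading


class FakeInterpreter:
    """Hypothetical stand-in for an interpreter; not a LiteRT class."""

    def __init__(self):
        self.calls = 0

    def invoke(self, x):
        self.calls += 1
        return x + 4


class SerializedInterpreter:
    """Wraps an interpreter so that only one thread uses it at a time."""

    def __init__(self, interpreter):
        self._interpreter = interpreter
        self._lock = threading.Lock()

    def invoke(self, x):
        # The lock guarantees that calls never overlap.
        with self._lock:
            return self._interpreter.invoke(x)


shared = SerializedInterpreter(FakeInterpreter())
results = [None] * 8


def worker(i):
    results[i] = shared.invoke(i)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

An alternative that avoids lock contention is to give each thread its own interpreter, which fits the note above that a single FlatBufferModel can be used by more than one Interpreter.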

The simplest usage of LiteRT with C++ looks like this:

// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if needed.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();

float* output = interpreter->typed_output_tensor<float>(0);

For more example code, see minimal.cc and label_image.cc.

Python

The Python API for running inference uses the Interpreter to load a model and run inference.

Install the LiteRT package:

$ python3 -m pip install ai-edge-litert

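Import the LiteRT interpreter and create an instance:

from ai_edge_litert.interpreter import Interpreter
# `args.model.file` stands for the path to your .tflite file.
interpreter = Interpreter(model_path=args.model.file)

The following examples show how to use the Python interpreter to load a FlatBuffers (.tflite) file and run inference.

The first example is recommended if you are converting from a SavedModel with a defined SignatureDef: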

import tensorflow as tf

class TestModel(tf.Module):
  def __init__(self):
    super(TestModel, self).__init__()

  @tf.function(input_signature=[tf.TensorSpec(shape=[1, 10], dtype=tf.float32)])
  def add(self, x):
    '''
    Simple method that accepts single input 'x' and returns 'x' + 4.
    '''
    # Name the output 'result' for convenience.
    return {'result' : x + 4}

SAVED_MODEL_PATH = 'content/saved_models/test_variable'
TFLITE_FILE_PATH = 'content/test_variable.tflite'

# Save the model
module = TestModel()
# You can omit the signatures argument and a default signature name will be
# created with name 'serving_default'.
tf.saved_model.save(
    module, SAVED_MODEL_PATH,
    signatures={'my_signature':module.add.get_concrete_function()})

# Convert the model using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
tflite_model = converter.convert()
with open(TFLITE_FILE_PATH, 'wb') as f:
  f.write(tflite_model)

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)

# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()

# my_signature is callable with input as arguments.
output = my_signature(x=tf.constant([1.0], shape=(1,10), dtype=tf.float32))
# 'output' is dictionary with all outputs from the inference.
# In this case we have single output 'result'.
print(output['result'])

The next example applies when the model does not have SignatureDefs defined. It runs inference on random input data:

import numpy as np
import tensorflow as tf

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

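As an alternative to loading the model from a pre-converted .tflite file, you can combine your code with the LiteRT converter (tf.lite.TFLiteConverter in the example), which lets you convert your Keras model into the LiteRT format and then run inference: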

import numpy as np
import tensorflow as tf

img = tf.keras.Input(shape=(64, 64, 3), name="img")
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

# Convert to LiteRT format
converter = tf.lite.TFLiteConverter.from_keras_model(tf.keras.models.Model(inputs=[img], outputs=[out]))
tflite_model = converter.convert()

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Continue to get tensors and so forth, as shown above...

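For more Python sample code, see label_image.py.

Run inference with a dynamic shape model

If you want to run a model with a dynamic input shape, resize the input shape before running inference. Otherwise, the None shape in the TensorFlow model is replaced by a placeholder of 1 in the LiteRT model.

The following examples show how to resize the input shape before running inference in different languages. All the examples assume that the input shape is defined as [1/None, 10] and needs to be resized to [3, 10].

C++ example: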

// Resize input tensors before allocate tensors
interpreter->ResizeInputTensor(/*tensor_index=*/0, std::vector<int>{3,10});
interpreter->AllocateTensors();

Python example:

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_path=TFLITE_FILE_PATH)

# Resize input shape for dynamic shape model and allocate tensor
interpreter.resize_tensor_input(interpreter.get_input_details()[0]['index'], [3, 10])
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
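Before resizing, it can help to check that the requested shape only changes the dynamic dimensions. The following pure-Python sketch assumes that a dynamic dimension is marked with -1 in a shape signature (to our understanding, the convention used by the shape_signature entry of get_input_details(), but treat it as an assumption). The helper names are hypothetical and not part of the LiteRT API.

```python
def is_compatible(shape_signature, requested_shape):
    """Check a requested shape against a signature where -1 marks a dynamic dim."""
    if len(shape_signature) != len(requested_shape):
        return False
    return all(
        sig == req or (sig == -1 and req > 0)
        for sig, req in zip(shape_signature, requested_shape)
    )


def default_shape(shape_signature):
    """Shape used when no resize is requested: dynamic dims fall back to 1."""
    return [1 if dim == -1 else dim for dim in shape_signature]


signature = [-1, 10]                      # corresponds to [None, 10]
unresized = default_shape(signature)      # the placeholder shape
ok = is_compatible(signature, [3, 10])    # only the dynamic dim changes
bad = is_compatible(signature, [3, 12])   # the fixed dim must stay 10

print(unresized, ok, bad)
```

A check like this could run before the resize call shown in the Python example, so that an incompatible shape is rejected early with a clear message.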