Since the LiteRT builtin operator library only supports a limited number of TensorFlow operators, not every model is convertible. For details, refer to operator compatibility.
To allow conversion, users can provide their own custom implementation of an unsupported TensorFlow operator in LiteRT, known as a custom operator. If instead, you wish to combine a series of unsupported (or supported) TensorFlow operators into a single fused optimized custom operator, refer to operator fusing.
Using custom operators consists of four steps.

1. Create a TensorFlow Model. Make sure the Saved Model (or Graph Def) refers to the correctly named LiteRT operator.
2. Convert to a LiteRT Model. Make sure you set the right LiteRT converter attribute in order to successfully convert the model.
3. Create and register the operator. This is so that the LiteRT runtime knows how to map your operator and parameters in your graph to executable C/C++ code.
4. Test and profile your operator. If you wish to test just your custom operator, it is best to create a model with just your custom operator and use the benchmark_model program.
Let’s walk through an end-to-end example of running a model with a custom operator tf.atan (named Atan; refer to Create a TensorFlow Model), which is supported in TensorFlow, but unsupported in LiteRT.
The TensorFlow Text operator is an example of a custom operator. See the Convert TF Text to LiteRT tutorial for a code example.
Example: Custom Atan operator
Let’s walk through an example of supporting a TensorFlow operator that LiteRT does not have. Assume we are using the Atan operator and that we are building a very simple model for a function y = atan(x + offset), where offset is trainable.
Create a TensorFlow Model
The following code snippet trains a simple TensorFlow model. This model just contains a custom operator named Atan, which is a function y = atan(x + offset), where offset is trainable.
import tensorflow as tf

# Define training dataset and variables
x = [-8, 0.5, 2, 2.2, 201]
y = [-1.4288993, 0.98279375, 1.2490457, 1.2679114, 1.5658458]
offset = tf.Variable(0.0)

# Define a simple model which just contains a custom operator named `Atan`
@tf.function(input_signature=[tf.TensorSpec.from_tensor(tf.constant(x))])
def atan(x):
  return tf.atan(x + offset, name="Atan")

# Train model
optimizer = tf.optimizers.Adam(0.01)
def train(x, y):
  with tf.GradientTape() as t:
    predicted_y = atan(x)
    loss = tf.reduce_sum(tf.square(predicted_y - y))
  grads = t.gradient(loss, [offset])
  optimizer.apply_gradients(zip(grads, [offset]))

for i in range(1000):
  train(x, y)

print("The actual offset is: 1.0")
print("The predicted offset is:", offset.numpy())
The actual offset is: 1.0
The predicted offset is: 0.99999905
At this point, if you try to generate a LiteRT model with the default converter flags, you will get the following error message:
Error:
error: 'tf.Atan' op is neither a custom op nor a flex op.
Convert to a LiteRT Model
Create a LiteRT model with custom operators by setting the converter attribute allow_custom_ops, as shown below:
converter = tf.lite.TFLiteConverter.from_concrete_functions([atan.get_concrete_function()], atan)
converter.allow_custom_ops = True
tflite_model = converter.convert()
At this point, if you run it with the default interpreter using commands such as the following:
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
You will still get the error:
Encountered unresolved custom op: Atan.
Create and register the operator
The examples in this section use the C API declared in the following headers:
#include "third_party/tensorflow/lite/c/c_api.h"
#include "third_party/tensorflow/lite/c/c_api_opaque.h"
LiteRT custom operators are defined using a simple pure-C API that consists of an opaque type (TfLiteRegistrationExternal) and related functions.

TfLiteRegistrationExternal is an opaque type:
typedef struct TfLiteRegistrationExternal TfLiteRegistrationExternal;
TfLiteRegistrationExternal stores the operator's identity and implementation. (Note that the operator is distinct from its operands, which are stored in the LiteRT graph nodes for nodes that call the operator.)

Instances of this type are constructed with calls to TfLiteRegistrationExternalCreate and can be destroyed by calling TfLiteRegistrationExternalDelete.

The operator's identity is set via the parameters to the constructor function TfLiteRegistrationExternalCreate:
TfLiteRegistrationExternal*
TfLiteRegistrationExternalCreate(
TfLiteBuiltinOperator builtin_code, // Normally `TfLiteBuiltinCustom`.
const char* custom_name, // The name of the custom op.
int version // Normally `1` for the first version of a custom op.
);
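For example, a registration for a custom op called Atan could be created and, once it is no longer needed, released like this (a minimal sketch; setting the method pointers and using the registration are elided):

TfLiteRegistrationExternal* reg = TfLiteRegistrationExternalCreate(
    kTfLiteBuiltinCustom, "Atan", /*version=*/1);
// ... set the Prepare and Invoke methods and register the op ...
TfLiteRegistrationExternalDelete(reg);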
The operator implementation can define "methods" with the following signatures. All of these methods are optional, but for an operator to be successfully evaluated, the operator implementation needs to define and set (using the setter functions) at least the Prepare and Invoke methods.
// Initializes the op from serialized data.
void* Init(TfLiteOpaqueContext* context, const char* buffer, size_t length);
// Deallocates the op.
// The pointer `buffer` is the data previously returned by an Init invocation.
void Free(TfLiteOpaqueContext* context, void* buffer);
// Called when the inputs that this node depends on have been resized.
TfLiteStatus Prepare(TfLiteOpaqueContext* context, TfLiteOpaqueNode* node);
// Called when the node is executed. (Should read node inputs and write to
// node outputs).
TfLiteStatus Invoke(TfLiteOpaqueContext* context, TfLiteOpaqueNode* node);
// Retrieves the async kernel.
TfLiteAsyncKernel AsyncKernel(TfLiteOpaqueContext* context,
TfLiteOpaqueNode* node);
The function names (or namespace prefixes, for C++) in your op implementation don't have to match the function names in the above code snippet, since the LiteRT custom ops API only uses their addresses. Indeed, we recommend that you declare them in an anonymous namespace or as static functions.
But it is a good idea to include your operator name as a namespace or prefix on these function names:
C++
namespace my_namespace::my_custom_op {
  void* Init(TfLiteOpaqueContext* context,
             const char* buffer, size_t length) { ... }
  // ... plus definitions of Free, Prepare, and Invoke ...
}
C
void* MyCustomOpInit(TfLiteOpaqueContext* context,
                     const char* buffer, size_t length) { ... }
// ... plus definitions of MyCustomOpFree, MyCustomOpPrepare, and
// MyCustomOpInvoke.
Since this is a C API, these "methods" are implemented as C function pointers in the TfLiteRegistrationExternal type, which are set by passing the addresses of your implementation functions to the corresponding setter functions TfLiteRegistrationExternalSetMethodName:
void TfLiteRegistrationExternalSetInit(
TfLiteRegistrationExternal* registration,
void* (*init)(TfLiteOpaqueContext* context, const char* buffer,
size_t length));
void TfLiteRegistrationExternalSetFree(
TfLiteRegistrationExternal* registration,
void (*free)(TfLiteOpaqueContext* context, void* data));
void TfLiteRegistrationExternalSetPrepare(
TfLiteRegistrationExternal* registration,
TfLiteStatus (*prepare)(TfLiteOpaqueContext* context,
TfLiteOpaqueNode* node));
void TfLiteRegistrationExternalSetInvoke(
TfLiteRegistrationExternal* registration,
TfLiteStatus (*invoke)(TfLiteOpaqueContext* context,
TfLiteOpaqueNode* node));
void TfLiteRegistrationExternalSetAsyncKernel(
TfLiteRegistrationExternal* registration,
struct TfLiteAsyncKernel* (*async_kernel)(TfLiteOpaqueContext* context,
TfLiteOpaqueNode* node));
Refer to common.h for details on TfLiteContext and TfLiteNode. TfLiteContext provides error reporting facilities and access to global objects, including all the tensors. TfLiteNode allows operator implementations to access their inputs and outputs.
When the interpreter loads a model, it calls the Init() method once for each node in the graph. A given Init() will be called more than once if the op is used multiple times in the graph. For custom ops a configuration buffer will be provided, containing a flexbuffer that maps parameter names to their values. The buffer is empty for builtin ops because the interpreter has already parsed the op parameters. Kernel implementations that require state should initialize it here and transfer ownership to the caller. For each Init() call, there will be a corresponding call to Free(), allowing implementations to dispose of the buffer they might have allocated in Init().
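As an illustration, an op that carries attributes could unpack them from the flexbuffer in Init() and release the state in Free(). This is a hedged sketch: the MyOpData struct and the example_param attribute name are purely illustrative, and it assumes the flexbuffers helper from the FlatBuffers library is available.

#include "flatbuffers/flexbuffers.h"  // flexbuffers::GetRoot (assumed available)

struct MyOpData {       // hypothetical per-node state
  float example_param;  // hypothetical attribute name
};

void* Init(TfLiteOpaqueContext* context, const char* buffer, size_t length) {
  auto* op_data = new MyOpData();
  if (buffer != nullptr && length > 0) {
    // Custom op attributes arrive as a flexbuffer map of name -> value.
    const flexbuffers::Map m =
        flexbuffers::GetRoot(reinterpret_cast<const uint8_t*>(buffer), length)
            .AsMap();
    op_data->example_param = m["example_param"].AsFloat();
  }
  return op_data;  // Ownership is transferred to the runtime ...
}

void Free(TfLiteOpaqueContext* context, void* buffer) {
  // ... and handed back here for disposal.
  delete reinterpret_cast<MyOpData*>(buffer);
}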
Whenever the input tensors are resized, the interpreter will go through the graph notifying implementations of the change. This gives them the chance to resize their internal buffer, check validity of input shapes and types, and recalculate output shapes. This is all done through the Prepare() method, and implementations can access their state using TfLiteOpaqueNodeGetUserData(node).
Finally, each time inference runs, the interpreter traverses the graph calling the Invoke() method, and here too the state is available as TfLiteOpaqueNodeGetUserData(node).
Custom ops can be implemented by defining those "method" functions, and then defining a function that returns an instance of TfLiteRegistrationExternal constructed by calling TfLiteRegistrationExternalCreate and then the relevant setter methods:
C++
namespace my_namespace::my_custom_op {
  namespace {
    void* Init(TfLiteOpaqueContext* context,
               const char* buffer, size_t length) { ... }
    void Free(TfLiteOpaqueContext* context, void* buffer) { ... }
    TfLiteStatus Prepare(TfLiteOpaqueContext* context,
                         TfLiteOpaqueNode* node) { ... }
    TfLiteStatus Invoke(TfLiteOpaqueContext* context,
                        TfLiteOpaqueNode* node) { ... }
  }  // anonymous namespace

  const TfLiteRegistrationExternal* MyCustomOpRegistrationExternal() {
    // Singleton instance, intentionally never destroyed.
    static const TfLiteRegistrationExternal* my_custom_op = []() {
      TfLiteRegistrationExternal* r = TfLiteRegistrationExternalCreate(
          kTfLiteBuiltinCustom, "MyCustomOp", /*version=*/1);
      TfLiteRegistrationExternalSetInit(r, Init);
      TfLiteRegistrationExternalSetFree(r, Free);
      TfLiteRegistrationExternalSetPrepare(r, Prepare);
      TfLiteRegistrationExternalSetInvoke(r, Invoke);
      return r;
    }();
    return my_custom_op;
  }

  const TfLiteRegistration* MyCustomOpRegistration() {
    static const TfLiteRegistration my_custom_op{
        .registration_external = const_cast<TfLiteRegistrationExternal*>(
            MyCustomOpRegistrationExternal())};
    return &my_custom_op;
  }
}  // namespace my_namespace::my_custom_op
C
static void* MyCustomOpInit(TfLiteOpaqueContext* context,
                            const char* buffer, size_t length) { ... }
static void MyCustomOpFree(TfLiteOpaqueContext* context, void* buffer) { ... }
static TfLiteStatus MyCustomOpPrepare(TfLiteOpaqueContext* context,
                                      TfLiteOpaqueNode* node) { ... }
static TfLiteStatus MyCustomOpInvoke(TfLiteOpaqueContext* context,
                                     TfLiteOpaqueNode* node) { ... }

static TfLiteRegistrationExternal* MyCustomOpCreate() {
  TfLiteRegistrationExternal* r = TfLiteRegistrationExternalCreate(
      kTfLiteBuiltinCustom, "MyCustomOp", /*version=*/1);
  TfLiteRegistrationExternalSetInit(r, MyCustomOpInit);
  TfLiteRegistrationExternalSetFree(r, MyCustomOpFree);
  TfLiteRegistrationExternalSetPrepare(r, MyCustomOpPrepare);
  TfLiteRegistrationExternalSetInvoke(r, MyCustomOpInvoke);
  return r;
}

const TfLiteRegistrationExternal* MyCustomOpRegistrationExternal() {
  // Singleton instance, intentionally never destroyed.
  static const TfLiteRegistrationExternal* my_custom_op = NULL;
  if (my_custom_op == NULL) {
    my_custom_op = MyCustomOpCreate();
  }
  return my_custom_op;
}

const TfLiteRegistration* MyCustomOpRegistration() {
  static TfLiteRegistration my_custom_op = {0};
  my_custom_op.registration_external =
      (TfLiteRegistrationExternal*)MyCustomOpRegistrationExternal();
  return &my_custom_op;
}
Note that registration is not automatic and an explicit call to your MyCustomOpRegistration function should be made (see details below). While the standard BuiltinOpResolver (available from the :builtin_ops target) takes care of the registration of builtins, custom ops will have to be collected in separate custom libraries.
Defining the kernel in the LiteRT runtime
All we need to do to use the op in LiteRT is define two functions (Prepare and Eval), and a third to construct a TfLiteRegistrationExternal:
C++
#include <math.h>  // for atan()

namespace atan_op {
  namespace {
    TfLiteStatus AtanPrepare(TfLiteOpaqueContext* context,
                             TfLiteOpaqueNode* node) {
      TF_LITE_OPAQUE_ENSURE_EQ(context, TfLiteOpaqueNodeNumInputs(node), 1);
      TF_LITE_OPAQUE_ENSURE_EQ(context, TfLiteOpaqueNodeNumOutputs(node), 1);

      const TfLiteOpaqueTensor* input =
          TfLiteOpaqueNodeGetInput(context, node, 0);
      TfLiteOpaqueTensor* output = TfLiteOpaqueNodeGetOutput(context, node, 0);

      int num_dims = TfLiteOpaqueTensorNumDimensions(input);

      TfLiteIntArray* output_size = TfLiteIntArrayCreate(num_dims);
      for (int i = 0; i < num_dims; ++i) {
        output_size->data[i] = TfLiteOpaqueTensorDim(input, i);
      }

      return TfLiteOpaqueContextResizeTensor(context, output, output_size);
    }

    TfLiteStatus AtanEval(TfLiteOpaqueContext* context,
                          TfLiteOpaqueNode* node) {
      const TfLiteOpaqueTensor* input =
          TfLiteOpaqueNodeGetInput(context, node, 0);
      TfLiteOpaqueTensor* output = TfLiteOpaqueNodeGetOutput(context, node, 0);

      float* input_data = static_cast<float*>(TfLiteOpaqueTensorData(input));
      float* output_data = static_cast<float*>(TfLiteOpaqueTensorData(output));

      size_t count = 1;
      int num_dims = TfLiteOpaqueTensorNumDimensions(input);
      for (int i = 0; i < num_dims; ++i) {
        count *= TfLiteOpaqueTensorDim(input, i);
      }

      for (size_t i = 0; i < count; ++i) {
        output_data[i] = atan(input_data[i]);
      }
      return kTfLiteOk;
    }
  }  // anonymous namespace

  const TfLiteRegistrationExternal* AtanOpRegistrationExternal() {
    // Singleton instance, intentionally never destroyed.
    static const TfLiteRegistrationExternal* atan_op = []() {
      auto* r = TfLiteRegistrationExternalCreate(
          kTfLiteBuiltinCustom, "ATAN", /*version=*/1);
      TfLiteRegistrationExternalSetPrepare(r, AtanPrepare);
      TfLiteRegistrationExternalSetInvoke(r, AtanEval);
      return r;
    }();
    return atan_op;
  }

  const TfLiteRegistration* AtanOpRegistration() {
    static const TfLiteRegistration atan_op{
        .registration_external = const_cast<TfLiteRegistrationExternal*>(
            AtanOpRegistrationExternal())};
    return &atan_op;
  }
}  // namespace atan_op
C
#include <math.h>  /* for atan() */

static TfLiteStatus AtanPrepare(TfLiteOpaqueContext* context,
                                TfLiteOpaqueNode* node) {
  TF_LITE_OPAQUE_ENSURE_EQ(context, TfLiteOpaqueNodeNumInputs(node), 1);
  TF_LITE_OPAQUE_ENSURE_EQ(context, TfLiteOpaqueNodeNumOutputs(node), 1);

  const TfLiteOpaqueTensor* input = TfLiteOpaqueNodeGetInput(context, node, 0);
  TfLiteOpaqueTensor* output = TfLiteOpaqueNodeGetOutput(context, node, 0);

  int num_dims = TfLiteOpaqueTensorNumDimensions(input);

  TfLiteIntArray* output_size = TfLiteIntArrayCreate(num_dims);
  for (int i = 0; i < num_dims; ++i) {
    output_size->data[i] = TfLiteOpaqueTensorDim(input, i);
  }

  return TfLiteOpaqueContextResizeTensor(context, output, output_size);
}

static TfLiteStatus AtanEval(TfLiteOpaqueContext* context,
                             TfLiteOpaqueNode* node) {
  const TfLiteOpaqueTensor* input = TfLiteOpaqueNodeGetInput(context, node, 0);
  TfLiteOpaqueTensor* output = TfLiteOpaqueNodeGetOutput(context, node, 0);

  float* input_data = (float*)TfLiteOpaqueTensorData(input);
  float* output_data = (float*)TfLiteOpaqueTensorData(output);

  size_t count = 1;
  int num_dims = TfLiteOpaqueTensorNumDimensions(input);
  for (int i = 0; i < num_dims; ++i) {
    count *= TfLiteOpaqueTensorDim(input, i);
  }

  for (size_t i = 0; i < count; ++i) {
    output_data[i] = atan(input_data[i]);
  }
  return kTfLiteOk;
}

static TfLiteRegistrationExternal* AtanOpCreate() {
  TfLiteRegistrationExternal* r = TfLiteRegistrationExternalCreate(
      kTfLiteBuiltinCustom, "ATAN", /*version=*/1);
  TfLiteRegistrationExternalSetPrepare(r, AtanPrepare);
  TfLiteRegistrationExternalSetInvoke(r, AtanEval);
  return r;
}

const TfLiteRegistrationExternal* AtanOpRegistrationExternal() {
  // Singleton instance, intentionally never destroyed.
  static const TfLiteRegistrationExternal* atan_op = NULL;
  if (atan_op == NULL) {
    atan_op = AtanOpCreate();
  }
  return atan_op;
}

const TfLiteRegistration* AtanOpRegistration() {
  static TfLiteRegistration atan_op = {0};
  atan_op.registration_external =
      (TfLiteRegistrationExternal*)AtanOpRegistrationExternal();
  return &atan_op;
}
When initializing the OpResolver, add the custom op into the resolver (see below for an example). This will register the operator with LiteRT so that LiteRT can use the new implementation. Note that the last two arguments in TfLiteRegistration correspond to the AtanPrepare and AtanEval functions you defined for the custom op. If you used AtanInit and AtanFree functions to initialize variables used in the op and to free up space, respectively, then they would be added to the first two arguments of TfLiteRegistration; those arguments are set to nullptr in this example.
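For reference, with the older non-opaque API such a registration is an aggregate whose first four members are init, free, prepare, and invoke. This is a hedged sketch; LegacyAtanPrepare and LegacyAtanEval are hypothetical kernel functions written against TfLiteContext/TfLiteNode rather than the opaque types used above:

// Hypothetical legacy-style kernel functions (declarations only).
TfLiteStatus LegacyAtanPrepare(TfLiteContext* context, TfLiteNode* node);
TfLiteStatus LegacyAtanEval(TfLiteContext* context, TfLiteNode* node);

// init and free are not needed for Atan, so they stay nullptr.
static const TfLiteRegistration kLegacyAtanRegistration = {
    /*init=*/nullptr,
    /*free=*/nullptr,
    /*prepare=*/LegacyAtanPrepare,
    /*invoke=*/LegacyAtanEval};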
Register the operator with the kernel library
Now we need to register the operator with the kernel library. This is done with an OpResolver. Behind the scenes, the interpreter will load a library of kernels which will be assigned to execute each of the operators in the model. While the default library only contains builtin kernels, it is possible to replace/augment it with a custom library of operators.
The OpResolver class, which translates operator codes and names into actual code, is defined like this:
class OpResolver {
public:
virtual TfLiteRegistration* FindOp(tflite::BuiltinOperator op) const = 0;
virtual TfLiteRegistration* FindOp(const char* op) const = 0;
...
};
Note that for backwards compatibility, this class uses the older concrete type TfLiteRegistration rather than the opaque type TfLiteRegistrationExternal, but the TfLiteRegistration struct contains a registration_external field of type TfLiteRegistrationExternal*.
The MutableOpResolver and BuiltinOpResolver classes are derived from OpResolver:
class MutableOpResolver : public OpResolver {
public:
MutableOpResolver(); // Constructs an initially empty op resolver.
  void AddBuiltin(tflite::BuiltinOperator op, const TfLiteRegistration* registration);
  void AddCustom(const char* op, const TfLiteRegistration* registration);
void AddAll(const MutableOpResolver& other);
...
};
class BuiltinOpResolver : public MutableOpResolver {
public:
BuiltinOpResolver(); // Constructs an op resolver with all the builtin ops.
};
Regular usage (without custom ops) requires that you use the BuiltinOpResolver and write:
tflite::ops::builtin::BuiltinOpResolver resolver;
To add the custom op created above, you can instead use a MutableOpResolver, and call AddCustom (before you pass the resolver to the InterpreterBuilder):
tflite::MutableOpResolver resolver;
resolver.AddAll(tflite::ops::builtin::BuiltinOpResolver());
resolver.AddCustom("Atan", AtanOpRegistration());
If the set of builtin ops is deemed to be too large, a new OpResolver could be code-generated based on a given subset of ops, possibly only the ones contained in a given model. This is the equivalent of TensorFlow's selective registration (and a simple version of it is available in the tools directory).
If you want to define your custom operators in Java, you currently need to build your own custom JNI layer and compile your own AAR with that JNI code. Similarly, if you wish to make these operators available in Python, you can place your registrations in the Python wrapper code.
Note that a similar process as above can be followed for supporting a set of operations instead of a single operator. Just add as many AddCustom calls as you need. In addition, MutableOpResolver also allows you to override implementations of builtins by using AddBuiltin.
Test and profile your operator
To profile your op with the LiteRT benchmark tool, you can use the benchmark model tool for LiteRT. For testing purposes, you can make your local build of LiteRT aware of your custom op by adding the appropriate AddCustom call (as shown above) to register.cc.
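For example, assuming a Bazel checkout of the runtime, a benchmark run over a model containing only the custom op might look like the following (the exact target path and flags may differ in your setup):

bazel run -c opt //tensorflow/lite/tools/benchmark:benchmark_model -- \
  --graph=/path/to/model_with_custom_op.tflite \
  --num_runs=50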
Best practices
1. Optimize memory allocations and de-allocations cautiously. Allocating memory in Prepare is more efficient than in Invoke, and allocating memory before a loop is better than in every iteration. Use temporary tensor data rather than mallocing yourself (see item 2). Use pointers/references instead of copying as much as possible.

2. If a data structure will persist during the entire operation, we advise pre-allocating the memory using temporary tensors. You may need to use an OpData struct to reference the tensor indices in other functions. See the example in the kernel for convolution. A sample code snippet is below.
struct MyOpData {
  int temp_tensor_index;
  ...
};

void* Init(TfLiteOpaqueContext* context,
           const char* buffer, size_t length) {
  auto* op_data = new MyOpData{};
  ...
  return op_data;
}

void Free(TfLiteOpaqueContext* context, void* buffer) {
  ...
  delete reinterpret_cast<MyOpData*>(buffer);
}

TfLiteStatus Prepare(TfLiteOpaqueContext* context,
                     TfLiteOpaqueNode* node) {
  ...
  auto* op_data =
      reinterpret_cast<MyOpData*>(TfLiteOpaqueNodeGetUserData(node));

  const int num_temporaries = 1;
  int temporary_tensor_indices[num_temporaries];

  TfLiteOpaqueTensorBuilder* builder = TfLiteOpaqueTensorBuilderCreate();
  TfLiteOpaqueTensorBuilderSetType(builder, kTfLiteFloat32);
  TfLiteOpaqueTensorBuilderSetAllocationType(builder, kTfLiteArenaRw);
  TfLiteOpaqueContextAddTensor(context, builder,
                               &temporary_tensor_indices[0]);
  TfLiteOpaqueTensorBuilderDelete(builder);

  TfLiteOpaqueNodeSetTemporaries(node, temporary_tensor_indices,
                                 num_temporaries);
  op_data->temp_tensor_index = temporary_tensor_indices[0];
  ...
  return kTfLiteOk;
}

TfLiteStatus Invoke(TfLiteOpaqueContext* context,
                    TfLiteOpaqueNode* node) {
  ...
  auto* op_data =
      reinterpret_cast<MyOpData*>(TfLiteOpaqueNodeGetUserData(node));

  TfLiteOpaqueTensor* temp_tensor =
      TfLiteOpaqueContextGetOpaqueTensor(context, op_data->temp_tensor_index);
  TF_LITE_OPAQUE_ENSURE(context,
                        TfLiteOpaqueTensorType(temp_tensor) == kTfLiteFloat32);
  TF_LITE_OPAQUE_ENSURE(context,
                        TfLiteOpaqueTensorGetAllocationType(temp_tensor) ==
                            kTfLiteArenaRw);

  void* temp_data = TfLiteOpaqueTensorData(temp_tensor);
  TF_LITE_OPAQUE_ENSURE(context, temp_data != nullptr);
  ...
  return kTfLiteOk;
}
3. If it doesn't cost too much wasted memory, prefer using a static fixed-size array (or a pre-allocated std::vector in Resize) rather than using a dynamically allocated std::vector on every iteration of execution.

4. Avoid instantiating standard library container templates that don't already exist, because they affect binary size. For example, if you need a std::map in your operation that doesn't exist in other kernels, using a std::vector with direct indexing mapping could work while keeping the binary size small. See what other kernels use to gain insight (or ask).

5. Check the pointer to the memory returned by malloc. If this pointer is nullptr, no operations should be performed using that pointer. If you malloc in a function and have an error exit, deallocate memory before you exit.

6. Use TF_LITE_OPAQUE_ENSURE(context, condition) to check for a specific condition. Your code must not leave memory hanging when TF_LITE_OPAQUE_ENSURE is used, i.e., these macros should be used before any resources are allocated that will leak. A minimal sketch of this ordering is shown after this list.
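The following sketch illustrates the ordering recommended in item 6: run every check before allocating anything, so that a failed TF_LITE_OPAQUE_ENSURE cannot leak resources.

TfLiteStatus Prepare(TfLiteOpaqueContext* context, TfLiteOpaqueNode* node) {
  // Validate everything first; nothing has been allocated yet, so a failed
  // check cannot leak.
  TF_LITE_OPAQUE_ENSURE(context, TfLiteOpaqueNodeNumInputs(node) == 1);
  TF_LITE_OPAQUE_ENSURE(context, TfLiteOpaqueNodeNumOutputs(node) == 1);
  const TfLiteOpaqueTensor* input = TfLiteOpaqueNodeGetInput(context, node, 0);
  TF_LITE_OPAQUE_ENSURE(context,
                        TfLiteOpaqueTensorType(input) == kTfLiteFloat32);
  // Only allocate (e.g. temporary tensors) after all checks have passed.
  return kTfLiteOk;
}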