This is a TFX component that lets users apply their own code to a schema produced by the SchemaGen component and curate it based on domain knowledge. It fits into an ML pipeline built with TFX and manipulates the schema according to a module file provided by the user.
To run locally: taxi_example_local.py
To run in Colab: taxi_example_colab.ipynb
The custom component takes as input the user module file and the schema generated by the SchemaGen component on the specified data.
When run, it outputs the schema as modified by the code in the module file.
The Schema Curation component provides a way to curate the schema based on user knowledge. As a user, you only have to define a single function called schema_fn. In schema_fn, you write the series of operations that manipulate the input schema to produce the required one.
An example is:
import tensorflow_data_validation as tfdv


def schema_fn(schema):
    """Modifies the inferred schema.

    Args:
      schema: Schema generated by the SchemaGen component of TFX.

    Returns:
      The modified schema.
    """
    # Make "tips" optional: it only has to be present in 90% of the examples.
    feature = tfdv.get_feature(schema, 'tips')
    feature.presence.min_fraction = 0.9
    return schema
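To make the flow concrete, the following is a minimal sketch of how a module file can be loaded from a path and its schema_fn applied to a schema. It uses only the Python standard library and is not the component's actual executor code. The helper name load_schema_fn, the temporary module file, and the plain dict standing in for the Schema proto are all assumptions made for illustration.

```python
import importlib.util
import tempfile
import textwrap
from pathlib import Path


def load_schema_fn(module_file):
    """Imports the user module from a file path and returns its schema_fn.

    Hypothetical helper for illustration; not part of the component's API.
    """
    spec = importlib.util.spec_from_file_location("user_module", module_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.schema_fn


# Hypothetical module file. A plain dict stands in for the real Schema proto,
# so no TFX or TFDV installation is needed to run this sketch.
MODULE_SOURCE = textwrap.dedent('''
    def schema_fn(schema):
        """Relaxes the presence requirement of the "tips" feature."""
        schema["tips"]["min_fraction"] = 0.9
        return schema
''')

with tempfile.TemporaryDirectory() as tmp_dir:
    module_file = Path(tmp_dir) / "module_file.py"
    module_file.write_text(MODULE_SOURCE)

    # Load the user function and apply it to the stand-in "inferred" schema.
    schema_fn = load_schema_fn(str(module_file))
    inferred_schema = {"tips": {"min_fraction": 1.0}}
    curated_schema = schema_fn(inferred_schema)

print(curated_schema)
```

In this sketch the schema is modified in place and also returned, mirroring the shape of the schema_fn example above; in the real component the argument would be the schema produced by SchemaGen rather than a dict.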
schemacomponent
├── component
│   ├── component.py
│   ├── component_test.py
│   ├── executor.py
│   ├── executor_test.py
│   └── __init__.py
├── CONTRIBUTING.md
├── example
│   ├── __init__.py
│   ├── module_file.py
│   ├── taxi_example_colab.ipynb
│   └── taxi_example_local.py
├── __init__.py
├── PROPOSAL.md
└── README.md
The project follows the structure specified by the TFX documentation for a fully custom TFX component.
The SchemaCurationSpec class defines the input, output, and execution parameters required by the component.
The Executor class, a subclass of base_executor.BaseExecutor with an overridden Do function, defines what the component does when it runs.
Finally, the SchemaCuration class integrates the fully custom component into the ML pipeline.
The component includes separate unit tests for the component and the executor.
The Schema Curation Custom Component was made as part of TFX-Addons through the Outreachy program. You may view the linked Pull Request in TFX-Addons here and the issue here for discussions related to the project.
- Robert Crowe
- Thea Lamkin
- Josh Gordon
- Pratishtha Abrol (Team Leader)
- Fatimah Adwan
- Kshitijaa Jaglan
- Nirzari Gupta