Schema Curation Custom Component


This is a TFX component that lets users apply their own code to the schema produced by the SchemaGen component and curate it based on domain knowledge. It fits into an ML pipeline built with TFX and manipulates the schema according to a module file provided by the user.

Usage

Examples demonstrating how to use the Schema Curation component:

To run locally: taxi_example_local.py
To run in Colab: taxi_example_colab.ipynb

Documentation

Inputs:

The custom component takes as input the user module file and the schema generated by the SchemaGen component from the specified data.

Output:

When run, the component outputs the schema as modified by the code in the module file.

Module file

The Schema Curation schema_fn:

The Schema Curation component lets you curate the schema based on your own knowledge of the data. As a user, you only have to define a single function called schema_fn. In schema_fn you define a series of functions that manipulate the input schema to produce the required one.

An example is:

import tensorflow_data_validation as tfdv


def schema_fn(schema):
  """Modifies the inferred schema.

  Args:
    schema: Schema generated by the SchemaGen component of TFX.

  Returns:
    The modified schema.
  """
  # Make "tips" optional: it only needs to be present in at least 90% of examples.
  feature = tfdv.get_feature(schema, 'tips')
  feature.presence.min_fraction = 0.9

  return schema
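
Since schema_fn is described above as a series of functions applied to the input schema, one way to organise it is as a chain of small helpers. The following is only a sketch of that pattern, not code from this project: the helper names (find_feature, relax_presence, require_min_count), the "fare" feature and the stand-in schema built from SimpleNamespace are all assumptions made for illustration. The helpers only touch schema.feature, feature.name and feature.presence, so they run here without TFX installed.

```python
from types import SimpleNamespace


def find_feature(schema, name):
    """Return the feature called `name`, or raise KeyError if it is absent."""
    for feature in schema.feature:
        if feature.name == name:
            return feature
    raise KeyError(name)


def relax_presence(schema, name, min_fraction):
    """Hypothetical helper: let `name` be missing from some examples."""
    find_feature(schema, name).presence.min_fraction = min_fraction
    return schema


def require_min_count(schema, name, min_count):
    """Hypothetical helper: require at least `min_count` examples with `name`."""
    find_feature(schema, name).presence.min_count = min_count
    return schema


def schema_fn(schema):
    """Apply each curation step in turn and return the curated schema."""
    schema = relax_presence(schema, "tips", 0.9)
    schema = require_min_count(schema, "fare", 1)  # "fare" is an assumed feature
    return schema


def make_stand_in_schema():
    """Build a duck-typed stand-in for a schema, for local checks only."""
    def feature(name):
        presence = SimpleNamespace(min_fraction=1.0, min_count=0)
        return SimpleNamespace(name=name, presence=presence)
    return SimpleNamespace(feature=[feature("tips"), feature("fare")])


if __name__ == "__main__":
    curated = schema_fn(make_stand_in_schema())
    print(find_feature(curated, "tips").presence.min_fraction)
```

Keeping each step in its own function makes it easy to check one rule at a time before handing the module file to the component.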

Project Structure

Directory Structure

schemacomponent
├── component
│   ├── component.py
│   ├── component_test.py
│   ├── executor.py
│   ├── executor_test.py
│   └── __init__.py
├── CONTRIBUTING.md
├── example
│   ├── __init__.py
│   ├── module_file.py
│   ├── taxi_example_colab.ipynb
│   └── taxi_example_local.py
├── __init__.py
├── PROPOSAL.md
└── README.md

The project follows the structure specified by the TFX documentation for a TFX fully custom component.

The SchemaCurationSpec class defines the input, output and execution parameters required by the component.

The Executor class defines what the component does. It is a subclass of base_executor.BaseExecutor that overrides the Do function.
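
To make the role of the module file concrete, here is a rough sketch of the central step such an executor has to perform: load schema_fn from a user-supplied file and apply it to a schema. This is not the project's Executor. It uses only the standard library, the names load_schema_fn, curate and demo are hypothetical, and the schema is a SimpleNamespace stand-in. The real executor also has to read the schema artifact and write the curated one, which is left out here.

```python
import importlib.util
import os
import tempfile
from types import SimpleNamespace

# Contents of a hypothetical user module file defining schema_fn.
MODULE_SOURCE = '''
def schema_fn(schema):
    for feature in schema.feature:
        if feature.name == "tips":
            feature.presence.min_fraction = 0.9
    return schema
'''


def load_schema_fn(module_path):
    """Import the file at `module_path` and return its schema_fn."""
    spec = importlib.util.spec_from_file_location("user_module", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.schema_fn


def curate(schema, module_path):
    """Apply the user's schema_fn to `schema` and return the result."""
    return load_schema_fn(module_path)(schema)


def demo():
    """Write the module file to a temp dir, curate a stand-in schema."""
    presence = SimpleNamespace(min_fraction=1.0)
    schema = SimpleNamespace(
        feature=[SimpleNamespace(name="tips", presence=presence)])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "module_file.py")
        with open(path, "w") as f:
            f.write(MODULE_SOURCE)
        curated = curate(schema, path)
    return curated.feature[0].presence.min_fraction


if __name__ == "__main__":
    print(demo())
```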

Finally the SchemaCuration class integrates the fully custom component into the ML pipeline.

Unit Tests

The component includes separate unit tests for the component and the executor.

Credits

Schema Curation Custom Component was made as a part of TFX-Addons through the Outreachy program. You may view the linked Pull Request in TFX-Addons here and the issue here for relevant discussions related to the project.

The Team:

Mentors:

  • Robert Crowe
  • Thea Lamkin
  • Josh Gordon

Interns: