Operator Factory

See OperatorFactory

For an example, see the getting started guide.

Re-Interpreting Data

Ultimately, every object is created by calling its constructor. Because each factory and component is fundamentally a pydantic.BaseModel, its fields are defined at the class level, but how those fields get populated is customizable.

Let’s say you have an old class like this::

class Url(OperatorComponent):
    url: str

and then, at some point, you decide to update it to store the individual pieces of the URL, like this::

from typing import Any, Dict

class Url(OperatorComponent):
    scheme: str
    host: str
    path: str
    params: Dict[str, Any] = {}

To avoid breaking existing code or data, you can write a custom constructor to permit old-style data definitions to continue to work::

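import urllib.parse

class Url(OperatorComponent):
    ...

    def __init__(self, url: str):
        # Accept the old single-string form and split it into the new fields.
        parsed = urllib.parse.urlparse(url)
        # ParseResult exposes the host as .hostname (there is no .host attribute).
        # Any remaining fields (such as params) are left at their defaults here.
        super().__init__(scheme=parsed.scheme, host=parsed.hostname, path=parsed.path)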

This allows separating the input data definition for a type from its internal representation.

Note that, in many cases, pydantic may already have features that allow a variety of tweaks without needing a custom constructor. Single- and multi-field validation and transformation, amongst other features, are well-supported.
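As one illustration of the validator route, the old single-string form could be accepted through a pre-validation hook instead of a custom constructor. The following is only a sketch under several assumptions: it subclasses pydantic.BaseModel directly as a stand-in for OperatorComponent, it uses root_validator(pre=True) (the pydantic v1 spelling, which is still importable but deprecated in v2, where model_validator(mode="before") is the equivalent), and filling params from the query string is a guess at the intended meaning of that field.

```python
import urllib.parse
from typing import Any, Dict

from pydantic import BaseModel, root_validator


class Url(BaseModel):  # stand-in for OperatorComponent (assumption)
    scheme: str
    host: str
    path: str
    params: Dict[str, Any] = {}

    @root_validator(pre=True)
    def accept_old_style(cls, values):
        # Old-style data carries a single "url" key; split it into the new fields.
        if isinstance(values, dict) and "url" in values:
            parsed = urllib.parse.urlparse(values["url"])
            return {
                "scheme": parsed.scheme,
                "host": parsed.hostname,
                "path": parsed.path,
                # Assumption: "params" means the query-string parameters.
                "params": dict(urllib.parse.parse_qsl(parsed.query)),
            }
        # New-style data passes through untouched.
        return values


# Both the old and the new input shapes produce the same internal representation.
old = Url(url="https://example.com/a/b?x=1")
new = Url(scheme="https", host="example.com", path="/a/b", params={"x": "1"})
```

Unlike a positional-only custom constructor, this approach keeps keyword construction with the new fields working alongside the old-style input.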

Components

OperatorFactory defines a standard interface for how the operator should be constructed from the metadata, and provides the task_id attribute by default to facilitate that. OperatorComponent, on the other hand, is currently just a proxy for pydantic.BaseModel to encourage a consistent pattern for sharing data and validation across a DAG. This may change in the future.

Task IDs and Groups

OperatorFactory provides some pre-baked tooling to help define Task IDs.

  • OperatorFactory has a built-in task_id: Optional[str] attribute that can be set in data. To disallow it, define a validator function that ensures the field is empty.

  • The default_task_id() property can be overridden to provide a default value when no task ID is specified. Implementing it is recommended where practical.

  • The get_task_id() method returns the custom task ID if one was provided, and otherwise falls back to the default if one is available.
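The lookup order described in the list above can be sketched in plain Python. This is not the library's implementation: FactorySketch and LoadTable are hypothetical stand-ins, default_task_id is modelled as a property, and raising ValueError when neither ID is available is an assumption about what "if possible" means.

```python
from typing import Optional


class FactorySketch:
    """Plain-Python stand-in that mimics the task ID behaviour described above."""

    def __init__(self, task_id: Optional[str] = None):
        self.task_id = task_id  # optional custom task ID supplied in data

    @property
    def default_task_id(self) -> Optional[str]:
        # Assumed base behaviour: no default unless a subclass provides one.
        return None

    def get_task_id(self) -> str:
        # Prefer the custom ID, then the default; fail loudly if neither exists.
        if self.task_id is not None:
            return self.task_id
        default = self.default_task_id
        if default is None:
            raise ValueError("no task_id given and no default_task_id defined")
        return default


class LoadTable(FactorySketch):  # hypothetical factory
    def __init__(self, table: str, task_id: Optional[str] = None):
        super().__init__(task_id)
        self.table = table

    @property
    def default_task_id(self) -> str:
        # Derive a stable default from the factory's own data.
        return f"load_{self.table}"
```

With this sketch, LoadTable("users").get_task_id() falls back to the derived default, while passing task_id="custom" overrides it.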

If you implement an operator pattern wrapped as a group (using _make_operators()), the task ID is used as the group ID, and the inner operators' task IDs need only be unique within that group.