Operator Factory
See OperatorFactory
For an example, see the getting started guide.
Re-Interpreting Data
Ultimately, each object is created by calling its constructor. Since each factory and component is fundamentally a pydantic.BaseModel object, its fields are defined at the class level, but how those fields get populated is customizable.
Let’s say you have an old class like this::

    class Url(OperatorComponent):
        url: str
and then, at some point, you decide you want to update it to store the pieces of the url, like this::

    class Url(OperatorComponent):
        scheme: str
        host: str
        path: str
        params: Dict[str, Any] = {}
To avoid breaking existing code or data, you can write a custom constructor so that old-style data definitions continue to work::

    import urllib.parse

    class Url(OperatorComponent):
        ...

        def __init__(self, url):
            parsed = urllib.parse.urlparse(url)
            # urlparse() results expose the host as "hostname"; they have no "host" attribute
            super().__init__(scheme=parsed.scheme, host=parsed.hostname, ...)
This allows separating the input data definition for a type from its internal representation.
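As a fuller illustration of the custom-constructor pattern, here is a minimal runnable sketch. It makes a few assumptions: Url subclasses pydantic.BaseModel directly as a stand-in for OperatorComponent (which is described as a proxy for it), the query string is mapped onto params, and the example URL is made up.

```python
from typing import Any, Dict
from urllib.parse import parse_qsl, urlparse

from pydantic import BaseModel


class Url(BaseModel):  # assumption: stand-in for OperatorComponent
    scheme: str
    host: str
    path: str
    params: Dict[str, Any] = {}

    def __init__(self, url: str):
        # Accept the old single-string definition and split it into the new fields.
        parsed = urlparse(url)
        super().__init__(
            scheme=parsed.scheme,
            host=parsed.hostname,
            path=parsed.path,
            # Assumption: query parameters are what "params" is meant to hold.
            params=dict(parse_qsl(parsed.query)),
        )


old_style = Url("https://example.com/reports/daily?format=csv")
print(old_style.scheme, old_style.host, old_style.path, old_style.params)
```

Note that a constructor with this signature accepts only the old-style string. Accepting the new field names as well would need a more permissive signature.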
Note that, in many cases, pydantic may already have features that allow a variety of tweaks without needing a custom constructor. Single- and multi-field validation and transformation, amongst other features, are well-supported.
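For instance, a single-field transformation can often be expressed as a validator rather than a custom constructor. The sketch below assumes pydantic v2 (where the decorator is field_validator; v1 uses a different decorator), and the Endpoint class and its fields are hypothetical.

```python
from pydantic import BaseModel, field_validator


class Endpoint(BaseModel):  # hypothetical model, for illustration only
    scheme: str
    host: str

    @field_validator("scheme", mode="before")
    @classmethod
    def normalize_scheme(cls, value):
        # Runs before type validation; lower-case strings, pass anything else through.
        return value.lower() if isinstance(value, str) else value


endpoint = Endpoint(scheme="HTTPS", host="example.com")
print(endpoint.scheme)
```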
Components
OperatorFactory defines a standard interface for how the operator should be constructed from the metadata, and provides the task_id attribute by default to facilitate that. OperatorComponent, on the other hand, is currently just a proxy for pydantic.BaseModel to encourage a consistent pattern for sharing data and validation across a DAG. This may change in the future.
Task IDs and Groups
OperatorFactory provides some pre-baked tooling to help define Task IDs.

- OperatorFactory inherently has an optional task_id: Optional[str] attribute which can be set in data. To disallow using this, you can define a validator function that ensures it is empty.
- The default_task_id() property can be overridden to provide a default value if no task ID is specified. Implementing it is recommended where practical.
- The get_task_id() method provides either the custom task ID, if provided, or the default if possible.
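The interplay between task_id, default_task_id and get_task_id() can be sketched roughly as follows. This is not the library's implementation: FactorySketch is a hypothetical stand-in for OperatorFactory built on pydantic.BaseModel, CopyTable and its table field are invented for the example, and raising ValueError when no ID can be resolved is an assumption.

```python
from typing import Optional

from pydantic import BaseModel


class FactorySketch(BaseModel):
    """Hypothetical stand-in for OperatorFactory."""

    task_id: Optional[str] = None

    @property
    def default_task_id(self) -> Optional[str]:
        # Subclasses override this to derive an ID from their own fields.
        return None

    def get_task_id(self) -> str:
        # Prefer the ID set in data, then fall back to the default.
        resolved = self.task_id or self.default_task_id
        if resolved is None:
            # Assumption: the real behaviour in this case is not documented here.
            raise ValueError("no task_id provided and no default available")
        return resolved


class CopyTable(FactorySketch):
    table: str

    @property
    def default_task_id(self) -> str:
        return f"copy_{self.table}"


implicit = CopyTable(table="orders").get_task_id()
explicit = CopyTable(table="orders", task_id="nightly_copy").get_task_id()
print(implicit, explicit)
```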
If implementing an operator pattern wrapped as a group (using _make_operators()), then the task ID is used as the group ID, and the inner operator task IDs need only be unique within that context.