flytekitplugins.spark.schema
Implements how a Classic Spark DataFrame should be read via the open method of FlyteSchema
class ClassicSparkDataFrameSchemaReader(
from_path: str,
cols: typing.Optional[typing.Dict[str, type]],
fmt: SchemaFormat,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| from_path | str | |
| cols | typing.Optional[typing.Dict[str, type]] | |
| fmt | SchemaFormat | |
| Property | Type | Description |
|----------|------|-------------|
| column_names | None | |
| from_path | None | |
def all(
kwargs,
) -> pyspark.sql.classic.dataframe.DataFrame
| Parameter | Type | Description |
|-----------|------|-------------|
| kwargs | **kwargs | |
def iter(
kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
| Parameter | Type | Description |
|-----------|------|-------------|
| kwargs | **kwargs | |
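In normal use this reader is not constructed directly; FlyteSchema's open machinery instantiates it with a path, an optional column mapping, and a format. A minimal stand-in (all names here are hypothetical, not the plugin's actual implementation) sketching that constructor contract and the `column_names` property:

```python
import typing


class ToySchemaReader:
    """Hypothetical stand-in for SparkDataFrameSchemaReader: it only
    models the (from_path, cols, fmt) constructor contract."""

    def __init__(
        self,
        from_path: str,
        cols: typing.Optional[typing.Dict[str, type]] = None,
        fmt: str = "parquet",
    ):
        self._from_path = from_path
        self._cols = cols
        self._fmt = fmt

    @property
    def from_path(self) -> str:
        return self._from_path

    @property
    def column_names(self) -> typing.Optional[typing.List[str]]:
        # Column names are derived from the declared cols mapping, if any.
        return list(self._cols.keys()) if self._cols else None


reader = ToySchemaReader("/tmp/users", cols={"name": str, "age": int})
print(reader.column_names)  # ['name', 'age']
```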
Implements how a Classic Spark DataFrame should be written via the open method of FlyteSchema
class ClassicSparkDataFrameSchemaWriter(
to_path: str,
cols: typing.Optional[typing.Dict[str, type]],
fmt: SchemaFormat,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| to_path | str | |
| cols | typing.Optional[typing.Dict[str, type]] | |
| fmt | SchemaFormat | |
| Property | Type | Description |
|----------|------|-------------|
| column_names | None | |
| to_path | None | |
def write(
dfs: pyspark.sql.classic.dataframe.DataFrame,
kwargs,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| dfs | pyspark.sql.classic.dataframe.DataFrame | |
| kwargs | **kwargs | |
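The writer mirrors the reader: it is constructed with a destination path, optional columns, and a format, and its `write` method accepts one or more DataFrames. A hypothetical stand-in that records each call instead of persisting real DataFrames:

```python
import typing


class ToySchemaWriter:
    """Hypothetical stand-in for SparkDataFrameSchemaWriter: instead of
    persisting DataFrames to to_path in fmt, it records each call."""

    def __init__(
        self,
        to_path: str,
        cols: typing.Optional[typing.Dict[str, type]] = None,
        fmt: str = "parquet",
    ):
        self.to_path = to_path
        self.cols = cols
        self.fmt = fmt
        self.written: typing.List[object] = []

    def write(self, *dfs, **kwargs):
        # The real writer persists each DataFrame to to_path in the
        # configured format; here we just record the arguments.
        self.written.extend(dfs)


writer = ToySchemaWriter("/tmp/out", fmt="parquet")
writer.write("df-1", "df-2")  # strings stand in for real DataFrames
print(len(writer.written))  # 2
```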
Transforms Classic Spark DataFrames to and from a Schema (typed/untyped)
class ClassicSparkDataFrameTransformer()
| Property | Type | Description |
|----------|------|-------------|
| is_async | None | |
| name | None | |
| python_type | None | This returns the python type |
| type_assertions_enabled | None | Indicates if the transformer wants type assertions to be enabled at the core type engine layer |
def assert_type(
t: Type[T],
v: T,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| t | Type[T] | |
| v | T | |
def from_binary_idl(
binary_idl_object: Binary,
expected_python_type: Type[T],
) -> Optional[T]
This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.
For untyped dicts, dataclasses, and Pydantic BaseModels, the life cycle (untyped dict as example) is:
python val -> msgpack bytes -> binary literal scalar (to_literal)
binary literal scalar -> msgpack bytes -> python val (from_binary_idl)
For attribute access, the life cycle is:
python val -> msgpack bytes -> binary literal scalar (to_literal)
binary literal scalar -> resolved golang value -> binary literal scalar (propeller attribute access)
binary literal scalar -> msgpack bytes -> python val (from_binary_idl)
| Parameter | Type | Description |
|-----------|------|-------------|
| binary_idl_object | Binary | |
| expected_python_type | Type[T] | |
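The life cycle above can be sketched end to end. This sketch uses json in place of msgpack (so the snippet has no third-party dependency) and plain dicts in place of the Binary IDL objects; the helper names are made up, but the shape of the round trip is the same:

```python
import json


def toy_to_literal(python_val) -> dict:
    # python val -> bytes -> "binary literal scalar" (a plain dict here)
    payload = json.dumps(python_val).encode()  # msgpack bytes in the real flow
    return {"scalar": {"binary": {"value": payload, "tag": "json"}}}


def toy_from_binary_idl(binary: dict, expected_python_type: type):
    # "binary literal scalar" -> bytes -> python val
    val = json.loads(binary["value"])
    if not isinstance(val, expected_python_type):
        raise TypeError(f"expected {expected_python_type}, got {type(val)}")
    return val


lit = toy_to_literal({"a": 1})
roundtripped = toy_from_binary_idl(lit["scalar"]["binary"], dict)
print(roundtripped)  # {'a': 1}
```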
def from_generic_idl(
generic: Struct,
expected_python_type: Type[T],
) -> Optional[T]
TODO: Support all Flyte Types.
This handles dataclass attribute access for inputs created from the Flyte Console.
Note:
- This can be removed in the future once the Flyte Console supports generating a Binary IDL Scalar as input.
| Parameter | Type | Description |
|-----------|------|-------------|
| generic | Struct | |
| expected_python_type | Type[T] | |
def get_literal_type(
t: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType
Converts the python type to a Flyte LiteralType
| Parameter | Type | Description |
|-----------|------|-------------|
| t | typing.Type[pyspark.sql.classic.dataframe.DataFrame] | |
def guess_python_type(
literal_type: LiteralType,
) -> Type[T]
Converts the Flyte LiteralType to a python object type.
| Parameter | Type | Description |
|-----------|------|-------------|
| literal_type | LiteralType | |
def isinstance_generic(
obj,
generic_alias,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| obj | | |
| generic_alias | | |
def schema_match(
schema: dict,
) -> bool
Check if a JSON schema fragment matches this transformer’s python_type.
For BaseModel subclasses, automatically compares the schema’s title, type, and
required fields against the type’s own JSON schema. For other types, returns
False by default; override if needed.
| Parameter | Type | Description |
|-----------|------|-------------|
| schema | dict | |
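A sketch of the comparison the docstring describes: match a JSON schema fragment against the type's own schema on title, type, and required fields. The helper and both schemas are made up for illustration:

```python
def toy_schema_match(schema: dict, own_schema: dict) -> bool:
    # Compare the fragment's title, type, and required fields against
    # the transformer type's own JSON schema.
    return all(
        schema.get(key) == own_schema.get(key)
        for key in ("title", "type", "required")
    )


own = {"title": "User", "type": "object", "required": ["name"]}
print(toy_schema_match({"title": "User", "type": "object", "required": ["name"]}, own))   # True
print(toy_schema_match({"title": "Order", "type": "object", "required": ["id"]}, own))    # False
```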
def to_html(
ctx: FlyteContext,
python_val: T,
expected_python_type: Type[T],
) -> str
Converts any python val (dataframe, int, float) to an HTML string, which will be wrapped in an HTML div
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | FlyteContext | |
| python_val | T | |
| expected_python_type | Type[T] | |
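A minimal sketch of the wrapping behavior described above (a stand-in, not the transformer's actual renderer):

```python
import html


def toy_to_html(python_val) -> str:
    # Render the value and wrap it in an HTML div, escaping any markup.
    return f"<div>{html.escape(str(python_val))}</div>"


print(toy_to_html(3.14))         # <div>3.14</div>
print(toy_to_html("<b>hi</b>"))  # <div>&lt;b&gt;hi&lt;/b&gt;</div>
```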
def to_literal(
ctx: flytekit.core.context_manager.FlyteContext,
python_val: pyspark.sql.classic.dataframe.DataFrame,
python_type: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal
Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type.
Implementers should refrain from using type(python_val) and should instead rely on the passed-in python_type. If these
do not match (or are not allowed), the transformer implementer should raise an AssertionError that clearly states
the mismatch.
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | flytekit.core.context_manager.FlyteContext | A FlyteContext, useful in accessing the filesystem and other attributes |
| python_val | pyspark.sql.classic.dataframe.DataFrame | The actual value to be transformed |
| python_type | typing.Type[pyspark.sql.classic.dataframe.DataFrame] | The assumed type of the value (this matches the declared type on the function) |
| expected | flytekit.models.types.LiteralType | Expected Literal Type |
def to_python_value(
ctx: flytekit.core.context_manager.FlyteContext,
lv: flytekit.models.literals.Literal,
expected_python_type: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
) -> ~T
Converts the given Literal to a Python type. If the conversion cannot be done, an AssertionError should be raised.
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | flytekit.core.context_manager.FlyteContext | FlyteContext |
| lv | flytekit.models.literals.Literal | The received literal value |
| expected_python_type | typing.Type[pyspark.sql.classic.dataframe.DataFrame] | Expected native python type that should be returned |
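The to_literal / to_python_value contract can be sketched with toy types: trust the declared python_type rather than type(python_val), and raise AssertionError on a mismatch. All names here are hypothetical, and a plain dict stands in for a Flyte Literal:

```python
def toy_to_literal(python_val, python_type: type) -> dict:
    # Per the contract: validate against the declared python_type,
    # not type(python_val).
    if not isinstance(python_val, python_type):
        raise AssertionError(
            f"expected {python_type.__name__}, got {type(python_val).__name__}"
        )
    return {"scalar": python_val}  # stand-in for a Flyte Literal


def toy_to_python_value(lv: dict, expected_python_type: type):
    val = lv["scalar"]
    if not isinstance(val, expected_python_type):
        raise AssertionError(
            f"cannot convert literal to {expected_python_type.__name__}"
        )
    return val


lit = toy_to_literal(5, int)
print(toy_to_python_value(lit, int))  # 5
```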
Implements how a Spark DataFrame should be read via the open method of FlyteSchema
class SparkDataFrameSchemaReader(
from_path: str,
cols: typing.Optional[typing.Dict[str, type]],
fmt: SchemaFormat,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| from_path | str | |
| cols | typing.Optional[typing.Dict[str, type]] | |
| fmt | SchemaFormat | |
| Property | Type | Description |
|----------|------|-------------|
| column_names | None | |
| from_path | None | |
def all(
kwargs,
) -> pyspark.sql.dataframe.DataFrame
| Parameter | Type | Description |
|-----------|------|-------------|
| kwargs | **kwargs | |
def iter(
kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
| Parameter | Type | Description |
|-----------|------|-------------|
| kwargs | **kwargs | |
Implements how a Spark DataFrame should be written via the open method of FlyteSchema
class SparkDataFrameSchemaWriter(
to_path: str,
cols: typing.Optional[typing.Dict[str, type]],
fmt: SchemaFormat,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| to_path | str | |
| cols | typing.Optional[typing.Dict[str, type]] | |
| fmt | SchemaFormat | |
| Property | Type | Description |
|----------|------|-------------|
| column_names | None | |
| to_path | None | |
def write(
dfs: pyspark.sql.dataframe.DataFrame,
kwargs,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| dfs | pyspark.sql.dataframe.DataFrame | |
| kwargs | **kwargs | |
Transforms Spark DataFrames to and from a Schema (typed/untyped)
class SparkDataFrameTransformer()
| Property | Type | Description |
|----------|------|-------------|
| is_async | None | |
| name | None | |
| python_type | None | This returns the python type |
| type_assertions_enabled | None | Indicates if the transformer wants type assertions to be enabled at the core type engine layer |
def assert_type(
t: Type[T],
v: T,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| t | Type[T] | |
| v | T | |
def from_binary_idl(
binary_idl_object: Binary,
expected_python_type: Type[T],
) -> Optional[T]
This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.
For untyped dicts, dataclasses, and Pydantic BaseModels, the life cycle (untyped dict as example) is:
python val -> msgpack bytes -> binary literal scalar (to_literal)
binary literal scalar -> msgpack bytes -> python val (from_binary_idl)
For attribute access, the life cycle is:
python val -> msgpack bytes -> binary literal scalar (to_literal)
binary literal scalar -> resolved golang value -> binary literal scalar (propeller attribute access)
binary literal scalar -> msgpack bytes -> python val (from_binary_idl)
| Parameter | Type | Description |
|-----------|------|-------------|
| binary_idl_object | Binary | |
| expected_python_type | Type[T] | |
def from_generic_idl(
generic: Struct,
expected_python_type: Type[T],
) -> Optional[T]
TODO: Support all Flyte Types.
This handles dataclass attribute access for inputs created from the Flyte Console.
Note:
- This can be removed in the future once the Flyte Console supports generating a Binary IDL Scalar as input.
| Parameter | Type | Description |
|-----------|------|-------------|
| generic | Struct | |
| expected_python_type | Type[T] | |
def get_literal_type(
t: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType
Converts the python type to a Flyte LiteralType
| Parameter | Type | Description |
|-----------|------|-------------|
| t | typing.Type[pyspark.sql.dataframe.DataFrame] | |
def guess_python_type(
literal_type: LiteralType,
) -> Type[T]
Converts the Flyte LiteralType to a python object type.
| Parameter | Type | Description |
|-----------|------|-------------|
| literal_type | LiteralType | |
def isinstance_generic(
obj,
generic_alias,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| obj | | |
| generic_alias | | |
def schema_match(
schema: dict,
) -> bool
Check if a JSON schema fragment matches this transformer’s python_type.
For BaseModel subclasses, automatically compares the schema’s title, type, and
required fields against the type’s own JSON schema. For other types, returns
False by default; override if needed.
| Parameter | Type | Description |
|-----------|------|-------------|
| schema | dict | |
def to_html(
ctx: FlyteContext,
python_val: T,
expected_python_type: Type[T],
) -> str
Converts any python val (dataframe, int, float) to an HTML string, which will be wrapped in an HTML div
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | FlyteContext | |
| python_val | T | |
| expected_python_type | Type[T] | |
def to_literal(
ctx: flytekit.core.context_manager.FlyteContext,
python_val: pyspark.sql.dataframe.DataFrame,
python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal
Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type.
Implementers should refrain from using type(python_val) and should instead rely on the passed-in python_type. If these
do not match (or are not allowed), the transformer implementer should raise an AssertionError that clearly states
the mismatch.
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | flytekit.core.context_manager.FlyteContext | A FlyteContext, useful in accessing the filesystem and other attributes |
| python_val | pyspark.sql.dataframe.DataFrame | The actual value to be transformed |
| python_type | typing.Type[pyspark.sql.dataframe.DataFrame] | The assumed type of the value (this matches the declared type on the function) |
| expected | flytekit.models.types.LiteralType | Expected Literal Type |
def to_python_value(
ctx: flytekit.core.context_manager.FlyteContext,
lv: flytekit.models.literals.Literal,
expected_python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> ~T
Converts the given Literal to a Python type. If the conversion cannot be done, an AssertionError should be raised.
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | flytekit.core.context_manager.FlyteContext | FlyteContext |
| lv | flytekit.models.literals.Literal | The received literal value |
| expected_python_type | typing.Type[pyspark.sql.dataframe.DataFrame] | Expected native python type that should be returned |