1.16.15

flytekitplugins.spark.schema


Classes

Class Description
ClassicSparkDataFrameSchemaReader Implements how a classic Spark DataFrame should be read using the open method of FlyteSchema.
ClassicSparkDataFrameSchemaWriter Implements how a classic Spark DataFrame should be written using the open method of FlyteSchema.
ClassicSparkDataFrameTransformer Transforms classic Spark DataFrames to and from a Schema (typed/untyped).
SparkDataFrameSchemaReader Implements how a Spark DataFrame should be read using the open method of FlyteSchema.
SparkDataFrameSchemaWriter Implements how a Spark DataFrame should be written using the open method of FlyteSchema.
SparkDataFrameTransformer Transforms Spark DataFrames to and from a Schema (typed/untyped).
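The reader and writer classes above all follow the same pattern: construct with a path, an optional column map, and a format, then call all()/iter() to read or write() to write. A minimal, pure-Python sketch of that pattern, using CSV in place of the Spark-backed implementations (ToySchemaReader/ToySchemaWriter are illustrative names, not actual flytekit classes):

```python
import csv
import typing

# Illustrative stand-ins for the SchemaReader/SchemaWriter pattern above.
# The real SparkDataFrameSchemaReader reads data via Spark; CSV is used here
# purely to show the shape of the interface.

class ToySchemaReader:
    def __init__(self, from_path: str, cols: typing.Optional[typing.Dict[str, type]] = None, fmt: str = "csv"):
        self._from_path = from_path
        self._cols = cols
        self._fmt = fmt

    @property
    def from_path(self) -> str:
        return self._from_path

    @property
    def column_names(self) -> typing.Optional[typing.List[str]]:
        return list(self._cols.keys()) if self._cols else None

    def iter(self, **kwargs) -> typing.Generator[dict, None, None]:
        # Yield one row (as a dict) at a time.
        with open(self._from_path, newline="") as f:
            yield from csv.DictReader(f)

    def all(self, **kwargs) -> typing.List[dict]:
        # Materialize everything at once, like reading a whole DataFrame.
        return list(self.iter(**kwargs))


class ToySchemaWriter:
    def __init__(self, to_path: str, cols: typing.Optional[typing.Dict[str, type]] = None, fmt: str = "csv"):
        self._to_path = to_path
        self._cols = cols

    @property
    def to_path(self) -> str:
        return self._to_path

    def write(self, rows: typing.List[dict], **kwargs) -> None:
        fieldnames = list(rows[0].keys())
        with open(self._to_path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=fieldnames)
            w.writeheader()
            w.writerows(rows)
```

In flytekit, instances of the real classes are created for you when a task parameter or return value is typed as FlyteSchema; you would not normally construct them directly.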

flytekitplugins.spark.schema.ClassicSparkDataFrameSchemaReader

Implements how a classic Spark DataFrame should be read using the open method of FlyteSchema.

class ClassicSparkDataFrameSchemaReader(
    from_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: SchemaFormat,
)
Parameter Type Description
from_path str
cols typing.Optional[typing.Dict[str, type]]
fmt SchemaFormat

Properties

Property Type Description
column_names None
from_path None

Methods

Method Description
all()
iter()

all()

def all(
    **kwargs,
) -> pyspark.sql.classic.dataframe.DataFrame
Parameter Type Description
kwargs **kwargs

iter()

def iter(
    **kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
Parameter Type Description
kwargs **kwargs

flytekitplugins.spark.schema.ClassicSparkDataFrameSchemaWriter

Implements how a classic Spark DataFrame should be written using the open method of FlyteSchema.

class ClassicSparkDataFrameSchemaWriter(
    to_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: SchemaFormat,
)
Parameter Type Description
to_path str
cols typing.Optional[typing.Dict[str, type]]
fmt SchemaFormat

Properties

Property Type Description
column_names None
to_path None

Methods

Method Description
write()

write()

def write(
    dfs: pyspark.sql.classic.dataframe.DataFrame,
    **kwargs,
)
Parameter Type Description
dfs pyspark.sql.classic.dataframe.DataFrame
kwargs **kwargs

flytekitplugins.spark.schema.ClassicSparkDataFrameTransformer

Transforms classic Spark DataFrames to and from a Schema (typed/untyped).

def ClassicSparkDataFrameTransformer()

Properties

Property Type Description
is_async None
name None
python_type None Returns the python type.
type_assertions_enabled None Indicates whether the transformer wants type assertions enabled at the core type-engine layer.

Methods

Method Description
assert_type()
from_binary_idl() This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.
from_generic_idl() TODO: Support all Flyte Types.
get_literal_type() Converts the python type to a Flyte LiteralType.
guess_python_type() Converts the Flyte LiteralType to a python object type.
isinstance_generic()
schema_match() Check if a JSON schema fragment matches this transformer’s python_type.
to_html() Converts any python val (dataframe, int, float) to a html string, and it will be wrapped in the HTML div.
to_literal() Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type.
to_python_value() Converts the given Literal to a Python Type.

assert_type()

def assert_type(
    t: Type[T],
    v: T,
)
Parameter Type Description
t Type[T]
v T

from_binary_idl()

def from_binary_idl(
    binary_idl_object: Binary,
    expected_python_type: Type[T],
) -> Optional[T]

This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.

For untyped dicts, dataclasses, and Pydantic BaseModels (untyped dict as an example), the life cycle is: to_literal takes python val -> msgpack bytes -> binary literal scalar; from_binary_idl takes binary literal scalar -> msgpack bytes -> python val.

For attribute access, the life cycle is: to_literal takes python val -> msgpack bytes -> binary literal scalar; propeller attribute access resolves it to a golang value and re-encodes it as a binary literal scalar; from_binary_idl takes binary literal scalar -> msgpack bytes -> python val.

Parameter Type Description
binary_idl_object Binary
expected_python_type Type[T]
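The first life cycle above is an encode/decode round trip. A minimal sketch, using the stdlib json module in place of msgpack (the real implementation encodes msgpack bytes inside a Binary IDL scalar; the dict layout of the "literal" below is illustrative):

```python
import json

# Stand-in for the untyped-dict life cycle: python val -> bytes ->
# "binary literal scalar" -> bytes -> python val. json replaces msgpack
# here purely for illustration.

def to_literal_sketch(python_val: dict) -> dict:
    payload = json.dumps(python_val).encode()           # python val -> bytes
    return {"scalar": {"binary": {"value": payload}}}   # bytes -> binary literal scalar

def from_binary_idl_sketch(literal: dict) -> dict:
    payload = literal["scalar"]["binary"]["value"]      # binary literal scalar -> bytes
    return json.loads(payload.decode())                 # bytes -> python val

val = {"x": 1, "y": [2, 3]}
assert from_binary_idl_sketch(to_literal_sketch(val)) == val
```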

from_generic_idl()

def from_generic_idl(
    generic: Struct,
    expected_python_type: Type[T],
) -> Optional[T]

TODO: Support all Flyte Types. This handles dataclass attribute access for inputs created from the Flyte Console.

Note:

  • This can be removed once the Flyte Console supports generating a Binary IDL Scalar as input.
Parameter Type Description
generic Struct
expected_python_type Type[T]

get_literal_type()

def get_literal_type(
    t: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType

Converts the python type to a Flyte LiteralType

Parameter Type Description
t typing.Type[pyspark.sql.classic.dataframe.DataFrame]

guess_python_type()

def guess_python_type(
    literal_type: LiteralType,
) -> Type[T]

Converts the Flyte LiteralType to a python object type.

Parameter Type Description
literal_type LiteralType

isinstance_generic()

def isinstance_generic(
    obj,
    generic_alias,
)
Parameter Type Description
obj
generic_alias

schema_match()

def schema_match(
    schema: dict,
) -> bool

Check if a JSON schema fragment matches this transformer’s python_type.

For BaseModel subclasses, automatically compares the schema’s title, type, and required fields against the type’s own JSON schema. For other types, returns False by default; override if needed.

Parameter Type Description
schema dict
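A hedged sketch of the comparison described above, with the title/type/required check written out by hand (the real method derives these fields from the Pydantic model’s own JSON schema; schema_match_sketch is an illustrative name):

```python
# Hypothetical illustration of a schema_match-style comparison: a JSON schema
# fragment "matches" when its title, type, and required fields agree with the
# schema we would generate for our python_type.

def schema_match_sketch(candidate: dict, own_schema: dict) -> bool:
    return (
        candidate.get("title") == own_schema.get("title")
        and candidate.get("type") == own_schema.get("type")
        and set(candidate.get("required", [])) == set(own_schema.get("required", []))
    )

own = {"title": "Point", "type": "object", "required": ["x", "y"]}
assert schema_match_sketch({"title": "Point", "type": "object", "required": ["y", "x"]}, own)
assert not schema_match_sketch({"title": "Other", "type": "object", "required": ["x"]}, own)
```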

to_html()

def to_html(
    ctx: FlyteContext,
    python_val: T,
    expected_python_type: Type[T],
) -> str

Converts any python value (dataframe, int, float) to an HTML string, which will be wrapped in an HTML div.

Parameter Type Description
ctx FlyteContext
python_val T
expected_python_type Type[T]

to_literal()

def to_literal(
    ctx: flytekit.core.context_manager.FlyteContext,
    python_val: pyspark.sql.classic.dataframe.DataFrame,
    python_type: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
    expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal

Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type. Implementers should refrain from using type(python_val) and should instead rely on the passed-in python_type. If the two do not match (or are not allowed), the transformer implementer should raise an AssertionError that clearly states the mismatch.

Parameter Type Description
ctx flytekit.core.context_manager.FlyteContext A FlyteContext, useful in accessing the filesystem and other attributes
python_val pyspark.sql.classic.dataframe.DataFrame The actual value to be transformed
python_type typing.Type[pyspark.sql.classic.dataframe.DataFrame] The assumed type of the value (this matches the declared type on the function)
expected flytekit.models.types.LiteralType Expected Literal Type
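The contract described above (trust the declared python_type rather than type(python_val), and raise AssertionError on mismatch) can be sketched in isolation; to_literal_guard is an illustrative name, not the actual flytekit implementation:

```python
# Illustrative to_literal-style guard: validate python_val against the declared
# python_type before converting, and raise AssertionError with a clear message
# on mismatch, as the transformer contract requires.

def to_literal_guard(python_val, python_type: type):
    if not isinstance(python_val, python_type):
        raise AssertionError(
            f"Expected value of type {python_type.__name__}, "
            f"got {type(python_val).__name__}: {python_val!r}"
        )
    # A real transformer would now serialize the value and build a Literal;
    # a placeholder dict stands in for that here.
    return {"literal": python_val}

assert to_literal_guard(3, int) == {"literal": 3}
```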

to_python_value()

def to_python_value(
    ctx: flytekit.core.context_manager.FlyteContext,
    lv: flytekit.models.literals.Literal,
    expected_python_type: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
) -> ~T

Converts the given Literal to a Python type. If the conversion cannot be done, an AssertionError should be raised.

Parameter Type Description
ctx flytekit.core.context_manager.FlyteContext FlyteContext
lv flytekit.models.literals.Literal The received literal Value
expected_python_type typing.Type[pyspark.sql.classic.dataframe.DataFrame] Expected native python type that should be returned

flytekitplugins.spark.schema.SparkDataFrameSchemaReader

Implements how a Spark DataFrame should be read using the open method of FlyteSchema.

class SparkDataFrameSchemaReader(
    from_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: SchemaFormat,
)
Parameter Type Description
from_path str
cols typing.Optional[typing.Dict[str, type]]
fmt SchemaFormat

Properties

Property Type Description
column_names None
from_path None

Methods

Method Description
all()
iter()

all()

def all(
    **kwargs,
) -> pyspark.sql.dataframe.DataFrame
Parameter Type Description
kwargs **kwargs

iter()

def iter(
    **kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
Parameter Type Description
kwargs **kwargs

flytekitplugins.spark.schema.SparkDataFrameSchemaWriter

Implements how a Spark DataFrame should be written using the open method of FlyteSchema.

class SparkDataFrameSchemaWriter(
    to_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: SchemaFormat,
)
Parameter Type Description
to_path str
cols typing.Optional[typing.Dict[str, type]]
fmt SchemaFormat

Properties

Property Type Description
column_names None
to_path None

Methods

Method Description
write()

write()

def write(
    dfs: pyspark.sql.dataframe.DataFrame,
    **kwargs,
)
Parameter Type Description
dfs pyspark.sql.dataframe.DataFrame
kwargs **kwargs

flytekitplugins.spark.schema.SparkDataFrameTransformer

Transforms Spark DataFrames to and from a Schema (typed/untyped).

def SparkDataFrameTransformer()

Properties

Property Type Description
is_async None
name None
python_type None Returns the python type.
type_assertions_enabled None Indicates whether the transformer wants type assertions enabled at the core type-engine layer.

Methods

Method Description
assert_type()
from_binary_idl() This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.
from_generic_idl() TODO: Support all Flyte Types.
get_literal_type() Converts the python type to a Flyte LiteralType.
guess_python_type() Converts the Flyte LiteralType to a python object type.
isinstance_generic()
schema_match() Check if a JSON schema fragment matches this transformer’s python_type.
to_html() Converts any python val (dataframe, int, float) to a html string, and it will be wrapped in the HTML div.
to_literal() Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type.
to_python_value() Converts the given Literal to a Python Type.

assert_type()

def assert_type(
    t: Type[T],
    v: T,
)
Parameter Type Description
t Type[T]
v T

from_binary_idl()

def from_binary_idl(
    binary_idl_object: Binary,
    expected_python_type: Type[T],
) -> Optional[T]

This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.

For untyped dicts, dataclasses, and Pydantic BaseModels (untyped dict as an example), the life cycle is: to_literal takes python val -> msgpack bytes -> binary literal scalar; from_binary_idl takes binary literal scalar -> msgpack bytes -> python val.

For attribute access, the life cycle is: to_literal takes python val -> msgpack bytes -> binary literal scalar; propeller attribute access resolves it to a golang value and re-encodes it as a binary literal scalar; from_binary_idl takes binary literal scalar -> msgpack bytes -> python val.

Parameter Type Description
binary_idl_object Binary
expected_python_type Type[T]

from_generic_idl()

def from_generic_idl(
    generic: Struct,
    expected_python_type: Type[T],
) -> Optional[T]

TODO: Support all Flyte Types. This handles dataclass attribute access for inputs created from the Flyte Console.

Note:

  • This can be removed once the Flyte Console supports generating a Binary IDL Scalar as input.
Parameter Type Description
generic Struct
expected_python_type Type[T]

get_literal_type()

def get_literal_type(
    t: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType

Converts the python type to a Flyte LiteralType

Parameter Type Description
t typing.Type[pyspark.sql.dataframe.DataFrame]

guess_python_type()

def guess_python_type(
    literal_type: LiteralType,
) -> Type[T]

Converts the Flyte LiteralType to a python object type.

Parameter Type Description
literal_type LiteralType

isinstance_generic()

def isinstance_generic(
    obj,
    generic_alias,
)
Parameter Type Description
obj
generic_alias

schema_match()

def schema_match(
    schema: dict,
) -> bool

Check if a JSON schema fragment matches this transformer’s python_type.

For BaseModel subclasses, automatically compares the schema’s title, type, and required fields against the type’s own JSON schema. For other types, returns False by default; override if needed.

Parameter Type Description
schema dict

to_html()

def to_html(
    ctx: FlyteContext,
    python_val: T,
    expected_python_type: Type[T],
) -> str

Converts any python value (dataframe, int, float) to an HTML string, which will be wrapped in an HTML div.

Parameter Type Description
ctx FlyteContext
python_val T
expected_python_type Type[T]

to_literal()

def to_literal(
    ctx: flytekit.core.context_manager.FlyteContext,
    python_val: pyspark.sql.dataframe.DataFrame,
    python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
    expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal

Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type. Implementers should refrain from using type(python_val) and should instead rely on the passed-in python_type. If the two do not match (or are not allowed), the transformer implementer should raise an AssertionError that clearly states the mismatch.

Parameter Type Description
ctx flytekit.core.context_manager.FlyteContext A FlyteContext, useful in accessing the filesystem and other attributes
python_val pyspark.sql.dataframe.DataFrame The actual value to be transformed
python_type typing.Type[pyspark.sql.dataframe.DataFrame] The assumed type of the value (this matches the declared type on the function)
expected flytekit.models.types.LiteralType Expected Literal Type

to_python_value()

def to_python_value(
    ctx: flytekit.core.context_manager.FlyteContext,
    lv: flytekit.models.literals.Literal,
    expected_python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> ~T

Converts the given Literal to a Python type. If the conversion cannot be done, an AssertionError should be raised.

Parameter Type Description
ctx flytekit.core.context_manager.FlyteContext FlyteContext
lv flytekit.models.literals.Literal The received literal Value
expected_python_type typing.Type[pyspark.sql.dataframe.DataFrame] Expected native python type that should be returned