flytekitplugins.spark.schema
Implements how a Classic Spark DataFrame should be read via the open method of FlyteSchema
class ClassicSparkDataFrameSchemaReader(
from_path: str,
cols: typing.Optional[typing.Dict[str, type]],
fmt: SchemaFormat,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| from_path | str | |
| cols | typing.Optional[typing.Dict[str, type]] | |
| fmt | SchemaFormat | |
| Property | Type | Description |
|----------|------|-------------|
| column_names | None | |
| from_path | None | |
def all(
kwargs,
) -> pyspark.sql.classic.dataframe.DataFrame
| Parameter | Type | Description |
|-----------|------|-------------|
| kwargs | **kwargs | |
def iter(
kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
| Parameter | Type | Description |
|-----------|------|-------------|
| kwargs | **kwargs | |
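In normal use this reader is not constructed directly; FlyteSchema's open machinery instantiates it with a path, an optional column mapping, and a format. A minimal stand-in (all names here are hypothetical, not the plugin's actual implementation) sketching that constructor contract and the `column_names` property:

```python
import typing


class ToySchemaReader:
    """Hypothetical stand-in for SparkDataFrameSchemaReader: it only
    models the (from_path, cols, fmt) constructor contract."""

    def __init__(
        self,
        from_path: str,
        cols: typing.Optional[typing.Dict[str, type]] = None,
        fmt: str = "parquet",
    ):
        self._from_path = from_path
        self._cols = cols
        self._fmt = fmt

    @property
    def from_path(self) -> str:
        return self._from_path

    @property
    def column_names(self) -> typing.Optional[typing.List[str]]:
        # Column names are derived from the declared cols mapping, if any.
        return list(self._cols.keys()) if self._cols else None


reader = ToySchemaReader("/tmp/users", cols={"name": str, "age": int})
print(reader.column_names)  # ['name', 'age']
```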
Implements how a Classic Spark DataFrame should be written via the open method of FlyteSchema
class ClassicSparkDataFrameSchemaWriter(
to_path: str,
cols: typing.Optional[typing.Dict[str, type]],
fmt: SchemaFormat,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| to_path | str | |
| cols | typing.Optional[typing.Dict[str, type]] | |
| fmt | SchemaFormat | |
| Property | Type | Description |
|----------|------|-------------|
| column_names | None | |
| to_path | None | |
def write(
dfs: pyspark.sql.classic.dataframe.DataFrame,
kwargs,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| dfs | pyspark.sql.classic.dataframe.DataFrame | |
| kwargs | **kwargs | |
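The writer mirrors the reader: it is constructed with a destination path, optional columns, and a format, and its `write` method accepts one or more DataFrames. A hypothetical stand-in that records each call instead of persisting real DataFrames:

```python
import typing


class ToySchemaWriter:
    """Hypothetical stand-in for SparkDataFrameSchemaWriter: instead of
    persisting DataFrames to to_path in fmt, it records each call."""

    def __init__(
        self,
        to_path: str,
        cols: typing.Optional[typing.Dict[str, type]] = None,
        fmt: str = "parquet",
    ):
        self.to_path = to_path
        self.cols = cols
        self.fmt = fmt
        self.written: typing.List[object] = []

    def write(self, *dfs, **kwargs):
        # The real writer persists each DataFrame to to_path in the
        # configured format; here we just record the arguments.
        self.written.extend(dfs)


writer = ToySchemaWriter("/tmp/out", fmt="parquet")
writer.write("df-1", "df-2")  # strings stand in for real DataFrames
print(len(writer.written))  # 2
```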
Transforms Classic Spark DataFrames to and from a Schema (typed/untyped)
class ClassicSparkDataFrameTransformer()
| Property | Type | Description |
|----------|------|-------------|
| is_async | None | |
| name | None | |
| python_type | None | This returns the python type |
| type_assertions_enabled | None | Indicates if the transformer wants type assertions to be enabled at the core type engine layer |
def assert_type(
t: Type[T],
v: T,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| t | Type[T] | |
| v | T | |
def from_binary_idl(
binary_idl_object: Binary,
expected_python_type: Type[T],
) -> Optional[T]
This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.
For untyped dicts, dataclasses, and Pydantic BaseModels, the life cycle (untyped dict as example) is:
python val -> msgpack bytes -> binary literal scalar (to_literal)
binary literal scalar -> msgpack bytes -> python val (from_binary_idl)
For attribute access, the life cycle is:
python val -> msgpack bytes -> binary literal scalar (to_literal)
binary literal scalar -> resolved golang value -> binary literal scalar (propeller attribute access)
binary literal scalar -> msgpack bytes -> python val (from_binary_idl)
| Parameter | Type | Description |
|-----------|------|-------------|
| binary_idl_object | Binary | |
| expected_python_type | Type[T] | |
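The life cycle above can be sketched end to end. This sketch uses json in place of msgpack (so the snippet has no third-party dependency) and plain dicts in place of the Binary IDL objects; the helper names are made up, but the shape of the round trip is the same:

```python
import json


def toy_to_literal(python_val) -> dict:
    # python val -> bytes -> "binary literal scalar" (a plain dict here)
    payload = json.dumps(python_val).encode()  # msgpack bytes in the real flow
    return {"scalar": {"binary": {"value": payload, "tag": "json"}}}


def toy_from_binary_idl(binary: dict, expected_python_type: type):
    # "binary literal scalar" -> bytes -> python val
    val = json.loads(binary["value"])
    if not isinstance(val, expected_python_type):
        raise TypeError(f"expected {expected_python_type}, got {type(val)}")
    return val


lit = toy_to_literal({"a": 1})
roundtripped = toy_from_binary_idl(lit["scalar"]["binary"], dict)
print(roundtripped)  # {'a': 1}
```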
def from_generic_idl(
generic: Struct,
expected_python_type: Type[T],
) -> Optional[T]
TODO: Support all Flyte Types.
This handles dataclass attribute access for inputs created from the Flyte Console.
Note:
- This can be removed in the future once the Flyte Console supports generating a Binary IDL Scalar as input.
| Parameter | Type | Description |
|-----------|------|-------------|
| generic | Struct | |
| expected_python_type | Type[T] | |
def get_literal_type(
t: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType
Converts the python type to a Flyte LiteralType
| Parameter | Type | Description |
|-----------|------|-------------|
| t | typing.Type[pyspark.sql.classic.dataframe.DataFrame] | |
def guess_python_type(
literal_type: LiteralType,
) -> Type[T]
Converts the Flyte LiteralType to a python object type.
| Parameter | Type | Description |
|-----------|------|-------------|
| literal_type | LiteralType | |
def isinstance_generic(
obj,
generic_alias,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| obj | | |
| generic_alias | | |
def schema_match(
schema: dict,
) -> bool
Check if a JSON schema fragment matches this transformer’s python_type.
For BaseModel subclasses, automatically compares the schema’s title, type, and
required fields against the type’s own JSON schema. For other types, returns
False by default; override if needed.
| Parameter | Type | Description |
|-----------|------|-------------|
| schema | dict | |
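A sketch of the comparison the docstring describes: match a JSON schema fragment against the type's own schema on title, type, and required fields. The helper and both schemas are made up for illustration:

```python
def toy_schema_match(schema: dict, own_schema: dict) -> bool:
    # Compare the fragment's title, type, and required fields against
    # the transformer type's own JSON schema.
    return all(
        schema.get(key) == own_schema.get(key)
        for key in ("title", "type", "required")
    )


own = {"title": "User", "type": "object", "required": ["name"]}
print(toy_schema_match({"title": "User", "type": "object", "required": ["name"]}, own))   # True
print(toy_schema_match({"title": "Order", "type": "object", "required": ["id"]}, own))    # False
```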
def to_html(
ctx: FlyteContext,
python_val: T,
expected_python_type: Type[T],
) -> str
Converts any python val (dataframe, int, float) to an HTML string, which will be wrapped in an HTML div
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | FlyteContext | |
| python_val | T | |
| expected_python_type | Type[T] | |
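A minimal sketch of the wrapping behavior described above (a stand-in, not the transformer's actual renderer):

```python
import html


def toy_to_html(python_val) -> str:
    # Render the value and wrap it in an HTML div, escaping any markup.
    return f"<div>{html.escape(str(python_val))}</div>"


print(toy_to_html(3.14))         # <div>3.14</div>
print(toy_to_html("<b>hi</b>"))  # <div>&lt;b&gt;hi&lt;/b&gt;</div>
```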
def to_literal(
ctx: flytekit.core.context_manager.FlyteContext,
python_val: pyspark.sql.classic.dataframe.DataFrame,
python_type: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal
Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type.
Implementers should refrain from using type(python_val) and should instead rely on the passed-in python_type. If these
do not match (or are not allowed), the transformer implementer should raise an AssertionError that clearly states
the mismatch.
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | flytekit.core.context_manager.FlyteContext | A FlyteContext, useful in accessing the filesystem and other attributes |
| python_val | pyspark.sql.classic.dataframe.DataFrame | The actual value to be transformed |
| python_type | typing.Type[pyspark.sql.classic.dataframe.DataFrame] | The assumed type of the value (this matches the declared type on the function) |
| expected | flytekit.models.types.LiteralType | Expected Literal Type |
def to_python_value(
ctx: flytekit.core.context_manager.FlyteContext,
lv: flytekit.models.literals.Literal,
expected_python_type: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
) -> ~T
Converts the given Literal to a Python type. If the conversion cannot be done, an AssertionError should be raised.
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | flytekit.core.context_manager.FlyteContext | FlyteContext |
| lv | flytekit.models.literals.Literal | The received literal value |
| expected_python_type | typing.Type[pyspark.sql.classic.dataframe.DataFrame] | Expected native python type that should be returned |
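The to_literal / to_python_value contract can be sketched with toy types: trust the declared python_type rather than type(python_val), and raise AssertionError on a mismatch. All names here are hypothetical, and a plain dict stands in for a Flyte Literal:

```python
def toy_to_literal(python_val, python_type: type) -> dict:
    # Per the contract: validate against the declared python_type,
    # not type(python_val).
    if not isinstance(python_val, python_type):
        raise AssertionError(
            f"expected {python_type.__name__}, got {type(python_val).__name__}"
        )
    return {"scalar": python_val}  # stand-in for a Flyte Literal


def toy_to_python_value(lv: dict, expected_python_type: type):
    val = lv["scalar"]
    if not isinstance(val, expected_python_type):
        raise AssertionError(
            f"cannot convert literal to {expected_python_type.__name__}"
        )
    return val


lit = toy_to_literal(5, int)
print(toy_to_python_value(lit, int))  # 5
```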
Implements how a Spark DataFrame should be read via the open method of FlyteSchema
class SparkDataFrameSchemaReader(
from_path: str,
cols: typing.Optional[typing.Dict[str, type]],
fmt: SchemaFormat,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| from_path | str | |
| cols | typing.Optional[typing.Dict[str, type]] | |
| fmt | SchemaFormat | |
| Property | Type | Description |
|----------|------|-------------|
| column_names | None | |
| from_path | None | |
def all(
kwargs,
) -> pyspark.sql.dataframe.DataFrame
| Parameter | Type | Description |
|-----------|------|-------------|
| kwargs | **kwargs | |
def iter(
kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
| Parameter | Type | Description |
|-----------|------|-------------|
| kwargs | **kwargs | |
Implements how a Spark DataFrame should be written via the open method of FlyteSchema
class SparkDataFrameSchemaWriter(
to_path: str,
cols: typing.Optional[typing.Dict[str, type]],
fmt: SchemaFormat,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| to_path | str | |
| cols | typing.Optional[typing.Dict[str, type]] | |
| fmt | SchemaFormat | |
| Property | Type | Description |
|----------|------|-------------|
| column_names | None | |
| to_path | None | |
def write(
dfs: pyspark.sql.dataframe.DataFrame,
kwargs,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| dfs | pyspark.sql.dataframe.DataFrame | |
| kwargs | **kwargs | |
Transforms Spark DataFrames to and from a Schema (typed/untyped)
class SparkDataFrameTransformer()
| Property | Type | Description |
|----------|------|-------------|
| is_async | None | |
| name | None | |
| python_type | None | This returns the python type |
| type_assertions_enabled | None | Indicates if the transformer wants type assertions to be enabled at the core type engine layer |
def assert_type(
t: Type[T],
v: T,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| t | Type[T] | |
| v | T | |
def from_binary_idl(
binary_idl_object: Binary,
expected_python_type: Type[T],
) -> Optional[T]
This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.
For untyped dicts, dataclasses, and Pydantic BaseModels, the life cycle (untyped dict as example) is:
python val -> msgpack bytes -> binary literal scalar (to_literal)
binary literal scalar -> msgpack bytes -> python val (from_binary_idl)
For attribute access, the life cycle is:
python val -> msgpack bytes -> binary literal scalar (to_literal)
binary literal scalar -> resolved golang value -> binary literal scalar (propeller attribute access)
binary literal scalar -> msgpack bytes -> python val (from_binary_idl)
| Parameter | Type | Description |
|-----------|------|-------------|
| binary_idl_object | Binary | |
| expected_python_type | Type[T] | |
def from_generic_idl(
generic: Struct,
expected_python_type: Type[T],
) -> Optional[T]
TODO: Support all Flyte Types.
This handles dataclass attribute access for inputs created from the Flyte Console.
Note:
- This can be removed in the future once the Flyte Console supports generating a Binary IDL Scalar as input.
| Parameter | Type | Description |
|-----------|------|-------------|
| generic | Struct | |
| expected_python_type | Type[T] | |
def get_literal_type(
t: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType
Converts the python type to a Flyte LiteralType
| Parameter | Type | Description |
|-----------|------|-------------|
| t | typing.Type[pyspark.sql.dataframe.DataFrame] | |
def guess_python_type(
literal_type: LiteralType,
) -> Type[T]
Converts the Flyte LiteralType to a python object type.
| Parameter | Type | Description |
|-----------|------|-------------|
| literal_type | LiteralType | |
def isinstance_generic(
obj,
generic_alias,
)
| Parameter | Type | Description |
|-----------|------|-------------|
| obj | | |
| generic_alias | | |
def schema_match(
schema: dict,
) -> bool
Check if a JSON schema fragment matches this transformer’s python_type.
For BaseModel subclasses, automatically compares the schema’s title, type, and
required fields against the type’s own JSON schema. For other types, returns
False by default; override if needed.
| Parameter | Type | Description |
|-----------|------|-------------|
| schema | dict | |
def to_html(
ctx: FlyteContext,
python_val: T,
expected_python_type: Type[T],
) -> str
Converts any python val (dataframe, int, float) to an HTML string, which will be wrapped in an HTML div
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | FlyteContext | |
| python_val | T | |
| expected_python_type | Type[T] | |
def to_literal(
ctx: flytekit.core.context_manager.FlyteContext,
python_val: pyspark.sql.dataframe.DataFrame,
python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal
Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type.
Implementers should refrain from using type(python_val) and should instead rely on the passed-in python_type. If these
do not match (or are not allowed), the transformer implementer should raise an AssertionError that clearly states
the mismatch.
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | flytekit.core.context_manager.FlyteContext | A FlyteContext, useful in accessing the filesystem and other attributes |
| python_val | pyspark.sql.dataframe.DataFrame | The actual value to be transformed |
| python_type | typing.Type[pyspark.sql.dataframe.DataFrame] | The assumed type of the value (this matches the declared type on the function) |
| expected | flytekit.models.types.LiteralType | Expected Literal Type |
def to_python_value(
ctx: flytekit.core.context_manager.FlyteContext,
lv: flytekit.models.literals.Literal,
expected_python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> ~T
Converts the given Literal to a Python type. If the conversion cannot be done, an AssertionError should be raised.
| Parameter | Type | Description |
|-----------|------|-------------|
| ctx | flytekit.core.context_manager.FlyteContext | FlyteContext |
| lv | flytekit.models.literals.Literal | The received literal value |
| expected_python_type | typing.Type[pyspark.sql.dataframe.DataFrame] | Expected native python type that should be returned |