# flytekitplugins.spark.schema

## Directory

### Classes

| Class | Description |
|-|-|
| [`ClassicSparkDataFrameSchemaReader`](https://www.union.ai/docs/v1/flyte/api-reference/plugins/spark/packages/flytekitplugins.spark.schema/page.md#flytekitpluginssparkschemaclassicsparkdataframeschemareader) | Implements how Classic SparkDataFrame should be read using the ``open`` method of FlyteSchema. |
| [`ClassicSparkDataFrameSchemaWriter`](https://www.union.ai/docs/v1/flyte/api-reference/plugins/spark/packages/flytekitplugins.spark.schema/page.md#flytekitpluginssparkschemaclassicsparkdataframeschemawriter) | Implements how Classic SparkDataFrame should be written using ``open`` method of FlyteSchema. |
| [`ClassicSparkDataFrameTransformer`](https://www.union.ai/docs/v1/flyte/api-reference/plugins/spark/packages/flytekitplugins.spark.schema/page.md#flytekitpluginssparkschemaclassicsparkdataframetransformer) | Transforms Classic Spark DataFrame's to and from a Schema (typed/untyped). |
| [`SparkDataFrameSchemaReader`](https://www.union.ai/docs/v1/flyte/api-reference/plugins/spark/packages/flytekitplugins.spark.schema/page.md#flytekitpluginssparkschemasparkdataframeschemareader) | Implements how SparkDataFrame should be read using the ``open`` method of FlyteSchema. |
| [`SparkDataFrameSchemaWriter`](https://www.union.ai/docs/v1/flyte/api-reference/plugins/spark/packages/flytekitplugins.spark.schema/page.md#flytekitpluginssparkschemasparkdataframeschemawriter) | Implements how SparkDataFrame should be written to using ``open`` method of FlyteSchema. |
| [`SparkDataFrameTransformer`](https://www.union.ai/docs/v1/flyte/api-reference/plugins/spark/packages/flytekitplugins.spark.schema/page.md#flytekitpluginssparkschemasparkdataframetransformer) | Transforms Spark DataFrame's to and from a Schema (typed/untyped). |

## flytekitplugins.spark.schema.ClassicSparkDataFrameSchemaReader

Implements how Classic SparkDataFrame should be read using the ``open`` method of FlyteSchema

```python
class ClassicSparkDataFrameSchemaReader(
    from_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: <enum 'SchemaFormat'>,
)
```
| Parameter | Type | Description |
|-|-|-|
| `from_path` | `str` | |
| `cols` | `typing.Optional[typing.Dict[str, type]]` | |
| `fmt` | `<enum 'SchemaFormat'>` | |

### Properties

| Property | Type | Description |
|-|-|-|
| `column_names` | `None` |  |
| `from_path` | `None` |  |

### Methods

| Method | Description |
|-|-|
| [`all()`](#all) |  |
| [`iter()`](#iter) |  |

#### all()

```python
def all(
    kwargs,
) -> pyspark.sql.classic.dataframe.DataFrame
```
| Parameter | Type | Description |
|-|-|-|
| `kwargs` | `**kwargs` | |

#### iter()

```python
def iter(
    kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
```
| Parameter | Type | Description |
|-|-|-|
| `kwargs` | `**kwargs` | |

## flytekitplugins.spark.schema.ClassicSparkDataFrameSchemaWriter

Implements how Classic SparkDataFrame should be written using ``open`` method of FlyteSchema

```python
class ClassicSparkDataFrameSchemaWriter(
    to_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: <enum 'SchemaFormat'>,
)
```
| Parameter | Type | Description |
|-|-|-|
| `to_path` | `str` | |
| `cols` | `typing.Optional[typing.Dict[str, type]]` | |
| `fmt` | `<enum 'SchemaFormat'>` | |

### Properties

| Property | Type | Description |
|-|-|-|
| `column_names` | `None` |  |
| `to_path` | `None` |  |

### Methods

| Method | Description |
|-|-|
| [`write()`](#write) |  |

#### write()

```python
def write(
    dfs: pyspark.sql.classic.dataframe.DataFrame,
    kwargs,
)
```
| Parameter | Type | Description |
|-|-|-|
| `dfs` | `pyspark.sql.classic.dataframe.DataFrame` | |
| `kwargs` | `**kwargs` | |

## flytekitplugins.spark.schema.ClassicSparkDataFrameTransformer

Transforms Classic Spark DataFrame's to and from a Schema (typed/untyped)

```python
def ClassicSparkDataFrameTransformer()
```
### Properties

| Property | Type | Description |
|-|-|-|
| `is_async` | `None` |  |
| `name` | `None` |  |
| `python_type` | `None` | This returns the python type |
| `type_assertions_enabled` | `None` | Indicates if the transformer wants type assertions to be enabled at the core type engine layer |

### Methods

| Method | Description |
|-|-|
| [`assert_type()`](#assert_type) |  |
| [`from_binary_idl()`](#from_binary_idl) | This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access. |
| [`from_generic_idl()`](#from_generic_idl) | TODO: Support all Flyte Types. |
| [`get_literal_type()`](#get_literal_type) | Converts the python type to a Flyte LiteralType. |
| [`guess_python_type()`](#guess_python_type) | Converts the Flyte LiteralType to a python object type. |
| [`isinstance_generic()`](#isinstance_generic) |  |
| [`schema_match()`](#schema_match) | Check if a JSON schema fragment matches this transformer's python_type. |
| [`to_html()`](#to_html) | Converts any python val (dataframe, int, float) to a html string, and it will be wrapped in the HTML div. |
| [`to_literal()`](#to_literal) | Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type. |
| [`to_python_value()`](#to_python_value) | Converts the given Literal to a Python Type. |

#### assert_type()

```python
def assert_type(
    t: Type[T],
    v: T,
)
```
| Parameter | Type | Description |
|-|-|-|
| `t` | `Type[T]` | |
| `v` | `T` | |

#### from_binary_idl()

```python
def from_binary_idl(
    binary_idl_object: Binary,
    expected_python_type: Type[T],
) -> Optional[T]
```
This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.｀

For untyped dict, dataclass, and pydantic basemodel:
Life Cycle (Untyped Dict as example):
    python val -&gt; msgpack bytes -&gt; binary literal scalar -&gt; msgpack bytes -&gt; python val
                  (to_literal)                             (from_binary_idl)

For attribute access:
Life Cycle:
    python val -&gt; msgpack bytes -&gt; binary literal scalar -&gt; resolved golang value -&gt; binary literal scalar -&gt; msgpack bytes -&gt; python val
                  (to_literal)                            (propeller attribute access)                       (from_binary_idl)

| Parameter | Type | Description |
|-|-|-|
| `binary_idl_object` | `Binary` | |
| `expected_python_type` | `Type[T]` | |

#### from_generic_idl()

```python
def from_generic_idl(
    generic: Struct,
    expected_python_type: Type[T],
) -> Optional[T]
```
TODO: Support all Flyte Types.
This is for dataclass attribute access from input created from the Flyte Console.

Note:
- This can be removed in the future when the Flyte Console support generate Binary IDL Scalar as input.

| Parameter | Type | Description |
|-|-|-|
| `generic` | `Struct` | |
| `expected_python_type` | `Type[T]` | |

#### get_literal_type()

```python
def get_literal_type(
    t: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType
```
Converts the python type to a Flyte LiteralType

| Parameter | Type | Description |
|-|-|-|
| `t` | `typing.Type[pyspark.sql.classic.dataframe.DataFrame]` | |

#### guess_python_type()

```python
def guess_python_type(
    literal_type: LiteralType,
) -> Type[T]
```
Converts the Flyte LiteralType to a python object type.

| Parameter | Type | Description |
|-|-|-|
| `literal_type` | `LiteralType` | |

#### isinstance_generic()

```python
def isinstance_generic(
    obj,
    generic_alias,
)
```
| Parameter | Type | Description |
|-|-|-|
| `obj` |  | |
| `generic_alias` |  | |

#### schema_match()

```python
def schema_match(
    schema: dict,
) -> bool
```
Check if a JSON schema fragment matches this transformer's python_type.

For BaseModel subclasses, automatically compares the schema's title, type, and
required fields against the type's own JSON schema. For other types, returns
False by default — override if needed.

| Parameter | Type | Description |
|-|-|-|
| `schema` | `dict` | |

#### to_html()

```python
def to_html(
    ctx: FlyteContext,
    python_val: T,
    expected_python_type: Type[T],
) -> str
```
Converts any python val (dataframe, int, float) to a html string, and it will be wrapped in the HTML div

| Parameter | Type | Description |
|-|-|-|
| `ctx` | `FlyteContext` | |
| `python_val` | `T` | |
| `expected_python_type` | `Type[T]` | |

#### to_literal()

```python
def to_literal(
    ctx: flytekit.core.context_manager.FlyteContext,
    python_val: pyspark.sql.classic.dataframe.DataFrame,
    python_type: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
    expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal
```
Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type.
Implementers should refrain from using type(python_val) instead rely on the passed in python_type. If these
do not match (or are not allowed) the Transformer implementer should raise an AssertionError, clearly stating
what was the mismatch

| Parameter | Type | Description |
|-|-|-|
| `ctx` | `flytekit.core.context_manager.FlyteContext` | A FlyteContext, useful in accessing the filesystem and other attributes |
| `python_val` | `pyspark.sql.classic.dataframe.DataFrame` | The actual value to be transformed |
| `python_type` | `typing.Type[pyspark.sql.classic.dataframe.DataFrame]` | The assumed type of the value (this matches the declared type on the function) |
| `expected` | `flytekit.models.types.LiteralType` | Expected Literal Type |

#### to_python_value()

```python
def to_python_value(
    ctx: flytekit.core.context_manager.FlyteContext,
    lv: flytekit.models.literals.Literal,
    expected_python_type: typing.Type[pyspark.sql.classic.dataframe.DataFrame],
) -> ~T
```
Converts the given Literal to a Python Type. If the conversion cannot be done an AssertionError should be raised

| Parameter | Type | Description |
|-|-|-|
| `ctx` | `flytekit.core.context_manager.FlyteContext` | FlyteContext |
| `lv` | `flytekit.models.literals.Literal` | The received literal Value |
| `expected_python_type` | `typing.Type[pyspark.sql.classic.dataframe.DataFrame]` | Expected native python type that should be returned |

## flytekitplugins.spark.schema.SparkDataFrameSchemaReader

Implements how SparkDataFrame should be read using the ``open`` method of FlyteSchema

```python
class SparkDataFrameSchemaReader(
    from_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: <enum 'SchemaFormat'>,
)
```
| Parameter | Type | Description |
|-|-|-|
| `from_path` | `str` | |
| `cols` | `typing.Optional[typing.Dict[str, type]]` | |
| `fmt` | `<enum 'SchemaFormat'>` | |

### Properties

| Property | Type | Description |
|-|-|-|
| `column_names` | `None` |  |
| `from_path` | `None` |  |

### Methods

| Method | Description |
|-|-|
| [`all()`](#all) |  |
| [`iter()`](#iter) |  |

#### all()

```python
def all(
    kwargs,
) -> pyspark.sql.dataframe.DataFrame
```
| Parameter | Type | Description |
|-|-|-|
| `kwargs` | `**kwargs` | |

#### iter()

```python
def iter(
    kwargs,
) -> typing.Generator[~T, NoneType, NoneType]
```
| Parameter | Type | Description |
|-|-|-|
| `kwargs` | `**kwargs` | |

## flytekitplugins.spark.schema.SparkDataFrameSchemaWriter

Implements how SparkDataFrame should be written to using ``open`` method of FlyteSchema

```python
class SparkDataFrameSchemaWriter(
    to_path: str,
    cols: typing.Optional[typing.Dict[str, type]],
    fmt: <enum 'SchemaFormat'>,
)
```
| Parameter | Type | Description |
|-|-|-|
| `to_path` | `str` | |
| `cols` | `typing.Optional[typing.Dict[str, type]]` | |
| `fmt` | `<enum 'SchemaFormat'>` | |

### Properties

| Property | Type | Description |
|-|-|-|
| `column_names` | `None` |  |
| `to_path` | `None` |  |

### Methods

| Method | Description |
|-|-|
| [`write()`](#write) |  |

#### write()

```python
def write(
    dfs: pyspark.sql.dataframe.DataFrame,
    kwargs,
)
```
| Parameter | Type | Description |
|-|-|-|
| `dfs` | `pyspark.sql.dataframe.DataFrame` | |
| `kwargs` | `**kwargs` | |

## flytekitplugins.spark.schema.SparkDataFrameTransformer

Transforms Spark DataFrame's to and from a Schema (typed/untyped)

```python
def SparkDataFrameTransformer()
```
### Properties

| Property | Type | Description |
|-|-|-|
| `is_async` | `None` |  |
| `name` | `None` |  |
| `python_type` | `None` | This returns the python type |
| `type_assertions_enabled` | `None` | Indicates if the transformer wants type assertions to be enabled at the core type engine layer |

### Methods

| Method | Description |
|-|-|
| [`assert_type()`](#assert_type) |  |
| [`from_binary_idl()`](#from_binary_idl) | This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access. |
| [`from_generic_idl()`](#from_generic_idl) | TODO: Support all Flyte Types. |
| [`get_literal_type()`](#get_literal_type) | Converts the python type to a Flyte LiteralType. |
| [`guess_python_type()`](#guess_python_type) | Converts the Flyte LiteralType to a python object type. |
| [`isinstance_generic()`](#isinstance_generic) |  |
| [`schema_match()`](#schema_match) | Check if a JSON schema fragment matches this transformer's python_type. |
| [`to_html()`](#to_html) | Converts any python val (dataframe, int, float) to a html string, and it will be wrapped in the HTML div. |
| [`to_literal()`](#to_literal) | Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type. |
| [`to_python_value()`](#to_python_value) | Converts the given Literal to a Python Type. |

#### assert_type()

```python
def assert_type(
    t: Type[T],
    v: T,
)
```
| Parameter | Type | Description |
|-|-|-|
| `t` | `Type[T]` | |
| `v` | `T` | |

#### from_binary_idl()

```python
def from_binary_idl(
    binary_idl_object: Binary,
    expected_python_type: Type[T],
) -> Optional[T]
```
This function primarily handles deserialization for untyped dicts, dataclasses, Pydantic BaseModels, and attribute access.｀

For untyped dict, dataclass, and pydantic basemodel:
Life Cycle (Untyped Dict as example):
    python val -&gt; msgpack bytes -&gt; binary literal scalar -&gt; msgpack bytes -&gt; python val
                  (to_literal)                             (from_binary_idl)

For attribute access:
Life Cycle:
    python val -&gt; msgpack bytes -&gt; binary literal scalar -&gt; resolved golang value -&gt; binary literal scalar -&gt; msgpack bytes -&gt; python val
                  (to_literal)                            (propeller attribute access)                       (from_binary_idl)

| Parameter | Type | Description |
|-|-|-|
| `binary_idl_object` | `Binary` | |
| `expected_python_type` | `Type[T]` | |

#### from_generic_idl()

```python
def from_generic_idl(
    generic: Struct,
    expected_python_type: Type[T],
) -> Optional[T]
```
TODO: Support all Flyte Types.
This is for dataclass attribute access from input created from the Flyte Console.

Note:
- This can be removed in the future when the Flyte Console support generate Binary IDL Scalar as input.

| Parameter | Type | Description |
|-|-|-|
| `generic` | `Struct` | |
| `expected_python_type` | `Type[T]` | |

#### get_literal_type()

```python
def get_literal_type(
    t: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> flytekit.models.types.LiteralType
```
Converts the python type to a Flyte LiteralType

| Parameter | Type | Description |
|-|-|-|
| `t` | `typing.Type[pyspark.sql.dataframe.DataFrame]` | |

#### guess_python_type()

```python
def guess_python_type(
    literal_type: LiteralType,
) -> Type[T]
```
Converts the Flyte LiteralType to a python object type.

| Parameter | Type | Description |
|-|-|-|
| `literal_type` | `LiteralType` | |

#### isinstance_generic()

```python
def isinstance_generic(
    obj,
    generic_alias,
)
```
| Parameter | Type | Description |
|-|-|-|
| `obj` |  | |
| `generic_alias` |  | |

#### schema_match()

```python
def schema_match(
    schema: dict,
) -> bool
```
Check if a JSON schema fragment matches this transformer's python_type.

For BaseModel subclasses, automatically compares the schema's title, type, and
required fields against the type's own JSON schema. For other types, returns
False by default — override if needed.

| Parameter | Type | Description |
|-|-|-|
| `schema` | `dict` | |

#### to_html()

```python
def to_html(
    ctx: FlyteContext,
    python_val: T,
    expected_python_type: Type[T],
) -> str
```
Converts any python val (dataframe, int, float) to a html string, and it will be wrapped in the HTML div

| Parameter | Type | Description |
|-|-|-|
| `ctx` | `FlyteContext` | |
| `python_val` | `T` | |
| `expected_python_type` | `Type[T]` | |

#### to_literal()

```python
def to_literal(
    ctx: flytekit.core.context_manager.FlyteContext,
    python_val: pyspark.sql.dataframe.DataFrame,
    python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
    expected: flytekit.models.types.LiteralType,
) -> flytekit.models.literals.Literal
```
Converts a given python_val to a Flyte Literal, assuming the given python_val matches the declared python_type.
Implementers should refrain from using type(python_val) instead rely on the passed in python_type. If these
do not match (or are not allowed) the Transformer implementer should raise an AssertionError, clearly stating
what was the mismatch

| Parameter | Type | Description |
|-|-|-|
| `ctx` | `flytekit.core.context_manager.FlyteContext` | A FlyteContext, useful in accessing the filesystem and other attributes |
| `python_val` | `pyspark.sql.dataframe.DataFrame` | The actual value to be transformed |
| `python_type` | `typing.Type[pyspark.sql.dataframe.DataFrame]` | The assumed type of the value (this matches the declared type on the function) |
| `expected` | `flytekit.models.types.LiteralType` | Expected Literal Type |

#### to_python_value()

```python
def to_python_value(
    ctx: flytekit.core.context_manager.FlyteContext,
    lv: flytekit.models.literals.Literal,
    expected_python_type: typing.Type[pyspark.sql.dataframe.DataFrame],
) -> ~T
```
Converts the given Literal to a Python Type. If the conversion cannot be done an AssertionError should be raised

| Parameter | Type | Description |
|-|-|-|
| `ctx` | `flytekit.core.context_manager.FlyteContext` | FlyteContext |
| `lv` | `flytekit.models.literals.Literal` | The received literal Value |
| `expected_python_type` | `typing.Type[pyspark.sql.dataframe.DataFrame]` | Expected native python type that should be returned |

---
**Source**: https://github.com/unionai/unionai-docs/blob/main/content/api-reference/plugins/spark/packages/flytekitplugins.spark.schema.md
**HTML**: https://www.union.ai/docs/v1/flyte/api-reference/plugins/spark/packages/flytekitplugins.spark.schema/
