Dagster & Iceberg
This library provides I/O managers for reading and writing Apache Iceberg tables. It also provides a Dagster resource for accessing Iceberg tables.
Installation
- uv: uv add dagster-iceberg
- pip: pip install dagster-iceberg
Example
import pyarrow as pa
from dagster_iceberg.config import IcebergCatalogConfig
from dagster_iceberg.io_manager.arrow import PyArrowIcebergIOManager

import dagster as dg


@dg.asset
def my_table() -> pa.Table:
    n_legs = pa.array([2, 4, 5, 100])
    animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
    names = ["n_legs", "animals"]
    return pa.Table.from_arrays([n_legs, animals], names=names)


defs = dg.Definitions(
    assets=[my_table],
    resources={
        "io_manager": PyArrowIcebergIOManager(
            name="default",
            config=IcebergCatalogConfig(
                properties={
                    "type": "sql",
                    "uri": "sqlite:////tmp/warehouse/pyiceberg_catalog.db",
                    "warehouse": "file:///tmp/warehouse",
                }
            ),
            namespace="default",
        )
    },
)
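The catalog properties in the example point at a local SQLite file under /tmp/warehouse. SQLite creates the database file on first use but not its parent directory, so that directory has to exist before the first materialization. The four slashes in the uri are the sqlite:/// scheme prefix followed by an absolute path. A minimal sketch of a helper that builds the same three properties for any local directory is shown below; `local_catalog_properties` is a hypothetical name and not part of dagster-iceberg, and the sketch assumes a POSIX filesystem.

```python
from pathlib import Path


def local_catalog_properties(warehouse_dir: str) -> dict[str, str]:
    """Build SQL-catalog properties for a local, SQLite-backed warehouse.

    Hypothetical helper for illustration; not part of dagster-iceberg.
    """
    root = Path(warehouse_dir).resolve()
    # SQLite will create the .db file, but the directory must already exist.
    root.mkdir(parents=True, exist_ok=True)
    return {
        "type": "sql",
        # "sqlite:///" + absolute path yields four slashes on POSIX systems.
        "uri": f"sqlite:///{root / 'pyiceberg_catalog.db'}",
        # file:// URI of the directory where table data and metadata are written.
        "warehouse": root.as_uri(),
    }
```

The returned dictionary can be passed as the properties argument of IcebergCatalogConfig in the example above in place of the hard-coded values.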
About Apache Iceberg
Iceberg is a high-performance format for huge analytic tables. It brings the reliability and simplicity of SQL tables to big data, while making it possible for multiple engines to safely work with the same tables, at the same time.