14 docs tagged with "compute"

Compute integrations.

Dagster & AWS EMR

The AWS integration provides ways to orchestrate data pipelines that use AWS services, including AWS EMR (Elastic MapReduce). This integration allows you to run and scale big data workloads using open source tools such as Apache Spark, Hive, and Presto.

Dagster & AWS Glue

The AWS integration library provides the `PipesGlueClient` resource, enabling you to launch AWS Glue jobs directly from Dagster assets and ops. This integration allows you to pass parameters to Glue code while Dagster receives real-time events, such as logs, asset checks, and asset materializations, from the initiated jobs. With minimal code changes required on the job side, this integration is both efficient and easy to implement.

Dagster & AWS Lambda

Using this integration, you can use AWS Lambda to execute external code as part of your Dagster pipelines. This is particularly useful for running serverless functions that scale automatically and handle varied workloads without requiring you to manage infrastructure. The `PipesLambdaClient` class allows you to invoke AWS Lambda functions and stream logs and structured metadata back to Dagster's UI and tools.

Dagster & Databricks

The Databricks integration library provides the `PipesDatabricksClient` resource, enabling you to launch Databricks jobs directly from Dagster assets and ops. This integration allows you to pass parameters to Databricks code while Dagster receives real-time events, such as logs, asset checks, and asset materializations, from the initiated jobs. With minimal code changes required on the job side, this integration is both efficient and easy to implement.

Dagster & Docker

The Docker integration library provides the `PipesDockerClient` resource, enabling you to launch Docker containers and execute external code directly from Dagster assets and ops. This integration allows you to pass parameters to Docker containers while Dagster receives real-time events, such as logs, asset checks, and asset materializations, from the code running in those containers. With minimal code changes required on the container side, this integration is both efficient and easy to implement.

Dagster & GCP Cloud Run

The community-supported dagster-contrib-gcp package provides integrations with Google Cloud Platform (GCP) services.

Dagster & GCP Dataproc

Using this integration, you can manage and interact with Google Cloud Platform's Dataproc service directly from Dagster. This integration allows you to create, manage, and delete Dataproc clusters, and submit and monitor jobs on these clusters.

Dagster & HashiCorp

The community-supported Nomad package provides an integration with HashiCorp Nomad.

Dagster & Hex

The community-supported Hex package provides an integration with Hex.

Dagster & Jupyter Notebooks

Dagstermill eliminates the tedious "productionization" of Jupyter notebooks.

Dagster & Kubernetes

The Kubernetes integration library provides the `PipesK8sClient` resource, enabling you to launch Kubernetes pods and execute external code directly from Dagster assets and ops. This integration allows you to pass parameters to Kubernetes pods while Dagster receives real-time events, such as logs, asset checks, and asset materializations, from the code running in those pods. With minimal code changes required on the pod side, this integration is both efficient and easy to implement.
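
Several of the entries above (Glue, Lambda, Databricks, Docker, Kubernetes) describe the same pattern: the orchestrator launches external code, passes it parameters, and receives structured events such as logs and materializations while that code runs. The following is a minimal, standard-library-only sketch of that pattern. It is not Dagster's API or its actual wire protocol: the `run_external` helper, the `DEMO_PARAMS` environment variable, and the JSON event fields are all hypothetical names chosen for illustration.

```python
import json
import os
import subprocess
import sys

# Hypothetical external script. It reads its parameters from an environment
# variable and reports events as JSON lines on stdout. The variable name and
# the event shape are assumptions for this sketch, not a real protocol.
EXTERNAL_CODE = r"""
import json, os
params = json.loads(os.environ["DEMO_PARAMS"])
print(json.dumps({"type": "log", "message": "starting"}), flush=True)
total = sum(range(params["n"]))
print(json.dumps({"type": "materialization", "metadata": {"total": total}}), flush=True)
"""


def run_external(params):
    """Launch the external code, pass it parameters, and collect its events."""
    env = dict(os.environ, DEMO_PARAMS=json.dumps(params))
    proc = subprocess.Popen(
        [sys.executable, "-c", EXTERNAL_CODE],
        env=env,
        stdout=subprocess.PIPE,
        text=True,
    )
    events = []
    # Lines are read as the child writes them, so events can be handled
    # while the external process is still running.
    for line in proc.stdout:
        events.append(json.loads(line))
    if proc.wait() != 0:
        raise RuntimeError(f"external process exited with code {proc.returncode}")
    return events


if __name__ == "__main__":
    for event in run_external({"n": 5}):
        print(event["type"], event)
```

In the real integrations, the launching side is one of the Pipes client resources named above and the transport depends on the platform, so treat this only as a picture of the division of labor between the orchestrator and the external code.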