The Pipeline Framework You've Been Looking For!

Simple solutions for common data pipeline challenges

Stream Processing

  • Connect to Kafka topics instantly
  • Process message queues effortlessly
  • Handle real-time data feeds

Simple Configuration

  • Define pipelines in YAML
  • No streaming code required
  • Built-in error handling

Built-in Integrations

  • Connect to databases directly
  • Process API data streams
  • Write to vector stores

Common Use Cases

Real solutions for real streaming challenges

Stream to AI Vector Store

Transform streams into embeddings for real-time AI search and recommendations

Event Stream Processing

Process Kafka topics and message queues with simple YAML configuration

Real-time Data Feeds

Handle IoT sensors, logs, and live data streams without complex code

API Integration

Connect and transform API data streams automatically

Simple by Design

Transform any data source to any target with simple YAML configurations. DataYoga Transform handles the rest.

[Diagram: DataYoga Transform pipeline flow]
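
Every pipeline follows the same three-part shape. As an illustrative sketch (the block types and field names below are invented for illustration, mirroring the examples further down this page):

source:
  type: postgres          # illustrative source: read rows from a database table
  table: users
transform:
  type: map               # illustrative transform: reshape each record
  fields:
    user_id: id
target:
  type: rest-api          # illustrative target: push records to an HTTP endpoint
  endpoint: /users

Swapping the source or target block redirects the flow without touching the rest of the definition.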

Get Started in Minutes

Run your first data pipeline with our built-in example

1. Install DataYoga

Using the pip package manager, run 'pip install datayoga' to install the framework.

2. Initialize Project

Run 'datayoga init hello_world' to create a new sample project with examples.

3. Run Sample Pipeline

See it in action: execute 'datayoga run sample.hello' to transform and display sample user data.
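
For a sense of what a pipeline definition looks like, here is a hypothetical hello-world style job written in the illustrative schema used by the examples on this page (this is a sketch, not the actual contents of sample.hello):

source:
  type: file              # hypothetical: read the bundled sample user records
  path: users.csv         # hypothetical file name
transform:
  type: map               # hypothetical transform: build a display name per record
  fields:
    full_name: "first_name || ' ' || last_name"   # hypothetical expression syntax
target:
  type: stdout            # print the transformed records to the console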

Built for Real-World Applications

Processing Features

  • Back-pressure handling
  • Automatic retries
  • Stream checkpointing
  • Rate limiting (see the configuration sketch after these lists)

Integration Support

  • Apache Kafka
  • RabbitMQ
  • AWS SQS
  • REST APIs
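
As a sketch of how these capabilities might surface in a pipeline definition (the option names below are illustrative assumptions, not DataYoga Transform's actual configuration keys):

source:
  type: kafka
  topic: events
  checkpoint: true        # illustrative: resume from the last committed offset after a restart
processing:
  retries: 3              # illustrative: retry a failed record before giving up
  rate_limit: 1000        # illustrative: cap throughput at 1000 records per second
target:
  type: rest-api
  endpoint: /ingest       # back-pressure: consumption slows when this target lags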

Pipeline Examples

Flexible integrations across diverse sources and targets

Kafka to Vector DB

source:
  type: kafka
  topic: user-content     # consume messages from this topic
transform:
  type: embedding
  model: openai           # generate a vector embedding for each message
target:
  type: vectordb
  store: pinecone  # or milvus/weaviate/etc
  index: real-time-content

API to Queue

source:
  type: rest-api
  endpoint: /events       # read incoming events from this endpoint
target:
  type: rabbitmq
  queue: events           # illustrative queue name

Log Processing

source:
  type: file
  pattern: "*.log"        # match all log files
target:
  type: elasticsearch
  index: logs             # illustrative index name

Frequently Asked Questions

Common questions about DataYoga Transform

How is DataYoga Transform different from traditional ETL tools?

DataYoga Transform focuses on simplicity and flexibility. Instead of complex workflows or proprietary interfaces, you define pipelines in simple YAML files. This means faster development, easier maintenance, and no vendor lock-in.

Can I use DataYoga Transform alongside my existing data tools?

Yes! DataYoga Transform is designed to complement your existing stack. Use it for specific pipelines while keeping your current tools, or gradually migrate processes as needed.

Can I use DataYoga Transform for AI/ML pipelines?

Absolutely! DataYoga Transform makes it easy to build AI-enabled data pipelines. You can transform data streams into embeddings, connect to vector databases, and power real-time AI applications, all using the same simple YAML configuration you use for traditional pipelines.

Do I need to be a Python expert to use DataYoga Transform?

Not at all. While DataYoga is built in Python, you define pipelines using YAML configuration files. No Python coding required for standard pipelines.

How scalable is DataYoga Transform?

DataYoga Transform handles everything from simple one-off pipelines to production streaming workloads. Built-in features like back-pressure handling and checkpointing ensure reliable processing at scale.

Can I extend DataYoga Transform's functionality?

Yes! While the built-in blocks cover most needs, you can easily create custom blocks for specific requirements. The pluggable architecture makes extending functionality straightforward.
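
As a hypothetical sketch of how a custom block might be referenced once registered (the block name and options below are invented; consult the documentation for the actual extension API):

source:
  type: kafka
  topic: user-content
transform:
  type: my_company.pii_scrubber   # hypothetical custom block supplied by your project
  fields:
    - email
    - phone
target:
  type: vectordb
  store: pinecone

In this sketch, the custom block slots into the same source/transform/target structure as the built-in blocks.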

Ready to Build Your Data & AI Pipelines?

Start building flexible pipelines in minutes with DataYoga Transform