Simplan - Documentation

SimPlan for Apache Spark

To generate customer value from data, data workers need to process large volumes of batch and streaming data. Separate codebases are typically maintained for batch and streaming modes, which leads to siloed implementations of common data processing patterns. The result is duplicated effort, from implementation through maintenance, which hampers productivity.

With Simplan, users provide business logic as operators in a config file, and the framework takes care of the rest: it executes the operators, returns the results to the user, and records the lineage of the data and the metrics of the execution.
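
The config-driven idea described above can be illustrated with a small, framework-independent sketch. This is not Simplan's actual API or config format: the `CONFIG` layout, the operator names (`inline_source`, `filter`, `project`), and the `run` function are all hypothetical, and plain Python lists stand in for Spark DataFrames. It only shows the general pattern of declaring operators in configuration while a generic runner executes them and records lineage and metrics.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Frames = Dict[str, List[Row]]

# Hypothetical config (NOT Simplan's real format): an ordered list of tasks,
# each naming a registered operator and its parameters.
CONFIG = {
    "tasks": [
        {"name": "orders", "operator": "inline_source",
         "params": {"rows": [{"id": 1, "amount": 40},
                             {"id": 2, "amount": 120},
                             {"id": 3, "amount": 75}]}},
        {"name": "large_orders", "operator": "filter",
         "params": {"source": "orders", "column": "amount", "min": 70}},
        {"name": "order_ids", "operator": "project",
         "params": {"source": "large_orders", "columns": ["id"]}},
    ]
}


def inline_source(frames: Frames, params: dict) -> List[Row]:
    # Source operator: emits the rows embedded in the config.
    return list(params["rows"])


def filter_rows(frames: Frames, params: dict) -> List[Row]:
    # Keeps rows whose column value is at least the configured minimum.
    return [r for r in frames[params["source"]]
            if r[params["column"]] >= params["min"]]


def project(frames: Frames, params: dict) -> List[Row]:
    # Keeps only the configured columns of each row.
    return [{c: r[c] for c in params["columns"]}
            for r in frames[params["source"]]]


# Hypothetical operator registry: config names mapped to implementations.
REGISTRY: Dict[str, Callable[[Frames, dict], List[Row]]] = {
    "inline_source": inline_source,
    "filter": filter_rows,
    "project": project,
}


def run(config: dict) -> Tuple[Frames, List[Tuple[str, str]], Dict[str, int]]:
    """Execute tasks in order, recording lineage edges and row counts."""
    frames: Frames = {}
    lineage: List[Tuple[str, str]] = []
    metrics: Dict[str, int] = {}
    for task in config["tasks"]:
        params = task.get("params", {})
        frames[task["name"]] = REGISTRY[task["operator"]](frames, params)
        if "source" in params:
            lineage.append((params["source"], task["name"]))
        metrics[task["name"]] = len(frames[task["name"]])
    return frames, lineage, metrics


frames, lineage, metrics = run(CONFIG)
print(frames["order_ids"])
print(lineage)
print(metrics)
```

In this sketch, changing the pipeline means editing `CONFIG` rather than the runner, which is the low-code property the paragraph above describes. Adding a new processing step would only require registering one more function in `REGISTRY`.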

Simplan Spark is an implementation of the Simplan framework that adds SparkApplicationContext to the framework. Refer to the Simplan Framework documentation to learn more.

Tech Stack

Unified Data Processing Architecture

Features

Config Driven (Low/No code)
Pluggable/Reusable operators for common processing tasks
Batch and Streaming workloads
External integrations: Redshift, Athena, Kafka, etc.
Built-in quality control with circuit breakers
Lineage, observability, and metrics tracking
Integration with Intuit services such as IDPS, Config Services, etc.
Improves developer productivity by 10-100 times
Improves code quality, maintainability and reduces duplication

Presentations

Simplan @ DataAI Summit

Simplan Community

Join the Simplan Slack channel #simplan-community
Ask questions on StackOverflow using the tag simplan-spark

Other Simplan Implementations