apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
See what the GitHub community is most excited about today.
Source code for Twitter's Recommendation Algorithm
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
Open-source high-performance RISC-V processor
Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
An open protocol for secure data sharing
Rocket Chip Generator
Scientific workflow engine designed for simplicity & scalability. Trivially transition between one-off use cases and massive-scale production environments
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
CMAK is a tool for managing Apache Kafka clusters
ZIO — A type-safe, composable library for async and concurrent programming in Scala
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
FEEL parser and interpreter written in Scala
Workbench identity and access management
Functional GraphQL library for Scala
The Daml smart contract language
A Git platform powered by Scala with easy installation, high extensibility & GitHub API compatibility
An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
A Spark plugin for reading and writing Excel files