Data Processing with SMACK: Spark, Mesos, Akka, Cassandra, and Kafka

Processing layer: Spark

Almost MapReduce: bringing processing closer to data

Plenty of frameworks are already available or under active development (such as Hadoop, Cassandra, Kafka, Myriad, Storm, and Samza) that integrate widely used systems with Mesos resource management capabilities.

Ingesting the Data

Kafka acts as a buffer for incoming data

To keep incoming data for some retention period and to pre-aggregate or process it further, some sort of distributed commit log can be used. In this case, consumers read data in batches, process it, and store it in Cassandra in the form of pre-aggregates.
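The batch-read, pre-aggregate, store flow can be sketched in a few lines. This is a minimal illustration only, not the article's implementation: a plain Python list stands in for the commit log, a dict stands in for a Cassandra table, and the `Event` fields (`campaign_id`, `timestamp`, `value`), the one-minute bucket, and all function names are assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Hypothetical event shape; the field names are assumptions for illustration.
@dataclass(frozen=True)
class Event:
    campaign_id: str  # grouping key
    timestamp: int    # epoch seconds
    value: int        # quantity to sum

AggregateKey = Tuple[str, int]  # (campaign_id, start of minute bucket)


def read_batches(log: List[Event], batch_size: int) -> Iterator[List[Event]]:
    """Yield consecutive slices of the log, as a consumer polling in batches would."""
    for start in range(0, len(log), batch_size):
        yield log[start:start + batch_size]


def pre_aggregate(batch: List[Event]) -> Dict[AggregateKey, int]:
    """Sum event values per (campaign_id, minute bucket) within one batch."""
    totals: Dict[AggregateKey, int] = defaultdict(int)
    for event in batch:
        minute = event.timestamp - event.timestamp % 60
        totals[(event.campaign_id, minute)] += event.value
    return dict(totals)


def store(table: Dict[AggregateKey, int], aggregates: Dict[AggregateKey, int]) -> None:
    """Merge one batch of pre-aggregates into the table (stand-in for a Cassandra write)."""
    for key, total in aggregates.items():
        table[key] = table.get(key, 0) + total


def run(log: List[Event], batch_size: int) -> Dict[AggregateKey, int]:
    """Consume the whole log in batches and return the resulting pre-aggregate table."""
    table: Dict[AggregateKey, int] = {}
    for batch in read_batches(log, batch_size):
        store(table, pre_aggregate(batch))
    return table


if __name__ == "__main__":
    sample_log = [
        Event("a", 0, 1),
        Event("a", 30, 2),
        Event("b", 61, 5),
        Event("a", 65, 3),
    ]
    print(run(sample_log, batch_size=2))
```

The additive merge in `store` is a simplification: if a batch were redelivered after a failure, its values would be counted twice, so a real pipeline would need some way to make writes idempotent or to track which offsets have already been processed.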

Consuming the data: Spark Streaming

Designing for failure: backups and patching


Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com
