Data Processing with SMACK: Spark, Mesos, Akka, Cassandra, and Kafka

Processing layer: Spark

Almost MapReduce: bringing processing closer to data

There are plenty of frameworks already available or under active development (such as Hadoop, Cassandra, Kafka, Myriad, Storm and Samza) which are targeted to integrate widely used systems with Mesos resource management capabilities.

Ingesting the Data

Kafka acts as a buffer for incoming data

For keeping incoming data with some retention and its further pre-aggregation/processing, some sort of distributed commit log could be used. In this case, consumers will read data in batches, process it and store it into Cassandra in form of pre-aggregates.

Consuming the data: Spark Streaming

Designing for failure: backups and patching

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store