Data Processing with SMACK: Spark, Mesos, Akka, Cassandra, and Kafka

Processing layer: Spark

Almost MapReduce: bringing processing closer to data

Plenty of frameworks, either already available or under active development (such as Hadoop, Cassandra, Kafka, Myriad, Storm and Samza), aim to integrate widely used systems with Mesos's resource management capabilities.
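To illustrate what "integrating with Mesos resource management" involves, here is a simplified, hypothetical model of Mesos's two-level scheduling. Mesos offers an agent's spare CPU and memory to a framework, and the framework's own scheduler decides which offers to use. In keeping with the heading, the sketch prefers the agent that already holds a task's input data. The `Offer` and `Task` classes and the `schedule` function are illustrative assumptions, not the Mesos API. Real frameworks talk to Mesos through its scheduler driver or HTTP API, and the sketch also assumes one offer per agent.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Spare resources advertised by one agent (simplified: one offer per agent)."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A unit of work; data_host is the (hypothetical) agent storing its input data."""
    name: str
    data_host: str
    cpus: float
    mem_mb: int


def schedule(offers, tasks):
    """Place tasks on offered resources, preferring each task's data-local agent.

    Returns (assignments, declined): assignments maps task name -> agent, and
    declined lists agents whose offers went unused. A real framework would decline
    those so Mesos can offer the resources to other frameworks. Tasks that fit
    nowhere stay unassigned and would wait for the next round of offers.
    """
    # Remaining [cpus, mem_mb] per agent, kept in offer order.
    remaining = {o.agent: [o.cpus, o.mem_mb] for o in offers}
    assignments = {}

    def fits(agent, task):
        res = remaining.get(agent)
        return res is not None and res[0] >= task.cpus and res[1] >= task.mem_mb

    def place(agent, task):
        remaining[agent][0] -= task.cpus
        remaining[agent][1] -= task.mem_mb
        assignments[task.name] = agent

    # Pass 1: data-local placement ("bring processing closer to data").
    pending = []
    for task in tasks:
        if fits(task.data_host, task):
            place(task.data_host, task)
        else:
            pending.append(task)

    # Pass 2: fall back to any agent that still has enough spare capacity.
    for task in pending:
        for agent in remaining:
            if fits(agent, task):
                place(agent, task)
                break

    used = set(assignments.values())
    declined = [o.agent for o in offers if o.agent not in used]
    return assignments, declined
```

Trying data-local placement first captures the locality idea. The fallback pass trades locality for keeping tasks from idling when the data-local agent is full.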

Ingesting the data

Kafka acts as a buffer for incoming data

To retain incoming data for a configurable period and make it available for later pre-aggregation or processing, a distributed commit log such as Kafka can be used. Consumers then read the data in batches, process it, and store the results in Cassandra as pre-aggregates.
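Here is a rough sketch of this buffer-then-aggregate pattern. It assumes a single partition and a simple count/sum pre-aggregate per key and per one-minute bucket. The commit log is modelled in memory with time-based retention, and a consumer reads it in batches from its last committed offset. `CommitLog`, `consume_batch`, `new_table`, the bucket size and the aggregate columns are all hypothetical. A real deployment would use a Kafka consumer client and a Cassandra driver instead of these stand-ins.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    offset: int
    timestamp: int  # seconds since epoch
    key: str
    value: float


class CommitLog:
    """In-memory stand-in for one Kafka partition: append-only, addressed by
    offset, with time-based retention. Real Kafka deletes whole log segments;
    dropping individual records here is a simplification."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []
        self.next_offset = 0

    def append(self, timestamp, key, value):
        self.records.append(Record(self.next_offset, timestamp, key, value))
        self.next_offset += 1
        return self.next_offset - 1

    def enforce_retention(self, now):
        # Drop records older than the retention window.
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r.timestamp >= cutoff]

    def read(self, from_offset, max_records):
        """Return up to max_records records starting at from_offset."""
        return [r for r in self.records if r.offset >= from_offset][:max_records]


def new_table():
    """Dict standing in for a Cassandra table with primary key (key, bucket)."""
    return defaultdict(lambda: {"count": 0, "sum": 0.0})


def consume_batch(log, committed_offset, table, batch_size=100, bucket_seconds=60):
    """Read one batch, fold it into per-key, per-time-bucket pre-aggregates,
    and return the offset to resume from on the next call."""
    batch = log.read(committed_offset, batch_size)
    for r in batch:
        bucket = r.timestamp - r.timestamp % bucket_seconds  # start of the bucket
        row = table[(r.key, bucket)]
        row["count"] += 1
        row["sum"] += r.value
    return batch[-1].offset + 1 if batch else committed_offset
```

The consumer tracks its own offset. If it crashes after updating the table but before saving the new offset, it will reprocess that batch on restart. With additive count/sum aggregates like these, that double-counts, which is one reason to make pre-aggregate writes idempotent in practice (for example, by recomputing a whole bucket and overwriting its row).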

Consuming the data: Spark Streaming

Designing for failure: backups and patching



