Lyft’s Large-scale Flink-based Near Real-time Data Analytics Platform

1) Streaming Data Scenarios at Lyft

About Lyft

Streaming Data Scenarios at Lyft

Lyft’s Previous Data Analytics Platform and Architecture

Problems with the Previous Platform

2) Near Real-time Data Analytics Platform and Architecture

Architecture of the Near Real-time Platform

Platform Design

Platform Features and Applications

Flink-based Near Real-time Data Persistence

Multi-tier Compaction and Deduplication During ETL

3) In-depth Analysis of Platform Performance and Fault Tolerance

EventTime-driven Partition Sensing

Challenges in Schema Evolution

Deep Dive into AWS S3

Optimization Solution for Parquet

Data Backfilling-based Fault Tolerance for the Platform

4) Summary and Future Outlook

Experience and Lessons Learned

Future Outlook

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com
