AliORC: A Combination of MaxCompute and Apache ORC

About the Author

Introduction to the Apache ORC Project and Alibaba’s Contribution

Apache ORC Project

ORC Adopters

Timeline

Contribution from Alibaba

Why Did Alibaba Cloud MaxCompute Choose ORC?

Row-Based vs. Column-Based Storage

A Quick Look at ORC

How About Apache Parquet?

Benchmark: ORC vs. Parquet

Datasets

Storage Cost

Full Table Scan

AliORC = Alibaba ORC

What Is the Difference Between AliORC and Open-Source ORC?

AliORC Is More than Apache ORC

AliORC Optimization #1: Async Prefetch

AliORC Optimization #2: Small I/O Elimination

AliORC Optimization #3: Memory Management for Streams in Each Column

AliORC Optimization #4: Seek Read

AliORC Optimization #5: Adapting Dictionary Encoding

AliORC Optimization #6: Range Alignment for Range Partition

Value of AliORC for End Users

Advantages of Alibaba Cloud MaxCompute Compared with Similar Products

Why Did I Join the MaxCompute Team?

How Did I Get Started in Big Data Technology?

Working Experience at Alibaba U.S. Office

How Did I Become the First Chinese ORC PMC Member?

Concluding Remarks

Original Source
