Using Hive in Apache Flink 1.9

Overview — Flink on Hive

Design Architecture

1 Metadata

2 Table Data

Project Progress

  • Apache Flink provides simple Data Definition Language (DDL) statements to read Hive metadata, such as show databases, show tables, and describe tables.
  • Allows using the Catalog API to modify Hive metadata, such as create table and drop table.
  • Allows reading Hive data in both partition tables and non-partition tables.
  • Supports writing Hive data in non-partitioned tables.
  • Supports file formats, such as Text, ORC, Parquet, and SequenceFile
  • Allows to call UDFs created in Hive
  • INSERT and OVERWRITE are not supported.
  • Writing data to the partition table is not supported.
  • ACID tables are not supported.
  • Bucket tables are not supported.
  • Data view is not supported.


1 Add Dependencies

2 Configure HiveCatalog

2.1 SQL Client

- name: myhive
type: hive
hive-conf-dir: /path/to/hive_conf_dir
hive-version: 2.3.4
Flink SQL> show catalogs;
Flink SQL> use catalog myhive;

2.2 Table API

String name = "myhive";
String defaultDatabase = "default";
String hiveConfDir = "/path/to/hive_conf_dir";
String version = "2.3.4";
TableEnvironment tableEnv = ⋯; // create TableEnvironment
HiveCatalog hiveCatalog = new HiveCatalog(name, defaultDatabase,
hiveConfDir, version);
tableEnv.registerCatalog(name, hiveCatalog);

3 Read and Write Hive Tables

3.1 SQL Client

Flink SQL> describe src;
|-- key: STRING
|-- value: STRING
Flink SQL> select * from src; key value
100 val_100
298 val_298
9 val_9
341 val_341
498 val_498
146 val_146
458 val_458
362 val_362
186 val_186
⋯⋯ ⋯⋯
Flink SQL> insert into src values ('newKey','newVal');

3.2 Table API

TableEnvironment tableEnv = ⋯; // create TableEnvironment
tableEnv.registerCatalog("myhive", hiveCatalog);
// set myhive as current catalog
Table src = tableEnv.sqlQuery("select * from src");
// write src into a sink or do further analysis
tableEnv.sqlUpdate("insert into src values ('newKey', 'newVal')");
tableEnv.execute("insert into src");

4 Support for Different Hive Versions

5 Selection of Execution Mode and Planner

# select the implementation responsible for planning table programs
# possible values are 'old' (used by default) or 'blink'
planner: blink
# 'batch' or 'streaming' execution
type: batch
EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

Future Planning

  • Adding support for more data types
  • Adding support for writing data to partition tables, including static and dynamic partitions
  • Adding support for INSERT and OVERWRITE
  • Support for View
  • Providing better support for DDL and DML statements
  • Adding support for TableSink in streaming mode to allow users to conveniently write stream data to Hive
  • Testing and supporting more Hive versions
  • Adding support for Bucket tables
  • Adding support for performance testing and optimization

Original Source:




Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Introducing Vale Server

"My story so far, after taking 16 days Free Web design and development training from Rivasult”

InterSystems Interoperability Contest

When “One For the Price of Two” is Both Good and Desirable!

Meet Our New Partner – Web3Auth

Cloud Native Model Driven Telemetry Stack on OpenShift

High level diagram

My story when started learning rust

Fission: A Deep Dive Into Serverless Kubernetes Frameworks (2)

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:

More from Medium

Apache Kafka for beginners

Kafka Connect: An Easier Way to Connect Messages with Data Stores

How we implemented Pod Logging at NetBook

Apache Airflow: Understanding Operators