Using Hive in Apache Flink 1.9

Overview — Flink on Hive

Design Architecture

1 Metadata

2 Table Data

Project Progress

  • Apache Flink provides simple Data Definition Language (DDL) statements to read Hive metadata, such as show databases, show tables, and describe tables.
  • Allows using the Catalog API to modify Hive metadata, such as create table and drop table.
  • Allows reading Hive data from both partitioned and non-partitioned tables.
  • Supports writing Hive data to non-partitioned tables.
  • Supports common file formats, such as Text, ORC, Parquet, and SequenceFile.
  • Allows calling UDFs created in Hive.
  • INSERT OVERWRITE is not supported.
  • Writing data to partitioned tables is not supported.
  • ACID tables are not supported.
  • Bucket tables are not supported.
  • Views are not supported.
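Under these constraints, a typical read/write round trip through the SQL Client looks like the following sketch (the table and partition names here are hypothetical):

```sql
-- Reading works for both partitioned and non-partitioned tables.
SELECT key, value FROM sales_partitioned WHERE pt_day = '2019-08-08';

-- Writing is limited to plain INSERT INTO on non-partitioned tables;
-- INSERT OVERWRITE and partitioned writes are not available in 1.9.
INSERT INTO sales_summary SELECT key, value FROM sales_partitioned;
```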


1 Add Dependencies
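A minimal Maven sketch for this setup (the artifact names and versions are assumptions based on Flink 1.9.0 with Hive 2.3.4; adjust them to your build):

```xml
<!-- Flink's Hive connector (Scala 2.11 build, matching Flink 1.9.0) -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hive_2.11</artifactId>
  <version>1.9.0</version>
  <scope>provided</scope>
</dependency>

<!-- Hive itself; keep the version in line with your Hive installation -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>2.3.4</version>
  <scope>provided</scope>
</dependency>
```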

2 Configure HiveCatalog

2.1 SQL Client

catalogs:
  - name: myhive
    type: hive
    hive-conf-dir: /path/to/hive_conf_dir
    hive-version: 2.3.4

Flink SQL> show catalogs;
Flink SQL> use catalog myhive;

2.2 Table API

String name = "myhive";
String defaultDatabase = "default";
String hiveConfDir = "/path/to/hive_conf_dir";
String version = "2.3.4";

TableEnvironment tableEnv = ⋯; // create TableEnvironment
HiveCatalog hiveCatalog = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
tableEnv.registerCatalog(name, hiveCatalog);

3 Read and Write Hive Tables

3.1 SQL Client

Flink SQL> describe src;
 |-- key: STRING
 |-- value: STRING

Flink SQL> select * from src;

  key      value
  100    val_100
  298    val_298
    9      val_9
  341    val_341
  498    val_498
  146    val_146
  458    val_458
  362    val_362
  186    val_186
  ⋯⋯      ⋯⋯

Flink SQL> insert into src values ('newKey','newVal');

3.2 Table API

TableEnvironment tableEnv = ⋯; // create TableEnvironment
tableEnv.registerCatalog("myhive", hiveCatalog);
// set myhive as current catalog
tableEnv.useCatalog("myhive");

Table src = tableEnv.sqlQuery("select * from src");
// write src into a sink or do further analysis

tableEnv.sqlUpdate("insert into src values ('newKey', 'newVal')");
tableEnv.execute("insert into src");

4 Support for Different Hive Versions
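Flink 1.9 documents official support for Hive 2.3.4 and 1.2.1; other versions may work but are untested. The target version is selected through the same hive-version property used in the catalog configuration above (a sketch, following the SQL Client YAML from section 2.1):

```yaml
# pick the value that matches your Hive deployment
hive-version: 2.3.4   # or 1.2.1
```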

5 Selection of Execution Mode and Planner

execution:
  # select the implementation responsible for planning table programs
  # possible values are 'old' (used by default) or 'blink'
  planner: blink
  # 'batch' or 'streaming' execution
  type: batch

EnvironmentSettings settings = EnvironmentSettings.newInstance()
    .useBlinkPlanner()
    .inBatchMode()
    .build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

Future Planning

  • Adding support for more data types
  • Adding support for writing data to partition tables, including static and dynamic partitions
  • Adding support for INSERT OVERWRITE
  • Adding support for views
  • Providing better support for DDL and DML statements
  • Adding support for TableSink in streaming mode to allow users to conveniently write stream data to Hive
  • Testing and supporting more Hive versions
  • Adding support for Bucket tables
  • Adding support for performance testing and optimization

Original Source:




Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:
