How to Connect Tableau to MaxCompute Using HiveServer2 Proxy

What Is Hive?

Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives a SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries (HiveQL) into the underlying Java without the need to implement queries in the low-level Java API. Since most data warehousing applications work with SQL-based querying languages, Hive aids portability of SQL-based applications to Hadoop.

What Is HiveServer2?

HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC.

What Is HiveServer2 Proxy?

HiveServer2 Proxy is a proxy, which is obtained after custom development based on the original HiveServer2. It does the job of accepting the Thrift request submitted by the client, de-serializing it and converting it into a request that MaxCompute can recognize, then submitting it to MaxCompute for processing, and converting the response to a client-recognizable process after MaxCompute has finished processing it. The Thrift response conforms to the Hive interface specification, enabling the Hive ecosystem to interoperate with MaxCompute. In short, its function is to provide a way for these tools to interact with MaxCompute without modifying the Hive ecosystem, so that we can reuse existing Hive tools. MaxCompute’s powerful big data engine.

Deploying HiveServer2 Proxy

Before you install HiveServer2 Proxy, you need to install Java 1.7.0 or Java 1.8.0 in your Linux server. And download the HiveServer2 Proxy for ODPS from here http://repo.aliyun.com/download/apache-hive-2.1.0-odps-proxy.tar.gz.

[root@HS2Proxy apache-hive-2.1.0-odps-proxy]# cd /usr/lib/jvm/jre-1.8.0-openjdk
[root@HS2Proxy jre-1.8.0-openjdk]# export JAVA_HOME=$(pwd)
[root@HS2Proxy jre-1.8.0-openjdk]# echo $JAVA_HOME
/usr/lib/jvm/jre-1.8.0-openjdk
[root@HS2Proxy ~]# cd ~/apache-hive-2.1.0-odps-proxy
[root@HS2Proxy apache-hive-2.1.0-odps-proxy]# export HIVE_HOME=$(pwd)
[root@HS2Proxy apache-hive-2.1.0-odps-proxy]# echo $HIVE_HOME
/root/apache-hive-2.1.0-odps-proxy
[root@HS2Proxy apache-hive-2.1.0-odps-proxy]# export HADOOP_HOME=$(pwd)/hadoop
[root@HS2Proxy apache-hive-2.1.0-odps-proxy]# echo $HADOOP_HOME
/root/apache-hive-2.1.0-odps-proxy/hadoop
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.execution.engine</name>
<value>odps</value>
<description>Hive execution engine, here is odps by default, no need to modify</description>
</property>
<property>
<name>hive.session.impl.classname</name>
<value>org.apache.hive.service.cli.session.HiveSessionOdpsImpl</value>
<description>HiveSession's odps plugin, no need to modify</description>
</property>
<property>
<name>odps.accessid</name>
<value>customeraccessid</value>
<description>Please modify it to your accessid</description>
</property>
<property>
<name>odps.accesskey</name>
<value>customeraccesskey</value>
<description> Please modify it to your accesskey</description>
</property>
<property>
<name>odps.project</name>
<value>odpsdemo</value>
<description>Please change to your default project</description>
</property>
<property>
<name>odps.projects</name>
<value>odpsdemo</value>
<description> Please change it to your project list. If there are multiple, please separate them with a comma. This configuration will take effect in show schemas. </description>
</property>
<property>
<name>odps.endpoint</name>
<value>https://service.odps.aliyun.com/api</value>
<description>ODPS endpoint</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>20000</value>
<description>HiveServer2 Thrift Server starts the service port in binary mode, which can be modified as appropriate to avoid port conflicts.</description>
</property>
</configuration>
[root@HS2Proxy conf]# tail -f /tmp/$USER/hive.log
2019-02-12T11:58:48,891 INFO [main] service.AbstractService: Service:HiveServer2 is started.
2019-02-12T11:58:48,893 INFO [main] server.Server: jetty-7.6.0.v20120127
2019-02-12T11:58:48,958 INFO [main] webapp.WebInfConfiguration: Extract jar:file:/root/apache-hive-2.1.0-odps-proxy/lib/hive-service-2.1.0.jar!/hive-webapps/hiveserver2/ to /tmp/jetty-0.0.0.0-10002-hiveserver2-_-any-/webapp
2019-02-12T11:58:49,068 INFO [Thread-7] thrift.ThriftCLIService: Starting ThriftBinaryCLIService on port 20000 with 5...500 worker threads
2019-02-12T11:58:49,109 INFO [main] handler.ContextHandler: started o.e.j.w.WebAppContext{/,file:/tmp/jetty-0.0.0.0-10002-hiveserver2-_-any-/webapp/},jar:file:/root/apache-hive-2.1.0-odps-proxy/lib/hive-service-2.1.0.jar!/hive-webapps/hiveserver2
2019-02-12T11:58:49,147 INFO [main] handler.ContextHandler: started o.e.j.s.ServletContextHandler{/static,jar:file:/root/apache-hive-2.1.0-odps-proxy/lib/hive-service-2.1.0.jar!/hive-webapps/static}
2019-02-12T11:58:49,148 INFO [main] handler.ContextHandler: started o.e.j.s.ServletContextHandler{/logs,file:/tmp/root/}
2019-02-12T11:58:49,168 INFO [main] server.HiveServer2: Web UI has started on port 10002
2019-02-12T11:58:49,167 INFO [main] server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:10002
2019-02-12T11:58:49,168 INFO [main] http.HttpServer: Started HttpServer[hiveserver2] on port 10002

Connecting Tableau to HiveServer2 Proxy

If you don’t already have Tableau, you can download Tableau for a 14-day trial from https://www.tableau.com/products/desktop/download

Project Demo

I will use Cloudera Hadoop as a connector and use the settings described in the following sections. You can fill in any username and password, because HiveServer2 Proxy do not support authentication at this moment, and the actual authentication is by using accessId and accesskey. Click Sign In and connect to Proxy.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com