By Geng Jiangtao.
Spark on MaxCompute differs from Spark's native architecture. In this post, we'll show you how to set up Spark on MaxCompute in three environments: on an Alibaba Cloud ECS server, in DataWorks, and in a local IDEA test environment.
The underlying architecture built for Spark on Alibaba Cloud differs from the native one, but it offers full native support. Consider the two diagrams below.
The diagrams above show how Spark works both in its native architecture and on Alibaba Cloud. The diagram on the left shows the native Spark architecture; the one on the right shows the architecture used for Spark on MaxCompute, a solution that runs on the Cupid platform. This architecture allows MaxCompute to provide Spark computing services, making the Spark computing framework available on a unified system of computing resources and dataset permissions.
Setting up Spark on an Alibaba Cloud ECS Server
First, download the MaxCompute Spark client package and upload it to your ECS server. Then decompress the file:
tar -zxvf spark-2.3.0-odps0.30.0.tar.gz
Find spark-defaults.conf in the conf folder of the decompressed package and configure it:
spark.hadoop.odps.access.key =   # The other settings can generally be left at their default values.
spark.hadoop.odps.end.point = http://service.cn.maxcompute.aliyun.com/api
spark.hadoop.odps.runtime.end.point = http://service.cn.maxcompute.aliyun-inc.com/api
spark.hadoop.odps.task.major.version = cupid_v2
spark.hadoop.odps.cupid.container.image.enable = true
spark.hadoop.odps.cupid.container.vm.engine.type = hyper
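After editing spark-defaults.conf, an optional sanity check is to print one of the settings from a Spark shell (bin/spark-shell) to confirm the file was picked up. This is just a sketch; the property key matches the entries above, and the spark session is the one the shell provides.

// Optional sanity check from bin/spark-shell: print the ODPS endpoint
// that jobs will actually use. Throws if the key was not loaded.
println(spark.sparkContext.getConf.get("spark.hadoop.odps.end.point"))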
Next, download the corresponding sample code from GitHub. Then upload it to your ECS server and decompress it.
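For orientation, the sample repository includes a SparkPi example, which is the class we'll submit later. Below is a minimal sketch of what such an entry point looks like; the actual code in the repository may differ in its details.

import org.apache.spark.sql.SparkSession
import scala.math.random

// A minimal SparkPi-style entry point. Spark on MaxCompute injects the
// master and endpoint settings at submit time, so the builder stays generic.
object SparkPi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkPi").getOrCreate()
    val slices = if (args.nonEmpty) args(0).toInt else 2
    val n = 100000 * slices
    // Estimate pi by sampling random points in the unit square.
    val count = spark.sparkContext.parallelize(1 to n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / n}")
    spark.stop()
  }
}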
Once you have the code, package it into a JAR file with Maven. Before you do this, ensure that Maven is installed.
mvn clean package
Now check that the build was successful by submitting and running the JAR package:
bin/spark-submit --master yarn-cluster --class com.aliyun.odps.spark.examples.SparkPi \
  /path/to/your/spark-examples.jar   # placeholder: path to the JAR built above
Setting up Spark in DataWorks
Log on to the DataWorks console and click Business Flow.
Open a business flow and create an ODPS Spark node.
Upload the JAR package as a resource: select the JAR package to be uploaded, then submit it.
Configure the corresponding ODPS Spark node, then save and submit the configuration. Click Run to view the job's running status.
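For reference, here is a hypothetical example of the kind of job such a node can run. It queries a MaxCompute table through Spark SQL, assuming your session is configured for the ODPS catalog; the class name OdpsTableDemo and the table name my_table are placeholders, not names from the sample repository.

import org.apache.spark.sql.SparkSession

// Hypothetical job body for an ODPS Spark node. Replace "my_table" with a
// table that exists in your MaxCompute project.
object OdpsTableDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("OdpsTableDemo").getOrCreate()
    val df = spark.sql("SELECT * FROM my_table LIMIT 10")
    df.show()
    spark.stop()
  }
}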
Setting up Spark in a Local IDEA Test Environment
Download and decompress the client and the template code.
You'll also need the template code, which you can find on GitHub. Once you have it, open IDEA and choose File and then Open… to open the template project.
Install the Scala plugin.
Finally, configure the JDK and related dependencies.
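Once the JDK and dependencies are in place, you can verify the local setup with a small smoke test run directly from IDEA. This is a minimal sketch using Spark's local mode; it runs entirely on your machine and does not contact MaxCompute.

import org.apache.spark.sql.SparkSession

// Minimal local-mode smoke test for the IDEA environment. The local[4]
// master runs Spark with four worker threads in the current JVM.
object LocalSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LocalSmokeTest")
      .master("local[4]")
      .getOrCreate()
    val sum = spark.sparkContext.parallelize(1 to 100).reduce(_ + _)
    println(s"Sum of 1..100 = $sum") // expect 5050
    spark.stop()
  }
}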