MaxCompute and DataWorks Security Management Guide: Examples
The article MaxCompute and DataWorks Security Management Guide: Basics describes the security models of MaxCompute and DataWorks, how the two products relate to each other, and the available security operations. This article provides reference examples for members responsible for security management.
Project Creation Example
With the security models of MaxCompute and DataWorks and the relationship between their permissions in mind, this section uses two basic business requirements to describe how to create and manage a project.
Scenario 1: Collaborative Business Development for ETL Tasks
In a collaborative development scenario, responsibilities and tasks are clearly assigned to members, and the regular development, debugging, and publishing procedures are required. Production data must be strictly controlled.
- A DataWorks project itself can allow multiple members to perform collaborative development work.
- The basic member roles (project administrator, developer, maintainer, deployer, and guest) in DataWorks can basically ensure explicit duty assignment among members.
- The Development and Production projects created in DataWorks can be used to perform regular development, debugging, and publishing and implement strict production data control.
Step 1: Create a project. The main configuration is shown in the following screenshot:
Details about the configurations are described as follows:
- Select “Standard (separation of development and production)” for the project model. The standard model binds one DataWorks project to two MaxCompute projects (a development project and a production project). Development and debugging are performed in the development environment. Tasks in the production environment should be published from the development environment by following the publishing procedure so that they can run smoothly in production.
- In the development environment, select Personal Account for “MaxCompute Access Identity”. A project member should use a personal account to perform task development and debugging in the development environment. This is because each member will use different MaxCompute production resources (tables, resources, and functions) during the process of development and debugging. To prevent excessive permissions from being assigned to a member, each member only applies for permissions that they need (primarily the production table permissions). This also enables easier security audits in the future.
- In the production environment, select Project Owner Account (the account of a project owner) for “MaxCompute Access Identity”. A production environment should ensure production stability and security. Normally, members are not allowed to freely submit tasks, and personal accounts are not given permission to delete or modify tables, because operations that a personal account performs in the command line tool are harder to control. To read a production table, a personal account must apply for read access, which allows for better data security management.
Step 2: Add a project member. Add a RAM user as a project member in DataWorks and assign roles as needed. At the same time, the corresponding MaxCompute development project assigns the matching role to the RAM user. For information about the permissions that each role has, refer to the permission relationship between MaxCompute and DataWorks as described in the Basics article.
- Project administrator: In addition to having full permissions of the developer and the maintainer, the project administrator can also perform operations at the project level, such as adding/removing the project members, granting roles, and creating custom resource groups. Meanwhile, a project administrator also has the “role_project_admin” role for a MaxCompute development project.
- Developer: Designs and maintains workflows on the data development page. Meanwhile, a developer also has the “role_project_dev” role for a MaxCompute development project.
- Maintainer: Monitors the running status of all tasks on the operation and maintenance center page and handles them accordingly. Meanwhile, a maintainer also has the “role_project_pe” role for a MaxCompute development project.
- Deployer: Reviews the task code and decides whether to submit it to the maintainer (only in multi-project mode). Meanwhile, a deployer also has the “role_project_deploy” role for a MaxCompute development project.
- Guest: A guest only has the read-only permission and can view the workflow design and code content on the data development page. Meanwhile, a guest also has the “role_project_guest” role for a MaxCompute development project.
- Security administrator: A security administrator only has permissions of the data security guard module. Meanwhile, a security administrator also has the “role_project_security” role for a MaxCompute development project.
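The role correspondence listed above can be written down as a small lookup table. The following is only an illustrative sketch: the role names come from the list above, while the dictionary and the helper function `maxcompute_role_for` are hypothetical names introduced here and are not part of DataWorks or MaxCompute.

```python
# Mapping taken from the role list above:
# DataWorks member role -> role granted in the MaxCompute development project.
DATAWORKS_TO_MAXCOMPUTE_ROLE = {
    "project administrator": "role_project_admin",
    "developer": "role_project_dev",
    "maintainer": "role_project_pe",
    "deployer": "role_project_deploy",
    "guest": "role_project_guest",
    "security administrator": "role_project_security",
}


def maxcompute_role_for(dataworks_role: str) -> str:
    """Return the MaxCompute role that corresponds to a DataWorks role.

    Hypothetical helper for illustration; the lookup ignores case and
    surrounding whitespace.
    """
    key = dataworks_role.strip().lower()
    if key not in DATAWORKS_TO_MAXCOMPUTE_ROLE:
        raise ValueError(f"Unknown DataWorks role: {dataworks_role!r}")
    return DATAWORKS_TO_MAXCOMPUTE_ROLE[key]
```

A table like this can be handy during an audit, for example to check that the MaxCompute role shown for a member matches the role assigned in DataWorks.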
Step 3: Develop and debug tasks. If a developer is developing and debugging tasks in the DataWorks “Data Development” module (the development project in MaxCompute) and needs to use a table in the production project, the developer can apply for permission on the required table in the DataWorks “Data Management” module.
Step 4: Publish tasks to the production environment. After debugging the tasks, a developer packages them. A maintainer can review the code and then publish the package to the production environment. (The developer needs to notify the maintainer offline.) This process ensures that tasks cannot be freely published to and run in the production environment.
Step 5: Developers test production tasks. After tasks are published to the production environment, developers are advised to test them in the operation and maintenance center to ensure that they run as expected. Even if a task returns a success status, it is still necessary to view the log to check whether the task ran as expected. For further verification, query the result table to confirm that it contains normal output. Generally, this query is run in the development interface. Permissions on output tables in the production environment are not granted to individuals by default, so individuals need to apply for them in the “Data Management” module of DataWorks.
Pay attention to the following after the preceding configuration and operations are performed:
- Multiple members perform development work collaboratively in the DataWorks Data Development module, all members of a project can view task code, and members with the edit permission can edit and modify tasks. Therefore, highly sensitive core code cannot be kept confidential within such a project. Currently, tasks and data that require a high confidentiality level can be developed in separate projects by specific members.
- Because access to MaxCompute in the production environment is performed with the project owner’s account, the owner of the created tables, functions, and resources is always the project owner’s account, and the following cases may appear: “I am not the owner of the table my task created” or “I don’t have permissions to view the table my task created”.
- Because the development project and the production project are owned by the same account, guard against a task that, once published to the production project, reads a production table and writes it into the development project, so that production data can then be obtained through the development project.
Scenario 2: Simple Project Ownership based on Table Creation
In this scenario, each member of a single project can only perform operations on the tables created by themselves. This scenario is common for smaller businesses where member roles are basically consistent and business won’t need scaling. For example, only data acquisition is required, that is, only querying and downloading business data without performing data development (for example, the operations roles need to obtain some data for analysis).
- If a project doesn’t perform data development, then data that needs to be analyzed must be located in other projects. Meanwhile, in order to avoid the resource isolation between different primary accounts, the owner of the project (primary account) must be the same as the owner of the data development production project.
- The project mainly focuses on querying and downloading data. Therefore, each member needs to use his or her own permission to query and download data. When configuring MaxCompute settings for this project, set “MaxCompute Access Identity” to “Personal Account”.
- When “MaxCompute Access Identity” is set to “Personal Account”, each project role in DataWorks is granted the corresponding MaxCompute role permissions. However, the requirement here is that each member can only operate on the tables they created, so these default role permissions must be adjusted accordingly.
Step 1: Create a project. Note that the primary account must be the primary account for the project where data to be analyzed is located. The project configuration is as follows:
Step 2: Create a custom MaxCompute role and grant permission to that role. Use the primary account and perform operations in the console:
create role custom_dev; -- Creates a custom role
grant List, CreateInstance, CreateTable, CreateFunction, CreateResource on project prj_name to role custom_dev; -- Grants permissions to the custom role
Step 3: Set “Allow an object creator to have the default access” for the project in MaxCompute. Use the primary account and perform operations in the console:
set ObjectCreatorHasAccessPermission=true; -- By default, this flag is set to true. To check the current configuration, run: show SecurityConfiguration;
You can also configure this flag under "Project Management" -> "MaxCompute Settings" in DataWorks.
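For repeatable setups, the project-level commands from Steps 2 and 3 can be generated instead of typed by hand. The following is a minimal sketch that only builds the command strings shown above. It does not connect to MaxCompute, and the function name `project_setup_commands` and its parameters are hypothetical.

```python
from typing import List, Optional

# Privileges granted to the custom role in Step 2 of this scenario.
DEFAULT_PRIVILEGES = [
    "List",
    "CreateInstance",
    "CreateTable",
    "CreateFunction",
    "CreateResource",
]


def project_setup_commands(
    project: str,
    role: str = "custom_dev",
    privileges: Optional[List[str]] = None,
) -> List[str]:
    """Build the Step 2 and Step 3 commands as plain strings.

    Hypothetical helper: the output is meant to be reviewed and then run
    by the primary account in the console.
    """
    chosen = list(DEFAULT_PRIVILEGES if privileges is None else privileges)
    if not chosen:
        raise ValueError("At least one privilege is required")
    return [
        f"create role {role};",
        f"grant {', '.join(chosen)} on project {project} to role {role};",
        "set ObjectCreatorHasAccessPermission=true;",
    ]
```

Printing the returned list for `prj_name` gives the three statements above in order, which makes it easier to keep several analysis projects configured the same way.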
Step 4: Add a project member. In DataWorks, add a RAM user as a new member. If a member is added with the “developer” role, the role of that member in the corresponding MaxCompute project is role_project_dev. Use the primary account in the console command line to view the permissions of the RAM user:
show grants for ram$[primary account]:[RAM user];
Step 5: Modify a new member’s MaxCompute permissions. Use the primary account and perform operations in the console:
revoke role_project_dev from ram$[primary account]:[RAM user]; -- Removes the default role from the new member. Note that if the member is re-granted a role on the DataWorks "Member Management" page, the corresponding MaxCompute role is also re-granted to that member.
grant custom_dev to ram$[primary account]:[RAM user]; -- Grants the custom role to the new member
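The per-member commands from Steps 4 and 5 can be sketched the same way. This example assumes that the RAM user is written as `ram$` followed by the primary account, a colon, and the RAM user name, which is how the commands above are read here. The helper names `ram_principal` and `member_role_swap_commands` are hypothetical.

```python
from typing import List


def ram_principal(primary_account: str, ram_user: str) -> str:
    """Format a RAM user as ram$<primary account>:<RAM user> (assumed format)."""
    return f"ram${primary_account.strip()}:{ram_user.strip()}"


def member_role_swap_commands(
    primary_account: str,
    ram_user: str,
    default_role: str = "role_project_dev",
    custom_role: str = "custom_dev",
) -> List[str]:
    """Build the commands that replace a member's default role.

    Hypothetical helper: it only returns strings for the primary account
    to review and run in the console.
    """
    principal = ram_principal(primary_account, ram_user)
    return [
        f"show grants for {principal};",            # inspect current permissions
        f"revoke {default_role} from {principal};",  # remove the default role
        f"grant {custom_role} to {principal};",      # grant the custom role
    ]
```

Because re-granting a role in DataWorks also re-grants the default MaxCompute role, the same three commands may need to be run again after any such change.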
By now proper configuration has been made for this project that requires special permission management. In addition, pay attention to the following points:
- If members of this project are granted the developer role again as previously mentioned, these members will also be re-granted the “role_project_dev” role.
- The preceding project configuration restricts each member to the tables (objects) they created, but it cannot restrict each member to viewing only the tasks they created.
- If members of the project need the permission to query tables, they need to apply for the permission (in the DataWorks “Data Management” page) or you can add the tables in the production project to a package, install that package to the project and grant the permission to members. For more information, refer to the package authorization management section in the Basics article.
Other Common Scenarios
Package Authorization Scenario
In this example, business analysts need to view production tables, but they are not allowed to view production task code. They need access to some of the tables in multiple production projects.
A separate project can be created to allow business analysts to view production tables but not production tasks. We can create a package in each production project, add the shared tables to that package, install the package in the analysis project, and grant the permission on the package to the analysts. This reduces the member management cost by eliminating the need to add analysts to every production project, and ensures that analysts can only view the tables in the package from within the analysis project.
To do so, we can perform the following steps.
Create a package in production projects:
CREATE PACKAGE [package name];
CREATE PACKAGE prj_prod2bi;
Add resources to be shared to the package in production projects:
ADD table [table name] TO PACKAGE [package name];
ADD table adl_test_table TO PACKAGE prj_prod2bi;
The production projects allow the analysis project to use the package:
ALLOW PROJECT [project allowed to install package] TO INSTALL PACKAGE [package name];
ALLOW PROJECT prj_bi TO INSTALL PACKAGE prj_prod2bi;
The analysis project installs the package:
INSTALL PACKAGE [production project name].[package name];
INSTALL PACKAGE prj_prod.prj_prod2bi;
Grant the package permissions in the analysis project:
Grant the permission to a user:
GRANT read on package prj_prod2bi TO USER [cloud account];
Grant the permission to roles:
GRANT read on package prj_prod2bi TO ROLE [rolename];
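The package steps above run in two different places: some in the production project and some in the analysis project. The following sketch groups the generated commands by where they are run. It only builds strings that follow the command forms shown above, and the function name `package_sharing_commands` and its parameters are hypothetical.

```python
from typing import Dict, List


def package_sharing_commands(
    prod_project: str,
    analysis_project: str,
    package: str,
    tables: List[str],
    grantee: str,
    grantee_kind: str = "USER",
) -> Dict[str, List[str]]:
    """Build the package commands, grouped by the project they run in.

    Hypothetical helper for illustration; grantee_kind is USER or ROLE.
    """
    kind = grantee_kind.upper()
    if kind not in ("USER", "ROLE"):
        raise ValueError("grantee_kind must be USER or ROLE")
    if not tables:
        raise ValueError("At least one table must be shared")

    # Run in the production project: create, fill, and publish the package.
    production = [f"CREATE PACKAGE {package};"]
    production += [f"ADD table {t} TO PACKAGE {package};" for t in tables]
    production.append(
        f"ALLOW PROJECT {analysis_project} TO INSTALL PACKAGE {package};"
    )

    # Run in the analysis project: install the package and grant access.
    analysis = [
        f"INSTALL PACKAGE {prod_project}.{package};",
        f"GRANT read on package {package} TO {kind} {grantee};",
    ]
    return {"production": production, "analysis": analysis}
```

Keeping the two command groups separate mirrors the division of duties: the production project decides what is shared, and the analysis project decides who can read it.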
Data Security Self-Check Example
In the initial stage of a project, relatively little attention is given to user and permission management in order to speed up progress. Once the project reaches a stable development stage, data security becomes increasingly important. At this point, you need to perform a data security self-check and then draw up and implement an adjustment plan.
This example provides some data security adjustment ideas by showing the key adjustment aspects that a customer should focus on after performing a data security self-check.
Self-Check Principles and Recommendations
- Count the number of accounts. Count members of a DataWorks project and MaxCompute project users and make sure that each member only has one account for easy accountability and management.
- Count existing accounts and record their permissions. Count deprecated accounts and permissions: for a RAM user that already has roles in a MaxCompute or DataWorks project, revoke the RAM user’s roles in the project and remove the RAM user from the project before deleting the RAM user. Otherwise, the RAM user remains in the project, displayed as “p4_xxxxxxxxxxxxxxxxxxxx”, and cannot be removed from the project (although this doesn’t affect project features). Revoke accounts and permissions that are no longer needed due to changes in duties. It is recommended to delete unused accounts only after verifying that they are not necessary and sending proper notifications. Grant requested permissions when they are required and revoke them when they are no longer needed.
- Survey and analyze personal accounts (you can open a ticket to have metadata pushed for analysis and statistics). Find the queries submitted from personal accounts in the development phase within the last three months (data retrieval and computing tasks, primarily SQL tasks), count the top N users, and select representative accounts to analyze their daily tasks. For example, the members who hold these accounts mainly work on algorithm development projects, and the tasks required for their daily work are mainly SQL tasks. The SQL statements that they run are mainly queries and table writes in the development environment. There are also algorithm tasks and MR tasks, but far fewer of them than SQL tasks. This matches actual development practice: SQL tasks are preferred for processing data where possible. As another example, an account submits a large number of tasks because its AccessKey (AK) is configured in query software through the SDK so that multiple users can run queries with this account. Cases where multiple users share one account should be corrected.
- Count data downloads (you can open a ticket to have metadata pushed for analysis and statistics). Count the data download tasks requested in each project, then analyze and plan which projects may allow downloads.
- Re-plan accounts and assign them properly. Adjustment principle: each member uses his or her own account. Grant proper data access permissions depending on the business development groups and roles that members belong to, and do not allow a member to use another member’s account. Avoid data security risks caused by excessive user permissions. For example, you can assign accounts according to the business groups in the data development process. Business groups may include the management group, the data integration group, the data model group, the algorithm group, the analysis group, the maintenance group, and the security group.
- Data flow control. Limit the export of data in some projects and control some members’ permissions. Data that flows freely among projects makes the data architecture on the cloud platform chaotic and introduces the risk of data leakage. Therefore, data flows are limited for most projects. For example, use MaxCompute to restrict data flows to specified projects or locations to avoid the risk of unauthorized data flows.
- Limit the export of data. Once exported from MaxCompute as files, data is no longer controllable. Therefore, minimize the export of data as files. Assign specific user roles so that only specified business groups have permission to export data. This doesn’t affect daily development work.
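Parts of the self-check above are simple counting once job metadata and user lists have been obtained. The following sketch shows that counting. The record layout (dictionaries with `account` and `task_type` keys) and all function names are assumptions made for illustration, since the actual metadata format is not described in this article.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_submitters(job_records: Iterable[dict], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n accounts that submitted the most tasks."""
    counts = Counter(record["account"] for record in job_records)
    return counts.most_common(n)


def task_mix(job_records: Iterable[dict], account: str) -> Dict[str, int]:
    """Count the tasks of one account by task type (for example SQL or MR)."""
    counts = Counter(
        record["task_type"]
        for record in job_records
        if record["account"] == account
    )
    return dict(counts)


def residual_accounts(user_names: Iterable[str]) -> List[str]:
    """Return users displayed with the p4_ prefix.

    The article notes that RAM users deleted before being removed from
    the project are displayed this way.
    """
    cleaned = (name.strip() for name in user_names)
    return [name for name in cleaned if name.startswith("p4_")]
```

An account that stands out in `top_submitters` with an unusually high task count is a candidate for the shared-account case described above and is worth reviewing manually.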