MaxCompute and DataWorks Security Management Guide: Examples

Project Creation Example

With the security models of MaxCompute and DataWorks and the relationship between permissions in the two products already covered, this article uses two basic business requirements to show how to create and manage a project.

Scenario 1: Collaborative Business Development for ETL Tasks

In a collaborative development scenario, responsibilities and tasks are clearly assigned to members, and a regular development, debugging, and publishing procedure is required. Production data must be strictly controlled.

  • A DataWorks project allows multiple members to work on development collaboratively.
  • The basic member roles in DataWorks (project administrator, developer, maintainer, deployer, and guest) are enough to assign clear duties among members.
  • The development and production projects created in DataWorks support the regular development, debugging, and publishing procedure and enforce strict control of production data.

Implementation Steps

Step 1: Create a project. The main configuration is as follows:

  • Select “Standard (separation of development and production)” for the project model. The standard model binds one DataWorks project to two MaxCompute projects (a development project and a production project). Development and debugging are performed in the development environment. Tasks reach the production environment only by being published from the development environment through the publishing procedure, which helps ensure that they run smoothly in production.
  • In the development environment, select Personal Account for “MaxCompute Access Identity”. Project members should develop and debug tasks under their personal accounts, because each member uses different MaxCompute production resources (tables, resources, and functions) during development and debugging. To avoid granting excessive permissions, each member applies only for the permissions they need (primarily permissions on production tables). This also makes later security audits easier.
  • In the production environment, select Project Owner Account (the account of the project owner) for “MaxCompute Access Identity”. The production environment must remain stable and secure. Normally, members are not allowed to submit tasks freely, and personal accounts are not given permission to delete or modify tables, because operations performed through the command line tool are harder to control. A personal account that needs to read a production table must apply for read access, which keeps data access under security management.
  • Project administrator: In addition to all the permissions of the developer and the maintainer, the project administrator can perform project-level operations, such as adding or removing project members, granting roles, and creating custom resource groups. A project administrator also has the “role_project_admin” role for the MaxCompute development project.
  • Developer: Designs and maintains workflows on the data development page. A developer also has the “role_project_dev” role for the MaxCompute development project.
  • Maintainer: Monitors the running status of all tasks on the operation and maintenance center page and handles issues accordingly. A maintainer also has the “role_project_pe” role for the MaxCompute development project.
  • Deployer: Reviews the task code and decides whether to submit it to the maintainer (only in multi-project mode). A deployer also has the “role_project_deploy” role for the MaxCompute development project.
  • Guest: Has read-only permission and can view the workflow design and code content on the data development page. A guest also has the “role_project_guest” role for the MaxCompute development project.
  • Security administrator: Only has permissions in the data security guard module. A security administrator also has the “role_project_security” role for the MaxCompute development project.
  • Multiple members work collaboratively in the DataWorks Data Development module. All members of a project can view task code, and members with the edit permission can edit and modify tasks. Therefore, highly sensitive core code cannot be kept confidential inside a shared project. Currently, tasks and data that require a high confidentiality level can be developed in separate projects by specific members.
  • Because access to MaxCompute in the production environment goes through the project owner’s account, the owner of the created tables, functions, and resources is always the project owner’s account. The following cases may therefore appear: “I am not the owner of the table my task created” or “I don’t have permissions to view the table my task created”.
  • Because the development project and the production project are owned by the same account, watch for the following risk: a task published to the production project can read production tables and write them into the development project, and production data can then be obtained through the development project.
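
The correspondence between DataWorks member roles and MaxCompute roles listed above can be captured as a small lookup table, which is handy when checking which MaxCompute role a member should hold in the development project. The following Python sketch is illustrative only: the role names come from the list above, while the function name `maxcompute_roles_for` and the lowercase dictionary keys are assumptions made for this example.

```python
# DataWorks member role -> MaxCompute role in the development project,
# as listed in this section. Keys are lowercased for lookup (an assumption).
DATAWORKS_TO_MAXCOMPUTE_ROLE = {
    "project administrator": "role_project_admin",
    "developer": "role_project_dev",
    "maintainer": "role_project_pe",
    "deployer": "role_project_deploy",
    "guest": "role_project_guest",
    "security administrator": "role_project_security",
}


def maxcompute_roles_for(dataworks_roles):
    """Return the sorted MaxCompute roles implied by a member's DataWorks roles."""
    unknown = [
        role for role in dataworks_roles
        if role.lower() not in DATAWORKS_TO_MAXCOMPUTE_ROLE
    ]
    if unknown:
        raise ValueError(f"unknown DataWorks role(s): {unknown}")
    # A set removes duplicates when the same role is listed twice.
    return sorted({DATAWORKS_TO_MAXCOMPUTE_ROLE[role.lower()] for role in dataworks_roles})
```

For example, `maxcompute_roles_for(["Developer", "Maintainer"])` returns `['role_project_dev', 'role_project_pe']`.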

Scenario 2: Simple Project Ownership based on Table Creation

In this scenario, each member of a single project can only perform operations on the tables they created. This scenario is common for smaller businesses where member roles are largely the same and the business does not need to scale. For example, only data acquisition is required, that is, querying and downloading business data without performing data development (for instance, operations staff who need to obtain some data for analysis).

  • If a project doesn’t perform data development, the data to be analyzed must be located in other projects. To avoid the resource isolation that exists between different primary accounts, the owner of the project (primary account) must be the same as the owner of the data development production project.
  • The project mainly focuses on querying and downloading data, so each member needs to query and download data with his or her own permissions. When configuring MaxCompute settings for this project, set “MaxCompute Access Identity” to “Personal Account”.
  • When “MaxCompute Access Identity” is set to “Personal Account”, each project role in DataWorks is granted the corresponding MaxCompute role permissions. However, the requirement here is that each member can only operate on the tables they created, so these default role permissions must be handled carefully.

Implementation Steps

Step 1: Create a project. Note that the project must be created under the same primary account that owns the project where the data to be analyzed is located. The project configuration is as follows:

create role custom_dev; -- Creates a custom role
grant List, CreateInstance, CreateTable, CreateFunction, CreateResource on project prj_name to role custom_dev; -- Grants permissions to the custom role
set ObjectCreatorHasAccessPermission=true; -- This flag is true by default. Run the next command to check the configuration
show SecurityConfiguration;
-- You can also configure this flag under "Project Management" -> "MaxCompute Settings" in DataWorks.
show grants for ram$[primary account]:[RAM user]; -- Views the current permissions of a new member
revoke role_project_dev from ram$[primary account]:[RAM user]; -- Removes the default role from a new member
-- Note: if a member is re-granted a role on the DataWorks "Member Management" page, the corresponding MaxCompute role is also re-granted to that member.
grant custom_dev to ram$[primary account]:[RAM user]; -- Grants the custom role to a new member
  • If members of this project are granted the developer role again in DataWorks, they are also re-granted the “role_project_dev” role in MaxCompute.
  • The preceding configuration restricts each member to viewing only the tables (objects) they created, but it does not restrict members to viewing only the tasks they created.
  • If members of the project need permission to query tables, they can apply for it (on the DataWorks “Data Management” page), or you can add the tables in the production project to a package, install that package in this project, and grant the permission to members. For more information, refer to the package authorization management section in the Basics article.
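
When several members need to be moved from the default role to the custom role, the revoke and grant statements shown above can be generated rather than typed by hand. The Python sketch below only builds the statement text and does not connect to MaxCompute. The helper names are assumptions made for this example, and the account names used in the usage note are hypothetical.

```python
def member_principal(primary_account, ram_user):
    """Format a RAM user the way the commands in this section address it."""
    return f"ram${primary_account}:{ram_user}"


def custom_role_statements(primary_account, ram_users,
                           default_role="role_project_dev",
                           custom_role="custom_dev"):
    """Build the revoke/grant statements that move members to the custom role.

    The default arguments use the role names from this section.
    """
    statements = []
    for ram_user in ram_users:
        principal = member_principal(primary_account, ram_user)
        # Remove the role that DataWorks granted automatically ...
        statements.append(f"revoke {default_role} from {principal};")
        # ... then grant the narrower custom role instead.
        statements.append(f"grant {custom_role} to {principal};")
    return statements
```

Calling `custom_role_statements("main_account", ["alice"])` (both names hypothetical) yields `revoke role_project_dev from ram$main_account:alice;` followed by `grant custom_dev to ram$main_account:alice;`. Because re-granting the developer role in DataWorks restores “role_project_dev”, the generated statements may need to be rerun after membership changes.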

Other Common Scenarios

Package Authorization Scenario

In this example, business analysts need to view production tables, but they are not allowed to view production task code. The analysts need access to some of the tables in multiple production projects.

Create a package in the production project:
CREATE PACKAGE [package name];
Example:
CREATE PACKAGE prj_prod2bi;
Add the tables to be shared to the package:
ADD TABLE [table name] TO PACKAGE [package name];
Example:
ADD TABLE adl_test_table TO PACKAGE prj_prod2bi;
Allow the analysis project to install the package:
ALLOW PROJECT [project allowed to install package] TO INSTALL PACKAGE [package name];
Example:
ALLOW PROJECT PRJ_BI TO INSTALL PACKAGE prj_prod2bi;
Install the package in the analysis project:
INSTALL PACKAGE [project name].[package name];
Example:
INSTALL PACKAGE prj_prod.prj_prod2bi;
Grant the permission to users:
GRANT Read ON PACKAGE prj_prod.prj_prod2bi TO USER [cloud account];
Grant the permission to roles:
GRANT Read ON PACKAGE prj_prod.prj_prod2bi TO ROLE [rolename];
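
Because analysts need tables from several production projects, the same sequence of package statements is repeated once per production project. The following Python sketch generates that sequence as text, split by the project in which each statement is run. It assumes the statement templates given in this section and qualifies the package with its source project in the GRANT statement, which is an assumption about the expected form. The function name and the role name `bi_analyst` in the usage note are hypothetical, and nothing is executed against MaxCompute.

```python
def package_statements(prod_project, package, tables, consumer_project, grantee_roles):
    """Return (producer_statements, consumer_statements) for sharing tables via a package.

    producer_statements are run in the production project that owns the tables;
    consumer_statements are run in the project that installs the package.
    """
    producer = [f"CREATE PACKAGE {package};"]
    producer += [f"ADD TABLE {table} TO PACKAGE {package};" for table in tables]
    producer.append(f"ALLOW PROJECT {consumer_project} TO INSTALL PACKAGE {package};")

    consumer = [f"INSTALL PACKAGE {prod_project}.{package};"]
    consumer += [
        f"GRANT Read ON PACKAGE {prod_project}.{package} TO ROLE {role};"
        for role in grantee_roles
    ]
    return producer, consumer
```

For instance, `package_statements("prj_prod", "prj_prod2bi", ["adl_test_table"], "PRJ_BI", ["bi_analyst"])` reproduces the example statements of this section, with the grant issued to a role. Granting to a role rather than to individual users keeps the analysts’ access in one place when members change.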

Data Security Self-Check Example

In the initial stage of a project, user and permission management often receives little attention so that the project can move quickly. Once the project reaches a stable stage, data security becomes increasingly important. At that point, run a data security self-check, then draw up and carry out a plan based on the findings.

  • Count the number of accounts. Count members of a DataWorks project and MaxCompute project users and make sure that each member only has one account for easy accountability and management.
  • Take inventory of accounts and record their permissions. Identify deprecated accounts and permissions: for a RAM user that already has roles in a MaxCompute or DataWorks project, revoke the RAM user’s roles in the project and remove the RAM user from the project before deleting the RAM user. Otherwise, the RAM user remains in the project as a residual entry, displayed as “p4_xxxxxxxxxxxxxxxxxxxx”, and cannot be removed from the project (although this doesn’t affect project features). Revoke accounts and permissions that have become obsolete because of changes in duties. It is recommended to delete unused accounts only after verifying that they are no longer needed and sending proper notifications. Grant permissions when they are requested and needed, and revoke them when they are no longer needed.
  • Survey and analyze personal accounts (you can open a ticket to have metadata pushed for analysis and statistics). Find the tasks submitted from personal accounts in the development phase over the last three months (data retrieval and computing tasks, primarily SQL tasks), count the top N users, and select representative accounts to analyze their daily tasks. For example, the members behind some accounts mainly work on algorithm development, and their daily tasks are mainly SQL tasks: queries in the development environment and table write operations. There are also algorithm tasks and MR tasks, but far fewer than SQL tasks. This matches actual development practice: SQL is preferred for processing data wherever possible. As another example, an account may submit a large number of tasks because its AccessKey (AK) is configured in query software through the SDK so that multiple users run queries through this one account. Cases where multiple users share one account should be corrected.
  • Count data downloads (you can open a ticket to have metadata pushed for analysis and statistics). Count the data download requests in each project, then analyze and plan which projects allow downloads.
  • Plan accounts and assign them properly. Adjustment principle: each member uses his or her own account. Grant data access permissions according to the business development groups and roles that members belong to, and do not allow a member to use another member’s account. This avoids the data security risk caused by excessive user permissions. For example, you can assign accounts according to the business groups in the data development process. Business groups may include the management group, the data integration group, the data model group, the algorithm group, the analysis group, the maintenance group, and the security group.
  • Data flow control. Limit the export of data in some projects and control some members’ permissions. Unrestricted data flows among projects lead to a chaotic data architecture on the cloud platform and increase the risk of data leakage. Therefore, data flows are limited for most projects. For example, use MaxCompute to limit data flowing to specified projects or locations to avoid the risk of unauthorized data flows.
  • Limit the export of data. Once exported from MaxCompute as files, data is no longer controllable. Therefore, reduce the export of data as files as much as possible. Use specific user roles so that only designated business groups have permission to export data. This doesn’t affect daily development work.
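
Parts of this self-check can be scripted once the user lists and task metadata have been exported. The Python sketch below is a minimal illustration. It assumes the DataWorks member list and the MaxCompute user list are available as plain lists of account names and that each task record is a dictionary with an `account` key. These input shapes and the function names are assumptions, not a MaxCompute or DataWorks API.

```python
from collections import Counter


def reconcile_accounts(dataworks_members, maxcompute_users):
    """Compare the two user lists and flag accounts that need review."""
    dw, mc = set(dataworks_members), set(maxcompute_users)
    return {
        # Residual entries left behind by RAM users deleted before removal.
        "residual": sorted(u for u in mc if u.startswith("p4_")),
        # MaxCompute users that are not DataWorks members (residuals excluded).
        "maxcompute_only": sorted(u for u in mc - dw if not u.startswith("p4_")),
        # DataWorks members that are missing from the MaxCompute project.
        "dataworks_only": sorted(dw - mc),
    }


def top_submitters(task_records, n=3):
    """Return the n accounts that submitted the most tasks, with their counts."""
    counts = Counter(record["account"] for record in task_records)
    return counts.most_common(n)
```

Accounts that appear at the top of `top_submitters` with unusually high counts are candidates for the shared-account check described above, and the `residual` and `maxcompute_only` lists feed the account inventory.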
