Out-of-the-Box MaxCompute Data Security Solution
MaxCompute is a multi-tenant big data processing platform that supports project-based security configurations to meet tenants’ requirements for data security. Project owners can customize their external account support and authentication models to protect their project data.
Prevent Data from Being Downloaded Locally
Prevent Data Leakage or Local Downloads
The data protection mechanism is also known as project protection. As the project owner, you can enable it from the MaxCompute console to block data from flowing out of the project:
set ProjectProtection=true;
--Sets ProjectProtection to allow data import and prohibit data export.
--The default value of ProjectProtection is false.
If you use DataWorks to analyze data, the analysis results displayed in the IDE can normally be downloaded. To prevent this, choose Project Management > Project Configuration and turn off Select Result Can Be Downloaded.
The Download button on the DataWorks query result page then becomes unavailable.
Data Export Method with Data Protection Enabled
Assume that a user applies for permissions to export a table containing your project data after you set ProjectProtection to true, and you have confirmed that the table does not contain any sensitive data. To meet the user’s service requirement, you can use either of the following data export methods that MaxCompute offers:
Configure an exception policy as a project owner to specify exceptions for protected project data. The configuration is as follows (performed on the MaxCompute console):
SET ProjectProtection=true WITH EXCEPTION <policyFile>
Exception policy-based data protection differs from policy-based authorization, even though they use the same syntax. An exception policy describes an exception for project protection. That is, the ProjectProtection settings do not take effect for access that matches the exception policy.
ProjectProtection controls the direction of data flows but not data access. Data access is a prerequisite for controlling the data flow direction.
In addition, you can run the following statement to verify the exception policy settings:
show grants [for <username>] [on type <objectType>]
Set a trusted project. If project protection is enabled for your project, you can designate the project that receives the exported data as a trusted project. Data that flows from your project to a trusted project is not considered a violation of the ProjectProtection rule. Run the following commands in the MaxCompute console:
list trustedprojects;
--Displays all trusted projects.
add trustedproject <projectname>;
--Adds a trusted project.
remove trustedproject <projectname>;
--Removes a trusted project.
IP Whitelist Control
MaxCompute supports project-based IP whitelist control.
- After an IP whitelist is configured for a project, only IP addresses (console or SDK outbound IP addresses) in the whitelist can be used to access the project.
- An IP whitelist takes effect five minutes after the configuration is completed.
- Add the IP address of your PC to the whitelist to avoid blocking your PC from accessing your project.
IP addresses in an IP whitelist can be in the following formats:
- IP address: for example, 184.108.40.206
- Subnet mask (CIDR notation): for example, 100.116.0.0/16
- Network segment: for example, 220.127.116.11–18.104.22.168
For more information, see IP whitelist control.
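To make the three whitelist formats concrete, here is a minimal Python sketch that checks a client IP address against entries written as a single address, a CIDR block, or a start-end range. It is only a local illustration and is not how MaxCompute evaluates its whitelist. The helper names `parse_entry` and `is_allowed` and the sample addresses are hypothetical, and the sketch assumes IPv4 entries with a plain hyphen between the two ends of a range.

```python
import ipaddress


def parse_entry(entry):
    """Return the (first, last) addresses covered by one whitelist entry."""
    entry = entry.strip()
    if "-" in entry:
        # Network segment written as "start-end".
        start_text, end_text = entry.split("-", 1)
        start = ipaddress.ip_address(start_text.strip())
        end = ipaddress.ip_address(end_text.strip())
        if start > end:
            raise ValueError(f"range start is after range end: {entry}")
        return start, end
    if "/" in entry:
        # Subnet written in CIDR notation.
        network = ipaddress.ip_network(entry, strict=False)
        return network.network_address, network.broadcast_address
    # Single IP address.
    address = ipaddress.ip_address(entry)
    return address, address


def is_allowed(client_ip, whitelist):
    """Return True if client_ip falls inside any whitelist entry."""
    ip = ipaddress.ip_address(client_ip)
    return any(first <= ip <= last for first, last in map(parse_entry, whitelist))


# Hypothetical whitelist that uses all three formats.
sample_whitelist = ["192.0.2.10", "100.116.0.0/16", "198.51.100.1-198.51.100.20"]
```

A check like this can be useful before you apply a whitelist, for example to confirm that the IP address of your own PC is covered so that you do not lock yourself out of the project.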
The policy mechanism of MaxCompute also lets you restrict a user or role to accessing specific resources (such as tables and UDFs) only from specified IP addresses.
For example, a policy can authorize the firstname.lastname@example.org user to access the prj1 project only from the IP range 10.32.180.0/23 and only before 23:59:59 on November 11, 2013 (Beijing time). Under such a policy, the user can perform only the CreateInstance, CreateTable, and List actions and cannot delete any table in the project.
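As an illustration, the policy described here might be written as a JSON policy file along the following lines. This is a sketch built in Python so that the structure is easy to inspect. The key names (`Version`, `Statement`, `Effect`, `Principal`, `Action`, `Resource`, `Condition`, `DateLessThan`, `IpAddress`, `acs:CurrentTime`, `acs:SourceIp`), the `ALIYUN$` principal prefix, the `odps:` action names, and the timestamp format are assumptions based on the general shape of MaxCompute policies, so verify them against the official policy reference before use. The helper `build_policy` is hypothetical.

```python
import json


def build_policy(user, project, source_ip, not_after):
    """Build an access policy: limited actions, from one IP range, until a deadline."""
    return {
        "Version": "1",
        "Statement": [
            {
                # Allow only three actions, and only while both conditions hold.
                "Effect": "Allow",
                "Principal": user,
                "Action": ["odps:CreateInstance", "odps:CreateTable", "odps:List"],
                "Resource": f"acs:odps:*:projects/{project}",
                "Condition": {
                    "DateLessThan": {"acs:CurrentTime": not_after},
                    "IpAddress": {"acs:SourceIp": source_ip},
                },
            },
            {
                # Explicitly deny dropping any table in the project.
                "Effect": "Deny",
                "Principal": user,
                "Action": "odps:Drop",
                "Resource": f"acs:odps:*:projects/{project}/tables/*",
            },
        ],
    }


policy = build_policy(
    user="ALIYUN$firstname.lastname@example.org",
    project="prj1",
    source_ip="10.32.180.0/23",
    not_after="2013-11-11T23:59:59Z",  # assumed format; check time zone handling
)
policy_text = json.dumps(policy, indent=2)
```

The resulting `policy_text` could be saved to a file and applied with the `put policy <policyFile>` command.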
Data Security Guard (Data Masking)
Data Security Guard is a DataWorks data security module that provides functions including data masking and security audit.
Sensitive data displayed on the DataWorks UI is masked using asterisks (*).
Note: The data masking function of Data Security Guard does not take effect for data downloaded using Tunnel commands through the console.
Fine-Grained Permission Control
Column-Based Access Control
The label-based security mechanism (LabelSecurity) of a project is disabled by default and can be enabled by the project owner if needed.
The project table user_profile contains the following sensitive data in 5 of its 100 columns: id_card, credit_card, mobile, user_addr, and birthday. The current DAC mechanism allows all users to perform the Select action on this table. To prevent all users except the administrator from accessing the sensitive data in these five columns, the project owner performs the following configurations:
set label 2 to table user_profile(mobile, user_addr, birthday);
--Sets the sensitivity levels of the mobile, user_addr, and birthday columns of the user_profile table to 2.
set label 3 to table user_profile(id_card, credit_card);
--Sets the sensitivity levels of the id_card and credit_card columns of the user_profile table to 3.
Suppose Alice, a member of this project, needs to access the mobile column of the user_profile table for one week because of a business requirement. As the project owner, you can grant her access as follows:
GRANT LABEL 2 ON TABLE user_profile TO USER alice WITH EXP 7;
--Allows Alice to access data with a sensitivity level of 2 or lower in the user_profile table. The authorization expires after 7 days.
For more information about column-based access control, visit https://www.alibabacloud.com/help/doc-detail/34604.htm.
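The effect of the labels and the time-limited grant in this example can be modeled with a short Python sketch. It is a simplified mental model, not MaxCompute's implementation. It assumes that unlabeled columns and users without a label default to level 0, that a user can read a column when the user's level (or an unexpired grant on the table) is at least the column's level, and that `WITH EXP 7` means the grant expires 7 days after it is issued. The names `LabelGrant`, `grant_label`, and `can_read` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Sensitivity levels from the example; other columns are assumed to be level 0.
COLUMN_LABELS = {
    "mobile": 2, "user_addr": 2, "birthday": 2,
    "id_card": 3, "credit_card": 3,
}


@dataclass
class LabelGrant:
    level: int
    expires_at: datetime


def grant_label(level: int, granted_at: datetime, exp_days: int) -> LabelGrant:
    """Model GRANT LABEL <level> ... WITH EXP <exp_days>."""
    return LabelGrant(level, granted_at + timedelta(days=exp_days))


def can_read(column: str, user_level: int,
             grant: Optional[LabelGrant], now: datetime) -> bool:
    """Return True if the user's level or an unexpired grant covers the column."""
    required = COLUMN_LABELS.get(column, 0)
    if user_level >= required:
        return True
    return grant is not None and grant.level >= required and now < grant.expires_at


granted_at = datetime(2024, 1, 1)          # hypothetical grant date
alice_grant = grant_label(2, granted_at, 7)
```

In this model, Alice can read mobile three days after the grant but not eight days after it, and she can never read id_card, because that column is labeled 3.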
User-Defined Role Management Based on Role Policies
If predefined DataWorks roles such as data developers, O&M engineers, and administrators cannot meet your customization requirements, you can use ACL to create roles such as data analysts and ETL developers to adapt to your service logic. You can use role policies to perform refined role management; for example, grant these roles access permissions to tables with names starting with ods_, grant permissions with conditions, or grant Deny permissions.
- Grant access permissions to one group of objects, such as all functions or tables with names starting with taobao, at one time.
- Authorization based on conditions includes time-based access, access using specified IP addresses, and SQL-based access to a specified table (access through other tasks will be denied).
The commands for reading and configuring policies are as follows:
get policy --Reads the policy of the project.
put policy <policyFile> --Configures (overwrites) a project policy.
get policy on role <roleName> --Reads the policy for a project role.
put policy <policyFile> on role <roleName> --Configures (overwrites) a policy for the project role.
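For illustration, a role policy that grants read access to every table whose name starts with ods_ might be generated as in the Python sketch below. The JSON keys, the `odps:Describe` and `odps:Select` action names, and the wildcard resource syntax are assumptions modeled on the general shape of MaxCompute policies, and the sketch assumes that a role policy omits the Principal field because the role itself is the principal. The helpers `build_role_policy` and `tables_covered` and the table names are hypothetical. `tables_covered` only previews locally which names a prefix pattern would match and does not reproduce the server-side matching rules.

```python
import json
from fnmatch import fnmatchcase


def build_role_policy(project, table_pattern):
    """Build a role policy allowing read access to tables matching a pattern."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["odps:Describe", "odps:Select"],
                "Resource": f"acs:odps:*:projects/{project}/tables/{table_pattern}",
            }
        ],
    }


def tables_covered(table_pattern, table_names):
    """Preview which table names a wildcard pattern would cover."""
    return [name for name in table_names if fnmatchcase(name, table_pattern)]


role_policy = build_role_policy("prj1", "ods_*")
role_policy_text = json.dumps(role_policy, indent=2)

# Hypothetical table names used to preview the pattern.
preview = tables_covered("ods_*", ["ods_orders", "ods_users", "dw_orders"])
```

Saving `role_policy_text` to a file gives a policy file that could be passed to `put policy <policyFile> on role <roleName>`. Keep in mind that this command overwrites the existing policy of the role.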
Alternatively, you can manage user-defined roles in the DataWorks UI. Log on to DataWorks and choose Project Management > MaxCompute Configuration > User-defined Role.
Perform the following operations:
- Click New, enter a role name, and select accounts (sub-account users) to be added to the role.
- Role authorization can be table or project based.
- For table-based role authorization, select target tables and select actions that can be performed for each table.
Note: The two methods (role policy commands and the DataWorks UI) differ in authorization mode. A role policy can grant access permissions to many objects at once, for example, all tables with names starting with taobao_, whereas DataWorks requires you to select each object (table or project) and configure permissions for it individually.
JDBC 2.4 (Enhanced Data Security)
MaxCompute JDBC 2.4 offers enhanced data security. JDBC packages are available at https://github.com/aliyun/aliyun-odps-jdbc/releases.
Procedure of using JDBC to enhance data security:
- Download JDBC 2.4 (recommended).
- Set the JDBC URL, including the tunnel endpoint. A typical URL looks like jdbc:odps:http://service.cn.maxcompute.aliyun-inc.com/api?tunnelEndpoint=http://dt.cn-shanghai.maxcompute.aliyun-inc.com.
- For more information about regions where MaxCompute is available and corresponding tunnel endpoints, visit https://help.aliyun.com/document_detail/34951.html.
- Enable project protection without exception.
- SET ProjectProtection=true
- For more information, see chapter 1 “Prevent data from being downloaded locally.”
- Set an upper limit to the number of data entries returned.
- setproject READ_TABLE_MAX_ROW=1000
- Query data using JDBC. A maximum of 1,000 data entries can be returned.
Note: If you have enabled project protection and query data using a JDBC version earlier than 2.4, an error occurs because no permissions are granted.
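The JDBC URL from the procedure above can be assembled from its parts, as in this small Python sketch. The helper `build_jdbc_url` is hypothetical. The optional `project` query parameter is an assumption about the driver's URL options and should be checked against the aliyun-odps-jdbc documentation. The sketch concatenates the values as-is, matching the unencoded form of the example URL.

```python
def build_jdbc_url(endpoint, tunnel_endpoint, project=None):
    """Assemble a MaxCompute JDBC URL with an explicit tunnel endpoint."""
    params = []
    if project:
        # Assumed optional parameter; not part of the example URL in this article.
        params.append(f"project={project}")
    params.append(f"tunnelEndpoint={tunnel_endpoint}")
    return f"jdbc:odps:{endpoint}?" + "&".join(params)


jdbc_url = build_jdbc_url(
    endpoint="http://service.cn.maxcompute.aliyun-inc.com/api",
    tunnel_endpoint="http://dt.cn-shanghai.maxcompute.aliyun-inc.com",
)
```

Keeping the service endpoint and the tunnel endpoint as separate arguments makes it harder to pair endpoints from different regions by mistake.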
To learn more about Alibaba Cloud MaxCompute, visit www.alibabacloud.com/product/maxcompute