Data Security Best Practices for MaxCompute

By Alibaba Cloud MaxCompute

As an enterprise-class cloud data warehousing solution built on a SaaS-based model, Alibaba Cloud MaxCompute is designed to protect business continuity and data security for its customers. MaxCompute recently upgraded its security capabilities. This article describes best practices, based on the native and integrated security capabilities of MaxCompute and DataWorks, for typical data risk scenarios throughout the data lifecycle: data misuse, abuse, breaches, and loss.

What Is MaxCompute?

MaxCompute draws on Alibaba Cloud’s large-scale computing and storage resources and provides a fully managed online data warehousing service through a serverless architecture. It removes the limits on resource scalability and elasticity that are common on traditional data platforms and minimizes investment in operations and maintenance (O&M).

MaxCompute supports a wide range of classic computing models, such as batch processing, machine learning, and interactive analytics, and offers comprehensive enterprise management functionality. It allows you to easily integrate and manage enterprise data assets and streamlines the data platform architecture so that you can extract value from data faster.

MaxCompute Upgrades Enterprise-level Security

  • Fine-grained authorization
  • Data encryption (Bring Your Own Key [BYOK])
  • Data masking (Data Security Guard)
  • Continuous backup and recovery
  • Cross-region disaster backup and recovery
  • Real-time audit logs

MaxCompute’s Security System

Figure 1: Security system of a big data platform

1) Infrastructure security and platform trustworthiness. This level ensures the physical safety and network security of data centers. Countermeasures against risks at this level primarily include enhancing data center security facilities, security management, and network security.
2) System security of big data platforms. Countermeasures against risks at this level primarily include building subsystems for access control, security isolation, risk control and audit, and data protection, and providing underlying platform capabilities for upper-level security applications and tools.
3) Security of data applications. Countermeasures against risks at this level primarily include providing users with tool-based data security products, optimizing the user experience, and helping users better cope with data risks.

The recent upgrade of MaxCompute’s security capabilities has introduced new features to access control, risk control and auditing, data protection, and other subsystems, as highlighted in yellow in the “Big data platform security” layer in Figure 1. In this article, we will introduce the best practices for major types of data risks, as shown in Figure 2. We will explain when, why, and how to use these new features in these best practices.

Figure 2: Major types of data risks

Countermeasures against Data Misuse

MaxCompute Provides Basic Metadata

Data Maps As a Data Management Tool

Figure 3: Understand data using a data map

Countermeasures against Data Abuse

  • Graded data management: categorize and grade data for management based on LabelSecurity of MaxCompute.
  • Authorization approval process: implement the least privilege principle for requests to access or use data based on MaxCompute’s column-level permission management.
  • Regular audit: analyze permission requests, approvals, and usage to perform pre-event approval and post-event auditing.
  • Timely cleaning: clean expired permissions in a timely manner to reduce data risks.

Used together with DataWorks or other GUI-based tools, MaxCompute’s fine-grained permission system lets you apply the principle of least privilege and reduce the risk of data abuse.
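The "timely cleaning" practice listed above can be pictured as a periodic pass over granted permissions. The following is a minimal sketch only: the `Grant` record, its field names, and the sample data are hypothetical and do not reflect MaxCompute's actual metadata or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    """One granted permission; all field names are hypothetical."""
    user: str
    action: str        # for example "select"
    obj: str           # for example "sales.orders.amount"
    expires_on: date   # last day on which the grant is valid


def split_expired(grants, today):
    """Return (active, expired) lists, preserving input order."""
    active = [g for g in grants if g.expires_on >= today]
    expired = [g for g in grants if g.expires_on < today]
    return active, expired


grants = [
    Grant("alice", "select", "sales.orders.amount", date(2024, 1, 31)),
    Grant("bob", "select", "sales.orders.amount", date(2024, 6, 30)),
]
active, expired = split_expired(grants, today=date(2024, 3, 1))
print([g.user for g in expired])  # users whose grants should be revoked
```

In practice the expired list would feed a revocation step and an audit record, so that cleanup itself remains traceable.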

(New) MaxCompute’s fine-grained permission system offers refined permission management

Regardless of the access control mechanism you choose, three elements remain the same during authorization and authentication: action, object, and subject, as shown in Figure 4.

This upgrade also extends the permission model to support finer-grained authorization and authentication for refined permission management. The main new features include:

  • Column-level permission management for ACLs to support conditions and authorization validity periods
  • Refined in-package resource permission management to support column-level permission management
  • Independent permission management for data downloads in batch download scenarios that have higher risks
  • Graded authorization management for administrator roles, with a built-in super administrator role to share the management workload of project owners
  • Improved RBAC, with LabelSecurity for roles supported
  • Enhanced permission management capabilities for applications
Figure 4: MaxCompute’s fine-grained permission system

Fine-grained permission management capabilities in this release are highlighted in orange.
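To make the subject, action, and object elements concrete, here is a minimal sketch of a column-level authorization check. It is an illustration under stated assumptions, not MaxCompute's implementation: the `Rule` type, its fields, and the sample rules are hypothetical. Treating `download` as its own action mirrors the idea of managing download permissions independently.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """An allow rule; the field names are hypothetical."""
    subject: str            # user or role
    action: str             # for example "select" or "download"
    table: str
    column: Optional[str]   # None means the rule covers every column


def is_allowed(rules, subject, action, table, column):
    """Return True if any rule matches the subject, action, and object."""
    for r in rules:
        if (r.subject, r.action, r.table) != (subject, action, table):
            continue
        if r.column is None or r.column == column:
            return True
    return False


rules = [
    Rule("analyst", "select", "orders", "amount"),
    Rule("admin", "select", "orders", None),
]
print(is_allowed(rules, "analyst", "select", "orders", "amount"))    # True
print(is_allowed(rules, "analyst", "select", "orders", "customer"))  # False
print(is_allowed(rules, "analyst", "download", "orders", "amount"))  # False
```

The check denies by default: a request succeeds only when a rule matches all three elements, which is the behavior least privilege relies on.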

GUI-based Permission Management with Security Center

Figure 5: GUI-based permission management with Security Center

The Security Center provides convenient permission management and visualized request and approval processes, in addition to permission auditing and management capabilities.

  • Self-help permission requests: You can select desired data tables or fields and quickly request permissions for them online.
  • Permission audit and revocation: Administrators can view, audit, and manage the data permissions granted to users. Users can also take the initiative to request the revocation of permissions that are no longer needed.
  • Permission approval management: The Security Center adopts an online approval and authorization mode that is visualized and process-based and supports post-event traceability of approval processes.

Countermeasures against Data Breaches

Data Lifecycle

Figure 6: Data lifecycle

Data breaches may occur in multiple stages of the data lifecycle, such as data transmission, storage, processing, and exchange. Therefore, we introduce the best practices to defend against data breaches at different stages of the data lifecycle.

First, data is collected from different sources and transferred to the big data platform through various channels. On the platform, data may be processed and then written to disk for storage, transferred between tenants and services through a data sharing mechanism, or, after a certain period of time, deleted and destroyed. Processed data is consumed by other data applications or users through different channels. (See Figure 7.)

Figure 7: Data lifecycle on a big data platform

(New) Countermeasure Against Data Breach Risks in Storage — Data Encryption (Storage Encryption)

In this upgrade, MaxCompute released a storage encryption feature that encrypts data stored on disk.

  • MaxCompute integrates with Key Management Service (KMS) to safeguard keys. KMS supports service-based keys and user-defined keys (BYOK).
  • You can enable the storage encryption feature when you create a MaxCompute project. If you are already using MaxCompute services, you can open a ticket to apply to enable this feature.
  • Encryption algorithms such as AES-256 and cryptographic algorithms recommended by Chinese authorities are supported.
  • This data encryption process is transparent to users, and no additional changes are introduced to any type of task.

Countermeasure against Data Breaches in Data Processing — MaxCompute’s Security Isolation Capability

MaxCompute creates an independent, isolated environment for executing data processing applications. It supports all user-defined function (UDF) types, including Java and Python UDFs, as well as open-source third-party computing engines such as Spark, Flink, and TensorFlow, which enables diversified data processing.

Figure 8: MaxCompute’s security isolation capabilities

Countermeasure against Data Breach Risks in Data Exchanges or Sharing — MaxCompute’s Data Isolation and Permission System

  • Secure isolation of multi-tenant data: MaxCompute supports multi-tenant scenarios where user data is stored in isolation in a distributed file system to enable multi-user collaboration and sharing without compromising data security. This achieves true multi-tenant resource isolation.
  • Project data isolation and sharing under the same tenant: Projects under the same tenant commonly need both a degree of data isolation and a degree of data sharing. The project-based protection mechanisms provide inter-project data isolation and security. In addition, the package mode allows you to share data and resources across projects with greater security and convenience. As described in the “MaxCompute’s fine-grained permission system offers refined permission management” section above, this upgrade adds fine-grained permission management for package data and resources, which strengthens package-based data sharing and protection.
  • (New) Application-side data access control: A signature mechanism has been added for applications that access MaxCompute to strengthen application-side access control. For example, you can allow only specific applications to execute authorization statements, which prevents unauthorized data authorization through APIs or non-compliant applications.
Figure 9: MaxCompute’s data isolation capabilities
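The idea behind application-side access control, accepting sensitive statements only from applications that can prove their identity, can be sketched with a generic HMAC signature. This is not MaxCompute's actual signature scheme, which the article does not specify. The application registry, the secret, and the statement text are all hypothetical placeholders.

```python
import hashlib
import hmac

# Hypothetical registry of trusted applications and their shared secrets.
TRUSTED_APPS = {"security-center": b"demo-secret"}


def sign(secret: bytes, app_id: str, statement: str) -> str:
    """Compute a hex HMAC-SHA256 signature over the app ID and statement."""
    message = f"{app_id}\n{statement}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def accept(app_id: str, statement: str, signature: str) -> bool:
    """Accept a statement only if a trusted application signed it."""
    secret = TRUSTED_APPS.get(app_id)
    if secret is None:
        return False  # unknown application
    expected = sign(secret, app_id, statement)
    return hmac.compare_digest(expected, signature)  # constant-time compare


stmt = "grant select on orders to analyst"  # placeholder statement text
good = sign(b"demo-secret", "security-center", stmt)
print(accept("security-center", stmt, good))                 # True
print(accept("security-center", stmt + " with extra", good)) # False
print(accept("unknown-tool", stmt, good))                    # False
```

Because the signature covers the statement text, a request that is altered after signing fails verification, as does any request from an application without a registered secret.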

(New) Sensitive Data Protection in the Data Lifecycle

  • Data classification and grading: MaxCompute’s LabelSecurity feature enables fine-grained permission management of data by classifying and grading data for access and use to ensure data security.
  • (New) Data masking: With the help of the MaxCompute platform’s UDFs, data masking implementations or applications based on security industry practices can mask any sensitive data in client outputs. Data masking implementations can also be used in concert with data classification and grading to enable different masking implementations for data in different classes or at different levels.
Figure 10: Protection of sensitive data
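Combining grading with masking can be illustrated as follows. This is a minimal sketch that assumes a simple rule: a reader sees the raw value only when their level is at least the column's sensitivity level, and a masked value otherwise. The function names, the numeric levels, and the masking style are hypothetical and are not taken from LabelSecurity or Data Security Guard.

```python
def mask(value: str, keep: int = 2) -> str:
    """Replace all but the last `keep` characters with '*'."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


def read_column(value: str, column_level: int, user_level: int) -> str:
    """Return the raw value for sufficiently cleared users, else a masked one."""
    if user_level >= column_level:
        return value
    return mask(value, keep=4)


phone = "13800001234"
print(read_column(phone, column_level=3, user_level=3))  # 13800001234
print(read_column(phone, column_level=3, user_level=1))  # *******1234
```

A real deployment would likely choose a different masking function per data class (for example, hashing identifiers but truncating phone numbers), which is the per-class, per-level flexibility described above.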

(New) Protect Sensitive Data with Data Security Guard

For more information about the service and its usage, see the Data Security Guard documentation.

Figure 11: Sensitive data protection tool — Data Security Guard

Countermeasures Against Data Loss

(New) MaxCompute’s Backup and Recovery

MaxCompute recently released continuous backup and recovery capabilities. Whenever data is deleted or modified, the system automatically backs up the previous version and retains it for a specific period of time. Within this period, you can quickly recover the data to prevent data loss caused by incorrect operations.

Figure 12: MaxCompute’s continuous backup and recovery capabilities
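The retention behavior described above can be modeled in a few lines. This is a conceptual sketch only, assuming a single value per table and an in-memory list of backups. The `VersionedTable` class and its methods are hypothetical and unrelated to how MaxCompute stores or restores data.

```python
from datetime import datetime, timedelta


class VersionedTable:
    """Keeps earlier versions of a value for a fixed retention period."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self.current = None
        self.backups = []  # list of (replaced_at, old_value)

    def write(self, value, now: datetime):
        """Overwrite the current value, backing up the old one first."""
        if self.current is not None:
            self.backups.append((now, self.current))
        self.current = value

    def restore(self, now: datetime):
        """Restore the most recent backup that is still within retention."""
        self.backups = [(t, v) for t, v in self.backups
                        if now - t <= self.retention]
        if not self.backups:
            raise LookupError("no backup within the retention period")
        _, self.current = self.backups.pop()
        return self.current


t0 = datetime(2024, 1, 1)
table = VersionedTable(retention=timedelta(days=1))
table.write("v1", t0)
table.write("v2", t0 + timedelta(hours=1))       # "v1" is backed up here
print(table.restore(t0 + timedelta(hours=2)))    # v1
```

Once the retention period has passed, `restore` raises `LookupError`, which reflects the point that recovery is only possible within the retention window.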

(New) MaxCompute’s Geo-disaster Recovery

After you specify a backup location for the backup cluster of a MaxCompute project, MaxCompute automatically replicates data between the primary and backup clusters to ensure data consistency and achieve geo-disaster recovery. If a fault occurs, the MaxCompute project switches from the primary cluster to the backup cluster and uses the computing resources of the backup cluster to access the data stored there, so that service resumes on the backup cluster.

Figure 13: MaxCompute’s geo-disaster recovery

Make Clever Use of Audits to Cope with Data Risks

MaxCompute provides comprehensive historical data and real-time logs.

  • Information Schema provides project metadata and historical usage data among other information. Privileges and History views can help you with data analysis and auditing for data permission usage and task execution.
  • (New) Real-time audit logs: MaxCompute keeps a full record of users’ actions such as DDL, authorization, and task execution events to meet the needs of real-time auditing and problem traceability and analysis.

You can build your own data risk control and audit systems based on Information Schema and real-time audit logs. Information Schema was released last year. The sections below introduce the real-time audit log, which is a new feature.

Not all users plan to build their own risk control and audit tools. Those who do not can use the risk control and audit services in DataWorks instead. These out-of-the-box services require no secondary development, at the cost of a lower degree of customization.

(New) Real-time Audit Log

MaxCompute keeps a full record of users’ actions and pushes user behavior logs to Alibaba Cloud’s ActionTrail service. You can view and retrieve user behavior logs in ActionTrail and deliver the logs to a Log Service project or a specified Object Storage Service (OSS) bucket for the purposes of real-time auditing and event traceability and analysis.

ActionTrail supports auditing user behavior for instances, tables, functions, resources, users, roles, and privileges. For more information about this feature and its usage, see the Audit Log documentation.

Figure 14: MaxCompute’s audit log
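Once audit logs have been delivered to your own storage, a simple analysis pass might look like the sketch below. The event fields (`user`, `type`, `target`) and the sample records are hypothetical. Real ActionTrail events follow their own schema, which is documented separately.

```python
import json
from collections import Counter

# Hypothetical log lines, one JSON object per line.
RAW_EVENTS = """
{"user": "alice", "type": "ddl", "target": "orders"}
{"user": "bob", "type": "authorization", "target": "orders"}
{"user": "bob", "type": "authorization", "target": "customers"}
{"user": "carol", "type": "task", "target": "daily_report"}
"""


def load_events(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_by_user(events, event_type):
    """Count events of the given type per user."""
    return Counter(e["user"] for e in events if e["type"] == event_type)


events = load_events(RAW_EVENTS)
print(count_by_user(events, "authorization"))  # who issued authorization events
```

A count like this is a starting point for spotting unusual authorization activity, which can then be traced back to the individual events.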

Audit Tools in DataWorks

  • The Security Center described earlier in this article provides a permission auditing service.
  • Data Security Guard provides risk control and auditing services, as shown in Figure 15.
Figure 15: Risk control and auditing with Data Security Guard

Summary

Figure 16: Lifecycle-stage-specific data security practices on a big data platform

As a cloud data warehouse based on the SaaS model, MaxCompute boasts leading security capabilities and has passed multiple international, European, and Chinese security compliance certifications, including the internationally recognized ISO certification, SOC 1, 2, and 3 (SOC is short for System and Organization Control), Payment Card Industry Data Security Standard (PCI DSS), the C5 certification used in Europe, and Cybersecurity Multi-Level Protection Scheme 2.0 which is dominant in China. For more information about Alibaba Cloud’s security compliance certification system, see the Alibaba Cloud Trust Center — Certification of Compliance page. We welcome you to use MaxCompute to ensure enterprise-level big data security.

To learn more about Alibaba Cloud MaxCompute, visit https://www.alibabacloud.com/product/maxcompute
