The Internet is never a safe place. Most of the time, we rely too much on firewalls to contain security problems. Unfortunately, reliance on the firewall assumes that attacks always come from the outside while the truly destructive attacks often come from the inside.
In recent years, websites such as The Hacker News have reported on widespread attacks and ransom demands caused by data security problems. Versions earlier than Hadoop 1.0.0 provided no security support and assumed that every role in a cluster could be trusted. As a result, user access was not authenticated, and malicious users could easily gain access to a cluster by masquerading as legitimate users.
To ensure the security of Hadoop clusters, user authentication and authorization must be implemented. To address this, the following common solutions have been developed:
- Authentication: MIT Kerberos, Azure AD, Kerby
- Authorization: Apache Sentry (Cloudera), Apache Ranger (Hortonworks)
Hadoop Cluster support for Kerberos
After Hadoop 1.0.0 was released in 2012, Hadoop started to support Kerberos to ensure that the nodes in a cluster are trustworthy.
Before the cluster is deployed, the authentication keys are stored on the trusted nodes. When the cluster runs, each node is authenticated by its key, and only successfully authenticated nodes can provide services. An impersonating node cannot communicate with any node in the cluster because it was not given a key in advance. This prevents malicious use of or tampering with the Hadoop cluster and keeps the cluster trustworthy and secure.
Introduction to Kerberos
Kerberos is a network authentication protocol that was developed at MIT as part of Project Athena to protect network services. It is named after Kerberos (Cerberus), the three-headed guard dog of Greek mythology. The protocol provides strong authentication for client/server applications by using secret-key cryptography. It protects against eavesdropping and replay attacks and preserves data integrity. Kerberos manages keys with symmetric-key algorithms, and some Kerberos-based products also use public-key encryption during authentication.
The current version of the protocol is V5. Versions 1 to 3 were used only inside MIT. Because Kerberos V4 used DES encryption, U.S. export controls classified it as a munition and restricted its export. KTH-KRB, an implementation of Kerberos V4 developed at the Royal Institute of Technology (KTH) in Sweden, made Kerberos available outside the United States. The same team later released Heimdal, which is one of the most common implementations of Kerberos V5.
The Kerberos V5 implementation discussed in this document is MIT Kerberos, which is released roughly every six months. At the time of writing, the latest version of MIT Kerberos is 1.16.2, which was released on November 1, 2018.
Terms and Abbreviations
Common Kerberos terms include:
- Authentication Server (AS)
- Key Distribution Center (KDC)
- Ticket Granting Ticket (TGT): the ticket that a client uses to request other tickets.
- Ticket Granting Server (TGS)
- Service Server (SS): the specific service provider.
- Principal: the individual (user or service) that is authenticated.
- Ticket: the credential that the client presents for authentication. It contains the user name, IP address, timestamp, validity period, and session key.
With Kerberos, a client obtains a service in three steps:
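The ticket fields listed above can be pictured as a small record. The following is only an illustrative sketch in Python: the class name, the field names, and the `is_valid` helper are assumptions made for this example and do not reflect the actual wire format of a Kerberos ticket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Illustrative container for the ticket fields named in the term list."""
    user_name: str
    ip_address: str
    timestamp: float        # issue time, in seconds since the epoch
    validity_period: float  # lifetime, in seconds
    session_key: bytes

    def is_valid(self, now: float) -> bool:
        """Return True while `now` falls inside the validity window."""
        return self.timestamp <= now <= self.timestamp + self.validity_period


ticket = Ticket("sunny", "10.0.0.5", timestamp=1_000.0,
                validity_period=600.0, session_key=b"\x00" * 16)
print(ticket.is_valid(1_300.0))  # True: inside the validity window
print(ticket.is_valid(2_000.0))  # False: the ticket has expired
```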
- Authentication: The client sends a message to the AS to obtain a TGT containing the timestamp.
- Authorization: The client uses the TGT to request the ticket of the specified service from the TGS.
- Service request: The client presents the service ticket to the specified service for communication authorization.
Kerberos is an application-layer network protocol. At a high level, the communication works as follows: First, the user uses a shared key to obtain a credential (a ticket) from the AS. Then, the user presents that ticket, rather than the shared key, when communicating with the SS.
Detailed Communication Process
This process uses symmetric encryption as the encryption method and occurs in a Kerberos realm. The lower-case letters c, d, e, and g indicate the messages sent by the client, while the upper-case letters A, B, E, F, and H are the messages returned by each server.
Client Authentication (Kinit)
The client retrieves the TGT from the AS.
First, the user logs on to the client, either by entering a user ID and password or by using a keytab file. The exchange then proceeds as follows:
- The client runs a one-way function (typically a hash function) that converts the password into the user’s secret key on the client.
- The client sends a plaintext message to the AS to request a service, for example, “User Sunny wants to request a service” (in this case, Sunny is the user ID). Note: The user does not have to send the secret key or password to the AS. Rather, the AS can retrieve the user’s password from the local database and convert it to the secret key for the same user.
- The AS checks whether the user ID exists in its local database. If it does, the AS returns two messages to the client:
- [Message A]: Contains the session key between the client and the TGS (which is used for future communication between the client and the TGS), which is encrypted by the user’s secret key.
- [Message B]: Contains the TGT (which includes the session key between the client and the TGS from message A as well as the user ID, the client’s network address, and the TGT validity period), which is encrypted by the TGS’s secret key.
- Upon receiving both messages, the client first tries to decrypt message A with the user’s secret key, which it derived from the password that the user entered. If that password does not match the one in the AS database, the derived key is wrong and the decryption fails. If the password is correct, the client decrypts message A and retrieves the session key between the client and the TGS. Note that the client cannot decrypt message B because this message is encrypted by the TGS’s secret key. With the session key, the client can then authenticate itself to the TGS.
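The exchange with the AS can be sketched in a few lines of Python. This is a simplified simulation, not the real protocol encoding: `seal`/`unseal` form a toy cipher (a SHAKE-256 keystream plus an HMAC tag) that stands in for the real Kerberos encryption types, `string_to_key` uses PBKDF2 with the user ID as the salt, and `USER_DB`, `TGS_KEY`, and `as_exchange` are hypothetical names introduced only for illustration.

```python
import hashlib
import hmac
import json
import os


# Toy cipher (assumption): stands in for real Kerberos encryption types.
# It is NOT suitable for production use.
def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted message")
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def string_to_key(user_id: str, password: str) -> bytes:
    """One-way function that turns a password into the user's secret key."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(),
                               user_id.encode(), 10_000)


USER_DB = {"sunny": "correct horse"}  # hypothetical AS database
TGS_KEY = os.urandom(32)              # shared only by the AS and the TGS


def as_exchange(user_id: str, client_addr: str):
    """Return (message A, message B) for a known user ID."""
    if user_id not in USER_DB:
        raise KeyError(f"unknown user: {user_id}")
    user_key = string_to_key(user_id, USER_DB[user_id])
    client_tgs_key = os.urandom(32)
    tgt = {"user_id": user_id, "addr": client_addr, "lifetime_s": 36_000,
           "session_key": client_tgs_key.hex()}
    message_a = seal(user_key, client_tgs_key)
    message_b = seal(TGS_KEY, json.dumps(tgt).encode())
    return message_a, message_b


# Client side: only the user ID travels to the AS, never the password.
message_a, message_b = as_exchange("sunny", "10.0.0.5")
client_tgs_key = unseal(string_to_key("sunny", "correct horse"), message_a)
print("session key recovered:", len(client_tgs_key), "bytes")

try:
    unseal(string_to_key("sunny", "wrong password"), message_a)
except ValueError as err:
    print("wrong password:", err)
```

In this sketch the client holds message B as an opaque blob: without `TGS_KEY` it can forward the TGT but cannot read or alter it.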
The client obtains the ticket from the TGS, namely the client-to-server ticket.
- When the client needs to request a service, it sends the following messages to the TGS:
- [Message c]: Contains the content of message B (namely the TGT encrypted by the TGS’s secret key) and the ID of the requested service (rather than the user ID).
- [Message d]: Contains the Authenticator (which includes the user ID and timestamp), which is encrypted by the session key between the client and the TGS.
- Upon receiving both messages, the TGS first checks whether the requested service exists in the KDC database. If it does, the TGS uses its own secret key to decrypt message B (namely the TGT) contained in message c to retrieve the previously generated session key. Then, the TGS uses this session key to decrypt message d to obtain the authenticator that contains the user ID and timestamp, and verifies the TGT and authenticator.
- After the verification succeeds, the TGS returns the following messages:
- [Message E]: Contains the client-to-server ticket (which includes the session key between the client and the SS as well as the user ID, the client’s network address, and the validity period), which is encrypted by the secret key of the service that the client requested.
- [Message F]: Contains the session key between the client and the server (which is used for future communication between the client and the server service), which is encrypted by the session key between the client and the TGS.
- Upon receiving both messages, the client decrypts message F with the session key between the client and the TGS to obtain the session key between the client and the server. Note that the client cannot decrypt message E because this message is encrypted by the service’s secret key.
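The TGS step can be simulated in the same spirit. This is a minimal sketch under stated assumptions: `seal`/`unseal` are a toy cipher (SHAKE-256 keystream plus HMAC tag) used only to keep the example free of third-party dependencies, the messages are JSON rather than the real Kerberos encoding, and `SERVICE_KEYS`, `tgs_exchange`, and the service name `hdfs/nn1.example.com` are hypothetical.

```python
import hashlib
import hmac
import json
import os


# Toy cipher (assumption): NOT a real Kerberos encryption type.
def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted message")
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


TGS_KEY = os.urandom(32)
SERVICE_KEYS = {"hdfs/nn1.example.com": os.urandom(32)}  # hypothetical KDC database


def tgs_exchange(message_c: dict, message_d: bytes, now: float):
    """Verify the TGT and authenticator, then return (message E, message F)."""
    service_id = message_c["service_id"]
    if service_id not in SERVICE_KEYS:
        raise KeyError(f"unknown service: {service_id}")
    tgt = json.loads(unseal(TGS_KEY, message_c["tgt"]))
    client_tgs_key = bytes.fromhex(tgt["session_key"])
    authenticator = json.loads(unseal(client_tgs_key, message_d))
    if authenticator["user_id"] != tgt["user_id"] or now > tgt["expires"]:
        raise PermissionError("TGT and authenticator do not verify")
    client_ss_key = os.urandom(32)
    ticket = {"user_id": tgt["user_id"], "addr": tgt["addr"],
              "expires": now + 600, "session_key": client_ss_key.hex()}
    message_e = seal(SERVICE_KEYS[service_id], json.dumps(ticket).encode())
    message_f = seal(client_tgs_key, client_ss_key)
    return message_e, message_f


# Stand-ins for what the earlier exchange with the AS would have produced.
now = 1_000.0
client_tgs_key = os.urandom(32)
tgt = {"user_id": "sunny", "addr": "10.0.0.5", "expires": now + 36_000,
       "session_key": client_tgs_key.hex()}
message_b = seal(TGS_KEY, json.dumps(tgt).encode())

# Client builds messages c and d, then decrypts message F.
message_c = {"tgt": message_b, "service_id": "hdfs/nn1.example.com"}
message_d = seal(client_tgs_key,
                 json.dumps({"user_id": "sunny", "timestamp": now}).encode())
message_e, message_f = tgs_exchange(message_c, message_d, now)
client_ss_key = unseal(client_tgs_key, message_f)
print("client-to-server session key:", len(client_ss_key), "bytes")
```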
The client retrieves the service from the SS.
- After retrieving the session key between the client and the server, the client can use the service provided by the server. In this case, the client sends two messages to the specified SS:
- [Message e]: Contains the client-to-server ticket from message E in the previous step, which is still encrypted by the service’s secret key.
- [Message g]: Contains a new authenticator (which includes the user ID and timestamp), which is encrypted by the session key between the client and the SS.
- The SS uses its own secret key to decrypt message e and obtain the session key between the client and the SS that was provided by the TGS. Then, the SS uses this session key to decrypt message g and obtain the authenticator. The SS then verifies the ticket and the authenticator in the same way as the TGS did.
- If the verification succeeds, the SS returns a message that confirms that the client’s identity has been authenticated and that the SS is willing to provide the service:
- [Message H]: Contains the new timestamp (namely the timestamp sent by the client plus one; however, this mechanism has been discarded in V5), which is encrypted by the session key between the client and the SS.
- The client decrypts message H with the session key between the client and the SS to obtain the new timestamp and check whether it is correct. If the new timestamp is correct, the client trusts the server and sends a service request to the SS.
- The SS provides the corresponding service to the client.
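The checks that the SS applies to a decrypted ticket and authenticator can be sketched as follows. This is an assumption-laden illustration rather than the logic of any particular implementation: the `ServiceVerifier` class, the dictionary field names, and the in-memory replay cache are hypothetical, and the 300-second clock skew is used here because five minutes is a common Kerberos default.

```python
from typing import Dict, Set, Tuple

MAX_CLOCK_SKEW_S = 300  # five minutes, a common Kerberos default


class ServiceVerifier:
    """Checks an already decrypted ticket and authenticator."""

    def __init__(self) -> None:
        # Authenticators seen so far; this sketch never purges old entries.
        self._seen: Set[Tuple[str, float]] = set()

    def verify(self, ticket: Dict, authenticator: Dict, now: float) -> bool:
        if authenticator["user_id"] != ticket["user_id"]:
            return False  # authenticator was not made by the ticket's owner
        if now > ticket["expires"]:
            return False  # ticket is past its validity period
        if abs(now - authenticator["timestamp"]) > MAX_CLOCK_SKEW_S:
            return False  # timestamp is too old or too far in the future
        fingerprint = (authenticator["user_id"], authenticator["timestamp"])
        if fingerprint in self._seen:
            return False  # same authenticator presented twice: replay
        self._seen.add(fingerprint)
        return True


verifier = ServiceVerifier()
ticket = {"user_id": "sunny", "expires": 1_600.0}
fresh = {"user_id": "sunny", "timestamp": 1_000.0}
stale = {"user_id": "sunny", "timestamp": 500.0}

print(verifier.verify(ticket, fresh, now=1_000.0))  # True: first use
print(verifier.verify(ticket, fresh, now=1_000.0))  # False: replayed
print(verifier.verify(ticket, stale, now=1_000.0))  # False: outside the skew window
```

The timestamp in the authenticator is what makes a captured message useless to an attacker after a short time, which is why Kerberos deployments depend on synchronized clocks.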
HA Architecture of Kerberos
Kerberos supports two server redundancy modes in a realm: master/slave (MIT and Heimdal) and multi-master (Windows Active Directory). If Kerberos is deployed in a production environment, we recommend that you use one master and multiple slaves to ensure the high availability (HA) of Kerberos services.
Each KDC in a realm contains a copy of the Kerberos database. The master KDC holds the writable copy of the realm database, which is replicated to the slave KDCs at a fixed interval. All database changes, such as password changes, are made on the master KDC. When the master KDC becomes unavailable, the slave KDCs can still issue Kerberos tickets for service authorization but cannot be used to manage the database. An administrator is still required to perform routine management tasks on the KDCs.
The synchronization mechanism of Kerberos only replicates the contents of the primary database but does not pass the configuration files. Therefore, you must manually copy the following files to each slave KDC:
- master key stash file
At present, the most common HA solution within a single IDC is the combination of Keepalived and Rsync. Keepalived uses virtual IP (VIP) failover to present multiple KDC nodes as a single highly available service.
To use this method, first run the following command on the master KDC to dump the current Kerberos and KADM5 database to an ASCII file:
kdb5_util dump [-b7|-ov|-r13] [-verbose] [-mkey_convert] [-new_mkey_file mkey_file] [-rev] [-recurse] [filename [principals...]]
Then, use Rsync to synchronize the directory that contains the dump file to the corresponding directory on the slave machine, and load the dump file into the slave KDC database:
kdb5_util load [-b7|-ov|-r13] [-hash] [-verbose] [-update] filename [dbname]
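The dump, synchronize, and load sequence can be scripted. The following Python sketch only assembles the three commands and prints them by default. The host name, the paths, the use of `ssh` to run the load on the slave, and the function names are all assumptions made for illustration, so adapt them to your environment before running anything against a real KDC.

```python
import posixpath
import shlex
import subprocess
from typing import List


def build_sync_commands(dump_file: str, slave_host: str,
                        remote_dir: str) -> List[List[str]]:
    """Build the dump, rsync, and load commands for one slave KDC."""
    local_dir = posixpath.dirname(dump_file) + "/"
    remote_file = posixpath.join(remote_dir, posixpath.basename(dump_file))
    return [
        # 1. Dump the database on the master KDC.
        ["kdb5_util", "dump", dump_file],
        # 2. Copy the dump directory to the slave machine.
        ["rsync", "-az", local_dir, f"{slave_host}:{remote_dir}/"],
        # 3. Load the dump on the slave (run remotely over ssh here).
        ["ssh", slave_host, "kdb5_util", "load", remote_file],
    ]


def run_sync(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


commands = build_sync_commands("/var/kerberos/dump/kdc.dump",
                               "kdc2.example.com", "/var/kerberos/dump")
run_sync(commands)  # dry run: prints the commands without executing them
```

Keeping `dry_run=True` as the default makes it harder to overwrite a slave database by accident while the script is being tested.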
All Hadoop requests reach the KDC through an internal domain name that resolves to the VIP managed by Keepalived.
Optimization and Prospects
User (Principal) Management
If the team already has a permissions system, it can be difficult to integrate that existing identity system with Kerberos.
As the business grows and the number of servers increases, manual operations on Kerberos principals (adding, deleting, modifying, and querying them) become more frequent and more troublesome. To address this, standardize the processes for requesting, maintaining, and deleting principals and for generating keytabs in the Kerberos management system. In addition, automate principal requests and permissions management.
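One way to standardize principal handling is to generate the `kadmin.local` commands from code instead of typing them by hand. The sketch below is only an illustration: the naming rule enforced by `PRINCIPAL_RE`, the function names, and the example host, realm, and keytab path are assumptions, and the commands are built as argument lists without being executed.

```python
import re
from typing import List

# Hypothetical naming rule: service/host@REALM, lowercase host, uppercase realm.
PRINCIPAL_RE = re.compile(r"^[a-z][a-z0-9_-]*/[a-z0-9.-]+@[A-Z0-9.-]+$")


def service_principal(service: str, host: str, realm: str) -> str:
    """Build a normalized service principal name and validate it."""
    principal = f"{service}/{host.lower()}@{realm.upper()}"
    if not PRINCIPAL_RE.match(principal):
        raise ValueError(f"invalid principal name: {principal!r}")
    return principal


def provisioning_commands(principal: str, keytab_path: str) -> List[List[str]]:
    """Commands that create a principal and export its keytab."""
    return [
        ["kadmin.local", "-q", f"addprinc -randkey {principal}"],
        # Note: ktadd generates new random keys for the principal.
        ["kadmin.local", "-q", f"ktadd -k {keytab_path} {principal}"],
    ]


def removal_command(principal: str) -> List[str]:
    """Command that deletes a principal without prompting."""
    return ["kadmin.local", "-q", f"delprinc -force {principal}"]


principal = service_principal("hdfs", "NN1.example.com", "example.com")
print(principal)  # hdfs/nn1.example.com@EXAMPLE.COM
for command in provisioning_commands(principal, "/etc/security/keytabs/hdfs.keytab"):
    print(command)
```

A request workflow or web form could call functions like these after approval, so that every principal follows the same naming convention and every keytab is generated the same way.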
Data Synchronization Optimization
During Kerberos data synchronization, you can synchronize the generated data records to MySQL by using MySQL dual-master synchronization. In the cross-IDC scenario, you can use Rsync to synchronize incremental KDC data. The Rsync server sits behind a Keepalived VIP, with the core IDC (A) serving as the active IDC. If the active Kerberos host fails, the VIP migrates to another KDC host, which then serves as the Rsync server, and the Rsync clients synchronize data with that host. This keeps the KDC data highly available.
Process management tools are used to monitor the liveness of Kerberos-related processes. When an unexpected process exit is detected, an alert is sent by email, WeChat, or DingTalk so that the process can be restored promptly.
Deploying Kerberos in Hadoop clusters is tedious. Essentially, Kerberos is a protocol, or a secure channel, and most users find it hard to understand fully. In this context, is there a better implementation that can free ordinary users from these annoying details?
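A liveness check of this kind can be very small. The sketch below assumes the MIT Kerberos daemon names `krb5kdc` and `kadmind` and a Unix-like host with `pgrep` installed. The `find_dead` and `alert` helpers are hypothetical, and the `send` callback is where an email, WeChat, or DingTalk integration would be plugged in.

```python
import subprocess
from typing import Callable, Iterable, List

KERBEROS_PROCESSES = ("krb5kdc", "kadmind")  # MIT Kerberos daemon names


def is_running(name: str) -> bool:
    """Return True if `pgrep -x name` finds a matching process."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def find_dead(names: Iterable[str],
              probe: Callable[[str], bool] = is_running) -> List[str]:
    """Return the names for which the probe reports no running process."""
    return [name for name in names if not probe(name)]


def alert(dead: List[str], send: Callable[[str], None] = print) -> None:
    """Send one alert message if any process is down."""
    if dead:
        send("Kerberos processes not running: " + ", ".join(dead))


# Demonstration with a fake probe so that the example runs anywhere.
fake_state = {"krb5kdc": True, "kadmind": False}
dead = find_dead(KERBEROS_PROCESSES, probe=lambda name: fake_state[name])
alert(dead)  # prints: Kerberos processes not running: kadmind
```

In production, a scheduler or a process supervisor would call `find_dead(KERBEROS_PROCESSES)` with the default probe at a fixed interval.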
Reportedly, the Hadoop Authentication Service (HAS), co-developed by Alibaba and Intel, has been applied to ApsaraDB for HBase 2.0.
The HAS solution replaces the MIT Kerberos service with Kerby and uses the HAS plug-in authentication method to build an account and password system that is familiar to most users.
Currently, HAS is under development in the Apache Kerby project branch has-project and will be included as a new Kerby feature in the next release.
As a sub-project of Apache Directory, Apache Kerby is not drawing too much attention but has the potential for future success.