API Design Best Practices by Alibaba Researcher Gu Pu

May 15, 2019

By Gu Pu

APIs are the core of software systems, and complexity is the main factor that determines whether a large-scale software system succeeds. Complexity does not come from a single aspect, but from many small design decisions made during system design, especially during API design. As John Ousterhout says, complexity is incremental [8]. The success of a system does not necessarily lie in a few special features, but in the accumulation of all these small design efforts.

Let’s consider two questions: What is a good API design? How can we design good APIs?

API design is challenged by different factors in different scenarios. No universal design rules apply to all API design scenarios. Design rules and best practices in an API design scenario may not apply to another design scenario. Therefore, next we’ll try to provide some design suggestions and analyze in which scenarios these suggestions are applicable, so that we can establish some best practices accordingly.

Scope

This article mainly describes general API design and especially applies to the design of remote call APIs (RPC APIs or HTTP/RESTful APIs). However, problems specific to RESTful APIs are not discussed in this article.

In addition, this article assumes that a client interacts directly with the APIs of a remote server. At Alibaba, for several reasons, client SDKs are usually used to access remote services indirectly. We will not discuss the special issues that result from using SDKs. Instead, we treat the methods provided by an SDK as proxies for the remote APIs, so the content of this article still applies.

API Design Rule: What Is a Good API?

In this section, we'll try to summarize the features (or design principles) that a good API should have. These basic principles are the ones that, if followed, reduce the chances that an API will suffer from design deficiencies and problems as it evolves.

A good API

Provides a good mental model: An API is used for interaction between programs. However, how an API is used and maintained depends on its users and maintainers sharing a clear and consistent understanding of it. Achieving that understanding is actually very hard.
Simple: Make things as simple as possible, but no simpler. In most cases, APIs are designed to be too complicated instead of being too simple in a real system, especially when designers consider the fact that the system will evolve as the requirements of the system continuously increase.
Allows multiple implementations: This is a more concrete principle and my favorite one. It is emphasized a lot by Sanjay Ghemawat. The decoupling principle, or loose coupling, is often mentioned in API design. Compared with loose coupling, however, this principle is easier to act on: If an API allows multiple different implementations, it already has a good abstraction and does not depend on any one specific implementation. Such an API is therefore usually not tightly coupled with external systems. In that sense, this principle is the more fundamental one.
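As a rough illustration of the "allows multiple implementations" principle, the sketch below defines one small interface and two unrelated implementations behind it. The names `BlobStore`, `InMemoryStore`, `AppendLogStore`, and `round_trip` are hypothetical and exist only for this example.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical storage API: callers see put/get, not how data is kept."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryStore(BlobStore):
    """Keeps the latest value per key in a dictionary."""

    def __init__(self):
        self._items = {}

    def put(self, key, data):
        self._items[key] = data

    def get(self, key):
        return self._items[key]


class AppendLogStore(BlobStore):
    """Keeps every write in an append-only list; the newest write wins."""

    def __init__(self):
        self._log = []

    def put(self, key, data):
        self._log.append((key, data))

    def get(self, key):
        for logged_key, value in reversed(self._log):
            if logged_key == key:
                return value
        raise KeyError(key)


def round_trip(store: BlobStore) -> bytes:
    # Client code depends only on the interface, so either store works.
    store.put("greeting", b"hi")
    store.put("greeting", b"hello")
    return store.get("greeting")
```

If `round_trip` had to know which store it was given, that would be a hint that the interface exposes details of one particular implementation.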

Best practices

This section gives some more detailed and concrete design suggestions on how to easily design APIs that follow the preceding basic principles.

A Good API Example: POSIX File API

If only one API design example is listed in this article, then the POSIX File API is perhaps the most helpful and implementable design example. Therefore, we could also call this article “Learn best practices of API design from the File API”.

To learn best practices of API design, we may simply think about how the File API is designed.

First, let's look at the main File API [1] interfaces, shown here as the C declarations defined by POSIX. The following are the basic I/O interfaces:

int open(const char *path, int oflag, ... /* mode_t mode */);
int close(int fildes);
int remove(const char *path);
ssize_t write(int fildes, const void *buf, size_t nbyte);
ssize_t read(int fildes, void *buf, size_t nbyte);

Why is the File API considered a classic example of good design?

The File API has been around for decades (more than 30 years since POSIX was first standardized in 1988). Although several generations of hardware and software systems have come and gone, this set of core APIs remains stable. This is amazing.
The API provides a clear conceptual model that allows you to quickly understand the basic concepts behind it: the definition of a File and its related operations (open, close, read, write).
The File API supports different file system implementations, which can even target different types of devices, including disks, block devices, pipes, shared memory, networks, and terminals. Some of these devices allow random access. Some only support sequential access. Some are persistent while others are not. However, all of these devices and file system implementations can sit behind the same interface, so the upper system layer does not have to pay attention to the differences in the underlying implementations. This is a very powerful feature of this set of APIs.

For example, consider opening files whose underlying implementations differ. All of them can be supported through the same interface, distinguished only by their paths and by the mount mechanism. Other examples include procfs and pipes.

int open(const char *path, int oflag, ... /* mode_t mode */);

For example, the underlying implementations of CephFS and the local file system are completely different, but the upper-layer client does not need to differentiate between them. The same interface is used for both, and only the paths distinguish the two file systems.
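To make this concrete, here is a minimal sketch that uses Python's `os` module, whose `os.open`, `os.read`, `os.write`, and `os.close` functions are thin wrappers around the corresponding file-descriptor calls. The helper `read_all` is a name introduced for this example. The same helper drains a regular file and a pipe without knowing which one it was given.

```python
import os
import tempfile


def read_all(fd: int, chunk_size: int = 4096) -> bytes:
    """Read from a file descriptor until end-of-file, whatever backs it."""
    parts = []
    while True:
        data = os.read(fd, chunk_size)
        if not data:          # an empty result means end-of-file
            break
        parts.append(data)
    return b"".join(parts)


# Case 1: a regular file on the local file system.
tmp_fd, tmp_path = tempfile.mkstemp()
os.write(tmp_fd, b"hello")
os.close(tmp_fd)
fd = os.open(tmp_path, os.O_RDONLY)
from_file = read_all(fd)
os.close(fd)
os.remove(tmp_path)

# Case 2: a pipe, which is sequential-only and not persisted.
read_end, write_end = os.pipe()
os.write(write_end, b"hello")
os.close(write_end)           # closing the writer lets the reader see EOF
from_pipe = read_all(read_end)
os.close(read_end)
```

The caller never branches on the kind of object behind the descriptor, which is the property the File API is praised for here.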

These are some of the reasons why the File API is so successful. In fact, it is so successful that currently *nix operating systems are all file-based.

Although we can consider the File API our reference API design, it is difficult to design an API that can remain stable for a long time. Having a good reference API alone is not enough to design another good API. So, next we will discuss some more specific issues.

Document Well

Write details in documents and keep documents updated. This is obviously important. However, the truth is that many API designers and maintainers do not pay adequate attention to the documentation.

Today, in the era of service-based and microservices architectures, an application relies on a large number of services, and each service API is continuously evolving. Documenting each field and each method, and keeping that documentation up to date, is crucial for reducing mistakes in client development, reducing the chance of problems cropping up, and improving overall R&D efficiency.

Carefully Define the “Resource” of Your API

If applicable, model the API as a set of resources plus the operations on them. Many APIs today can be abstracted in this way. This approach has many advantages and is also well suited to designing RESTful APIs over HTTP. In the process of designing an API, one important prerequisite is to properly define the resource itself. What is a proper definition? A resource is the abstraction of the core objects that a set of API operations acts on.

Abstraction is the process of removing details. If the processes and objects involved are concrete things in the real world, it is not very hard to pick the objects to abstract. However, it is necessary to carefully consider which details should be kept. For example, in the File API, the File resource can be abstracted as a data record uniquely identified by a string. This definition leaves out how the file is located (this is handled by the specific file system implementation) and how its contents are stored (also handled by the file system).

Although we want an API to be simple, it is more important to select the proper entity to model. When designing underlying systems, we prefer simple abstractions. In some systems, designing the domain model itself is not easy, and we need to consider carefully how to define resources. Generally, a domain model is easier to understand if its concepts are similar to what people perceive in the real world. For more information about designing a domain model, see related domain design articles, for example, this article by Guo Dongbai [2].

Choose the Right Level of Abstraction

Choose the right level of abstraction when designing objects. This is closely related to the previous question. Different concepts are often correlated. Again, take the File API for example. Multiple abstraction levels are available when we design such an API, such as:

Text and image hybrid object
“Data block” abstraction
“File” abstraction

These different abstraction levels may describe the same thing, but they are conceptually different. When an API is designed for a client to access data, “File” is the more appropriate abstraction. When an API is designed to be used inside a file system or device driver, a data block or block device may be a better abstraction. When designing a document editing tool, you may want to choose the “text and image hybrid object” abstraction level.

Another example is database-related API definitions. The underlying abstraction may target the data storage structure. In the middle layer are objects and protocols that the database logic layer needs to define for data interaction. The abstraction required in the view layer is also different [3].

Prefer Using Different Models for Different Layers

This principle is closely related to the previous one. However, it emphasizes that different layers have different models.

In a service-based architecture, data objects are usually processed in multiple layers. For example, View-Logic-Storage mentioned previously is a typical hierarchical structure. We recommend that you use different data structures in different layers. John Ousterhout [8] also emphasizes “different layers, different abstractions” in his book.

The seven-layer OSI model in networking is a typical example. Each layer has its own protocol and abstraction. The File API mentioned before is a logic-layer model. Different file storage implementations (file system implementations) adopt mutually independent models. For example, block devices, memory file systems, and disk file systems have their own storage implementation APIs.

If API designers find themselves using the same model in different layers (for example, a service uses the model of its back-end storage service as its own model, as shown in the following figure), this may mean that the responsibility of the service is not clearly defined. We need to consider whether its function should be defined more specifically.

Sharing the same data structure across layers can also hinder API evolution and maintenance. As a system evolves, many new requirements are added over time. For example, the back-end storage may need to be replaced, or a cache may need to be split out to optimize performance. At that point, you may find that binding the data of two layers together (for example, storing front-end JSON directly in the back-end storage) leads to unnecessary coupling and hinders the evolution of the system.
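The following sketch shows one way to keep separate models per layer. `UserRecord`, `UserView`, and `to_view` are hypothetical names chosen for illustration and are not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Storage-layer model (hypothetical): shaped for how data is persisted."""
    user_id: int
    first_name: str
    last_name: str
    password_hash: str


@dataclass
class UserView:
    """View-layer model (hypothetical): shaped for what a client displays."""
    id: str
    display_name: str


def to_view(record: UserRecord) -> UserView:
    # The mapping is the only code that knows both shapes, so the storage
    # schema can change without changing what the API returns.
    return UserView(
        id=str(record.user_id),
        display_name=f"{record.first_name} {record.last_name}",
    )


view = to_view(UserRecord(42, "Ada", "Lovelace", "not-a-real-hash"))
```

The explicit mapping costs a few extra lines, but it also prevents storage-only fields such as `password_hash` from leaking into the view layer by accident.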

Naming and Identification of the Resource

After an API defines a resource object, a name and identification must be provided. Two options are available for the naming/ID (not the ID within the system, but the ID that will be exposed to users):

Use a free-form string as the ID (string nameAsId)
Use structured data to express the naming/ID

The best option depends on your specific requirements. Using a free-form string as the name enables maximum flexibility for the specific system implementations. However, the consequent problem is that the internal structure of the naming (for example, path) is not a part of the API definition, but the implementation details. If the name itself has a structure, the client must have the logic that extracts the structure information. This is a trade-off that API designers need to make.

For example, the File API uses a free-form string to identify a file, while the structure of that string is defined by the file system. This allows Windows to use the "D:\Documents\File.jpg" structure and Linux to use the "/etc/init.d/file.conf" structure. If the data structure of the file naming is defined as

{
   disk: string, 
   path: string
}

, with separate "disk" and "path" fields, this structure may fit Windows, but not other file systems (that is, implementation details leak into the API).

If the abstraction model of a resource object naturally contains structured identity information, using the structured data can simplify the logic that the client uses to interact with that object and reinforce the conceptual model. In this case, the API obtains other advantages at the cost of identity flexibility. For example, a bank account transfer can be expressed as the following structured identity:

{
   account: number,
   routing: number
}

, which consists of the account identity and the bank identity. This design does encode some business logic, but that logic belongs to the domain being described, not to the implementation details. In addition, this design can simplify specific implementations and avoid security problems that would otherwise be caused by a non-structured string identity. Therefore, the structured identity may be more appropriate in this case.
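The trade-off can be sketched as follows. `AccountId` mirrors the structured identity above, while `parse_free_form` shows the extra parsing a client would need if the API exposed only a string. The "routing/account" string format and both names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountId:
    """Structured identity: the API itself names the two parts."""
    routing: int
    account: int

    def __post_init__(self):
        if self.routing <= 0 or self.account <= 0:
            raise ValueError("routing and account must be positive")


def parse_free_form(name: str) -> AccountId:
    """Client-side parsing needed when the identity is a free-form string.

    The "routing/account" layout is a made-up convention for this sketch.
    """
    routing_text, separator, account_text = name.partition("/")
    if not separator:
        raise ValueError(f"malformed account name: {name!r}")
    return AccountId(routing=int(routing_text), account=int(account_text))


parsed = parse_free_form("21000021/12345")
```

With the structured form, validation lives in one place and every client gets it for free. With the string form, each client has to re-implement the parsing and can get it wrong.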

Another question is when to provide a unique numerical ID. This is a common question. To answer this question, we need to consider the following related questions first:

Has a structured or string identity already been provided that consistently and uniquely identifies an object? If the answer is yes, a numerical ID is not really necessary.
Is a 64-bit integer long enough to meet the requirement?
Numerical IDs may not be very user-friendly. Are numerical IDs really helpful for users?

If you still consider it necessary and practical to use numerical IDs after carefully considering these questions, you can use numerical IDs. Otherwise, use numerical IDs with caution.

What Are the Conceptually Reasonable Operations for This Resource?

After determining the resource/object, we need to define which operations should be supported. At this time, the major consideration is whether it is “conceptually reasonable”. In other words, consider whether the combination of operations and a resource sounds naturally reasonable (if the resource itself is accurately and reasonably named; this is of course a big if as it's not easy to do). Operations are not always limited to CRUD (create, read, update, delete).

For example, if the object on which an API will perform operations is quota, the following operations sound naturally reasonable:

Update quota and transfer quota

However, a Create Quota operation does not sound naturally reasonable, because a quota expresses an amount, which does not seem to fit a creation operation. In such a case, also ask: Is it really necessary to create this object? What do we really need to do?

For Update Operations, Prefer Idempotence Whenever Feasible

Idempotence is “the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application” [3].

Obviously, idempotence makes system design much more convenient. For example, the client can retry safely, which makes complicated workflows much easier to implement. However, idempotence is not always easy to implement.

Idempotence of the Create operation

If a Create operation is called multiple times, an object may be created repeatedly. To implement an idempotent Create operation, a common method is to use a client-side-generated deduplication token (unique ID) and use this same unique ID during retry so that the server side can recognize duplicates.
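A minimal sketch of the deduplication-token approach is shown below. `OrderService` and its in-memory dictionaries are hypothetical stand-ins for a real server and its storage.

```python
import uuid


class OrderService:
    """Hypothetical in-memory server that deduplicates Create by token."""

    def __init__(self):
        self._orders = {}     # order_id -> payload
        self._by_token = {}   # deduplication token -> order_id
        self._next_id = 1

    def create_order(self, token: str, payload: dict) -> int:
        # A token seen before means this call is a retry: return the
        # original result instead of creating a second object.
        if token in self._by_token:
            return self._by_token[token]
        order_id = self._next_id
        self._next_id += 1
        self._orders[order_id] = dict(payload)
        self._by_token[token] = order_id
        return order_id

    def order_count(self) -> int:
        return len(self._orders)


service = OrderService()
token = str(uuid.uuid4())     # generated once by the client
first = service.create_order(token, {"item": "book"})
retry = service.create_order(token, {"item": "book"})   # same token on retry
```

A real service would also have to decide how long tokens are remembered and would need to store the token and the object atomically. Both concerns are outside the scope of this sketch.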

Idempotence of the Update operation

To make the Update operation idempotent, avoid delta semantics in APIs that update values. An update to a value typically uses one of two semantics:

Incremental (delta) semantics, such as IncrementBy(3)
Set-total semantics, such as SetNewTotal(3)

On retry, semantics like IncrementBy cannot by themselves avoid errors, because a request that is applied twice adds the delta twice. SetNewTotal(3) (set the total to 3) is naturally idempotent.

Note that in this example, IncrementBy also has one advantage: It allows multiple concurrent requests to be processed in parallel, while SetNewTotal may cause values to overwrite (or block) each other during parallel updates.

Both IncrementBy and SetNewTotal have advantages and disadvantages. Which semantics should be used to implement idempotence depends on specific scenarios. If parallel updates are prioritized, use IncrementBy together with a deduplication token to implement idempotence.
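The two semantics can be compared in a short sketch. `Counter` is a hypothetical in-memory object, and the token strings stand in for client-generated deduplication tokens.

```python
class Counter:
    """Hypothetical counter exposing both update semantics."""

    def __init__(self):
        self.total = 0
        self._applied_tokens = set()

    def set_new_total(self, value: int) -> int:
        # Idempotent by construction: repeating the call gives the same total.
        self.total = value
        return self.total

    def increment_by(self, delta: int, token: str) -> int:
        # A delta is only safe to retry if duplicates are detected,
        # here with a deduplication token remembered by the server.
        if token not in self._applied_tokens:
            self._applied_tokens.add(token)
            self.total += delta
        return self.total


counter = Counter()
counter.set_new_total(3)
counter.set_new_total(3)            # retry: the total is still 3
counter.increment_by(2, "req-1")
counter.increment_by(2, "req-1")    # retry with the same token: ignored
counter.increment_by(2, "req-2")    # a genuinely new request: applied
```

After these calls the total is 7: the set to 3, plus two distinct increments of 2. The retried increment is not counted twice.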

Idempotence of the Delete operation

If the Delete operation is not idempotent, calling Delete a second time on an object that has already been deleted may return an error because the object is not found. Generally speaking, this is not a problem: Even though the operation is not strictly idempotent, the repeated call has no additional side effect. To implement idempotent Delete operations, we can use Archive -> Purge lifecycle management (deleting data step by step) or a Purge Log.
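One possible reading of the Archive -> Purge lifecycle and the Purge Log is sketched below. `DocumentStore` and the exact meaning of its return values are assumptions made for illustration, not a description of a specific system.

```python
class DocumentStore:
    """Hypothetical store where delete archives first and purge comes later."""

    def __init__(self):
        self._live = {}        # doc_id -> body
        self._archived = {}    # doc_id -> body, awaiting purge
        self.purge_log = []    # ids that were removed permanently

    def put(self, doc_id: str, body: str) -> None:
        self._live[doc_id] = body

    def delete(self, doc_id: str) -> bool:
        """Return True if the document is (now or already) deleted."""
        if doc_id in self._live:
            self._archived[doc_id] = self._live.pop(doc_id)
            return True
        if doc_id in self._archived or doc_id in self.purge_log:
            return True        # repeated delete: same answer, no error
        return False           # the document never existed

    def purge(self, doc_id: str) -> None:
        # Permanently remove an archived document and record that fact.
        if doc_id in self._archived:
            del self._archived[doc_id]
            self.purge_log.append(doc_id)


store = DocumentStore()
store.put("a", "contents")
first_delete = store.delete("a")
second_delete = store.delete("a")    # retry before purge
store.purge("a")
third_delete = store.delete("a")     # retry after purge
```

Because the store remembers archived and purged ids, a retried delete can be told apart from a delete of something that never existed.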

Compatibility

API changes require compatibility. This is very important. Specifically, the compatibility in this case is backward compatibility that allows older client versions to access newer versions on the service side (if they belong to the same major version) and prevent incorrect behaviors so that the client does not experience downtime. Compatibility is especially important for remote APIs (HTTP/RPC APIs). Many articles have described compatibility. For example, the document [4] gives some compatibility suggestions.

Incompatible changes include but are not limited to the following:

Deleting a method, field, or enumerated value.
Changing the name of a method or field.
Keeping the name of a method or field unchanged, but changing its semantics or behavior. This type of incompatibility is easily overlooked.

For more information about compatibility, see the document [4].

Another important compatibility issue is how to deal with incompatible API changes. Generally, incompatible changes go through a process called deprecation, which is carried out step by step as major versions are released. The deprecation process is not described in detail here. In general, the service has to support both the older and the newer fields, methods, and semantics for a period of time, so that compatibility with older versions is retained while clients are given enough time to upgrade. This process is time-consuming. That’s why we emphasize the importance of good API design.
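As a small sketch of supporting old and new fields side by side during a deprecation window, the handler below accepts a new field and falls back to an old one. The field names `display_name` and `name` and the function `read_display_name` are hypothetical.

```python
import warnings


def read_display_name(request: dict) -> str:
    """Accept the new field and, while it is deprecated, the old one too."""
    if "display_name" in request:          # new field takes precedence
        return request["display_name"]
    if "name" in request:                  # old field, still honored
        warnings.warn(
            "'name' is deprecated; send 'display_name' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return request["name"]
    raise KeyError("display_name")


new_style = read_display_name({"display_name": "Ada"})
```

In a remote API, the warning would more likely be surfaced through logs or metrics, so that the service owner can see which clients still send the old field before removing it.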

For an internal API upgrade, developers may be tempted to use a seemingly more efficient “sync release” model to make incompatible changes. That is, notify all clients of an incompatible change to the service API, have all of them release their updates at the same time, and then switch to the new interface. This method is not recommended for the following reasons:

We usually do not know every client that uses a specific API.
The release process takes time and synchronous updates cannot effectively be implemented.
This model does not allow backward compatibility. It will be a very complicated issue if the new API requires rollback, because it is highly likely that no rollback plan has been prepared when using this release model and not every client will roll back even if a rollback plan is available.

Therefore, we strongly recommend that you avoid using the “sync release” model to make incompatible API changes in production clusters.

Batch Mutations

How to design batch mutations is another common API design decision that developers have to make. Two methods are commonly used to design batch mutations:

Client-side batch mutations
Server-side batch mutations

The following figure shows the two methods to implement batch mutations.

API designers may want to implement server-side batching, but we recommend avoiding it whenever possible, unless atomic, transactional batching is genuinely helpful for customers. Server-side batch mutations have many disadvantages, while client-side batch mutations have many advantages:

Server-side batch mutations increase the complexity of API semantics and implementations. For example, the API has to define the semantics of a partially applied batch and how to report the status of each item.
Even if we want to support batch transactions, we need to consider whether different back-end implementations can support transactions.
Batch mutations pose a great challenge to the server-side performance and may easily lead to interface abuse on the client side.
Client-side batching allows the load to be shared by different servers (see the preceding figure).
Client-side batching allows the client to decide failure retry policies in a more flexible way.
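A minimal sketch of client-side batching with a per-item retry policy follows. `apply_batch` and `flaky_update` are hypothetical, and the single-item mutation is assumed to be idempotent so that retrying it is safe.

```python
def apply_batch(items, mutate_one, max_attempts=3):
    """Call `mutate_one` once per item; return (succeeded, failed) lists."""
    succeeded, failed = [], []
    for item in items:
        for attempt in range(1, max_attempts + 1):
            try:
                mutate_one(item)
                succeeded.append(item)
                break
            except ConnectionError:
                # The client chooses the retry policy for each item.
                if attempt == max_attempts:
                    failed.append(item)
    return succeeded, failed


# Stand-in for a remote single-item mutation: "b" fails once, "c" always.
calls = {}


def flaky_update(item):
    calls[item] = calls.get(item, 0) + 1
    if item == "b" and calls[item] < 2:
        raise ConnectionError("transient failure")
    if item == "c":
        raise ConnectionError("service unavailable")


ok, bad = apply_batch(["a", "b", "c"], flaky_update)
```

Each item succeeds or fails on its own, so the server API never has to express a partially applied batch.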

Be Aware of the Full Replacement Risks

Full replacement update refers to replacing an old object/resource with a completely new object/resource in a Mutation API. The API is probably written like this:

UpdateFoo(Foo newFoo);

This is a very common way to design a Mutation API. However, API designers must be aware of its potential risk.

When full replacement is used, the object Foo to be updated may have a new member on the server, which may not have already been updated on a client and therefore is not known by the client. Adding a new member to the server is generally a compatible change. However, if another client that knows this new member has set a value for this member, and the full replacement is performed by a client that does not know this member, this member may be overwritten.

A more reliable update method is “update mask”, that is, to use specific parameters during API design to specify which members should be updated.

UpdateFoo {
  Foo newFoo;
  boolean update_field1; // update mask
  boolean update_field2; // update mask
}

“Update mask” can also be expressed as a repeated list of field paths, each in the form “a.b.c.d”.
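The sketch below applies an update mask expressed as a list of field names. `Foo`, its fields, and `update_foo` are hypothetical, and nested paths such as "a.b.c.d" are left out to keep the example short.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Foo:
    title: str = ""
    owner: str = ""
    labels: tuple = ()      # a member added later, unknown to old clients


def update_foo(stored: Foo, new_foo: Foo, update_mask: list) -> Foo:
    """Copy only the fields named in `update_mask` from new_foo to stored."""
    known = {f.name for f in fields(Foo)}
    unknown = set(update_mask) - known
    if unknown:
        raise ValueError(f"unknown fields in mask: {sorted(unknown)}")
    changes = {name: getattr(new_foo, name) for name in update_mask}
    return replace(stored, **changes)


stored = Foo(title="draft", owner="ann", labels=("urgent",))
# An old client that does not know about `labels` updates only the title.
result = update_foo(stored, Foo(title="final"), ["title"])
```

With full replacement, the same request would have reset `owner` and `labels` to their defaults. With the mask, they keep the values stored on the server.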

However, this type of API is uncommon, because it requires more complex maintenance and code implementation. “Update mask” is not a mandatory method in this section. That’s why the title of this section is “Be aware of the full replacement risks”.

Don’t Create Your Own Error Codes or Error Mechanisms

API designers sometimes want to create their own error codes or their own mechanisms for returning errors. They do so because they think it helps users if the details of each API are expressed and returned to them. However, this actually makes APIs more complicated and more difficult to use.

Error handling is an important part of the user experience. To make APIs easier to use, the best practice is to use standardized, unified error codes instead of creating a separate set of error codes for each API. For example, HTTP has a set of standard error codes [7], and Google Cloud APIs also use unified error codes [5].

Why are API developers not recommended to create their own error code mechanisms?

Error handling is the job of a client. However, it is difficult for a client to pay attention to every error detail, especially when there are many of them. In practice, error handling falls into two or three categories at most. What a client cares about most is whether it should retry after an error or pass the error on to the layer above. It does not aim to distinguish between detailed error causes. Multiple error code mechanisms only make error handling more complex.
Some people may think that providing more custom error codes helps convey information, but this is only meaningful if a separate mechanism is built to act on them. If custom error codes are provided solely to convey information, a field in the error message can serve the same purpose.
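The point that a client mostly needs two or three categories can be sketched with the standard HTTP status codes from Python's `http` module. The function `next_action` and the particular set of retryable codes are assumptions for illustration; which codes are worth retrying depends on the service.

```python
from http import HTTPStatus

# Assumption for this sketch: treat these standard codes as transient.
RETRYABLE = {
    HTTPStatus.TOO_MANY_REQUESTS,      # 429
    HTTPStatus.SERVICE_UNAVAILABLE,    # 503
    HTTPStatus.GATEWAY_TIMEOUT,        # 504
}


def next_action(status: int) -> str:
    """Map a standard status code to one of three client decisions."""
    if 200 <= status < 300:
        return "done"
    if status in RETRYABLE:
        return "retry"
    return "propagate"    # hand the error to the layer above
```

Because every API behind the same convention returns codes from the same set, this one function can serve all of them. Details beyond the category can travel in the error message.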

For more design patterns, see the Google Cloud API guide [5] and the Microsoft API design best practices [6]. Many questions described in this article are also discussed in the documents listed in the References section. In addition, these reference articles describe common API design specifications such as versioning, pagination, and filtering. The related content is not repeated in this article.