ESR Optimizes Frontend Performance by Using Edge Computing Capabilities of CDN
For most web pages, performance on a first visit (for example, when users arrive through search engine optimization (SEO) results or paid traffic) is generally worse than when users arrive from a related page. One major reason is that first-time visitors cannot benefit from connection reuse, caching, or other local resources. On a first visit, many client-side optimizations, such as preloading, pre-execution, and prerendering, cannot be applied.
When the client's cache cannot be used, we can instead use the caching capability of CDN nodes, which are close to users. However, common frontend performance optimization solutions all have some disadvantages. Based on the concept of Edge Side Includes (ESI), this article proposes a new optimization idea: the Edge Side Rendering (ESR) solution. That is, we use the edge computing capabilities of the Content Delivery Network (CDN) to return static content first and then dynamic content to users in a streaming mode.
Overview and Underlying Principles
Idea 1: SSR
To optimize performance, we generally use Server Side Rendering (SSR) to directly output the dynamic content from the server side.
Idea 2: CSR and CDN
To reduce the white screen duration, we can use the edge caching capability of the CDN to cache the page's HTML directly on a CDN node. In most scenarios, however, the main content of the page is dynamic or personalized. Caching the full HTML on a CDN node would seriously affect the business, so this approach is acceptable in only a few scenarios. What if we cache only the static part of the HTML on a CDN node? This is in fact a common practice: the static HTML framework is cached on a CDN node so that the user sees some content quickly. The client then sends an asynchronous request to obtain and render the dynamic content, which is Client Side Rendering (CSR). The rendering process in the CSR and CDN mode is as follows:
The strength of this mode is that the static framework of the page is cached on a CDN node, so the user sees content quickly and the anxiety of waiting at a white screen is reduced. However, JS must be downloaded and executed, and an asynchronous request sent, before the full page content is rendered. As a result, the meaningful dynamic content takes longer to appear on the screen than in the SSR mode.
Idea 3: ESI
The CSR and CDN mode reduces the white screen duration but delays the display of dynamic content. The reason is that the static and dynamic content of the page are delivered in two stages, separated by the download and execution of JS. Can we instead combine dynamic content with static content on the CDN node?
ESI offers inspiration here. ESI was originally a standard proposed by CDN service providers. By adding specific dynamic tags to the HTML, it allows the static content of a page to be cached on a CDN node while dynamic content is assembled freely. The rendering sequence of ESI is as follows:
This solution looks great. It caches static content on a CDN node and splices in dynamic content when the user makes a request. However, the critical problem is that, in the ESI mode, the first byte is returned to the user only after all dynamic content has been obtained and spliced on the CDN node. That is, this mode does not reduce white screen time. It only decreases the volume of content transmitted between the CDN node and the server, which brings little performance gain. The final effect of the ESI mode is therefore similar to that of the SSR mode.
Although ESI falls short of our expectations, it points in a good direction. We can adapt ESI so that static content is returned first and dynamic content is returned to the page as soon as the CDN node obtains it. White screen time is then shortened, and the return of dynamic content is not delayed. To achieve this streaming form of ESI, we must be able to perform fine-grained operations on requests on the CDN node and return content in a streaming mode. Do CDN nodes support such complex operations? With edge computing, they do. Just as a service worker lets us program requests and responses in the browser, edge computing lets us do the same on the CDN.
Based on edge computing capabilities, Edge Side Rendering (ESR) has become a new choice for us. Details of the solution are as follows.
The core idea of ESR is to use edge computing capabilities to return static content and then dynamic content to the user in a streaming mode. A CDN node is closer to the user than the server and has a shorter network delay. On the CDN node, we can first quickly return the cacheable static content of the page to the user. Meanwhile, we initiate a request for the dynamic content from the CDN node and, once the static content has been written to the response stream, continue by returning the dynamic content in the same stream. The final page rendering sequence is as follows:
As shown in the preceding figure, the CDN edge node can quickly return the first byte and the static content of the page. Then, the CDN node initiates a dynamic request to the server and returns the dynamic content to the user in a streaming mode. The solution has the following characteristics:
• The Time to First Byte (TTFB) of the first screen is short, and the static content (such as the header, basic structure, and skeleton screen of the page) becomes visible quickly.
• Compared with traditional browser rendering, the request for dynamic content is initiated earlier, by the CDN. It does not have to wait for the browser to download and execute JS. Theoretically, the final response time is the same as the time needed to obtain the full dynamic page by accessing the server directly.
• After static content is returned, the HTML content can be partly parsed, and JS and CSS can be downloaded and executed. Such operations that may block the page are completed in advance. Therefore, the full dynamic content can be displayed more quickly after it is returned in a streaming mode.
• The network between the edge node and the server has more room for optimization than that between the client and the server. For example, dynamic acceleration and connection multiplexing between the edge and the server can reduce Transmission Control Protocol (TCP) connections and network transmission overhead for a dynamic request. As a result, the dynamic content is returned faster than when the client directly accesses the server.
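The sequence described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the code that runs on any particular CDN: `renderEsr`, `getStaticHead`, `fetchDynamic`, and `staticTail` are hypothetical names, and a real edge runtime would write the chunks to its response stream instead of yielding them.

```javascript
// Hypothetical sketch: none of these names belong to a real CDN API.
// Yields response chunks in the order they would be written to the stream.
async function* renderEsr({ getStaticHead, fetchDynamic, staticTail }) {
  // Start the origin request at once so it overlaps with sending static content.
  const dynamicPromise = fetchDynamic();
  // Avoid an unhandled-rejection warning; the error still surfaces at the await below.
  dynamicPromise.catch(() => {});

  // Cacheable static content goes out first, which is what shortens the TTFB.
  yield await getStaticHead();
  // Dynamic content follows in the same response once the origin has answered.
  yield await dynamicPromise;
  // Closing markup completes the document.
  yield staticTail;
}
```

Because the dynamic request is started before the static content is sent, the browser can already parse the static HTML and fetch JS and CSS while the edge node is still waiting for the origin.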
Demo for Comparison
In our demo, we accessed the main search page at https://edge-routine.m.alibaba.com/ through Alibaba Cloud CDN. The following compares its loading with that of the original page under different network conditions (throttling was configured with Charles network throttling):
Unlimited Speed (Wi-Fi)
Limited Speed 4G
Limited Speed 3G
The results show that the slower the network, the earlier the major elements appear with CDN streaming rendering compared with the original SSR mode. This is as expected: on a slower network, static resources take longer to load, so having the browser start loading them earlier yields a more obvious benefit. Regardless of network conditions, CDN streaming rendering has a much shorter white screen time.
1. Templates
A template resembles a syntax that contains ESI blocks. It lets us mark the content that must be requested dynamically and separate out the static content that can be cached and returned immediately. In essence, a template defines which parts of the page are dynamic and which are static.
In the streaming rendering process, the page template is parsed from top to bottom. The static content is directly returned to the user. The fetch logic is executed for the dynamic content. Static content and dynamic content may appear alternately in the whole process.
The designed types of templates are as follows:
1) Original HTML
This template has the least impact on the existing business. To identify the dynamic content of the page, you only need to add some annotation tags to the existing SSR page.
2) Static template (no related scenarios are available for the moment)
This template needs to be published to the CDN node separately. If the rendering layer is connected to the FaaS gateway and SSR, all three can share the template content. When a template is released in the workflow, it is automatically synchronized to the CDN node, and the cache on the CDN node is cleared. Dynamic content can be rendered in two ways. In the first, SSR generates dynamic HTML fragments. In the second, the server provides dynamic data, and the dynamic HTML fragments are rendered on the edge node.
The strength of using SSR to render dynamic HTML fragments is that no HTML template needs to be rendered on the edge, so developers do not need to write two sets of template logic. However, the server must have SSR capabilities, and the dynamic content transmitted is large.
Rendering on edge nodes also has its strengths. The server only needs to provide dynamic data and does not need SSR capabilities (the client must then have CSR capabilities as a fallback in case an exception occurs). Moreover, the dynamic content transmitted is small. However, dynamic content cannot be passed through the edge node in a streaming mode. It is returned to the user only after it has been completely downloaded and processed on the edge node.
2. Display of Static Content
Static content comes from templates and is obtained differently depending on the template type. For the original HTML template, static content is extracted, based on the HTML annotation tags, from the full HTML returned by the first dynamic request, and is then stored in the edge cache. For the static template, static content is obtained by pulling the template file cached on the CDN node and is also stored in the edge cache. Cached static content has an expiration time and a version number.
The leading static content of the template is returned directly to the user in the response. Subsequent static content, such as the closing body and html tags, is delivered in one of two modes:
In the first mode, subsequent static content is written to the response stream after the dynamic content is returned. This mode is SEO-friendly. However, dynamic content blocks the static content that follows it, and when there are multiple dynamic blocks, they can only be displayed in sequence.
In the second mode, all static content is returned first, and dynamic content is then placed in its position by scripts, in a manner similar to BigPipe. The complete static content is displayed first, and multiple pieces of dynamic content are displayed in the order of their arrival. This mode is not SEO-friendly because dynamic content is inserted by JS.
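As an illustration of how static content might be extracted from an annotated SSR page, the following sketch splits full HTML into alternating static and dynamic segments. The article does not specify the annotation syntax, so the comment markers `<!-- esr:dynamic:start -->` and `<!-- esr:dynamic:end -->` and the function name `splitTemplate` are assumptions made for this example only.

```javascript
// Hypothetical annotation markers; the real tag syntax is not given in this article.
const START = '<!-- esr:dynamic:start -->';
const END = '<!-- esr:dynamic:end -->';

// Splits annotated HTML into ordered segments of type 'static' or 'dynamic'.
function splitTemplate(html) {
  const segments = [];
  let cursor = 0;
  while (cursor < html.length) {
    const start = html.indexOf(START, cursor);
    if (start === -1) {
      // No more dynamic blocks: the rest of the page is static.
      segments.push({ type: 'static', content: html.slice(cursor) });
      break;
    }
    const end = html.indexOf(END, start + START.length);
    if (end === -1) throw new Error('Unclosed dynamic block');
    if (start > cursor) {
      segments.push({ type: 'static', content: html.slice(cursor, start) });
    }
    segments.push({ type: 'dynamic', content: html.slice(start + START.length, end) });
    cursor = end + END.length;
  }
  return segments;
}
```

The static segments are what would be stored in the edge cache, while each dynamic segment marks a position where content from the origin is written during streaming.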
3. Dynamic Content
During rendering, when parsing reaches a region marked as dynamic, the edge node initiates a request for the dynamic content. This request can use dynamic acceleration on its way to the server (the origin). An edge node can obtain dynamic content from the backend in three modes:
In the first mode, the backend returns the full page, and the dynamic content, which is annotated, must be extracted from it. This mode intrudes least on the existing business, but the volume of data transmitted is large, and the complete HTML must be downloaded before the dynamic content can be extracted.
In the second mode, the backend returns only the content of a dynamic block. This mode allows dynamic content to be passed on to the user in the response stream as it arrives. The page must provide a Uniform Resource Locator (URL) that returns only the content of the dynamic block.
In the third mode, the backend returns only the data for the dynamic content. Using the dynamic rendering template included in the static template, the edge node renders the dynamic HTML and returns it to the user. The volume of data transmitted from the backend is small, and the backend does not need any SSR capability. The drawbacks are that developers must maintain one more set of template logic and that complex template rendering on the edge node may incur CPU overhead and run into CPU limits.
The edge node delivers dynamic content to users in two modes:
Waterfall stream mode (corresponding to WATER_FALL in the routing configuration): Dynamic content is returned in sequence, like a waterfall. Although multiple pieces of dynamic content are loaded in parallel on the edge node, the page content is displayed to the user in order from top to bottom. This mode is SEO-friendly and does not affect the loading sequence of page modules. With multiple dynamic modules, however, the full page framework is not visible up front, and the first dynamic block delays the display of subsequent blocks. In addition, JS and CSS resources at the bottom of the page cannot be loaded and executed in advance.
Embedded mode (corresponding to ASYNC_INSERT in the routing configuration): Static content is returned all at once, with placeholder tags reserving the positions for dynamic content. Each piece of dynamic content is later inserted into its placeholder through innerHTML. JS and CSS resources at the bottom of the page can be loaded and executed in advance, and the user sees the full page framework first. However, this mode is not SEO-friendly, and the execution sequence of page modules varies with the return speed of each dynamic block, so the page logic in the browser must check for and handle this.
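For the embedded mode, one possible way to produce the placeholder and the later insertion script on the edge node is sketched below. The function names and the use of a `div` placeholder are assumptions for illustration; the article only states that positions are reserved with tags and filled through innerHTML.

```javascript
// Hypothetical helpers for ASYNC_INSERT; not part of any CDN API.

// Placeholder emitted with the static content to reserve a position.
function placeholder(id) {
  return `<div id="${id}"></div>`;
}

// Script chunk appended to the stream when a dynamic block arrives.
function insertionScript(id, html) {
  // JSON.stringify produces a valid JS string literal. Escaping "<" keeps a
  // "</script>" inside the fragment from closing the script tag early.
  const literal = JSON.stringify(html).replace(/</g, '\\u003c');
  return `<script>document.getElementById(${JSON.stringify(id)}).innerHTML=${literal};</script>`;
}
```

Each script can be written to the response as soon as its block arrives, so blocks appear in arrival order rather than document order. Note that scripts inside a fragment assigned through innerHTML are not executed by the browser, which is one of the compatibility points the page logic has to account for.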
version: '0.0.1', // Configuration version number.
pageName: 'seo', // Page name identifier.
match: '/abc/efg/.*', // Regular expression string that matches the page path.
renderType: 'ESR', // Rendering mode: ESR (streaming rendering).
templateType: 'FULL_HTML', // Template type: use the full HTML generated by SSR as the template.
dynamicMode: 'WATER_FALL|ASYNC_INSERT', // How dynamic content is returned: waterfall stream mode or asynchronous insertion (innerHTML).
templateUrl: '', // Template URL.
templateType: 'STATIC', // Template type: a static template fetched from a CDN URL.
dynamicMode: 'WATER_FALL|ASYNC_INSERT', // How dynamic content is returned: waterfall stream mode or asynchronous insertion (innerHTML).
renderType: 'REDIRECT_302', // Rendering mode: 302 redirect.
renderType: 'PROXY_PASS', // Rendering mode: reverse proxy.
Currently, three rendering modes are designed for routes: streaming rendering, redirection, and reverse proxy. The configuration of redirection and reverse proxy is simple. You only need to specify the target URL, much as you would when configuring Nginx.
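To make the routing step concrete, here is a minimal sketch of how edge code might select a route for an incoming path. The `routes` array, the `target` field, and the full-path anchoring of the `match` expression are assumptions for this example; the article shows the individual fields but not how they are grouped or how matching is anchored.

```javascript
// Hypothetical grouping of the route fields into a list; the field names
// pageName, match, renderType, templateType and dynamicMode come from the article.
const routeConfig = {
  version: '0.0.1',
  routes: [
    { pageName: 'seo', match: '/abc/efg/.*', renderType: 'ESR',
      templateType: 'FULL_HTML', dynamicMode: 'WATER_FALL' },
    // "target" is an assumed field holding the redirect destination.
    { pageName: 'moved', match: '/old/.*', renderType: 'REDIRECT_302',
      target: 'https://example.com/new' },
  ],
};

// Returns the first route whose pattern matches the whole path, or null.
function findRoute(config, pathname) {
  const route = config.routes.find((r) => new RegExp(`^${r.match}$`).test(pathname));
  return route || null;
}
```

A request whose path matches no route would fall through to the default behavior, that is, fetching the full page through dynamic acceleration.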
Control of Affected Scope
CDN switch: Traffic for a domain name is switched to the CDN by region and by proportion, and can be switched back to the unified access layer.
Scope switch of edge computing: The CDN is configured with the paths on which edge computing runs, so that edge computing applies only to those paths.
Routing switch of edge computing: Based on the route configuration in edge computing, only certain pages use streaming rendering. For requests for other pages, the full page content is obtained through dynamic acceleration.
If a serious problem occurs on the CDN, we can change the DNS resolution to send requests directly to the backend.
If the basic edge computing function is abnormal, you can disable edge computing on the CDN configuration platform and use the default dynamic acceleration.
If an error occurs before any response is returned to the client in the process of ESR, we can capture the error and then obtain the complete page content instead.
In the process of ESR, if static content has already been returned to the client and an error occurs while the edge node loads dynamic content (such as a timeout, an HTTP error code, or a mismatch with the static content version number), a script tag that calls location.reload() is returned to end the response and force a page refresh. We can add a specified query parameter to ensure that the refresh request bypasses ESR.
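The recovery step can be sketched as follows. The parameter name `esr_bypass` and both function names are hypothetical. Because a plain location.reload() keeps the same URL, this sketch navigates with location.replace() to a URL that carries the bypass parameter, which is one way, under these assumptions, to combine the forced refresh with the bypass.

```javascript
// Hypothetical parameter name; the article only mentions "a specified query parameter".
const BYPASS_PARAM = 'esr_bypass';

// Edge side: true when the request should skip ESR and fetch the full page.
function shouldBypassEsr(requestUrl) {
  return new URL(requestUrl).searchParams.has(BYPASS_PARAM);
}

// Edge side: script appended to a partially sent response after a dynamic-content error.
// In the browser it reloads the page with the bypass parameter added.
function recoveryScript() {
  return '<script>(function(){' +
    'var u=new URL(location.href);' +
    `u.searchParams.set(${JSON.stringify(BYPASS_PARAM)},"1");` +
    'location.replace(u.toString());' +
    '})();</script>';
}
```

After writing `recoveryScript()` to the stream, the edge node would end the response. The follow-up request then satisfies `shouldBypassEsr` and is served as a complete page from the origin.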
1) Phased release of edge computing code
The phased release of edge computing code is supported.
2) Phased release of route configuration
In the edge computing code, we can load one of two configuration URLs, the phased release version or the official version, according to a fixed proportion. For a phased release, only the phased release configuration is published. For a full release, the full configuration is published. Each release is accompanied by clearing the CDN cache.
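Choosing between the two configuration URLs by a fixed proportion could look like the sketch below. The function name, the option names, and the per-request random draw are assumptions; a real deployment might instead bucket by user or region so that one visitor consistently sees the same version.

```javascript
// Hypothetical helper: picks the phased-release or official configuration URL.
// "ratio" is the fraction of requests (0 to 1) that use the phased-release config.
// "random" is injectable so that the choice can be tested deterministically.
function pickConfigUrl({ phasedUrl, officialUrl, ratio }, random = Math.random()) {
  return random < ratio ? phasedUrl : officialUrl;
}
```

With `ratio` set to 0.1, roughly one request in ten loads the phased release configuration, and setting it to 0 turns the phased release off without changing any code.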
3) Page content for phased release
A special template version number is given to the page for phased release. With this version number, the page can bypass ESR.
When the frontend and backend are released separately, smooth release is a common problem. If the static resources (JS and CSS) of the page and the backend are not released together, the HTML returned from the backend may not match the JS and CSS at the frontend. If this mismatch is not handled, styles may be wrong or broken, or document selectors may fail to find the expected elements.
To ensure smooth release, we can write compatible code whenever a requirement changes the frontend and backend at the same time. Releasing them one after the other then does not affect page availability.
Alternatively, we can use a version number. We manually configure a version number on the backend page. For an incompatible release, we first release the frontend resources and then manually update the version number at the backend. This ensures that the HTML references the new version of the static resources only after the backend machine has been released successfully.
This problem always exists in batch or beta releases. In the ESR scenario, however, static content is cached on CDN nodes, which makes inconsistency between the frontend and backend more likely. To solve this, business developers need to assess the risk at release time. If the release is compatible, no special handling is needed. Otherwise, the version number of the page template must be changed. When the version number of the new dynamic content does not match that of the static content, the current streaming rendering is abandoned. This ensures that incompatible dynamic and static content are never combined.
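The version check and the two fallback paths described earlier can be combined into one small decision function. This is a sketch under assumptions: the function name, the field names, and the three result labels are invented for illustration, and the article does not say how the dynamic response carries its version number.

```javascript
// Hypothetical decision helper for the edge node.
//   staticVersion  - template version stored with the cached static content
//   dynamicVersion - template version reported with the dynamic content
//   ok             - false on timeout or HTTP error while loading dynamic content
//   staticSent     - whether static content has already been written to the client
function decideOnDynamicResult({ staticVersion, dynamicVersion, ok, staticSent }) {
  const compatible = ok && staticVersion === dynamicVersion;
  if (compatible) return 'STREAM_DYNAMIC';
  // Nothing sent yet: fetch and return the complete page instead.
  // Static content already sent: end the response with a forced refresh that bypasses ESR.
  return staticSent ? 'RELOAD_BYPASSING_ESR' : 'FETCH_FULL_PAGE';
}
```

Treating a version mismatch exactly like a load error keeps the edge logic small: in both cases the user ends up with a complete page rendered by the origin.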
Edge CDN Service Providers
Currently, major CDN service providers support edge computing as follows:
Alibaba Cloud CDN supports edge computing in an environment similar to a service worker and meets the functional requirements.
Alibaba Cloud CDN has a limited number of nodes outside mainland China. Its performance in some regions is comparable to or even better than that of Akamai, but some domain names perform slightly worse than on Akamai because there are fewer nodes.
Akamai supports only simple request rewriting in its edge computing, which does not meet the needs of ESR.
ESI can assemble dynamic and static content but does not support streaming, so dynamic content blocks the first screen.
Akamai has many nodes outside mainland China, and has some performance advantages over Alibaba Cloud CDN in some regions.
Cloudflare supports edge computing in an environment similar to a service worker and meets the functional requirements.
If you have no experience in using Cloudflare, you may find the process complicated.
We tested a typical scenario of accessing a page for the first time. The phased release is currently live. A comparison in Indonesia, with and without ESR, using WebPagetest shows the optimization effect:
• The TTFB is reduced by 1s.
• The white screen time is reduced by 1s.
• The time of core content display is reduced by 500ms.
For information about the comparison result through WebPagetest, see