Information Input Considerations for Intelligently Generating Frontend Code
Background and Trends
Artificial intelligence (AI) has developed rapidly in recent years and is now applied in many industries. In web R&D and design, public data sets similar to Rico, a labeled data set of mobile application interfaces, are constantly emerging. They make it easier for researchers to study mobile application scenarios. In this field, intelligently generating frontend code is a promising direction of development.
Apart from data sets used for academic research, some Internet products directly provide services that intelligently generate code. These services seek to constantly improve their capabilities by evaluating the quality of the code generated as they provide services to users.
- imgcook: This product is adept at generating code based on design documents.
- fronty: This product is adept at generating website homepages based on images.
- Yotako: This product is adept at generating code based on Photoshop documents (PSDs).
These three code-generating products mostly use structured design documents or images as the input information. The parsing of the input information determines the quality of the final code output. Therefore, we need a way to evaluate the quality of the generated code. First, let’s look at the structure of frontend code.
Structure of Frontend Code
Generally, frontend code includes static GUI code and dynamic logic code.
Static GUI Code
Static GUI code generally consists of HyperText Markup Language (HTML) and Cascading Style Sheets (CSS) code. The visual styles in this code can be derived from the fill styles of the layer-level nodes in the design document. The hierarchical structure of the code, however, is usually specific to frontend code and is not reflected in the design phase.
Dynamic Logic Code
Dynamic logic code covers the frontend logic code that handles interactions and field binding. With the popularization of serverless, frontend developers also need to develop Function as a Service (FaaS) code at the server end, which includes obtaining the server-end data that the requirements call for and processing that data. This is similar to encoding the data flows behind the program.
The preceding three types of code (static GUI code, frontend logic code, and server-end logic code) account for most frontend programming content. To intelligently generate the code, we need to make the following preparations:
- Implement structured descriptions of the code so the code can be converted through data description.
- Extract key information from related materials for structured data description.
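To make the first preparation concrete, the following is a minimal sketch of how a structured description could be converted into static GUI code. The node schema (tag, attrs, style, text, children) and the render function are assumptions made for illustration, not the format used by any of the products mentioned above.

```python
from html import escape

# Hypothetical structured description of a small module. The keys
# (tag, attrs, style, text, children) are illustrative assumptions.
node = {
    "tag": "div",
    "style": {"display": "flex", "padding": "12px"},
    "children": [
        {"tag": "img", "attrs": {"src": "logo.png"}, "style": {"width": "48px"}},
        {"tag": "span", "text": "xxx Flagship Store", "style": {"color": "#333"}},
    ],
}

# Elements that have no closing tag in HTML.
VOID_TAGS = {"img", "br", "input"}


def render(node: dict) -> str:
    """Recursively convert a structured node description into an HTML string."""
    attrs = dict(node.get("attrs", {}))
    style = node.get("style", {})
    if style:
        # Flatten the style mapping into an inline style attribute.
        attrs["style"] = "; ".join(f"{k}: {v}" for k, v in style.items())
    attr_text = "".join(
        f' {k}="{escape(str(v), quote=True)}"' for k, v in attrs.items()
    )
    tag = node["tag"]
    if tag in VOID_TAGS:
        return f"<{tag}{attr_text}>"
    inner = escape(node.get("text", ""))
    inner += "".join(render(child) for child in node.get("children", []))
    return f"<{tag}{attr_text}>{inner}</{tag}>"


html = render(node)
print(html)
```

Because the description is plain data, the same tree could in principle be rendered to other targets by swapping the render function.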
This article focuses on ways to obtain information. The following section introduces the sources of input information used to generate code.
Sources of Input Information
When given business requirements, we can access the following information before R&D: product requirement documents (PRDs) from product managers, interactive documents from interaction designers, visual documents from visual designers, and finalized page images. We can compare the information flow of a web page (program) with the communication model from information theory (information source → encoder → channel → decoder) proposed by the mathematician Claude Shannon:
- The information source corresponds to the PRD, indicating the planned requirement for the web page (program).
- The encoder corresponds to the interactive document, which captures how the designer turns the requirements into interactions and into the overall layout of the page.
- The channel corresponds to the design document, in which the images, colors, text, and sounds used by the designer act as the channels that transmit the information.
- The decoder corresponds to the process by which these channels translate the input information into an output that users can understand.
The description above shows that information flows through several stages before users can understand and acquire it from an application. If we consider only one channel, we miss other important information. The same is true when we gather information for R&D. For example, if we consider only the visual document and ignore the PRD during R&D, the delivered code cannot meet the requirements.
The PRD includes the required functional points for the frontend and the server-end data. It is also the starting point for obtaining user information. However, the content of a PRD is usually not described in a structured way, so human intervention is required to understand it. What if we required product managers to deliver PRDs that are described in a structured way?
If PRDs are described in a structured manner, we can easily extract the information that a specific module needs for each of the three parts of frontend code. Then, we can use capabilities such as natural language processing (NLP), together with existing data models in specific fields, to analyze the important logic code and describe it in a structured way.
Visual Document Design
The content of a visual document is relatively simple to describe. Designers currently use design tools such as Sketch, Photoshop, and XD, whose files already carry some structured descriptions. We only need to use the developer capabilities of these tools to extract the structured information from the visual document and convert it into the structured description we need. Then, we can use this description to populate the visual styles of our static GUI code.
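The conversion step can be sketched as follows. The layer dictionary below is a hypothetical export shape loosely modeled on what design tools expose (a frame, a list of fills, a corner radius); it is an assumption for illustration and not the actual file format or plugin API of Sketch, Photoshop, or XD.

```python
def rgba_to_css(color: dict) -> str:
    """Convert color channels given as 0..1 floats into a CSS rgba() string."""
    r, g, b = (round(color[channel] * 255) for channel in ("r", "g", "b"))
    return f"rgba({r}, {g}, {b}, {color.get('a', 1)})"


def layer_to_style(layer: dict) -> dict:
    """Map a hypothetical design-layer description to CSS declarations."""
    frame = layer["frame"]
    style = {"width": f"{frame['width']}px", "height": f"{frame['height']}px"}
    fills = layer.get("fills", [])
    if fills:
        # Only the first fill is used in this sketch.
        style["background-color"] = rgba_to_css(fills[0]["color"])
    if "cornerRadius" in layer:
        style["border-radius"] = f"{layer['cornerRadius']}px"
    return style


# Hypothetical layer as it might be extracted from a design file.
layer = {
    "name": "card-bg",
    "frame": {"x": 0, "y": 0, "width": 343, "height": 120},
    "fills": [{"color": {"r": 1.0, "g": 0.4, "b": 0.0, "a": 1}}],
    "cornerRadius": 8,
}

style = layer_to_style(layer)
print(style)
```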
Interactive Document (Dynamic Effect) Design
Here, we mainly consider interactive documents that focus on dynamic effects and responses. Unlike visual documents, which describe the static information of layer-level nodes, interactive documents usually describe the transitions between node states. If we can extract the structured description of the designer's interactions from common tools that produce interactive responses, such as After Effects (AE) and Principle, we can convert that description into the corresponding interactive actions in our program. Next, we can use those actions to implement the frontend interactive logic code.
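As a rough illustration, suppose an interaction has already been extracted as a small record that names a target node, a trigger, and a property change between two states. The record fields and the trigger names below are hypothetical, and the sketch covers only the simple case that CSS transitions can express.

```python
# Hypothetical mapping from extracted trigger names to CSS pseudo-classes.
PSEUDO_CLASSES = {"hover": ":hover", "press": ":active"}


def interaction_to_css(interaction: dict) -> str:
    """Convert a state-change description into a pair of CSS rules."""
    selector = f".{interaction['target']}"
    pseudo = PSEUDO_CLASSES[interaction["trigger"]]
    prop = interaction["property"]
    # Rule for the resting state, including the transition timing.
    base = (
        f"{selector} {{ {prop}: {interaction['from']}; "
        f"transition: {prop} {interaction['duration_ms']}ms "
        f"{interaction['easing']}; }}"
    )
    # Rule for the triggered state.
    active = f"{selector}{pseudo} {{ {prop}: {interaction['to']}; }}"
    return base + "\n" + active


interaction = {
    "target": "buy-button",
    "trigger": "hover",
    "property": "opacity",
    "from": "1",
    "to": "0.8",
    "duration_ms": 200,
    "easing": "ease-out",
}

css = interaction_to_css(interaction)
print(css)
```

More complex responses, such as chained animations or gestures, would need script-based handlers rather than declarative CSS.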
Model Information Learning
In addition to obtaining information directly from the information media described above, we can enrich the extracted information through model training and learning based on our experience. For example, common web components are usually not described as components in design documents. Therefore, we can use an object detection model to extract information about common web components. We can also apply text classification to the text of a business field to automatically map static text fields in a design document to dynamic fields.
Unlike the information media mentioned above, images do not contain structured information. All we can obtain from them directly is raw pixel data. However, using deep learning, we can extract basic design elements, such as text, images, and shapes, as well as style attributes from the images.
We can also train a model to identify the descriptions of targets (components) in an image.
If some static text fields in a design document correspond to dynamic fields returned by APIs, we can analyze the text that the designer entered and look for data fields in a specific business that may map to it. For example, if we see "xxx Flagship Store", we may guess that it is the "Store Title" field in the "Store" data source for business data. Therefore, when we generate code, we can automatically bind the text to the corresponding field in the API.
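The mapping idea can be sketched with a few hand-written rules. This is a deliberately simple stand-in for the trained text classifier described above, not the model itself; the rule list, the price rule, and the (data source, field) names are hypothetical.

```python
import re

# Hypothetical rules: pattern on the designer's text -> (data source, field).
# A trained classifier would replace this list in practice.
FIELD_RULES = [
    (re.compile(r"flagship store$", re.IGNORECASE), ("store", "title")),
    (re.compile(r"^[$¥]\s?\d+(\.\d+)?$"), ("item", "price")),
]


def guess_binding(text: str):
    """Return the (data source, field) a static text probably maps to, or None."""
    cleaned = text.strip()
    for pattern, binding in FIELD_RULES:
        if pattern.search(cleaned):
            return binding
    return None


print(guess_binding("xxx Flagship Store"))
print(guess_binding("Buy now"))
```

Text that matches no rule returns None and would stay static in the generated code.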
Structure description refers to the output (JSON data description) extracted from the design document mentioned above. When we use static GUI code to describe pages or applications, we also use certain semantic structures (structural layout groups) in addition to basic style information. A visual document does not contain structure semantics. Therefore, we often need a trained model that infers structure semantics based on our experience with layout structures. This way, we can make the layout structure description of the GUI more semantic.
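To show what a structural layout group looks like, here is a minimal geometric heuristic that groups flat, absolutely positioned nodes into rows when they overlap vertically. It is a simplified substitute for the trained model mentioned above, and the node fields (name, x, y, height) are assumptions.

```python
def group_rows(nodes: list) -> list:
    """Group flat nodes into rows: nodes that overlap vertically share a row."""
    rows = []
    for node in sorted(nodes, key=lambda n: n["y"]):
        top = node["y"]
        bottom = node["y"] + node["height"]
        if rows and top < rows[-1]["bottom"]:
            # The node starts before the current row ends, so it joins the row.
            rows[-1]["children"].append(node)
            rows[-1]["bottom"] = max(rows[-1]["bottom"], bottom)
        else:
            rows.append({"type": "row", "bottom": bottom, "children": [node]})
    for row in rows:
        # Order children left to right and drop the bookkeeping field.
        row["children"].sort(key=lambda n: n["x"])
        del row["bottom"]
    return rows


# Hypothetical flat nodes extracted from a design document.
nodes = [
    {"name": "title", "x": 60, "y": 10, "height": 20},
    {"name": "button", "x": 0, "y": 60, "height": 40},
    {"name": "logo", "x": 0, "y": 0, "height": 48},
]

rows = group_rows(nodes)
print([[child["name"] for child in row["children"]] for row in rows])
```

A heuristic like this handles clean grids, while a learned model is better suited to the irregular layouts found in real design documents.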
Deep Connections of Information
There are many sources of information used in the intelligent generation of code, each of which is related to the ultimately generated code to a greater or lesser extent. If certain information cannot be understood or used, the generated code will be incomplete. Therefore, it is also important to connect the information in different dimensions.
The following figure shows an attempt at correlated analysis of the information from different channels. We can see that input information from different channels can be correlated in different dimensions of a module. In other words, the module corresponds to static GUI code, frontend logic code, server-end logic code, and identified model components within the module. If we connect the information, the generated module code not only contains the static UI code obtained from the visual document in the initial phase but also other logic and semantic information that can be used to automatically convert the static code into dynamic code. This way, we can obtain the complete code, including code for field binding, module material identification, and rendering logic.
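One small piece of this connection, converting static text into dynamic code through field binding, can be sketched as follows. The node ids, the bindings mapping, and the double-brace template syntax are hypothetical choices for illustration only.

```python
import copy


def apply_bindings(node: dict, bindings: dict) -> dict:
    """Return a copy of the node tree with bound text replaced by template expressions."""
    result = copy.deepcopy(node)
    stack = [result]
    while stack:
        current = stack.pop()
        field = bindings.get(current.get("id"))
        if field:
            # Replace the designer's sample text with a data expression.
            current["text"] = "{{ " + field + " }}"
        stack.extend(current.get("children", []))
    return result


# Static description obtained from the visual document (hypothetical).
static_module = {
    "id": "root",
    "children": [
        {"id": "title", "text": "xxx Flagship Store"},
        {"id": "slogan", "text": "Welcome"},
    ],
}

# Bindings obtained from other channels, such as the PRD or a field-mapping model.
bindings = {"title": "store.title"}

dynamic_module = apply_bindings(static_module, bindings)
print(dynamic_module["children"][0]["text"])
```

The static tree is left unchanged, so information from each channel can be merged in separate passes.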
After two years of development and refinement, imgcook has proved that it is possible to extract information from design documents to automatically generate some GUI code. In the future, we will continuously investigate methods of correlated information analysis to establish complete graphs and generate more accurate code.