Ideas for Using Recursive Autoencoders for Intelligent Layout Generation


By Queyue

Overview

The paper READ: Recursive Autoencoders for Document Layout Generation combines a recursive neural network (RvNN) with an autoencoder to generate varied layouts automatically from vectors sampled from a Gaussian distribution. It also introduces a combined metric for evaluating the quality of the method. The following sections describe the methods used in the paper and some ideas for applying them in frontend scenarios.


RvNN and Autoencoder

RvNN

Image for post

As shown in the preceding figure, each input to the network consists of two vectors, which we call child vectors. A forward pass merges the two child vectors into a parent vector. The parent vector is then fed back into the network together with another child vector to produce a new parent vector. Repeating this recursively eventually yields a root node, or root vector.
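The recursion can be sketched in a few lines of NumPy. This is a minimal illustration and not the paper's implementation: the vector size `DIM`, the single shared layer (`W`, `b`), and the `tanh` activation are all assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # size of every node vector (hypothetical)

# One shared weight matrix and bias, reused at every merge step.
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
b = np.zeros(DIM)

def merge(left, right):
    """Forward pass: two child vectors in, one parent vector out."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(leaves):
    """Fold the leaves into a single root vector, one merge at a time."""
    parent = leaves[0]
    for child in leaves[1:]:
        parent = merge(parent, child)
    return parent

leaves = [rng.normal(size=DIM) for _ in range(5)]
root = encode(leaves)  # same size as each leaf, however many leaves there are
```

Because the same weights are reused at every step, the root vector always has the same size regardless of how many leaves are merged.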

Image for post

If each leaf node in the preceding example is a word representation, the trained network can merge these words into a single vector in the semantic space. That vector represents the corresponding sentence, and sentences with similar meanings map to similar vectors.

VAE

Image for post

Let's say you have two networks. The first network maps a high-dimensional vector x to a low-dimensional vector z. In the preceding figure, an image is mapped to a one-dimensional vector. The second network then maps z back to a high-dimensional vector x1. The training target is to make x and x1 as close as possible, so that z can be regarded as a representation of x. Once trained, the second network is a generative network: given z, it generates the image that z represents.

VAEs improve on plain autoencoders. The encoder outputs an extra vector, which we call the variance, alongside z. Random noise scaled by the variance is added to z before it is passed to the decoder. This makes the model more stable and lets us generate new samples by drawing z at random.
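A minimal sketch of that forward pass follows. The sizes `X_DIM` and `Z_DIM` and the linear layers `W_mu`, `W_logvar`, and `W_dec` are hypothetical stand-ins for real, trained networks. Predicting the log of the variance instead of the variance itself is a common convention that is assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
X_DIM, Z_DIM = 16, 2  # hypothetical sizes

# Untrained, randomly initialised linear layers standing in for real networks.
W_mu = rng.normal(scale=0.1, size=(Z_DIM, X_DIM))
W_logvar = rng.normal(scale=0.1, size=(Z_DIM, X_DIM))
W_dec = rng.normal(scale=0.1, size=(X_DIM, Z_DIM))

def encode(x):
    """First network: map x to the mean and log-variance of z."""
    return W_mu @ x, W_logvar @ x

def sample(mu, logvar):
    """Combine mean and variance: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Second network: map z back to the high-dimensional space."""
    return W_dec @ z

x = rng.normal(size=X_DIM)
mu, logvar = encode(x)
z = sample(mu, logvar)
x1 = decode(z)
reconstruction_error = np.mean((x - x1) ** 2)  # training would minimise this
```

At generation time the encoder is skipped: z is drawn directly from a standard Gaussian and passed to `decode`.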

Training Data Representation

Image for post

As shown in the preceding figure, to fit the data to the RvNN, we convert each design document into a binary tree. The width and height of the bounding box of each basic element are first normalized to [0, 1]. We then scan the document from left to right and from top to bottom. Each basic element becomes a leaf node, and the nodes are merged from the bottom up in scanning order. We call the merged nodes internal nodes, and they are ultimately merged into a single root node. Each internal node stores the relative position of the two elements it merges, which falls into one of five types: right, bottom right, bottom, enclosed, and bottom left.
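The conversion can be sketched as follows. The `Node` class, the geometric rules in `relation`, the left-to-right fold in `build_tree`, and the sample layout are all illustrative assumptions. The article does not spell out the exact rules used to assign a relative position type.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    x: float
    y: float
    w: float
    h: float
    label: Optional[str] = None      # set on leaves only
    relation: Optional[str] = None   # set on internal nodes only
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def relation(ref: Node, other: Node) -> str:
    """Position of `other` relative to `ref` (hypothetical rules)."""
    inside = (other.x >= ref.x and other.y >= ref.y
              and other.x + other.w <= ref.x + ref.w
              and other.y + other.h <= ref.y + ref.h)
    if inside:
        return "enclosed"
    if other.y >= ref.y + ref.h:          # other starts below ref
        if other.x >= ref.x + ref.w:
            return "bottom right"
        if other.x + other.w <= ref.x:
            return "bottom left"
        return "bottom"
    return "right"

def merge(left: Node, right: Node) -> Node:
    """Create an internal node whose box bounds both children."""
    x0, y0 = min(left.x, right.x), min(left.y, right.y)
    x1 = max(left.x + left.w, right.x + right.w)
    y1 = max(left.y + left.h, right.y + right.h)
    return Node(x0, y0, x1 - x0, y1 - y0,
                relation=relation(left, right), left=left, right=right)

def build_tree(leaves: list) -> Node:
    """Order leaves by row, then column, and merge them one at a time."""
    ordered = sorted(leaves, key=lambda n: (n.y, n.x))
    root = ordered[0]
    for leaf in ordered[1:]:
        root = merge(root, leaf)
    return root

# Boxes are (x, y, w, h), already normalised to [0, 1].
leaves = [
    Node(0.0, 0.0, 0.3, 0.2, label="logo"),
    Node(0.4, 0.0, 0.6, 0.2, label="title"),
    Node(0.0, 0.3, 1.0, 0.6, label="body"),
]
root = build_tree(leaves)
print(root.left.relation, root.relation)  # right bottom
```

Here the logo and title are merged first (the title is to the right of the logo), and the body is then merged below the resulting header.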

Building a Recursive Model

SRE

Image for post

As shown in the preceding formula, x1 and x2 are the n_D-dimensional vectors of the two input nodes, and r represents the relative position of the two nodes, with the element on the left as the reference. f denotes the multi-layer perceptron of the current spatial relationship encoder (SRE). After repeated encoding through the RvNN, a vector that represents the root node is generated.

SRD

Image for post

During recursion, SREs and spatial relationship decoders (SRDs) can use the same model structure. For example, we can group SREs and SRDs by relative position type and share one network among the SREs, and likewise among the SRDs, of the same type. In addition, we can train a neural network to determine whether a node is an internal node or a leaf node. If it is an internal node, we continue to decode it. If it is a leaf node, we map it to the bounding box and class of the element.
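The decoding loop might look like the sketch below. Every weight matrix here is random and untrained, and the sizes, the names (`W_node`, `W_rel`, `W_split`, `W_box`, `W_cls`), and the single shared split layer are assumptions made for brevity. A version closer to the description above would keep one split layer per relative position type. The `MAX_DEPTH` cap is only there because untrained weights may never predict a leaf.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CLASSES, MAX_DEPTH = 8, 3, 4   # hypothetical sizes
RELATIONS = ["right", "bottom right", "bottom", "enclosed", "bottom left"]

# Untrained random weights standing in for trained networks.
W_node = rng.normal(size=DIM)                          # internal vs. leaf
W_rel = rng.normal(size=(len(RELATIONS), DIM))         # relative position type
W_split = rng.normal(scale=0.5, size=(2 * DIM, DIM))   # parent -> two children
W_box = rng.normal(size=(4, DIM))                      # leaf -> bounding box
W_cls = rng.normal(size=(N_CLASSES, DIM))              # leaf -> element class

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def decode(vec, depth=0):
    """Recursively expand a node vector into a nested layout tree."""
    is_internal = sigmoid(W_node @ vec) > 0.5
    if is_internal and depth < MAX_DEPTH:
        children = np.tanh(W_split @ vec)
        left, right = children[:DIM], children[DIM:]
        return {
            "relation": RELATIONS[int(np.argmax(W_rel @ vec))],
            "children": [decode(left, depth + 1), decode(right, depth + 1)],
        }
    # Leaf: map the vector to a normalised bounding box and an element class.
    return {"box": sigmoid(W_box @ vec), "cls": int(np.argmax(W_cls @ vec))}

root = rng.standard_normal(DIM)  # a random Gaussian vector as the SRD input
tree = decode(root)
```

The result is a nested dictionary in which every leaf carries a box `(x, y, w, h)` in [0, 1] and a class index.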

Model Training

The training loss combines four terms.

The first term is the reconstruction error of a leaf node, which is the difference between the initial and decoded vectors of the leaf node.

The second term is the reconstruction error of the relative position, which is the difference between the initial and decoded relative positions.

The third term is the classification loss of the relative position type, which is a standard cross-entropy loss function.

The fourth term is the Kullback-Leibler (KL) divergence between the distribution p(z) of the final root node vector and the standard Gaussian distribution q(z). We want the root node vector to follow a Gaussian distribution as closely as possible, because at generation time the input of the SRD is a random vector sampled from that Gaussian distribution. This is what allows the model to generate layouts automatically.

I will not describe the formulas for these four loss functions in detail here. With the combined loss function in place, we can start to train the model.
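For orientation only, a combined loss of this shape could be computed as in the sketch below. Squared error for the two reconstruction terms, equal weighting of the four terms, and all function and argument names are assumptions. The paper's exact formulas may differ.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) )."""
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar)

def cross_entropy(logits, target_index):
    """Standard cross-entropy for one sample, computed from raw logits."""
    shifted = logits - np.max(logits)  # for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target_index]

def total_loss(leaf, leaf_hat, pos, pos_hat, rel_logits, rel_target, mu, logvar):
    """Unweighted sum of the four terms (the weighting is an assumption)."""
    leaf_term = np.sum((leaf - leaf_hat) ** 2)   # leaf reconstruction
    pos_term = np.sum((pos - pos_hat) ** 2)      # relative position reconstruction
    ce_term = cross_entropy(rel_logits, rel_target)
    kl_term = kl_to_standard_normal(mu, logvar)
    return leaf_term + pos_term + ce_term + kl_term

# Perfect reconstruction, uniform logits over 5 position types, root at N(0, I):
loss = total_loss(
    leaf=np.ones(4), leaf_hat=np.ones(4),
    pos=np.zeros(4), pos_hat=np.zeros(4),
    rel_logits=np.zeros(5), rel_target=2,
    mu=np.zeros(8), logvar=np.zeros(8),
)
```

In this example only the cross-entropy term is non-zero, so the loss equals ln 5.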

Model Measurement

Experimental Datasets

ICDAR2015 Dataset

User-Solicited (US) Dataset

Experimental Results

Image for post

The preceding figure compares the method proposed in the paper with the probabilistic approach on the ICDAR dataset. The reported value measures the latent distribution similarity, and by this measure the method proposed in the paper performs better.

Image for post

The preceding figure compares the method proposed in the paper with LayoutGAN. The method proposed in the paper requires fewer training samples and generates more elements.

Some Ideas

Undoubtedly, research in this field requires a large amount of training data. If the solution proposed in the paper is applied to data in frontend scenarios, it could serve as a sample generation and augmentation method, allowing us to obtain reasonable training data. We will do additional research later to verify the feasibility of this idea.
