Solving Memory Leaks Caused by Co and Recursive Calls


By Yijun

Preface

Writing a synchronous function recursively leads to a stack-overflow failure when the recursion's exit condition becomes invalid. To avoid this problem, some code is written as a recursive asynchronous function instead:

async function recursive() {
  if (!active) return; // exit condition
  // do something
  await recursive();
}

The call after the await keyword resumes in a later event-loop turn rather than on the current call stack, so this style does not overflow the stack. However, it is not foolproof. Let's look at a case where it caused a production failure for one of our customers.

Discovering the Problem

After the customer connected their application to the Node.js Performance Platform, their processes frequently hit Out of Memory (OOM) errors after sudden memory increases. The customer had added the @heap_used / @heap_limit > 0.5 alert rule so that a heap snapshot file would be generated for analysis while the heap was still relatively small, as soon as a leak began.

After obtaining authorization from the customer, we accessed the project and obtained the heap snapshot file. The process trend chart also showed the usual side effects of high memory use, such as longer GC pauses and lower processing throughput.

Locating the Problem

From the successfully generated heap snapshot file, we could roughly see where the memory leak was. However, accurately locating the problem still needed some effort.

Analysis of Heap Snapshot

Let’s first take a look at the memory leak report:

[Figure: memory leak report generated from the heap snapshot]

As shown in the preceding report, the file is nearly 1 GB. The keyword context indicates that this is a context object generated during function execution (for example, by a closure) rather than an ordinary JavaScript object. Such a context object does not necessarily disappear when the function returns.

In addition, this context object is related to the co module, which suggests that co may have scheduled a generator that runs for a very long time; otherwise, the context object would have been garbage-collected when the generator completed.

However, this alone doesn’t provide any reliable conclusion. Let’s move on.

We tried viewing the content of object @22621725 and its reference chain back to the GC root, but found nothing useful.

The object cluster view does show some useful information:

[Figure: object cluster view starting from @22621725]

We can see that, starting from @22621725, one context object references another context object, with a Promise between each pair. If you are familiar with co, you will know that co converts every non-Promise value that is yielded into a Promise. Here, each Promise represents a new invocation of the generator.

The reference chain in this example is very long: even after expanding to the 20th layer, the retained-size percentage had not dropped by one ten-thousandth. This trail of clues ends here.

We can also obtain some useful information from the class view:

[Figure: class view of the heap snapshot]

This view shows an unusual object: scheduleUpdatingTask.

This heap snapshot includes 390,285 scheduleUpdatingTask objects. Click the class to view more information:

[Figure: details of the scheduleUpdatingTask class]

These objects are created in the file /home/xxx/app/schedule/updateDeviceInfo.js.

Currently, these are the only clues available. Next, let’s move on to code analysis.

Code Analysis

After obtaining authorization from the customer, we obtained the relevant code and found scheduleUpdatingTask in the app/schedule/updateDeviceInfo.js file.

// Run the task; after it completes successfully, wait a while and run it again.
// Stop the task if the lock cannot be obtained.
const scheduleUpdatingTask = function* (ctx) {
  if (!taskActive) return;
  try {
    yield doSomething(ctx);
  } catch (e) {
    // Catch service exceptions so the next run can proceed even if this one fails
    ctx.logger.error(e);
  }
  yield scheduleUpdatingTask(ctx);
};

In the entire project, the only repeated call to scheduleUpdatingTask is the one it makes to itself. This is usually referred to as a recursive call.

However, calling it a recursive call is not completely accurate. If it were a genuine recursive call, the stack would have overflowed long ago.

There is no stack overflow because, under co's generator scheduling, the code before and after each yield runs in separate event-loop turns rather than on a single call stack.

Although there is no stack overflow, the context object attached to each generator invocation is only destroyed once that generator runs to completion. The recursion therefore makes each context object reference the next one, and none of them can be deallocated.

In this code snippet, it is clear that the termination condition if (!taskActive) return; has failed.

This code snippet perfectly explains the phenomenon observed earlier. To confirm it, I wrote the following snippet to reproduce the problem:

const co = require('co');

function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve();
    }, ms);
  });
}

function* task() {
  yield sleep(2);
  console.log(process.memoryUsage());
  yield task();
}

co(function* () {
  yield task();
});

When this snippet runs, the application does not fail immediately; instead, memory usage climbs continuously. This matches exactly what the customer encountered.

We can also check whether native async functions exhibit the same behavior:

function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve();
    }, ms);
  });
}

async function task() {
  await sleep(2);
  console.log(process.memoryUsage());
  await task();
}

task();

The result shows that memory usage still grows continuously.

Solving the Problem

Although the heap snapshot analysis didn’t go smoothly on the Node.js Performance Platform, we finally located the cause of this problem. Now that we know the cause, let’s see how we can solve this problem.

From the previous examples, we can see that recursive calls in co or async functions can delay memory deallocation, and that delay leads to memory pileup and memory pressure. Does this mean we cannot use recursive calls in these scenarios? No.

However, we need to evaluate whether the recursion can produce an excessively long reference chain. In this example, once the exit condition failed, the recursive call effectively became infinite recursion.

Can we keep the application running without building up a long chain of context references? Yes:

async function task() {
  while (true) {
    await sleep(2);
    console.log(process.memoryUsage());
  }
}

In this version, replacing the recursive call with a while (true) loop eliminates the chain of context references. Because every await yields control back to the event loop, the while (true) loop does not block the main thread.
