Windows Networking Troubleshooting 4: Slow Response of .NET Framework Applications

Like Java applications, .NET Framework applications are compiled to an intermediate language (IL) that is executed by a runtime. With the SOS debugging extension from Microsoft, we can get a clear and fairly complete picture of what a .NET Framework application is doing. This article describes how we solved a network problem with a user’s .NET Framework application. Hopefully, it can inspire you to find better solutions of your own.

Problem

We have received feedback from a user indicating that the Windows applications seem to have slow access to MySQL and are behaving strangely. The architecture is like this:

Client --- Internet ---> Windows IIS server (.NET Framework application) --- Internet ---> MySQL server

User feedback:

  1. The client experienced slow access to the IIS server.
  2. Access to the same page from the IIS server itself, through its own Internet IP address, was also slow (the server is a VPC machine with an EIP).

The user suspected that the reason was the slow access to the MySQL server on Windows and tried many IIS and MySQL optimizations. However, this problem never seemed to be resolved. In the meantime, the user tried to use another server to access MySQL, and ruled out MySQL itself as the cause of the performance issue.

Troubleshooting

At first glance, all of these symptoms suggest that the Windows IIS server has slow access to the MySQL server over its Internet IP address. To check this, we first generate a waterfall chart of network requests to find out how slow client access to the IIS server actually is.

This chart clearly shows the following information:

  • The client indeed had slow access to the IIS server.
  • The total latency is around six seconds.
  • Each request to GetAllCountrys waits around 3.25s before the server returns the first response byte (TTFB).
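The TTFB figure from the waterfall chart can also be reproduced without a browser. The following is a minimal sketch, not the tool we used in this case: the function name measure_ttfb and the URLs in the usage note are hypothetical, and the time at which the status line and headers arrive is taken as an approximation of the first response byte.

```python
import http.client
import time
from urllib.parse import urlsplit


def measure_ttfb(url: str, timeout: float = 10.0) -> dict:
    """Time a single GET request and return connect, TTFB, and total seconds."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    start = time.perf_counter()
    conn.connect()                      # TCP (and TLS) handshake
    connected = time.perf_counter()
    conn.request("GET", path)
    response = conn.getresponse()       # returns once status line and headers arrive
    first_byte = time.perf_counter()
    response.read()                     # drain the body
    done = time.perf_counter()
    conn.close()

    return {
        "status": response.status,
        "connect": connected - start,
        "ttfb": first_byte - connected,  # request sent -> first response data
        "total": done - start,
    }
```

Running measure_ttfb against the same page three ways (from the client, from the server through its Internet IP address, and from the server through http://localhost) gives numbers that can be compared directly with the waterfall charts.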

Is this caused by network problems between the client and the server? We can rule out this possibility by looking at the waterfall chart of network requests that we have obtained from the IIS server.

The information that we obtain through http://localhost from the server also indicates that all components are running properly.

Another access test with the same content and a similar size indicates that the response is as expected, with almost zero latency.

At this point, we can basically rule out the product itself as the cause of the slow access. To pinpoint the cause of this latency, we need to capture data packets for further analysis.

The following are some of our findings:

1. The server received the request promptly, but its response was delayed by about 3.25s.

2. Judging from the captured data packets, after the server received the request, it immediately connected to the MySQL server and obtained the required content (that is, information about the country). We can easily find this by comparing the data packet content exported from Wireshark with the content received on the client.

3. After the server received the data, it did not immediately return the HTTP response. As the capture shows, an ARP request occurred during this period. Logically, such an additional operation should have nothing to do with whether IIS returns an HTTP response. However, the same behavior appears every time we repeat the capture, so we suspect that it is related to the 3.25s latency.

4. On closer inspection, the ARP request sent by the server targets the Internet IP address of the client. This ARP request is definitely not initiated by the Windows operating system itself: the client is not on the server’s subnet, so if the server wanted to communicate with the client at that moment, it would send an ARP request for the gateway address instead. The ARP request is therefore logically incorrect, which indirectly indicates that it has something to do with the high latency.

5. We can also capture packets while accessing the page through localhost to check whether the ARP request still occurs. However, capturing nothing is also normal in that case, because the traffic is local.
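The reasoning in finding 4 can be written down as a small rule: a host sends an ARP request for the destination itself only when the destination is on its own subnet; otherwise it resolves the gateway. The sketch below is a simplified model that assumes a single interface with one default gateway. The function name arp_target and all addresses are hypothetical examples, not values from the capture.

```python
import ipaddress


def arp_target(local_ip: str, netmask: str, gateway: str, destination: str) -> str:
    """Return the IP address a host is expected to ARP for to reach destination."""
    # Derive the local subnet from the interface address and mask.
    network = ipaddress.ip_network(f"{local_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(destination) in network:
        return destination   # on-link: resolve the destination's MAC directly
    return gateway           # off-link: resolve the gateway's MAC instead


# Hypothetical addresses: a server in a private subnet, a client on the Internet.
print(arp_target("172.16.0.10", "255.255.255.0", "172.16.0.253", "203.0.113.25"))
print(arp_target("172.16.0.10", "255.255.255.0", "172.16.0.253", "172.16.0.20"))
```

For the Internet client, the expected target is the gateway, so an ARP request that carries the client’s Internet IP address must have been requested explicitly by something above the network stack.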

Unfortunately, Windows does not provide any logs related to ARP requests. To obtain definitive evidence, consider the following:

1. Use Process Monitor logs to collect information about what the application processes are doing, and look for useful information in the call stacks that Process Monitor records. Note that this kind of analysis takes a long time, so we resort to it only when we don’t have any other clues. In this case, our analysis is already relatively clear, and we specifically want to know what the .NET application is doing when the latency occurs.
2. Collect the Network Trace log in the .Net Framework. This log provides more information than a packet capture.

How to Configure Network Tracing:

http://msdn.microsoft.com/en-us/library/ty48b824(VS.90).aspx

3. Obtain a user dump of w3wp.exe on the IIS server during the 3.25s in which the server is unresponsive. Logically, IIS is unresponsive because the application is waiting for a certain resource. However, it is not easy to capture a dump at exactly the right moment. Sysinternals provides a very powerful tool for this: ProcDump. We can use procdump.exe to monitor w3wp.exe and generate a dump file every one or two seconds to obtain the call stack at the time the problem occurs.

https://docs.microsoft.com/en-us/sysinternals/downloads/procdump

procdump -w -s 2 -n 10 w3wp.exe c:\temp\w3wp.dump

For detailed instructions, see the Microsoft document.

w3wp.exe User Dump Analysis

Collecting a network trace in the .NET Framework is a relatively complex task, and it requires the user’s cooperation to modify the application configuration file. Therefore, we collect user dump files directly for further analysis. Because this is a .NET program, we load the SOS debugging extension after opening the dump file in WinDbg.

For more information about the SOS debugging extension, visit https://docs.microsoft.com/en-us/dotnet/framework/tools/sos-dll-sos-debugging-extension

By analyzing the thread information of w3wp.exe, we locate thread #27.

On its stack, we can easily find IPHLPAPI!SendARP, which is literally the API that sends the ARP request. However, the managed frames appear only as raw addresses such as 0x00007ff7`d84ab0fb, and the SOS extension is required to resolve them.

From the .Net String object on the stack, we can find the ARP request’s target address.

We can even print IL code for further analysis or save the DLL from the dump and use Reflector to decompile it.

Postscript

From the callstack in .Net, we can clearly understand why the latency problem occurs:

The user’s .NET application writes an access log entry after it obtains information from the MySQL server. As part of this logging, the application collects the client’s MAC address, which causes the system to send out ARP requests for the client’s IP address. This code causes no trouble when the client is on the local subnet. However, ARP requests are not forwarded across network segments, so a request for an Internet client receives no reply, and the HTTP response is blocked until the ARP attempts give up. This is what leads to the high latency. Therefore, avoid this unnecessary operation whenever possible.

Later, during IL analysis of the .NET application, we also found a call to GetWebClientHostname in the code. This call performs a reverse DNS (PTR) query. (The DNS PTR query is also visible in the earlier packet capture. Unfortunately, I did not notice it at the time.) Unlike an ARP request, a DNS query always receives a response, whether the lookup succeeds or fails, as long as the DNS server is alive. Therefore, it does not have much impact on the latency. However, this operation is not very meaningful for clients that access the site from the Internet, so we recommend using Request.UserHostName instead to avoid unnecessary latency.
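To make the reverse lookup concrete, the following sketch shows the name that a PTR query asks for and how long one lookup takes on a given machine. It is written in Python for illustration only and is not the application’s code. The helper names ptr_query_name and timed_reverse_lookup are hypothetical, and 203.0.113.10 is a documentation address used as a stand-in for a client IP.

```python
import ipaddress
import socket
import time


def ptr_query_name(ip: str) -> str:
    """Return the DNS name that a reverse (PTR) query for ip asks about."""
    return ipaddress.ip_address(ip).reverse_pointer


def timed_reverse_lookup(ip: str):
    """Resolve ip to a host name; return (hostname or None, elapsed seconds)."""
    start = time.perf_counter()
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:          # covers socket.herror and socket.gaierror
        hostname = None      # no PTR record, or the resolver gave up
    return hostname, time.perf_counter() - start


print(ptr_query_name("203.0.113.10"))   # 10.113.0.203.in-addr.arpa
```

Calling timed_reverse_lookup with a real client address shows how much time each logged request spends on this query, whether or not a PTR record exists.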

Reference:https://www.alibabacloud.com/blog/windows-networking-troubleshooting-4-slow-response-of--net-framework-applications_594840?spm=a2c41.12910709.0.0
