Diagnosing performance issues in Chromium browsers
Recently we had complaints from a customer that our forms were taking an unusually long time to load and, in many cases, caused Chrome to report that the page was "unresponsive". Reproducing the problem in our own environment was initially tricky, which hindered our efforts to diagnose it, and the customer's environment was heavily restricted. The first port of call is always to open "Developer tools" (F12) and examine the network traffic or console logs, but in this case there was nothing obviously wrong. Our first idea was to ask the customer to clear out their "Local storage" (Dev Tools > Application > Local Storage) in case a bad form structure had somehow been cached and was now sending our scripts into a tight loop. However, the issue remained and we had to dig deeper.
Chrome Performance Tab
How to Analyze Runtime Performance: Google DevTools
Since we were still having trouble reproducing the issue, we asked the customer to open the Performance tab in Developer tools and reproduce the issue while profiling (Ctrl+Shift+E reloads the page and starts recording). They could then export the result by right-clicking the results and selecting "Save profile...".
The traces we received from the customer showed an obvious problem: a large gap in which there appeared to be no activity at all. Normally, if our product had caused the problem, it would show up as high script activity, a large amount of CSS animation or even an open network request. Although we were seeing a fair amount of "Layout Shift" and some warnings about long tasks, it was this big gap of seemingly no activity that was more concerning. The trace includes screenshots, and they made it clear that the browser locked up for the user at the beginning of this gap. Even when we asked for more detail in the trace through the Chrome DevTools settings (i.e. Advanced paint instrumentation and, under Experiments, Timeline: show all events, Timeline: event initiators and Timeline: V8 Runtime Call Stats on Timeline), this area remained unpopulated.
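Spotting a gap like this by eye works, but with several exported profiles a small script can flag such gaps for you. The sketch below is not what we used; it assumes the saved profile is uncompressed JSON in Chrome's Trace Event Format (either a bare array of events or an object with a "traceEvents" key), where complete events have "ph": "X" and "ts"/"dur" in microseconds. The function names and the 500 ms default threshold are hypothetical choices for illustration.

```python
import json
from typing import Iterable, List, Optional, Tuple


def load_trace_events(path: str) -> List[dict]:
    """Load events from a saved profile (assumed to be uncompressed JSON)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # A saved trace is either a bare array of events or {"traceEvents": [...], ...}.
    if isinstance(data, dict):
        return data.get("traceEvents", [])
    return data


def find_idle_gaps(
    events: Iterable[dict],
    min_gap_ms: float = 500.0,
    pid: Optional[int] = None,
    tid: Optional[int] = None,
) -> List[Tuple[float, float]]:
    """Return (start_ms, length_ms) for each stretch with no complete ("X") events.

    Only "X" events are considered; begin/end pairs, instant and metadata
    events are ignored in this sketch. Optionally restrict to one pid/tid.
    """
    intervals = []
    for ev in events:
        if ev.get("ph") != "X" or "ts" not in ev or "dur" not in ev:
            continue
        if pid is not None and ev.get("pid") != pid:
            continue
        if tid is not None and ev.get("tid") != tid:
            continue
        intervals.append((ev["ts"], ev["ts"] + ev["dur"]))  # microseconds

    gaps = []
    busy_until = None
    for start, end in sorted(intervals):
        if busy_until is not None and start - busy_until >= min_gap_ms * 1000:
            gaps.append((busy_until / 1000, (start - busy_until) / 1000))
        # Overlapping or nested events extend the busy period rather than reset it.
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps
```

For example, find_idle_gaps(load_trace_events("Profile.json")) (the filename is hypothetical) lists every quiet stretch of at least half a second. Start times are on the trace's own clock, so they will not begin at zero.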
Chrome Tracing
Chrome Tracing for Fun and Profit
We still weren't seeing the cause, so we went deeper. Enter chrome://tracing/. This is another way to run a performance trace in Chrome, but it offers more detail, more customisation and purpose-built, out-of-the-box trace presets. (Chrome has a lot of tools available in this manner; you can see a listing of them by browsing to chrome://chrome-urls/.) We asked the customer for more traces and, for good measure, had them recorded with the various presets offered when beginning a trace. We were particularly interested in the "Chrome developer (overall)" and "Rendering" output. Again, the customer obliged, but it was actually the "Web developer" trace that best showed where the time was being spent.
This kind of long-running task inside the main rendering loop suggested that Chromium was getting caught up either constructing the accessibility tree or handling the inter-process communication (IPC) related to it. Clicking the little magnifying glass takes you to the relevant piece of code in the Chromium source, so I could find that function and see that it did indeed appear to have to do with event processing.
https://source.chromium.org/chromium/chromium/src/+/main:content/renderer/accessibility/render_accessibility_impl.cc;l=663
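To get a rough ranking of where time went in a chrome://tracing capture without scrolling through the timeline, the events can be summed by name. This is a hedged sketch, not the method we used: it assumes the saved file is Trace Event Format JSON (gzip-compressed or not), counts only complete ("X") events, and reports inclusive durations, so a nested event's time also appears in its parent's total. The helper names are hypothetical.

```python
import gzip
import json
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def load_trace(path: str) -> List[dict]:
    """Load trace events from a .json or .json.gz file saved from chrome://tracing."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    data = json.loads(raw)
    return data.get("traceEvents", []) if isinstance(data, dict) else data


def top_events_by_duration(
    events: Iterable[dict], top_n: int = 10, pid: Optional[int] = None
) -> List[Tuple[str, float]]:
    """Sum the inclusive duration (ms) of complete events per name, largest first."""
    totals: Counter = Counter()
    for ev in events:
        if ev.get("ph") != "X" or "dur" not in ev:
            continue
        if pid is not None and ev.get("pid") != pid:
            continue
        totals[ev.get("name", "<unnamed>")] += ev["dur"]
    # Trace durations are in microseconds; convert to milliseconds.
    return [(name, total / 1000) for name, total in totals.most_common(top_n)]
```

Passing the renderer's pid (visible in the trace's process list) narrows the ranking to the process that appeared to hang.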
The high CPU usage also helped us pinpoint where the hang was taking place. Once we were able to reproduce the issue locally, Process Explorer from the Sysinternals Suite showed a particular instance of chrome.exe going to high CPU during the hang. (Chrome has a built-in Task Manager, opened with Shift+Esc, but, again, it's pretty high level.) Inspecting this process showed that it had been launched with the following arguments...
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --type=renderer --field-trial-handle=1680,15553868430398110585,11130752057137292634,131072 --lang=en-GB --origin-trial-disabled-features=SecurePaymentConfirmation --device-scale-factor=1 --num-raster-threads=4 --enable-main-frame-before-activation --renderer-client-id=17137 --no-v8-untrusted-c
...which, again, pointed the finger squarely at the renderer. After suspending the process in Process Explorer, we took a full dump of it and examined the dump with WinDbg* (actually, inspecting the process in Process Explorer would have given the same information in a friendlier manner, but I didn't know that at the time). Looking more closely at the active threads showed that one in particular had this call stack...
 # Child-SP          RetAddr           Call Site
00 0000004b`1f9ff2a8 00007fff`9e40bba3 ntdll!NtRemoveIoCompletion+0x14
01 0000004b`1f9ff2b0 00007fff`45371476 KERNELBASE!GetQueuedCompletionStatus+0x53
02 0000004b`1f9ff310 00007fff`45371392 chrome!GetHandleVerifier+0x1199196
03 0000004b`1f9ff450 00007fff`4537129f chrome!GetHandleVerifier+0x11990b2
04 0000004b`1f9ff4b0 00007fff`425aa151 chrome!GetHandleVerifier+0x1198fbf
05 0000004b`1f9ff540 00007fff`42fde87c chrome!Ordinal0+0x4a151
06 0000004b`1f9ff590 00007fff`431eec48 chrome!ChromeMain+0x6376c
07 0000004b`1f9ff5f0 00007fff`431ee9d4 chrome!ChromeMain+0x273b38
08 0000004b`1f9ff740 00007fff`431ee808 chrome!ChromeMain+0x2738c4
09 0000004b`1f9ff7a0 00007fff`42d44c9f chrome!ChromeMain+0x2736f8
0a 0000004b`1f9ff830 00007fff`a1db7974 chrome!IsSandboxedProcess+0x34447f
0b 0000004b`1f9ff8b0 00007fff`a1f1a2f1 kernel32!BaseThreadInitThunk+0x14
0c 0000004b`1f9ff8e0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
...which suggested it was waiting for some sort of event processing to complete. It would have been possible to build a debug version of Chrome, either to break into what the process was doing or to add more logging; however, at this point we were satisfied that it was a Chromium renderer issue, and investigating further was probably not a good investment of our time.
* WinDbg is available in the Windows 10 SDK. If Visual Studio 2019 is present, it can also be installed through that.
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/opening-a-crash-dump-file-using-windbg
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/calls-window
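With many chrome.exe processes running, the --type switch on each command line (such as --type=renderer in the arguments above) is what tells them apart; the browser process itself is launched without --type. Below is a hedged sketch for pulling switches out of a command line string copied from Process Explorer. The regular expression is a simplification that assumes switches are space-separated and that values contain no spaces unless double-quoted; the function names are hypothetical.

```python
import re
from typing import Dict

# Matches --name or --name=value; a value is either "double quoted" or runs to the next space.
_SWITCH_RE = re.compile(r'--([A-Za-z0-9][A-Za-z0-9-]*)(?:=("[^"]*"|\S*))?')


def parse_chrome_switches(command_line: str) -> Dict[str, str]:
    """Return {switch_name: value} for every --switch ("" when it has no value)."""
    switches = {}
    for name, value in _SWITCH_RE.findall(command_line):
        switches[name] = value.strip('"')
    return switches


def process_type(command_line: str) -> str:
    """Child processes carry --type=...; the browser process has none."""
    return parse_chrome_switches(command_line).get("type", "browser")
```

Running process_type over each chrome.exe command line quickly separates renderers from the browser, GPU and utility processes before deciding which one to dump.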
Conclusion
Once we had been able to reproduce the issue, we tested the form through a process of elimination (yes, brute force), removing and moving controls around to see whether it had any effect. We were able to determine that the problem was most evident when tabs that were visible on first load contained one or more form controls (i.e. non-empty and non-grid). No particular type of control seemed to cause the problem; any series of basic form controls would do it. With this in mind, we could think in terms of mitigation and give the customer a workaround while this Chrome issue was in play: delaying the visibility of tab contents until after the rest of the browser window had completed its layout activity. The hope was that by staggering the presentation of parts of the window, the renderer would have a simpler time of it and we would reduce the amount of IPC generated.
Other Things
These are some other things we tried which didn't lead anywhere, but may be useful when pursuing future issues:
The customer had reported that the issue also occurred in Edge (which initially steered us away from thinking it could be a Chrome problem, until someone here pointed out that Edge is now built on Chromium 🙄).
When we started to suspect that accessibility processing was at fault, we wanted to see whether running without it made any difference for the customer. This is possible from chrome://accessibility/ and/or by running Chrome with the command line flag --disable-renderer-accessibility. I believe this required all Chrome processes to be restarted, and it was never going to be a solution for the customer; it was done simply in the hope of isolating where Chrome was having the problem.
We tried Chrome's Incognito mode to reproduce the problem. Since extensions are disabled in Incognito unless explicitly allowed, this would have told us whether the customer had a plugin or extension behind the problem.
The customer initially sent through network activity logs (simply because they were used to these sorts of problems occurring). HAR (HTTP Archive) files like these can be viewed easily at http://www.softwareishard.com/har/viewer/.
We tried running Chrome with verbose logging enabled. The customer was able to output the logs to a file, but it was easier to visualise them in real time with Sawbuck as the problem was occurring (it showed that no logs were being output during the hang).
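The switches from the last few items can be combined into one diagnostic launch. This is a sketch under stated assumptions: the Chrome path is the one from the renderer command line earlier and will differ between machines, and the separate --user-data-dir is there because launching chrome.exe while Chrome is already running with the same profile generally hands the request to the existing browser process, so new switches would not take effect. --disable-renderer-accessibility, --enable-logging, --v=1 and --user-data-dir are Chromium switches; the helper name and example paths are hypothetical.

```python
from typing import List

# Assumed install path (taken from the renderer command line above); adjust as needed.
DEFAULT_CHROME = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"


def diagnostic_chrome_args(
    url: str,
    profile_dir: str,
    chrome: str = DEFAULT_CHROME,
    disable_accessibility: bool = True,
    verbose_logging: bool = True,
) -> List[str]:
    """Build an argv list for launching Chrome with diagnostic switches."""
    # A dedicated profile directory forces a fresh browser process, so the
    # switches below are not swallowed by an already-running Chrome.
    args = [chrome, f"--user-data-dir={profile_dir}"]
    if disable_accessibility:
        # Turns off renderer accessibility to rule it in or out as the cause.
        args.append("--disable-renderer-accessibility")
    if verbose_logging:
        # Verbose logging; per Chromium's logging documentation, on Windows the
        # output goes to chrome_debug.log in the user data directory.
        args += ["--enable-logging", "--v=1"]
    args.append(url)
    return args
```

To actually launch it, pass the list to subprocess.Popen, e.g. subprocess.Popen(diagnostic_chrome_args("https://example.com/form", r"C:\temp\chrome-diag")).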