Tag Archives: Engineer Tips

Percepio Trace: Increasing response time

Discover how a developer used Tracealyzer to compare runtime behaviors and increase response time.

With time-to-market pressures constantly on the rise, advanced visualization support is a necessity nowadays. For those who may be unfamiliar with Percepio, the company has sought out to accelerate embedded software development through world-leading RTOS tracing tools. Tracealyzer provides Makers, engineers and developers alike a new level of insight into the run-time world, allowing for improved designs, faster troubleshooting and higher performance. What has made it such a popular choice among the community is that it works with a wide-range of operating systems and is available for Linux and FreeRTOS, among several others.


When developing advanced multi-threaded software systems, a traditional debugger is often insufficient for understanding the behavior of the integrated system, especially regarding timing issues. Tracealyzer is able to visualize the run-time behavior through more than 20 innovative views that complement the debugger perspective. These views are interconnected in intuitive ways which makes the visualization system powerful and easy to navigate. Beyond that, it seamlessly integrates with Atmel Studio 6.2, providing optimized insight into the run-time of embedded software with advanced trace visualization.

Over the next couple of months, we will be sharing step-by-step tutorials from the Percepio team, collected directly from actual user experiences with Tracealyzer. In the latest segment, how a developer used Tracealyzer to solve an issue with a randomly occurring reset; today, we’re exploring how the tool can increase response time.

In this scenario, a user had developed a networked system containing a TCP/IP stack, a Flash file system and an RTOS running on an ARM Cortex-M4 microcontroller. The system was comprised of several RTOS tasks, including a server-style task that responds to network requests and a log file spooler task. The response time on network requests had often been an issue, and when testing their latest build, the system responded even slower than before. So, as one can imagine, they really wanted to figure this out!

But when comparing the code of the previous and new version, they could not find any obvious reason for the lower response time of the server task. There were some minor changes due to refactoring, but no significant functions had been added. However, since other tasks had higher scheduling priority than the server task, there could be many other causes for the increased response time. Therefore, they decided to use Tracealyzer to compare the runtime behaviors of the earlier version and the new version, in order to see the differences.

They recorded traces of both versions in similar conditions and began at the comparison at the highest level of abstraction, i.e., the statistics report (below). This report can display CPU usage, number of executions, scheduling priorities, but also metrics like execution time and response time calculated per each execution of each task and interrupt.


As expected, the statistics report revealed that response times were, in fact, higher in the new version — about 50% higher on average. The execution times of the server task were quite similar, only about 7% higher in the latter. Reason for the greater response time, other tasks that interfere.

To determine out what was causing this disparity, one can simply click on the extreme values in the statistics report. This focuses the main trace view on the corresponding locations, enabling a user to see the details. By opening two parallel instances of Tracealyzer, one for each trace, you can now compare and see the differences — as illustrated below.


Since the application server task performed several services, two user events have been added to mark the points where the specific request are received and answered, labeled “ServerLog.” The zoom levels are identical, so you can clearly see the higher response time in the new version. What’s more, this also shows that the logger task preempts the server task 11 times, compared to only 6 times in the earlier version — a pretty significant difference. Moreover, it appears that the logger task is running on higher priority than server task, meaning every logging call preempts the server task.

So, there seems to be new logging calls added in the new version causing the logger task to interfere more with the server task. In order to observe what is logged, add a user event in the logger task to show the messages in the trace view. Perhaps some can be removed to improve performance?


Now, it’s evident that also other tasks generate logging messages that affect the server task response time. For instance, the ADC_0 task. To see all tasks sending messages to the logger task, one can use the communication flow view — as illustrated below.


The communication flow view is a dependency graph showing a summary of all operations on message queues, semaphores and other kernel objects. Here, this view is for the entire trace, but can be generated for a selected interval (and likewise for the statistics report) as well. For example, a user can see how the server task interacts with the TCP/IP stack. Note the interrupt handler named “RX_ISR” that triggers the server task using a semaphore, such as when there is new data on the server socket, and the TX task for transmitting over the network.

But back to the logger task, the communication flow reveals five tasks that sends logging messages. By double-clicking on the “LoggerQueue” node in the graph, the Kernel Object History view is opened and shows all operations on this message queue.


As expected, you can see that logger task receives messages frequently, one at a time, and is blocked after each message, as indicated by the “red light.”

Is this a really good design? It is probably not necessary to write the logging messages to file one-by-one. If increasing the scheduling priority of server task above that of the logger task, the server task would not be preempted as frequently, and thus, would be able to respond faster. The logging messages would be buffered in LoggerQueue until the server task (and other high priority tasks) has completed. Only then would the logger task be resumed and process all buffered messages in a batch.

By trying that, these screenshot below demonstrates the server task instance with highest response time, after increasing its scheduling priority above the logger task.


The highest response time is now just 5.4 ms instead of 7.5 ms, which is even faster than in the earlier version (5.7 ms) despite more logging. This is because the logger task is no longer preempting the server task, but instead processes all pending messages in a batch after server is finished. Here, one can also see “event labels” for the message queue operations. As expected, there are several “xQueueSend” calls in sequence, without blocking (= red labels) or task preemptions. There are still preemptions by the ADC tasks, but this no longer cause extra activations of the logger task. Problem solved!

The screenshot below displays LoggerQueue after the priority change. In the right column, one see how the messages are buffered in the queue, enabling the server task to respond as fast as possible, and the logging messages are then processed in a batch.