Learn how a developer used Tracealyzer to solve an unexpected timing issue.
With time-to-market pressures constantly on the rise, advanced visualization support is a necessity nowadays. For those who may be unfamiliar with Percepio, the company has sought out to accelerate embedded software development through world-leading RTOS tracing tools. Tracealyzer provides Makers, engineers and developers alike a new level of insight into the run-time world, allowing for improved designs, faster troubleshooting and higher performance. What has made it such a popular choice among the community is that it works with a wide-range of operating systems and is available for Linux and FreeRTOS, among several others.
When developing advanced multi-threaded software systems, a traditional debugger is often insufficient for understanding the behavior of the integrated system, especially regarding timing issues. Tracealyzer is able to visualize the run-time behavior through more than 20 innovative views that complement the debugger perspective. These views are interconnected in intuitive ways which makes the visualization system powerful and easy to navigate. Beyond that, it seamlessly integrates with Atmel Studio 6.2, providing optimized insight into the run-time of embedded software with advanced trace visualization.
Over the next couple of weeks, we will be sharing step-by-step tutorials from the Percepio team, collected directly from actual user experiences with the Tracealyzer tools.
In the first case, a user had an issue with a periodic task not running as expected, here called SamplerTask. This was the highest priority task in the system and should execute every 5 ms to read a sensor. However, they found significant timing variations in the data sampling. Why, you ask?
With Tracealyzer, they soon found the Actor Instance Graph, revealing run-time metrics of individual task executions, as shown below. There are several such graphs, one for each supported metric, where each data point represents one execution of the task. The X-axis notes the start time of the instance and the Y-axis shows the specific metric, which in this case is “Periodicity – from Ready” (i.e., the time in between task activations).
As you can see in the above screenshot, the SamplerTask is not executing strictly every 5 ms as intended. While it usually does, there are cases with unusually long intervals in between activations — over 6.5 ms.
So, what exactly is going on during these instances? If we add a horizontal trace view, synchronized and with the other tasks enabled, we see that the ControlTask (dark green) is running in this interval. Is this causing the problem? But SamplerTask (orange) has higher priority, so it should preempt ControlTask.
Perhaps the ControlTask is disabling interrupts for a few milliseconds and thereby also the kernel tick interrupt? We can find out in the main trace view, which allows us to inspect the tick events. Let’s have a look…
By double-clicking on the data point, the corresponding interval is shown in the main trace view. This view provides a vertical timeline, with events presented as horizontal text labels. The OS tick events should occur every 1 ms in this system, but there is a gap of about 2.5 ms, starting at 2.018.000.
That being said, it seems plausible that ControlTask is disabling interrupts, probably to protect access to a shared resource like a file system. But SamplerTask is not using that resource, so is it really necessary to block it by disabling interrupts? Let’s check how the other task, ServerTask, is handling this.
In the lower right corner, you find the View Filter, where we have now enabled “Mutex” type kernel calls, and also User Events – generated by logging calls added to the application code. Here, we can see that the ServerTask (bright green) is using a Mutex for protecting access to the file system, but ControlTask makes no such calls, even though the logging indicates a similar operation in both tasks (“Updating File”). This can probably be solved by changing ControlTask to use the mutex (FS_Mutex) instead of disabling interrupts.
Let’s see what happens if we change this. In the screenshot below, it’s apparent that SamplerTask is now running periodically every 5 ms, as intended.
When looking at the detailed trace (below), we verify that ControlTask is now using FS_Mutex and the OS ticks are running periodically, even in the critical section, and lets SamplerTask preempt as intended.
Stay tuned for the next use case – the mysterious random watchdog reset.