Tag Archives: Percepio Tracealyzer

“It’s not a feature, it’s a bug”


Embedded systems no longer need to be a ‘black box’ that leaves engineers guessing at what may be happening, Percepio AB CEO Dr. Johan Kraft explains in his latest guest blog post.


Anyone involved with software development will most likely have heard (and perhaps even said) the phrase “it’s not a bug, it’s a feature” at some point. While its origins remain a mystery, its sentiment is clear: it’s a bug that we haven’t seen before.


Intermittent ‘features’ in an embedded system can originate in either the software or the hardware domain, and often only become evident when certain conditions collide in both. In the hardware domain, the timings involved may be fractions of a nanosecond, but where the logic is accessible, such as on an address line or data bus, there exist instruments that can operate at high sample rates, allowing engineers to visualize and verify such ‘glitches.’ In the software domain, this becomes much more challenging.

Sequential Processing

While parallel processing is being rapidly adopted across all applications, single-processor systems remain common in embedded systems, thanks partly to the continued increases in the performance of microcontroller cores. Embedded MCUs are now capable of executing a range of increasingly sophisticated Real-Time Operating Systems (RTOS), often including the ability to run various communication protocols for both wired and wireless interfaces.

Whether in a single- or multi-processing system, combining these tasks with the embedded system’s main application, written by the engineering team, can make embedded software builds large, complex and difficult to fault-find, particularly when visibility into the code’s execution is limited. It can also lead to dreaded intermittent faults which, if part of the system’s operation is ‘hidden’, become even more challenging to solve.

A typical example may be an unexplained delay in a scheduled task. Of course, an RTOS is intended to guarantee that specific tasks happen at specific times, but this can depend on the task’s priority and on what else may be happening at the time. In one real-world example, where a sensor needed to be sampled every 5ms, it was found that the delay between samples occasionally reached 6.5ms, with no simple explanation as to the cause. In another example, a customer reported that their system exhibited random resets; the suspected cause was that the watchdog was expiring before it was serviced, but how could this be checked? In yet another example, a system running a TCP/IP stack showed slower response times to network requests after minor changes in the code, for no obvious reason.

These are typical examples of how embedded systems running complex software can behave in unforeseen ways, leaving engineering teams speculating on the causes and attempting to solve the problems with only empirical results from which to assess their efforts. In the case of intermittent faults or system performance fluctuations, this is clearly an inefficient and unreliable development method.

Trace Tools

The use of logging software embedded in a build in order to record certain actions isn’t new, of course, and it can offer a significantly improved level of visibility into a system. However, while the data generated by such trace software is undoubtedly valuable, exploiting that value isn’t always simple.

Analyzing trace data and visually rendering it in various ways is the key function of Percepio’s Tracealyzer tools. The tool offers visualization at many levels, ranging from an event list to high-level dependency graphs and advanced statistics.

Over 20 different graphical views are provided, showing different aspects of the software’s execution that are unavailable with debuggers alone, and as such it complements existing software debug tools in a way that is becoming essential in today’s complex embedded systems. It supports an increasing range of target operating systems.

Figure 1(a): It appears that the ControlTask may be disabling interrupts.


The main view in Tracealyzer, as shown in Figures 1(a) and 1(b), is a vertical timeline visualizing the execution of tasks/threads and interrupts. Other logged events, such as system calls, are displayed as annotations in this timeline, using horizontal colour-coded text labels. Several other timeline views are provided in a horizontal orientation, and all horizontal views can be combined on a common timeline. While much important data is produced by the operating system’s kernel, developers can also extend the tracing with User Events, which allow any event or data in a user’s application to be logged. Logging a User Event is similar to calling the classic ‘printf’ C library function, but much faster, as the actual formatting is handled in the host-side application; User Events can therefore also be used in time-critical code such as interrupt handlers. And, of course, they can be correlated with other kernel-based events.
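
As an illustration of how such instrumentation typically looks on a FreeRTOS target, here is a minimal sketch using the TraceRecorder user-event API (xTraceRegisterString/vTracePrintF); the exact names vary slightly between recorder versions, and read_sensor() is a hypothetical application call.

    #include "trcRecorder.h"

    extern int read_sensor(void);      /* hypothetical sensor read */

    static traceString sensor_chn;

    void trace_channels_init(void)
    {
        /* Register the user-event channel once; later calls only reference the handle. */
        sensor_chn = xTraceRegisterString("Sensor");
    }

    void on_sample(void)
    {
        int value = read_sensor();

        /* The format string is expanded on the host side by Tracealyzer,
           so this call is cheap enough even for interrupt handlers. */
        vTracePrintF(sensor_chn, "Sample: %d", value);
    }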

Figure 1(b): By changing the way ControlTask protects a critical section, SamplerTask is able to run as intended.


Tracealyzer understands the general meaning of many kernel calls, for instance locking a Mutex or writing to a message queue. This allows Tracealyzer to perform deep analysis to connect related events and visualize dependencies, e.g., which tasks communicate (see the communication flow graph, shown in Figure 3). This allows developers to quickly understand what’s really going on inside their system.

Insights

Returning to the first example, where a scheduled task was being intermittently and inexplicably delayed, Tracealyzer was used to graphically show the task in question, time-correlated with the other tasks. By invoking an exploded view of the task of interest, it was found that a lower-priority task was incorrectly blocking the primary task from executing. It turned out that this second task was disabling interrupts to protect a critical section unrelated to the primary task, which blocked operating system scheduling. After changing the second task to use a Mutex instead, the primary task was able to meet its timing requirements. Figure 1(a) shows the SamplerTask being delayed by the (lower priority) ControlTask before the bug fix; Figure 1(b) confirms that SamplerTask now runs every 5ms as intended.

In the second example, User Events were used not only to record when the Watchdog was reset or when it expired, but also to log the remaining Watchdog timer value, thereby showing the time left on the Watchdog timer each time it was reset. By inspecting the logged system calls it was found that the task in question did not just reset the Watchdog timer; it also posted a message to another task using a (fixed-size) message queue. The Watchdog resets seemed to occur while the Watchdog task was blocked by this message posting. Once this was realised, the question became ‘why’. By visually exploring the operations on this message queue using the Kernel Object History view, it became clear that the message queue sometimes became full, as suspected. By correlating a view of the CPU load against how the Watchdog timer margin varied over time, as shown in Figure 2, it was found that Fixed Priority Scheduling was allowing a medium-priority task (ServerTask) to use so much CPU time that the message queue wasn’t always being read. Instead, it became full, leading to a Watchdog reset. The solution in this case was to modify the task priorities.

Figure 2: The CPU Load graph, correlated to the Watchdog Timer User Event, gives valuable insights.


In the last example, where a software modification caused increased response times to network requests, the Communication Flow view (Figure 3) showed that one particular task, Logger, was receiving frequent but single messages with diagnostics data to be written to a device file system, each causing a context switch. By modifying the task priorities, the messages were instead buffered until the network request had finished and were then handled in a batch. This drastically reduced the number of context switches during the handling of network requests, thereby improving overall system responsiveness.

Figure 3: The Communication Flow reveals 5 tasks sending messages to Logger.


Conclusion

The complexity of embedded software is increasing rapidly, creating demand for improved development tools. While runtime data can be recorded in various ways, understanding its meaning isn’t a simple process; with innovative data visualization tools such as Tracealyzer, it can be.

Many companies have already benefited from the many ways of using the tool to really discover what’s going on in the runtime system. Some Tracealyzer users even include it in production code, allowing them to gather invaluable data about real systems running in the field.

Embedded systems need no longer be a ‘black box,’ leaving engineers to suppose what may be happening; powerful visualization tools now turn that black box into an open box.

Percepio Tracealyzer: Improving response time


Discover how a developer used Tracealyzer to compare runtime behaviors and improve response times.


With time-to-market pressures constantly on the rise, advanced visualization support is a necessity nowadays. For those who may be unfamiliar with Percepio, the company has set out to accelerate embedded software development through world-leading RTOS tracing tools. Tracealyzer provides Makers, engineers and developers alike a new level of insight into the run-time world, allowing for improved designs, faster troubleshooting and higher performance. What has made it such a popular choice among the community is that it works with a wide range of operating systems and is available for Linux and FreeRTOS, among several others.


When developing advanced multi-threaded software systems, a traditional debugger is often insufficient for understanding the behavior of the integrated system, especially regarding timing issues. Tracealyzer is able to visualize the run-time behavior through more than 20 innovative views that complement the debugger perspective. These views are interconnected in intuitive ways, which makes the visualization powerful and easy to navigate. Beyond that, it seamlessly integrates with Atmel Studio 6.2, providing insight into the run-time behavior of embedded software through advanced trace visualization.

Over the next couple of months, we will be sharing step-by-step tutorials from the Percepio team, collected directly from actual user experiences with Tracealyzer. In the latest segment, we looked at how a developer used Tracealyzer to solve an issue with a randomly occurring reset; today, we’re exploring how the tool can help improve response times.


In this scenario, a user had developed a networked system containing a TCP/IP stack, a Flash file system and an RTOS running on an ARM Cortex-M4 microcontroller. The system comprised several RTOS tasks, including a server-style task that responds to network requests and a log file spooler task. The response time on network requests had often been an issue, and when testing their latest build, the system responded even slower than before. So, as one can imagine, they really wanted to figure this out!

But when comparing the code of the previous and new versions, they could not find any obvious reason for the worse response time of the server task. There were some minor changes due to refactoring, but no significant functions had been added. However, since other tasks had higher scheduling priority than the server task, there could be many other causes for the increased response time. Therefore, they decided to use Tracealyzer to compare the runtime behaviors of the earlier version and the new version in order to see the differences.

They recorded traces of both versions under similar conditions and began the comparison at the highest level of abstraction, i.e., the statistics report (below). This report can display CPU usage, number of executions and scheduling priorities, as well as metrics like execution time and response time, calculated for each execution of each task and interrupt.

[Screenshot: the statistics report]

As expected, the statistics report revealed that response times were, in fact, higher in the new version, about 50% higher on average. The execution times of the server task were quite similar, only about 7% higher in the new version, so the longer response times had to be caused by other tasks interfering.

To determine what was causing this disparity, one can simply click on the extreme values in the statistics report. This focuses the main trace view on the corresponding locations, enabling a user to see the details. By opening two parallel instances of Tracealyzer, one for each trace, the two versions can be compared side by side, as illustrated below.

[Screenshot: the two traces compared in parallel Tracealyzer instances]

Since the application’s server task performs several services, two user events had been added to mark the points where the specific request is received and answered, labeled “ServerLog.” The zoom levels are identical, so you can clearly see the higher response time in the new version. What’s more, this also shows that the logger task preempts the server task 11 times, compared to only 6 times in the earlier version, a pretty significant difference. Moreover, it appears that the logger task is running at a higher priority than the server task, meaning every logging call preempts the server task.

So, it seems that new logging calls added in the new version are causing the logger task to interfere more with the server task. To observe what is being logged, a user event can be added in the logger task to show the messages in the trace view. Perhaps some of them can be removed to improve performance?
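
A minimal sketch of what such instrumentation could look like, assuming the FreeRTOS TraceRecorder API and hypothetical names for the queue, message type and file-system call (xLoggerQueue, LogMessage_t, write_to_filesystem):

    #include "FreeRTOS.h"
    #include "queue.h"
    #include "trcRecorder.h"

    typedef struct { char text[64]; } LogMessage_t;          /* hypothetical message type */
    extern QueueHandle_t xLoggerQueue;                        /* created elsewhere */
    extern void write_to_filesystem(const LogMessage_t *msg); /* hypothetical file-system write */

    static traceString log_chn;   /* registered once, e.g. xTraceRegisterString("LoggerMsg") */

    void LoggerTask(void *pvParameters)
    {
        LogMessage_t msg;
        for (;;)
        {
            xQueueReceive(xLoggerQueue, &msg, portMAX_DELAY);
            vTracePrint(log_chn, msg.text);   /* make each logged message visible in the trace */
            write_to_filesystem(&msg);
        }
    }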

[Screenshot: trace view with the logging messages shown as user events]

Now, it is evident that other tasks also generate logging messages that affect the server task’s response time, for instance the ADC_0 task. To see all tasks sending messages to the logger task, one can use the communication flow view, as illustrated below.

[Screenshot: the communication flow view]

The communication flow view is a dependency graph showing a summary of all operations on message queues, semaphores and other kernel objects. Here, the view covers the entire trace, but it can also be generated for a selected interval (and likewise for the statistics report). For example, a user can see how the server task interacts with the TCP/IP stack. Note the interrupt handler named “RX_ISR” that triggers the server task using a semaphore, such as when there is new data on the server socket, and the TX task used for transmitting over the network.

But back to the logger task: the communication flow reveals five tasks that send logging messages. By double-clicking on the “LoggerQueue” node in the graph, the Kernel Object History view is opened, showing all operations on this message queue.

[Screenshot: the Kernel Object History view for LoggerQueue]

As expected, you can see that the logger task receives messages frequently, one at a time, and is blocked after each message, as indicated by the “red light.”

Is this really a good design? It is probably not necessary to write the logging messages to file one by one. If the scheduling priority of the server task were raised above that of the logger task, the server task would not be preempted as frequently and would thus be able to respond faster. The logging messages would be buffered in LoggerQueue until the server task (and other high-priority tasks) had completed; only then would the logger task resume and process all buffered messages in a batch.
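
In FreeRTOS terms, this is simply a change of the priorities the two tasks are created with. The sketch below uses illustrative numbers and hypothetical task functions; only the relative order (server above logger) reflects the change being discussed.

    #include "FreeRTOS.h"
    #include "task.h"

    extern void ServerTask(void *pvParameters);
    extern void LoggerTask(void *pvParameters);

    /* Illustrative priorities only; the key point is that the server runs above the logger. */
    #define PRIO_LOGGER   (tskIDLE_PRIORITY + 1)
    #define PRIO_SERVER   (tskIDLE_PRIORITY + 2)

    void create_tasks(void)
    {
        xTaskCreate(LoggerTask, "Logger", 512,  NULL, PRIO_LOGGER, NULL);
        xTaskCreate(ServerTask, "Server", 1024, NULL, PRIO_SERVER, NULL);
    }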

Trying exactly that, the screenshot below shows the server task instance with the highest response time after its scheduling priority was raised above that of the logger task.

[Screenshot: the trace after the priority change, showing the worst-case server task instance]

The highest response time is now just 5.4 ms instead of 7.5 ms, which is even faster than in the earlier version (5.7 ms) despite more logging. This is because the logger task no longer preempts the server task, but instead processes all pending messages in a batch after the server task has finished. Here, one can also see “event labels” for the message queue operations. As expected, there are several “xQueueSend” calls in sequence, without blocking (which would be shown as red labels) or task preemptions. There are still preemptions by the ADC tasks, but these no longer cause extra activations of the logger task. Problem solved!

The screenshot below displays LoggerQueue after the priority change. In the right column, one can see how the messages are buffered in the queue, enabling the server task to respond as fast as possible; the logging messages are then processed in a batch.

[Screenshot: LoggerQueue after the priority change, with messages buffered]

Percepio Tracealyzer: The mysterious random watchdog reset


Learn how a developer used Tracealyzer to solve an issue with a randomly occurring reset.


With time-to-market pressures constantly on the rise, advanced visualization support is a necessity nowadays. For those who may be unfamiliar with Percepio, the company has set out to accelerate embedded software development through world-leading RTOS tracing tools. Tracealyzer provides Makers, engineers and developers alike a new level of insight into the run-time world, allowing for improved designs, faster troubleshooting and higher performance. What has made it such a popular choice among the community is that it works with a wide range of operating systems and is available for Linux and FreeRTOS, among several others.


When developing advanced multi-threaded software systems, a traditional debugger is often insufficient for understanding the behavior of the integrated system, especially regarding timing issues. Tracealyzer is able to visualize the run-time behavior through more than 20 innovative views that complement the debugger perspective. These views are interconnected in intuitive ways, which makes the visualization powerful and easy to navigate. Beyond that, it seamlessly integrates with Atmel Studio 6.2, providing insight into the run-time behavior of embedded software through advanced trace visualization.

Over the next couple of months, we will be sharing step-by-step tutorials from the Percepio team, collected directly from actual user experiences with Tracealyzer. Last week, we explored how to use the tools to solve an unexpected timing issue; today, we’re delving deeper into the mysterious random watchdog reset.


In this scenario, a developer had an issue with a randomly occurring reset. By placing a breakpoint in the reset exception handler, he had discovered that the watchdog timer had expired. The watchdog timer was supposed to be reset in a high-priority task that executed periodically; however, that was evidently not always happening.

The ability to insert custom User Events comes in handy here. They are similar to a classic “printf()” call, and events were added for when the watchdog timer was reset and when it expired. User Events also support data arguments, and this was used to log the timer value (just before resetting it) to see the watchdog “margin,” i.e. the remaining time. The result can be seen below, in the yellow text labels.
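
As a rough sketch of how such margin logging could be added on a FreeRTOS target, the TraceRecorder calls below are real while the wdt_* helpers and the channel name are hypothetical:

    #include "trcRecorder.h"

    extern unsigned int wdt_remaining_ms(void);   /* hypothetical: time left before the watchdog expires */
    extern void wdt_clear(void);                  /* hypothetical: service the hardware watchdog */

    static traceString wdt_chn;

    void wdt_channel_init(void)
    {
        wdt_chn = xTraceRegisterString("Watchdog");
    }

    void wdt_service(void)
    {
        /* Log the remaining time as a data argument, then clear the watchdog. */
        vTracePrintF(wdt_chn, "WDT margin: %u ms", wdt_remaining_ms());
        wdt_clear();
    }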

[Screenshot: trace view with the watchdog User Events shown as yellow text labels]

From this, we can see that the SamplerTask is running, but it does not clear the watchdog timer in its last execution, which causes the system to reset after a while (“Watchdog reset!”). So why didn’t SamplerTask reset the watchdog timer, you ask? Let’s enable kernel service calls to see what the task was doing.

[Screenshot: trace view with kernel service calls enabled]

The last event of SamplerTask is a call to xQueueSend, a kernel function that puts a message into a message queue. Note that the label is red, meaning that the xQueueSend call blocked the task, causing a context switch to ServerTask before the watchdog timer had been reset. This, of course, allowed the watchdog to expire and reset the system.
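
The pattern in the trace corresponds roughly to a task body like the sketch below (hypothetical types and helpers; only the FreeRTOS calls are real): if the send blocks, the watchdog is never cleared in time for that period.

    #include "FreeRTOS.h"
    #include "queue.h"
    #include "task.h"

    typedef struct { int value; } sample_t;        /* hypothetical sample type */
    extern QueueHandle_t xControlQueue;            /* the queue seen as "ControlQueue" in the trace */
    extern sample_t read_sample(void);             /* hypothetical sensor read */
    extern void wdt_clear(void);                   /* hypothetical watchdog service */

    void SamplerTask(void *pvParameters)
    {
        for (;;)
        {
            sample_t s = read_sample();

            /* This send uses an indefinite timeout; if it blocks here,
               the watchdog below is not cleared before it expires. */
            xQueueSend(xControlQueue, &s, portMAX_DELAY);

            wdt_clear();
            vTaskDelay(pdMS_TO_TICKS(10));         /* illustrative sampling period */
        }
    }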

Why was xQueueSend blocking the task? By double-clicking on this event label, we open the Object History View, showing all operations on this particular queue, “ControlQueue” — as illustrated below.

[Screenshot: the Object History view for ControlQueue]

The rightmost column shows a visualization of the buffered messages. There, we can see that the message queue already contains five messages and is probably full, hence the blocking. But the ControlTask is supposed to read the queue and make room, so why hasn’t this worked as expected?

To investigate this a bit further, it would be interesting to see how the watchdog margin varies over time. This information is available in the user event logging, and by employing the User Event Signal Plot, we can plot the watchdog margin over time. In addition, by adding a CPU Load Graph on the same timeline, we can observe how the task execution affects the watchdog margin, as shown below.

[Screenshot: the User Event Signal Plot of the watchdog margin together with the CPU Load Graph]

In the CPU Load Graph, we can see that ServerTask is executing a lot in the second half of the trace, and this seems to impact the watchdog margin. ServerTask (bright green) has higher priority than ControlTask (dark green), so when it is executing a lot towards the end of the trace, ControlTask gets less CPU time. This is an inherent effect of Fixed Priority Scheduling, which is used by most RTOSes. Most likely, this is what causes the message queue to fill up, since ControlTask might not be able to read messages quickly enough when the higher-priority ServerTask is using most of the CPU time. This is an example of a priority inversion problem, as the SamplerTask is effectively blocked by an unrelated task of lower priority. A solution could be to change the scheduling priorities so that ControlTask gets higher priority than ServerTask. Let’s try that and visualize how it would look.
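
A minimal sketch of that fix, assuming the tasks are created with FreeRTOS xTaskCreate(); the numbers are illustrative, and only the relative order (SamplerTask highest, ControlTask above ServerTask) reflects the change described here:

    #include "FreeRTOS.h"
    #include "task.h"

    extern void ServerTask(void *pvParameters);
    extern void ControlTask(void *pvParameters);
    extern void SamplerTask(void *pvParameters);

    /* After the fix: ControlTask runs above ServerTask, so ControlQueue is drained
       even when the server is busy; SamplerTask remains the highest-priority task. */
    #define PRIO_SERVER    (tskIDLE_PRIORITY + 1)
    #define PRIO_CONTROL   (tskIDLE_PRIORITY + 2)
    #define PRIO_SAMPLER   (tskIDLE_PRIORITY + 3)

    void create_tasks(void)
    {
        xTaskCreate(ServerTask,  "Server",  1024, NULL, PRIO_SERVER,  NULL);
        xTaskCreate(ControlTask, "Control",  512, NULL, PRIO_CONTROL, NULL);
        xTaskCreate(SamplerTask, "Sampler",  512, NULL, PRIO_SAMPLER, NULL);
    }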

[Screenshot: the same views after swapping the priorities of ServerTask and ControlTask]

The screenshot above shows the result of swapping the scheduling priorities of ServerTask and ControlTask. The system now exhibits much more stable behavior. The CPU load of SamplerTask (highlighted in red) is quite steady at around 20%, indicating stable periodic behavior, while the watchdog margin is a perfect “line,” always at 10 ms. The watchdog no longer expires… problem solved! (Note that the task colors have changed due to the change in relative priority levels.)

Tracealyzer provides over 20 interactive views of the run-time world, connected in clever ways. It is available for several leading real-time operating systems as well as for Linux, and more information can be found on the company’s website.

Percepio Tracealyzer: Periodic task not running as expected


Learn how a developer used Tracealyzer to solve an unexpected timing issue. 


With time-to-market pressures constantly on the rise, advanced visualization support is a necessity nowadays. For those who may be unfamiliar with Percepio, the company has set out to accelerate embedded software development through world-leading RTOS tracing tools. Tracealyzer provides Makers, engineers and developers alike a new level of insight into the run-time world, allowing for improved designs, faster troubleshooting and higher performance. What has made it such a popular choice among the community is that it works with a wide range of operating systems and is available for Linux and FreeRTOS, among several others.


When developing advanced multi-threaded software systems, a traditional debugger is often insufficient for understanding the behavior of the integrated system, especially regarding timing issues. Tracealyzer is able to visualize the run-time behavior through more than 20 innovative views that complement the debugger perspective. These views are interconnected in intuitive ways, which makes the visualization powerful and easy to navigate. Beyond that, it seamlessly integrates with Atmel Studio 6.2, providing insight into the run-time behavior of embedded software through advanced trace visualization.

Over the next couple of weeks, we will be sharing step-by-step tutorials from the Percepio team, collected directly from actual user experiences with the Tracealyzer tools.


In the first case, a user had an issue with a periodic task not running as expected, here called SamplerTask. This was the highest priority task in the system and should execute every 5 ms to read a sensor. However, they found significant timing variations in the data sampling. Why, you ask?
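
For reference, such a task is typically written around a fixed-period delay; the following is a minimal FreeRTOS-style sketch of the intended behaviour (read_sensor() is a hypothetical application call):

    #include "FreeRTOS.h"
    #include "task.h"

    extern void read_sensor(void);   /* hypothetical sensor read */

    void SamplerTask(void *pvParameters)
    {
        TickType_t last_wake = xTaskGetTickCount();
        for (;;)
        {
            /* Intended behaviour: wake up every 5 ms and read the sensor. */
            vTaskDelayUntil(&last_wake, pdMS_TO_TICKS(5));
            read_sensor();
        }
    }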

With Tracealyzer, they soon found the Actor Instance Graph, revealing run-time metrics of individual task executions, as shown below. There are several such graphs, one for each supported metric, where each data point represents one execution of the task. The X-axis shows the start time of the instance and the Y-axis shows the specific metric, which in this case is “Periodicity – from Ready” (i.e., the time between task activations).

[Screenshot: Actor Instance Graph of SamplerTask periodicity]

As you can see in the above screenshot, SamplerTask is not executing strictly every 5 ms as intended. While it usually does, there are cases with unusually long intervals between activations, over 6.5 ms.

So, what exactly is going on during these instances? If we add a horizontal trace view, synchronized and with the other tasks enabled, we see that the ControlTask (dark green) is running in this interval. Is this causing the problem? But SamplerTask (orange) has higher priority, so it should preempt ControlTask.

[Screenshot: synchronized horizontal trace view showing ControlTask running during the gap]

Perhaps the ControlTask is disabling interrupts for a few milliseconds and thereby also the kernel tick interrupt? We can find out in the main trace view, which allows us to inspect the tick events. Let’s have a look…

By double-clicking on the data point, the corresponding interval is shown in the main trace view. This view provides a vertical timeline, with events presented as horizontal text labels. The OS tick events should occur every 1 ms in this system, but there is a gap of about 2.5 ms, starting at 2.018.000.

[Screenshot: main trace view showing a gap of about 2.5 ms in the OS tick events]

That being said, it seems plausible that ControlTask is disabling interrupts, probably to protect access to a shared resource like a file system. But SamplerTask is not using that resource, so is it really necessary to block it by disabling interrupts? Let’s check how the other task, ServerTask, is handling this.

[Screenshot: trace view with Mutex kernel calls and User Events enabled]

In the lower right corner, you find the View Filter, where we have now enabled “Mutex” type kernel calls as well as User Events, which are generated by logging calls added to the application code. Here, we can see that ServerTask (bright green) is using a Mutex to protect access to the file system, but ControlTask makes no such calls, even though the logging indicates a similar operation in both tasks (“Updating File”). This can probably be solved by changing ControlTask to use the mutex (FS_Mutex) instead of disabling interrupts.
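
A hedged before/after sketch of that change in ControlTask, using standard FreeRTOS primitives (update_file() is a hypothetical stand-in for the file-system operation):

    #include "FreeRTOS.h"
    #include "task.h"
    #include "semphr.h"

    extern SemaphoreHandle_t FS_Mutex;   /* the same mutex ServerTask already uses */
    extern void update_file(void);       /* hypothetical file-system update */

    /* Before: a bare critical section disables interrupts, stopping the OS tick
       and blocking every task, including the unrelated SamplerTask. */
    void control_update_before(void)
    {
        taskENTER_CRITICAL();
        update_file();
        taskEXIT_CRITICAL();
    }

    /* After: only tasks that actually contend for the file system can be delayed. */
    void control_update_after(void)
    {
        if (xSemaphoreTake(FS_Mutex, portMAX_DELAY) == pdTRUE)
        {
            update_file();
            xSemaphoreGive(FS_Mutex);
        }
    }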

Let’s see what happens if we change this. In the screenshot below, it’s apparent that SamplerTask is now running periodically every 5 ms, as intended.

[Screenshot: SamplerTask now running every 5 ms as intended]

When looking at the detailed trace (below), we can verify that ControlTask now uses FS_Mutex, that the OS ticks run periodically even during the critical section, and that SamplerTask is allowed to preempt as intended.

[Screenshot: detailed trace confirming ControlTask now uses FS_Mutex]

Stay tuned for the next use case – the mysterious random watchdog reset.