Tag Archives: dual-port SRAM

How to prevent execution surprises for Cortex-M7 MCU

Software development weighs heavily on a project, accounting for 60% to 70% of the overall cost.

The ARM Cortex-A series processor cores (A57, A53) are well known in high-performance market segments such as application processing for smartphones, set-top boxes and networking. Look at the wider electronics market, however, and you realize that many applications are cost sensitive and don’t need such a high-performance processor core. We may call this the embedded market, even if the definition is vague. The ARM Cortex-M family was developed to address these numerous market segments, starting with the Cortex-M0 for lowest cost, the Cortex-M3 for the best power/performance balance, and the Cortex-M4 for applications requiring digital signal processing (DSP) capabilities.

For the audio, voice control, object recognition, and complex sensor fusion found in automotive and higher-end Internet of Things sensing, where complex audio and video algorithms are needed for rich audio and visual capabilities, the Cortex-M7 is required. ARM offers the processor core as well as the Tightly Coupled Memory (TCM) architecture, but ARM licensees like Atmel have to implement the memories in such a way that the user can take full advantage of the M7 core to meet system performance and latency goals.

Figure 1. The TCM interface provides a single 64-bit instruction port and two 32-bit data ports.


In a 65nm embedded Flash process, the Cortex-M7 can achieve a 1500 CoreMark score while running at 300 MHz, and it offers top-class DSP performance through a double-precision floating-point unit and a dual-issue instruction pipeline. But algorithms like FIR, FFT or biquad filters need to run as deterministically as possible for real-time response or seamless audio and video performance. How do you best select and implement the memories needed to support such performance? If you choose Flash, caching is required (as Flash is too slow), which introduces the risk of cache misses. SRAM is a better choice, since it can easily be embedded on-chip and permits random access at the speed of the processor.
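To make the determinism argument concrete, here is a minimal FIR filter sketch in portable C. The names (`fir_filter`, `fir_step`, `FIR_TAPS`) are hypothetical and not taken from any Atmel or ARM library. Both loops have a fixed trip count, so the run time of each call depends only on how fast the coefficient and history arrays can be read. Placing those arrays in TCM rather than behind a cache is what would remove the remaining source of timing variation; how that placement is done is toolchain specific and is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define FIR_TAPS 4 /* illustrative filter length, an assumption */

/* Hypothetical filter state. On a real Cortex-M7 project these arrays
 * would be the candidates for placement in data TCM. */
typedef struct {
    float coeffs[FIR_TAPS];  /* filter coefficients h[0..N-1] */
    float history[FIR_TAPS]; /* last N input samples, newest at index 0 */
} fir_filter;

/* Push one input sample and return y[n] = sum over k of h[k] * x[n-k]. */
static float fir_step(fir_filter *f, float sample)
{
    assert(f != NULL);

    /* Shift the history by one position; fixed trip count. */
    for (size_t i = FIR_TAPS - 1; i > 0; --i) {
        f->history[i] = f->history[i - 1];
    }
    f->history[0] = sample;

    /* Multiply-accumulate over all taps; fixed trip count. */
    float acc = 0.0f;
    for (size_t k = 0; k < FIR_TAPS; ++k) {
        acc += f->coeffs[k] * f->history[k];
    }
    return acc;
}
```

With all four coefficients set to 0.25 the filter is a moving average, so feeding it a constant input makes the output ramp up to that constant over four calls.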

Peripheral data buffers implemented in general-purpose system SRAM are typically loaded by DMA transfers from system peripherals. The ability to load from a number of possible sources, however, raises the possibility of unnecessary delays and conflicts when multiple DMAs try to access the memory at the same time. In a typical example, we might have three different entities vying for access to the SRAM: the processor (64-bit access, requesting 128 bits for this example) and two separate peripheral DMA requests (DMA0 and DMA1, 32-bit access each). Atmel has gotten around this issue by organizing the SRAM into several banks, as shown in this figure:

Figure 2. By organizing the SRAM into banks, multiple DMA bursts can occur simultaneously with minimal latency.
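The benefit of banking can be illustrated with a toy model, written here as a sketch under stated assumptions rather than a description of Atmel's actual arbiter. It assumes four banks, an illustrative bank size of 16 KB, and a mapping in which each bank covers one contiguous address range; `bank_of` and `cycles_needed` are hypothetical names. Accesses that land in different banks are served in the same cycle, while accesses to the same bank are served one after another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BANKS 4u      /* assumed bank count for this sketch */
#define BANK_SIZE 0x4000u /* assumed 16 KB per bank, illustrative only */

/* Assumed mapping: each bank covers one contiguous address range. */
static unsigned bank_of(uint32_t offset)
{
    return (unsigned)((offset / BANK_SIZE) % NUM_BANKS);
}

/* Cycles needed to serve one access from each requester. Accesses to
 * different banks proceed in parallel; accesses to the same bank
 * serialize, so the busiest bank sets the total. */
static unsigned cycles_needed(const uint32_t *offsets, size_t count)
{
    unsigned per_bank[NUM_BANKS] = {0};
    unsigned worst = 0;

    assert(offsets != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        unsigned b = bank_of(offsets[i]);
        per_bank[b]++;
        if (per_bank[b] > worst) {
            worst = per_bank[b];
        }
    }
    return worst;
}
```

In this model, the processor, DMA0 and DMA1 each working in their own bank finish in one cycle, whereas the same three requesters sharing a single bank need three.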


For a chip maker designing microcontrollers, licensing an ARM Cortex-M processor core provides numerous advantages. The first is the ubiquity of the ARM core architecture, which has been adopted in multiple market segments to support a variety of applications. If this chip maker wants to win a design with a new customer, the probability that this OEM has already used an ARM-based MCU is very high, and it’s very important for the OEM to be able to reuse existing code (software development weighs heavily, at 60% to 70% of the overall project cost). But this ubiquity creates a challenge: how do you differentiate from the competition when competitors can license exactly the same processor core?

Selecting a more aggressive technology node and providing better performance at lower cost is one option, but this advantage can disappear as soon as the competition also moves to that node. Integrating a larger amount of Flash is another option, which is very effective if the product is designed on a technology that keeps pricing low enough.

If the chip maker has designed on an aggressive technology node for higher performance and offers a larger amount of Flash than the competition, that may be enough differentiation. Complementing this with a smarter memory architecture, one unencumbered by cache misses, interrupts, context swaps, and other execution surprises that work against deterministic timing, brings strong differentiation.


If you want to understand more completely how Atmel has designed this SMART memory architecture for the Cortex-M7, I encourage you to read this white paper from Jacko Wilbrink and Lionel Perdigon entitled “Run Blazingly Fast Algorithms with Cortex-M7 Tightly Coupled Memories.” (You will have to register.) The paper describes MCUs integrating SRAM organized into four banks that can be used as general SRAM and for TCM, and shows one example of a Cortex-M7 MCU implementation in the Atmel | SMART SAM S70, SAM E70 and SAM V70/V71 families.

This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principal blogger, as well as one of the four founding members of the site. This blog was originally shared on August 6, 2015.

Atmel’s AVR UC3: Low power & ease of use

Atmel’s AVR UC3 has popped up in quite a number of recent use cases on Bits & Pieces, so today we will be taking a closer look at the stalwart microcontroller (MCU) family, which is built on a high-performance 32-bit AVR architecture and optimized for highly integrated applications.

Essentially, the AVR UC3 delivers high computational throughput, deterministic real-time control, low-power consumption, low system cost, high reliability and ease-of-use. As previously discussed on Bits & Pieces, the AVR CPU boasts a plethora of cutting-edge features including integer and fixed point DSP (digital signal processor) arithmetic and single-cycle multiply-accumulate instructions.

Meanwhile, a dual-port SRAM, peripheral DMA (direct memory access) controller and multi-layer, high-speed bus architecture make the AVR UC3 core ideal for high-throughput applications. AVR UC3 devices are also well suited for portable and battery-powered applications due to their optimized low-power properties.

Another important feature of Atmel’s AVR UC3 is picoPower technology, which allows the AVR UC3 to further extend the battery life of portable devices.

“True 1.62V operation means selected AVR UC3 devices can utilize a 1.8V (± 10%) regulated power supply – with all functions working,” an Atmel engineering rep told Bits & Pieces. “Indeed, picoPower AVR UC3 devices consume only 650nA with the RTC (real time clock) running, enabling ultra-low sleep current combined with fast wake-up for highly integrated microcontrollers.”

On the security side, selected AVR UC3 devices provide mechanisms to protect the system from unauthorized modification, flash software theft and runaway code. Atmel’s FlashVault code protection allows CPU resources and sections of code/data memory to be reserved for proprietary software IP or critical sections of code/data, while a special API (application programming interface) is used to access these resources from the rest of the code. Attempts to access these resources by circumventing this API (either by hacking or runaway code) will be aborted and result in an exception.

In terms of digital signal processing, the 32-bit AVR UC3 offers unrivaled performance compared to legacy architectures.

“By including powerful instructions for single-cycle multiply-accumulate and fractional multiply for various number formats, the 32-bit AVR UC3 delivers unrivaled DSP (digital signal processor) performance compared to legacy architectures,” the Atmel engineering rep noted. “In the AVR UC3 software framework more than 70 DSP functions have been assembly optimized utilizing these instructions. In short, DSP has never been easier.”
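For readers unfamiliar with fractional multiply-accumulate, the following portable C sketch shows the operation such instructions accelerate, using the common Q15 fixed-point format. It is not code from the AVR UC3 software framework, and `q15_t` and `q15_dot` are hypothetical names chosen for illustration. Whether a given compiler maps the inner loop onto a hardware multiply-accumulate instruction is an assumption that would need checking against the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q15 fixed point: a raw value r represents r / 32768. */
typedef int16_t q15_t;

/* Dot product of two Q15 vectors. Each product is Q30 and is summed in a
 * 64-bit accumulator; the sum is scaled back to Q15 (truncating toward
 * zero) and saturated to the int16_t range. */
static q15_t q15_dot(const q15_t *a, const q15_t *b, size_t n)
{
    int64_t acc = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i]; /* multiply-accumulate step */
    }

    acc /= 32768; /* Q30 -> Q15 */
    if (acc > INT16_MAX) {
        acc = INT16_MAX;
    }
    if (acc < INT16_MIN) {
        acc = INT16_MIN;
    }
    return (q15_t)acc;
}
```

As a check on the scaling, 0.5 × 0.5 + 0.5 × 0.25 = 0.375, which in Q15 is the raw value 12288.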

And last, but certainly not least, Atmel’s peripheral DMA (direct memory access) controller sets a new standard for data transfer efficiency. For example, if the peripheral DMA controller is not enabled, the maximum usable transfer rate on the SPI (serial peripheral interface) module would be approximately 1MBit/s – occupying the CPU with more than 50% load just moving data around. However, with the peripheral DMA controller, this bottleneck is removed and the Atmel AVR UC3 microcontroller can achieve a transfer rate of 33MBit/s on SPI and USART with only a 15% load on the CPU.
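To put those two data points on a common scale, one can divide throughput by CPU load. The small helper below, whose name `mbit_per_load_point` is hypothetical, does exactly that; the only inputs are the figures quoted above. Without DMA, 1 MBit/s at 50% load gives 0.02 MBit/s per percentage point of load. With DMA, 33 MBit/s at 15% load gives 2.2 MBit/s per point, roughly 110 times more. Since the non-DMA load is stated as more than 50%, that ratio should be read as a lower bound.

```c
#include <assert.h>

/* Throughput delivered per percentage point of CPU load, in MBit/s. */
static double mbit_per_load_point(double mbit_per_s, double cpu_load_percent)
{
    assert(cpu_load_percent > 0.0);
    return mbit_per_s / cpu_load_percent;
}
```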

“The innovative peripheral event system in the AVR UC3 also represents a paradigm shift, as it allows the AVR UC3 to send signals (events) directly to other peripherals without involving the CPU. This ensures short and predictable response time. At the same time it offloads the CPU and reduces power consumption,” the engineering rep added.

Want to learn more about Atmel’s lineup of AVR UC3 MCUs? Be sure to check out our official UC3 page here.