Tag Archives: ARM Cortex-M7 processor

6 memory considerations for Cortex-M7-based IoT designs


Taking a closer look at the configurable memory aspects of Cortex-M7 microcontrollers.


Tightly coupled memory (TCM) is a salient feature in the Cortex-M7 lineup as it boosts the MCU’s performance by offering single-cycle access for the CPU and by securing the high-priority, latency-critical requests from the peripherals.

The early MCU implementations based on ARM’s M7 embedded processor core, like Atmel’s SAM E70 and S70 chips, have arrived in the market. So it’s worthwhile to take a closer look at the configurable memory aspects of M7 microcontrollers and see how the TCMs enable the execution of deterministic code and fast transfer of real-time data at the full processor speed.

Here are some of the key findings regarding the advanced memory architecture of Cortex-M7 microcontrollers:

1. TCM is Configurable

First and foremost, the size of TCM is configurable. The TCM interface, which is part of the physical memory map of the MCU, supports up to 16MB of tightly coupled memory. The configurability of the ARM Cortex-M7 core allows SoC architects to integrate a range of memory sizes, so that industrial and Internet of Things product developers can decide how much critical code and real-time data to place in TCM to meet the needs of the target application.

The Cortex-M7 architecture doesn’t specify what type of memory or how much memory should be provided; instead, it leaves these decisions to designers implementing M7 in a microcontroller as an avenue for differentiation. Consequently, a flexible memory system can be optimized for performance, determinism and low latency, and thus can be tuned to specific application requirements.

2. Instruction TCM

Instruction TCM or ITCM implements critical code with deterministic execution for real-time processing applications such as audio encoding/decoding, audio processing and motor control. The use of standard memory will lead to delays due to cache misses and interrupts, and therefore will hamper the deterministic timing required for real-time response and seamless audio and video performance.

The deterministic critical software routines should be loaded in a 64-bit instruction memory port (ITCM) that supports the dual-issue processor architecture and provides single-cycle access for the CPU to boost MCU performance. However, developers need to carefully calibrate the amount of code that needs zero-wait execution performance to determine the amount of ITCM required in an MCU device.
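
As a rough illustration of that calibration step, the sketch below totals the sizes of the routines earmarked for zero-wait execution and checks them against an ITCM budget. The routine names and byte counts are hypothetical placeholders, not measurements from any real firmware; in practice the sizes would come from the linker map file, and the ITCM size would be whatever the chosen MCU provides.

```python
# Hypothetical sizes (in bytes) of routines that need zero-wait execution.
# In a real project these numbers would be read from the linker map file.
CRITICAL_ROUTINES = {
    "motor_control_isr": 6_144,
    "audio_decode_frame": 40_960,
    "fir_equalizer": 12_288,
}

KIB = 1024


def itcm_usage(routines, itcm_size_bytes):
    """Return (bytes_used, bytes_free, fits) for the given ITCM size."""
    used = sum(routines.values())
    free = itcm_size_bytes - used
    return used, free, free >= 0


# Assumed budget for illustration: 128KB of ITCM.
used, free, fits = itcm_usage(CRITICAL_ROUTINES, 128 * KIB)
print(f"ITCM used: {used} bytes, free: {free} bytes, fits: {fits}")
```

Running the same check against a smaller ITCM size shows quickly whether a cheaper device variant would still hold all the critical code.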

The anatomy of TCM inside the M7 architecture

3. Data TCM

Data TCM or DTCM is used in fast data processing tasks like 2D barcode decoding and fingerprint and voice recognition. There are two data ports (DTCMs) that provide simultaneous and parallel 32-bit data accesses to real-time data. Both instruction TCM and data TCM, which are used for efficient access to on-chip Flash and external resources, must have the same size.

4. System RAM and TCM

System RAM, also known as general RAM, is employed for communications stacks related to networking, fieldbus, high-bandwidth bridging, USB, etc. It implements peripheral data buffers generally through direct memory access (DMA) engines and can be accessed by masters without CPU intervention.

Here, product developers must remember the memory access conflicts that arise from concurrent data transfers by both the CPU and the DMA. So developers must set clear priorities for latency-critical requests from the peripherals and carefully plan latency-critical data transfers like the transfer of a USB descriptor or a slow data rate peripheral with a small local buffer. Accesses from the DMA and the caches are generally bursts to consecutive addresses to optimize system performance.

It’s worth noting that while system memory is logically separate from the TCM, microcontroller suppliers like Atmel are incorporating TCM and system RAM in a single SRAM block. That lets IoT developers share general-purpose tasks while splitting TCM and system RAM functions for specific use cases.

A single SRAM block for TCM and system memory allows higher flexibility and utilization

5. TCM Loading

The Cortex-M7 uses a scattered RAM architecture to allow the MCU to maximize performance by having a dedicated RAM part for critical tasks and data transfer. The TCM might be loaded from a number of sources, and these sources aren’t specified in the M7 architecture. It’s left to the MCU designers whether there is a single DMA or several data loading points from various streams like USB and video.

It’s imperative that, during the software build, IoT product developers identify which code segments and data blocks are allocated to the TCM. This is done by embedding pragmas into the software and by applying linker settings so that the software build places the code and data appropriately in memory.

6. Why SRAM?

Flash memory can be attached to a TCM interface, but the Flash cannot run at the processor clock speed and will require caching. As a result, this will cause delays when cache misses occur, threatening the deterministic value proposition of the TCM technology.

DRAM technology is a theoretical choice but it’s cost prohibitive. That leaves SRAM as a viable candidate for fast, direct and uncached TCM access. SRAM can be easily embedded on a chip and permits random accesses at the speed of the processor. However, cost-per-bit of SRAM is higher than Flash and DRAM, which means it’s critical to keep the size of the TCM limited.

Atmel | SMART Cortex-M7 MCUs

Take the case of Atmel’s SMART SAM E70, S70 and V70/71 microcontrollers that organize SRAM into four memory banks for TCM and System SRAM parts. The company has recently started shipping volume units of its SAM E70 and S70 families for the IoT and industrial markets, and claims that these MCUs provide 50 percent better performance than the closest competitor.

Atmel’s M7-based microcontrollers offer up to 384KB of embedded SRAM that is configurable as TCM or system memory for providing IoT designs with higher flexibility and utilization. For instance, E70 and S70 microcontrollers organize 384KB of embedded SRAM into four ports to limit memory access conflicts. These MCUs allocate 256KB of SRAM for TCM functions — 128 KB for ITCM and DTCM each — to deliver zero wait access at 300MHz processor speed, while the remaining 128KB of SRAM can be configured as system memory running at 150MHz.
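
The split described above can be sketched as simple arithmetic. This is only an illustration of the numbers quoted in the paragraph, not Atmel’s configuration interface; the function names are made up for the example, and the clock-period calculation simply converts the quoted frequencies into nanoseconds.

```python
def partition_sram(total_kb, itcm_kb, dtcm_kb):
    """Split embedded SRAM into ITCM, DTCM and system RAM (sizes in KB)."""
    system_kb = total_kb - itcm_kb - dtcm_kb
    if system_kb < 0:
        raise ValueError("TCM allocation exceeds total SRAM")
    return {"itcm": itcm_kb, "dtcm": dtcm_kb, "system": system_kb}


def clock_period_ns(clock_mhz):
    """Duration of one clock period in nanoseconds."""
    return 1000.0 / clock_mhz


# Figures quoted in the text: 384KB total, 128KB each for ITCM and DTCM.
layout = partition_sram(total_kb=384, itcm_kb=128, dtcm_kb=128)
print(layout)
print(round(clock_period_ns(300), 2), "ns per clock at 300MHz (TCM)")
print(round(clock_period_ns(150), 2), "ns per clock at 150MHz (system memory)")
```

With these inputs, 128KB is left for system memory, and a zero-wait TCM access at 300MHz corresponds to a clock period of roughly 3.33 ns, half the period of the 150MHz system memory clock.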

However, the availability of an SRAM block organized in the form of a memory bank of 384KB means that both system SRAM and TCM can be used at the same time. The large on-chip SRAM of 384KB is also critical for many IoT devices, since it enables them to run multiple communication stacks and applications on the same MCU without adding external memory. That’s a significant value proposition in the IoT realm because avoiding external memories lowers the BOM cost, reduces the PCB footprint and eliminates the complexity of high-speed PCB design.

Why do drones love the Atmel SAM E70?


Eric Esteve explains why the latest Cortex-M7 MCU series will open up countless capabilities for drones other than just flying. 


By nature, avionics is a mature market requiring the use of validated system solutions: safety is an absolute requirement, while innovative systems require a stringent qualification phase. That’s why the very fast adoption of drones as an alternative to human-piloted planes is impressive. It took 10 or so years for drones to become widely developed and employed for various applications, ranging from war to entertainment, with prices spanning from hundreds of dollars to several hundred thousand. But even for consumer-oriented, inexpensive drones, the required processing capabilities call for an MCU that is not only high-performance but also versatile, capable of managing a built-in gyroscope, accelerometer, geomagnetic sensor, GPS, rotational station, four- to six-axis control, optical flow and so on.

When I was designing for avionics, namely the electronic control of the CFM56 engine (jointly developed by GE in the U.S. and Snecma in France, and equipping Boeing and Airbus planes), the CPU was a multi-hundred dollar Motorola 68020, leading to a cost of $20 per MIPS! While I may not know the Atmel | SMART SAM E70 price precisely (I would guess that it costs a few dollars), what I do know is that the MCU offers in excess of 600 DMIPS. Aside from its high performance, this series boasts a rather large on-chip memory of up to 384KB SRAM and 2MB Flash, just one of many pivotal reasons that this MCU has been selected to support the “drone with integrated navigation control to avoid obstacle and improve stability.”

In fact, the key design requirements for this application were: 600+ DMIPS, camera sensor interface, dual ADC and PWM for motor control and dual CAN, all bundled up in a small package. Looking at the block diagram below helps link the MCU features with the various application capabilities: gyroscope (SPI), accelerometer (SPI x2), geomagnetic sensor (I2C x2), GPS (UART), one- or two-channel rotational station (UART x2), four- or six-axis control communication (CAN x2), voltage/current (ADC), analog sensor (ADC), optical flow sensor (through the Image Sensor Interface, or ISI) and pulse width modulation (PWM x8) to support the rotational station and four- or six-axis speed PWM control.

For those of you who may not know, the SAM E70 is based on the ARM Cortex-M7, and is a versatile MCU that combines superior performance with extensive peripheral sets supporting multi-threaded processes. It’s this multi-thread support that will surely open up countless capabilities for drones other than simply flying.

Atmel | SMART ARM Cortex M7 SAM E70

Today’s drones already possess the ability to soar through the air or stay stationary, snapping pictures or capturing HD footage. It’s already very impressive to see sub-kilogram devices offering such capabilities! However, the drone market is already looking ahead, preparing for the future, with the desire to get more application stacks into the UAVs so they can take in automation, routing, cloud connectivity (when available), 4G/5G, and other wireless functionalities to enhance data pulling and posting.

For instance, imagine a small town of a few thousand inhabitants where, for a couple of days or weeks per year, a special event or holiday brings a hundred thousand people storming into the area. These folks want to feed their smartphones with multimedia or share live experiences by sending movies or photos, most of them at the same time. The 4G/5G and cloud infrastructure is not sized for such a crowd, so the communication system may break. Yet, this problem could be fixed by simply calling in drone backup to reinforce the communication infrastructure for that period of time.

While this may be just one example of what could be achieved with the advanced usage of drones, each of the innovative applications will be characterized by a common set of requirements: high processing performance, large SRAM and Flash memory capacity, and extensive peripheral sets supporting multi-threaded processes. In this case, the ARM Cortex-M7-based SAM E70 MCU is an ideal choice, with processing power in excess of 640 DMIPS and large on-chip SRAM (up to 384KB) and Flash (up to 2MB) for managing all sorts of sensors, navigation, automation, servos, motors, routing, adjustments, video/audio and more.


This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principal blogger as well as one of the four founding members of SemiWiki.com. This blog first appeared on SemiWiki on July 18, 2015.

What is real SAM V71 DSP performance in automotive audio?


The integrated FPU DSP (into the Cortex-M7 core) is using 2X the number of clock cycles when compared with the SHARC21489.


Thinking of selecting an ARM Cortex-M7-based Atmel SAM V70/71 for your next automotive entertainment application? Three key reasons to consider are the clock speed of the Cortex-M7 (300 MHz), the integration of a floating-point unit (FPU) DSP, and last but not least, the fact that the SAM V70/71 has obtained automotive qualification. If you delve deeper into the SAM V70/71 features list, you will see that this MCU comes in several versions with integrated Flash: 512 KB, 1024 KB or 2048 KB. And, if you compare with the competition, this MCU is the only Cortex-M7 supporting the 2 MB Flash option, being automotive qualified and delivering 1500 CoreMark, thanks to the 300 MHz clock speed, when the closest competitor only reaches 240 MHz and delivers 1200 CoreMark.
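
A quick back-of-the-envelope check of those two scores, using only the figures quoted above, shows why the clock speed is credited for the difference. The helper name is made up for this sketch.

```python
def coremark_per_mhz(coremark, clock_mhz):
    """Normalize a CoreMark score by clock frequency."""
    return coremark / clock_mhz


sam_v71 = coremark_per_mhz(1500, 300)     # figures quoted for the SAM V70/71
competitor = coremark_per_mhz(1200, 240)  # figures quoted for the closest competitor

print(f"SAM V71:    {sam_v71:.1f} CoreMark/MHz")
print(f"Competitor: {competitor:.1f} CoreMark/MHz")
```

Both parts work out to the same 5.0 CoreMark/MHz, so on these numbers the 25% higher score follows directly from the 25% higher clock frequency.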

In fact, what makes the SAMV70/71 so unique is its FPU DSP performance. Let’s make it clear from the beginning: if you are after pure DSP performance, it will be easy to find standard DSP chips offering much higher performance. Take the Analog Devices AD21489 or Blackfin70x series, for example. However, the automotive market is not only very demanding, it’s also very cost sensitive.

Think about this simple calculation: If you select the AD21489 DSP, you will have to add external flash and an MCU, which would lead the total BOM to be four to five times the price associated with the SAM V71. (Let’s also keep this AD21489 as a reference in terms of performance, and examine DSP benchmark results coming from third-party DSP experts DSP Concepts.)

FIR Benchmark

Before analyzing the results, we need to describe the context:

  • FIR is run on a block size of 256 samples
  • Results are expressed in terms of clock cycles (smaller is better)
  • All DSPs are floating-point except Blackfin
  • Clock cycle count is measured using Audio Weaver

To elaborate further, this FIR is used to build an equalization filter, where the higher the tap count, the better. If we look at the “50 Taps” benchmark results, the SAM V71 (Cortex-M7 based) takes 22,734 clock cycles (about three times more than the SHARC21489). Unsurprisingly, the Cortex-M4 requires 50% more, but you have to integrate a Cortex-A15 to get better results, as the Cortex-A8 and Cortex-A9 need 30% and 40% more cycles, respectively! And when looking at standard Analog Devices Blackfin DSPs, only the 70x series is better, by 35%, while the 53x is 30% worse.
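
To make the workload concrete, here is a minimal sketch of a direct-form FIR filter run over one 256-sample block with 50 taps. It is not the Audio Weaver benchmark code; the coefficients are a placeholder moving average rather than a real equalizer, and the cycles-per-MAC figure is only a rough estimate that assumes one multiply-accumulate per tap per sample.

```python
def fir_block(samples, taps):
    """Direct-form FIR: y[n] = sum(taps[k] * x[n - k]), with zero history."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        out.append(acc)
    return out


BLOCK_SIZE = 256
NUM_TAPS = 50

# Placeholder coefficients: a 50-tap moving average, not a real equalizer.
taps = [1.0 / NUM_TAPS] * NUM_TAPS
impulse = [1.0] + [0.0] * (BLOCK_SIZE - 1)
response = fir_block(impulse, taps)  # an impulse in returns the taps out

# Rough workload estimate, assuming one MAC per tap per sample.
macs_per_block = BLOCK_SIZE * NUM_TAPS
cycles_per_mac = 22_734 / macs_per_block
print(macs_per_block, "MACs per block,", round(cycles_per_mac, 2), "cycles per MAC")
```

Under that assumption a block needs 12,800 multiply-accumulates, so the quoted 22,734 cycles would correspond to a little under two cycles per tap on the Cortex-M7.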

Now, if you want to build a graphic equalizer, you will have to run Biquads. For instance, when building an eight-channel, six-stage graphic equalizer, your DSP will have to run 48 Biquads.
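
The structure behind that count can be sketched as a cascade of second-order sections per channel. This is a generic Direct Form I biquad, not the benchmarked implementation, and the pass-through coefficients are placeholders; a real equalizer would compute one coefficient set per band.

```python
def biquad(samples, b0, b1, b2, a1, a2):
    """Direct Form I: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def equalize(channel, stages):
    """Run one channel through a cascade of biquad stages."""
    for coeffs in stages:
        channel = biquad(channel, *coeffs)
    return channel


CHANNELS = 8
STAGES = 6
PASS_THROUGH = (1.0, 0.0, 0.0, 0.0, 0.0)  # placeholder coefficients

block = [0.5, -0.25, 0.125, 0.0]
stages = [PASS_THROUGH] * STAGES
outputs = [equalize(block, stages) for _ in range(CHANNELS)]

biquads_run = CHANNELS * STAGES
print(biquads_run, "biquad sections per block")
```

Each of the eight channels passes through six sections, which is where the figure of 48 comes from.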

Biquad Benchmark

Again, the context:

  • Biquad is run on a block size of 256 samples
  • Results are expressed in terms of clock cycles (smaller is better)
  • All DSPs are floating-point except Blackfin
  • Clock cycle count is measured using Audio Weaver

In fact, the results are quite similar to those of the FIR benchmark: only the Cortex-A15 and the SHARC21489 exhibit better performance. The integrated FPU DSP (in the Cortex-M7 core) uses twice as many clock cycles as the SHARC21489. If you compare the performance per price, the Cortex-M7 integrated in the SAMV71 is 50% cheaper! Using a SHARC DSP certainly makes sense if you want to build a high-performance home cinema system, but if you target automotive, it’s much more effective to select an FPU DSP integrated together with Flash (512KB to 2MB) and a full-featured MCU.

The Atmel SAM V71 is specifically dedicated to supporting automotive infotainment applications, offering Dual CAN and Ethernet MAC support. Other notable specs include:

  • 10/100 Mbps, IEEE1588 support
  • 12 KB SRAM plus DMA
  • AVB support with Qav & Qas HW support for audio traffic support
  • 802.3az Energy efficiency support
  • Dual CAN-FD
  • Up to 64 SRAM-based mailboxes
  • Wake up from sleep or wake up modes on RX/TX

Don’t forget that when looking to construct a high-end automotive radio, you still need room for Ethernet MAC and AVB support. What’s more, the SAM V71 only consumes 68% of the DSP resources, leaving enough headroom for both AVB and Ethernet MAC.

Interested? Explore the Atmel | SMART SAM V ARM Cortex-M7 family here. More information about the DSP benchmark can also be found on the DSP Concepts website. Also, be sure to see the detailed DSP Concepts audio processing benchmarks.


This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principal blogger as well as one of the four founding members of SemiWiki.com. This blog first appeared on SemiWiki on May 6, 2015.

Single chip MCU + DSP architecture for automotive = SAM V71


Automotive apps run in production volumes of millions of units per year, and cost is a crucial factor when deciding on an integrated solution.


It’s all about Cost of Ownership (CoO) and system-level integration. If you target automotive-related applications, like audio or video processing or control of systems (motor control, inverter, etc.), you need to integrate a high-performance MCU with a DSP. In fact, if you expect your system to support Audio Video Bridging (AVB) MAC on top of the targeted application and to get automotive qualification, the ARM Cortex-M7 processor-based Atmel SAMV70/71 should be your selection: offering the fastest clock speed of its kind (300 MHz), integrating a DSP Floating Point Unit (FPU), supporting AVB and qualified for automotive.

Let’s have a closer look at the SAM V71 internal architecture, shall we?

A closer look at Atmel | SMART ARM based Cortex M7 - SAMV71 internal architecture.

When developing a system around a microcontroller unit, you expect this single chip to support as many peripherals as needed in your application to minimize the global cost of ownership. That’s why you can see the long list of system peripherals (top left of the block diagram). Meanwhile, the Atmel | SMART SAM V71 is dedicated to supporting automotive infotainment applications, e.g. with Dual CAN and Ethernet MAC (bottom right). If we delve deeper into these functions, we can list these supported features:

  • 10/100 Mbps, IEEE1588 support
  • MII (144-pin), RMII (64-, 100-, 144-pin)
  • 12 KB SRAM plus DMA
  • AVB support with Qav & Qas HW support for Audio traffic support
  • 802.3az Energy efficiency support
  • Dual CAN-FD
  • Up to 64 SRAM-based mailboxes
  • Wake up from sleep or wake up modes on RX/TX

The automotive-qualified SAM V70 and V71 series also offer high-speed USB with integrated PHY and Media LB, which, when combined with the Cortex-M7 DSP extensions, make the family ideal for infotainment connectivity and audio applications. Let’s take a look at this DSP benchmark:

ARM CM7 Performance normalized relative to SHARC (Higher numbers are better).

If you are not limited by budget considerations and can afford integrating one standard DSP along with an MCU, you will probably select the SHARC 21489 DSP (from Analog Devices), which offers the best-in-class benchmark results for FIR, Biquad and real FFT. However, such performance has a cost, not only monetarily but also in terms of power consumption and board footprint, which we can call “Cost of Ownership.” Automotive apps run in production volumes of millions of units per year, and cost is absolutely crucial in this market segment, especially when deciding whether to go with an integrated solution.

To support audio or video infotainment applications, you expect the DSP integrated in the Cortex-M7 to be “good enough,” and you can see from these benchmark results that this is the case for Biquad, for example, as the ARM CM7 is equal to or better than any other DSP (TI C28, Blackfin 50x or 70x) except the SHARC 21489, while being much cheaper! Good enough means that the SAMV70 will support automotive audio (Biquad in this case) and keep enough DSP power for Ethernet MAC (10/100 Mbps, IEEE1588) support.

Ethernet AVB Architectures (SAM V71)

In the picture above, you can see the logical SAM V71 architectures for Ethernet AVB support and how to use the DSP capabilities for Telematics Control Unit (TCU) or audio amplifier.

Integrating a DSP means that you need to develop the related DSP code. Because the DSP is tightly integrated into the ARM CM7 core, you may use the MCU development tools (and not specific DSP tools) for developing your code. Since February, the ATSAMV71-XULT has been available from Atmel: this is the full-featured SAM V71 Xplained Ultra Evaluation Kit, with a software package providing basic drivers, software services and libraries for the Atmel SAMV71, V70, E70 and S70 Cortex-M7 based microcontrollers. As this board has been built around the feature-rich SAM V71, you can develop your automotive application on exactly the same MCU architecture as the part going into production.

The versatility and integrated DSP built into the ARM CM7 core allow MCU development tools to be used instead of having to resort to specific DSP tools. You can develop your automotive application on exactly the same MCU architecture as the part going into production.

Interested? More information on this eval/dev board can be found here.


This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principal blogger as well as one of the four founding members of SemiWiki.com. This blog first appeared on SemiWiki on April 29, 2015.