Tag Archives: SAM V70

6 memory considerations for Cortex-M7-based IoT designs


Taking a closer look at the configurable memory aspects of Cortex-M7 microcontrollers.


Tightly coupled memory (TCM) is a salient feature in the Cortex-M7 lineup as it boosts the MCU’s performance by offering single cycle access for the CPU and by securing the high-priority latency-critical requests from the peripherals.

Cortex-M7-chip-diagramLG

The early MCU implementations based on the ARM’s M7 embedded processor core — like Atmel’s SAM E70 and S70 chips — have arrived in the market. So it’d be worthwhile to have a closer look at the configurable memory aspects of M7 microcontrollers and see how the TCMs enable the execution of deterministic code and fast transfer of real-time data at the full processor speed.

Here are some of the key findings regarding the advanced memory architecture of Cortex-M7 microcontrollers:

1. TCM is Configurable

First and foremost, the size of TCM is configurable. TCM, which is part of the physical memory map of the MCU, supports up to 16MB of tightly coupled memory. The configurability of the ARM Cortex-M7 core allows SoC architects to integrate a range of cache sizes. So that industrial and Internet of Things product developers can determine the amount of critical code and real-time data in TCM to meet the needs of the target application.

The Atmel | SMART Cortex-M7 architecture doesn’t specify what type of memory or how much memory should be provided; instead, it leaves these decisions to designers implementing M7 in a microcontroller as a venue for differentiation. Consequently, a flexible memory system can be optimized for performance, determinism and low latency, and thus can be tuned to specific application requirements.

2. Instruction TCM

Instruction TCM or ITCM implements critical code with deterministic execution for real-time processing applications such as audio encoding/decoding, audio processing and motor control. The use of standard memory will lead to delays due to cache misses and interrupts, and therefore will hamper the deterministic timing required for real-time response and seamless audio and video performance.

The deterministic critical software routines should be loaded in a 64-bit instruction memory port (ITCM) that supports dual-issue processor architecture and provide single-cycle access for the CPU to boost MCU performance. However, developers need to carefully calibrate the amount of code that need zero-wait execution performance to determine the amount of ITCM required in an MCU device.

The anatomy of TCM inside the M7 architecture

The anatomy of TCM inside the M7 architecture.

3. Data TCM

Data TCM or DTCM is used in fast data processing tasks like 2D bar decoding and fingerprint and voice recognition. There are two data ports (DTCMs) that provide simultaneous and parallel 32-bit data accesses to real-time data. Both instruction TCM and data TCM — used for efficient access to on-chip Flash and external resources — must have the same size.

4. System RAM and TCM

System RAM, also known as general RAM, is employed for communications stacks related to networking, field buss, high-bandwidth bridging, USB, etc. It implements peripheral data buffers generally through direct memory access (DMA) engines and can be accessed by masters without CPU intervention.

Here, product developers must remember the memory access conflicts that arise from the concurrent data transfer to both CPU and DMA. So developers must set clear priorities for latency-critical requests from the peripherals and carefully plan latency-critical data transfers like the transfer of a USB descriptor or a slow data rate peripheral with a small local buffer. Access from the DMA and the caches are generally burst to consecutive addresses to optimize system performance.

It’s worth noting that while system memory is logically separate from the TCM, microcontroller suppliers like Atmel are incorporating TCM and system RAM in a single SRAM block. That lets IoT developers share general-purpose tasks while splitting TCM and system RAM functions for specific use cases.

A single SRAM block for TCM and system memory allows higher flexibility and utilization

A single SRAM block for TCM and system memory allows higher flexibility and utilization.

5. TCM Loading

The Cortex-M7 uses a scattered RAM architecture to allow the MCU to maximize performance by having a dedicated RAM part for critical tasks and data transfer. The TCM might be loaded from a number of sources, and these sources aren’t specified in the M7 architecture. It’s left to the MCU designers whether there is a single DMA or several data loading points from various streams like USB and video.

It’s imperative that, during the software build, IoT product developers identify which code segments and data blocks are allocated to the TCM. This is done by embedding programs into the software and by applying linker settings so that software build appropriately places the code in memory allocation.

6. Why SRAM?

Flash memory can be attached to a TCM interface, but the Flash cannot run at the processor clock speed and will require caching. As a result, this will cause delays when cache misses occur, threatening the deterministic value proposition of the TCM technology.

DRAM technology is a theoretical choice but it’s cost prohibitive. That leaves SRAM as a viable candidate for fast, direct and uncached TCM access. SRAM can be easily embedded on a chip and permits random accesses at the speed of the processor. However, cost-per-bit of SRAM is higher than Flash and DRAM, which means it’s critical to keep the size of the TCM limited.

Atmel | SMART Cortex-M7 MCUs

Take the case of Atmel’s SMART SAM E70, S70 and V70/71 microcontrollers that organize SRAM into four memory banks for TCM and System SRAM parts. The company has recently started shipping volume units of its SAM E70 and S70 families for the IoT and industrial markets, and claims that these MCUs provide 50 percent better performance than the closest competitor.

SAM-E70_S70_BlockDiagram_Lg_929x516

Atmel’s M7-based microcontrollers offer up to 384KB of embedded SRAM that is configurable as TCM or system memory for providing IoT designs with higher flexibility and utilization. For instance, E70 and S70 microcontrollers organize 384KB of embedded SRAM into four ports to limit memory access conflicts. These MCUs allocate 256KB of SRAM for TCM functions — 128 KB for ITCM and DTCM each — to deliver zero wait access at 300MHz processor speed, while the remaining 128KB of SRAM can be configured as system memory running at 150MHz.

However, the availability of an SRAM block organized in the form of a memory bank of 384KB means that both system SRAM and TCM can be used at the same time.The large on-chip SRAM of 384KB is also critical for many IoT devices, since it enables them to run multiple communication stacks and applications on the same MCU without adding external memory. That’s a significant value proposition in the IoT realm because avoiding external memories lowers the BOM cost, reduces the PCB footprint and eliminates the complexity in the high-speed PCB design.

Atmel tightens automotive focus with new Cortex-M7 MCUs


Large SoCs without an Ethernet interface typically have slow start-up times and high-power requirements — until now. 


Atmel, a lead partner for the ARM Cortex-M7 processor launch in October 2014, has unveiled three new M7-based microcontrollers with a unique memory architecture and advanced connectivity features for the connected car market.

According to a company spokesman, E70, V71 and V70 chips are the industry’s highest performing Cortex-M microcontrollers with six-stage dual-issue pipeline delivering 1500 CoreMarks at 300MHz. Moreover, V70 and V71 microcontrollers are the only automotive-qualified ARM Cortex-M7 MCUs with Audio Video Bridging (AVB) over Ethernet and Media LB peripheral support.

Cortex-M7-chip-diagramLG

Atmel is among the first suppliers to introduce the ARM Cortex-M7-based MCUs, whose core combines performance and simplicity and further pushes the performance envelope for embedded devices. The new MCU devices are aimed to take the connected car design to the next performance level with high-speed connectivity, high-density on-chip memory, and a solid ecosystem of design engineering tools.

Atmel’s Memory Play

Atmel has memory technology in its DNA, and that seems apparent in the design footprint of E70, V70 and V71 MCUs. The San Jose-based chipmaker is offering a flexible memory system that is optimized for performance, determinism and low latency.

Jacko Wilbrink, Senior Marketing Director at Atmel, said that the company’s Cortex-M7-based MCUs leverage Atmel’s advanced peripherals and flexible SRAM architecture for higher performance applications while keeping the Cortex-M class ease-of-use. He added that the large on-chip SRAM on SAM E70/V70/V71 chips is critical for connected car and IoT product designers since it allows them to run the multiple communication stacks and applications on the same MCU without adding external memory.

On-chip DMA and low-latency access SRAM architecture

On-chip DMA and low-latency access SRAM architecture

Avoiding the external memories reduces the PCB footprint, lowers the BOM cost and eliminates the complexity of high-speed PCB design when pushing the performance to a maximum. Next, Tim Grai, another senior manager at Atmel, pointed out another critical take from Cortex-M7 designs: The tightly coupled memory (TCM) interface. It provides the low-latency memory that the processor can use without the unpredictability that is a feature of cache memories.

Grai says that the most vital memory feature is not the memory itself but how the TCM interface to the M7 is utilized. “The available RAM is configurable to be used as system RAM or tightly-coupled instruction and data memory to the core, where it provides deterministic zero-wait state access,” Grai added. “The arrangement of SRAM allows for multiple concurrent accesses.”

Cortex-M7 a DSP Winner

According to Will Strauss, President & Principal Analyst at Forward Concepts, ARM has had considerable success with its Cortex-M4 power-efficient 32-bit processor chip family. “However, realizing that it lacked the math ability to do more sophisticated DSP functions, ARM has introduced the Cortex-M7, its newest and most powerful member of the Cortex-M family.”

Strauss adds that the M7 provides 32-bit floating point DSP capability as well as faster execution times. With the greater clock speed, floating point and twice the DSP power of the M4, the M7 is even more attractive for applications requiring high-performance audio and even video accompanying traditional automotive and control applications.

Atmel’s Grai added an interesting dimension to the DSP story in Cortex-M7 processor fabric. He pointed out that true DSPs don’t do control and logical functions well and generally lack the breadth of peripherals available on MCUs. “The attraction of the M7 is that it does both—DSP functions and control functions—hence it can be classified as a digital signal controller (DSC).”

Grai quoted the example of Atmel V70 and V71 microcontrollers used to connect end-nodes like infotainment audio amplifiers to the emerging Ethernet AVB network. In an audio amplifier, you receive a specific audio format that has to be converted, filtered, modulated to match the requirement for each specific speaker in the car. So you need Ethernet and DSP capabilities at the same time.

Grai says that the audio amplifier in infotainment applications is a good example of DSC: a mix of MCU capabilities and peripherals plus DSP capability for audio processing. Atmel is targeting the V70 and V71 chips as a bridge between large application processors and Ethernet.

Most of the time, the main processor does not integrate Ethernet AVB, as the infotainment connectivity is based on Ethernet standard. Here, the V71 microcontroller brings this feature to the main processor. “Large SoCs, which usually don’t have Ethernet interface, have slow start-up time and high power requirements,” Grai said. “Atmel’s V7x MCUs allow fast network start-up and facilitate power moding.”

The SAM E70, V70 and V71

Atmel’s three new MCU devices are aimed at multiple aspects of in-vehicle infotainment connectivity and telematics control.

SAM E70: The microcontroller series features Dual CAN-FD, 10/100 Ethernet MAC with IEEE1588 real-time stamping, and AVB support. It’s aimed at automotive industry’s movement toward controller area network (CAN) message-based protocols holistically across the cabin, eliminating isolation and wire redundancy, and have them all bridged centrally with the CAN interface.

SAM V70: It’s designed for MediaLB connectivity and leverages advanced audio processing, multi-port memory architecture and Cortex-M7 DSP capabilities. For the media-oriented systems transport (MOST) architecture, old modules are not redesigned. So Atmel offers a MOST solution that is done over Media Local Bus (MediaLB) and is supported by the V70 series.

SAM V71: The MCU series ports a complete automotive Ethernet AVB stack for in-vehicle infotainment connectivity, audio amplifiers, telematics and head control units. It mirrors the SAM V70 series features as well as combines Ethernet-AVB and MediaLB connectivity stacks.


Majeed Ahmad is the author of books Smartphone: Mobile Revolution at the Crossroads of Communications, Computing and Consumer Electronics and The Next Web of 50 Billion Devices: Mobile Internet’s Past, Present and Future.

IAR Systems updates development tools for ARM Cortex-M7 devices


IAR Systems shortens build times in leading development toolchain for ARM-based devices.


Version 7.40 of the incredibly-popular IAR Embedded Workbench for ARM has introduced support for ARM Cortex-M7 microcontrollers from Atmel. Beyond that, the tools now feature parallel build for shorter build times, as well as an integration of IAR Systems’ new tool C-STAT for powerful static code analysis.

template-screenschot-b-trace-timeline-crun-micrum

As you know, the ARM Cortex-M7 processor is the most recent addition to the ARM Cortex-M family. Not only focused on energy efficiency and high-performance, the MCUs are intended for use in a wide-range of applications including automotive, industrial automation, medical devices, and of course, the burgeoning Internet of Things.

The new version of IAR Embedded Workbench adds support for ARM Cortex-M7 devices from Atmel, including support for the double precision floating point unit. This covers the recently-revealed Atmel | SMART SAM E70, SAM S70 and SAM V70. In addition to these MCUs, support for a number of ARM Cortex-based devices from several other vendors have also been added.

In order to speed up build times, version 7.40 introduces parallel build. Users can easily set the compiler to run in several parallel processes and make better use of the available processor cores in the PC. This feature can have a major impact on reducing the build times of the compiler.

The add-on product C-STAT for powerful, integrated static code analysis is now available. Static analysis finds potential issues in code on the source code level and can be used to prevent errors such as memory leaks, access violations, arithmetic errors and array and string overruns. The analysis performed by C-STAT improves code quality and aids alignment with industry coding standards. It checks compliance with rules as defined by MISRA C:2004, MISRA C++:2008 and MISRA C:2012, as well as hundreds of rules based on CWE (the Common Weakness Enumeration) and CERT C/C++, for example. Users can easily select the rule set or individual rules to check their code against, and the analysis results are provided directly in the IAR Embedded Workbench IDE.

Interested? Head over to IAR Systems’ official page to learn more. Also, as of late last year, over 1,400 new example projects could be found in IAR Embedded Workbench, which supports Atmel’s entire portfolio of MCUs and MPUs.