Tag Archives: Configurable TCM

ARM Keil ecosystem integrates the Atmel SAM ESV7


Keil is part of the ARM wide ecosystem, enabling developers to speed up system release to the market. 


Even the best System-on-Chip (SoC) is useless without software, as well as the best designed S/W needs H/W to flourish. The “old” embedded world has exploded into many emergent markets like the  IoT, wearables, and even automotive, which is no more restricted to motor control or airbags as innovative products from entertainment to ADAS are being developed. What is the common denominator with these emergent products? Each of these require more software functionality and fast memory algorithm with deterministic code execution, and consequently innovative hardware to support these requirements, such as the ARM Cortex-M7-based Atmel | SMART SAM ESV7.

AtmelChipLib Overview

ARM has released a complete software development environment for a range of ARM Cortex-M based MCU devices: Keil MDK. Keil is part of ARM wide ecosystem, enabling developers to speed up system release to the market. MDK includes the µVision IDE/Debugger and ARM C/C++ Compiler, along with the essential middleware components and software packs. If you’re familiar with Run-Time Environment stacked description, you’ll recognize the various stacks. Let’s focus on “CMSIS-Driver”. CMSIS is the standard software framework for Cortex-M MCUs, extending the SAM-ESV7 Chip Library with standardized drivers for middleware and generic component interfaces.

By definition, an MCU is designed to address multiple applications and the SAM ESV7 is dedicated to support performance demanding and DSP intensive systems. Thanks to its 300MHz clock, SAM ESV7 delivers up to 640 DMIPS and its DSP performance is double that available in the Cortex-M4. A double-precision floating-point unit and a double-issue instruction pipeline further position the Cortex-M7 for speed.

Atmel Cortex M7 based Dev board

Let’s review some of these applications where SAM ESV7 is the best choice…

Finger Printer Module

The goal is to provide human bio authentication module for office or house access control. The key design requirements are:

  • +300 MHz CPU performance to process recognition algorithms
  • Image sensor interface to read raw finger image data from finger sensor array
  • Low cost and smaller module size
  • Flash/memory to reduce BOM cost and module size
  • Memory interface to expand model with memory extension just in case.

The requirement for superior performance and an image sensor interface can be seen as essential needs, but which will make the difference will be to offer both cheaper BOM cost and smaller module size than the competitor? The SAM S70 integrates up to 2MB embedded Flash, which is twice more than the direct competitor and may allow reducing BOM and module size.

SAM S70 Finger Print

Automotive Radio System

Every cent counts in automotive design, and OEMs prefer using a MCU rather than MPU, at first for cost reasons. Building an attractive radio for tomorrow’s car requires developing very performing DSP algorithms. Such algorithms used to be developed on expansive DSP standard part, leading to large module size, including external Flash and MCU leading obviously to a heavy BOM. In a 65nm embedded Flash process device, the Cortex-M7 can achieve a 1500 CoreMark score while running at 300 MHz, and its DSP performance is double that available in the Cortex-M4. This DSP power can be used to manage eight channels of speaker processing, including six stages of biquads, delay, scaler, limiter and mute functions. The SAM S71 workload is only 63% of the CPU, leaving enough room to support Ethernet AVB stack — very popular in automotive.

One of the secret sauces of the Cortex-M7 architecture is to provide a way to bypass the standard execution mechanism using “tightly coupled memories,” or TCM. There is an excellent white paper describing TCM implementation in the SAM S70/E70 series, entitled “Run Blazingly Fast Algorithms with Cortex-M7 Tightly Coupled Memories” from Lionel Perdigon and Jacko Wilbrink, which you can find here.


This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principle blogger as well as one of the four founding members of the site. This blog first appeared on SemiWiki on October 23, 2015.

4 designs tips for AVB in-car infotainment


AVB is clearly the choice of several automotive OEMs, says Gordon Bechtel, CTO, Media Systems, Harman Connected Services.


Audio Video Bridging (AVB) is a well-established standard for in-car infotainment, and there is a significant amount of activity for specifying and developing AVB solutions in automobiles. The primary use case for AVB is interconnecting all devices in a vehicle’s infotainment system. That includes the head unit, rear-seat entertainment systems, telematics unit, amplifier, central audio processor, as well as rear-, side- and front-view cameras.

The fact that these units are all interconnected with a common, standards-based technology that is certified by an independent market group — AVnu — is a brand new step for the automotive OEMs. The AVnu Alliance facilitates a certified networking ecosystem for AVB products built into the Ethernet networking standard.

Figure 1 - AVB is an established technology for in-car infotainmentAccording to Gordon Bechtel, CTO, Media Systems, Harman Connected Services, AVB is clearly the choice of several automotive OEMs. His group at Harman develops core AVB stacks that can be ported into car infotainment products. Bechtel says that AVB is a big area of focus for Harman.

AVB Design Considerations

Harman Connected Services uses Atmel’s SAM V71 microcontrollers as communications co-processors to work on the same circuit board with larger Linux-based application processors. The software firm writes codes for customized reference platforms that automotive OEMs need to go beyond the common reference platforms.

Based on his experience of automotive infotainment systems, Bechtel has outlined the following AVB design dos and don’ts for the automotive products:

1. Sub-microsecond accuracy: Every AVB element on the network is hooked to the same accurate clock. The Ethernet hardware should feature a time stand to ensure packet arrival in the right order. Here, Bechtel mentioned the Atmel | SMART SAM V71 MCU that boasts screen registers to ensure advanced hardware filtering of inbound packets for routing to correct receive-end queues.

2. Low latency: There is a lot of data involved in AVB, both in terms of bit rate and packet rate. AVB allows low latency through reservations for traffic, which in turn, facilitate faster packet transfer for higher priority data. Design engineers should carefully shape the data to avoid packet bottlenecks as well as data overflow.

Figure 2 - Bechtel

Bechtel once more pointed to Atmel’s SAM V71 microcontrollers that provide two priority queues with credit-based shaper (CBS) support that allows the hardware-based traffic shaping compliant with 802.1Qav (FQTSS) specifications for AVB.

3. 1588 Timestamp unit: It’s a protocol for correct and accurate 802.1 AS (gPTP) support as required by AVB for precision clock synchronization. The IEEE 802.1 AS carries out time synchronization and is synonymous with generalized Precision Time Protocol or gPTP.

Timestamp compare unit and a large number of precision timer counters are key for the synchronization needed in AVB for listener presentations times and talker transmissions rates as well as for media clock recovery.

4) Tightly coupled memory (TCM): It’s a configurable high-performance memory access system to allow zero-wait CPU access to data and instruction memory blocks. A careful use of TCM enables much more efficient data transfer, which is especially important for AVB class A streams.

It’s worth noting that MCUs based on ARM Cortex-M7 architecture have added the TCM capability for fast and deterministic code execution. TCM is a key enabler in running audio and video streams in a controlled and timely manner.

AVB and Cortex-M7 MCUs

The Cortex-M7 is a high-performance core with almost double the power efficiency of the older Cortex-M4. It features a six-stage superscalar pipeline with branch prediction — while the M4 has a three-stage pipeline.  Bechtel of Harman acknowledged that M7 features equate to more highly optimized code execution, which is important for Class A audio implementations with lower power consumption.

Again, Bechtel referred to the SAM V71 MCUs — which are based on the Cortex-M7 architecture — as particularly well suited for the smaller ECUs. “Rear-view cameras and power amplifiers are good examples where the V71 microcontroller would be a good fit,” he said. “Moreover, the V71 MCUs can meet the quick startup requirements needed by automotive OEMs.”

Figure 3 - Atmel's V71 is an M7 chip for Ethernet AVB networking and audio processing

The infotainment connectivity is based on Ethernet, and most of the time, the main processor does not integrate Ethernet AVB. So the M7 microcontrollers, like the V71, bring this feature to the main processor. For the head unit, it drives the face plate, and for the telematics control, it contains the modem to make calls so echo cancellation is a must, for which DSP capability is required.

Take the audio amplifier, for instance, which receives a specific audio format that has to be converted, filtered and modulated to match the requirement for each specific speaker in the car. This means infotainment system designers will need both Ethernet and DSP capability at the same time, which Cortex-M7 based chips like V71 provide at low power and low cost.

6 memory considerations for Cortex-M7-based IoT designs


Taking a closer look at the configurable memory aspects of Cortex-M7 microcontrollers.


Tightly coupled memory (TCM) is a salient feature in the Cortex-M7 lineup as it boosts the MCU’s performance by offering single cycle access for the CPU and by securing the high-priority latency-critical requests from the peripherals.

Cortex-M7-chip-diagramLG

The early MCU implementations based on the ARM’s M7 embedded processor core — like Atmel’s SAM E70 and S70 chips — have arrived in the market. So it’d be worthwhile to have a closer look at the configurable memory aspects of M7 microcontrollers and see how the TCMs enable the execution of deterministic code and fast transfer of real-time data at the full processor speed.

Here are some of the key findings regarding the advanced memory architecture of Cortex-M7 microcontrollers:

1. TCM is Configurable

First and foremost, the size of TCM is configurable. TCM, which is part of the physical memory map of the MCU, supports up to 16MB of tightly coupled memory. The configurability of the ARM Cortex-M7 core allows SoC architects to integrate a range of cache sizes. So that industrial and Internet of Things product developers can determine the amount of critical code and real-time data in TCM to meet the needs of the target application.

The Atmel | SMART Cortex-M7 architecture doesn’t specify what type of memory or how much memory should be provided; instead, it leaves these decisions to designers implementing M7 in a microcontroller as a venue for differentiation. Consequently, a flexible memory system can be optimized for performance, determinism and low latency, and thus can be tuned to specific application requirements.

2. Instruction TCM

Instruction TCM or ITCM implements critical code with deterministic execution for real-time processing applications such as audio encoding/decoding, audio processing and motor control. The use of standard memory will lead to delays due to cache misses and interrupts, and therefore will hamper the deterministic timing required for real-time response and seamless audio and video performance.

The deterministic critical software routines should be loaded in a 64-bit instruction memory port (ITCM) that supports dual-issue processor architecture and provide single-cycle access for the CPU to boost MCU performance. However, developers need to carefully calibrate the amount of code that need zero-wait execution performance to determine the amount of ITCM required in an MCU device.

The anatomy of TCM inside the M7 architecture

The anatomy of TCM inside the M7 architecture.

3. Data TCM

Data TCM or DTCM is used in fast data processing tasks like 2D bar decoding and fingerprint and voice recognition. There are two data ports (DTCMs) that provide simultaneous and parallel 32-bit data accesses to real-time data. Both instruction TCM and data TCM — used for efficient access to on-chip Flash and external resources — must have the same size.

4. System RAM and TCM

System RAM, also known as general RAM, is employed for communications stacks related to networking, field buss, high-bandwidth bridging, USB, etc. It implements peripheral data buffers generally through direct memory access (DMA) engines and can be accessed by masters without CPU intervention.

Here, product developers must remember the memory access conflicts that arise from the concurrent data transfer to both CPU and DMA. So developers must set clear priorities for latency-critical requests from the peripherals and carefully plan latency-critical data transfers like the transfer of a USB descriptor or a slow data rate peripheral with a small local buffer. Access from the DMA and the caches are generally burst to consecutive addresses to optimize system performance.

It’s worth noting that while system memory is logically separate from the TCM, microcontroller suppliers like Atmel are incorporating TCM and system RAM in a single SRAM block. That lets IoT developers share general-purpose tasks while splitting TCM and system RAM functions for specific use cases.

A single SRAM block for TCM and system memory allows higher flexibility and utilization

A single SRAM block for TCM and system memory allows higher flexibility and utilization.

5. TCM Loading

The Cortex-M7 uses a scattered RAM architecture to allow the MCU to maximize performance by having a dedicated RAM part for critical tasks and data transfer. The TCM might be loaded from a number of sources, and these sources aren’t specified in the M7 architecture. It’s left to the MCU designers whether there is a single DMA or several data loading points from various streams like USB and video.

It’s imperative that, during the software build, IoT product developers identify which code segments and data blocks are allocated to the TCM. This is done by embedding programs into the software and by applying linker settings so that software build appropriately places the code in memory allocation.

6. Why SRAM?

Flash memory can be attached to a TCM interface, but the Flash cannot run at the processor clock speed and will require caching. As a result, this will cause delays when cache misses occur, threatening the deterministic value proposition of the TCM technology.

DRAM technology is a theoretical choice but it’s cost prohibitive. That leaves SRAM as a viable candidate for fast, direct and uncached TCM access. SRAM can be easily embedded on a chip and permits random accesses at the speed of the processor. However, cost-per-bit of SRAM is higher than Flash and DRAM, which means it’s critical to keep the size of the TCM limited.

Atmel | SMART Cortex-M7 MCUs

Take the case of Atmel’s SMART SAM E70, S70 and V70/71 microcontrollers that organize SRAM into four memory banks for TCM and System SRAM parts. The company has recently started shipping volume units of its SAM E70 and S70 families for the IoT and industrial markets, and claims that these MCUs provide 50 percent better performance than the closest competitor.

SAM-E70_S70_BlockDiagram_Lg_929x516

Atmel’s M7-based microcontrollers offer up to 384KB of embedded SRAM that is configurable as TCM or system memory for providing IoT designs with higher flexibility and utilization. For instance, E70 and S70 microcontrollers organize 384KB of embedded SRAM into four ports to limit memory access conflicts. These MCUs allocate 256KB of SRAM for TCM functions — 128 KB for ITCM and DTCM each — to deliver zero wait access at 300MHz processor speed, while the remaining 128KB of SRAM can be configured as system memory running at 150MHz.

However, the availability of an SRAM block organized in the form of a memory bank of 384KB means that both system SRAM and TCM can be used at the same time.The large on-chip SRAM of 384KB is also critical for many IoT devices, since it enables them to run multiple communication stacks and applications on the same MCU without adding external memory. That’s a significant value proposition in the IoT realm because avoiding external memories lowers the BOM cost, reduces the PCB footprint and eliminates the complexity in the high-speed PCB design.