Tag Archives: Atmel | SMART ARM Cortex-M7-based MCUs

ARM Keil ecosystem integrates the Atmel SAM ESV7


Keil is part of the wider ARM ecosystem, helping developers bring systems to market faster.


Even the best System-on-Chip (SoC) is useless without software, just as the best-designed software needs hardware to flourish. The “old” embedded world has exploded into many emerging markets such as the IoT, wearables, and even automotive, which is no longer restricted to motor control or airbags now that innovative products from entertainment to ADAS are being developed. What is the common denominator of these emerging products? Each requires more software functionality and fast memory algorithms with deterministic code execution, and consequently innovative hardware to support these requirements, such as the ARM Cortex-M7-based Atmel | SMART SAM ESV7.

AtmelChipLib Overview

ARM has released a complete software development environment for a range of ARM Cortex-M based MCU devices: Keil MDK. Keil is part of the wider ARM ecosystem, helping developers bring systems to market faster. MDK includes the µVision IDE/Debugger and ARM C/C++ Compiler, along with the essential middleware components and software packs. If you’re familiar with the stacked diagram of the Run-Time Environment, you’ll recognize the various layers. Let’s focus on “CMSIS-Driver”. CMSIS is the standard software framework for Cortex-M MCUs, and CMSIS-Driver extends the SAM ESV7 Chip Library with standardized drivers for middleware and generic component interfaces.

By definition, an MCU is designed to address multiple applications, and the SAM ESV7 is dedicated to supporting performance-demanding and DSP-intensive systems. Thanks to its 300MHz clock, the SAM ESV7 delivers up to 640 DMIPS, and its DSP performance is double that available in the Cortex-M4. A double-precision floating-point unit and a dual-issue instruction pipeline further position the Cortex-M7 for speed.

Atmel Cortex M7 based Dev board

Let’s review some of these applications where SAM ESV7 is the best choice…

Fingerprint Module

The goal is to provide a biometric authentication module for office or house access control. The key design requirements are:

  • 300+ MHz CPU performance to process recognition algorithms
  • Image sensor interface to read raw fingerprint image data from the finger sensor array
  • Low cost and small module size
  • Embedded Flash/memory to reduce BOM cost and module size
  • Memory interface to expand the model with a memory extension, just in case.

Superior performance and an image sensor interface can be seen as essential needs, but what makes the difference is offering both a lower BOM cost and a smaller module size than the competition. The SAM S70 integrates up to 2MB of embedded Flash, which is twice as much as the direct competitor and may allow reducing BOM and module size.

SAM S70 Finger Print

Automotive Radio System

Every cent counts in automotive design, and OEMs prefer using an MCU rather than an MPU, primarily for cost reasons. Building an attractive radio for tomorrow’s car requires developing high-performance DSP algorithms. Such algorithms used to be developed on expensive standard DSP parts, leading to a large module that also needed external Flash and an MCU, and therefore to a heavy BOM. In a 65nm embedded Flash process device, the Cortex-M7 can achieve a 1500 CoreMark score while running at 300 MHz, and its DSP performance is double that available in the Cortex-M4. This DSP power can be used to manage eight channels of speaker processing, including six stages of biquads, delay, scaler, limiter and mute functions. This workload uses only 63% of the SAM S71 CPU, leaving enough room to support an Ethernet AVB stack, which is very popular in automotive.

One of the secret sauces of the Cortex-M7 architecture is to provide a way to bypass the standard execution mechanism using “tightly coupled memories,” or TCM. There is an excellent white paper describing TCM implementation in the SAM S70/E70 series, entitled “Run Blazingly Fast Algorithms with Cortex-M7 Tightly Coupled Memories” from Lionel Perdigon and Jacko Wilbrink, which you can find here.


This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principal blogger as well as one of the four founding members of the site. This blog first appeared on SemiWiki on October 23, 2015.

How Ethernet AVB is playing a central role in automotive streaming applications


Ethernet is emerging as the network of choice for infotainment and advanced driver assistance systems, Atmel’s Tim Grai explains.


Imagine you’re driving down the highway with the music blaring, enjoying the open road. Now imagine that the sound from your rear speaker system is delayed by a split second from the front; your enjoyment of the fancy in-car infotainment system comes to a screeching halt.

Ethernet is emerging as the network of choice for infotainment and advanced driver assistance systems that include cameras, telematics, rear-seat entertainment systems and mobile phones. But standard Ethernet protocols can’t assure timely and continuous audio/video (A/V) content delivery for bandwidth intensive and latency sensitive applications without buffering, jitter, lags or other performance hits.

Audio-Video Bridging (AVB) over Ethernet is a collection of extensions to the IEEE802.1 specifications that enables local Ethernet networks to stream time synchronised, loss sensitive A/V data. Within an Ethernet network, the AVB extensions help differentiate AVB traffic from the non-AVB traffic that can also flow through the network. This is done using an industry standard approach that allows for plug-and-play communication between systems from multiple vendors.

The extensions that define the AVB standard achieve this by:

  • reserving bandwidth for AVB data transfers to avoid packet loss due to network congestion from ‘talker’ to ‘listener(s)’
  • establishing queuing and forwarding rules for AVB packets that keep packets from bunching and guarantee delivery of packets with a bounded latency from talker to listener(s) via intermediate switches, if needed
  • synchronizing time to a global clock so the time bases of all network nodes are aligned precisely to a common network master clock, and
  • creating time aware packets which include a ‘presentation time’ that specifies when A/V data inside a packet has to be played.

Designers of automotive A/V systems need to understand the AVB extensions and requirements, as well as how their chosen microcontroller will support that functionality.

AVB: A basket of standards

AVB requires that three extensions be met in order to comply with IEEE802.1:

  • IEEE802.1AS – timing and synchronisation for time-sensitive applications (gPTP)
  • IEEE802.1Qat – stream reservation protocol (SRP)
  • IEEE802.1Qav – forwarding and queuing for time-sensitive streams (FQTSS).

In order to play music or video from one source, such as a car’s head unit, to multiple destinations, like backseat monitors, amplifiers and speakers, the system needs a common understanding of time in order to avoid lags or mismatch in sound or video. IEEE802.1AS-2011 specifies how to establish and maintain a single time reference – a synchronised ‘wall clock’ – for all nodes in a local network. The generalized precision time protocol (gPTP), based on IEEE1588, is used to synchronize and syntonize all network nodes to sub-microsecond accuracy. Nodes are synchronized if their clocks show the same time and are syntonised if their clocks increase at the same rate.

This protocol selects a Grand Master Clock from which the current time is propagated to all network end-stations. In addition, the protocol specifies how to correct for clock offset and clock drifts by measuring path delays and frequency offsets. New MCUs, such as the Atmel | SMART SAMV7x, detect and capture time stamps automatically when gPTP event messages cross MII layers. They can also transport gPTP messages over raw Ethernet, IPv4 or IPv6. This hardware recognition feature helps to calculate clock offset and link delay with greater accuracy and minimal software load.
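To make the offset and link delay calculation more concrete, here is a minimal Python sketch of the arithmetic that the hardware time stamps feed into. It is a simplified illustration rather than the procedure from the standard: it assumes a symmetric link, and the names `PdelayTimestamps`, `rate_ratio`, `mean_link_delay_ns` and `clock_offset_ns` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass


@dataclass
class PdelayTimestamps:
    """Four time stamps (in ns) captured around one delay request/response exchange."""
    t1: float  # request leaves the initiator (initiator clock)
    t2: float  # request arrives at the responder (responder clock)
    t3: float  # response leaves the responder (responder clock)
    t4: float  # response arrives at the initiator (initiator clock)


def rate_ratio(remote_a, remote_b, local_a, local_b):
    """Syntonization: how fast the remote clock ticks relative to the local one."""
    return (remote_b - remote_a) / (local_b - local_a)


def mean_link_delay_ns(ts, neighbor_rate_ratio=1.0):
    """One-way link delay, assuming the path is symmetric (an assumption)."""
    # Convert the responder's turnaround time into the initiator's time base.
    turnaround_local = (ts.t3 - ts.t2) / neighbor_rate_ratio
    return ((ts.t4 - ts.t1) - turnaround_local) / 2.0


def clock_offset_ns(sync_origin_ns, sync_rx_local_ns, link_delay_ns):
    """Synchronization: local clock minus master clock when a sync message arrives."""
    return sync_rx_local_ns - (sync_origin_ns + link_delay_ns)


# Example: true link delay 500 ns, local clock running 1000 ns ahead of the master.
exchange = PdelayTimestamps(t1=0, t2=1500, t3=1700, t4=1200)
delay = mean_link_delay_ns(exchange)
print(delay)                                  # one-way link delay in ns
print(clock_offset_ns(10_000, 11_500, delay)) # local-minus-master offset in ns
```

The more precisely the four time stamps are captured, the better the result, which is why automatic hardware time stamping matters here.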

Meanwhile, SRP guarantees end-to-end bandwidth reservation for all streams to ensure packets aren’t delayed or dropped at any switch due to network congestion, which can occur with standard Ethernet. For the in-vehicle environment, SRP is typically configured in advance by the car maker, who defines data streams and bandwidth allocations.

Talkers (the source of A/V data) ‘advertise’ data streams and their characteristics. Switches process these announcements from talkers and listeners to:

  • register and prune streams’ path through the network
  • reserve bandwidth and prevent over subscription of available bandwidth
  • establish forwarding rules for incoming packets
  • establish the SRP domain, and
  • merge multiple listener declarations for the same stream

The standard stipulates that AVB data can reserve only 75% of total available bandwidth, so for a 100Mbit/s link, the maximum AVB data is 75Mbit/s. The remaining bandwidth can be used for all other Ethernet protocols.
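The 75% rule can be sketched as a simple admission check in Python. This only illustrates the arithmetic, not an SRP implementation; the function names and the example stream rate (stereo 48 kHz, 24-bit audio, counting payload bits only and ignoring frame overhead) are assumptions made for the example.

```python
AVB_RESERVABLE_PERCENT = 75  # share of the link that AVB streams may reserve


def max_avb_bps(link_bps):
    """Maximum bandwidth (bit/s) that AVB streams may reserve on a link."""
    return link_bps * AVB_RESERVABLE_PERCENT // 100


def can_admit(reserved_bps, new_stream_bps, link_bps):
    """True if a new stream still fits under the AVB cap of the link."""
    return sum(reserved_bps) + new_stream_bps <= max_avb_bps(link_bps)


link = 100_000_000                # 100Mbit/s link
stereo_audio = 48_000 * 2 * 24    # assumed example stream, payload bits only

print(max_avb_bps(link))                                # 75000000
print(can_admit([stereo_audio] * 8, 60_000_000, link))  # False: the cap would be exceeded
```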

In automotive systems, the streams may be preconfigured and bandwidth can be reserved statically at system startup to reduce the time needed to bring the network into a fully operational state. This supports safety functions, such as driver alerts and the reversing camera, that must be displayed within seconds.

SRP uses other signalling protocols, such as Multiple MAC Registration Protocol, Multiple VLAN Registration Protocol and Multiple Stream Registration Protocol to establish bandwidth reservations for A/V streams dynamically.

The third extension is FQTSS, which guarantees that time-sensitive A/V streams arrive at their listeners within a bounded latency. It also defines procedures for priority regeneration and credit-based traffic shaper algorithms to meet stream reservations for all available devices.
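The idea behind a credit-based shaper can be sketched in a few lines of Python. This is a deliberately simplified model, not the algorithm text from the standard: it assumes all frames of one stream are already queued at time zero and that no other traffic interferes. The names `cbs_start_times`, `idle_slope_bps` and `port_rate_bps` are chosen for this example.

```python
def cbs_start_times(frame_bits, port_rate_bps, idle_slope_bps):
    """Return the transmission start time (s) of each queued frame.

    Credit drains while a frame is sent and refills at the idle slope
    while the queue waits; a frame may start only when credit is >= 0.
    """
    send_slope_bps = idle_slope_bps - port_rate_bps  # negative while sending
    now = 0.0
    credit = 0.0
    starts = []
    for bits in frame_bits:
        if credit < 0:
            # Wait until the credit has climbed back to zero.
            now += -credit / idle_slope_bps
            credit = 0.0
        starts.append(now)
        duration = bits / port_rate_bps
        credit += send_slope_bps * duration
        now += duration
    return starts


# Three 1000-bit frames on a 100Mbit/s port with 25Mbit/s reserved for the stream.
for t in cbs_start_times([1000, 1000, 1000], 100e6, 25e6):
    print(f"{t * 1e6:.1f} us")
```

In this model the frames leave evenly spaced rather than bunched together, and the long-run rate of the stream equals the reserved idle slope.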

The AVB standard can support up to eight traffic classes, which are used to determine quality of service. Typically, nodes support at least two traffic classes – Class A, the highest priority, and Class B. Microcontroller features help manage receive and transmit data with multiple priority queues to support AVB and ‘best effort class’ non AVB data.

Automotive tailored requirements

Automotive use cases typically fix many parameters at the system definition phase, which means that AVB implementation can be optimised and simplified to some extent.

  • Best Master Clock algorithm (BMCA): the best clock master is fixed at the network definition phase, so dynamic selection using BMCA isn’t needed.
  • SRP: all streams, their contents and their characteristics are known at system definition and no new streams are dynamically created or destroyed; the proper reservation of data is known at the system definition phase; switches, talkers and listeners can have their configurations loaded at system startup from pre-configured tables, rather than from dynamic negotiations.
  • Latency: while this is not critical, delivery is. Automotive networks are very small, with only a few nodes between a talker and listener. It is more important not to drop packets due to congestion.

Conclusion

The requirement to transfer high volumes of time-sensitive audio and video content inside vehicles requires developers to understand and apply the Ethernet AVB extensions. AVB standardization results in interoperable end-devices from multiple vendors that can deliver audio and video streams to distributed equipment on the network with microsecond accuracy or better. While the standard brings complexities, new MCUs with advanced features are simplifying automotive A/V design.


This article was originally published on New Electronics on October 13, 2015 and authored by Tim Grai, Atmel’s Director of Automotive MCU Application Engineering. 

6 memory considerations for Cortex-M7-based IoT designs


Taking a closer look at the configurable memory aspects of Cortex-M7 microcontrollers.


Tightly coupled memory (TCM) is a salient feature in the Cortex-M7 lineup as it boosts the MCU’s performance by offering single cycle access for the CPU and by securing the high-priority latency-critical requests from the peripherals.

The early MCU implementations based on ARM’s M7 embedded processor core, like Atmel’s SAM E70 and S70 chips, have arrived on the market. So it’s worthwhile to take a closer look at the configurable memory aspects of M7 microcontrollers and see how the TCMs enable the execution of deterministic code and fast transfer of real-time data at the full processor speed.

Here are some of the key findings regarding the advanced memory architecture of Cortex-M7 microcontrollers:

1. TCM is Configurable

First and foremost, the size of TCM is configurable. TCM, which is part of the physical memory map of the MCU, can be up to 16MB in size. The configurability of the ARM Cortex-M7 core allows SoC architects to integrate a range of cache sizes, so that industrial and Internet of Things product developers can determine the amount of critical code and real-time data to place in TCM to meet the needs of the target application.

The ARM Cortex-M7 architecture doesn’t specify what type of memory or how much memory should be provided; instead, it leaves these decisions to designers implementing the M7 in a microcontroller as an avenue for differentiation. Consequently, a flexible memory system can be optimized for performance, determinism and low latency, and thus can be tuned to specific application requirements.

2. Instruction TCM

Instruction TCM, or ITCM, holds critical code that needs deterministic execution in real-time processing applications such as audio encoding/decoding, audio processing and motor control. Using standard memory leads to delays due to cache misses and interrupts, and therefore hampers the deterministic timing required for real-time response and seamless audio and video performance.

The deterministic, critical software routines should be loaded into ITCM, which sits on a 64-bit instruction memory port that supports the dual-issue processor architecture and provides single-cycle access for the CPU to boost MCU performance. However, developers need to carefully assess the amount of code that needs zero-wait execution performance to determine the amount of ITCM required in an MCU device.

The anatomy of TCM inside the M7 architecture.

3. Data TCM

Data TCM, or DTCM, is used in fast data processing tasks like 2D barcode decoding and fingerprint and voice recognition. There are two data ports (DTCMs) that provide simultaneous and parallel 32-bit data accesses to real-time data. Both instruction TCM and data TCM, used for efficient access to on-chip Flash and external resources, must have the same size.

4. System RAM and TCM

System RAM, also known as general RAM, is employed for communications stacks related to networking, fieldbus, high-bandwidth bridging, USB, etc. It holds peripheral data buffers, generally filled through direct memory access (DMA) engines, and can be accessed by bus masters without CPU intervention.

Here, product developers must keep in mind the memory access conflicts that arise from concurrent data transfers by both the CPU and the DMA. So developers must set clear priorities for latency-critical requests from the peripherals and carefully plan latency-critical data transfers like the transfer of a USB descriptor or a slow data rate peripheral with a small local buffer. Accesses from the DMA and the caches are generally bursts to consecutive addresses to optimize system performance.

It’s worth noting that while system memory is logically separate from the TCM, microcontroller suppliers like Atmel are incorporating TCM and system RAM in a single SRAM block. That lets IoT developers share general-purpose tasks while splitting TCM and system RAM functions for specific use cases.

A single SRAM block for TCM and system memory allows higher flexibility and utilization.

5. TCM Loading

The Cortex-M7 uses a scattered RAM architecture to allow the MCU to maximize performance by having a dedicated RAM part for critical tasks and data transfer. The TCM might be loaded from a number of sources, and these sources aren’t specified in the M7 architecture. It’s left to the MCU designers whether there is a single DMA or several data loading points from various streams like USB and video.

It’s imperative that, during the software build, IoT product developers identify which code segments and data blocks are allocated to the TCM. This is done by embedding pragmas into the software and by applying linker settings so that the software build places the code at the appropriate memory locations.

6. Why SRAM?

Flash memory can be attached to a TCM interface, but the Flash cannot run at the processor clock speed and will require caching. As a result, this will cause delays when cache misses occur, threatening the deterministic value proposition of the TCM technology.

DRAM technology is a theoretical choice but it’s cost prohibitive. That leaves SRAM as a viable candidate for fast, direct and uncached TCM access. SRAM can be easily embedded on a chip and permits random accesses at the speed of the processor. However, cost-per-bit of SRAM is higher than Flash and DRAM, which means it’s critical to keep the size of the TCM limited.

Atmel | SMART Cortex-M7 MCUs

Take the case of Atmel’s SMART SAM E70, S70 and V70/71 microcontrollers that organize SRAM into four memory banks for TCM and System SRAM parts. The company has recently started shipping volume units of its SAM E70 and S70 families for the IoT and industrial markets, and claims that these MCUs provide 50 percent better performance than the closest competitor.

Atmel’s M7-based microcontrollers offer up to 384KB of embedded SRAM that is configurable as TCM or system memory for providing IoT designs with higher flexibility and utilization. For instance, E70 and S70 microcontrollers organize 384KB of embedded SRAM into four ports to limit memory access conflicts. These MCUs allocate 256KB of SRAM for TCM functions — 128 KB for ITCM and DTCM each — to deliver zero wait access at 300MHz processor speed, while the remaining 128KB of SRAM can be configured as system memory running at 150MHz.
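The split described above is simple arithmetic, sketched here in Python. The helper `split_sram_kb` is a hypothetical name for this example, and the only rules it encodes are the ones stated in this article: ITCM and DTCM have the same size, and whatever is not used as TCM remains system memory.

```python
def split_sram_kb(total_kb, tcm_each_kb):
    """Split embedded SRAM into equal ITCM/DTCM parts plus system RAM (all in KB)."""
    tcm_total_kb = 2 * tcm_each_kb  # ITCM and DTCM are the same size
    if tcm_each_kb < 0 or tcm_total_kb > total_kb:
        raise ValueError("TCM allocation does not fit in the embedded SRAM")
    return {
        "itcm": tcm_each_kb,
        "dtcm": tcm_each_kb,
        "system": total_kb - tcm_total_kb,
    }


# The configuration quoted above: 384KB total, 128KB each for ITCM and DTCM.
print(split_sram_kb(384, 128))  # {'itcm': 128, 'dtcm': 128, 'system': 128}
```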

However, the availability of an SRAM block organized in the form of a memory bank of 384KB means that both system SRAM and TCM can be used at the same time. The large on-chip SRAM of 384KB is also critical for many IoT devices, since it enables them to run multiple communication stacks and applications on the same MCU without adding external memory. That’s a significant value proposition in the IoT realm because avoiding external memories lowers the BOM cost, reduces the PCB footprint and eliminates the complexity in the high-speed PCB design.

Why do drones love the Atmel SAM E70?


Eric Esteve explains why the latest Cortex-M7 MCU series will open up countless capabilities for drones other than just flying. 


By nature, avionics is a mature market requiring the use of validated system solutions: safety is an absolute requirement, while innovative systems require a stringent qualification phase. That’s why the very fast adoption of drones as an alternative to human-piloted planes is impressive. It took 10 or so years for drones to become widely developed and employed for various applications, ranging from war to entertainment, with prices spanning from hundreds of dollars to several hundreds of thousands. But even if we consider consumer-oriented, inexpensive drones, the required processing capabilities call for an MCU that is not only high-performance but versatile as well, capable of managing the built-in gyroscope, accelerometer, geomagnetic sensor, GPS, rotational station, four to six-axis control, optical flow and so on.

When I was designing for avionics, namely the electronic engine control of the CFM56 (this jet engine being jointly developed by GE in the U.S. and Snecma in France, equipping Boeing and Airbus planes), the CPU was a multi-hundred dollar Motorola 68020, leading to a cost of $20 per MIPS! While I may not know the Atmel | SMART SAM E70 price precisely (I would guess that it costs a few dollars), what I do know is that the MCU offers in excess of 600 DMIPS. Aside from its high performance, this series boasts a rather large on-chip memory of up to 384KB SRAM and 2MB Flash, just one of many pivotal reasons that this MCU has been selected to support the “drone with integrated navigation control to avoid obstacle and improve stability.”

In fact, the key design requirements for this application were: 600+ DMIPS, a camera sensor interface, dual ADC and PWM for motor control, and dual CAN, all bundled up in a small package. Looking at the block diagram below helps link the MCU features with the various application capabilities: gyroscope (SPI), accelerometer (SPI x2), geomagnetic sensor (I2C x2), GPS (UART), one or two-channel rotational station (UART x2), four or six-axis control communication (CAN x2), voltage/current (ADC), analog sensor (ADC), optical flow sensor (through the image sensor interface, or ISI) and pulse width modulation (PWM x8) to support the rotational station and four or six-axis speed PWM control.

For those of you who may not know, the SAM E70 is based on the ARM Cortex-M7, a versatile, multi-purpose MCU core that combines superior performance with extensive peripheral sets supporting multi-threaded processes. It’s this multi-thread support that will surely open up countless capabilities for drones other than simply flying.

Atmel | SMART ARM Cortex M7 SAM E70

Today’s drones already possess the ability to soar through the air or stay stationary, snapping pictures or capturing HD footage. It’s already very impressive to see sub-kilogram devices offering such capabilities! However, the drone market is already looking ahead, preparing for the future, with the desire to get more application stacks into the UAVs so they can take in automation, routing, cloud connectivity (when available), 4G/5G, and other wireless functionalities to enhance data pulling and posting.

For instance, imagine a small town of a few thousand inhabitants where, for a couple of days or weeks per year, a special event or holiday brings a hundred thousand people storming into the area. These folks want to feed their smartphones with multimedia or share live experiences by sending movies or photos, most of them at the same time. The 4G/5G and cloud infrastructure is not sized for such a crowd, so the communication system may break down. Yet this problem could be fixed by simply calling in drone backup to reinforce the communication infrastructure for that period of time.

While this may be just one example of what could be achieved with the advanced usage of drones, each of the innovative applications will be characterized by a common set of requirements: high processing performance, large SRAM and Flash memory capacity, and extensive peripheral sets supporting multi-threaded processes. In this case, the ARM Cortex-M7-based SAM E70 MCU is an ideal choice, with processing power in excess of 640 DMIPS and large on-chip SRAM (up to 384 KB) and Flash (up to 2MB) for managing all sorts of sensors, navigation, automation, servos, motors, routing, adjustments, video/audio and more.


This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principal blogger as well as one of the four founding members of SemiWiki.com. This blog first appeared on SemiWiki on July 18, 2015.

Atmel tightens automotive focus with new Cortex-M7 MCUs


Large SoCs without an Ethernet interface typically have slow start-up times and high-power requirements — until now. 


Atmel, a lead partner for the ARM Cortex-M7 processor launch in October 2014, has unveiled three new M7-based microcontrollers with a unique memory architecture and advanced connectivity features for the connected car market.

According to a company spokesman, E70, V71 and V70 chips are the industry’s highest performing Cortex-M microcontrollers with six-stage dual-issue pipeline delivering 1500 CoreMarks at 300MHz. Moreover, V70 and V71 microcontrollers are the only automotive-qualified ARM Cortex-M7 MCUs with Audio Video Bridging (AVB) over Ethernet and Media LB peripheral support.

Atmel is among the first suppliers to introduce the ARM Cortex-M7-based MCUs, whose core combines performance and simplicity and further pushes the performance envelope for embedded devices. The new MCU devices are aimed to take the connected car design to the next performance level with high-speed connectivity, high-density on-chip memory, and a solid ecosystem of design engineering tools.

Atmel’s Memory Play

Atmel has memory technology in its DNA, and that seems apparent in the design footprint of E70, V70 and V71 MCUs. The San Jose-based chipmaker is offering a flexible memory system that is optimized for performance, determinism and low latency.

Jacko Wilbrink, Senior Marketing Director at Atmel, said that the company’s Cortex-M7-based MCUs leverage Atmel’s advanced peripherals and flexible SRAM architecture for higher performance applications while keeping the Cortex-M class ease-of-use. He added that the large on-chip SRAM on SAM E70/V70/V71 chips is critical for connected car and IoT product designers since it allows them to run the multiple communication stacks and applications on the same MCU without adding external memory.

On-chip DMA and low-latency access SRAM architecture

Avoiding the external memories reduces the PCB footprint, lowers the BOM cost and eliminates the complexity of high-speed PCB design when pushing the performance to a maximum. Next, Tim Grai, another senior manager at Atmel, pointed out another critical take from Cortex-M7 designs: The tightly coupled memory (TCM) interface. It provides the low-latency memory that the processor can use without the unpredictability that is a feature of cache memories.

Grai says that the most vital memory feature is not the memory itself but how the TCM interface to the M7 is utilized. “The available RAM is configurable to be used as system RAM or tightly-coupled instruction and data memory to the core, where it provides deterministic zero-wait state access,” Grai added. “The arrangement of SRAM allows for multiple concurrent accesses.”

Cortex-M7 a DSP Winner

According to Will Strauss, President & Principal Analyst at Forward Concepts, ARM has had considerable success with its Cortex-M4 power-efficient 32-bit processor chip family. “However, realizing that it lacked the math ability to do more sophisticated DSP functions, ARM has introduced the Cortex-M7, its newest and most powerful member of the Cortex-M family.”

Strauss adds that the M7 provides 32-bit floating point DSP capability as well as faster execution times. With the greater clock speed, floating point and twice the DSP power of the M4, the M7 is even more attractive for applications requiring high-performance audio and even video accompanying traditional automotive and control applications.

Atmel’s Grai added an interesting dimension to the DSP story in Cortex-M7 processor fabric. He pointed out that true DSPs don’t do control and logical functions well and generally lack the breadth of peripherals available on MCUs. “The attraction of the M7 is that it does both—DSP functions and control functions—hence it can be classified as a digital signal controller (DSC).”

Grai quoted the example of Atmel V70 and V71 microcontrollers used to connect end-nodes like infotainment audio amplifiers to the emerging Ethernet AVB network. In an audio amplifier, you receive a specific audio format that has to be converted, filtered, modulated to match the requirement for each specific speaker in the car. So you need Ethernet and DSP capabilities at the same time.

Grai says that the audio amplifier in infotainment applications is a good example of DSC: a mix of MCU capabilities and peripherals plus DSP capability for audio processing. Atmel is targeting the V70 and V71 chips as a bridge between large application processors and Ethernet.

Most of the time, the main processor does not integrate Ethernet AVB, as the infotainment connectivity is based on Ethernet standard. Here, the V71 microcontroller brings this feature to the main processor. “Large SoCs, which usually don’t have Ethernet interface, have slow start-up time and high power requirements,” Grai said. “Atmel’s V7x MCUs allow fast network start-up and facilitate power moding.”

The SAM E70, V70 and V71

Atmel’s three new MCU devices are aimed at multiple aspects of in-vehicle infotainment connectivity and telematics control.

SAM E70: The microcontroller series features dual CAN-FD, 10/100 Ethernet MAC with IEEE1588 real-time stamping, and AVB support. It’s aimed at the automotive industry’s movement toward controller area network (CAN) message-based protocols used holistically across the cabin, eliminating isolation and wire redundancy, with everything bridged centrally through the CAN interface.

SAM V70: It’s designed for MediaLB connectivity and leverages advanced audio processing, multi-port memory architecture and Cortex-M7 DSP capabilities. For the media-oriented systems transport (MOST) architecture, old modules are not redesigned. So Atmel offers a MOST solution that is done over Media Local Bus (MediaLB) and is supported by the V70 series.

SAM V71: The MCU series ports a complete automotive Ethernet AVB stack for in-vehicle infotainment connectivity, audio amplifiers, telematics and head control units. It mirrors the SAM V70 series features as well as combines Ethernet-AVB and MediaLB connectivity stacks.


Majeed Ahmad is the author of books Smartphone: Mobile Revolution at the Crossroads of Communications, Computing and Consumer Electronics and The Next Web of 50 Billion Devices: Mobile Internet’s Past, Present and Future.

What is real SAM V71 DSP performance in automotive audio?


The integrated FPU DSP (into the Cortex-M7 core) is using 2X the number of clock cycles when compared with the SHARC21489.


Thinking of selecting an ARM Cortex-M7-based Atmel SAM V70/71 for your next automotive entertainment application? Three key reasons to consider it are the clock speed of the Cortex-M7 (300 MHz), the integration of a floating-point (FPU) DSP, and last but not least, the fact that the SAM V70/71 has obtained automotive qualification. If you delve deeper into the SAM V70/71 features list, you will see that this MCU comes in several versions integrating 512 KB, 1024 KB or 2048 KB of Flash. And, if you compare with the competition, this MCU is the only Cortex-M7 supporting the 2 MB Flash option, being automotive qualified and delivering 1500 CoreMark, thanks to the 300 MHz clock speed, when the closest competitor only reaches 240 MHz and delivers 1200 CoreMark.

In fact, what makes the SAMV70/71 so unique is its FPU DSP performance. Let’s make it clear from the beginning: if you are looking for pure DSP performance, it will be easy to find a standard DSP chip offering much higher performance. Take the Analog Devices AD21489 or Blackfin70x series, for example. However, the automotive market is not only very demanding, it’s also very cost sensitive.

Think about this simple calculation: If you select the AD21489 DSP, you will have to add external Flash and an MCU, which would bring the total BOM to four to five times the price associated with the SAM V71. (Let’s also keep the AD21489 as a performance reference, and examine DSP benchmark results from third-party DSP experts DSP Concepts.)

FIR Benchmark

Before analyzing the results, we need to describe the context:

  • The FIR is computed on a block size of 256 samples
  • Results are expressed in terms of clock cycles (smaller is better)
  • All DSPs are floating-point except Blackfin
  • Clock cycle counts are measured using Audio Weaver

To elaborate further, this FIR is used to build an equalization filter: the higher the tap count, the better. If we look at the “50 Taps” benchmark results, the SAM V71 (Cortex-M7 based) takes 22,734 clock cycles (about three times as many as the SHARC 21489). Unsurprisingly, the Cortex-M4 requires 50% more cycles, but you have to integrate a Cortex-A15 to get better results, as the Cortex-A8 and Cortex-A9 need 30% and 40% more cycles, respectively! And when looking at standard Analog Devices Blackfin DSPs, only the 70x series is better, by 35%… the 53x being 30% worse.
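For readers who want to see what is being measured, here is a minimal Python sketch of a block FIR of the benchmarked shape (256-sample block, 50 taps), followed by the arithmetic on the quoted cycle count. It is an illustration only: the function name and the moving-average coefficients are our assumptions, and it is not the Audio Weaver implementation, which would be optimized native code.

```python
BLOCK_SIZE = 256        # samples per block, as in the benchmark
NUM_TAPS = 50           # the "50 Taps" case
CM7_CYCLES = 22_734     # SAM V71 cycle count quoted in the text
CLOCK_HZ = 300_000_000  # SAM V71 core clock


def fir_block(samples, taps, state=None):
    """Direct-form FIR over one block.

    `state` holds the last len(taps) - 1 inputs of the previous block,
    so consecutive blocks can be filtered without a discontinuity.
    """
    n_hist = len(taps) - 1
    history = list(state) if state is not None else [0.0] * n_hist
    buf = history + list(samples)
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            acc += h * buf[n_hist + n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out, buf[len(buf) - n_hist:]


# Placeholder coefficients (a moving average), not a real equalization filter.
taps = [1.0 / NUM_TAPS] * NUM_TAPS
impulse = [1.0] + [0.0] * (BLOCK_SIZE - 1)
output, state = fir_block(impulse, taps)

# What the quoted cycle count means per multiply-accumulate and in time.
cycles_per_mac = CM7_CYCLES / (BLOCK_SIZE * NUM_TAPS)
block_time_us = CM7_CYCLES / CLOCK_HZ * 1e6
print(f"{cycles_per_mac:.2f} cycles per tap per sample")
print(f"{block_time_us:.2f} us per 256-sample block at 300 MHz")
```

With the quoted figure, one block costs roughly 1.8 cycles per tap per sample, or about 76 µs of processor time at 300 MHz.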

Now, if you want to build a graphic equalizer, you will have to run Biquad filters. For instance, when building an eight-channel, six-stage graphic equalizer, your DSP will have to run 48 Biquads.
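The eight-channel, six-stage case can be sketched as a cascade of second-order sections per channel. The code below is a minimal illustration in direct form I; the class and function names are ours, and the unity-gain coefficients are placeholders, since a real equalizer would compute per-band coefficients.

```python
class Biquad:
    """One second-order IIR section (direct form I)."""

    def __init__(self, b0, b1, b2, a1, a2):
        self.b = (b0, b1, b2)
        self.a = (a1, a2)
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, block):
        b0, b1, b2 = self.b
        a1, a2 = self.a
        out = []
        for x in block:
            y = b0 * x + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
            self.x2, self.x1 = self.x1, x
            self.y2, self.y1 = self.y1, y
            out.append(y)
        return out


NUM_CHANNELS = 8
NUM_STAGES = 6
BLOCK_SIZE = 256

# Placeholder unity-gain sections (assumption); one cascade per channel.
equalizer = [
    [Biquad(1.0, 0.0, 0.0, 0.0, 0.0) for _ in range(NUM_STAGES)]
    for _ in range(NUM_CHANNELS)
]


def run_equalizer(eq, channel_blocks):
    """Run each channel's block through its cascade of Biquad stages."""
    outputs = []
    for stages, block in zip(eq, channel_blocks):
        for stage in stages:
            block = stage.process(block)
        outputs.append(block)
    return outputs


total_biquads = sum(len(stages) for stages in equalizer)
ramp = [float(n) for n in range(BLOCK_SIZE)]
processed = run_equalizer(equalizer, [ramp] * NUM_CHANNELS)
print(f"{total_biquads} Biquads run per block")
```

Every 256-sample block therefore passes through 8 × 6 = 48 sections, which is why the per-Biquad cycle count matters so much.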

Biquad Benchmark

Again, the context:

  • The Biquad is computed on a block size of 256 samples
  • Results are expressed in terms of clock cycles (smaller is better)
  • All DSPs are floating-point except Blackfin
  • Clock cycle counts are measured using Audio Weaver

In fact, the results are quite similar to those of the FIR benchmark: only the Cortex-A15 and the SHARC 21489 exhibit better performance. The FPU DSP integrated into the Cortex-M7 core uses twice as many clock cycles as the SHARC 21489. In terms of price for performance, however, the Cortex-M7 integrated in the SAM V71 is 50% cheaper! Using a SHARC DSP certainly makes sense if you want to build a high-performance home cinema system, but if you target automotive, it’s much more effective to select an FPU DSP integrated together with Flash (512 KB to 2 MB) and a full-featured MCU.

The Atmel SAM V71 is specifically dedicated to supporting automotive infotainment applications, offering dual CAN and Ethernet MAC support. Other notable specs include:

  • 10/100 Mbps, IEEE1588 support
  • 12 KB SRAM plus DMA
  • AVB support with Qav & Qas HW support for audio traffic
  • 802.3az Energy efficiency support
  • Dual CAN-FD
  • Up to 64 SRAM-based mailboxes
  • Wake up from sleep or wake up modes on RX/TX

Don’t forget that when looking to construct a high-end automotive radio, you still need room for Ethernet MAC and AVB support… What’s more, the SAM V71 only consumes 68% of the DSP resources, leaving enough headroom for both AVB and Ethernet MAC.

Interested? Explore the Atmel | SMART SAM V ARM Cortex-M7 family here. More information about the DSP benchmark can also be found on DSP Concepts’ website. Also, be sure to check out DSP Concepts’ detailed audio processing benchmarks.
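To get a feel for what that 68% leaves over, the sketch below turns it into a per-block cycle budget. Only the 300 MHz clock, the 256-sample block size and the 68% figure come from the text. The 48 kHz sample rate, and the reading of “DSP resource” as a share of core cycles, are our assumptions.

```python
CLOCK_HZ = 300_000_000   # SAM V71 core clock (from the text)
BLOCK_SIZE = 256         # samples per block (from the benchmarks)
SAMPLE_RATE_HZ = 48_000  # assumption: a typical audio sample rate
DSP_LOAD_PERCENT = 68    # load quoted in the text

# Cycles available while one block of audio is being captured or played.
cycles_per_block = CLOCK_HZ * BLOCK_SIZE // SAMPLE_RATE_HZ
used_cycles = cycles_per_block * DSP_LOAD_PERCENT // 100
free_cycles = cycles_per_block - used_cycles

print(f"Budget per block: {cycles_per_block:,} cycles")
print(f"Audio processing: {used_cycles:,} cycles")
print(f"Left for AVB, Ethernet MAC and the rest: {free_cycles:,} cycles")
```

Under these assumptions, roughly half a million cycles per block remain for everything that is not audio processing.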


This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principal blogger as well as one of the four founding members of SemiWiki.com. This blog first appeared on SemiWiki on May 6, 2015.

Single chip MCU + DSP architecture for automotive = SAM V71


Automotive applications ship in volumes of millions of units per year, and cost is a crucial factor when deciding on an integrated solution.


It’s all about Cost of Ownership (CoO) and system-level integration. If you target an automotive-related application, like audio or video processing or system control (motor control, inverter, etc.), you need to integrate a high-performance MCU with a DSP. In fact, if you expect your system to support an Audio Video Bridging (AVB) MAC on top of the targeted application and to obtain automotive qualification, the ARM Cortex-M7 processor-based Atmel SAM V70/71 should be your selection: it offers the fastest clock speed of its kind (300 MHz), integrates a DSP floating-point unit (FPU), supports AVB and is qualified for automotive.

Let’s have a closer look at the SAM V71 internal architecture, shall we?

A closer look at Atmel | SMART ARM based Cortex M7 - SAMV71 internal architecture.


When developing a system around a microcontroller unit, you expect this single chip to support as many of the peripherals your application needs as possible, to minimize the global cost of ownership. That’s why you can see the long list of system peripherals (top left of the block diagram). Meanwhile, the Atmel | SMART SAM V71 is dedicated to supporting automotive infotainment applications, with, for example, dual CAN and Ethernet MAC (bottom right). If we delve deeper into these functions, we can list the supported features:

  • 10/100 Mbps, IEEE1588 support
  • MII (144-pin), RMII (64-, 100-, 144-pin)
  • 12 KB SRAM plus DMA
  • AVB support with Qav & Qas HW support for audio traffic
  • 802.3az Energy efficiency support
  • Dual CAN-FD
  • Up to 64 SRAM-based mailboxes
  • Wake up from sleep or wake up modes on RX/TX
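As a rough sanity check on the 10/100 Mbps figure, the sketch below estimates the raw payload rate of one multichannel audio stream. The stream format (eight channels, 48 kHz, 32-bit samples) is purely our assumption for illustration, and packet headers and other protocol overhead are ignored.

```python
NUM_CHANNELS = 8         # assumption: eight audio channels
SAMPLE_RATE_HZ = 48_000  # assumption: 48 kHz sampling
BITS_PER_SAMPLE = 32     # assumption: 32-bit sample containers

LINK_RATES_BPS = {"10 Mbps": 10_000_000, "100 Mbps": 100_000_000}

# Raw audio payload only; framing overhead is deliberately left out.
payload_bps = NUM_CHANNELS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE

link_share = {name: payload_bps / rate for name, rate in LINK_RATES_BPS.items()}
for name, share in link_share.items():
    verdict = "fits" if share < 1.0 else "does not fit"
    print(f"{name}: {share:.1%} of the link ({verdict})")
```

With these assumed numbers the stream needs about 12.3 Mbps of payload: a small fraction of a 100 Mbps link, but more than a 10 Mbps link can carry.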

The automotive-qualified SAM V70 and V71 series also offer high-speed USB with integrated PHY and MediaLB, which, when combined with the Cortex-M7 DSP extensions, make the family ideal for infotainment connectivity and audio applications. Let’s take a look at this DSP benchmark:

DSP bench-Atmel-SAM-Cortex-M7

ARM CM7 Performance normalized relative to SHARC (Higher numbers are better).
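The chart’s exact normalization is not spelled out here. One plausible reading, assumed in the sketch below, is the ratio of the reference cycle count to the device cycle count, so that the SHARC scores 1.0 and higher is better. The two ratios used as inputs are the ones quoted elsewhere in this archive (about three times the SHARC cycle count for a 50-tap FIR, twice for Biquad); the function name is ours.

```python
def relative_performance(reference_cycles: float, device_cycles: float) -> float:
    """Performance relative to a reference device (higher is better).

    Assumes performance is inversely proportional to the cycle count.
    """
    return reference_cycles / device_cycles


# Cycle counts expressed relative to the SHARC (SHARC = 1.0).
cm7_fir = relative_performance(1.0, 3.0)     # CM7 needs about 3x the cycles
cm7_biquad = relative_performance(1.0, 2.0)  # CM7 needs about 2x the cycles

print(f"CM7 FIR (50 taps): {cm7_fir:.2f} x SHARC")
print(f"CM7 Biquad:        {cm7_biquad:.2f} x SHARC")
```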

If you are not limited by budget considerations and can afford to integrate a standard DSP along with an MCU, you will probably select the SHARC 21489 DSP (from Analog Devices), which offers the best-in-class benchmark results for FIR, Biquad and real FFT. However, such performance has a cost, not only monetarily but also in terms of power consumption and board footprint. We can call that “Cost of Ownership.” Automotive applications ship in volumes of millions of units per year, and cost is absolutely crucial in this market segment, especially when deciding whether to go with an integrated solution.

To support audio or video infotainment applications, you expect the DSP integrated in the Cortex-M7 to be “good enough,” and you can see from these benchmark results that this is the case for Biquad, for example: the ARM CM7 is equal to or better than any other DSP (TI C28, Blackfin 50x or 70x) except the SHARC 21489… but much cheaper! Good enough means that the SAM V70 will support automotive audio (Biquad in this case) and keep enough DSP power for Ethernet MAC (10/100 Mbps, IEEE1588) support.

Ethernet AVB via Atmel Cortex M7

Ethernet AVB Architectures (SAM V71)

In the picture above, you can see the logical SAM V71 architectures for Ethernet AVB support and how to use the DSP capabilities for a Telematics Control Unit (TCU) or an audio amplifier.

Integrating a DSP means that you need to develop the related DSP code. Because the DSP is tightly integrated into the ARM CM7 core, you can use the MCU development tools (and not specific DSP tools) to develop your code. Since February, the ATSAMV71-XULT has been available from Atmel: a full-featured Xplained board (the SAM V71 Xplained Ultra Evaluation Kit) with a software package providing basic drivers, software services and libraries for Atmel SAM V71, V70, E70 and S70 Cortex-M7 based microcontrollers. As this board has been built around the feature-rich SAM V71, you can develop your automotive application on exactly the same MCU architecture as the part going into production.

SAMV71 Ultra Xplained - Atmel ARM Cortex M7

Versatility and an integrated DSP built into the ARM CM7 core allow MCU development tools to be used instead of specific DSP tools. You can develop your automotive application on exactly the same MCU architecture as the part going into production.

Interested? More information on this eval/dev board can be found here.


This post has been republished with permission from SemiWiki.com, where Eric Esteve is a principal blogger as well as one of the four founding members of SemiWiki.com. This blog first appeared on SemiWiki on April 29, 2015.