Atmel’s SAM4S clinches highest CoreMark/MHz scores

Atmel’s SAM4S MCU lineup – which clocks in at a top speed of 120MHz+ – is based on ARM’s Cortex-M4 core. The microcontroller series integrates a Flash read accelerator along with cache memory to increase system performance. Additional key specs include a multi-layer bus matrix, multi-channel direct memory access (DMA) and distributed memory to facilitate high data rate communication.

Recently, the EEMBC (Embedded Microprocessor Benchmark Consortium) certified five SAM4S MCU benchmark scores running a version of CoreMark compiled using the IAR Embedded Workbench for ARM version 6.50. As it turns out, Atmel’s SAM4S MCUs racked up the highest CoreMark/MHz for any Cortex-M microcontroller submitted to date.

“The CoreMark benchmark is designed to measure the performance of the processor core alone,” Atmel engineering rep Brian Hammill told Bits & Pieces.

“While the CoreMark may not always convey how well a particular part will perform in a specific application, it does offer an accurate test of core performance and efficiency. As such, CoreMark can be used to understand how the performance of a particular MCU and compiler combination compares to others.”

According to Hammill, the Atmel scores are particularly significant as they illustrate the overall efficiency of the Cortex-M4 cache implemented on the SAM4SA16 and SAM4SD32, as well as the optimized performance of IAR Embedded Workbench version 6.50.

“Looking at the Atmel SAM4SD32CAU, we see the CoreMark for the IAR EWARM 6.50 was run at both 21 MHz and 123 MHz. If we run the EEMBC CoreMark report or export the data to Excel, here is what we see:

[Image: EEMBC CoreMark scores for the Atmel SAM4SD32CAU]

“As expected, the CoreMark scores are much higher at the faster clock speed. But what is most significant is the difference in the CoreMark/MHz scores. Notice that the 21 MHz CoreMark memory configuration is zero wait states. The memory configuration for the 123 MHz CoreMark is 5 wait states but with prefetch and cache enabled. You see a small difference in the CoreMark/MHz scores between the 21 and 123 MHz benchmarks.”

Why? Well, as Hammill notes, if you had a perfect zero wait state memory or cache system, the exact same CoreMark/MHz would be returned regardless of the speed.

“Of course it is to be expected that the cache helps – but does not completely cover the wait states of Flash. However, the small difference between 3.32 CoreMark/MHz at 123 MHz and 3.38 CoreMark/MHz at 21 MHz illustrates that Atmel’s SAM4SD32CAU device has a very good implementation of cache and prefetch,” he explained.
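To put a number on that "small difference", here is a minimal sketch that computes the relative drop in CoreMark/MHz from the two figures quoted above. The function and constant names are ours, chosen for illustration. The mapping of 3.38 to the 21 MHz zero-wait-state run follows from Hammill's description of the two configurations.

```python
def per_mhz_loss(reference, measured):
    """Fractional drop in CoreMark/MHz relative to a reference score."""
    return (reference - measured) / reference

# Figures quoted in the article (rounded to two decimals in the EEMBC report).
ZERO_WS_21_MHZ = 3.38    # CoreMark/MHz at 21 MHz, zero wait states
FIVE_WS_123_MHZ = 3.32   # CoreMark/MHz at 123 MHz, 5 wait states, prefetch + cache

loss = per_mhz_loss(ZERO_WS_21_MHZ, FIVE_WS_123_MHZ)
print(f"CoreMark/MHz lost at 123 MHz: {loss:.1%}")  # prints 1.8%
```

In other words, going from zero to five Flash wait states costs a bit under 2% of per-MHz efficiency, judging by these rounded scores.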

“Indeed, if the Atmel cache and prefetch weren’t optimized, you would expect to see a much larger difference in the CoreMark/MHz scores. I would also like to note that the Atmel SAM4SD32CAU requires 5 wait states in Flash to run at 123 MHz – but with only a very slight performance penalty, as indicated by the CoreMark/MHz scores.”
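The reasoning can be sketched with a deliberately simple toy model. It is not a description of the SAM4S memory system, and every number in it (cycles per iteration, Flash fetches per iteration, hit rate) is a hypothetical value picked only to make the effect visible. The model charges each Flash fetch that misses the cache or prefetch buffer the full number of wait states.

```python
def iterations_per_mhz(core_cycles, flash_fetches, wait_states, hit_rate):
    """Benchmark iterations per million clock cycles (a CoreMark/MHz-style figure).

    Toy model: every Flash fetch that is not served by the cache or
    prefetch buffer stalls the core for `wait_states` cycles.
    """
    stall_cycles = flash_fetches * (1.0 - hit_rate) * wait_states
    return 1e6 / (core_cycles + stall_cycles)

# Hypothetical workload, not measured on any real device.
CORE_CYCLES = 300_000     # cycles per iteration with perfect memory
FLASH_FETCHES = 100_000   # Flash fetches per iteration

ideal = iterations_per_mhz(CORE_CYCLES, FLASH_FETCHES, wait_states=0, hit_rate=0.0)
no_cache = iterations_per_mhz(CORE_CYCLES, FLASH_FETCHES, wait_states=5, hit_rate=0.0)
good_cache = iterations_per_mhz(CORE_CYCLES, FLASH_FETCHES, wait_states=5, hit_rate=0.99)

print(f"zero wait states:          {ideal:.2f}")
print(f"5 wait states, no cache:   {no_cache:.2f}")
print(f"5 wait states, 99% hits:   {good_cache:.2f}")
```

With a perfect hit rate the stall term vanishes and the per-MHz figure no longer depends on the wait-state count, which is the "perfect cache" case described earlier. With no cache at all, the same five wait states would cut the per-MHz figure by more than half in this toy setup.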

CoreMark – written in C – was developed in 2009 by Shay Gal-On at EEMBC and contains implementations of several algorithms: list processing (find and sort), matrix manipulation (common matrix operations), a state machine (determining whether an input stream contains valid numbers) and CRC. Like any benchmark, the EEMBC CoreMark isn’t perfect, although it is certainly a fair assessment of overall performance, as well as the core and memory efficiency of a specific processor.
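For readers unfamiliar with the last of those kernels, here is a minimal sketch of a bit-by-bit CRC-16 of the kind such a benchmark might use as a workload and as a check on its results. This is not CoreMark's source code (which is C). It is a generic illustration using the common CRC-16/ARC parameters (reflected polynomial 0xA001, initial value 0), and the function names are ours.

```python
def crc16_update(crc, byte):
    """Fold one byte into a running CRC-16/ARC value, one bit at a time."""
    crc ^= byte
    for _ in range(8):
        if crc & 1:
            # Low bit set: shift right and apply the reflected polynomial.
            crc = (crc >> 1) ^ 0xA001
        else:
            crc >>= 1
    return crc

def crc16(data, crc=0):
    """CRC-16/ARC over a bytes-like object, starting from `crc`."""
    for byte in data:
        crc = crc16_update(crc, byte)
    return crc

print(hex(crc16(b"123456789")))
```

The inner loop is dominated by shifts, XORs and data-dependent branches, which is why a CRC makes a compact test of integer and branch performance.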