GAP8 Performance Versus ARM M7 on Embedded CNNs
ARM recently published CMSIS-NN, a new CMSIS library for embedded convolutional neural networks (CNNs). Firstly, it was great to see ARM supporting the market that GREENWAVES and GAP8 are focused on. We particularly liked their statement that “Neural Networks are becoming increasingly popular in always-on IoT edge devices performing data analytics right at the source, reducing latency as well as energy consumption for data communication.” We definitely share this vision of the market and have in fact been working on making it a reality for the last couple of years. In addition, we see the opportunity to do a variety of different types of content understanding on battery-powered devices, not just embedded CNNs/DNNs.
We thought it would be interesting to compare their performance figures with GAP8 running the same CNN graph and see how we do. In their benchmarks ARM used an STM32 F7 as their M7-based target. The F7 is manufactured using a 90nm process, which penalizes it on power consumption, particularly at its maximum speed of 216MHz. To be fairer, we’ve compared against power figures for an STM32 H7 processor, which has a maximum clock speed of 400MHz and so at 216MHz is running well inside its comfort zone from a power consumption perspective. The STM32 H7 is based on the same ARM M7 core, so its cycle performance will be very similar to the F7. It is also manufactured using a 40nm process, which is more comparable to GAP8, which uses a 55nm LP process from TSMC.
Both power consumption figures are estimates: the STM32 H7 figure is based on data sheets, and the GAP8 figure comes from power estimates.
The table below shows the figures from the ARM processor blog post compared against GAP8 running the same inference on the same CNN graph on its 8-core processor cluster.
Analysis of embedded CNN performance
The M7-based STM32 F7 takes 99.1ms at 216MHz to do the inference on the CNN, which is trained on CIFAR-10 images and has a pretty classical graph structure. The weights are all quantized to 8-bit fixed point. This works out at 21.4M cycles. The performance analysis in the ARM document puts the number of operations necessary for the inference at 24.7M. The M7 is a dual-issue architecture, but that doesn’t seem to be helping much in this case. It seems there is perhaps room for optimization.
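The cycle count above follows directly from the run time and the clock speed. A minimal sketch of that arithmetic, using only the figures quoted in the text; the variable and function names are illustrative and do not come from the ARM or GreenWaves material:

```python
# Figures quoted in the text for the STM32 F7 (Cortex-M7) run.
M7_TIME_S = 99.1e-3    # inference time in seconds
M7_CLOCK_HZ = 216e6    # clock speed in hertz
CNN_OPS = 24.7e6       # operations per inference, from the ARM analysis


def cycles(time_s: float, clock_hz: float) -> float:
    """Cycles consumed: run time multiplied by clock frequency."""
    return time_s * clock_hz


m7_cycles = cycles(M7_TIME_S, M7_CLOCK_HZ)
ops_per_cycle = CNN_OPS / m7_cycles

print(f"M7 cycles: {m7_cycles / 1e6:.1f}M")
print(f"Operations per cycle: {ops_per_cycle:.2f}")
```

With these inputs the M7 completes a little over one operation per cycle, which is the figure behind the remark about room for optimization.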
The GAP8 takes 1.5M cycles to run the same operation. We are not using the Hardware Convolution Engine (HWCE) in GAP8 in this test, since we wanted to show how effective GAP8 is as a general-purpose compute engine. From our tests, we would expect using the HWCE to improve GAP8’s power consumption on embedded CNNs by a factor of approximately 3.
Why is GAP8 using so few cycles? Firstly, we’re running on 8 cores, and GAP8’s extremely efficient architecture for parallelization is giving us a speed-up factor of somewhere between 7 and 8 times. Secondly, the optimized DSP/SIMD instructions in GAP8 are giving fine-grained parallelization on the convolution operations. Finally, our fine-grained control over memory movement is giving us a real benefit in the number of cycles used to load and store weights, input and output data from the CNN graph nodes. All of these factors allow us to achieve the same execution time for the inference of 99.1ms at a clock speed of 15.4MHz. This, in turn, allows us to run the cores at 1V, leading to a power consumption during the operation of 3.7mW. Here we are really benefiting from the shared instruction cache in the cluster, which decreases the cost of running the 8 cores by fetching instructions only once.
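The link between cycle count, target run time and clock speed can be checked with a short calculation. The sketch below uses only the figures quoted in the text, and the names are illustrative. Note that the rounded 1.5M cycle count gives a clock slightly below the quoted 15.4MHz; the assumption here is that the unrounded count is a little above 1.5M, so the sketch also works backwards from the quoted clock.

```python
# Figures quoted in the text for the GAP8 cluster at the low-power point.
GAP8_CYCLES = 1.5e6        # rounded cycle count for one inference
TARGET_TIME_S = 99.1e-3    # run time matched to the M7 result, in seconds
QUOTED_CLOCK_HZ = 15.4e6   # clock speed stated in the text
GAP8_POWER_W = 3.7e-3      # estimated power at 1V, in watts


def required_clock_hz(cycle_count: float, time_s: float) -> float:
    """Clock needed to finish cycle_count cycles in time_s seconds."""
    return cycle_count / time_s


clock_hz = required_clock_hz(GAP8_CYCLES, TARGET_TIME_S)
# Assumption: the quoted clock reflects the unrounded cycle count.
implied_cycles = QUOTED_CLOCK_HZ * TARGET_TIME_S
# Energy per inference in microjoules: power multiplied by time.
energy_uj = GAP8_POWER_W * TARGET_TIME_S * 1e6

print(f"Clock from rounded cycle count: {clock_hz / 1e6:.1f}MHz")
print(f"Cycles implied by 15.4MHz: {implied_cycles / 1e6:.2f}M")
print(f"Energy per inference: {energy_uj:.0f}uJ")
```

Either way the clock lands around 15MHz, which is what makes the 1V operating point possible.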
The energy performance of GAP8 on this operation is a 16 times improvement versus the M7 core implemented in the STM32 H7. We’re obviously not saying that we can do everything better than an M7. What we are saying is that for this type of workload we are way more energy efficient, so if you want to run a CNN on an MCU-class processor you should take a look at GAP8.
The last row in the table shows GAP8 executing the CNN at full clock speed. Here the cluster is working at 1.2V and its maximum clock speed of 175MHz. We are able to complete the inference in 8.7ms, a performance increase of 11 times versus the M7 core, at a reasonably similar power level of 70mW. The energy consumed is obviously less than the M7, since it is drawn over a shorter period, but from an energy perspective the GAP8 is less efficient at this operating point than at 15.4MHz.
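The trade-off between the two GAP8 operating points is easiest to see as energy per inference, which is power multiplied by run time. The sketch below is a rough check using the figures quoted in the text. The M7 energy is not stated in the text; here it is derived from the 16 times claim and should be read as an approximation. All names are illustrative.

```python
def energy_mj(power_mw: float, time_ms: float) -> float:
    """Energy in millijoules: mW times ms gives microjoules, so divide by 1000."""
    return power_mw * time_ms / 1000.0


# GAP8 operating points quoted in the text.
low_power_mj = energy_mj(power_mw=3.7, time_ms=99.1)    # 1V, 15.4MHz
full_speed_mj = energy_mj(power_mw=70.0, time_ms=8.7)   # 1.2V, 175MHz

# Speed-up of the full-speed point over the 99.1ms M7 run.
speedup = 99.1 / 8.7

# Energy cost of running flat out rather than at the low-power point.
energy_ratio = full_speed_mj / low_power_mj

# Assumption: M7 energy inferred from the stated 16 times improvement.
m7_implied_mj = 16 * low_power_mj

print(f"Low-power point: {low_power_mj:.3f}mJ")
print(f"Full-speed point: {full_speed_mj:.3f}mJ")
print(f"Speed-up versus M7: {speedup:.1f}x")
print(f"Full speed costs {energy_ratio:.2f}x the energy of the low-power point")
print(f"Implied M7 energy: {m7_implied_mj:.2f}mJ")
```

Under these figures the full-speed point uses about 0.61mJ per inference against about 0.37mJ at 15.4MHz, so running flat out is faster but less energy efficient, while still well below the energy implied for the M7.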
As we said earlier, using the HWCE would improve the power consumption figures on GAP8 by a factor of between 2 and 3.
This really shows how the cluster on GAP8 performs extremely efficiently on a large parallel data processing task such as embedded CNNs.
More generally we think this shows how being able to innovate in CPU and instruction set architecture can really bring massive benefits when specifically targeting an application space such as embedded CNNs. Our ability to leverage open source is absolutely key in this. We would definitely not exist without it.
This article was reposted by JWM from the GREENWAVES Official Website. Original title: GAP8 performance versus ARM M7 on Embedded CNNs.