GAP8 Performance Versus ARM M7 on Embedded CNNs

2024-09-10 GREENWAVES Official Website
ARM processor, GAP8, GREENWAVES

ARM recently published CMSIS-NN, a new CMSIS library for embedded convolutional neural networks (CNNs). Firstly, it was great to see ARM supporting the market that GREENWAVES and GAP8 are focused on. We particularly liked their statement that “Neural Networks are becoming increasingly popular in always-on IoT edge devices performing data analytics right at the source, reducing latency as well as energy consumption for data communication.” We definitely share this vision of the market and have in fact been working on making it a reality for the last couple of years. In addition, we see the opportunity to do a variety of different types of content understanding on battery-powered devices, not just embedded CNNs/DNNs.

 

We thought it would be interesting to compare their performance figures with GAP8 running the same CNN graph and see how we do. In their benchmarks ARM used an STM32 F7 as their M7-based target. The F7 is manufactured using a 90nm process, which penalizes it on power consumption, particularly at its maximum speed of 216MHz. To be fairer, we’ve compared against power figures for an STM32 H7 processor, which has a maximum clock speed of 400MHz and so at 216MHz is running well inside its comfort zone from a power consumption perspective. The STM32 H7 is based on the same ARM M7 core, so its cycle performance will be very similar to the F7. It is also manufactured using a 40nm process, which is more comparable to GAP8, which uses a 55nm LP process from TSMC.

 

Both power consumption figures are estimates: the STM32 H7 figure is based on data sheets and the GAP8 figure on our own power estimates.


The table below shows the figures from the ARM blog post alongside those for GAP8 running the same inference on the same CNN graph on its 8-core processor cluster.

Analysis of embedded CNN performance

The M7-based STM32 F7 takes 99.1ms at 216MHz to do the inference on the CNN, which is trained on CIFAR-10 images and has a pretty classical graph structure. The weights are all quantized to 8-bit fixed point. This works out at 21.4M cycles. The performance analysis in the ARM document puts the number of operations necessary for the inference at 24.7M (24.7 MOps), or a little over one operation per cycle. The M7 is a dual-issue architecture but that doesn’t seem to be helping much in this case. It seems like there is room for optimization, perhaps.
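The cycle count above is simply latency multiplied by clock frequency. The following minimal sketch reproduces that arithmetic from the figures quoted in the text; the function and constant names are illustrative and are not taken from ARM's or GreenWaves' material.

```python
# Figures quoted in the text for the STM32 F7 (Cortex-M7) run.
M7_LATENCY_S = 99.1e-3   # time for one inference, in seconds
M7_CLOCK_HZ = 216e6      # clock frequency, in hertz
TOTAL_OPS = 24.7e6       # operations per inference, per ARM's analysis


def cycles(latency_s, clock_hz):
    """Clock cycles elapsed during one inference."""
    return latency_s * clock_hz


def ops_per_cycle(total_ops, cycle_count):
    """Average number of operations retired per clock cycle."""
    return total_ops / cycle_count


m7_cycles = cycles(M7_LATENCY_S, M7_CLOCK_HZ)
m7_ops_per_cycle = ops_per_cycle(TOTAL_OPS, m7_cycles)

print(f"M7 cycles per inference: {m7_cycles / 1e6:.1f}M")
print(f"M7 operations per cycle: {m7_ops_per_cycle:.2f}")
```

This gives about 21.4M cycles and roughly 1.15 operations per cycle, which is why the dual-issue pipeline does not appear to be contributing much here.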

 

GAP8 takes 1.5M cycles to run the same operation. We are not using the Hardware Convolution Engine (HWCE) in GAP8 in this test since we wanted to show how effective GAP8 is as a general-purpose compute engine. From our tests we would expect the HWCE to improve GAP8’s power consumption on embedded CNNs by a factor of approximately 3.

 

Why is GAP8 using so few cycles? Firstly, we’re running on 8 cores, and GAP8’s extremely efficient architecture for parallelization is giving us a speed-up factor of somewhere between 7 and 8 times. Secondly, the optimized DSP/SIMD instructions in GAP8 are giving fine-grained parallelization on the convolution operations. Finally, our fine-grained control over memory movement is giving us a real benefit in the number of cycles used to load and store weights, input and output data from the CNN graph nodes. All of these factors allow us to achieve the same execution time for the inference, 99.1ms, at a clock speed of 15.4MHz. This, in turn, allows us to run the cores at 1V, leading to a power consumption during the operation of 3.7mW. Here we are really benefiting from the shared instruction cache in the cluster, which decreases the cost of running the 8 cores by fetching instructions only once.
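As a rough check on these numbers, the sketch below computes the clock needed to finish a given cycle count in 99.1ms, then divides the overall cycle reduction by the quoted 7 to 8 times parallel speed-up. Treating the factors as purely multiplicative is an assumption made here for illustration, and the helper names are hypothetical.

```python
# Cycle counts and latency quoted in the text (both cycle counts are rounded).
M7_CYCLES = 21.4e6
GAP8_CYCLES = 1.5e6
TARGET_LATENCY_S = 99.1e-3


def required_clock_hz(cycle_count, latency_s):
    """Clock frequency needed to execute cycle_count cycles in latency_s."""
    return cycle_count / latency_s


def residual_gain(total_ratio, parallel_speedup):
    """Gain left over after removing the multi-core speed-up.

    Assumes the contributions multiply, which is only an approximation.
    """
    return total_ratio / parallel_speedup


gap8_clock_hz = required_clock_hz(GAP8_CYCLES, TARGET_LATENCY_S)
cycle_ratio = M7_CYCLES / GAP8_CYCLES

print(f"GAP8 clock for a 99.1ms inference: {gap8_clock_hz / 1e6:.1f}MHz")
print(f"Cycle reduction versus the M7: {cycle_ratio:.1f}x")
for speedup in (7, 8):
    gain = residual_gain(cycle_ratio, speedup)
    print(f"  with {speedup}x from parallelism, about {gain:.1f}x remains")
```

With the rounded 1.5M figure the clock comes out near 15.1MHz, slightly under the 15.4MHz quoted above, which suggests the unrounded cycle count is a little over 1.5M. Under the multiplicative assumption, roughly 1.8 to 2 times of the gain would come from the SIMD instructions and memory movement rather than from the extra cores.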

 

GAP8’s energy consumption on this operation is 16 times lower than that of the M7 core implemented in the STM32 H7. We’re obviously not saying that we can do everything better than an M7. What we are saying is that for this type of workload we are far more energy efficient, so if you want to run a CNN on an MCU-class processor you should take a look at GAP8.
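Energy per inference is average power multiplied by inference time. The sketch below applies that to the GAP8 figures in the text and then back-calculates what the 16 times ratio implies for the M7. The implied M7 values are derived here purely for illustration and are not figures quoted by the article.

```python
# GAP8 figures quoted in the text for the 15.4MHz, 1V operating point.
GAP8_POWER_W = 3.7e-3
LATENCY_S = 99.1e-3      # the same inference time on both devices
ENERGY_RATIO = 16        # quoted GAP8 advantage over the M7 in the STM32 H7


def energy_j(power_w, duration_s):
    """Energy consumed at a constant average power over duration_s."""
    return power_w * duration_s


gap8_energy_mj = energy_j(GAP8_POWER_W, LATENCY_S) * 1e3

# Back-calculated, not quoted: what the 16x ratio implies for the M7.
m7_energy_mj = gap8_energy_mj * ENERGY_RATIO
m7_power_mw = m7_energy_mj / LATENCY_S   # mJ divided by s gives mW

print(f"GAP8 energy per inference: {gap8_energy_mj:.3f}mJ")
print(f"Implied M7 energy per inference: {m7_energy_mj:.2f}mJ")
print(f"Implied M7 average power: {m7_power_mw:.1f}mW")
```

Because both devices take the same 99.1ms, the energy ratio equals the power ratio, so the implied M7 power is simply 16 times 3.7mW, or about 59mW.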

 

The last row in the table shows GAP8 executing the CNN at full clock speed. Here the cluster is working at 1.2V and its maximum clock speed of 175MHz. We are able to complete the inference in 8.7ms, a performance increase of 11 times versus the M7 core, at a reasonably similar power level of 70mW. The energy consumed is obviously less than the M7 since it is consumed over a shorter period, but from an energy perspective GAP8 is less efficient at this operating point than at 15.4MHz.
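The trade-off between the two GAP8 operating points can be made concrete with a small comparison. This is a sketch using only the voltage, power and latency figures quoted in the text; the `OperatingPoint` class is a hypothetical helper introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    """One voltage/frequency setting and its measured or estimated cost."""
    name: str
    voltage_v: float
    power_w: float
    latency_s: float

    @property
    def energy_j(self):
        # Energy per inference = average power x inference time.
        return self.power_w * self.latency_s


low_power = OperatingPoint("GAP8 at 15.4MHz", 1.0, 3.7e-3, 99.1e-3)
full_speed = OperatingPoint("GAP8 at 175MHz", 1.2, 70e-3, 8.7e-3)

# The M7 run also takes 99.1ms, so this is the speed-up over the M7 too.
speedup = low_power.latency_s / full_speed.latency_s
energy_penalty = full_speed.energy_j / low_power.energy_j

for point in (low_power, full_speed):
    print(f"{point.name}: {point.energy_j * 1e3:.3f}mJ per inference")
print(f"Speed-up at full clock: {speedup:.1f}x")
print(f"Energy cost relative to the low-power point: {energy_penalty:.2f}x")
```

Running at full speed buys a speed-up of a little over 11 times but costs roughly 1.7 times more energy per inference, which is the efficiency loss the paragraph above refers to.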

 

As we said earlier, using the HWCE would improve the power consumption figures on GAP8 by a factor of between 2 and 3.

 

This really shows how the cluster on GAP8 performs extremely efficiently on a large parallel data processing task such as embedded CNNs.

 

More generally we think this shows how being able to innovate in CPU and instruction set architecture can really bring massive benefits when specifically targeting an application space such as embedded CNNs. Our ability to leverage open source is absolutely key in this. We would definitely not exist without it.



This article was reposted by JWM from the GREENWAVES Official Website. Original title: GAP8 performance versus ARM M7 on Embedded CNNs.


