Power Your Edge AI Application with the Industry’s Most Powerful Arm MCUs

Date: 2023-11-03 Source: RENESAS Blogs

The Internet of Things is exploding. Devices are connected and communicating with one another, enabled by ubiquitous wired and wireless connectivity. This hyper-connectivity allows the collection of massive amounts of data that can be harvested, analyzed, and used to make intelligent decisions. The ability to draw insights from data and make autonomous decisions based on these insights is the essence of Artificial Intelligence (AI). The combination of AI and IoT, or Artificial Intelligence of Things (AIoT), enables the creation of “intelligent” devices that learn from data and make decisions without human intervention.

There are several drivers of this trend to build intelligence on edge devices:

Decision-making on the edge reduces latency and cost associated with cloud connectivity and makes real-time operation possible
Lack of bandwidth to the cloud drives computing and decision-making on edge devices
Security is a key consideration – requirements for data privacy and confidentiality drive the need to process and store data on the device itself

AI at the edge, therefore, provides advantages of autonomy, lower latency, lower power, lower bandwidth requirements, lower costs, and higher security, all of which make it more attractive for new emerging applications and use cases.

The AIoT has opened up new markets for MCUs, enabling an increasing number of new applications and use cases that can use MCUs paired with some form of AI acceleration to facilitate intelligent control on edge and end-point devices. These AI-enabled MCUs provide a unique blend of DSP capability for computing and machine learning (ML) for inference and are being used in applications as diverse as keyword spotting, sensor fusion, and vibration analysis. Higher-performance MCUs enable more complex applications in vision and imaging such as face recognition, fingerprint analysis, and object detection.

Neural networks are used in AI / ML applications such as image classification, person detection, and speech recognition. They are the basic building blocks of machine learning algorithms and make extensive use of linear algebra operations such as dot products and matrix multiplications for inference processing, network training, and weight updates. As you might imagine, building AI into edge products requires significant computing capability on the processors. Designers of these new, emerging AI applications need to address demands for higher performance, larger memory, and lower power, all while keeping costs low. In the past, this was the purview of GPUs and MPUs with powerful CPU cores, large memory resources, and cloud connectivity for analytics. More recently, AI accelerators have become available that can offload this task from the main CPU. Other edge computing applications such as audio or image processing require support for fast multiply-accumulate operations. Often, designers opt to add a DSP to the system to handle the signal processing and computational tasks. All these options provide the high performance required, but they add significant cost to the system and tend to be more power hungry, and are thus not suitable for low-power and low-cost endpoint devices.
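To make the dot-product and multiply-accumulate workload concrete, here is a minimal scalar sketch in portable C. It is an illustration only, not code from Renesas or Arm: the function names `dot_q7` and `dense_q7` and the choice of 8-bit inputs with a 32-bit accumulator are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: dot product of two int8 vectors with a 32-bit
 * accumulator. Each loop iteration is one multiply-accumulate (MAC). */
int32_t dot_q7(const int8_t *a, const int8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* One fully connected layer without activation:
 * out[r] = bias[r] + dot(row r of weights, in).
 * weights is stored row-major, rows x cols. */
void dense_q7(const int8_t *weights, const int32_t *bias, const int8_t *in,
              int32_t *out, size_t rows, size_t cols)
{
    assert(weights != NULL && bias != NULL && in != NULL && out != NULL);
    for (size_t r = 0; r < rows; ++r) {
        out[r] = bias[r] + dot_q7(&weights[r * cols], in, cols);
    }
}
```

A layer with `rows` outputs and `cols` inputs costs `rows * cols` MACs in this form, which is why inference time on a scalar core grows so quickly with model size.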

How can MCUs fill the gap?

The availability of higher-performance MCUs allows low-cost, low-power edge AIoT to become a reality. AIoT is enabled by the higher compute capability of recent MCUs, as well as thin neural network models that are more suited for resource-constrained MCUs used in these end-point devices. AI on MCU-based IoT devices allows real-time decision-making and faster response to events and also brings the advantages of lower bandwidth requirements, lower power, lower latency, lower costs, and higher security than MPUs or DSPs. MCUs also offer faster wake times which enable faster time to inference and lower power consumption, as well as higher integration with memory and peripherals to help lower overall system costs for cost-sensitive applications.

The Cortex-M4 / M33-based MCUs can address the needs of simpler AI use cases such as keyword spotting and predictive maintenance tasks with lower performance needs. However, when it comes to more complex use cases such as vision AI (object detection, pose estimation, image classification) or voice AI (speech recognition, NLP), a more powerful processor is required. The older Cortex-M7 core can handle some of these tasks but the inference performance is low, typically only in the 2-4 fps range.

What is needed is a higher-performance microcontroller with AI acceleration.

Introducing the RA8 Series high-performance AI MCUs

The new RA8 series MCUs, featuring the Arm Cortex-M85 core based on the Armv8.1-M architecture with a 7-stage superscalar pipeline, provide the additional acceleration needed for compute-intensive neural network processing or signal processing tasks.

The Cortex-M85 is the highest-performance Cortex-M core and comes equipped with Helium™, the Arm M-Profile Vector Extension (MVE) introduced with the Armv8.1-M architecture. Helium is a Single Instruction Multiple Data (SIMD) vector processing instruction set extension that provides a performance uplift by processing multiple data elements with a single instruction, for example in repeated multiply-accumulate operations over an array of data. Helium significantly accelerates signal processing and machine learning in resource-constrained MCU devices and enables an unprecedented 4x acceleration in ML tasks and 3x acceleration in DSP tasks compared to the older Cortex-M7 core. Combined with large memory, advanced security, and a rich set of peripherals and external interfaces, the RA8 MCUs are ideally suited for voice and vision AI applications, as well as compute-intensive applications requiring signal processing support such as audio processing, JPEG decoding, and motor control.

What RA8 MCUs with Helium enable

The Helium performance boost comes from wide 128-bit vector registers that hold multiple data elements, all processed by a single instruction (SIMD). Multiple instructions may overlap in the pipeline execution stage. The Cortex-M85 is a dual-beat CPU core and can process two 32-bit data words in one clock cycle, as shown in Figure 1. A multiply-accumulate operation requires a load from memory into a vector register followed by the multiply-accumulate itself, and that multiply-accumulate can execute while the next data is being loaded from memory. This overlapping of loads and multiplies lets the CPU deliver double the performance of an equivalent scalar processor without the area and power penalties.
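The arithmetic behind the dual-beat description can be sketched with a toy model. This is an idealized back-of-the-envelope calculation, not a cycle-accurate description of the Cortex-M85: it assumes that a 128-bit vector instruction is split into four 32-bit beats, that two beats execute per cycle, and that a load and a multiply-accumulate overlap perfectly with no stalls. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { VECTOR_BITS = 128, BEAT_BITS = 32, BEATS_PER_CYCLE = 2 };

/* Number of elements (lanes) of a given width that fit in one vector. */
uint32_t lanes_per_vector(uint32_t element_bits)
{
    assert(element_bits == 8 || element_bits == 16 || element_bits == 32);
    return VECTOR_BITS / element_bits;
}

/* Cycles for one vector instruction executed on its own:
 * 4 beats at 2 beats per cycle. */
uint32_t cycles_per_vector_instruction(void)
{
    return (VECTOR_BITS / BEAT_BITS) / BEATS_PER_CYCLE;
}

/* Idealized cycle count for n_pairs back-to-back (load, MAC) pairs.
 * overlapped == 0: each instruction waits for the previous one to finish.
 * overlapped != 0: the second half of each instruction runs alongside the
 * first half of the next one, so one instruction starts every cycle and
 * only the last instruction adds its remaining cycle(s). */
uint32_t ideal_cycles(uint32_t n_pairs, int overlapped)
{
    uint32_t n_instr = 2u * n_pairs;
    uint32_t per_instr = cycles_per_vector_instruction();
    if (n_instr == 0u) {
        return 0u;
    }
    if (!overlapped) {
        return n_instr * per_instr;
    }
    return n_instr + (per_instr - 1u);
}
```

Under these assumptions, eight load and multiply-accumulate pairs take 32 cycles without overlap and 17 with it, so the overlapped count approaches half of the non-overlapped one as the loop gets longer. Real code will see stalls from memory latency and data dependencies that this model ignores.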

Figure 1: CM85 is a dual-beat CPU, meaning two 32-bit words can be processed per clock cycle

Helium introduces 150 new scalar and vector instructions for the acceleration of signal processing and machine learning including:

Low Overhead Branch Extension (LOBE) for optimized branch and loop operations
Lane predication that allows conditional execution of each lane in a vector
Vector gather-load and scatter-store instructions for reads and writes to non-contiguous memory locations, useful in the implementation of circular buffers
Arithmetic operations on complex numbers such as add, multiply, and rotate used in DSP algorithms
DSP functions such as circular buffers for FIR filters, bit-reversed addressing for FFT implementations, and format conversion in image and video processing
Polynomial math that supports finite field arithmetic, cryptographic algorithms, and error correction
Support for 8-, 16-, and 32-bit fixed-point and integer data used in audio/image processing and ML, and half-, single-, and double-precision floating-point data used in signal processing
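Lane predication from the list above can be pictured with a portable C emulation. This sketch does not use Helium intrinsics and is not how the hardware is programmed. It only mimics the idea: a loop walks the data in chunks of four 32-bit lanes and a per-lane predicate mask switches off the lanes that fall past the end of the array. The names `tail_predicate` and `add_predicated` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* four 32-bit lanes in one 128-bit vector */

/* Predicate mask for a chunk: bit i set means lane i is active.
 * With fewer than LANES elements remaining, only the low lanes are on. */
uint32_t tail_predicate(size_t remaining)
{
    return (remaining >= LANES) ? ((1u << LANES) - 1u)
                                : ((1u << remaining) - 1u);
}

/* out[i] = a[i] + b[i], processed in 4-lane chunks.
 * Inactive lanes are neither read nor written, so no scalar
 * clean-up loop is needed for the last partial chunk. */
void add_predicated(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t base = 0; base < n; base += LANES) {
        uint32_t pred = tail_predicate(n - base);
        for (size_t lane = 0; lane < LANES; ++lane) {
            if (pred & (1u << lane)) {
                out[base + lane] = a[base + lane] + b[base + lane];
            }
        }
    }
}
```

With six elements, the first chunk runs with all four lanes active and the second with only two, which is the situation predication is meant to handle without a separate tail loop.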

These features make a Helium-enabled MCU particularly well suited to AI / ML and DSP-style tasks without an additional DSP or hardware AI accelerator in the system, which also lowers cost and power consumption.
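As one illustration of the DSP-style tasks mentioned above, the following is a minimal scalar FIR filter that keeps its sample history in a circular buffer. It is a plain C reference sketch under assumed parameters (four taps, 16-bit samples, 32-bit accumulator), with hypothetical names `fir_state_t`, `fir_init`, and `fir_step`. On a Helium-enabled core the wrap-around indexing and the multiply-accumulate loop are the parts that vector instructions would be expected to speed up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 4

typedef struct {
    int16_t history[FIR_TAPS]; /* most recent samples, stored circularly */
    size_t head;               /* slot where the next sample is written */
} fir_state_t;

/* Clear the sample history. */
void fir_init(fir_state_t *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < FIR_TAPS; ++i) {
        s->history[i] = 0;
    }
    s->head = 0;
}

/* Push one sample and return y[n] = sum over k of coeffs[k] * x[n - k]. */
int32_t fir_step(fir_state_t *s, const int16_t coeffs[FIR_TAPS], int16_t sample)
{
    assert(s != NULL && coeffs != NULL);
    s->history[s->head] = sample;

    int32_t acc = 0;
    size_t idx = s->head; /* start at the newest sample, walk backwards */
    for (size_t k = 0; k < FIR_TAPS; ++k) {
        acc += (int32_t)coeffs[k] * (int32_t)s->history[idx];
        idx = (idx == 0) ? (FIR_TAPS - 1) : (idx - 1); /* circular wrap */
    }

    s->head = (s->head + 1) % FIR_TAPS;
    return acc;
}
```

Feeding a single 1 followed by zeros returns the coefficients one by one, which is a quick way to check that the circular indexing is correct.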

Voice AI Application with RA8M1 MCUs

RENESAS has successfully demonstrated this performance uplift with Helium in a few AI / ML use cases, showing significant improvement over a Cortex-M7 MCU, more than 3.6x in some cases. One such application is a voice command recognition use case that runs on the RA8M1 and implements a deep neural network (DNN) that is trained with thousands of diverse voices and supports over 40 languages. This voice application is an enhancement over simple keyword spotting and supports a modified form of Natural Language Understanding (NLU) that does not depend only on the command word or phrase, but instead looks for intent. This enables the use of more natural language without having to remember exact keywords or phrases.

The voice implementation makes use of the SIMD instructions available on the Cortex-M85 core with Helium. RA8M1 is a natural fit for these kinds of voice AI solutions with its large memory, support for audio acquisition, and above all, the high performance and ML acceleration enabled by the Cortex-M85 core and Helium. Even a preliminary implementation of this solution, measured both without and with Helium, demonstrates more than 2x inference performance improvement over a Cortex-M7-based MCU, as shown in Figure 2.

Figure 2: Voice AI Application on the RA8M1 MCU demonstrates performance improvements of CM85 over CM7, without and with Helium

As is evident, RA8 MCUs with Helium can significantly improve neural network performance without the need for any additional hardware acceleration, thus providing a low-cost, low-power option for the implementation of simpler AI and machine learning use cases.


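This article was reposted by 翊翊所思 from RENESAS Blogs. The original title is "Power Your Edge AI Application with the Industry’s Most Powerful Arm MCUs". All reposted articles on this site are shared to pass on more information, and the source is clearly credited. Media or individuals who do not wish to be reposted may contact us, and we will remove the content immediately.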
