Codasip L31 and L11 RISC-V cores for AI/ML support TFLite Micro, customizations
Codasip has announced the L31 and L11, two low-power embedded RISC-V processor cores that can be customized for AI/ML IoT edge applications with power and size constraints.
The company further explains that the new L31/L11 RISC-V cores can run Google’s TensorFlow Lite for Microcontrollers (TFLite Micro) and can be optimized for specific applications via Codasip Studio RISC-V design tools. As I understand it, this can be done by customers themselves through a full architecture license, as Codasip CTO Zdeněk Přikryl said:
Licensing the CodAL description of a RISC-V core gives Codasip customers a full architecture license allowing both the ISA and the microarchitecture to be customized. The new L11/31 cores make it even easier to add customer-requested features, such as state-of-the-art artificial intelligence, into the smallest and most energy-efficient embedded processor designs.
The ability to customize cores is important for AI and ML applications because data types, quantization, and performance requirements differ significantly from application to application, and standard processors may not be optimized for a specific task.
We aren’t given many details about the new cores, except that they all come with a 3-stage pipeline. The Codasip L31/L31F (with FPU) use the RV32IMC instruction set and offer 32 registers and a parallel multiplier, while the Codasip L11 relies on the RV32EMC instruction set and comes with 16 registers and a sequential multiplier. They also replace the older Codasip L30(F) and L10 cores, which are no longer recommended for new designs.
Codasip discusses the benefits of using TFLite-Micro and customization in a white paper titled “Embedded AI on L-Series Cores – Neural networks empowered by custom instructions” (registration required, but you can use a fake email). They used “MNIST handwritten digit classification” as an example and compared various implementations in terms of cycles, power and area.
In that comparison, the L31 with FPU (L31F) is much faster and consumes much less power, but it would make for a much bigger chip. One solution is to use the L31 with quantization of the neural network parameters and input data, which is supported by TFLite-Micro. This gives almost the same performance as the hardware FPU solution, even lower power consumption, and the same area since the chip does not need to be modified. Going from floating point to integer had a negligible impact on accuracy: 98.91% (fp32) versus 98.89% (int8) over a set of 10,000 frames.
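To make the int8 option more concrete, here is a minimal C sketch of the affine scheme used for 8-bit quantized tensors in TensorFlow Lite, where a real value is approximated as scale * (q - zero_point). The QuantParams struct and the quantize/dequantize function names are made up for this illustration and are not part of the TFLite Micro API. In a real model, the scale and zero point come from the converted model file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: real_value ~= scale * (q - zero_point). */
typedef struct {
    float scale;        /* step size between adjacent int8 values, must be > 0 */
    int32_t zero_point; /* int8 value that represents real 0.0 */
} QuantParams;

/* Map a float to int8: scale, shift, clamp to [-128, 127], round to nearest. */
int8_t quantize(float x, QuantParams p) {
    assert(p.scale > 0.0f);
    float v = x / p.scale + (float)p.zero_point;
    if (v < -128.0f) v = -128.0f;
    if (v > 127.0f) v = 127.0f;
    /* The cast truncates toward zero, so add +/-0.5 first to round. */
    return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Map an int8 back to the real value it approximates. */
float dequantize(int8_t q, QuantParams p) {
    return p.scale * (float)((int32_t)q - p.zero_point);
}
```

Once weights and inputs are stored this way, the inner loops of the network only need integer loads, multiplies and adds, which is why the plain L31 can get close to the FPU version without any change to the hardware.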
So the best compromise is to use L31 with TFLite-Micro, but to further optimize the design, they profiled the program with Codasip Studio to locate the (C) code and associated instructions that consume the most cycles.
To optimize vector memory loads and convolutional multiply and accumulate sequences, they added two custom instructions:
- mac3 to combine multiplication and addition in a single clock cycle (speeds up the fourth line of the profiled listing)
- lb.ft to increment the address immediately after the load instruction (merges lines 2 and 3 of the same listing)
The new instructions appear in the profiles, and the whole loop consumes far fewer cycles. Specifically, this resulted in 10% fewer cycles and an 8% reduction in energy consumption. The new custom instructions increased the area, but only by 0.8%.
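The white paper's profiled listing is not reproduced here, so as a rough illustration, the kind of loop those two instructions target can be sketched in C as an int8 dot product. The dot_i8 function is a hypothetical example, not Codasip's code, and the comments only reflect the description above, since the exact operands and encodings of mac3 and lb.ft are not given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic int8 multiply-accumulate loop, as found in convolution kernels. */
int32_t dot_i8(const int8_t *w, const int8_t *x, size_t n) {
    assert(n == 0 || (w != NULL && x != NULL));
    int32_t acc = 0;
    while (n--) {
        int8_t a = *w; /* byte load */
        w++;           /* address increment: a load with post-increment,
                          as described for lb.ft, would fold this into the load */
        int8_t b = *x; /* byte load */
        x++;           /* same pattern for the second operand */
        /* Multiply, then add: on a base RV32IM core this is typically a
           mul followed by an add; an instruction like mac3 would do both. */
        acc += (int32_t)a * (int32_t)b;
    }
    return acc;
}
```

A loop like this runs once for every weight in the network, so removing one or two instructions per iteration adds up, which is consistent with the cycle savings reported for a small area increase.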
TFLite-Micro support is new to Codasip’s RISC-V microcontrollers, but has now been added to all of their cores.
Core evaluation can be performed on a Digilent Nexys A7 FPGA board running bare metal code or an RTOS such as FreeRTOS. More details about the L31 and L11 RISC-V cores can be found on the Codasip website and in the press release.
Jean-Luc started CNX Software in 2010 on a part-time basis, before stepping down as Director of Software Engineering and starting to write daily news and reviews full-time later in 2011.