Have you heard about our newly announced SAKURA-II platform?
In this blog, you will discover the key advantages of this exciting new Edge AI platform. You will learn how the SAKURA-II platform offers the best balance of complex model processing, seamless software integration, efficient processing, and minimized power consumption for the edge AI market. We will share how the complete platform, comprising the SAKURA-II AI Accelerator optimized for Generative AI at the edge, our proprietary Dynamic Neural Accelerator (DNA) architecture, the heterogeneous MERA compiler, and our selection of Modules and Cards, can help you easily integrate AI functionality into your next design.
Many AI accelerators in the market can only process convolutional workloads. Today, the ability to handle transformer workloads is critical, as most new applications need to process highly complex large models, such as language and vision models. SAKURA-II easily processes these Generative AI workloads while maintaining extremely high energy efficiency, which is critical at the edge where resources are limited. The SAKURA-II platform is tailored specifically for processing Generative AI workloads at the edge with low power consumption of less than 10W.
GPUs are adept at handling multi-billion parameter models such as Llama 2, Stable Diffusion, DETR, and ViT. However, with utilization often in the range of 40%, GPUs consume a large amount of power while processing these large Generative AI models. Conversely, many AI accelerators cannot process these complex models at all. SAKURA-II combines the ability to process these complex models at up to 90% efficiency with a typical power envelope of 10W. This combination of efficient processing and low power makes SAKURA-II ideal for Generative AI at the edge.
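As a rough illustration of why utilization matters, the throughput a model actually sees is the peak rating scaled by utilization. A minimal sketch (the 40% and 90% figures come from this post; using a shared 60 TOPS peak for both devices is purely an assumption for comparison):

```python
def effective_tops(peak_tops: float, utilization: float) -> float:
    """Throughput actually delivered to the model: peak rating scaled
    by how much of the hardware the workload keeps busy."""
    return peak_tops * utilization

# Illustrative only: same hypothetical 60 TOPS peak, different utilization.
gpu_eff = effective_tops(60, 0.40)  # 24.0 effective TOPS at ~40% utilization
acc_eff = effective_tops(60, 0.90)  # 54.0 effective TOPS at ~90% utilization
```

At equal peak ratings, the 90%-efficient device delivers more than twice the useful throughput, which is where the power advantage comes from.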
One of the first challenges designers face when implementing AI processing at the edge is choosing a software platform that can support everything they need to do. Often, designers have already chosen a CPU, so a platform that supports popular CPUs while offering AI functionality is paramount. The EdgeCortix MERA compiler framework meets this need, supporting Intel, AMD, Arm, and RISC-V CPUs while offering easy integration of the SAKURA-II AI Accelerator. Pre-defined, optimized models can be sourced directly from Hugging Face or our Model Library, then calibrated and quantized. MERA leverages Apache TVM and MLIR functionality, and its front end is open source.
An often overlooked limitation in AI processing at the edge is the inability of many AI accelerators to manage memory resources effectively. In Generative AI applications like Large Language Models (LLMs) and Large Vision Models (LVMs), memory access is critical to attaining the level of performance needed to process these complex models. When memory bandwidth is limited, it caps the accelerator's ability to transfer the data needed to process models and provide timely results. Our SAKURA-II AI Accelerator provides up to four times the DRAM bandwidth of other accelerators, as much as 68 GB/sec, ensuring superior performance for these complex LLMs and LVMs.
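To see why bandwidth caps performance, consider autoregressive LLM decoding, where every generated token must stream the model weights from DRAM. A back-of-envelope sketch (the 68 GB/sec figure is from this post; the 3B-parameter INT8 model is a hypothetical example, and KV-cache traffic and compute overlap are ignored):

```python
def decode_tokens_per_sec(bandwidth_gb_s: float,
                          params_billions: float,
                          bytes_per_param: float) -> float:
    """Upper bound on autoregressive decode rate when every token must
    stream all weights from DRAM (the memory-bandwidth-bound regime)."""
    weight_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / weight_gb

# 68 GB/s feeding a hypothetical 3B-parameter model quantized to INT8 (3 GB):
rate = decode_tokens_per_sec(68, 3, 1)  # ~22.7 tokens/s ceiling
```

Halving the bandwidth halves this ceiling regardless of compute, which is why LLM and LVM inference at the edge lives or dies on DRAM bandwidth.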
Many applications implementing AI processing at the edge are extremely time-sensitive. Applications including traffic management, face and object recognition, and security control are some of the many examples where an inability to analyze and respond immediately to inputs could result in disastrous outcomes, even to the point of threatening human lives. Any AI accelerator or GPU that cannot respond in real time is unsuitable for these types of critical applications. Our SAKURA-II AI Accelerator is optimized for very low-latency operation under real-time conditions, making it ideal for designs where an immediate response is vital to ensure the safety of all.
Many AI accelerators support only fixed-point precision, typically INT8, limiting them to convolutional applications. In the ever-evolving AI market, support for floating-point operations increases the accuracy and efficiency of AI applications. Efficient AI processing should use both fixed- and floating-point precision, choosing the best representation for the given data sets and models. The SAKURA-II AI Accelerator, operating in conjunction with our MERA compiler framework, provides full mixed-precision support for near-FP32 accuracy and selects the proper precision type to ensure the most efficient AI processing at the edge.
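To make the precision trade-off concrete, here is a minimal NumPy sketch of symmetric INT8 quantization (a generic illustration, not the MERA quantization flow): the rounding error is bounded by half the quantization step, and tensors where that error is unacceptable are exactly the ones a mixed-precision scheme keeps in floating point.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
x = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(x)

# Reconstruction error from rounding is at most half a quantization step.
err = np.abs(x - q.astype(np.float32) * scale).max()
```

Layers whose accuracy collapses under this error stay in floating point, while the rest run in INT8, giving near-FP32 accuracy at fixed-point cost.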
In real-world AI applications that model complex phenomena, data sets often contain a large number of zero elements. Without efficient memory handling, performance suffers as the accelerator, GPU, or CPU spends time fetching data that contributes nothing to the result. Parallelizing and compacting this memory yields a significant performance improvement. The SAKURA-II AI Accelerator, driven by our Dynamic Neural Accelerator (DNA) architecture, efficiently handles zero elements and reduces the size of the data set. The end result is a substantially reduced memory footprint and optimized memory bandwidth, enabling more efficient AI processing, up to 60 TOPS.
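The idea behind compacting zero elements can be sketched with a simple sparse layout that stores only the nonzero values and their indices (a generic illustration of the principle, not the DNA architecture's actual on-chip format):

```python
import numpy as np

def compact(x: np.ndarray):
    """Store only nonzero values plus their positions (a simple sparse format)."""
    idx = np.flatnonzero(x).astype(np.int32)
    return idx, x.flat[idx]

def expand(idx, vals, size):
    """Rebuild the dense array from the compacted (index, value) pairs."""
    out = np.zeros(size, dtype=vals.dtype)
    out[idx] = vals
    return out

x = np.array([0.0, 1.5, 0.0, 0.0, -2.0, 0.0, 3.0, 0.0], dtype=np.float32)
idx, vals = compact(x)
# 8 dense elements shrink to 3 (index, value) pairs; expand() restores them.
```

With heavily sparse activations or weights, the hardware moves and multiplies only the nonzero entries, which is where the memory-footprint and bandwidth savings come from.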
One of the main challenges in classification arises when the data set cannot be easily differentiated. For complex data sets, a variety of activation functions may be required. The SAKURA-II AI Accelerator platform has built-in support for popular activation functions. The key advantage of SAKURA-II is the additional ability to emulate any activation function without a chip redesign, a capability other solutions do not offer. As a result, your end system is fully future-proofed against changing data sets that the built-in activation functions may not cover.
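Activation emulation can be illustrated with a lookup table plus linear interpolation, one common way hardware approximates an arbitrary function without dedicated circuitry (a conceptual sketch, not SAKURA-II's actual mechanism):

```python
import math
import numpy as np

def make_lut(fn, lo=-8.0, hi=8.0, n=256):
    """Sample an arbitrary activation function into a lookup table."""
    xs = np.linspace(lo, hi, n)
    return xs, np.array([fn(v) for v in xs])

def lut_activation(x, xs, ys):
    """Evaluate the activation by linear interpolation between table entries."""
    return np.interp(x, xs, ys)

# Emulate GELU, defined here via the Gaussian error function.
gelu = lambda v: 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))
xs, ys = make_lut(gelu)

x = np.linspace(-4, 4, 100)
max_err = np.abs(lut_activation(x, xs, ys) - np.array([gelu(v) for v in x])).max()
# A 256-entry table tracks GELU to within a small interpolation error.
```

Swapping in a new activation is just reloading the table, which is the sense in which the function can change without a chip redesign.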
With the significant increase in Generative AI applications requiring state-of-the-art complex transformer models, efficient deep neural networks may reorder tensor elements with Reshape->Transpose operations before feeding them into computational operators. Without this functionality, host intervention is required, and the additional communication with the host CPU results in increased latency and delayed execution. With SAKURA-II, EdgeCortix has implemented a dedicated ‘Reshaper’ hardware block, which performs these data shuffling operations on the tensors using onboard DDR memory without host CPU intervention. This allows the entire network to be processed solely on the SAKURA-II device, freeing up PCIe bandwidth and reducing the load on the host CPU. The result is lower-latency execution and improved overall network efficiency, critical for these advanced Generative AI applications at the edge.
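The Reshape->Transpose pattern itself is easy to see in NumPy: a reshape splits a packed feature axis, and a transpose reorders it so each attention head becomes a contiguous operand for the next matmul (a generic illustration of the operator pattern, not the Reshaper block's implementation):

```python
import numpy as np

# A (batch, seq, heads * dim) activation, as produced by a fused QKV-style
# projection in a transformer layer.
batch, seq, heads, dim = 2, 4, 3, 5
x = np.arange(batch * seq * heads * dim, dtype=np.float32).reshape(batch, seq, heads * dim)

# Reshape splits the feature axis into (heads, dim); transpose moves `heads`
# ahead of `seq` so each head is a contiguous (seq, dim) matrix for the
# attention matmuls that follow.
per_head = x.reshape(batch, seq, heads, dim).transpose(0, 2, 1, 3)
# per_head.shape == (2, 3, 4, 5)
```

Performing this shuffle in dedicated hardware means the tensor never has to round-trip over PCIe to the host just to be reordered.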
For AI processing at the edge, meeting specific power requirements is critical to creating a successful product. Whether the system is battery-powered or AC-powered, overall power consumption affects the cost of daily operation, long-term reliability, and replacement and repair costs. Even if a GPU or accelerator meets the AI processing requirements, a device that consumes too much power is untenable for edge applications. The SAKURA-II platform operates within a power envelope of just 10W, making it ideal for most edge applications. In addition, SAKURA-II uses built-in, automatic on-chip power gating to minimize power consumption and lets users shut down parts of the DNA engines to optimize system power.
We are now accepting pre-orders for four development options to get you started on your next AI design. Here are the options:
You can learn more about the architecture and specifications of these modules and cards on our Modules and Cards Page.
To place your pre-order, please visit our Pre-Order Page.
In this blog, you have learned about the many ways that the SAKURA-II platform can optimize and augment your next AI design at the edge. We have shared how the SAKURA-II platform offers the best balance of complex model processing, seamless software integration, efficient processing, and minimized power consumption for the edge AI market. You have also seen the options available for pre-order to get ready to implement this exciting new technology in your next AI design.
In addition, we encourage you to explore and learn more about the SAKURA-II platform. Learn more about the components in the platform on our Product Overview Page. Explore the SAKURA-II device details on the SAKURA-II Page. Learn more about our MERA Compiler and Software Framework on the MERA Page. Discover our Dynamic Neural Accelerator run-time reconfigurable architecture on the DNA Page. You can also download the product briefs here: SAKURA-II AI Accelerator Product Brief; SAKURA-II M.2 Modules Brief; and SAKURA-II PCIe Cards Brief.
Michael is EdgeCortix’s Director of Product Marketing. Michael has four decades of experience in product marketing for both large-scale worldwide semiconductor companies and small start-up companies in the early stages of emerging markets. His experience includes the integration and marketing of complete product solutions, including silicon, software, and development platforms. Michael holds an MBA from Santa Clara University and a BSEE from California Polytechnic State University, San Luis Obispo.