
Multimodal Generative AI on Energy-Efficient Edge Processors

Dr. Sakyasingha Dasgupta


Edge computing will grow exponentially in the next few years, as more and more devices and applications demand low-latency, high-performance, and privacy-preserving computation at the edge of the network.

However, one of the biggest challenges facing edge computing is handling the increasing complexity and diversity of data sources and modalities, such as images, video, audio, text, speech, and sensor data. This challenge is where multimodal generative artificial intelligence (AI) comes into play.

Generative AI can be unimodal or multimodal. Multimodal AI enables applications to understand and generate rich, natural, human-like interactions, and to combine the complementary and redundant information carried by different modalities to improve the accuracy and robustness of results. Conversational agents, image captioning, video summarization, and emotion recognition are examples of applications already using multimodal generative AI.

However, multimodal AI poses significant challenges for edge computing, as it requires high computational power, large memory bandwidth, and complex algorithms to process and fuse multiple data streams in real time. Traditional edge devices, such as smartphones, cameras, and IoT sensors, are often constrained by limited battery life, storage capacity, and processing capability. These constraints drive the need for innovative solutions that enable multimodal AI at the edge without compromising performance, efficiency, or quality.

Enabling Devices with Multimodal Generative AI Inside

In this context, generative AI and large language models have the potential to redefine how we create and consume digital content. Devices can produce realistic and engaging text, images, audio and video from scratch or based on user input. Imagine a smart camera that can generate captions for live video streams, or a voice assistant that can synthesize natural speech from text, seamlessly in real-time.

As an example, DeepMind’s Flamingo visual language model accepts an interleaved prompt of text and visuals that guides it to solve a multimodal task, much as large language models (LLMs) handle a language task by processing task examples in their text prompt. After seeing just a few pairs of visual inputs and expected text responses, the model can generate an answer to a question about a new image or video.

More recently, OpenFlamingo, an open-source reproduction of the Flamingo model, was released. At its core, OpenFlamingo is a multimodal language model trained on a very large multimodal dataset; it can be used for a variety of tasks, such as generating image captions or asking and answering questions about an image and a text passage together.
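
To make the few-shot idea concrete, here is a minimal Python sketch of how an interleaved image-and-text prompt could be assembled for a Flamingo-style model. The "<image>" and "<|endofchunk|>" markers and the final model.generate() call are illustrative assumptions rather than the exact Flamingo or OpenFlamingo API; the point is the prompt structure: alternating example images and captions, followed by the query image.

```python
# Few-shot multimodal prompting, sketched: each (image, caption) pair conditions
# the model on the task format before it sees the query image.
examples = [
    ("dog.jpg", "An image of a dog playing in the snow."),
    ("cat.jpg", "An image of a cat sleeping on a sofa."),
]
query_image = "unknown.jpg"

# Interleave images and text the way Flamingo-style models are prompted.
image_paths = [path for path, _ in examples] + [query_image]
prompt = "".join(f"<image>{caption}<|endofchunk|>" for _, caption in examples)
prompt += "<image>An image of"  # the model continues this text for the query image

print(image_paths)
print(prompt)
# A visual language model would consume the images and the prompt together,
# e.g. model.generate(images=image_paths, text=prompt) -- hypothetical call.
```

The model continues the text after the final "<image>" marker, producing a caption for the new image in the same style as the two examples.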

 

Visual language model in a multimodal (images and text) generative AI context
Video credit: Flamingo

Energy-efficient edge AI processors will play a key role in enabling such multimodal language models on various devices, combined with innovative new techniques and technologies that can reduce the overall memory and power footprint. Some possible directions are:

  • Designing more compact and efficient generative AI models (including smaller language models) for multimodal contexts that fit on edge devices without compromising performance or accuracy.
  • Developing novel compression and quantization methods that reduce the size and complexity of generative AI and large language models without losing information or quality (a minimal quantization sketch follows this list).
  • Leveraging distributed and federated learning approaches to train and update such multimodal generative AI models on edge devices using local data and resources.
  • Exploring hybrid architectures that combine cloud and edge computing to optimize the trade-off between speed, quality, and cost of generative AI models.
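
As a concrete illustration of the compression-and-quantization direction above, the sketch below applies PyTorch's post-training dynamic quantization to a small, randomly initialized model. The model is a stand-in (real edge deployments start from a trained network and typically use hardware-specific quantization flows), but the size comparison shows the basic effect of storing weights in int8.

```python
import io
import torch
import torch.nn as nn

# Stand-in model; a real deployment would quantize a trained multimodal or
# language model rather than this randomly initialized MLP.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).eval()

# Post-training dynamic quantization: nn.Linear weights are stored as int8 and
# activations are quantized on the fly, shrinking the model and often speeding
# up CPU inference at a small accuracy cost.
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape)  # torch.Size([1, 10])

def size_mb(m: nn.Module) -> float:
    """Serialized size as a rough proxy for on-device memory footprint."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32: {size_mb(model):.2f} MB, int8: {size_mb(quantized):.2f} MB")
```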

Bringing multimodal generative AI to the edge can revolutionize entire industries. EdgeCortix is pioneering the technology for energy-efficient AI processors and software ready for the future of edge computing. We deliver advanced hardware and software tools for creating multimodal AI applications with high performance, low power consumption, and flexible programmability.

Putting AI in the Hands of More Edge Developers

EdgeCortix's vision is to empower the edge with multimodal AI capabilities for industries such as defense & security, smart cities, healthcare, education, entertainment and more. We believe that multimodal AI is the key to unlocking the full potential of edge computing. Two EdgeCortix product families are already making significant impacts.

  • Our SAKURA AI processors employ a novel architecture that combines heterogeneous cores, reconfigurable data paths, and a memory fabric to deliver scalable, adaptable performance across different data types and modalities. The heterogeneous cores consist of compute cores for deep learning, including convolutional and transformer models, vector cores for arithmetic tasks, and programmable general-purpose cores. The reconfigurable data paths enable custom hardware acceleration for specific algorithms or applications while maximizing compute utilization at low power. The memory fabric provides high-bandwidth, low-latency access to on-chip and off-chip memory resources.
  • Our MERA software stack provides a unified framework for easily and efficiently developing and deploying multimodal AI applications on heterogeneous edge systems. MERA consists of a compiler that optimizes code for the target device, a runtime that manages the execution of applications on the device, a library that provides common functions and algorithms for multimodal AI tasks, and a toolchain for debugging, profiling, and testing applications. A generic compile-then-deploy flow of this kind is sketched after this list.
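
MERA's own API is documented by EdgeCortix, so the sketch below deliberately uses a generic stand-in for the same compile-then-deploy pattern: a small PyTorch model is exported to ONNX (the ahead-of-time "compile" step) and executed with ONNX Runtime (the lightweight "runtime" step). This is not the MERA toolchain, only an illustration of the separation between compiling for a target and executing on it; a dedicated accelerator would plug in through its own execution provider instead of the CPU provider used here.

```python
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# Stand-in model; in practice this would be a trained vision or language network.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 32 * 32, 10),
).eval()

# "Compile" step: produce an ahead-of-time artifact the runtime can load.
dummy = torch.randn(1, 3, 32, 32)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# "Runtime" step: a lightweight session executes the compiled artifact.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
batch = np.random.randn(1, 3, 32, 32).astype(np.float32)
logits = session.run(["logits"], {"input": batch})[0]
print(logits.shape)  # (1, 10)
```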

Multimodal generative AI is a cutting-edge field demanding innovative solutions for performance, power-efficiency and quality issues at the edge. EdgeCortix is an edge AI company delivering such solutions with its groundbreaking SAKURA AI processors and MERA software. We are dedicated to enabling the edge with low-power multimodal AI capabilities that can serve various industries and domains well beyond today’s power-hungry and high-cost solutions intended for data centers. EdgeCortix is at the forefront of building a more intelligent, energy-efficient, and connected world with multimodal AI at the edge.



