One of the biggest challenges facing edge computing is handling the increasing complexity and diversity of data sources and modalities, such as images, video, audio, text, speech, and sensor data. This challenge is where multimodal generative artificial intelligence (AI) comes into play.
Generative AI can be unimodal or multimodal. Multimodal AI enables applications to understand and generate rich, natural, human-like interactions, leveraging the complementary and redundant information from different modalities to improve the accuracy and robustness of the results. Applications already using multimodal generative AI include conversational agents, image captioning, video summarization, and emotion recognition.
However, multimodal AI poses significant challenges for edge computing: it requires high computational power, large memory bandwidth, and complex algorithms to process and fuse multiple data streams in real time. Traditional edge devices, such as smartphones, cameras, and IoT sensors, are often constrained by limited battery life, storage capacity, and processing capability. These constraints drive a need for innovative solutions that enable multimodal AI at the edge without compromising performance, efficiency, or quality.
In this context, generative AI and large language models have the potential to redefine how we create and consume digital content. Devices can produce realistic, engaging text, images, audio, and video from scratch or based on user input. Imagine a smart camera that generates captions for live video streams, or a voice assistant that synthesizes natural speech from text, seamlessly and in real time.
As an example, DeepMind’s recent Flamingo visual language model takes interleaved text and visual inputs that guide it to solve a multimodal task, much as large language models (LLMs) handle a language task by processing task examples in their text prompt. After seeing just a few pairs of visual inputs and expected text responses, the model can generate an answer when asked a question about a new image or video.
More recently, OpenFlamingo, an open-source reproduction of the Flamingo model, was released. At its core, OpenFlamingo is a multimodal language model trained on a very large multimodal dataset. It can be used for a variety of tasks, such as generating image captions or answering a question given an image and a text passage together, as sketched in the example below.
Visual language model in multimodal (images and text) generative AI context
Video credit: Flamingo
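To make the few-shot interface concrete, here is a minimal sketch of image captioning with the open-source open_flamingo package, adapted from the project's published usage example. The checkpoint name, image URLs, and generation settings are illustrative and may differ across releases.

```python
# A minimal sketch of few-shot captioning with OpenFlamingo, adapted
# from the project's published example (checkpoint and URLs illustrative).
import requests
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b",
    tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b",
    cross_attn_every_n_layers=1,
)
ckpt = hf_hub_download("openflamingo/OpenFlamingo-3B-vitl-mpt1b", "checkpoint.pt")
model.load_state_dict(torch.load(ckpt), strict=False)

def fetch(url: str) -> Image.Image:
    return Image.open(requests.get(url, stream=True).raw)

# Two captioned demonstration images plus one query image.
images = [
    fetch("http://images.cocodataset.org/val2017/000000039769.jpg"),
    fetch("http://images.cocodataset.org/test-stuff2017/000000028137.jpg"),
    fetch("http://images.cocodataset.org/test-stuff2017/000000028352.jpg"),
]

# vision_x shape: (batch, num_images, frames, channels, height, width).
vision_x = torch.stack([image_processor(im) for im in images])
vision_x = vision_x.unsqueeze(1).unsqueeze(0)

# Each <image> token binds to the next image; <|endofchunk|> closes one
# (image, text) pair, so the first two pairs act as in-context examples.
tokenizer.padding_side = "left"
lang_x = tokenizer(
    [
        "<image>An image of two cats.<|endofchunk|>"
        "<image>An image of a bathroom sink.<|endofchunk|>"
        "<image>An image of"
    ],
    return_tensors="pt",
)

generated = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)
print(tokenizer.decode(generated[0]))
```

The model completes the third (image, text) pair from just two demonstrations, the same in-context behavior the Flamingo work demonstrates at much larger scale.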
Energy-efficient edge AI processors will play a key role in enabling such multimodal language models on a variety of devices, combined with innovative techniques that reduce the overall memory and power footprint, such as model quantization, pruning, and hardware-software co-design.
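As one illustration of the footprint reduction these techniques target, here is a minimal sketch of post-training dynamic quantization with PyTorch, which stores the weights of Linear layers as int8 and roughly quarters their serialized size. The toy model is a stand-in for a transformer feed-forward block, not an EdgeCortix tool or a full multimodal network.

```python
# A minimal sketch: post-training dynamic quantization with PyTorch.
# The model is a toy stand-in for a transformer feed-forward block,
# not an EdgeCortix product or a full multimodal network.
import io
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
)

# Replace Linear layers with int8 dynamically-quantized equivalents;
# weights are stored in 8 bits and activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_mb(m: nn.Module) -> float:
    """Size of the model's state_dict when serialized, in megabytes."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.tell() / 1e6

print(f"fp32 weights: {serialized_mb(model):.1f} MB")      # ~33.6 MB
print(f"int8 weights: {serialized_mb(quantized):.1f} MB")  # roughly a quarter
```

Dynamic quantization is only one point in the design space; static quantization, pruning, and distillation trade accuracy, latency, and memory differently, which is why hardware-software co-design matters at the edge.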
Bringing multimodal generative AI to the edge can revolutionize entire industries. EdgeCortix is pioneering energy-efficient AI processors and software ready for the future of edge computing, delivering advanced hardware and software tools for creating multimodal AI applications with high performance, low power consumption, and flexible programmability.
EdgeCortix's vision is to empower the edge with multimodal AI capabilities for industries such as defense & security, smart cities, healthcare, education, entertainment and more. We believe that multimodal AI is the key to unlocking the full potential of edge computing. Two EdgeCortix product families are already making significant impacts.
Multimodal generative AI is a cutting-edge field that demands innovative solutions to the performance, power-efficiency, and quality challenges at the edge. EdgeCortix is an edge AI company delivering such solutions with its groundbreaking SAKURA AI processors and MERA software. We are dedicated to enabling the edge with low-power multimodal AI capabilities that can serve various industries and domains well beyond today’s power-hungry and high-cost solutions intended for data centers. EdgeCortix is at the forefront of building a more intelligent, energy-efficient, and connected world with multimodal AI at the edge.
Dr. Sakyasingha Dasgupta is the founder and CEO of the EdgeCortix group companies. He is an artificial intelligence (AI) and machine learning technologist, entrepreneur, and engineer with over a decade of experience taking cutting-edge AI research from the ideation stage to scalable products across different industry verticals. Having led teams at global companies like Microsoft and IBM Research / IBM Japan, along with national research labs like RIKEN Japan and the Max Planck Institute in Germany, in his more recent roles prior to founding EdgeCortix he helped establish and lead the technology divisions at lean startups in Japan and Singapore in the semiconductor technology, robotics & autonomous vehicles, and fintech sectors.

After more than a decade of research and development in diverse areas such as brain-inspired computing, robotics, computer vision, AI acceleration on semiconductors, wearable devices, the Internet of Things, and machine learning in finance and healthcare, Sakya founded EdgeCortix in 2019 as a fabless semiconductor design company focused on enabling energy-efficient edge intelligence. EdgeCortix has its R&D headquarters and semiconductor design team based in Tokyo, Japan, working on the radical idea of taking a software-first approach while designing an AI-specific reconfigurable processor from the ground up, using a patented technique called "hardware & software co-exploration". Targeting advanced computer vision applications first, using software IP on existing processors like FPGAs and custom ASIC designs, EdgeCortix is positively disrupting the rapidly growing AI semiconductor space across defense, security, aerospace, smart cities, Industry 4.0, autonomous vehicles, and robotics.

Sakya holds a Ph.D. in Physics of Complex Systems from the Max Planck Institute in Germany, along with a Master's in Artificial Intelligence from The University of Edinburgh, U.K. Prior to founding EdgeCortix, he also completed entrepreneurship studies at the MIT Sloan School of Management. He holds over 20 patents worldwide, and his research has garnered over 1,000 citations.