On-device small language models with multimodality, RAG, and Function Calling
By cuuupid, via Hacker News

Google AI Edge introduces on-device small language models that support multimodality, retrieval-augmented generation (RAG), and function calling. These models let developers build intelligent applications that run locally on the device, preserving privacy and reducing latency.
Key Points
- Small language models (SLMs) enable on-device AI inference with reduced latency, lower power consumption, and improved privacy by processing data locally, without cloud connectivity
- Multimodality support allows SLMs to process and understand multiple input types (text, images, audio) simultaneously, for richer context and more intelligent responses
- Retrieval-Augmented Generation (RAG) improves SLM accuracy by dynamically fetching relevant external knowledge and documents to augment model responses in real time
- Function calling enables SLMs to invoke external APIs and tools, letting models take actions beyond text generation and integrate with application workflows
- On-device deployment reduces infrastructure costs, eliminates network latency, and keeps sensitive user data on the local device
- SLMs are optimized for edge devices (smartphones, IoT, embedded systems) with limited computational resources while maintaining competitive performance
- Combining multimodality, RAG, and function calling creates intelligent agents capable of understanding context, retrieving information, and executing complex tasks autonomously
- On-device models enable offline functionality and work in environments with unreliable or unavailable internet connectivity
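The RAG point above boils down to a simple pipeline: embed the query, retrieve the closest local document, and prepend it to the prompt the model sees. A minimal sketch of that flow, using a toy bag-of-words "embedding" and a made-up prompt template as stand-ins (a real on-device stack would use a learned embedder and the model's own prompt format, neither of which is specified in the article):

```python
# Minimal RAG sketch: retrieve the most similar local document and
# augment the prompt with it. All names here are illustrative, not
# any Google AI Edge API.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a learned embedder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the stored document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def augment_prompt(query: str, docs: list[str]) -> str:
    """Build the context-augmented prompt the SLM would actually see."""
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The warranty covers battery replacement for two years.",
    "Flight mode disables all radios including Bluetooth.",
]
print(augment_prompt("how long is the battery warranty", docs))
```

Because both the document store and the similarity search live on the device, this keeps the privacy and offline properties the article emphasizes: no query or document ever leaves the phone.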
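Function calling, as described above, means the model emits a structured request and the application dispatches it to real code. A sketch of that dispatch loop, assuming (purely for illustration) that the model's tool calls arrive as JSON with `name` and `arguments` fields; the actual wire format is model-specific:

```python
# Function-calling sketch: register app functions, then route the
# model's structured output to the named function. The JSON shape is
# an assumption for this example, not a documented format.
import json
from typing import Callable

TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Decorator registering a function the model is allowed to call."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def set_alarm(hour: int, minute: int) -> str:
    """Example app action exposed to the model."""
    return f"alarm set for {hour:02d}:{minute:02d}"

def dispatch(model_output: str) -> str:
    """Parse the model's tool-call JSON and invoke the named tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# An SLM prompted with the tool schema might emit:
model_output = '{"name": "set_alarm", "arguments": {"hour": 7, "minute": 30}}'
print(dispatch(model_output))  # alarm set for 07:30
```

The registry pattern keeps the model sandboxed: it can only invoke functions the app explicitly exposed, which matters when the "tools" touch device hardware or user data.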
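The "intelligent agents" point combines the pieces: each agent step feeds the model a query plus retrieved context, then either returns the model's plain-text answer or executes the tool call it emitted. A sketch of one such step, where `run_model` is a hypothetical stand-in for an on-device SLM invocation (here it fakes a tool call so the example is self-contained):

```python
# Agent-loop sketch: prompt -> model -> (tool call | direct answer).
# run_model is a hypothetical placeholder for the on-device SLM.
import json

def run_model(prompt: str) -> str:
    # Faked model output for demonstration purposes only.
    return '{"tool": "lookup_order", "arguments": {"order_id": "A1"}}'

def lookup_order(order_id: str) -> str:
    """Example tool the agent can invoke."""
    return f"order {order_id}: shipped"

TOOLS = {"lookup_order": lookup_order}

def agent_step(query: str, context: str) -> str:
    """One agent iteration over retrieved context."""
    prompt = f"Context: {context}\nUser: {query}"
    out = run_model(prompt)
    try:
        call = json.loads(out)
    except json.JSONDecodeError:
        return out  # plain-text answer, no tool needed
    return TOOLS[call["tool"]](**call["arguments"])

print(agent_step("where is my order A1?", "orders ship within 2 days"))
```

A production loop would iterate, feeding tool results back into the model until it produces a final answer, but the decision point shown here (structured call versus plain text) is the core of it.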