Why can't I put my API key directly in the React Native app?

Because anyone can extract strings from a published app binary, exposing your key and letting strangers spend your money. The correct pattern is a thin backend proxy that stores the key server-side and forwards requests, so the phone only ever talks to your own server.
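
As a rough illustration of the proxy idea, the sketch below shows only the server-side step that attaches the key. The function name `buildUpstreamRequest`, the `ChatMessage` shape, and the model name are assumptions made for this example; the key is assumed to come from server-side configuration such as an environment variable.

```typescript
// Sketch of the server-side step of a proxy. All names here are illustrative.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the request the backend sends to the provider. The key is passed in
// from server-side configuration and is never part of anything the app sends
// or receives.
function buildUpstreamRequest(messages: ChatMessage[], apiKey: string): UpstreamRequest {
  if (apiKey.length === 0) {
    throw new Error("Missing server-side API key");
  }
  return {
    url: "https://api.openai.com/v1/chat/completions",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // The model name is an assumption; only the messages come from the client.
    body: JSON.stringify({ model: "gpt-4o-mini", messages, stream: true }),
  };
}
```

The app would post only the messages to your own endpoint. The `Authorization` header exists solely on the server, so nothing sensitive ships in the app binary.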

How do I get streaming replies to work on mobile?

Have your backend request a streaming response from the model and forward the chunks to the app over a streaming connection. In the app, append each incoming token to the current assistant message in state, so the UI updates word by word rather than waiting for the full response.
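
The append step can be sketched as a pure function over the message list. The `Message` shape and the `appendToken` name are hypothetical; the function returns a new array rather than mutating, which suits React state updates.

```typescript
interface Message {
  id: string;
  role: "user" | "assistant";
  content: string;
}

// Returns a new array with `token` appended to the last assistant message.
// If the last message is not from the assistant, a new one is started
// using `newId`.
function appendToken(messages: Message[], token: string, newId: string): Message[] {
  const last = messages[messages.length - 1];
  if (last !== undefined && last.role === "assistant") {
    return [...messages.slice(0, -1), { ...last, content: last.content + token }];
  }
  return [...messages, { id: newId, role: "assistant", content: token }];
}
```

In a component this would typically be called from a functional state update for each incoming token, so that rapid updates never work from a stale copy of the list.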

How does voice input work?

Record audio with Expo AV, then send the audio file to a speech-to-text endpoint such as Whisper, which returns a transcript. You then treat that transcript exactly like typed input and run it through your normal chat flow, so voice and text share one code path.
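
One way to keep a single code path is to collapse both input kinds into plain text before anything else runs. The sketch below assumes the transcript has already come back from the speech-to-text step; the `UserInput` type and `toChatText` function are illustrative names, not part of any library.

```typescript
// Input arrives either as typed text or as a transcript returned by the
// speech-to-text step. Both shapes are assumptions for this sketch.
type UserInput =
  | { kind: "text"; text: string }
  | { kind: "voice"; transcript: string };

// Collapses both input kinds into the plain string the chat flow expects.
// Returns null for empty input, e.g. a recording that contained only silence.
function toChatText(input: UserInput): string | null {
  const raw = input.kind === "text" ? input.text : input.transcript;
  const trimmed = raw.trim();
  return trimmed.length > 0 ? trimmed : null;
}
```

Everything downstream (sending, streaming, caching) then sees only a string and does not need to know how it was produced.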

Should I use Expo or bare React Native?

For most AI apps, Expo is the faster path — it bundles audio, networking, and storage modules and handles builds for both platforms. Drop to bare React Native only if you need a native module Expo does not support, which is uncommon for an LLM chat or voice app.

How do I keep API costs and latency under control on mobile?

Cache recent responses on the device so repeated questions do not re-hit the model, cap response length, default to a smaller model, and show streaming output so the app feels fast even when total latency is unchanged. Set a hard spending limit in your provider dashboard.
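
A minimal sketch of the caching idea follows, assuming an in-memory store keyed by a normalized question and bounded by a hypothetical `maxEntries` limit. Persisting the entries to device storage is left out here.

```typescript
class ResponseCache {
  private entries = new Map<string, string>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  // Normalizes case and whitespace so trivially different phrasings share a key.
  private static keyFor(question: string): string {
    return question.trim().toLowerCase().replace(/\s+/g, " ");
  }

  get(question: string): string | undefined {
    return this.entries.get(ResponseCache.keyFor(question));
  }

  set(question: string, answer: string): void {
    const key = ResponseCache.keyFor(question);
    this.entries.delete(key); // re-inserting moves the key to the newest position
    this.entries.set(key, answer);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

Checking `get` before calling the backend means a repeated question costs nothing, and the size bound keeps memory use predictable on the phone.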

How to Build an AI-Powered Mobile App with React Native

What you are building

This tutorial walks through adding real AI features to a React Native app: the kind of streaming chat and voice experience users now expect. The stack is Expo (which bundles the audio, storage, and networking pieces you need), a small backend proxy that holds your API key, and the OpenAI API for chat and transcription. The single most important architectural decision is that the phone never talks to the model provider directly: a key shipped inside an app binary can be extracted by anyone, so all requests go through your own backend. Get that right and the rest (streaming, voice, caching) is ordinary app code.

How it works

The architecture is three layers. The app renders a chat UI and records audio. A backend proxy holds your secret key, forwards chat and transcription requests to OpenAI, and streams responses back. The model provider does the actual generation. For chat, your backend requests a streaming completion and pipes the chunks to the app, which appends each token to the current message so replies appear word by word. For voice, Expo AV records audio, your backend sends it to a transcription model like Whisper, and the returned text flows into the same chat path — voice and text share one code path. A small on-device cache stores recent responses so repeat questions return instantly without another paid call.
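
Network chunks rarely line up with token boundaries, so the app side usually needs a small buffer between the connection and the UI. The sketch below assumes a wire format chosen for this example (one JSON object per line, such as `{"token":"Hel"}`); the `TokenLineParser` name and the format itself are assumptions, and your proxy may emit something different.

```typescript
// Wire format assumed here: one JSON object per line, e.g. {"token":"Hel"}.
class TokenLineParser {
  private buffer = "";

  // Feed one raw network chunk; returns the tokens from every complete line.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""), so keep it for next time.
    this.buffer = lines.pop() ?? "";
    const tokens: string[] = [];
    for (const line of lines) {
      if (line.trim().length === 0) continue;
      try {
        const parsed: unknown = JSON.parse(line);
        if (typeof parsed === "object" && parsed !== null) {
          const token = (parsed as { token?: unknown }).token;
          if (typeof token === "string") tokens.push(token);
        }
      } catch (_err) {
        // Skip malformed lines rather than aborting the whole stream.
      }
    }
    return tokens;
  }
}
```

Each token returned by `push` can then be appended to the current assistant message, which is what produces the word-by-word effect.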

Tips and pitfalls

The proxy is non-negotiable; never embed the key. Make the app feel fast with streaming even when total latency is unchanged — perceived speed comes from seeing the first words quickly. Reuse one code path for voice and text by transcribing audio to text early, so you do not maintain two flows. Cache aggressively on device with a shape-guarded read (validate the stored JSON before using it) so a corrupted cache never crashes the app. Default to a smaller model for routine turns, cap output length, and set a hard spending limit in your provider dashboard so a runaway loop on a user’s phone cannot drain your account. Build the chat slice end to end first, then layer in voice and caching once the core loop works.
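
The shape-guarded read mentioned above might look like the following sketch. The `CachedReply` fields are assumptions for illustration; the `raw` argument stands for whatever string (or `null`) your storage layer returns, for example from AsyncStorage's `getItem`.

```typescript
interface CachedReply {
  question: string;
  answer: string;
  savedAt: number;
}

// Type guard: true only when every expected field is present with the right type.
function isCachedReply(value: unknown): value is CachedReply {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["question"] === "string" &&
    typeof v["answer"] === "string" &&
    typeof v["savedAt"] === "number"
  );
}

// Parses the raw string read from storage. Anything unparseable or
// mis-shaped is dropped, so the caller always gets a valid list.
function readCache(raw: string | null): CachedReply[] {
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.filter(isCachedReply) : [];
  } catch (_err) {
    return [];
  }
}
```

Falling back to an empty list treats a corrupted cache as a cache miss: the app pays for one extra request instead of crashing on launch.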