
How I Built WazWuz with Gemini Live API and Google Cloud

By Amir Lotfy · March 10, 2026 · 8 min read · Gemini Live API, Google Cloud, Next.js

This article was written as an entry for the Gemini Live API Developer Challenge hackathon. If you share your own build journey publicly, use the hashtag #GeminiLiveAgentChallenge.

What is WazWuz?

WazWuz is a live, conversation-first creative assistant for image workflows. Instead of treating AI as a one-shot prompt box, WazWuz treats the session as a structured creative process:

  • speak naturally to drive edits and variants
  • preserve non-destructive version history
  • branch, compare, and reset with confidence
  • run queue-based batch operations
  • export outputs to download or Google Drive

The product goal was simple: combine real-time AI interaction with production-style workflow reliability.

Why this build matters

Most AI demos stop at model output quality. Real products need more:

  • explicit route and auth boundaries
  • deterministic backend contracts
  • reliable state synchronization between UI and APIs
  • reproducible testing and deploy gates
  • clean handoff paths (download/Drive, version graph, batch status)

This project was designed around those constraints from day one.

System architecture (high level)

WazWuz is a single Next.js codebase with clear layers:

  1. Client Layer
    public marketing/sign-in/legal pages
    protected app pages under /app/*
  2. Frontend Layer
    Next.js App Router UI
    React Query for data synchronization
    live studio interaction and tool-driven UX
  3. Backend Layer
    typed Next.js route handlers
    canonical API surfaces for live tools, uploads, drive, project operations
    explicit queue/process split for batch execution
  4. AI + Data Layer
    Gemini Live API for low-latency conversational flow
    Gemini GenAI for analysis/editing/trait resolution tasks
    Firestore for metadata and graph records
    Cloud Storage for binary assets
    Google Drive API for export delivery
  5. Deployment + Quality
    Docker + Cloud Run runtime
    CI gates: lint -> typecheck -> test -> build
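The "deterministic backend contracts" idea in the backend layer can be sketched as a typed validation step that runs before any stateful work. The names below (`CreateVariantInput`, `validateCreateVariant`) are illustrative assumptions, not identifiers from the WazWuz codebase:

```typescript
// Hypothetical input contract for a "create variant" operation.
interface CreateVariantInput {
  projectId: string;
  sourceVersionId: string;
  prompt: string;
}

type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Reject malformed payloads up front, so every handler downstream
// only ever sees a fully typed, validated input.
function validateCreateVariant(body: unknown): ValidationResult<CreateVariantInput> {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be an object" };
  }
  const b = body as Record<string, unknown>;
  for (const field of ["projectId", "sourceVersionId", "prompt"] as const) {
    if (typeof b[field] !== "string" || b[field] === "") {
      return { ok: false, error: `missing or invalid field: ${field}` };
    }
  }
  return {
    ok: true,
    value: {
      projectId: b.projectId as string,
      sourceVersionId: b.sourceVersionId as string,
      prompt: b.prompt as string,
    },
  };
}
```

In a Next.js route handler, the happy path would then operate on `result.value` with full type information, and the error path would return a 400 before touching Firestore or Storage.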

Core stack

  • Frontend: Next.js 16 (App Router), TypeScript, Tailwind
  • State and data sync: React Query
  • Auth: Auth.js with Google OAuth and magic-link support
  • AI: Gemini Live API + Gemini GenAI
  • Data: Firestore
  • Assets: Google Cloud Storage
  • Delivery: Google Drive API
  • Runtime: Cloud Run
  • Testing: Vitest + smoke/e2e checks

Product flows implemented

1) Live studio workflow

  • user opens project studio
  • live session connects with ephemeral token flow
  • assistant can trigger backend tools (variants, edits, branching, compare metadata, export actions)
  • frontend invalidates canonical query keys and updates current state
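The last step above can be sketched as a pure mapping from tool results to the canonical query keys that need refreshing. The tool names and key shapes here are assumptions for illustration:

```typescript
// Map each backend tool to the canonical React Query keys that should
// be invalidated after it runs. Tool names and key shapes are
// illustrative, not the actual WazWuz identifiers.
type ToolName = "createVariant" | "applyEdit" | "branchVersion" | "exportImage";

function keysToInvalidate(tool: ToolName, projectId: string): string[][] {
  switch (tool) {
    case "createVariant":
    case "applyEdit":
    case "branchVersion":
      // Anything that changes the version graph refreshes both the
      // project record and its version list.
      return [["project", projectId], ["versions", projectId]];
    case "exportImage":
      // Exports only affect the export history view.
      return [["exports", projectId]];
  }
}
```

With React Query, each returned key would be passed to `queryClient.invalidateQueries({ queryKey })`, so the UI re-fetches exactly the state a tool could have changed and nothing more.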

2) Non-destructive versions

Every important transformation creates a version node, which enables:

  • restore to any prior point
  • branch from selected state
  • compare before/after versions
  • avoid destructive overwrite behavior
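The version model above can be sketched as an append-only graph where "branch" adds a child and "restore" is just re-pointing at an older node. This is a minimal in-memory sketch; field names are assumptions, and WazWuz persists these records in Firestore:

```typescript
// Minimal in-memory sketch of a non-destructive version graph.
interface VersionNode {
  id: string;
  parentId: string | null; // null for the root upload
  label: string;
}

class VersionGraph {
  private nodes = new Map<string, VersionNode>();
  private counter = 0;

  addRoot(label: string): VersionNode {
    return this.add(null, label);
  }

  // Branching never mutates an existing node; it only appends a child.
  branchFrom(parentId: string, label: string): VersionNode {
    if (!this.nodes.has(parentId)) throw new Error(`unknown version: ${parentId}`);
    return this.add(parentId, label);
  }

  get(id: string): VersionNode | undefined {
    return this.nodes.get(id);
  }

  // Walk parent links to reconstruct the lineage of any version,
  // which is what powers restore and before/after compare.
  lineage(id: string): string[] {
    const path: string[] = [];
    let current = this.nodes.get(id);
    while (current) {
      path.unshift(current.id);
      current = current.parentId ? this.nodes.get(current.parentId) : undefined;
    }
    return path;
  }

  private add(parentId: string | null, label: string): VersionNode {
    const node = { id: `v${++this.counter}`, parentId, label };
    this.nodes.set(node.id, node);
    return node;
  }
}
```

Because nothing is ever overwritten, restore and compare are reads, not writes, and two branches from the same parent can evolve independently.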

3) References + trend resolution

Users can build reference boards and resolve style terms into structured traits, then apply those traits through dedicated trend flows.

4) Batch queue/process model

Batch operations are intentionally split:

  • /run to enqueue
  • /process to execute worker-style processing

This avoids pretending synchronous request/response is sufficient for heavier jobs.
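The split can be sketched with an in-memory queue: enqueue is cheap and returns immediately, while processing records per-job outcomes instead of failing the whole batch. Job shape and status names are illustrative assumptions:

```typescript
// In-memory sketch of the /run (enqueue) vs /process (execute) split.
type JobStatus = "queued" | "succeeded" | "failed";

interface BatchJob {
  id: string;
  input: string;
  status: JobStatus;
}

class BatchQueue {
  private jobs: BatchJob[] = [];
  private counter = 0;

  // POST /run would call this: cheap, returns an id immediately.
  enqueue(input: string): string {
    const id = `job-${++this.counter}`;
    this.jobs.push({ id, input, status: "queued" });
    return id;
  }

  // POST /process would call this: drain pending work worker-style,
  // recording per-job success/failure rather than failing the batch.
  process(run: (input: string) => void): { succeeded: number; failed: number } {
    let succeeded = 0;
    let failed = 0;
    for (const job of this.jobs) {
      if (job.status !== "queued") continue;
      try {
        run(job.input);
        job.status = "succeeded";
        succeeded++;
      } catch {
        job.status = "failed";
        failed++;
      }
    }
    return { succeeded, failed };
  }

  status(id: string): JobStatus | undefined {
    return this.jobs.find((j) => j.id === id)?.status;
  }
}
```

In production the queue would live in Firestore rather than memory, but the contract is the same: clients poll job status instead of holding a long-lived request open.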

5) Export and handoff

Exports support:

  • preset-based output sizing
  • local download
  • Google Drive upload and link generation with explicit sharing scope
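Preset-based sizing can be sketched as a pure fit computation: scale the source to fit inside the preset box while preserving aspect ratio. The preset names and dimensions below are illustrative; WazWuz's actual presets may differ:

```typescript
// Illustrative export presets (not the actual WazWuz preset list).
const EXPORT_PRESETS = {
  "square-1080": { width: 1080, height: 1080 },
  "story-9x16": { width: 1080, height: 1920 },
  "landscape-16x9": { width: 1920, height: 1080 },
} as const;

type PresetName = keyof typeof EXPORT_PRESETS;

// Scale source dimensions to fit inside the preset while preserving
// aspect ratio: no cropping, and no upscaling past the source size.
function fitToPreset(srcW: number, srcH: number, preset: PresetName) {
  const { width: maxW, height: maxH } = EXPORT_PRESETS[preset];
  const scale = Math.min(maxW / srcW, maxH / srcH, 1);
  return { width: Math.round(srcW * scale), height: Math.round(srcH * scale) };
}
```

The same computed dimensions feed both the local download path and the Drive upload, so the two handoff routes always produce identical output sizes.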

Gemini integration design

Gemini Live API

Used for conversational, low-latency interaction in studio:

  • voice-driven intent capture
  • live guidance and iterative creative direction
  • tool invocation bridge into backend operations

Gemini GenAI

Used for structured task execution:

  • image analysis support
  • style/trait interpretation
  • prompt-driven transformation calls

The key pattern: conversation determines intent, backend tools enforce stateful execution.
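That pattern can be sketched as a server-side tool registry: the live session may only name a tool, and the backend refuses anything it does not recognize. The tool names and handler bodies are illustrative assumptions:

```typescript
// The backend owns the registry; the conversation layer can only
// request tools by name. Names and handlers here are hypothetical.
type ToolHandler = (args: Record<string, unknown>) => string;

const TOOL_REGISTRY: Record<string, ToolHandler> = {
  createVariant: (args) => `variant created for ${String(args.projectId)}`,
  branchVersion: (args) => `branched from ${String(args.versionId)}`,
};

function dispatchTool(
  name: string,
  args: Record<string, unknown>
): { ok: boolean; result?: string; error?: string } {
  const handler = TOOL_REGISTRY[name];
  if (!handler) {
    // The model cannot invent capabilities: unknown tools are refused.
    return { ok: false, error: `unknown tool: ${name}` };
  }
  return { ok: true, result: handler(args) };
}
```

This keeps the model's role advisory: it proposes intent, but every state change still passes through validated, auditable backend code.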

Route security and boundary model

A major hardening step was formalizing public vs protected boundaries:

  • Public pages: /, /signin, /signin/verify, /privacy, /terms
  • Protected pages: all /app/*
  • Public APIs: /api/auth/[...nextauth], /api/auth/magic-link
  • Protected APIs: all other /api/*

Middleware policy and handler-level auth checks are both in place, reducing accidental exposure risk.
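The boundary above can be expressed as a single pure predicate shared by the middleware and the handler-level checks, so the two layers cannot drift apart. The route lists mirror the boundaries described above:

```typescript
// One source of truth for the public/protected boundary.
const PUBLIC_PAGES = new Set(["/", "/signin", "/signin/verify", "/privacy", "/terms"]);
const PUBLIC_API_PREFIXES = ["/api/auth/"];

function isPublicRoute(pathname: string): boolean {
  if (PUBLIC_PAGES.has(pathname)) return true;
  if (PUBLIC_API_PREFIXES.some((p) => pathname.startsWith(p))) return true;
  // Everything else — /app/* pages and all other /api/* — requires auth.
  return false;
}
```

Centralizing the predicate means adding a new protected route requires no change at all: routes are private by default and must be explicitly listed to become public.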

API contract quality and canonical routing

To reduce drift, canonical routes were enforced and aliases marked deprecated:

  • canonical live tools: POST /api/live/tools
  • canonical uploads: POST /api/uploads
  • canonical drive: /api/drive/status, /api/drive/folders

Alias routes now return deprecation headers so clients can migrate safely.
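One way to implement that migration signal is to wrap the canonical handler's response and attach deprecation headers on the alias path. This sketch assumes the Web `Response`/`Headers` globals (Node 18+); the `Deprecation: true` convention follows the IETF deprecation-header draft, and `rel="successor-version"` is a registered link relation:

```typescript
// Wrap a canonical handler's response with deprecation signals for
// clients still calling the alias route.
function withDeprecation(res: Response, canonicalPath: string): Response {
  const headers = new Headers(res.headers);
  headers.set("Deprecation", "true");
  // Point migrating clients at the canonical route.
  headers.set("Link", `<${canonicalPath}>; rel="successor-version"`);
  return new Response(res.body, { status: res.status, headers });
}
```

Clients keep working unchanged, but any client that inspects headers can discover the canonical route and migrate on its own schedule.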

Reliability improvements made during hardening

  • callback URL preservation with querystrings in auth redirects
  • stronger payload validation for live tools
  • more accurate batch status reporting and per-job failure accounting
  • stronger storage-disabled behavior handling
  • server-side token handling for Drive operations
  • reproducible test/docs updates for judges and reviewers

Reproducible testing approach

WazWuz includes a documented judge-friendly sequence in the repo:

  1. install dependencies
  2. set required environment variables
  3. run quality gates in CI order
  4. run local app
  5. run optional e2e smoke with explicit environment guards

This was done intentionally to make external validation straightforward.

What I would improve next

  • deeper end-to-end authenticated journey coverage
  • stronger observability/tracing for route and tool failures
  • managed queue infra option for batch worker orchestration
  • richer model-routing strategy for cost/latency optimization per task class

Closing

WazWuz was built to prove that real-time AI creativity can be productized with strong engineering boundaries. Gemini made the interaction model possible; Google Cloud made it operationally practical.

If you are building for this challenge too, optimize for flow + reliability + explicit contracts, not only model output.