
How I Built WazWuz with Gemini Live API and Google Cloud

By Amir Lotfy · March 10, 2026 · 8 min read · Gemini Live API, Google Cloud, Next.js

This article was written as an entry for the Gemini Live API Developer Challenge hackathon. If you share your own build journey publicly, use the hashtag #GeminiLiveAgentChallenge.

What is WazWuz?

WazWuz is a live, conversation-first creative assistant for image workflows. Instead of treating AI as a one-shot prompt box, WazWuz treats the session as a structured creative process:

  • speak naturally to drive edits and variants
  • preserve non-destructive version history
  • branch, compare, and reset with confidence
  • run queue-based batch operations
  • export outputs to download or Google Drive

The product goal was simple: combine real-time AI interaction with production-style workflow reliability.

Why this build matters

Most AI demos stop at model output quality. Real products need more:

  • explicit route and auth boundaries
  • deterministic backend contracts
  • reliable state synchronization between UI and APIs
  • reproducible testing and deploy gates
  • clean handoff paths (download/Drive, version graph, batch status)

This project was designed around those constraints from day one.

System architecture (high level)

WazWuz is a single Next.js codebase with clear layers:

  1. Client Layer
    public marketing/sign-in/legal pages
    protected app pages under /app/*
  2. Frontend Layer
    Next.js App Router UI
    React Query for data synchronization
    live studio interaction and tool-driven UX
  3. Backend Layer
    typed Next.js route handlers
    canonical API surfaces for live tools, uploads, drive, project operations
    explicit queue/process split for batch execution
  4. AI + Data Layer
    Gemini Live API for low-latency conversational flow
    Gemini GenAI for analysis/editing/trait resolution tasks
    Firestore for metadata and graph records
    Cloud Storage for binary assets
    Google Drive API for export delivery
  5. Deployment + Quality
    Docker + Cloud Run runtime
    CI gates: lint -> typecheck -> test -> build
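The "deterministic backend contracts" idea in the backend layer can be sketched as a typed validation step that runs before any stateful work. The names below (`CreateVariantInput`, `validateCreateVariant`) are illustrative assumptions, not identifiers from the WazWuz codebase:

```typescript
// Hypothetical input contract for a "create variant" operation.
interface CreateVariantInput {
  projectId: string;
  sourceVersionId: string;
  prompt: string;
}

type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Reject malformed payloads up front, so every handler downstream
// only ever sees a fully typed, validated input.
function validateCreateVariant(body: unknown): ValidationResult<CreateVariantInput> {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be an object" };
  }
  const b = body as Record<string, unknown>;
  for (const field of ["projectId", "sourceVersionId", "prompt"] as const) {
    if (typeof b[field] !== "string" || b[field] === "") {
      return { ok: false, error: `missing or invalid field: ${field}` };
    }
  }
  return {
    ok: true,
    value: {
      projectId: b.projectId as string,
      sourceVersionId: b.sourceVersionId as string,
      prompt: b.prompt as string,
    },
  };
}
```

In a Next.js route handler, the happy path would then operate on `result.value` with full type information, and the error path would return a 400 before touching Firestore or Storage.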

Core stack

  • Frontend: Next.js 16 (App Router), TypeScript, Tailwind
  • State and data sync: React Query
  • Auth: Auth.js with Google OAuth and magic-link support
  • AI: Gemini Live API + Gemini GenAI
  • Data: Firestore
  • Assets: Google Cloud Storage
  • Delivery: Google Drive API
  • Runtime: Cloud Run
  • Testing: Vitest + smoke/e2e checks

Product flows implemented

1) Live studio workflow

  • user opens project studio
  • live session connects with ephemeral token flow
  • assistant can trigger backend tools (variants, edits, branching, compare metadata, export actions)
  • frontend invalidates canonical query keys and updates current state
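The last step above can be sketched as a pure mapping from tool results to the canonical query keys that need refreshing. The tool names and key shapes here are assumptions for illustration:

```typescript
// Map each backend tool to the canonical React Query keys that should
// be invalidated after it runs. Tool names and key shapes are
// illustrative, not the actual WazWuz identifiers.
type ToolName = "createVariant" | "applyEdit" | "branchVersion" | "exportImage";

function keysToInvalidate(tool: ToolName, projectId: string): string[][] {
  switch (tool) {
    case "createVariant":
    case "applyEdit":
    case "branchVersion":
      // Anything that changes the version graph refreshes both the
      // project record and its version list.
      return [["project", projectId], ["versions", projectId]];
    case "exportImage":
      // Exports only affect the export history view.
      return [["exports", projectId]];
  }
}
```

With React Query, each returned key would be passed to `queryClient.invalidateQueries({ queryKey })`, so the UI re-fetches exactly the state a tool could have changed and nothing more.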

2) Non-destructive versions

Every important transformation creates a version node, which enables:

  • restore to any prior point
  • branch from selected state
  • compare before/after versions
  • avoid destructive overwrite behavior
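The version model above can be sketched as an append-only graph where "branch" adds a child and "restore" is just re-pointing at an older node. This is a minimal in-memory sketch; field names are assumptions, and WazWuz persists these records in Firestore:

```typescript
// Minimal in-memory sketch of a non-destructive version graph.
interface VersionNode {
  id: string;
  parentId: string | null; // null for the root upload
  label: string;
}

class VersionGraph {
  private nodes = new Map<string, VersionNode>();
  private counter = 0;

  addRoot(label: string): VersionNode {
    return this.add(null, label);
  }

  // Branching never mutates an existing node; it only appends a child.
  branchFrom(parentId: string, label: string): VersionNode {
    if (!this.nodes.has(parentId)) throw new Error(`unknown version: ${parentId}`);
    return this.add(parentId, label);
  }

  get(id: string): VersionNode | undefined {
    return this.nodes.get(id);
  }

  // Walk parent links to reconstruct the lineage of any version,
  // which is what powers restore and before/after compare.
  lineage(id: string): string[] {
    const path: string[] = [];
    let current = this.nodes.get(id);
    while (current) {
      path.unshift(current.id);
      current = current.parentId ? this.nodes.get(current.parentId) : undefined;
    }
    return path;
  }

  private add(parentId: string | null, label: string): VersionNode {
    const node = { id: `v${++this.counter}`, parentId, label };
    this.nodes.set(node.id, node);
    return node;
  }
}
```

Because nothing is ever overwritten, restore and compare are reads, not writes, and two branches from the same parent can evolve independently.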

3) References + trend resolution

Users can build reference boards and resolve style terms into structured traits, then apply those traits through dedicated trend flows.

4) Batch queue/process model

Batch operations are intentionally split:

  • /run to enqueue
  • /process to execute worker-style processing

This avoids pretending synchronous request/response is sufficient for heavier jobs.
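The split can be sketched with an in-memory queue: enqueue is cheap and returns immediately, while processing records per-job outcomes instead of failing the whole batch. Job shape and status names are illustrative assumptions:

```typescript
// In-memory sketch of the /run (enqueue) vs /process (execute) split.
type JobStatus = "queued" | "succeeded" | "failed";

interface BatchJob {
  id: string;
  input: string;
  status: JobStatus;
}

class BatchQueue {
  private jobs: BatchJob[] = [];
  private counter = 0;

  // POST /run would call this: cheap, returns an id immediately.
  enqueue(input: string): string {
    const id = `job-${++this.counter}`;
    this.jobs.push({ id, input, status: "queued" });
    return id;
  }

  // POST /process would call this: drain pending work worker-style,
  // recording per-job success/failure rather than failing the batch.
  process(run: (input: string) => void): { succeeded: number; failed: number } {
    let succeeded = 0;
    let failed = 0;
    for (const job of this.jobs) {
      if (job.status !== "queued") continue;
      try {
        run(job.input);
        job.status = "succeeded";
        succeeded++;
      } catch {
        job.status = "failed";
        failed++;
      }
    }
    return { succeeded, failed };
  }

  status(id: string): JobStatus | undefined {
    return this.jobs.find((j) => j.id === id)?.status;
  }
}
```

In production the queue would live in Firestore rather than memory, but the contract is the same: clients poll job status instead of holding a long-lived request open.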

5) Export and handoff

Exports support:

  • preset-based output sizing
  • local download
  • Google Drive upload and link generation with explicit sharing scope
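Preset-based sizing can be sketched as a pure fit computation: scale the source to fit inside the preset box while preserving aspect ratio. The preset names and dimensions below are illustrative; WazWuz's actual presets may differ:

```typescript
// Illustrative export presets (not the actual WazWuz preset list).
const EXPORT_PRESETS = {
  "square-1080": { width: 1080, height: 1080 },
  "story-9x16": { width: 1080, height: 1920 },
  "landscape-16x9": { width: 1920, height: 1080 },
} as const;

type PresetName = keyof typeof EXPORT_PRESETS;

// Scale source dimensions to fit inside the preset while preserving
// aspect ratio: no cropping, and no upscaling past the source size.
function fitToPreset(srcW: number, srcH: number, preset: PresetName) {
  const { width: maxW, height: maxH } = EXPORT_PRESETS[preset];
  const scale = Math.min(maxW / srcW, maxH / srcH, 1);
  return { width: Math.round(srcW * scale), height: Math.round(srcH * scale) };
}
```

The same computed dimensions feed both the local download path and the Drive upload, so the two handoff routes always produce identical output sizes.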

Gemini integration design

Gemini Live API

Used for conversational, low-latency interaction in studio:

  • voice-driven intent capture
  • live guidance and iterative creative direction
  • tool invocation bridge into backend operations

Gemini GenAI

Used for structured task execution:

  • image analysis support
  • style/trait interpretation
  • prompt-driven transformation calls

The key pattern: conversation determines intent, backend tools enforce stateful execution.
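That pattern can be sketched as a server-side tool registry: the live session may only name a tool, and the backend refuses anything it does not recognize. The tool names and handler bodies are illustrative assumptions:

```typescript
// The backend owns the registry; the conversation layer can only
// request tools by name. Names and handlers here are hypothetical.
type ToolHandler = (args: Record<string, unknown>) => string;

const TOOL_REGISTRY: Record<string, ToolHandler> = {
  createVariant: (args) => `variant created for ${String(args.projectId)}`,
  branchVersion: (args) => `branched from ${String(args.versionId)}`,
};

function dispatchTool(
  name: string,
  args: Record<string, unknown>
): { ok: boolean; result?: string; error?: string } {
  const handler = TOOL_REGISTRY[name];
  if (!handler) {
    // The model cannot invent capabilities: unknown tools are refused.
    return { ok: false, error: `unknown tool: ${name}` };
  }
  return { ok: true, result: handler(args) };
}
```

This keeps the model's role advisory: it proposes intent, but every state change still passes through validated, auditable backend code.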

Route security and boundary model

A major hardening step was formalizing public vs protected boundaries:

  • Public pages: /, /signin, /signin/verify, /privacy, /terms
  • Protected pages: all /app/*
  • Public APIs: /api/auth/[...nextauth], /api/auth/magic-link
  • Protected APIs: all other /api/*

Middleware policy and handler-level auth checks are both in place, reducing accidental exposure risk.
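The boundary above can be expressed as a single pure predicate shared by the middleware and the handler-level checks, so the two layers cannot drift apart. The route lists mirror the boundaries described above:

```typescript
// One source of truth for the public/protected boundary.
const PUBLIC_PAGES = new Set(["/", "/signin", "/signin/verify", "/privacy", "/terms"]);
const PUBLIC_API_PREFIXES = ["/api/auth/"];

function isPublicRoute(pathname: string): boolean {
  if (PUBLIC_PAGES.has(pathname)) return true;
  if (PUBLIC_API_PREFIXES.some((p) => pathname.startsWith(p))) return true;
  // Everything else — /app/* pages and all other /api/* — requires auth.
  return false;
}
```

Centralizing the predicate means adding a new protected route requires no change at all: routes are private by default and must be explicitly listed to become public.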

API contract quality and canonical routing

To reduce drift, canonical routes were enforced and aliases marked deprecated:

  • canonical live tools: POST /api/live/tools
  • canonical uploads: POST /api/uploads
  • canonical drive: /api/drive/status, /api/drive/folders

Alias routes now return deprecation headers so clients can migrate safely.
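One way to implement that migration signal is to wrap the canonical handler's response and attach deprecation headers on the alias path. This sketch assumes the Web `Response`/`Headers` globals (Node 18+); the `Deprecation: true` convention follows the IETF deprecation-header draft, and `rel="successor-version"` is a registered link relation:

```typescript
// Wrap a canonical handler's response with deprecation signals for
// clients still calling the alias route.
function withDeprecation(res: Response, canonicalPath: string): Response {
  const headers = new Headers(res.headers);
  headers.set("Deprecation", "true");
  // Point migrating clients at the canonical route.
  headers.set("Link", `<${canonicalPath}>; rel="successor-version"`);
  return new Response(res.body, { status: res.status, headers });
}
```

Clients keep working unchanged, but any client that inspects headers can discover the canonical route and migrate on its own schedule.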

Reliability improvements made during hardening

  • callback URL preservation with querystrings in auth redirects
  • stronger payload validation for live tools
  • more accurate batch status reporting and per-job failure accounting
  • stronger storage-disabled behavior handling
  • server-side token handling for Drive operations
  • reproducible test/docs updates for judges and reviewers

Reproducible testing approach

WazWuz includes a documented judge-friendly sequence in the repo:

  1. install dependencies
  2. set required environment variables
  3. run quality gates in CI order
  4. run local app
  5. run optional e2e smoke with explicit environment guards

This was done intentionally to make external validation straightforward.

What I would improve next

  • deeper end-to-end authenticated journey coverage
  • stronger observability/tracing for route and tool failures
  • managed queue infra option for batch worker orchestration
  • richer model-routing strategy for cost/latency optimization per task class

Closing

WazWuz was built to prove that real-time AI creativity can be productized with strong engineering boundaries. Gemini made the interaction model possible; Google Cloud made it operationally practical.

If you are building for this challenge too, optimize for flow + reliability + explicit contracts, not only model output.