This article was created specifically for the purpose of entering the Gemini Live API Developer Challenge hackathon. If you share your own build journey publicly, use the hashtag #GeminiLiveAgentChallenge.
What is WazWuz?
WazWuz is a live, conversation-first creative assistant for image workflows. Instead of treating AI as a one-shot prompt box, WazWuz treats the session as a structured creative process:
- speak naturally to drive edits and variants
- preserve non-destructive version history
- branch, compare, and reset with confidence
- run queue-based batch operations
- export outputs to download or Google Drive
The product goal was simple: combine real-time AI interaction with production-style workflow reliability.
Why this build matters
Most AI demos stop at model output quality. Real products need more:
- explicit route and auth boundaries
- deterministic backend contracts
- reliable state synchronization between UI and APIs
- reproducible testing and deploy gates
- clean handoff paths (download/Drive, version graph, batch status)
This project was designed around those constraints from day one.
System architecture (high level)
WazWuz is a single Next.js codebase with clear layers:
- Client Layer
  - public marketing/sign-in/legal pages
  - protected app pages under /app/*
- Frontend Layer
  - Next.js App Router UI
  - React Query for data synchronization
  - live studio interaction and tool-driven UX
- Backend Layer
  - typed Next.js route handlers
  - canonical API surfaces for live tools, uploads, drive, project operations
  - explicit queue/process split for batch execution
- AI + Data Layer
  - Gemini Live API for low-latency conversational flow
  - Gemini GenAI for analysis/editing/trait resolution tasks
  - Firestore for metadata and graph records
  - Cloud Storage for binary assets
  - Google Drive API for export delivery
- Deployment + Quality
  - Docker + Cloud Run runtime
  - CI gates: lint -> typecheck -> test -> build
Core stack
- Frontend: Next.js 16 (App Router), TypeScript, Tailwind
- State and data sync: React Query
- Auth: Auth.js with Google OAuth and magic-link support
- AI: Gemini Live API + Gemini GenAI
- Data: Firestore
- Assets: Google Cloud Storage
- Delivery: Google Drive API
- Runtime: Cloud Run
- Testing: Vitest + smoke/e2e checks
Product flows implemented
1) Live studio workflow
- the user opens the project studio
- the live session connects using an ephemeral token flow
- the assistant can trigger backend tools (variants, edits, branching, compare metadata, export actions)
- the frontend invalidates the canonical query keys and refreshes the current state
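The invalidation step can be kept predictable with a single key factory plus a mapping from tool to affected keys. The sketch below is an illustration only: the tool names, the key shapes, and the `keysToInvalidate` helper are hypothetical and not taken from the WazWuz code.

```typescript
// Hypothetical tool names; the article only lists tool categories.
type LiveTool = "create_variant" | "apply_edit" | "branch_version" | "export_asset";

type QueryKey = readonly (string | number)[];

// One factory for all keys, so components and invalidation agree on shapes.
const projectKeys = {
  detail: (projectId: string): QueryKey => ["project", projectId, "detail"],
  versions: (projectId: string): QueryKey => ["project", projectId, "versions"],
  exports: (projectId: string): QueryKey => ["project", projectId, "exports"],
};

// Map each tool to the cached data it can make stale.
function keysToInvalidate(tool: LiveTool, projectId: string): QueryKey[] {
  switch (tool) {
    case "create_variant":
    case "apply_edit":
    case "branch_version":
      return [projectKeys.versions(projectId), projectKeys.detail(projectId)];
    case "export_asset":
      return [projectKeys.exports(projectId)];
  }
}
```

In a React Query client, each returned key would typically be passed to `queryClient.invalidateQueries({ queryKey })` once the tool call resolves.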
2) Non-destructive versions
Every important transformation creates a version node. This makes it possible to:
- restore to any prior point
- branch from selected state
- compare before/after versions
- avoid destructive overwrite behavior
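A minimal in-memory sketch of the version-node idea follows. It assumes each node stores a pointer to its parent; the `VersionGraph` class and its method names are hypothetical, and the real project persists graph records in Firestore rather than in a `Map`.

```typescript
interface VersionNode {
  id: string;
  parentId: string | null; // null for the root version
  label: string;
}

class VersionGraph {
  private nodes = new Map<string, VersionNode>();
  private counter = 0;
  current: string | null = null;

  // Every transformation appends a child of the current node; nothing is overwritten.
  commit(label: string): VersionNode {
    const node: VersionNode = { id: `v${++this.counter}`, parentId: this.current, label };
    this.nodes.set(node.id, node);
    this.current = node.id;
    return node;
  }

  // Restoring only moves the pointer; later versions stay in the graph.
  restore(id: string): void {
    if (!this.nodes.has(id)) throw new Error(`unknown version ${id}`);
    this.current = id;
  }

  // Walk parent links from a node back to the root.
  lineage(id: string): string[] {
    const path: string[] = [];
    let cursor: string | null = id;
    while (cursor !== null) {
      const node: VersionNode | undefined = this.nodes.get(cursor);
      if (!node) throw new Error(`unknown version ${cursor}`);
      path.push(node.id);
      cursor = node.parentId;
    }
    return path;
  }
}

const graph = new VersionGraph();
graph.commit("original");   // v1
graph.commit("warm tones"); // v2, child of v1
graph.restore("v1");        // go back without deleting v2
graph.commit("cool tones"); // v3, a new branch from v1
```

Branching falls out of the model: restoring and then committing creates a sibling, so `graph.lineage("v3")` is `["v3", "v1"]` while v2 remains reachable for comparison.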
3) References + trend resolution
Users can build reference boards and resolve style terms into structured traits, then apply those traits through dedicated trend flows.
4) Batch queue/process model
Batch operations are intentionally split:
- /run to enqueue
- /process to execute worker-style processing
This avoids pretending synchronous request/response is sufficient for heavier jobs.
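The enqueue/process split can be sketched with two functions, one per endpoint. This is an in-memory illustration under stated assumptions: the `BatchJob` shape, the status names, and the `enqueue`/`processNext` helpers are hypothetical, and a real deployment would keep job records in a durable store.

```typescript
type JobStatus = "queued" | "processing" | "done" | "failed";

interface BatchJob {
  id: string;
  items: string[];
  status: JobStatus;
  succeeded: number;
  failed: number;
}

const jobs = new Map<string, BatchJob>();
let nextJobId = 1;

// What a /run handler could do: record the job and return immediately.
function enqueue(items: string[]): BatchJob {
  const job: BatchJob = { id: `job-${nextJobId++}`, items, status: "queued", succeeded: 0, failed: 0 };
  jobs.set(job.id, job);
  return job;
}

// What a /process handler could do: take one queued job and work through its items.
async function processNext(work: (item: string) => Promise<void>): Promise<BatchJob | null> {
  const job = Array.from(jobs.values()).find((j) => j.status === "queued");
  if (!job) return null;
  job.status = "processing";
  for (const item of job.items) {
    try {
      await work(item);
      job.succeeded++;
    } catch {
      job.failed++; // count the failure instead of aborting the whole batch
    }
  }
  // Report "failed" if any item failed; the counters say how many.
  job.status = job.failed === 0 ? "done" : "failed";
  return job;
}
```

Keeping per-item counters on the job record lets a status endpoint report partial failures honestly rather than a single pass/fail flag.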
5) Export and handoff
Exports support:
- preset-based output sizing
- local download
- Google Drive upload and link generation with explicit sharing scope
Gemini integration design
Gemini Live API
Used for conversational, low-latency interaction in the studio:
- voice-driven intent capture
- live guidance and iterative creative direction
- tool invocation bridge into backend operations
Gemini GenAI
Used for structured task execution:
- image analysis support
- style/trait interpretation
- prompt-driven transformation calls
The key pattern: conversation determines intent, backend tools enforce stateful execution.
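One way to read this pattern: the model proposes a tool call, and the backend refuses to act until the payload passes validation. The sketch below assumes a hypothetical payload shape with `name` and `args` fields and two made-up tool names; the real tool schema is not shown in this article.

```typescript
// Hypothetical tool call shapes, for illustration only.
type ToolCall =
  | { name: "create_variant"; args: { versionId: string; prompt: string } }
  | { name: "restore_version"; args: { versionId: string } };

type ParseResult = { ok: true; call: ToolCall } | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validate an untrusted payload before it can touch project state.
function parseToolCall(payload: unknown): ParseResult {
  if (!isRecord(payload)) return { ok: false, error: "payload must be an object" };
  const args = payload["args"];
  if (!isRecord(args)) return { ok: false, error: "args must be an object" };

  const versionId = args["versionId"];
  if (typeof versionId !== "string" || versionId.length === 0) {
    return { ok: false, error: "args.versionId must be a non-empty string" };
  }

  switch (payload["name"]) {
    case "create_variant": {
      const prompt = args["prompt"];
      if (typeof prompt !== "string" || prompt.trim() === "") {
        return { ok: false, error: "args.prompt must be a non-empty string" };
      }
      return { ok: true, call: { name: "create_variant", args: { versionId, prompt } } };
    }
    case "restore_version":
      return { ok: true, call: { name: "restore_version", args: { versionId } } };
    default:
      return { ok: false, error: `unknown tool: ${String(payload["name"])}` };
  }
}
```

Because the result is a discriminated union, the handler that executes the call only ever sees fully typed arguments, and unknown tool names are rejected instead of silently ignored.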
Route security and boundary model
A major hardening step was formalizing public vs protected boundaries:
- Public pages: /, /signin, /signin/verify, /privacy, /terms
- Protected pages: all /app/*
- Public APIs: /api/auth/[...nextauth], /api/auth/magic-link
- Protected APIs: all other /api/*
Middleware policy and handler-level auth checks are both in place, reducing accidental exposure risk.
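The boundary table above can be expressed as one pure function that both the middleware and tests can share. This is a sketch, not the project's middleware: `classifyRoute` is a hypothetical helper, and treating every unlisted path as protected is an assumption.

```typescript
type Access = "public" | "protected";

const PUBLIC_PAGES = new Set(["/", "/signin", "/signin/verify", "/privacy", "/terms"]);

function classifyRoute(pathname: string): Access {
  // Normalize a trailing slash so "/signin/" is treated like "/signin".
  const path = pathname.length > 1 && pathname.endsWith("/") ? pathname.slice(0, -1) : pathname;

  // The [...nextauth] catch-all and /api/auth/magic-link both live under /api/auth/.
  if (path.startsWith("/api/auth/")) return "public";

  // Every other API route requires a session.
  if (path.startsWith("/api/")) return "protected";

  // All app pages require a session.
  if (path === "/app" || path.startsWith("/app/")) return "protected";

  if (PUBLIC_PAGES.has(path)) return "public";

  // Assumption: default-deny, so a newly added page is never public by accident.
  return "protected";
}
```

A function like this is easy to unit test exhaustively, which helps keep the middleware policy and the handler-level checks from drifting apart.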
API contract quality and canonical routing
To reduce drift, canonical routes were enforced and aliases marked deprecated:
- canonical live tools: POST /api/live/tools
- canonical uploads: POST /api/uploads
- canonical drive: /api/drive/status, /api/drive/folders
Alias routes now return deprecation headers so clients can migrate safely.
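As an illustration of the deprecation-header idea, the sketch below builds headers for an alias route. The alias paths, the date, and the choice of headers are assumptions: the article does not say which aliases exist or which headers WazWuz returns. The `Deprecation` value follows the `@<unix-seconds>` date form from RFC 9745, and `successor-version` is the link relation from RFC 5829.

```typescript
// Hypothetical alias table: alias path -> canonical path.
const ALIASES: Record<string, string> = {
  "/api/live/tool": "/api/live/tools",
  "/api/upload": "/api/uploads",
};

// Hypothetical date on which the aliases were marked deprecated.
const DEPRECATED_SINCE = new Date(Date.UTC(2025, 0, 1));

// Returns extra response headers for alias routes, or an empty object for canonical ones.
function deprecationHeaders(pathname: string, since: Date = DEPRECATED_SINCE): Record<string, string> {
  const canonical: string | undefined = ALIASES[pathname];
  if (canonical === undefined) return {};
  return {
    // Signals that the resource is deprecated, and since when.
    Deprecation: `@${Math.floor(since.getTime() / 1000)}`,
    // Points clients at the route they should migrate to.
    Link: `<${canonical}>; rel="successor-version"`,
  };
}
```

A route handler for an alias would merge these headers into its normal response, so existing clients keep working while tooling can detect the deprecation.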
Reliability improvements made during hardening
- callback URL preservation with querystrings in auth redirects
- stronger payload validation for live tools
- improved batch status truthfulness and failure accounting
- stronger storage-disabled behavior handling
- server-side token handling for Drive operations
- reproducible test/docs updates for judges and reviewers
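The first item, callback URL preservation, is easy to get subtly wrong, so here is a small sketch. It assumes the sign-in page reads a `callbackUrl` query parameter (the name Auth.js uses) and that `/app` is a sensible fallback; the helper names and the same-origin check are illustrative additions, not the project's code.

```typescript
// Build a sign-in redirect that keeps the full original destination, query string included.
function buildSignInRedirect(pathname: string, search: string): string {
  const destination = `${pathname}${search}`;
  // Encoding the whole destination keeps its "?" and "&" from leaking into the outer URL.
  return `/signin?callbackUrl=${encodeURIComponent(destination)}`;
}

// On the way back, accept only same-origin relative paths to avoid open redirects.
function resolveCallback(raw: string | null, fallback: string = "/app"): string {
  if (raw === null) return fallback;
  const isRelative = raw.startsWith("/") && !raw.startsWith("//") && !raw.includes("\\");
  return isRelative ? raw : fallback;
}

const redirect = buildSignInRedirect("/app/projects/42", "?tab=versions&compare=v2");
// redirect === "/signin?callbackUrl=%2Fapp%2Fprojects%2F42%3Ftab%3Dversions%26compare%3Dv2"
```

Without the encoding step, `&compare=v2` would be parsed as a second parameter of the sign-in URL and silently dropped after authentication.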
Reproducible testing approach
WazWuz includes a documented judge-friendly sequence in the repo:
- install dependencies
- set required environment variables
- run quality gates in CI order
- run local app
- run optional e2e smoke with explicit environment guards
This was done intentionally to make external validation straightforward.
What I would improve next
- deeper end-to-end authenticated journey coverage
- stronger observability/tracing for route and tool failures
- managed queue infra option for batch worker orchestration
- richer model-routing strategy for cost/latency optimization per task class
Closing
WazWuz was built to prove that real-time AI creativity can be productized with strong engineering boundaries. Gemini made the interaction model possible; Google Cloud made it operationally practical.
If you are building for this challenge too, optimize for flow + reliability + explicit contracts, not only model output.