Sleep Story AI System

Project Overview

Sleep Story AI System is a fully automated content pipeline that takes a single row in a Google Sheet — a topic and a target length — and produces a finished, ready-to-upload YouTube sleep-story video. No manual editing, no scripting, no asset stitching.

The orchestrator runs in n8n and chains together six workflows: a main controller plus five purpose-built sub-workflows for image generation, voice synthesis, per-scene clip assembly, clip combination, and final ffmpeg rendering via the Rendi.dev API.

Key Features

Sheet-Driven Input: Topic + duration in a Google Sheet kicks off the entire pipeline
AI Scripting: Google Gemini (via LangChain Agent) writes the full sleep story and splits it into scenes
Per-Scene Assets: Each scene gets its own AI-generated image + AI voice narration
Automated Stitching: ffmpeg via Rendi.dev combines images + voice into clips, then merges clips into one video
Rate-Limit-Aware: Batching, a 20-second throttle, 15-second polling, and up to 8 retries keep the pipeline within Rendi's 4/min free-tier cap without dropping runs
Full Traceability: Every asset URL and status is written back to the source Google Sheet
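
The throttle, polling, and retry figures in the list can be illustrated with a small sketch. This is not the n8n implementation: the Throttle class, the poll_until_done function, and the status strings "SUCCESS" and "FAILED" are hypothetical names chosen for illustration, and "8 retries" is read here as at most eight status checks. If the 4/min cap counts submitted requests, a 20-second gap allows at most 3 submissions per minute, which leaves some margin.

```python
import time

THROTTLE_SECONDS = 20   # gap between submissions (figure from the feature list)
POLL_SECONDS = 15       # gap between status checks (figure from the feature list)
MAX_ATTEMPTS = 8        # assumption: "8 retries" means at most 8 status checks


class Throttle:
    """Enforce a minimum gap between consecutive calls to wait()."""

    def __init__(self, min_gap, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_gap - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def poll_until_done(check_status, max_attempts=MAX_ATTEMPTS,
                    poll_seconds=POLL_SECONDS, sleep=time.sleep):
    """Call check_status() until it reports success; return the attempt count.

    The status strings are hypothetical placeholders, not Rendi's real values.
    """
    for attempt in range(1, max_attempts + 1):
        status = check_status()
        if status == "SUCCESS":
            return attempt
        if status == "FAILED":
            raise RuntimeError("render failed")
        if attempt < max_attempts:
            sleep(poll_seconds)  # no sleep after the final attempt
    raise TimeoutError(f"render not finished after {max_attempts} checks")
```

In use, throttle.wait() would be called before each render submission and poll_until_done afterwards with a function that fetches the job status. The clock and sleep parameters are injectable so the timing logic can be exercised without real delays.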

Technologies Used

n8n: Workflow orchestration across 6 chained workflows (32 nodes in the main controller)
Google Gemini + LangChain: Script generation with structured output parsing
Google Sheets: Input row, output asset tracking
Rendi.dev: ffmpeg-as-a-service for clip rendering and final video combination
HTTP image + voice APIs: Per-scene visual and audio generation
Docker: Containerised n8n instance
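
As an illustration of what structured output parsing involves, the sketch below validates a JSON reply and turns it into scene records. The schema is an assumption made for this example: the "scenes" key and the narration and image_prompt fields are hypothetical, and the real pipeline does this step inside n8n with LangChain, not with this code.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    narration: str      # text handed to voice synthesis (hypothetical field)
    image_prompt: str   # text handed to image generation (hypothetical field)


def parse_scenes(raw: str) -> List[Scene]:
    """Turn a JSON reply into validated Scene objects, or raise ValueError."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    items = data.get("scenes") if isinstance(data, dict) else None
    if not isinstance(items, list) or not items:
        raise ValueError("expected a non-empty 'scenes' list")
    scenes = []
    for number, item in enumerate(items, start=1):
        if not isinstance(item, dict):
            raise ValueError(f"scene {number} is not an object")
        narration = item.get("narration")
        image_prompt = item.get("image_prompt")
        if not isinstance(narration, str) or not narration.strip():
            raise ValueError(f"scene {number} has no narration")
        if not isinstance(image_prompt, str) or not image_prompt.strip():
            raise ValueError(f"scene {number} has no image_prompt")
        scenes.append(Scene(narration.strip(), image_prompt.strip()))
    return scenes
```

Rejecting a malformed reply at this point means a bad script fails before any image, voice, or render calls are spent on it.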

Architecture

The system is split into one orchestrator and five sub-workflows so each step is independently testable and reusable:

Sleep Story AI System (32 nodes — manual trigger + Google Sheets)
        ├── Create Image      (image gen per scene)
        ├── Create Voice      (voice synthesis per scene, with retry)
        ├── Create Clips      (image + voice → per-scene clip)
        ├── Combine Clips     (all scene clips → one video)
        └── Videofy with Rendi (ffmpeg via Rendi.dev, rate-limited)

A 15-scene job runs end-to-end in about 12–15 minutes.
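
The control flow implied by the tree above can be sketched in a few lines. This is a simplified model, not the n8n workflow itself: the Steps container and run_pipeline function are hypothetical, each callable stands in for one sub-workflow, and the Videofy with Rendi step is assumed to sit behind create_clip and combine_clips and is not modelled separately.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Steps:
    """One callable per sub-workflow; each returns an asset URL (assumed)."""
    create_image: Callable[[str], str]         # scene text -> image URL
    create_voice: Callable[[str], str]         # scene text -> audio URL
    create_clip: Callable[[str, str], str]     # image URL, audio URL -> clip URL
    combine_clips: Callable[[List[str]], str]  # clip URLs -> final video URL


def run_pipeline(scenes: Sequence[str], steps: Steps) -> str:
    """Process scenes in order and return the URL of the combined video."""
    clip_urls = []
    for scene_text in scenes:
        image_url = steps.create_image(scene_text)
        voice_url = steps.create_voice(scene_text)
        clip_urls.append(steps.create_clip(image_url, voice_url))
    return steps.combine_clips(clip_urls)
```

Because each step is passed in as a plain callable, any one of them can be replaced with a stub, which mirrors the point about sub-workflows being independently testable.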

Outcome

The system replaces what used to be hours of script-writing, image-sourcing, voice-recording, and video editing with a single sheet entry and a click. It is designed to be left running in the background while other work happens elsewhere.