Why it's different
AI scoring on every clip
Viral score, hook strength, caption readability, dead-air risk — computed from the multimodal signals NexoClip already collects (rescore, motion, face presence, words-per-sec). The operator picks the AI's top 3 and ships.
Hook generator, 5 tones
One click, 5 viral title candidates in the streamer's voice. Tone presets: aggressive, Gen Z, corporate, curious, default. Click a candidate to drop it into the title overlay.
Intelligence timeline
Per-second markers under the preview: audio peaks, scene cuts, laughter reactions, chat-heat spikes, face-emotion changes. Click any marker to seek. Spot the moment that goes viral before you watch the clip.
Voice-marker triggers
Streamers say "clipea esto" (clip the next 30s) or "clipeaste eso" (clip the previous 60s) as natural verbal bookmarks. Each brand kit can define its own custom phrases.
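As a rough illustration of how a spoken marker could become a clip window, here is a minimal Python sketch. The two phrases and their durations come from the text above; the TRIGGERS table, ClipWindow, and window_for_marker are hypothetical names, not NexoClip's actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trigger table: phrase -> (seconds before the marker, seconds after it).
# The phrases and durations are from the page; the structure is an assumption.
TRIGGERS = {
    "clipea esto": (0.0, 30.0),    # clip the next 30 s
    "clipeaste eso": (60.0, 0.0),  # clip the previous 60 s
}


@dataclass(frozen=True)
class ClipWindow:
    start: float
    end: float


def window_for_marker(phrase: str, spoken_at: float, vod_duration: float) -> Optional[ClipWindow]:
    """Map a spoken trigger phrase to a clip window, clamped to the VOD bounds."""
    offsets = TRIGGERS.get(phrase.strip().lower())
    if offsets is None:
        return None  # not a known trigger phrase
    before, after = offsets
    start = max(0.0, spoken_at - before)
    end = min(vod_duration, spoken_at + after)
    return ClipWindow(start, end)
```

Per-brand-kit custom phrases would presumably amount to a separate table like TRIGGERS for each kit.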
Per-speaker brand kits
Multi-streamer VOD? Speaker diarization routes each clip to the right host's colors, fonts, handles, and captions. Speaker identities persist across VODs via embedding match. The differentiator most clipping tools don't ship.
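To make "embedding match" concrete, the sketch below compares a new speaker embedding against stored per-host reference embeddings using cosine similarity. The function names, the dictionary layout, and the 0.75 threshold are assumptions for illustration; the page does not say which similarity measure or threshold NexoClip uses.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(
    embedding: Sequence[float],
    known: Dict[str, Sequence[float]],
    threshold: float = 0.75,  # hypothetical cut-off, would need tuning on real data
) -> Optional[str]:
    """Return the best-matching known speaker, or None if nobody clears the threshold."""
    best_name, best_score = None, threshold
    for name, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A None result would correspond to a speaker not seen in earlier VODs, who would then need a new identity.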
Local GPU transcription
faster-whisper runs on your own GPU. Stream audio never leaves your machine for transcription — only the LLM caption-generation step calls out to Anthropic.
Auto-publish with undo
Trusted brand kits queue with a scheduled-for + undo window. Untrusted kits land in the inbox grouped by VOD/speaker. Same flow either way.
Native to AI agents
Every action is an MCP tool. Agents can ingest a VOD, score clips, pick winners, and publish — without a browser session. Built for the era where the operator is half-human, half-agent.
The growth loop
The one thing a streamer (or their agency) touches each morning:
-
1 · ingest
Drop VOD · watch Drive · pull from platform
Drag-drop, OBS-to-Drive auto-watch, or Twitch / Kick VOD pull. Single ingest endpoint, three sources.
-
2 · diarize + transcribe
Speakers labeled, words timestamped
pyannote-audio + faster-whisper, both on your GPU. Audio never leaves your machine.
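For readers curious what local transcription with word timestamps looks like, here is a minimal sketch using the faster-whisper Python package. It assumes the NEXOCLIP_WHISPER_DEVICE environment variable mentioned in the FAQ selects the device and that an unset variable means GPU; the helper names, the "small" model size, and the compute-type choices are illustrative, not NexoClip's actual configuration.

```python
import os


def resolve_whisper_device(default: str = "cuda") -> str:
    """Read NEXOCLIP_WHISPER_DEVICE; fall back to the default when unset or empty."""
    value = os.environ.get("NEXOCLIP_WHISPER_DEVICE", "").strip().lower()
    return value or default


def transcribe_words(audio_path: str, model_size: str = "small"):
    """Return (start, end, word) tuples for every word in the audio file."""
    # Imported lazily so resolve_whisper_device works without faster-whisper installed.
    from faster_whisper import WhisperModel

    device = resolve_whisper_device()
    # float16 is a common choice on GPU; int8 keeps CPU memory use down.
    compute_type = "float16" if device == "cuda" else "int8"
    model = WhisperModel(model_size, device=device, compute_type=compute_type)

    # transcribe() yields segments lazily; word_timestamps=True fills segment.words.
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    return [
        (word.start, word.end, word.word)
        for segment in segments
        for word in segment.words
    ]
```

Speaker diarization with pyannote-audio would run as a separate pass, and its speaker turns could then be joined to these word timestamps.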
-
3 · detect + score
Multimodal candidate-finding + AI scoring
Voice markers, chat heat, audio peaks, scene cuts → candidate windows. Each candidate gets a viral score, hook strength, and dead-air risk.
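One plausible way to turn scattered signal events into candidate windows is to pad each event and merge overlaps, as sketched below. The padding values and the candidate_windows function are assumptions; the page does not describe how NexoClip actually builds or scores windows, so scoring is left out.

```python
from typing import Iterable, List, Optional, Set, Tuple

Window = Tuple[float, float, Set[str]]


def candidate_windows(
    events: Iterable[Tuple[float, str]],
    pre: float = 10.0,   # hypothetical seconds kept before each event
    post: float = 20.0,  # hypothetical seconds kept after each event
    vod_duration: Optional[float] = None,
) -> List[Window]:
    """Pad each (timestamp, signal) event and merge windows that overlap."""
    windows: List[Window] = []
    for ts, signal in sorted(events):
        start = max(0.0, ts - pre)
        end = ts + post if vod_duration is None else min(vod_duration, ts + post)
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it and record the extra signal.
            prev_start, prev_end, prev_signals = windows[-1]
            windows[-1] = (prev_start, max(prev_end, end), prev_signals | {signal})
        else:
            windows.append((start, end, {signal}))
    return windows
```

Windows that collect several distinct signals are natural candidates for a higher score, though that weighting is a guess rather than something stated here.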
-
4 · cut + brand
Vertical clips routed to the right host
ffmpeg cuts each window, smart-crops 9:16 around the active face, applies the resolved speaker's brand kit.
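The cut-and-crop step can be approximated with ffmpeg's crop filter. The sketch below builds the argument list for a single clip, assuming one fixed face position per clip (real smart-cropping likely tracks the face over time). The function names and parameters are hypothetical; only standard ffmpeg options (-ss, -i, -t, -vf crop=w:h:x:y) are used.

```python
from typing import List, Tuple


def vertical_crop(src_w: int, src_h: int, face_cx: float) -> Tuple[int, int, int, int]:
    """Return (w, h, x, y) for a full-height 9:16 crop centered on face_cx."""
    # Round the width down to an even number, which common encoders expect.
    crop_w = min(src_w, (src_h * 9 // 16) // 2 * 2)
    x = int(round(face_cx - crop_w / 2))
    x = max(0, min(x, src_w - crop_w))  # keep the crop inside the frame
    return crop_w, src_h, x, 0


def ffmpeg_cut_args(
    src: str, dst: str, start: float, end: float,
    src_w: int, src_h: int, face_cx: float,
) -> List[str]:
    """Build an ffmpeg command that cuts [start, end] and crops to 9:16."""
    w, h, x, y = vertical_crop(src_w, src_h, face_cx)
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek in the input before decoding
        "-i", src,
        "-t", f"{end - start:.3f}",   # clip duration
        "-vf", f"crop={w}:{h}:{x}:{y}",
        dst,
    ]
```

The list can be passed to subprocess.run(args, check=True); brand-kit overlays such as fonts and handles would be additional filters on top of this.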
-
5 · hook + variants
Title + caption + hashtags per platform
Claude generates 5 viral-hook titles per tone preset, captions per persona, hashtags per platform. Operator picks one or skips and accepts the AI default.
-
6 · ship
Auto-publish with undo · or manual review
Trusted brand kits queue with a scheduled-for + undo window. Untrusted kits land in the inbox grouped by VOD/speaker. Same flow either way.
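The trusted/untrusted routing can be pictured with a small sketch. It assumes the undo window is simply the time between queueing and the scheduled publish time, which the page does not spell out; PublishJob, route_clip, and the 15-minute default are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PublishJob:
    clip_id: str
    scheduled_for: datetime
    cancelled: bool = False

    def undo(self, now: datetime) -> bool:
        """Cancel the job if its scheduled time has not passed yet."""
        if now < self.scheduled_for and not self.cancelled:
            self.cancelled = True
            return True
        return False


def route_clip(
    clip_id: str,
    kit_trusted: bool,
    now: datetime,
    undo_window: timedelta = timedelta(minutes=15),  # hypothetical default
) -> Optional[PublishJob]:
    """Trusted kits get a scheduled job; untrusted kits return None (manual review inbox)."""
    if not kit_trusted:
        return None
    return PublishJob(clip_id, scheduled_for=now + undo_window)
```

In this reading, both paths end in the same publish action; the only difference is whether a human approves first or has a window to cancel afterwards.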
Frequently asked
Why call it a "growth engine" instead of a clip editor?
Who is NexoClip for?
Do I need a GPU?
CPU mode works (NEXOCLIP_WHISPER_DEVICE=cpu) but is much slower.
Which LLM does NexoClip use?
How do I integrate NexoClip into an agent workflow?
Run nexoclip mcp serve --token <api-token> to expose the MCP server over stdio. Claude Code, Cursor, and other MCP clients will see tools for listing streams, kicking off pipelines, fetching clips, and managing brand kits. The same tenant token gates the JSON REST API at /streams, /clips, etc.
What about retention and data privacy?
Is there an API I can call directly?
Yes. Issue a token with nexoclip tokens issue --tenant <id> --scope full and pass it as Authorization: Bearer <token>.
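A call to the REST API might look like the sketch below, which builds an authenticated request with Python's standard library. The base URL is a placeholder and the response shape is unknown; only the /clips path and the Bearer header format come from the text above.

```python
import urllib.request


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the tenant token as a Bearer credential."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Usage sketch (not executed here; the host name is a placeholder):
#   import json
#   req = build_request("https://nexoclip.example", "/clips", token)
#   with urllib.request.urlopen(req) as resp:
#       clips = json.load(resp)
```

Keeping the token in an environment variable rather than in source code is the usual precaution for credentials like this.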
NexoClip · multi-tenant SaaS for VOD-to-clip workflows