Video-to-text
Created by Gonzalo Cueva
Video Source & Settings
▼
Upload Video File:
Max 3 minutes, 500MB. Supported formats: MP4, WebM, MKV, AVI
Frame Extraction FPS:
Processing:
Audio
Vision
Audio Transcription
▼
Provider:
Groq (Free, Ultra Fast)
AssemblyAI
Local Whisper
API Key:
Model:
whisper-large-v3
whisper-base
Vision Analysis
▼
Provider:
OpenRouter (Recommended)
Together.ai
Replicate
Local Moondream
API Key:
Model:
Qwen2.5 VL 72B (FREE) ⭐
Gemini 2.0 Flash Exp (FREE)
Llama 4 Maverick (FREE)
Gemma 3 27B (FREE)
Qwen2.5 VL 32B (FREE)
Gemini 2.5 Flash Lite ($0.00002/img)
Claude 3.5 Sonnet (Premium)
Temperature:
Max Tokens:
Top P:
Narrative Generation
▼
Provider:
OpenRouter (Recommended)
Groq (Free, Fast)
Together.ai
Local Ollama
API Key:
Model:
Mistral Small 3.2 24B (FREE) ⭐
Llama 4 Scout 17B (FREE)
Gemma 3 27B (FREE)
Qwen3 70B Instruct (FREE)
Claude 3.5 Sonnet (Premium)
Temperature:
Max Tokens:
Top P:
Start Analysis
Processing
0%
Initializing...
Analysis Results
Video Statistics
Clustering Analysis
Frame Clusters (Detailed)
Audio Transcription
No audio data
Unified Narrative
Generating...