This document describes all available tags, their parameters, and timing behavior.
Wrapping form:
[tag param="value"]
content
[/tag]

Self-closing form:
[tag param="value"/]

Comments:
# This is a comment (shell-style)

Parameter value types:
- String: param="value" (quotes required)
- Number: param="1.5" or param="100"
- Boolean: param="true" or param="false"
- Time: param="2s" or param="500ms"
- Percentage: param="50%"

These tags define scope and contain other elements.
The root container for video content. Defines the visual layer.
[video file="path/to/video.mp4" size="content" fit="none"]
...children...
[/video]

| Parameter | Type | Default | Description |
|---|---|---|---|
| file | string | required | Path to video file |
| id | string | auto | Element ID for time anchoring |
| from | time | 0 | Start position in the video file |
| size | enum | "content" | Duration mode (see below) |
| fit | enum | "none" | How to fit video to duration |

| size value | Behavior |
|---|---|
| content | Video duration = duration of children (TTS, etc.) |
| natural | Video plays its full natural length |

| fit value | Behavior |
|---|---|
| none | No adjustment |
| trim | Cut video to fit duration |
| loop | Loop video to fill duration |
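
As a rough illustration of the fit values above, the following Python sketch computes how many passes of a file would be needed to cover a target duration. The function name `plan_fit` and its return shape are hypothetical, and details the tables do not state (for example, what `trim` does when the file is shorter than the target) are assumptions rather than documented renderer behavior.

```python
import math


def plan_fit(file_duration: float, target_duration: float, fit: str = "none"):
    """Return (passes, seconds_played) for one media file.

    Hypothetical helper: it models the fit table, not the real renderer.
    """
    if file_duration <= 0 or target_duration < 0:
        raise ValueError("file_duration must be > 0 and target_duration >= 0")
    if fit == "loop":
        # Repeat the file until the target is covered; the last pass may be cut short.
        return (math.ceil(target_duration / file_duration), target_duration)
    if fit == "trim":
        # Assumption: a single pass, cut off at the target duration.
        return (1, min(file_duration, target_duration))
    if fit == "none":
        # No adjustment: the file plays once at its own length.
        return (1, file_duration)
    raise ValueError(f"unknown fit mode: {fit!r}")
```

For example, a 3-second clip under 8 seconds of narration with `fit="loop"` needs three passes, the last one cut short.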
Basic looping video:
[video file="background.mp4" size="content" fit="loop"]
[tts]This narration determines video length[/tts]
[/video]

Start video from 30 seconds in:
[video file="long_video.mp4" from="30s" fit="loop"]
[tts]This starts from the 30 second mark and loops from there[/tts]
[/video]

Text-to-speech block. Generates audio and optional subtitles.
[tts engine="kokoro_tts" voice="af_heart"]
Text to speak with [blue]styled words[/blue].
[/tts]

| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | string | config default | TTS engine to use |
| voice | string | config default | Voice ID |
| id | string | auto | Element ID for time anchoring |
| subtitles | boolean | "true" | Show subtitles |
| sub-position | enum | "center" | Vertical position: top, center, bottom |
| sub-align | enum | "center" | Horizontal alignment: left, center, right |
| highlight | boolean | "true" | Highlight current word in subtitles |

Together, sub-position and sub-align select one of nine subtitle positions:
7 (top-left)     8 (top-center)     9 (top-right)
4 (middle-left)  5 (middle-center)  6 (middle-right)
1 (bottom-left)  2 (bottom-center)  3 (bottom-right)

[tts engine="kokoro_tts" voice="am_michael" sub-position="bottom" sub-align="center"]
Hello world! This text will be spoken and displayed.
[/tts]

Background music track.
[music file="song.mp3" fit="loop"]
...children...
[/music]

| Parameter | Type | Default | Description |
|---|---|---|---|
| file | string | required | Path to audio file |
| id | string | auto | Element ID for time anchoring |
| from | time | 0 | Start position in the audio file |
| volume | number | 1.0 | Volume level (0.0 = silent, 1.0 = normal, 2.0 = 2x) |
| start | anchor | none | When to start on timeline (see Time Anchors) |
| end | anchor | none | When to end on timeline |
| fit | enum | "loop" | How to fit audio: none, trim, loop |
Basic looping music:
[music file="background.mp3" fit="loop"]
[tts]Music plays throughout this narration[/tts]
[tts]And continues through this one too[/tts]
[/music]

Skip intro and start from chorus (at 45 seconds):
[music file="song.mp3" from="45s" fit="loop"]
[tts]Music starts at the chorus and loops from there[/tts]
[/music]

Quiet background music at 30% volume:
[music file="ambient.mp3" volume="0.3" fit="loop"]
[tts]The music is subtle behind the narration[/tts]
[/music]

These tags insert media at specific points.
Display an image overlay with positioning, sizing, and opacity controls.
Self-closing (inline in TTS):
[image file="photo.png" duration="2.0"/]

Wrapping (with children):
[image file="photo.png"]
[tts]Text spoken while image displays[/tts]
[/image]

| Parameter | Type | Default | Description |
|---|---|---|---|
| file | string | required | Path to image file |
| id | string | auto | Element ID |
| duration | time | 1.0 | Display duration (self-closing only) |
| position | enum | "center" | Preset position (see below) |
| x | dimension | none | Custom X position (overrides position) |
| y | dimension | none | Custom Y position (overrides position) |
| width | dimension | none | Target width |
| height | dimension | none | Target height |
| scale | number | 1.0 | Scale factor (e.g., 0.5 = half size) |
| fit | enum | "contain" | How image fits target size |
| opacity | number | 1.0 | Opacity (0.0 = invisible, 1.0 = opaque) |
Dimensions can be specified as:
- Pixels: "200px" or "200"
- Percentage: "50%" (relative to frame width/height)

| position value | Location |
|---|---|
| center | Dead center (default) |
| top | Top center |
| bottom | Bottom center |
| left | Left center |
| right | Right center |
| top-left | Top left corner |
| top-right | Top right corner |
| bottom-left | Bottom left corner |
| bottom-right | Bottom right corner |

| fit value | Behavior |
|---|---|
| contain | Scale to fit within target, preserve aspect ratio (default) |
| cover | Scale to cover target, may crop |
| fill | Stretch to fill target exactly |
| none | Use original size (only scale applies) |
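
The dimension formats and image fit values above can be sketched in Python as follows. Both function names are hypothetical, and the real renderer's rounding and cropping are not documented here, so treat this as an approximation of the rules in the tables rather than the actual implementation.

```python
def parse_dimension(value: str, frame_extent: float) -> float:
    """Convert "200px", "200", or "50%" into pixels.

    frame_extent is the frame width (for x/width) or height (for y/height).
    """
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) / 100.0 * frame_extent
    if value.endswith("px"):
        value = value[:-2]
    return float(value)


def fitted_size(src_w, src_h, target_w, target_h, fit="contain"):
    """Return the (width, height) an image would be drawn at for a fit mode."""
    if fit == "fill":
        return (target_w, target_h)      # stretch, aspect ratio not preserved
    if fit == "none":
        return (src_w, src_h)            # original size, before any scale factor
    ratio_w = target_w / src_w
    ratio_h = target_h / src_h
    if fit == "contain":
        factor = min(ratio_w, ratio_h)   # whole image stays inside the target
    elif fit == "cover":
        factor = max(ratio_w, ratio_h)   # target fully covered, overflow is cropped
    else:
        raise ValueError(f"unknown fit mode: {fit!r}")
    return (src_w * factor, src_h * factor)
```

With an 800x400 image and a 400x400 target, `contain` gives 400x200 while `cover` gives 800x400 (the overflow being what would get cropped).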
duration seconds

Basic centered image:
[image file="logo.png" duration="3s"/]

Corner watermark at 50% opacity:
[image file="watermark.png" position="bottom-right" scale="0.3" opacity="0.5"/]

Centered with 10% padding on all sides:
[image file="photo.png" width="80%" height="80%" fit="contain"/]

Custom position:
[image file="icon.png" x="100px" y="50%"/]

Cover a region (crops to fill):
[image file="background.jpg" width="50%" height="50%" fit="cover" position="top-left"/]

Inline in TTS:
[tts]
Look at this [image file="cat.png" duration="3s" position="center" scale="0.8"/] cute cat!
[/tts]

Play a sound effect.
[sound file="explosion.wav"/]

| Parameter | Type | Default | Description |
|---|---|---|---|
| file | string | required | Path to audio file |
| id | string | auto | Element ID |

[tts]
The door slammed [sound file="slam.wav"/] shut.
[/tts]

Add silence/time without speech. Useful for letting video/music continue after TTS ends.
[pause duration="3s"/]

| Parameter | Type | Default | Description |
|---|---|---|---|
| duration | time | 1.0 | Duration of the pause |
| id | string | auto | Element ID (for anchoring) |

Add 3 seconds after narration:
[video file="bg.mp4" fit="loop"]
[music file="song.mp3" fit="loop"]
[tts]Here's the content[/tts]
[pause duration="3s"/]
[/music]
[/video]

Use as anchor point:
[tts id="narration"]Main content[/tts]
[pause duration="2s" id="outro"/]
[music file="outro.mp3" start="outro:start"/]

Style tags modify subtitle appearance. They only work inside [tts] blocks.
Generic styling tag with full control.
[style color="#FF0000" bold="true" italic="false" underline="false" size="48"]
styled text
[/style]

| Parameter | Type | Default | Description |
|---|---|---|---|
| color | hex color | none | Text color (#RRGGBB or #RGB) |
| bold | boolean | "false" | Bold text |
| italic | boolean | "false" | Italic text |
| underline | boolean | "false" | Underlined text |
| size | number | none | Font size in pixels |

[tts]
This is [style color="#FF6600" bold="true"]orange and bold[/style] text.
[/tts]

Convenience tags for common colors.
[red]This text is red[/red]
[blue]This text is blue[/blue]
[green]This text is green[/green]

Styles can be nested. Inner styles override outer styles:
[tts]
[blue]Blue text with [red]red word[/red] inside[/blue]
[/tts]

Effect tags add animated camera movements to video/image content. They can be used in two ways: wrapped around other elements such as a [tts] block, or inline inside a [tts] block for word-level timing.
Ken Burns style zoom effect. Creates smooth zoom in or zoom out animation.
[zoom from="100%" to="120%" easing="ease-out" focus-x="50%" focus-y="50%"]
...children...
[/zoom]

| Parameter | Type | Default | Description |
|---|---|---|---|
| from | percentage | "100%" | Starting zoom level |
| to | percentage | "120%" | Ending zoom level |
| easing | enum | "linear" | Animation curve (see below) |
| focus-x | percentage | "50%" | Horizontal focus point |
| focus-y | percentage | "50%" | Vertical focus point |
| id | string | auto | Element ID |

| easing value | Behavior |
|---|---|
| linear | Constant speed (default) |
| ease-in | Start slow, accelerate |
| ease-out | Start fast, decelerate |
| ease-in-out | Slow at both ends |
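
The easing table names four curves without giving their formulas. The Python sketch below assumes simple quadratic curves (a common choice, not confirmed for this renderer) to show how an easing value could shape the zoom level over the effect's duration. The function names `ease` and `zoom_at` are hypothetical.

```python
def ease(t: float, easing: str = "linear") -> float:
    """Map linear progress t in [0, 1] to eased progress in [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    if easing == "linear":
        return t
    if easing == "ease-in":
        return t * t                      # start slow, accelerate
    if easing == "ease-out":
        return 1.0 - (1.0 - t) ** 2       # start fast, decelerate
    if easing == "ease-in-out":
        # Slow at both ends: accelerate in the first half, decelerate in the second.
        return 2.0 * t * t if t < 0.5 else 1.0 - 2.0 * (1.0 - t) ** 2
    raise ValueError(f"unknown easing: {easing!r}")


def zoom_at(progress: float, start: float = 100.0, end: float = 120.0,
            easing: str = "linear") -> float:
    """Zoom level (in percent) at a given fraction of the effect's duration."""
    return start + (end - start) * ease(progress, easing)
```

Halfway through a default 100% to 120% zoom, `linear` gives 110%, while `ease-out` under this quadratic assumption has already reached 115%.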
Subtle zoom in on image:
[image file="photo.png"]
[zoom from="100%" to="110%"]
[tts]Look at this beautiful scenery[/tts]
[/zoom]
[/image]

Zoom out with focus on top-left:
[zoom from="150%" to="100%" focus-x="25%" focus-y="25%" easing="ease-out"]
[tts]Revealing the full picture[/tts]
[/zoom]

Inline zoom for emphasis:
[tts]
This is normal, but [zoom from="100%" to="120%"]this part is important[/zoom] and back to normal.
[/tts]

Horizontal camera movement. Scales up content 1.5x to provide room for movement.
[pan from="left" to="right" easing="linear"]
...children...
[/pan]

| Parameter | Type | Default | Description |
|---|---|---|---|
| from | position | "0%" | Starting horizontal position |
| to | position | "100%" | Ending horizontal position |
| easing | enum | "linear" | Animation curve |
| id | string | auto | Element ID |

| Keyword | Percentage |
|---|---|
| left | 0% |
| center | 50% |
| right | 100% |

Or use any percentage value like "25%", "75%", etc.
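
To make the keyword table and the 1.5x scale-up concrete, here is a small Python sketch of how a pan position might translate into a horizontal offset. The mapping from position to pixel offset is an assumption (the document only states that content is scaled 1.5x to leave room for movement), the function names are hypothetical, and linear easing is used for simplicity.

```python
PAN_KEYWORDS = {"left": 0.0, "center": 0.5, "right": 1.0}


def position_fraction(value: str) -> float:
    """Turn "left", "center", "right", or a percentage like "25%" into 0..1."""
    value = value.strip().lower()
    if value in PAN_KEYWORDS:
        return PAN_KEYWORDS[value]
    if value.endswith("%"):
        return float(value[:-1]) / 100.0
    raise ValueError(f"not a pan position: {value!r}")


def pan_offset(frame_width: float, start: str, end: str,
               progress: float, scale: float = 1.5) -> float:
    """Left edge (in pixels) of the visible window inside the scaled content.

    Assumption: 0% shows the left edge of the scaled content and 100% the
    right edge, so the window can travel (scale - 1) * frame_width pixels.
    """
    room = frame_width * scale - frame_width
    a = position_fraction(start)
    b = position_fraction(end)
    return room * (a + (b - a) * progress)
```

Under these assumptions, a 1920-pixel-wide frame has 960 pixels of travel, so a left-to-right pan is offset by 480 pixels at its midpoint.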
Pan from left to right:
[pan from="left" to="right"]
[tts]Panning across the landscape[/tts]
[/pan]

Pan from center to right with easing:
[pan from="center" to="right" easing="ease-in-out"]
[tts]Following the action[/tts]
[/pan]

Vertical camera movement. Scales up content 1.5x to provide room for movement.
[tilt from="top" to="bottom" easing="linear"]
...children...
[/tilt]

| Parameter | Type | Default | Description |
|---|---|---|---|
| from | position | "0%" | Starting vertical position |
| to | position | "100%" | Ending vertical position |
| easing | enum | "linear" | Animation curve |
| id | string | auto | Element ID |

| Keyword | Percentage |
|---|---|
| top | 0% |
| center | 50% |
| bottom | 100% |

Tilt from top to bottom:
[tilt from="top" to="bottom"]
[tts]Scanning down the building[/tts]
[/tilt]

Tilt up with ease-out:
[tilt from="bottom" to="top" easing="ease-out"]
[tts]Looking up at the sky[/tts]
[/tilt]

Camera shake effect for impact, tension, or emphasis. Uses oscillating motion with optional decay.
Wrapping form:
[shake intensity="5" frequency="30" decay="true"]
...children...
[/shake]

Self-closing form:
[shake intensity="10" frequency="20" duration="0.5s"/]

| Parameter | Type | Default | Description |
|---|---|---|---|
| intensity | number | 5.0 | Shake magnitude in pixels |
| frequency | number | 30.0 | Oscillation frequency (Hz) |
| decay | boolean | "true" | Shake diminishes over time |
| duration | time | 0.5s | Duration (self-closing only) |
| id | string | auto | Element ID |
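
The parameters above describe an oscillation with an optional fade-out. One way to picture it is the Python sketch below, which assumes a sine wave and a linear decay envelope. Neither the waveform nor the decay curve is specified in this document, and `shake_offset` is a hypothetical name.

```python
import math


def shake_offset(t: float, duration: float, intensity: float = 5.0,
                 frequency: float = 30.0, decay: bool = True) -> float:
    """Displacement in pixels at time t (seconds) since the shake started."""
    if duration <= 0 or not 0.0 <= t <= duration:
        return 0.0
    # Assumed linear decay: full strength at the start, zero at the end.
    envelope = 1.0 - t / duration if decay else 1.0
    return intensity * envelope * math.sin(2.0 * math.pi * frequency * t)
```

The displacement never exceeds `intensity` pixels in either direction, and with `decay=True` it shrinks to zero by the end of the effect.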
Self-closing shake during impact (inline in TTS):
[tts]
And then [shake intensity="8" duration="0.3s"/] the explosion hit!
[/tts]

The shake starts exactly when "the" is spoken and lasts 0.3 seconds.

Wrapping shake timed to words (inline in TTS):
[tts]
Stay calm during [shake intensity="3"]the earthquake[/shake] warning.
[/tts]

The shake spans from when "the" is spoken until "earthquake" finishes.

Wrapping shake around TTS:
[shake intensity="6" frequency="25" decay="true"]
[tts]The earthquake rumbled beneath us[/tts]
[/shake]

Intense persistent shake:
[shake intensity="12" frequency="40" decay="false"]
[tts]The machine vibrated violently[/tts]
[/shake]

Effects can be combined by nesting them:
[zoom from="100%" to="115%" easing="ease-in"]
[pan from="left" to="center"]
[tts]A dramatic reveal with zoom and pan[/tts]
[/pan]
[/zoom]

Effects apply from inside out: the innermost effect is applied first, then outer effects transform the result.
The timing system is content-driven, not timestamp-driven.
Speech drives timing. Media reacts to content, not seconds.
Elements inside a container are processed sequentially:
[video file="bg.mp4"]
[tts]First narration[/tts] # 0.0s - 2.5s
[tts]Second narration[/tts] # 2.5s - 5.0s
[tts]Third narration[/tts] # 5.0s - 8.0s
[/video]

Media inside TTS is positioned by word location, not timestamps:
[tts]
Hello [image file="wave.png" duration="1"/] world!
[/tts]

The image appears when "world" is spoken, calculated from TTS word timings.
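
The two rules above (children play back to back, and inline media starts at the word that follows it) can be sketched in Python. The function names, the `(word, start, end)` tuple layout, and the sample timings are all hypothetical; real word timings would come from the TTS engine.

```python
def layout_sequential(durations, start=0.0):
    """Place children back to back and return their (start, end) spans."""
    spans = []
    cursor = start
    for duration in durations:
        spans.append((cursor, cursor + duration))
        cursor += duration
    return spans


def inline_start(word_timings, next_word_index):
    """Start time of inline media placed just before word `next_word_index`.

    word_timings is a list of (word, start, end) tuples. If the media comes
    after the last word, fall back to the end of the speech.
    """
    if not word_timings:
        return 0.0
    if next_word_index >= len(word_timings):
        return word_timings[-1][2]
    return word_timings[next_word_index][1]


# Three narrations of 2.5s, 2.5s and 3.0s, as in the example above.
spans = layout_sequential([2.5, 2.5, 3.0])

# Made-up timings for "Hello [image .../] world!"
words = [("Hello", 0.0, 0.4), ("world!", 0.5, 1.0)]
image_start = inline_start(words, 1)
```

Here `spans` reproduces the 0.0 to 2.5, 2.5 to 5.0, and 5.0 to 8.0 second ranges from the sequential example, and the end of the last span is the container's content duration.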
Effects can also be placed inside TTS blocks to get word-level timing:
Self-closing effects (like shake with duration) start at their text position:
[tts]
And then [shake intensity="5" duration="0.3s"/] the explosion hit!
[/tts]

The shake effect starts exactly when "the" would be spoken (between "then" and "the").
Wrapping effects span the duration of the wrapped text:
[tts]
Hello world, [zoom from="100%" to="120%"]this is important[/zoom] okay bye
[/tts]

The zoom effect starts when "this" is spoken and ends when "important" finishes.
This allows precise synchronization of camera effects with specific words:
[tts]
Stay calm during [shake intensity="3"]the earthquake[/shake] warning.
Now [pan from="left" to="right"]look across the horizon[/pan] slowly.
[/tts]

Reference other elements' timing using anchors:
element_id:start # When element starts
element_id:end # When element ends
element_id:start+2s # 2 seconds after element starts
element_id:end-0.5s # 0.5 seconds before element ends

[tts id="narration"]The main content[/tts]
[music file="outro.mp3" start="narration:end-1s"]
# Music starts 1 second before narration ends
[/music]

With size="content", container duration = sum of children durations:
[video file="bg.mp4" size="content"]
[tts]3 seconds of speech[/tts] # Video is 3 seconds
[/video]

With size="natural", the container uses its natural file duration:
[video file="intro.mp4" size="natural"]
# Video plays full 10 seconds regardless of children
[/video]

When container duration differs from file duration:

| fit mode | Behavior |
|---|---|
| none | No adjustment (may have black frames or cut off) |
| trim | Cut file to fit duration |
| loop | Repeat file to fill duration |
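
Returning to the anchor expressions (element_id:start, element_id:end, with an optional offset such as +2s or -0.5s), the Python sketch below shows one way such a string could be resolved once element timings are known. The regular expression, the function names, and the `timeline` dictionary of `(start, end)` pairs are assumptions made for illustration, not the renderer's actual parser.

```python
import re

# id, then :start or :end, then an optional signed offset like +2s, -0.5s, -500ms
ANCHOR_RE = re.compile(r"^([A-Za-z_][\w-]*):(start|end)([+-]\d+(?:\.\d+)?(?:ms|s)?)?$")


def parse_time(text: str) -> float:
    """Convert "2s", "500ms", or a bare number (assumed seconds) to seconds."""
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    return float(text)


def resolve_anchor(anchor: str, timeline: dict) -> float:
    """Resolve an anchor string to an absolute time in seconds.

    timeline maps element IDs to (start, end) tuples in seconds.
    """
    match = ANCHOR_RE.match(anchor.strip())
    if match is None:
        raise ValueError(f"malformed anchor: {anchor!r}")
    element_id, edge, offset = match.groups()
    if element_id not in timeline:
        raise KeyError(f"unknown element id: {element_id!r}")
    start, end = timeline[element_id]
    base = start if edge == "start" else end
    if offset is None:
        return base
    sign = -1.0 if offset[0] == "-" else 1.0
    return base + sign * parse_time(offset[1:])
```

For a narration spanning 0 to 12 seconds, `resolve_anchor("narration:end-1s", {"narration": (0.0, 12.0)})` evaluates to 11.0, which is where the outro music in the example above would begin.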
[video file="nature.mp4" size="content" fit="loop"]
[tts engine="kokoro_tts" voice="af_heart"]
Welcome to our nature documentary.
Today we explore the forest.
[/tts]
[/video]

[video file="presentation.mp4"]
[tts sub-position="bottom" sub-align="center"]
The answer is [red]42[/red].
This is [style color="#FFD700" bold="true"]golden[/style] knowledge.
[/tts]
[/video]

[video file="action.mp4"]
[tts voice="narrator"]
The hero approaches the door.
[sound file="footsteps.wav"/]
[/tts]
[tts voice="hero"]
[sound file="door_knock.wav"/]
Is anyone there?
[/tts]
[/video]

[video file="intro.mp4"]
[music file="ambient.mp3" fit="loop"]
[tts]
In the beginning, there was silence.
[/tts]
[tts]
Then came the music.
[/tts]
[/music]
[/video]

[video file="lecture.mp4"]
[tts]
Let me show you a diagram.
[image file="diagram.png" duration="5"/]
As you can see, the process flows left to right.
[/tts]
[/video]

[video file="scene.mp4"]
[tts id="main"]
The main story content goes here.
It could be quite long.
[/tts]
[music file="credits.mp3" start="main:end-2s" fit="trim"]
[tts]
Thanks for watching.
[/tts]
[/music]
[/video]

The renderer organizes elements into layers:

| Layer | Elements | Behavior |
|---|---|---|
| video | VideoTag | Base visual layer |
| image | ImageTag | Overlaid on video |
| effect | ZoomTag, PanTag, TiltTag, ShakeTag | Visual effects applied to video/image |
| primary | TTSTag | Main audio + subtitles |
| music | MusicTag | Background audio |
| sfx | SoundTag | Sound effects |

Audio layers are mixed together. Video/image layers are composited. Effects are applied as FFmpeg filters.
The following features from the design spec are not yet implemented:
- [meta] - Global project settings
- [scene] - Scene grouping
- [bg] - Background (use [video] instead)
- [emotion] - Emotional TTS modifiers
- [filter] - Visual filters
- [fade] - Fade transitions
- [transition] - Scene transitions
- fade-in / fade-out parameters

Built-in assets you can reference directly by file path. The full library is available in the video and music selectors on the create page.
Tip: right-click any asset in the selector
Right-clicking a video clip or music track gives you quick options to copy its file path (paste straight into your script), copy a direct link to the file, and copy the attribution text if the asset requires credit.
| Category | File path | Duration |
|---|---|---|
| Minecraft | video/minecraft/BXUA2FncVPI.mp4 | 10:30 |
| Subway Surfers | video/subway_surfers/i0M4ARe9v0Y.mp4 | 5:11 |
| Satisfying | video/satisfying/JvI-02Q69ms.mp4 | 3:00 |
| CS Surfing | video/cs_surfing/kuPPZCtLX4w.mp4 | 4:48 |
Browse all clips in the video selector on the create page.
| Name | File path | Mood |
|---|---|---|
| Sneaky Snitch (Kevin MacLeod) | music/Sneaky_Snitch.opus | |
| Scheming Weasel (faster version) (Kevin MacLeod) | music/Scheming_Weasel_(faster_version).opus | |
| Monkeys Spinning Monkeys (Kevin MacLeod) | music/Monkeys_Spinning_Monkeys.opus | |
| Fluffing a Duck (Kevin MacLeod) | music/Fluffing_a_Duck.opus | |

Browse all tracks in the music selector on the create page.
How to use assets
Video: [video file="video/minecraft/BXUA2FncVPI.mp4" size="content" fit="loop"]
Music: [music file="music/Sneaky_Snitch.opus" volume="0.3" fit="loop"]