Video Generator Tag Reference

This document describes all available tags, their parameters, and timing behavior.


Table of Contents

  1. Syntax Rules
  2. Container Tags
  3. Media Tags
  4. Style Tags
  5. Effect Tags
  6. Timing System
  7. Examples
  8. Layer System
  9. Not Yet Implemented
  10. Asset Library

Syntax Rules

Block Form

[tag param="value"]
  content
[/tag]

Self-Closing Form

[tag param="value"/]

Comments

# This is a comment (shell-style)
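A small tokenizer shows how the three forms can be told apart. It is written in Python only for illustration, because this document does not describe the generator's own parser. The regular expressions and the `tokenize` name are assumptions, and the sketch strips only whole-line `#` comments.

```python
import re

# Hypothetical tokenizer for the bracket syntax; not the generator's actual parser.
# Matches [tag param="value"], [tag param="value"/] and [/tag].
TAG_RE = re.compile(r'\[(/?)([a-z][\w-]*)((?:\s+[\w-]+="[^"]*")*)\s*(/?)\]')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')

def tokenize(source: str):
    """Yield ("open" | "close" | "self", name, params) and ("text", str) tokens."""
    # Drop lines whose first non-blank character is "#" (shell-style comments).
    lines = [ln for ln in source.splitlines() if not ln.lstrip().startswith("#")]
    text = "\n".join(lines)
    pos = 0
    for m in TAG_RE.finditer(text):
        if m.start() > pos:
            yield ("text", text[pos:m.start()])
        closing, name, raw_params, self_closing = m.groups()
        params = dict(PARAM_RE.findall(raw_params))
        if closing:
            kind = "close"
        elif self_closing:
            kind = "self"
        else:
            kind = "open"
        yield (kind, name, params)
        pos = m.end()
    if pos < len(text):
        yield ("text", text[pos:])
```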

Parameter Types

  • Strings: param="value" (quotes required)
  • Numbers: param="1.5" or param="100"
  • Booleans: param="true" or param="false"
  • Time: param="2s" or param="500ms"; a bare number such as param="2.0" is read as seconds
  • Percentages: param="50%"
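The sketch below shows one way to convert these raw strings into typed values. The function names are illustrative. Reading a bare number as seconds is an assumption inferred from examples such as duration="2.0".

```python
import re

def parse_time(value: str) -> float:
    """Return seconds for "2s", "500ms", or a bare number such as "1.5" (assumed seconds)."""
    m = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(ms|s)?\s*", value)
    if not m:
        raise ValueError(f"invalid time: {value!r}")
    number, unit = float(m.group(1)), m.group(2)
    return number / 1000.0 if unit == "ms" else number

def parse_percentage(value: str) -> float:
    """Return a fraction: "50%" -> 0.5."""
    if not value.endswith("%"):
        raise ValueError(f"invalid percentage: {value!r}")
    return float(value[:-1]) / 100.0

def parse_bool(value: str) -> bool:
    """Only the literal strings "true" and "false" are accepted."""
    if value not in ("true", "false"):
        raise ValueError(f"invalid boolean: {value!r}")
    return value == "true"
```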

Container Tags

These tags define scope and contain other elements.


[video]

The root container for video content. Defines the visual layer.

Syntax

[video file="path/to/video.mp4" size="content" fit="none"]
  ...children...
[/video]

Parameters

Parameter  Type    Default    Description
file       string  required   Path to video file
id         string  auto       Element ID for time anchoring
from       time    0          Start position in the video file
size       enum    "content"  Duration mode (see below)
fit        enum    "none"     How to fit video to duration

Size Modes

Value    Behavior
content  Video duration = duration of children (TTS, etc.)
natural  Video plays its full natural length

Fit Modes

Value  Behavior
none   No adjustment
trim   Cut video to fit duration
loop   Loop video to fill duration

Examples

Basic looping video:

[video file="background.mp4" size="content" fit="loop"]
  [tts]This narration determines video length[/tts]
[/video]

Start video from 30 seconds in:

[video file="long_video.mp4" from="30s" fit="loop"]
  [tts]This starts from the 30-second mark and loops from there[/tts]
[/video]

[tts]

Text-to-speech block. Generates audio and optional subtitles.

Syntax

[tts engine="kokoro_tts" voice="af_heart"]
  Text to speak with [blue]styled words[/blue].
[/tts]

Parameters

Parameter     Type     Default           Description
engine        string   (config default)  TTS engine to use
voice         string   (config default)  Voice ID
id            string   auto              Element ID for time anchoring
subtitles     boolean  "true"            Show subtitles
sub-position  enum     "center"          Vertical position: top, center, bottom
sub-align     enum     "center"          Horizontal alignment: left, center, right
highlight     boolean  "true"            Highlight current word in subtitles

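Subtitle Position Grid

Together, sub-position (row) and sub-align (column) select one of nine screen positions, numbered like a numeric keypad: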

7 (top-left)      8 (top-center)      9 (top-right)
4 (middle-left)   5 (middle-center)   6 (middle-right)
1 (bottom-left)   2 (bottom-center)   3 (bottom-right)
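A small lookup shows how the two parameters combine into a grid cell. The sketch assumes sub-position="center" selects the middle row. The numbering matches the numpad-style \an alignment codes used in ASS/SSA subtitles, but this document does not say the renderer uses that format. The function name is hypothetical.

```python
# Hypothetical mapping from (sub-position, sub-align) to the 1-9 grid cell above.
ROW_BASE = {"bottom": 1, "center": 4, "top": 7}      # first cell of each row
COLUMN_OFFSET = {"left": 0, "center": 1, "right": 2}  # step across the row

def subtitle_cell(sub_position: str = "center", sub_align: str = "center") -> int:
    return ROW_BASE[sub_position] + COLUMN_OFFSET[sub_align]
```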

Example

[tts engine="kokoro_tts" voice="am_michael" sub-position="bottom" sub-align="center"]
  Hello world! This text will be spoken and displayed.
[/tts]

[music]

Background music track.

Syntax

[music file="song.mp3" fit="loop"]
  ...children...
[/music]

Parameters

Parameter  Type    Default   Description
file       string  required  Path to audio file
id         string  auto      Element ID for time anchoring
from       time    0         Start position in the audio file
volume     number  1.0       Volume level (0.0 = silent, 1.0 = normal, 2.0 = 2x)
start      anchor  none      When to start on the timeline (see Time Anchors)
end        anchor  none      When to end on the timeline
fit        enum    "loop"    How to fit audio: none, trim, loop

Examples

Basic looping music:

[music file="background.mp3" fit="loop"]
  [tts]Music plays throughout this narration[/tts]
  [tts]And continues through this one too[/tts]
[/music]

Skip intro and start from chorus (at 45 seconds):

[music file="song.mp3" from="45s" fit="loop"]
  [tts]Music starts at the chorus and loops from there[/tts]
[/music]

Quiet background music at 30% volume:

[music file="ambient.mp3" volume="0.3" fit="loop"]
  [tts]The music is subtle behind the narration[/tts]
[/music]

Media Tags

These tags insert media at specific points.


[image]

Display an image overlay with positioning, sizing, and opacity controls.

Syntax

Self-closing (inline in TTS):

[image file="photo.png" duration="2.0"/]

Wrapping (with children):

[image file="photo.png"]
  [tts]Text spoken while image displays[/tts]
[/image]

Parameters

Parameter  Type       Default    Description
file       string     required   Path to image file
id         string     auto       Element ID
duration   time       1.0        Display duration (self-closing only)
position   enum       "center"   Preset position (see below)
x          dimension  none       Custom X position (overrides position)
y          dimension  none       Custom Y position (overrides position)
width      dimension  none       Target width
height     dimension  none       Target height
scale      number     1.0        Scale factor (e.g., 0.5 = half size)
fit        enum       "contain"  How image fits target size
opacity    number     1.0        Transparency (0.0 = invisible, 1.0 = opaque)

Dimension Values

Dimensions can be specified as:

  • Pixels: "200px" or "200"
  • Percentage: "50%" (relative to frame width/height)
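The sketch below resolves a dimension string to pixels. It assumes percentages are taken of the frame width for x and width, and of the frame height for y and height. The function name is hypothetical.

```python
def resolve_dimension(value: str, frame_extent: int) -> float:
    """Convert "200px", "200", or "50%" to pixels; frame_extent is the frame width or height."""
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) / 100.0 * frame_extent
    if value.endswith("px"):
        value = value[:-2]
    return float(value)  # bare numbers are pixels
```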

Position Presets

Value         Location
center        Dead center (default)
top           Top center
bottom        Bottom center
left          Left center
right         Right center
top-left      Top left corner
top-right     Top right corner
bottom-left   Bottom left corner
bottom-right  Bottom right corner
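Each preset can be read as a pair of horizontal and vertical alignment fractions. This sketch assumes the image sits flush against the frame edges with no margin, which may not match the actual renderer. `preset_origin` is a hypothetical name.

```python
def preset_origin(preset: str, frame_w: int, frame_h: int, img_w: int, img_h: int):
    """Top-left pixel coordinates of an image placed with a position preset."""
    # Alignment fractions: 0.0 = left/top edge, 0.5 = centered, 1.0 = right/bottom edge.
    h = 0.0 if "left" in preset else 1.0 if "right" in preset else 0.5
    v = 0.0 if "top" in preset else 1.0 if "bottom" in preset else 0.5
    return (round((frame_w - img_w) * h), round((frame_h - img_h) * v))
```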

Fit Modes

Value    Behavior
contain  Scale to fit within target, preserve aspect ratio (default)
cover    Scale to cover target, may crop
fill     Stretch to fill target exactly
none     Use original size (only scale applies)
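The four fit modes come down to a choice of scale factor:

- contain uses the smaller of the two box ratios.
- cover uses the larger ratio.
- fill stretches each axis independently.
- none keeps the original size.

Applying `scale` only in none mode follows the table. Everything else here, including the function name, is an assumption.

```python
def fitted_size(mode, src_w, src_h, box_w, box_h, scale=1.0):
    """Drawn (width, height) of a src_w x src_h image in a box_w x box_h target."""
    if mode == "none":
        return (src_w * scale, src_h * scale)   # original size, only scale applies
    if mode == "fill":
        return (box_w, box_h)                   # stretch, aspect ratio ignored
    ratio_w, ratio_h = box_w / src_w, box_h / src_h
    if mode == "contain":
        factor = min(ratio_w, ratio_h)          # whole image visible inside the box
    elif mode == "cover":
        factor = max(ratio_w, ratio_h)          # box fully covered; overflow is cropped
    else:
        raise ValueError(f"unknown fit mode: {mode!r}")
    return (src_w * factor, src_h * factor)
```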

Timing Behavior

  • Self-closing inside TTS: the image appears at the tag's position in the text (when the next word is spoken) and displays for duration seconds
  • Wrapping: the image displays for the combined duration of its children

Examples

Basic centered image:

[image file="logo.png" duration="3s"/]

Corner watermark at 50% opacity:

[image file="watermark.png" position="bottom-right" scale="0.3" opacity="0.5"/]

Fit within a centered box that leaves a 10% margin on each side:

[image file="photo.png" width="80%" height="80%" fit="contain"/]

Custom position:

[image file="icon.png" x="100px" y="50%"/]

Cover a region (crops to fill):

[image file="background.jpg" width="50%" height="50%" fit="cover" position="top-left"/]

Inline in TTS:

[tts]
  Look at this [image file="cat.png" duration="3s" position="center" scale="0.8"/] cute cat!
[/tts]

[sound]

Play a sound effect.

Syntax

[sound file="explosion.wav"/]

Parameters

Parameter  Type    Default   Description
file       string  required  Path to audio file
id         string  auto      Element ID

Timing Behavior

  • Inside TTS: the sound plays at the tag's position in the narration (when the next word is spoken)
  • Outside TTS: the sound plays at that point in the sequence

Example

[tts]
  The door slammed [sound file="slam.wav"/] shut.
[/tts]

[pause]

Adds time to the timeline without speech. Useful for letting video and music continue after the TTS ends.

Syntax

[pause duration="3s"/]

Parameters

Parameter  Type    Default  Description
duration   time    1.0      Duration of the pause
id         string  auto     Element ID (for anchoring)

Timing Behavior

  • Advances the timeline by the specified duration
  • Video and music continue playing during the pause
  • No audio is generated (silence)

Examples

Add 3 seconds after narration:

[video file="bg.mp4" fit="loop"]
  [music file="song.mp3" fit="loop"]
    [tts]Here's the content[/tts]
    [pause duration="3s"/]
  [/music]
[/video]

Use as anchor point:

[tts id="narration"]Main content[/tts]
[pause duration="2s" id="outro"/]
[music file="outro.mp3" start="outro:start"/]

Style Tags

Style tags modify subtitle appearance. They only work inside [tts] blocks.


[style]

Generic styling tag with full control.

Syntax

[style color="#FF0000" bold="true" italic="false" underline="false" size="48"]
  styled text
[/style]

Parameters

Parameter  Type       Default  Description
color      hex color  none     Text color (#RRGGBB or #RGB)
bold       boolean    "false"  Bold text
italic     boolean    "false"  Italic text
underline  boolean    "false"  Underlined text
size       number     none     Font size in pixels
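Because color accepts both #RRGGBB and #RGB, a parser has to expand the shorthand by doubling each digit, so #F60 becomes #FF6600. The function name in this sketch is hypothetical.

```python
def parse_hex_color(value: str) -> tuple:
    """Return (r, g, b) integers for "#RRGGBB" or the "#RGB" shorthand."""
    if not value.startswith("#") or len(value) not in (4, 7):
        raise ValueError(f"invalid color: {value!r}")
    digits = value[1:]
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # "F60" -> "FF6600"
    # int(..., 16) raises ValueError for non-hex characters
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
```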

Example

[tts]
  This is [style color="#FF6600" bold="true"]orange and bold[/style] text.
[/tts]

Color Shortcuts

Convenience tags for common colors.

[red]

[red]This text is red[/red]

[blue]

[blue]This text is blue[/blue]

[green]

[green]This text is green[/green]

Nesting

Styles can be nested. Inner styles override outer styles:

[tts]
  [blue]Blue text with [red]red word[/red] inside[/blue]
[/tts]
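Nested styles can be resolved by merging the enclosing tags from outermost to innermost, so inner values overwrite outer ones. This document does not give the exact colors behind [red], [blue], and [green], so the hex values below are placeholders, and the names are hypothetical.

```python
# Placeholder colors for the shortcut tags (not specified by this document).
SHORTCUTS = {"red": {"color": "#FF0000"}, "blue": {"color": "#0000FF"}, "green": {"color": "#00FF00"}}

def effective_style(stack):
    """stack lists (tag, params) pairs from outermost to innermost."""
    style = {}
    for tag, params in stack:
        # Shortcut tags contribute their preset color; [style] contributes its params.
        style.update(SHORTCUTS.get(tag, params))
    return style
```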

Effect Tags

Effect tags add animated camera movements to video/image content. They can be used in two ways:

  1. Wrapping containers - Wrap other elements (TTS, images) and inherit their duration
  2. Inline within TTS - Positioned by word timing for precise synchronization (see Inline Effect Timing)

[zoom]

Ken Burns-style zoom effect. Creates a smooth zoom-in or zoom-out animation.

Syntax

[zoom from="100%" to="120%" easing="ease-out" focus-x="50%" focus-y="50%"]
  ...children...
[/zoom]

Parameters

Parameter  Type        Default   Description
from       percentage  "100%"    Starting zoom level
to         percentage  "120%"    Ending zoom level
easing     enum        "linear"  Animation curve (see below)
focus-x    percentage  "50%"     Horizontal focus point
focus-y    percentage  "50%"     Vertical focus point
id         string      auto      Element ID

Easing Modes

Value        Behavior
linear       Constant speed (default)
ease-in      Start slow, accelerate
ease-out     Start fast, decelerate
ease-in-out  Slow at both ends
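The sketch below shows how the easing modes could shape a zoom over time. The document names the curves but not their formulas, so the quadratic functions are an assumption. Zoom levels are written as factors, where 1.0 means 100%.

```python
# Assumed quadratic easing curves; each maps progress 0..1 to eased progress 0..1.
EASINGS = {
    "linear": lambda t: t,
    "ease-in": lambda t: t * t,                    # start slow, accelerate
    "ease-out": lambda t: 1 - (1 - t) * (1 - t),   # start fast, decelerate
    "ease-in-out": lambda t: 2 * t * t if t < 0.5 else 1 - 2 * (1 - t) * (1 - t),
}

def zoom_at(t, duration, zoom_from=1.0, zoom_to=1.2, easing="linear"):
    """Zoom factor at time t (seconds) into an effect lasting `duration` seconds."""
    progress = min(max(t / duration, 0.0), 1.0)    # clamp to the effect's span
    return zoom_from + (zoom_to - zoom_from) * EASINGS[easing](progress)
```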

Examples

Subtle zoom in on image:

[image file="photo.png"]
  [zoom from="100%" to="110%"]
    [tts]Look at this beautiful scenery[/tts]
  [/zoom]
[/image]

Zoom out with focus on top-left:

[zoom from="150%" to="100%" focus-x="25%" focus-y="25%" easing="ease-out"]
  [tts]Revealing the full picture[/tts]
[/zoom]

Inline zoom for emphasis:

[tts]
  This is normal, but [zoom from="100%" to="120%"]this part is important[/zoom] and back to normal.
[/tts]

[pan]

Horizontal camera movement. Scales up content 1.5x to provide room for movement.

Syntax

[pan from="left" to="right" easing="linear"]
  ...children...
[/pan]

Parameters

Parameter  Type      Default   Description
from       position  "0%"      Starting horizontal position
to         position  "100%"    Ending horizontal position
easing     enum      "linear"  Animation curve
id         string    auto      Element ID

Position Values

Keyword  Percentage
left     0%
center   50%
right    100%

Or use any percentage value like "25%", "75%", etc.
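The pan arithmetic can be sketched under one assumption. A position of 0% aligns the left edge of the 1.5x-scaled content with the frame, and 100% aligns its right edge. The camera can therefore travel half a frame width. The sketch uses linear easing only, and the names are hypothetical.

```python
SCALE = 1.5  # content scale factor stated for [pan]
KEYWORDS = {"left": 0.0, "center": 0.5, "right": 1.0}

def to_fraction(pos: str) -> float:
    """Keyword or percentage string -> fraction of the available travel."""
    return KEYWORDS[pos] if pos in KEYWORDS else float(pos.rstrip("%")) / 100.0

def pan_offset(t, duration, frame_w, pos_from="left", pos_to="right"):
    """Horizontal crop offset (pixels into the scaled content) at time t, linear easing."""
    travel = frame_w * (SCALE - 1.0)            # spare width created by the 1.5x scale
    progress = min(max(t / duration, 0.0), 1.0)
    a, b = to_fraction(pos_from), to_fraction(pos_to)
    return travel * (a + (b - a) * progress)
```

Under the same assumption, [tilt] would use identical arithmetic with the frame height and the top/center/bottom keywords.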

Examples

Pan from left to right:

[pan from="left" to="right"]
  [tts]Panning across the landscape[/tts]
[/pan]

Pan from center to right with easing:

[pan from="center" to="right" easing="ease-in-out"]
  [tts]Following the action[/tts]
[/pan]

[tilt]

Vertical camera movement. Scales up content 1.5x to provide room for movement.

Syntax

[tilt from="top" to="bottom" easing="linear"]
  ...children...
[/tilt]

Parameters

Parameter  Type      Default   Description
from       position  "0%"      Starting vertical position
to         position  "100%"    Ending vertical position
easing     enum      "linear"  Animation curve
id         string    auto      Element ID

Position Values

Keyword  Percentage
top      0%
center   50%
bottom   100%

Examples

Tilt from top to bottom:

[tilt from="top" to="bottom"]
  [tts]Scanning down the building[/tts]
[/tilt]

Tilt up with ease-out:

[tilt from="bottom" to="top" easing="ease-out"]
  [tts]Looking up at the sky[/tts]
[/tilt]

[shake]

Camera shake effect for impact, tension, or emphasis. Uses oscillating motion with optional decay.

Syntax

Wrapping form:

[shake intensity="5" frequency="30" decay="true"]
  ...children...
[/shake]

Self-closing form:

[shake intensity="10" frequency="20" duration="0.5s"/]

Parameters

Parameter  Type     Default  Description
intensity  number   5.0      Shake magnitude in pixels
frequency  number   30.0     Oscillation frequency (Hz)
decay      boolean  "true"   Shake diminishes over time
duration   time     0.5s     Duration (self-closing only)
id         string   auto     Element ID
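A shake can be modeled as an oscillation whose amplitude fades to zero when decay is on. Every detail of this sketch is an assumption rather than the renderer's formula: the sine/cosine pair (which moves the frame in a small circle), the linear fade, and the `shake_offset` name.

```python
import math

def shake_offset(t, duration, intensity=5.0, frequency=30.0, decay=True):
    """(dx, dy) pixel offset at time t seconds into a shake lasting `duration` seconds."""
    if t < 0 or t > duration:
        return (0.0, 0.0)                       # outside the effect: no movement
    # With decay, amplitude falls linearly from `intensity` to 0 over the duration.
    amplitude = intensity * (1.0 - t / duration if decay else 1.0)
    phase = 2 * math.pi * frequency * t         # `frequency` oscillations per second
    return (amplitude * math.sin(phase), amplitude * math.cos(phase))
```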

Examples

Self-closing shake during impact (inline in TTS):

[tts]
  And then [shake intensity="8" duration="0.3s"/] the explosion hit!
[/tts]

The shake starts exactly when "the" is spoken and lasts 0.3 seconds.

Wrapping shake timed to words (inline in TTS):

[tts]
  Stay calm during [shake intensity="3"]the earthquake[/shake] warning.
[/tts]

The shake spans from when "the" is spoken until "earthquake" finishes.

Wrapping shake around TTS:

[shake intensity="6" frequency="25" decay="true"]
  [tts]The earthquake rumbled beneath us[/tts]
[/shake]

Intense persistent shake:

[shake intensity="12" frequency="40" decay="false"]
  [tts]The machine vibrated violently[/tts]
[/shake]

Nesting Effects

Effects can be combined by nesting them:

[zoom from="100%" to="115%" easing="ease-in"]
  [pan from="left" to="center"]
    [tts]A dramatic reveal with zoom and pan[/tts]
  [/pan]
[/zoom]

Effects are applied from the inside out: the innermost effect is applied first, and outer effects then transform the result.
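The inside-out order amounts to applying the effect stack in the reverse of the order the tags are written. In this toy sketch, effects are plain functions and a frame is just a label string, so the result shows the nesting directly. All names are illustrative.

```python
def zoom(frame):
    return f"zoom({frame})"

def pan(frame):
    return f"pan({frame})"

def apply_effects(frame, stack):
    """`stack` lists effects outermost first, in the order the tags are written."""
    for effect in reversed(stack):  # innermost effect is applied first
        frame = effect(frame)
    return frame

# [zoom][pan]...[/pan][/zoom]: pan runs first, then zoom transforms its output
print(apply_effects("frame", [zoom, pan]))  # zoom(pan(frame))
```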


Timing System

The timing system is content-driven, not timestamp-driven.

Core Principle

Speech drives timing. Media reacts to content, not seconds.

Sequential Processing

Elements inside a container are processed sequentially, each starting when the previous one ends (the durations below are illustrative):

[video file="bg.mp4"]
  [tts]First narration[/tts]      # 0.0s - 2.5s
  [tts]Second narration[/tts]     # 2.5s - 5.0s
  [tts]Third narration[/tts]      # 5.0s - 8.0s
[/video]
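Sequential layout is a running sum of child durations. In this sketch the durations are passed in directly. In practice they would come from the generated speech or from a [pause] duration. `schedule` is a hypothetical name.

```python
def schedule(durations, start=0.0):
    """Return (start, end) pairs for children laid out back to back."""
    spans = []
    t = start
    for d in durations:
        spans.append((t, t + d))  # child occupies [t, t + d)
        t += d                    # next child starts where this one ends
    return spans
```

With size="content", the container's duration would be the end of the last span.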

Inline Media Timing

Media inside TTS is positioned by word location, not timestamps:

[tts]
  Hello [image file="wave.png" duration="1"/] world!
[/tts]

The image appears when "world" is spoken, calculated from TTS word timings.

Inline Effect Timing

Effects can also be placed inside TTS blocks to get word-level timing:

Self-closing effects (like shake with duration) start at their text position:

[tts]
  And then [shake intensity="5" duration="0.3s"/] the explosion hit!
[/tts]

The shake starts exactly when "the" is spoken, because the tag sits between "then" and "the".

Wrapping effects span the duration of the wrapped text:

[tts]
  Hello world, [zoom from="100%" to="120%"]this is important[/zoom] okay bye
[/tts]

The zoom effect starts when "this" is spoken and ends when "important" finishes.
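Both inline rules can be expressed against per-word timings from the TTS engine. This sketch makes three assumptions: the `(word, start, end)` tuple format, tag positions given as word indices, and the behavior for a tag at the very end of the text.

```python
def self_closing_start(words, index):
    """A self-closing tag placed before words[index] starts when that word starts."""
    if index >= len(words):      # tag after the last word: assume it fires at the end
        return words[-1][2]
    return words[index][1]

def wrapping_span(words, first, last):
    """A wrapping tag spans from the start of its first word to the end of its last."""
    return (words[first][1], words[last][2])
```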

This allows precise synchronization of camera effects with specific words:

[tts]
  Stay calm during [shake intensity="3"]the earthquake[/shake] warning.
  Now [pan from="left" to="right"]look across the horizon[/pan] slowly.
[/tts]

Time Anchors

Reference other elements' timing using anchors:

element_id:start       # When element starts
element_id:end         # When element ends
element_id:start+2s    # 2 seconds after element starts
element_id:end-0.5s    # 0.5 seconds before element ends
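The sketch below resolves these anchor expressions against known element times. The `timeline` mapping and the function name are hypothetical. Accepting ms offsets alongside s is an assumption based on the general time syntax.

```python
import re

# id:start or id:end, optionally followed by +N or -N with an s or ms unit
ANCHOR_RE = re.compile(r"([\w-]+):(start|end)(?:([+-])(\d+(?:\.\d+)?)(ms|s))?")

def resolve_anchor(anchor, timeline):
    """timeline maps element ids to (start, end) in seconds."""
    m = ANCHOR_RE.fullmatch(anchor.strip())
    if not m:
        raise ValueError(f"invalid anchor: {anchor!r}")
    element_id, edge, sign, amount, unit = m.groups()
    start, end = timeline[element_id]
    t = start if edge == "start" else end
    if sign:
        offset = float(amount) / (1000.0 if unit == "ms" else 1.0)
        t = t + offset if sign == "+" else t - offset
    return t
```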

Example

[tts id="narration"]The main content[/tts]
[music file="outro.mp3" start="narration:end-1s"]
  # Music starts 1 second before narration ends
[/music]

Container Duration Modes

size="content" (Default)

Container duration = sum of children durations

[video file="bg.mp4" size="content"]
  [tts]3 seconds of speech[/tts]  # Video is 3 seconds
[/video]

size="natural"

Container uses its natural file duration

[video file="intro.mp4" size="natural"]
  # Plays the file's full length (e.g. 10 seconds) regardless of children
[/video]

Fit Modes

When container duration differs from file duration:

Mode  Behavior
none  No adjustment (may have black frames or cut off)
trim  Cut file to fit duration
loop  Repeat file to fill duration
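The three modes can be sketched as a list of source segments played back to back. Here `offset` stands for the from= parameter. Looping back to that offset, rather than to the start of the file, is an assumption, as is the function name.

```python
def fit_segments(mode, source_len, target, offset=0.0):
    """Return (source_start, source_end) pieces that fill a `target`-second container."""
    available = source_len - offset          # playable length after the from= offset
    if available <= 0:
        raise ValueError("offset is past the end of the file")
    if mode == "none":
        return [(offset, source_len)]        # plays as-is; may run short or long
    if mode == "trim":
        return [(offset, offset + min(available, target))]
    if mode == "loop":
        pieces, remaining = [], target
        while remaining > 1e-9:              # tolerance guards against float residue
            length = min(available, remaining)
            pieces.append((offset, offset + length))
            remaining -= length
        return pieces
    raise ValueError(f"unknown fit mode: {mode!r}")
```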

Examples

Basic Video with Narration

[video file="nature.mp4" size="content" fit="loop"]
  [tts engine="kokoro_tts" voice="af_heart"]
    Welcome to our nature documentary.
    Today we explore the forest.
  [/tts]
[/video]

Styled Subtitles

[video file="presentation.mp4"]
  [tts sub-position="bottom" sub-align="center"]
    The answer is [red]42[/red].
    This is [style color="#FFD700" bold="true"]golden[/style] knowledge.
  [/tts]
[/video]

Multiple TTS with Sound Effects

[video file="action.mp4"]
  [tts voice="narrator"]
    The hero approaches the door.
    [sound file="footsteps.wav"/]
  [/tts]

  [tts voice="hero"]
    [sound file="door_knock.wav"/]
    Is anyone there?
  [/tts]
[/video]

Background Music with Narration

[video file="intro.mp4"]
  [music file="ambient.mp3" fit="loop"]
    [tts]
      In the beginning, there was silence.
    [/tts]
    [tts]
      Then came the music.
    [/tts]
  [/music]
[/video]

Image Overlay During Speech

[video file="lecture.mp4"]
  [tts]
    Let me show you a diagram.
    [image file="diagram.png" duration="5"/]
    As you can see, the process flows left to right.
  [/tts]
[/video]

Anchored Timing

[video file="scene.mp4"]
  [tts id="main"]
    The main story content goes here.
    It could be quite long.
  [/tts]

  [music file="credits.mp3" start="main:end-2s" fit="trim"]
    [tts]
      Thanks for watching.
    [/tts]
  [/music]
[/video]

Layer System

The renderer organizes elements into layers:

Layer    Elements                            Behavior
video    VideoTag                            Base visual layer
image    ImageTag                            Overlaid on video
effect   ZoomTag, PanTag, TiltTag, ShakeTag  Visual effects applied to video/image
primary  TTSTag                              Main audio + subtitles
music    MusicTag                            Background audio
sfx      SoundTag                            Sound effects

Audio layers are mixed together. Video/image layers are composited. Effects are applied as FFmpeg filters.
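The sketch below illustrates audio mixing with FFmpeg filters by assembling a -filter_complex string from the standard volume, adelay, and amix filters. The input order (narration, then music, then sound effects) and the function name are assumptions. This is not the renderer's actual filter graph.

```python
def audio_mix_filter(music_volume, sfx_times):
    """Build a filter graph: input 0 = narration, 1 = music, 2.. = sound effects.

    sfx_times gives each sound effect's start time in seconds.
    """
    parts = [f"[1:a]volume={music_volume}[music]"]  # scale the music level
    labels = ["[0:a]", "[music]"]
    for i, t in enumerate(sfx_times):
        ms = round(t * 1000)
        # adelay takes milliseconds per channel; "|" separates the two stereo channels
        parts.append(f"[{i + 2}:a]adelay={ms}|{ms}[sfx{i}]")
        labels.append(f"[sfx{i}]")
    parts.append(f"{''.join(labels)}amix=inputs={len(labels)}:duration=longest[out]")
    return ";".join(parts)
```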


Not Yet Implemented

The following features from the design spec are not yet implemented:

  • [meta] - Global project settings
  • [scene] - Scene grouping
  • [bg] - Background (use [video] instead)
  • [emotion] - Emotional TTS modifiers
  • [filter] - Visual filters
  • [fade] - Fade transitions
  • [transition] - Scene transitions
  • fade-in / fade-out parameters

Asset Library

These built-in assets can be referenced directly by file path. The full library is available in the video and music selectors on the create page.

Tip: right-click any asset in the selector

Right-clicking a video clip or music track gives you quick options to copy its file path (paste straight into your script), copy a direct link to the file, and copy the attribution text if the asset requires credit.

Video Clips

Category        File path                             Duration (m:ss)
Minecraft       video/minecraft/BXUA2FncVPI.mp4       10:30
Subway Surfers  video/subway_surfers/i0M4ARe9v0Y.mp4  5:11
Satisfying      video/satisfying/JvI-02Q69ms.mp4      3:00
CS Surfing      video/cs_surfing/kuPPZCtLX4w.mp4      4:48

Browse all clips in the video selector on the create page.

Music Tracks

Name                              Artist         File path
Sneaky Snitch                     Kevin MacLeod  music/Sneaky_Snitch.opus
Scheming Weasel (faster version)  Kevin MacLeod  music/Scheming_Weasel_(faster_version).opus
Monkeys Spinning Monkeys          Kevin MacLeod  music/Monkeys_Spinning_Monkeys.opus
Fluffing a Duck                   Kevin MacLeod  music/Fluffing_a_Duck.opus

Browse all tracks in the music selector on the create page.

How to use assets

Video: [video file="video/minecraft/BXUA2FncVPI.mp4" size="content" fit="loop"]

Music: [music file="music/Sneaky_Snitch.opus" volume="0.3" fit="loop"]
