Video Object Masks - Track objects and generate mask output

Track objects across a video and return mask data for downstream editing.

Upload MP4, MOV, or WebM. Duration-priced tools require uploaded files so the server can read video metadata.

Credits are calculated from the video metadata read by the server. The maximum video length is 600 seconds.

Estimated credits: 10 credits per 5 seconds · 120 credits per minute

Object mask output

Track objects through a video and return structured mask data.

Video Object Masks is for workflows that need more than a caption. Describe the object you want to track and the tool returns mask-oriented output that can support editing, segmentation, review, or downstream automation.

Max video length: 600s
Target selection: Prompt
Credits by duration: 10+

Features

1. Prompt-based targets

Describe the object or region to follow, such as a person, vehicle, product, or item in the frame.

2. Mask-oriented output

Review structured outputs and raw payloads for downstream video editing or automation.

3. Longer clip support

Designed for object mask jobs up to ten minutes, with credits scaling by seconds.

Workflow

01. Add a video

Upload a video file so its duration can be read automatically for credit calculation.

02. Describe targets

Write the objects or regions you want the tool to track.

03. Inspect output

Review generated mask data, URLs, and raw payloads from the result panel.
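
Before step 01, a script that prepares uploads could pre-check a file against the limits stated on this page (MP4, MOV, or WebM; at most 600 seconds). The following is a minimal sketch, not the tool's own validation: the function name `check_upload` is hypothetical, and the duration is assumed to be supplied by the caller, since the server reads the real value from the uploaded file's metadata.

```python
from pathlib import Path

# Limits taken from this page; the server's own checks remain authoritative.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".webm"}
MAX_DURATION_SECONDS = 600


def check_upload(filename: str, duration_seconds: float) -> list[str]:
    """Return a list of problems found; an empty list means the file looks acceptable.

    Hypothetical helper: it only inspects the file extension and a
    caller-supplied duration, it does not open or decode the video.
    """
    problems = []

    # Compare the extension case-insensitively (e.g. "clip.MP4" is fine).
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")

    if duration_seconds <= 0:
        problems.append("duration must be greater than zero")
    elif duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"duration {duration_seconds}s exceeds the {MAX_DURATION_SECONDS}s limit"
        )

    return problems
```

A check like this only saves a failed upload; it does not replace the server-side metadata read that determines credits.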

Use cases

Use video object masks for visual workflows that need tracked regions instead of plain captions.

Video segmentation, Object tracking, Editing masks, Dataset annotation, Product isolation, Automation pipelines

FAQ

Name the object or region clearly, such as person in red shirt, white car, logo, product box, or foreground subject.

This tool is focused on object mask-style output. The result panel also shows text or raw JSON when the processor returns it.

The current object mask page supports videos up to 600 seconds.

Credits are estimated at 2 credits per second with a minimum of 10 credits.
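
The pricing figures on this page can be turned into a small estimator. The sketch below is illustrative only: the function name `estimate_credits` is hypothetical, and rounding fractional credits up is an assumption, since the page does not say how partial seconds are billed. The "10 credits per 5 seconds" wording could also mean billing in 5-second blocks, which this sketch does not model.

```python
import math

CREDITS_PER_SECOND = 2       # from this page: 2 credits per second
MINIMUM_CREDITS = 10         # from this page: minimum of 10 credits
MAX_DURATION_SECONDS = 600   # from this page: videos up to 600 seconds


def estimate_credits(duration_seconds: float) -> int:
    """Estimate credits for a clip of the given duration in seconds.

    Hypothetical helper. Assumes a linear per-second rate with fractional
    credits rounded up; the tool's actual rounding is not documented here.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be greater than zero")
    if duration_seconds > MAX_DURATION_SECONDS:
        raise ValueError(f"duration exceeds the {MAX_DURATION_SECONDS}s limit")

    # Linear rate, rounded up to a whole credit (assumption).
    raw_credits = math.ceil(duration_seconds * CREDITS_PER_SECOND)

    # Apply the stated minimum charge.
    return max(MINIMUM_CREDITS, raw_credits)
```

Under these assumptions, a 60-second clip comes to 120 credits and a 3-second clip is charged the 10-credit minimum. The duration read by the server remains the authoritative input.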

Track objects in a video.

Describe the target object and generate mask-oriented output for your clip.