Filtering

Apply motion-based filtering to clips and aesthetic filtering to frames to prune low-quality assets during curation.

How it Works

Filtering runs in two passes that balance speed and quality:

  1. Motion pass (fast): The pipeline decodes lightweight motion vectors and computes motion scores to drop static or near‑static clips in the first filtering stage. This pass adds decoded_motion_data to each clip, then writes motion_score_global_mean and motion_score_per_patch_min_256. Clips that score below the thresholds move to video.filtered_clips, and video.clip_stats.num_filtered_by_motion increments.
  2. Aesthetic pass (model-based): Upstream of this pass, the pipeline extracts frames using the sequence policy at a chosen target_fps. The aesthetic stage reads extracted_frames[sequence-<target_fps>], writes an aesthetic_score for each clip, and removes clips that score below the threshold. Removed clips move to video.filtered_clips, and video.clip_stats.num_filtered_by_aesthetic increments.
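
The two passes can be sketched in plain Python to show the bookkeeping. This is a minimal illustration, not the library's implementation: the Clip and Video classes are hypothetical stand-ins, the counters are flattened onto Video rather than nested under clip_stats, and the rule that failing either motion threshold drops a clip is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clip:
    # Hypothetical stand-in for a clip record, not the library's class.
    name: str
    motion_score_global_mean: float
    motion_score_per_patch_min_256: float
    aesthetic_score: Optional[float] = None


@dataclass
class Video:
    # Hypothetical stand-in that mirrors the fields named above.
    clips: List[Clip] = field(default_factory=list)
    filtered_clips: List[Clip] = field(default_factory=list)
    num_filtered_by_motion: int = 0
    num_filtered_by_aesthetic: int = 0


def motion_pass(video: Video, global_mean_threshold: float,
                per_patch_min_256_threshold: float) -> None:
    kept = []
    for clip in video.clips:
        # Assumption: falling below either threshold drops the clip.
        if (clip.motion_score_global_mean < global_mean_threshold
                or clip.motion_score_per_patch_min_256 < per_patch_min_256_threshold):
            video.filtered_clips.append(clip)
            video.num_filtered_by_motion += 1
        else:
            kept.append(clip)
    video.clips = kept


def aesthetic_pass(video: Video, score_threshold: float) -> None:
    kept = []
    for clip in video.clips:
        # Only clips that survived the motion pass reach this check.
        if clip.aesthetic_score is not None and clip.aesthetic_score < score_threshold:
            video.filtered_clips.append(clip)
            video.num_filtered_by_aesthetic += 1
        else:
            kept.append(clip)
    video.clips = kept


video = Video(clips=[
    Clip("static", 0.0001, 0.0, aesthetic_score=4.2),
    Clip("dull", 0.01, 0.001, aesthetic_score=2.0),
    Clip("good", 0.02, 0.002, aesthetic_score=4.5),
])
motion_pass(video, global_mean_threshold=0.00098, per_patch_min_256_threshold=0.000001)
aesthetic_pass(video, score_threshold=3.5)
print([c.name for c in video.clips])           # clips that passed both filters
print([c.name for c in video.filtered_clips])  # clips removed by either pass
```

Running the cheap motion check first means the model-based aesthetic check only sees clips that already showed enough motion.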

Before You Start

Motion decoding and aesthetic scoring operate on clip buffers. You must run clipping and encoding first so each clip has a valid buffer (bytes).
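
A quick pre-flight check can catch clips that reached filtering without a buffer. The sketch below is illustrative only: it assumes clips are available as a mapping from a clip ID to its buffer, which is a hypothetical shape and not the library's data model.

```python
def clips_without_buffer(clips):
    """Return the IDs of clips whose buffer is missing, not bytes, or empty."""
    missing = []
    for clip_id, buffer in clips.items():
        if not isinstance(buffer, (bytes, bytearray)) or len(buffer) == 0:
            missing.append(clip_id)
    return missing


# Hypothetical clips keyed by ID; real clips come from the clipping and encoding stages.
clips = {"clip-0": b"\x00\x01\x02", "clip-1": None, "clip-2": b""}
missing = clips_without_buffer(clips)
print(missing)
```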


Quickstart

Use the pipeline stages or the example script flags to enable motion and aesthetic filtering.

Filtering Options

Motion Filtering

Motion filtering is a two‑step process: first decode motion vectors, then filter clips based on motion scores.

  1. Add MotionVectorDecodeStage to sample motion vectors from each clip.

    from nemo_curator.stages.video.filtering.motion_filter import MotionVectorDecodeStage
    
    decode = MotionVectorDecodeStage(
        target_fps=2.0,
        target_duration_ratio=0.5,
        num_cpus_per_worker=4.0,
    )

    This step adds decoded_motion_data to each clip, or records an error in clip.errors.

  2. Add MotionFilterStage to compute motion scores and filter out low‑motion clips.

    from nemo_curator.stages.video.filtering.motion_filter import MotionFilterStage
    
    motion = MotionFilterStage(
        score_only=False,
        global_mean_threshold=0.00098,
        per_patch_min_256_threshold=0.000001,
        motion_filter_batch_size=64,
        num_gpus_per_worker=0.5,
        verbose=True,
    )

    This step:

    • Adds motion_score_global_mean and motion_score_per_patch_min_256 to each clip.
    • Moves filtered clips to video.filtered_clips and increments video.clip_stats.num_filtered_by_motion.
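
To see why a clip is scored both globally and per patch, consider one plausible reading of the two score names: a mean over the whole frame, and the minimum of the means over fixed-size patches. The sketch below follows that reading on a tiny grid of motion magnitudes. It is an assumption for illustration, not the stage's actual computation, and motion_scores and its patch argument are hypothetical names.

```python
def motion_scores(magnitudes, patch):
    """Return (global mean, minimum per-patch mean) for a 2D grid of motion magnitudes."""
    rows, cols = len(magnitudes), len(magnitudes[0])
    global_mean = sum(sum(row) for row in magnitudes) / (rows * cols)

    patch_means = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            # Collect the cells of one patch, clipping at the grid edges.
            cells = [
                magnitudes[i][j]
                for i in range(r, min(r + patch, rows))
                for j in range(c, min(c + patch, cols))
            ]
            patch_means.append(sum(cells) / len(cells))
    return global_mean, min(patch_means)


# Motion is confined to the top-left corner; the rest of the frame is static.
grid = [
    [4, 4, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
global_mean, per_patch_min = motion_scores(grid, patch=2)
print(global_mean, per_patch_min)
```

Under this reading, a clip with motion in one corner can pass a global threshold while its per-patch minimum stays at zero, which is the case a separate per-patch threshold is suited to catch.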

Parameters

Aesthetic Filtering

Aesthetic filtering works best when you prepare frames first, then score clips using a CLIP‑based aesthetic model.

  1. Extract frames earlier in the pipeline. Use a frame extraction stage with a sequence policy and set a target_fps that matches the aesthetic stage. Refer to Frame Extraction for guidance.

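    # Example: upstream frame extraction (pseudocode; the import path, class name,
    # and arguments are placeholders, so refer to Frame Extraction for the real stage).
    from nemo_curator.stages.video.frame_extraction import FrameExtractionStage

    # Use the sequence policy and the same target_fps as the aesthetic stage.
    frames = FrameExtractionStage(policy="sequence", target_fps=1.0)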

    Frame Requirements:

    • Use sequence frame extraction policy.
    • Match target_fps here and in the aesthetic stage.
    • Ensure clip.extracted_frames contains frames for the signature sequence-<target_fps>.

  2. Add ClipAestheticFilterStage to score each clip and drop clips below a threshold.

    from nemo_curator.stages.video.filtering.clip_aesthetic_filter import ClipAestheticFilterStage
    
    aesthetic = ClipAestheticFilterStage(
        model_dir="/models",
        score_threshold=3.5,
        reduction="min",  # or "mean"
        target_fps=1.0,
        num_gpus_per_worker=0.25,
        verbose=True,
    )

    This step:

    • Adds aesthetic_score to each clip.
    • Moves filtered clips to video.filtered_clips and increments video.clip_stats.num_filtered_by_aesthetic.
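
The choice of reduction changes which clips survive. The sketch below shows the difference on hypothetical frame-level scores. reduce_scores is an illustrative helper, not part of the library, and it assumes the clip score is a plain minimum or mean of the per-frame scores.

```python
def reduce_scores(frame_scores, reduction="min"):
    """Collapse per-frame aesthetic scores into one clip-level score."""
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    if reduction == "min":
        return min(frame_scores)
    if reduction == "mean":
        return sum(frame_scores) / len(frame_scores)
    raise ValueError(f"unsupported reduction: {reduction!r}")


# Hypothetical per-frame scores for one clip with a single weak frame.
frame_scores = [4.0, 3.0, 5.0]
score_threshold = 3.5

min_score = reduce_scores(frame_scores, "min")
mean_score = reduce_scores(frame_scores, "mean")
print(min_score, min_score >= score_threshold)    # strict: one weak frame fails the clip
print(mean_score, mean_score >= score_threshold)  # lenient: the average carries the clip
```

With "min", a single low-scoring frame is enough to drop the clip; with "mean", stronger frames can offset it.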

Parameters

Parameter            Type             Default                  Description
model_dir            str              "models/clip_aesthetic"  Directory for model weights; downloaded on each node if missing.
score_threshold      float            0.5                      Minimum aesthetic score required to keep a clip.
reduction            "mean" or "min"  "min"                    Aggregate frame‑level scores using the mean or the minimum.
target_fps           float            1.0                      Frame sampling rate; must match the rate used for the extracted frames.
num_gpus_per_worker  float            0.25                     GPUs reserved per worker for aesthetic scoring.
verbose              bool             False                    Log per‑clip aesthetic scores and decisions.