Datasets documentation

Load video data

You are viewing v3.2.0 version. A newer version v3.3.2 is available.

Video support is experimental and is subject to change.

Video datasets have Video type columns, which contain decord objects.

To work with video datasets, you need to have the vision dependency installed. Check out the installation guide to learn how to install it.

When you load a video dataset and access the video column, the videos are decoded as decord VideoReader objects:

>>> from datasets import load_dataset, Video

>>> dataset = load_dataset("path/to/video/folder", split="train")
>>> dataset[0]["video"]
<decord.video_reader.VideoReader at 0x1652284c0>

Index into a video dataset using the row index first and then the video column - dataset[0]["video"] - so that only that row's video is decoded. Indexing the column first decodes every video in the dataset, which can be slow if you have a large dataset.

For a guide on how to load any type of dataset, take a look at the general loading guide.

Read frames

Access frames directly from a video using the VideoReader:

>>> dataset[0]["video"][0].shape  # first frame
(240, 320, 3)

To get multiple frames at once, use get_batch. This is the efficient way to obtain a long list of frames:

>>> frames = dataset[0]["video"].get_batch([1, 3, 5, 7, 9])
>>> frames.shape
(5, 240, 320, 3)
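
When the frames you need are spread evenly across a clip, the index list passed to get_batch can be computed from the clip length. The sketch below is a minimal illustration rather than part of the library: evenly_spaced_indices is a hypothetical helper, and it assumes that len(video) returns the number of frames, as it does for a decord VideoReader.

```python
def evenly_spaced_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return `num_samples` frame indices spread evenly over `num_frames` frames.

    Hypothetical helper for illustration; not part of datasets or decord.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    # Never ask for more frames than the clip contains.
    num_samples = min(num_samples, num_frames)
    # Take the middle of each of `num_samples` equal-width segments.
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


# A 100-frame clip sampled at 5 points.
indices = evenly_spaced_indices(100, 5)
print(indices)
```

With a decoded video, the result could be passed on as video.get_batch(evenly_spaced_indices(len(video), 5)).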

Local files

You can load a dataset from video file paths. Use the cast_column() function to take a column of video file paths and decode it into decord videos with the Video feature:

>>> from datasets import Dataset, Video

>>> dataset = Dataset.from_dict({"video": ["path/to/video_1", "path/to/video_2", ..., "path/to/video_n"]}).cast_column("video", Video())
>>> dataset[0]["video"]
<decord.video_reader.VideoReader at 0x1657d0280>

If you only want to load the underlying path to each video file without decoding the video object, set decode=False in the Video feature:

>>> dataset = dataset.cast_column("video", Video(decode=False))
>>> dataset[0]["video"]
{'bytes': None,
 'path': 'path/to/video/folder/video0.mp4'}
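
With decode=False each row is a plain dictionary, so choosing what to hand to your own decoder is ordinary dictionary handling. A minimal sketch, assuming only the 'bytes' and 'path' keys shown above; video_source is a hypothetical helper, not a datasets function, and preferring embedded bytes over the path is a choice made for this example.

```python
import io
from typing import BinaryIO, Union


def video_source(item: dict) -> Union[str, BinaryIO]:
    """Return something a video decoder can open: a path or an in-memory file.

    `item` is assumed to have the {"bytes": ..., "path": ...} shape shown above.
    Hypothetical helper for illustration only.
    """
    if item.get("bytes") is not None:
        # This sketch prefers embedded bytes and wraps them in a file-like object.
        return io.BytesIO(item["bytes"])
    if item.get("path") is not None:
        return item["path"]
    raise ValueError("video item has neither bytes nor a path")


print(video_source({"bytes": None, "path": "path/to/video/folder/video0.mp4"}))
```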


VideoFolder

You can also load a dataset with a VideoFolder dataset builder, which does not require writing a custom dataloader. This makes VideoFolder ideal for quickly creating and loading video datasets with several thousand videos for different vision tasks. Your video dataset structure should look like this:
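
As an illustration of such a structure, the sketch below creates a small folder/split/label/file tree in a temporary directory. The split name train, the labels cat and dog, and the file names are hypothetical placeholders, and the files are empty stand-ins rather than playable videos.

```python
import tempfile
from pathlib import Path

# Hypothetical labels and file names, chosen only for illustration.
LAYOUT = {
    "cat": ["clip_0.mp4", "clip_1.mp4"],
    "dog": ["clip_0.mp4", "clip_1.mp4"],
}


def make_video_folder(root: Path, split: str = "train") -> list[str]:
    """Create <root>/<split>/<label>/<file> and return the relative paths."""
    created = []
    for label, names in LAYOUT.items():
        label_dir = root / split / label
        label_dir.mkdir(parents=True, exist_ok=True)
        for name in names:
            path = label_dir / name
            path.touch()  # empty placeholder, not a real video
            created.append(path.relative_to(root).as_posix())
    return sorted(created)


with tempfile.TemporaryDirectory() as tmp:
    paths = make_video_folder(Path(tmp))
    print("\n".join(paths))
```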



Load your dataset by specifying videofolder and the directory of your dataset in data_dir:

>>> from datasets import load_dataset

>>> dataset = load_dataset("videofolder", data_dir="/path/to/folder")
>>> dataset["train"][0]
{"video": <decord.video_reader.VideoReader at 0x161715e50>, "label": 0}

>>> dataset["train"][-1]
{"video": <decord.video_reader.VideoReader at 0x16170bd90>, "label": 1}

Load remote datasets from their URLs with the data_files parameter:

>>> dataset = load_dataset("videofolder", data_files="", split="train")

Some datasets have a metadata file (metadata.csv/metadata.jsonl) associated with them, containing other information about the data such as bounding boxes, text captions, and labels. The metadata is automatically loaded when you call load_dataset() and specify videofolder.
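
A metadata file is a plain table that ties extra fields to each video. The sketch below writes a small metadata.csv with Python's csv module. It assumes the folder-based builders' convention of a file_name column that links each row to a video file; the caption column and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows: `file_name` links a row to a video, `caption` is extra data.
ROWS = [
    {"file_name": "clip_0.mp4", "caption": "a cat jumping onto a sofa"},
    {"file_name": "clip_1.mp4", "caption": "a dog catching a ball"},
]


def write_metadata(folder: Path) -> Path:
    """Write metadata.csv into `folder` and return its path."""
    target = folder / "metadata.csv"
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file_name", "caption"])
        writer.writeheader()
        writer.writerows(ROWS)
    return target


with tempfile.TemporaryDirectory() as tmp:
    metadata_path = write_metadata(Path(tmp))
    # Read the file back to check that it round-trips.
    with metadata_path.open(newline="", encoding="utf-8") as handle:
        loaded = list(csv.DictReader(handle))
    print(loaded[0]["file_name"], "->", loaded[0]["caption"])
```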

To ignore the information in the metadata file, pass drop_metadata=True to load_dataset(). To have VideoFolder infer the label name from the directory name even when a metadata file is present, set drop_labels=False:

>>> from datasets import load_dataset

>>> dataset = load_dataset("videofolder", data_dir="/path/to/folder", drop_labels=False)

For more information about creating your own VideoFolder dataset, take a look at the Create a video dataset guide.


WebDataset

The WebDataset format is based on a folder of TAR archives and is suitable for big video datasets. Because of their size, WebDatasets are generally loaded in streaming mode (using streaming=True).

You can load a WebDataset like this:

>>> from datasets import load_dataset

>>> dataset = load_dataset("webdataset", data_dir="/path/to/folder", streaming=True)
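
To make the archive layout concrete, here is a minimal sketch that packs one shard with the standard tarfile module. It assumes the usual WebDataset convention that files sharing a base name (here 000.mp4 and 000.json) belong to the same sample; the shard name, keys, and payloads are hypothetical placeholders, not real video data.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path


def add_bytes(tar: tarfile.TarFile, name: str, payload: bytes) -> None:
    """Add an in-memory payload to the archive under `name`."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def write_shard(path: Path, samples: dict[str, tuple[bytes, dict]]) -> None:
    """Write each sample as <key>.mp4 plus <key>.json, stored next to each other."""
    with tarfile.open(path, "w") as tar:
        for key, (video_bytes, meta) in samples.items():
            add_bytes(tar, f"{key}.mp4", video_bytes)
            add_bytes(tar, f"{key}.json", json.dumps(meta).encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "shard-00000.tar"
    # Placeholder bytes stand in for encoded video.
    write_shard(shard, {"000": (b"\x00", {"label": 0}), "001": (b"\x01", {"label": 1})})
    with tarfile.open(shard) as tar:
        names = tar.getnames()
    print(names)
```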