Long-Term Visual Object Tracking Benchmark

We propose a new long video dataset, Track Long and Prosper (TLP), and a benchmark for visual object tracking. The dataset consists of 50 HD videos from real-world scenarios, spanning over 400 minutes (676K frames). This makes it more than 20 times larger in average duration per sequence and more than 8 times larger in total covered duration than existing generic datasets for visual tracking. We benchmark 17 state-of-the-art trackers on the dataset and rank them by tracking accuracy and run-time speed. To the best of our knowledge, the TLP benchmark is the first large-scale evaluation of state-of-the-art trackers that focuses on long-duration tracking, and it makes a strong case for much-needed research in this direction.

Read paper


TLP V2 and TinyTLP V2 released!

Please visit this page for more information.

The TLP dataset consists of 50 long HD sequences (676,431 frames in total). Each sequence contains a single object to be tracked, marked in the first frame. TinyTLP is a challenging high-resolution short-term dataset for visual tracking, derived from TLP. It consists of the first 600 frames (20 sec) of each sequence of the TLP dataset. The length of 20 sec is chosen to match the average per-sequence length of the OTB dataset. We propose TinyTLP as a short-term counterpart to TLP, to compare against it and highlight the challenges of long-term tracking.

Annotation Format
Per-frame bounding box annotations are provided for the target object in each sequence, in the following format:
[frameID, xmin, ymin, width, height, isLost]

frameID - Frame that this annotation represents.
xmin - Top left x-coordinate of the bounding box.
ymin - Top left y-coordinate of the bounding box.
width - Width of annotation box.
height - Height of annotation box.
isLost - If 1, the target object is not visible at all; else 0.
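
The six-field format above can be read with a few lines of Python. This is a minimal sketch, not the authors' tooling: the names Annotation, parse_annotation_line and parse_annotations are hypothetical, and the separator is an assumption (the parser accepts commas and/or whitespace because the exact delimiter is not stated here).

```python
import re
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """One ground-truth row: [frameID, xmin, ymin, width, height, isLost]."""
    frame_id: int
    xmin: float
    ymin: float
    width: float
    height: float
    is_lost: bool


def parse_annotation_line(line: str) -> Annotation:
    # Assumption: fields are separated by commas and/or whitespace.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    frame_id, xmin, ymin, width, height, is_lost = fields
    return Annotation(
        frame_id=int(float(frame_id)),
        xmin=float(xmin),
        ymin=float(ymin),
        width=float(width),
        height=float(height),
        is_lost=int(float(is_lost)) == 1,  # 1 means the target is not visible
    )


def parse_annotations(text: str) -> List[Annotation]:
    # Blank lines are skipped; every other line must be a full annotation.
    return [parse_annotation_line(l) for l in text.splitlines() if l.strip()]
```

For example, parse_annotations(open("groundtruth_rect.txt").read()) would return one Annotation per frame, assuming the file follows the format described above.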

Directory Structure
For easy integration of trackers, each sequence follows the same directory structure as OTB. For example, the sequence CarChase1 has the following directory structure when downloaded from the link below and unzipped:

       CarChase1/
       ├── groundtruth_rect.txt
       └── img/
            ├── 00001.jpg
            ├── 00002.jpg
            ├── 00003.jpg
            ├── 00004.jpg

The annotation format and directory structure of TinyTLP sequences are exactly the same as those of the TLP dataset.
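
Given this layout, a sequence can be enumerated as sketched below. The function names list_sequence and first_n_frames are hypothetical, and the sketch assumes that frame files carry numeric, zero-padded names with a .jpg extension, as in the listing above. The second helper only illustrates how a TinyTLP-style prefix (the first 600 frames) relates to a full sequence; it is not how the released TinyTLP files were produced.

```python
from pathlib import Path
from typing import List, Tuple


def list_sequence(seq_dir) -> Tuple[List[Path], Path]:
    """Return (frame paths in temporal order, ground-truth path) for one sequence."""
    seq_dir = Path(seq_dir)
    gt_path = seq_dir / "groundtruth_rect.txt"
    if not gt_path.is_file():
        raise FileNotFoundError(gt_path)
    # Assumption: frame names are numeric (00001.jpg, 00002.jpg, ...),
    # so sorting by integer value gives temporal order.
    frames = sorted((seq_dir / "img").glob("*.jpg"), key=lambda p: int(p.stem))
    return frames, gt_path


def first_n_frames(frames: List[Path], n: int = 600) -> List[Path]:
    """Keep only the first n frames, mirroring the TinyTLP definition."""
    return frames[:n]
```

A call such as frames, gt = list_sequence("CarChase1") then gives the image paths and the annotation file for that sequence.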

Tracking Results
Results for all 17 evaluated trackers can be downloaded from the following link. Each tracker directory contains 50 space-separated text files in the format [xmin ymin width height], one for each sequence of the TLP dataset.
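
To compare a result file against the ground truth, a common choice is the intersection-over-union (IoU) of the two boxes and the fraction of frames whose IoU exceeds a threshold. The sketch below is an illustration under assumptions, not the benchmark's evaluation code: the function names and the 0.5 default threshold are ours, and frames marked isLost are not treated specially here.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, width, height)


def parse_result_line(line: str) -> Box:
    # Result files are space-separated: xmin ymin width height.
    x, y, w, h = (float(v) for v in line.split())
    return (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (xmin, ymin, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(pred: Sequence[Box], gt: Sequence[Box], threshold: float = 0.5) -> float:
    """Fraction of frames whose IoU with the ground truth exceeds threshold."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same length")
    if not gt:
        return 0.0
    hits = sum(1 for p, g in zip(pred, gt) if iou(p, g) > threshold)
    return hits / len(gt)
```

The ground-truth boxes passed to success_rate would be the xmin, ymin, width and height fields of each annotation row, in frame order.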


If you use any of our datasets or find our work useful in your research, please cite:

Moudgil, Abhinav, and Vineet Gandhi. "Long-Term Visual Object Tracking Benchmark." arXiv preprint arXiv:1712.01358 (2017).

@article{moudgil2017long,
  title={Long-Term Visual Object Tracking Benchmark},
  author={Moudgil, Abhinav and Gandhi, Vineet},
  journal={arXiv preprint arXiv:1712.01358},
  year={2017}
}


For more information or help, please get in touch with us via email.

Abhinav Moudgil

MS by Research, IIIT Hyderabad


Vineet Gandhi

Assistant Professor, IIIT Hyderabad


Creative Commons Licence
All datasets and benchmarks on this page are copyrighted by the authors and published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Users can share this work only if they (1) cite this work in the manner specified by the authors, (2) do not use this work for any commercial purposes, and (3) distribute any additions, transformations or changes to this work under the same license.