Person Identification in TV Series (CVPR 2012)

At CeBIT 2013

Alongside other face analysis demos, we presented a demonstration of the person identification method at CeBIT 2013. Here is some press coverage: [English] [German]

Paper and Poster

"Knock! Knock! Who is it?" Probabilistic Person Identification in TV series
Makarand Tapaswi, Martin Baeuml and Rainer Stiefelhagen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), poster, Providence, RI, June 2012
[paper] [poster-1] [poster-2]

Erratum: The precision and recall labels in Figure 7 are swapped.


  • Shift from face tracks to full person tracks, so that people can also be identified when their face is not visible
  • Automatically learn clothing models using face recognition results
  • Leverage the temporal structure of the episodes
  • Model the problem using a Markov Random Field
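To make the last two points concrete, here is a toy example of exact MAP inference in a chain-structured MRF over a sequence of tracks. Each track has a unary cost per identity (for instance, derived from face recognition scores), and a Potts pairwise term adds a fixed penalty whenever consecutive tracks receive different identities. This is a generic textbook sketch under those assumptions, not the model from the paper, whose graph structure and potentials are described in the paper itself; the function name `chain_map_labels` and the example costs are hypothetical.

```python
from typing import List, Sequence


def chain_map_labels(unary: Sequence[Sequence[float]], switch_cost: float) -> List[int]:
    """Exactly minimise sum_t unary[t][l_t] + switch_cost * [l_t != l_{t-1}].

    Toy chain MRF with a Potts pairwise term (hypothetical, for illustration).
    Solved by dynamic programming (Viterbi / min-sum); requires switch_cost >= 0.
    """
    n = len(unary)
    if n == 0:
        return []
    k = len(unary[0])
    cost = list(unary[0])          # best total cost of each label at track 0
    back: List[List[int]] = []     # back[t-1][l]: best previous label for label l at track t
    for t in range(1, n):
        best_prev = min(range(k), key=lambda j: cost[j])
        new_cost, ptr = [], []
        for label in range(k):
            stay = cost[label]                       # keep the same identity
            switch = cost[best_prev] + switch_cost   # change identity, pay penalty
            if stay <= switch:
                new_cost.append(unary[t][label] + stay)
                ptr.append(label)
            else:
                new_cost.append(unary[t][label] + switch)
                ptr.append(best_prev)
        back.append(ptr)
        cost = new_cost
    # Backtrack from the cheapest final label.
    label = min(range(k), key=lambda j: cost[j])
    labels = [label]
    for ptr in reversed(back):
        label = ptr[label]
        labels.append(label)
    labels.reverse()
    return labels


if __name__ == "__main__":
    # Middle track: the (hypothetical) face classifier slightly prefers identity 1.
    unary = [[0.1, 2.0], [1.2, 1.0], [0.2, 2.0]]
    print(chain_map_labels(unary, switch_cost=0.0))  # [0, 1, 0]
    print(chain_map_labels(unary, switch_cost=1.0))  # [0, 0, 0]
```

With no pairwise penalty each track simply takes its cheapest identity; with a penalty, the weak evidence on the middle track is overridden by its neighbours, which is the kind of temporal smoothing an MRF makes possible.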


Download code release v0.1. It contains the core code needed to understand the method, but it is not yet a fully reproducible package.

Supplementary material video

Download original supplementary material (avi + README, ~20MB).

Disclaimer: This video clip is presented for academic, non-profit purposes to demonstrate our person identification methods. Copyrights are held by original content creators, producers or country-specific copyright holders.


UPDATE (20.06.2013): An updated version of the data set is available. Check it out here! It contains face tracks with features and speaker identities assigned to them, and adds six episodes of Buffy the Vampire Slayer (Season 5, Episodes 1 to 6). That release does not focus on person tracks, so please continue to use the person tracks provided below.

The Big Bang Theory (Season 1, Episodes 1 to 6). The Season 1 DVD is available from most retailers (Amazon US, DE). Please note that the following data contains only the annotations, not the audio-visual content itself.

This data was used in our CVPR 2012 paper; please cite the paper if you use the data.

PRACTICAL NOTE: The bounding boxes, timestamps, etc. were obtained from Region 2 (PAL) DVDs, whose video frames are stored at 720x576 pixels and displayed at 1024x576 (anamorphic 16:9). All bounding boxes are given in the 1024x576 display resolution. The frame rate is 25 fps.
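Since the boxes are in display coordinates, they have to be rescaled horizontally before cropping from decoded 720x576 frames. The sketch below shows that mapping plus a frame-to-seconds conversion at 25 fps. It assumes a hypothetical box layout of top-left corner plus width and height (the actual field layout is declared in each file's header) and assumes frame 0 starts at t = 0; the provided timestamps should be preferred where available.

```python
from dataclasses import dataclass

# Values stated in the practical note (Region 2 PAL DVDs).
STORED_W, STORED_H = 720, 576     # pixels stored in each decoded frame
DISPLAY_W, DISPLAY_H = 1024, 576  # resolution used by the annotations
FPS = 25.0


@dataclass
class Box:
    # Hypothetical layout: top-left corner (x, y) plus width and height.
    x: float
    y: float
    w: float
    h: float


def display_to_stored(box: Box) -> Box:
    """Map a box from 1024x576 display coordinates to 720x576 stored pixels.

    Only the horizontal axis changes (scale 720/1024 = 0.703125);
    the vertical scale is 576/576 = 1.
    """
    sx = STORED_W / DISPLAY_W
    sy = STORED_H / DISPLAY_H
    return Box(box.x * sx, box.y * sy, box.w * sx, box.h * sy)


def frame_to_seconds(frame: int) -> float:
    """Convert a frame index to seconds, assuming frame 0 starts at t = 0."""
    return frame / FPS


if __name__ == "__main__":
    print(display_to_stored(Box(512.0, 100.0, 128.0, 200.0)))
    # Box(x=360.0, y=100.0, w=90.0, h=200.0)
    print(frame_to_seconds(250))  # 10.0
```

The scale factor 0.703125 is exactly representable in binary floating point, so the example values convert without rounding error.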

  • Video Events videvents.tar.gz
    Contains automatically detected video events: shots, special sequences, the title song, and credits.
    Format: start_frame, start_time, TYPE, [end_frame,] [end_time]
  • Face Tracks facetracks.tar.gz
    Contains face tracks
    Format: frame_number, timestamp, number_of_tracks, [track_information]
    The fields of [track_information] are declared in the file header
  • Person Tracks persontracks.tar.gz
    Contains person tracks
    Format: frame_number, timestamp, number_of_tracks, [track_information]
    The fields of [track_information] are declared in the file header
    Note that person tracks are provided only for every 10th frame, since that is how we used them.
  • Speaker Labels speakerid.tar.gz
    Contains speaker identity labels
    Praat-compatible format
    Includes a Matlab reader for the Praat format
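As a starting point for reading the video events and aligning with the sparsely sampled person tracks, here is a minimal Python sketch. It assumes that fields are separated by commas and/or whitespace, that TYPE is a single token, and that a line has either 3 or 5 fields; skipping lines that start with `#` is also an assumption. The names `VideoEvent`, `parse_event_line` and `nearest_annotated_frame`, and the TYPE strings in the examples, are hypothetical. Because the page does not state which frame the 10-frame grid starts at, the lookup searches the actual annotated frame numbers instead of rounding to a multiple of 10.

```python
import bisect
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VideoEvent:
    start_frame: int
    start_time: float
    kind: str                        # the TYPE field, kept verbatim
    end_frame: Optional[int] = None  # None when the optional fields are missing
    end_time: Optional[float] = None


def parse_event_line(line: str) -> Optional[VideoEvent]:
    """Parse one 'start_frame, start_time, TYPE, [end_frame,] [end_time]' line.

    Assumptions: fields are separated by commas and/or whitespace, TYPE is a
    single token, and blank lines or lines starting with '#' carry no event.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = [f for f in re.split(r"[,\s]+", line) if f]
    if len(fields) not in (3, 5):
        raise ValueError(f"unexpected number of fields: {line!r}")
    event = VideoEvent(int(fields[0]), float(fields[1]), fields[2])
    if len(fields) == 5:
        event.end_frame = int(fields[3])
        event.end_time = float(fields[4])
    return event


def nearest_annotated_frame(frames: Sequence[int], query: int) -> int:
    """Return the frame in `frames` (sorted ascending) closest to `query`.

    Useful for looking up a person-track box when only every 10th frame
    is annotated; ties go to the earlier frame.
    """
    if not frames:
        raise ValueError("no annotated frames")
    i = bisect.bisect_left(frames, query)
    if i == 0:
        return frames[0]
    if i == len(frames):
        return frames[-1]
    before, after = frames[i - 1], frames[i]
    return before if query - before <= after - query else after


if __name__ == "__main__":
    print(parse_event_line("100, 4.0, SHOT, 150, 6.0"))
    print(nearest_annotated_frame([0, 10, 20, 30], 16))  # 20
```

The parser raises on unexpected field counts rather than guessing, so any deviation from the assumed format shows up immediately.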


At a glimpse

An overview of all our multimedia-related projects can be found here. This is also the page linked from each of our papers.

Multimedia Resources

I hope to collect links in one place to make these excellent works easy to browse.

CV, MM Papers on the Web

Meta-resources listing papers: CVPapers (computer vision conferences), MM papers (ACM Multimedia), and MIR (Multimedia Information Retrieval).