Description
We address the task of segmenting presentation slides captured as live photos during lectures.
Slides are important document types used as visual components accompanying presentations in a variety of fields ranging from education to business.
However, the automatic analysis of presentation slides has not been researched sufficiently; so far, only preprocessed images of already digitized slide documents have been considered.
We introduce the task of analyzing unconstrained photos of slides taken during lectures and present WiSe, a novel dataset for Page Segmentation with slides captured in the Wild. Our dataset provides pixel-wise annotations of 25 classes on 1,300 pages and allows overlapping regions (i.e., multi-class assignments).
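Because regions may overlap, a single-label class map cannot represent the annotations. A minimal sketch of one common representation, a per-class binary mask stack in which a pixel may belong to several classes at once (the class indices and helper below are illustrative, not the official WiSe format):

```python
import numpy as np

NUM_CLASSES = 25  # number of region classes in WiSe

def masks_from_regions(regions, height, width):
    """Rasterize (class_id, binary_mask) pairs into a multi-label stack
    of shape (NUM_CLASSES, H, W); overlapping regions simply set
    multiple class channels for the same pixel."""
    stack = np.zeros((NUM_CLASSES, height, width), dtype=np.uint8)
    for class_id, mask in regions:
        stack[class_id] |= mask.astype(np.uint8)
    return stack

# Hypothetical example: a region of class 0 overlapping a region of class 3.
h, w = 4, 6
region_a = np.zeros((h, w), np.uint8); region_a[0:2, :] = 1
region_b = np.zeros((h, w), np.uint8); region_b[1:4, 2:6] = 1
stack = masks_from_regions([(0, region_a), (3, region_b)], h, w)
# Pixel (1, 3) lies in both regions, i.e. it has a multi-class assignment.
print(stack[:, 1, 3].nonzero()[0])  # → [0 3]
```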
To evaluate the performance, we define multiple benchmark metrics and baseline methods for our dataset.
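One plausible metric for such multi-label segmentation is a per-class intersection-over-union, scoring each class channel as an independent binary mask and averaging over classes present in the prediction or ground truth. This is only a sketch of that kind of metric; the exact benchmark definitions used for WiSe may differ:

```python
import numpy as np

def mean_iou(pred, gt, eps=1e-9):
    """Mean per-class IoU for multi-label masks.

    pred, gt: binary arrays of shape (num_classes, H, W). Classes that
    appear in neither prediction nor ground truth are skipped.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = (pred & gt).sum(axis=(1, 2))
    union = (pred | gt).sum(axis=(1, 2))
    present = union > 0  # ignore classes absent from both masks
    return float((inter[present] / (union[present] + eps)).mean())

# Hypothetical usage: a perfect prediction scores (close to) 1.0.
gt = np.zeros((25, 4, 4), np.uint8)
gt[0, 1:3, 1:3] = 1
print(round(mean_iou(gt, gt), 6))  # → 1.0
```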
We further implement two different deep neural network approaches previously used for segmenting natural images and adapt them to our task.
Our evaluation results demonstrate the effectiveness of the deep learning-based methods, surpassing the baseline methods by over 30%.
To foster further research of slide analysis in unconstrained photos, we make the WiSe dataset publicly available to the community.
If you use this dataset in your research, please cite:
Paper
WiSe - Slide Segmentation in the Wild
Monica Haurilet, Alina Roitberg, Manuel Martinez, Rainer Stiefelhagen
International Conference on Document Analysis and Recognition (ICDAR), 2019
[paper]
@inproceedings{haurilet2019wise,
  author    = {Monica Haurilet and Alina Roitberg and Manuel Martinez and Rainer Stiefelhagen},
  title     = {{WiSe - Slide Segmentation in the Wild}},
  booktitle = {International Conference on Document Analysis and Recognition (ICDAR)},
  year      = {2019},
  month     = {Sep.},
}