2009
|
 | Cees G M Snoek, Koen E A van de Sande, Ork de Rooij, Bouke Huurnink, Jasper R R Uijlings, Michiel van Liempt, Miguel Bugalho, Isabel Trancoso, Fei Yan, Muhammad A Tahir, Krystian Mikolajczyk, Josef Kittler, Maarten de Rijke, Jan-Mark Geusebroek, Theo Gevers, Marcel Worring, Dennis C Koelma, Arnold W M Smeulders: The MediaMill TRECVID 2009 Semantic Video Search Engine. In: TRECVID, Gaithersburg, USA, 2009. @inproceedings{SnoekTRECVID09,
title = {The MediaMill TRECVID 2009 Semantic Video Search Engine},
author = {Cees G M Snoek and Koen E A van de Sande and Ork de Rooij and Bouke Huurnink and Jasper R R Uijlings and Michiel van Liempt and Miguel Bugalho and Isabel Trancoso and Fei Yan and Muhammad A Tahir and Krystian Mikolajczyk and Josef Kittler and Maarten de Rijke and Jan-Mark Geusebroek and Theo Gevers and Marcel Worring and Dennis C Koelma and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/mediamill-TRECVID2009-final.pdf},
year = {2009},
date = {2009-11-01},
booktitle = {TRECVID},
address = {Gaithersburg, USA},
abstract = {In this paper we describe our TRECVID 2009 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. The starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of last year, which uses multiple color descriptors, codebooks with soft-assignment, and kernel-based supervised learning. We improve upon this baseline system by exploring two novel research directions. Firstly, we study a multi-modal extension by including 20 audio concepts and fusion using two novel multi-kernel supervised learning methods. Secondly, with the help of recently proposed algorithmic refinements of bag-of-words representations, a GPU implementation, and compute clusters, we scale up the amount of visual information analyzed by an order of magnitude, to a total of 1,000,000 i-frames. Our experiments evaluate the merit of these new components, ultimately leading to 64 robust concept detectors for video retrieval. For retrieval, a robust but limited set of concept detectors justifies the need to rely on as many auxiliary information channels as possible. For automatic search we therefore explore how we can learn to rank various information channels simultaneously to maximize video search results for a given topic. To further improve the video retrieval results, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and relevance feedback mechanisms that learn to solve complex search topics by analyzing user browsing behavior. The 2009 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the top ranking for both concept detection and interactive search. Again a lot has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
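As a concrete illustration of the "codebooks with soft-assignment" mentioned in this abstract, here is a minimal Python sketch of soft codebook assignment in a bag-of-words image representation: every local descriptor casts a Gaussian-weighted vote for all codewords instead of only its nearest one. The function name, the weighting scheme, and the sigma value are illustrative assumptions, not the authors' implementation.

import numpy as np

def soft_assign_bow(descriptors, codebook, sigma=90.0):
    """Soft-assignment bag-of-words encoding (illustrative sketch, assumed names).

    descriptors : (n, d) array of local image descriptors (e.g. SIFT).
    codebook    : (k, d) array of visual words (e.g. from k-means).
    sigma       : kernel bandwidth; a tunable smoothing parameter.

    Returns an L1-normalized k-dimensional histogram in which every descriptor
    votes for all codewords with a Gaussian weight, rather than only for its
    single nearest word (hard assignment).
    """
    # Pairwise squared Euclidean distances between descriptors and codewords.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))      # (n, k) soft votes
    weights /= weights.sum(axis=1, keepdims=True)   # normalize per descriptor
    hist = weights.sum(axis=0)                       # accumulate votes per codeword
    return hist / hist.sum()

# Toy usage: 200 random "descriptors" against a 32-word codebook.
rng = np.random.default_rng(0)
bow = soft_assign_bow(rng.normal(size=(200, 128)), rng.normal(size=(32, 128)))
print(bow.shape, bow.sum())   # (32,) 1.0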
 | Xirong Li, Cees G M Snoek: Visual Categorization with Negative Examples for Free. In: MM, Beijing, China, 2009. @inproceedings{LiACM09,
title = {Visual Categorization with Negative Examples for Free},
author = {Xirong Li and Cees G M Snoek},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/li-negative-for-free-acm2009.pdf},
year = {2009},
date = {2009-10-01},
booktitle = {MM},
address = {Beijing, China},
abstract = {Automatic visual categorization is critically dependent on labeled examples for supervised learning. As an alternative to traditional expert labeling, social-tagged multimedia is becoming a novel yet subjective and inaccurate source of learning examples. Different from existing work focusing on collecting positive examples, we study in this paper the potential of substituting social tagging for expert labeling for creating negative examples. We present an empirical study using 6.5 million Flickr photos as a source of social tagging. Our experiments on the PASCAL VOC challenge 2008 show that with a relative loss of only 4.3% in terms of mean average precision, expert-labeled negative examples can be completely replaced by social-tagged negative examples for consumer photo categorization.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
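The paper's idea, negative training examples obtained for free from social tagging, can be made concrete with a small sketch: sample photos whose user tags do not mention the target concept as negatives and train a standard classifier against the expert-labeled positives. This is only the naive reading of the idea; the function name, feature pipeline, and choice of classifier are assumptions for illustration.

import numpy as np
from sklearn.svm import LinearSVC

def negatives_for_free(positive_feats, tagged_pool_feats, tagged_pool_tags,
                       concept, n_negatives=5000, seed=0):
    """Train a concept classifier using socially tagged photos as negatives (sketch).

    positive_feats    : (p, d) features of expert-labeled positive images.
    tagged_pool_feats : (m, d) features of a large socially tagged photo pool.
    tagged_pool_tags  : list of m tag sets, one per pooled photo.
    concept           : target concept name, e.g. "dog".
    """
    rng = np.random.default_rng(seed)
    # Photos that do not carry the concept tag serve as (noisy) negatives.
    candidates = [i for i, tags in enumerate(tagged_pool_tags) if concept not in tags]
    chosen = rng.choice(candidates, size=min(n_negatives, len(candidates)), replace=False)
    X = np.vstack([positive_feats, tagged_pool_feats[chosen]])
    y = np.concatenate([np.ones(len(positive_feats)), np.zeros(len(chosen))])
    return LinearSVC(C=1.0).fit(X, y)

# Toy usage with random features and a small hypothetical tagged pool.
rng = np.random.default_rng(0)
pos = rng.normal(size=(50, 16))
pool = rng.normal(size=(400, 16))
pool_tags = [set(rng.choice(["dog", "cat", "party", "beach"], size=2)) for _ in range(400)]
clf = negatives_for_free(pos, pool, pool_tags, concept="dog", n_negatives=100)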
 | Ork de Rooij, Cees G M Snoek, Marcel Worring: MediaMill: Guiding the User to Results using the ForkBrowser. In: CIVR, Santorini, Greece, 2009. @inproceedings{RooijCIVR09,
title = {MediaMill: Guiding the User to Results using the ForkBrowser},
author = {Ork de Rooij and Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/rooij-guiding-forkbrowser-civr2009.pdf},
year = {2009},
date = {2009-07-01},
booktitle = {CIVR},
address = {Santorini, Greece},
abstract = {In this technical demonstration we showcase the MediaMill Semantic Video Search Engine. It allows the use of multiple query methods embedded into a single browsing environment while guiding the user to better results by using a novel active learning strategy. This allows for fast and effective search through large video collections.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
 | Arjan T Setz, Cees G M Snoek: Can Social Tagged Images Aid Concept-Based Video Search?. In: ICME, pp. 1460–1463, New York, NY, USA, 2009. @inproceedings{SetzICME09,
title = {Can Social Tagged Images Aid Concept-Based Video Search?},
author = {Arjan T Setz and Cees G M Snoek},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/setz-social-tags-icme2009.pdf},
year = {2009},
date = {2009-06-01},
booktitle = {ICME},
pages = {1460--1463},
address = {New York, NY, USA},
abstract = {This paper seeks to unravel whether commonly available social tagged images can be exploited as a training resource for concept-based video search. Since social tags are known to be ambiguous, overly personalized, and often error prone, we place special emphasis on the role of disambiguation. We present a systematic experimental study that evaluates concept detectors based on social tagged images, and their disambiguated versions, in three application scenarios: within-domain, cross-domain, and together with an interacting user. The results indicate that social tagged images can indeed aid concept-based video search, especially after disambiguation and when used in an interactive video retrieval setting. These results open up interesting avenues for future research.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
 | Xirong Li, Cees G M Snoek, Marcel Worring: Annotating Images by Harnessing Worldwide User-Tagged Photos. In: ICASSP, Taipei, Taiwan, 2009. @inproceedings{LiICASSP09,
title = {Annotating Images by Harnessing Worldwide User-Tagged Photos},
author = {Xirong Li and Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/li-annotating-images-icassp2009.pdf},
year = {2009},
date = {2009-04-01},
booktitle = {ICASSP},
address = {Taipei, Taiwan},
abstract = {Automatic image tagging is important yet challenging due to the semantic gap and the lack of learning examples to model a tag's visual diversity. Meanwhile, social user tagging is creating rich multimedia content on the web. In this paper, we propose to combine the two tagging approaches in a search-based framework. For an unlabeled image, we first retrieve its visual neighbors from a large user-tagged image database. We then select relevant tags from the result images to annotate the unlabeled image. To tackle the unreliability and sparsity of user tagging, we introduce a joint-modality tag relevance estimation method which efficiently addresses both textual and visual clues. Experiments on 1.5 million Flickr photos and 10 000 Corel images verify the proposed method.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
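A minimal sketch of the search-based annotation scheme described above: retrieve the visual neighbors of an unlabeled image from a user-tagged database and rank candidate tags by how much more frequent they are among the neighbors than in the collection overall. The paper's joint-modality tag relevance estimation is richer than this simple frequency contrast; names and parameters below are illustrative assumptions.

import numpy as np
from collections import Counter

def annotate_by_neighbors(query_feat, db_feats, db_tags, k=100, top_n=5):
    """Search-based image annotation (illustrative sketch).

    query_feat : (d,) visual feature of the unlabeled image.
    db_feats   : (m, d) features of user-tagged database images.
    db_tags    : list of m tag sets.
    """
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    neighbors = np.argsort(dists)[:k]
    # Tag frequency among visual neighbors, contrasted with the global prior,
    # to demote tags that are merely frequent everywhere.
    neighbor_counts = Counter(t for i in neighbors for t in db_tags[i])
    prior = Counter(t for tags in db_tags for t in tags)
    m = len(db_tags)
    scores = {t: c / k - prior[t] / m for t, c in neighbor_counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]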
 | Cees G M Snoek, Marcel Worring: Concept-Based Video Retrieval. In: Foundations and Trends in Information Retrieval, vol. 4, no. 2, pp. 215–322, 2009. @article{SnoekFNTIR09,
title = {Concept-Based Video Retrieval},
author = {Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-concept-based-video-retrieval-fntir.pdf},
year = {2009},
date = {2009-01-01},
journal = {Foundations and Trends in Information Retrieval},
volume = {4},
number = {2},
pages = {215--322},
abstract = {In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives, most of which are concept-based. Therefore, central to our discussion is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human-computer interaction. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as those carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments. We conclude with our perspective on future challenges and opportunities.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
| Marcel Worring, Cees G M Snoek: Visual Content Analysis. In: Liu, Ling; Özsu, M Tamer (Ed.): Encyclopedia of Database Systems, pp. 3360–3365, Springer, 2009. @incollection{WorringEDS09,
title = {Visual Content Analysis},
author = {Marcel Worring and Cees G M Snoek},
editor = {Ling Liu and M Tamer Özsu},
year = {2009},
date = {2009-01-01},
booktitle = {Encyclopedia of Database Systems},
pages = {3360--3365},
publisher = {Springer},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
|
2008
|
 | Daragh Byrne, Aiden R Doherty, Cees G M Snoek, Gareth J F Jones, Alan F Smeaton: Validating the Detection of Everyday Concepts in Visual Lifelogs. In: Duke, David; Hardman, Lynda; Hauptmann, Alex; Paulus, Dietrich; Staab, Steffen (Ed.): SAMT, pp. 15–30, Springer-Verlag, 2008. @inproceedings{ByrneSAMT08,
title = {Validating the Detection of Everyday Concepts in Visual Lifelogs},
author = {Daragh Byrne and Aiden R Doherty and Cees G M Snoek and Gareth J F Jones and Alan F Smeaton},
editor = {David Duke and Lynda Hardman and Alex Hauptmann and Dietrich Paulus and Steffen Staab},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/byrne-everyday-concepts-samt2008.pdf},
year = {2008},
date = {2008-12-01},
booktitle = {SAMT},
volume = {5392},
pages = {15--30},
publisher = {Springer-Verlag},
series = {LNCS},
abstract = {The Microsoft SenseCam is a small lightweight wearable camera used to passively capture photos and other sensor readings from a user's day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer's life. However the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, on the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept's presence. We apply detection of 27 everyday semantic concepts on a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision for detection of each semantic concept and to draw some interesting inferences on the lifestyles of those 5 users. We additionally present future applications of concept detection within the domain of lifelogging.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
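The abstract describes a concept detector as a supervised model that maps low-level visual features to the probability of a concept's presence. A toy sketch of that notion follows, using logistic regression as a stand-in for the actual detectors (which in this line of work are typically SVM-based); all data and names are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_concept_detector(features, labels):
    """features: (n, d) image features; labels: (n,) with 1 = concept present."""
    return LogisticRegression(max_iter=1000).fit(features, labels)

def detect(detector, features):
    """Return P(concept present) for each image."""
    return detector.predict_proba(features)[:, 1]

# Toy example: a hypothetical 'indoor' detector on random features.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 64)), rng.integers(0, 2, size=200)
indoor = train_concept_detector(X, y)
print(detect(indoor, X[:3]))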
 | Cees G M Snoek, Koen E A van de Sande, Ork de Rooij, Bouke Huurnink, Jan C van Gemert, Jasper R R Uijlings, Jiyin He, Xirong Li, Ivo Everts, Vladimir Nedovic, Michiel van Liempt, Richard van Balen, Fei Yan, Muhammad A Tahir, Krystian Mikolajczyk, Josef Kittler, Maarten de Rijke, Jan-Mark Geusebroek, Theo Gevers, Marcel Worring, Arnold W M Smeulders, Dennis C Koelma: The MediaMill TRECVID 2008 Semantic Video Search Engine. In: TRECVID, Gaithersburg, USA, 2008. @inproceedings{SnoekTRECVID08,
title = {The MediaMill TRECVID 2008 Semantic Video Search Engine},
author = {Cees G M Snoek and Koen E A van de Sande and Ork de Rooij and Bouke Huurnink and Jan C van Gemert and Jasper R R Uijlings and Jiyin He and Xirong Li and Ivo Everts and Vladimir Nedovic and Michiel van Liempt and Richard van Balen and Fei Yan and Muhammad A Tahir and Krystian Mikolajczyk and Josef Kittler and Maarten de Rijke and Jan-Mark Geusebroek and Theo Gevers and Marcel Worring and Arnold W M Smeulders and Dennis C Koelma},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/mediamill-TRECVID2008-final.pdf},
year = {2008},
date = {2008-11-01},
booktitle = {TRECVID},
address = {Gaithersburg, USA},
abstract = {In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors using a bag-of-words approach. To that end, our concept detection experiments emphasize in particular the role of visual sampling, the value of color invariant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. For retrieval, a robust but limited set of concept detectors necessitates the need to rely on as many auxiliary information channels as possible. Therefore, our automatic search experiments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval results further, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and active learning mechanisms that learn to solve complex search topics by analysis from user browsing behavior. The 2008 edition of the TRECVID benchmark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept detection and interactive search, and a runner-up ranking for automatic retrieval. Again a lot has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
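The abstract mentions bag-of-words representations combined with kernel-based supervised learning. A common choice in this family of systems, and therefore only an assumption here, is a support vector machine with a chi-square kernel over the codeword histograms; the sketch below shows how such a precomputed kernel can be plugged into scikit-learn.

import numpy as np
from sklearn.svm import SVC

def chi2_kernel(A, B, gamma=1.0):
    """Chi-square kernel k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)).

    A: (n, d), B: (m, d) non-negative bag-of-words histograms.
    In practice gamma is often set to the inverse of the average chi-square
    distance on the training set; a fixed value is used here for simplicity.
    """
    num = (A[:, None, :] - B[None, :, :]) ** 2
    den = A[:, None, :] + B[None, :, :] + 1e-10
    d = (num / den).sum(axis=2)
    return np.exp(-gamma * d)

# Toy usage with a precomputed kernel SVM on random histograms.
rng = np.random.default_rng(2)
X = rng.random((120, 32)); y = rng.integers(0, 2, 120)
K = chi2_kernel(X, X)
clf = SVC(kernel="precomputed", C=10).fit(K, y)
scores = clf.decision_function(chi2_kernel(X[:5], X))  # test-vs-train kernel
print(scores)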
 | Xirong Li, Cees G M Snoek, Marcel Worring: Learning Tag Relevance by Neighbor Voting for Social Image Retrieval. In: MIR, pp. 180–187, Vancouver, Canada, 2008. @inproceedings{LiMIR08,
title = {Learning Tag Relevance by Neighbor Voting for Social Image Retrieval},
author = {Xirong Li and Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/li-tag-relevance-mir2008.pdf},
year = {2008},
date = {2008-10-01},
booktitle = {MIR},
pages = {180--187},
address = {Vancouver, Canada},
abstract = {Social image retrieval is important for exploiting the increasing amounts of amateur-tagged multimedia such as Flickr images. Since amateur tagging is known to be uncontrolled, ambiguous, and personalized, a fundamental problem is how to reliably interpret the relevance of a tag with respect to the visual content it is describing. Intuitively, if different persons label similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose a novel algorithm that scalably and reliably learns tag relevance by accumulating votes from visually similar neighbors. Further, treated as tag frequency, learned tag relevance is seamlessly embedded into current tag-based social image retrieval paradigms. Preliminary experiments on one million Flickr images demonstrate the potential of the proposed algorithm. Overall comparisons for both single-word queries and multiple-word queries show substantial improvement over the baseline by learning and using tag relevance. Specifically, compared with the baseline using the original tags, on average, retrieval using improved tags increases mean average precision by 24%, from 0.54 to 0.67. Moreover, simulated experiments indicate that performance can be improved further by scaling up the amount of images used in the proposed neighbor voting algorithm.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
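The neighbor-voting idea lends itself to a compact sketch: each of an image's visual neighbors votes for the tags it carries, and a tag's relevance is its vote count minus the count expected from randomly drawn images, so frequent but unrelated tags are discounted. The published algorithm additionally requires neighbors from distinct users; that refinement is omitted in this illustrative code.

import numpy as np
from collections import Counter

def neighbor_vote_tag_relevance(img_feat, img_tags, db_feats, db_tags, k=500):
    """Neighbor-voting tag relevance (sketch of the idea, not the exact published algorithm).

    img_feat : (d,) visual feature of the tagged image.
    img_tags : set of tags attached to the image by its uploader.
    db_feats : (m, d) features of a large socially tagged collection.
    db_tags  : list of m tag sets.
    """
    m = len(db_tags)
    dists = np.linalg.norm(db_feats - img_feat, axis=1)
    neighbors = np.argsort(dists)[:k]
    votes = Counter(t for i in neighbors for t in db_tags[i])
    prior = Counter(t for tags in db_tags for t in tags)
    # Vote count minus the count expected from k randomly sampled images.
    return {t: votes[t] - k * prior[t] / m for t in img_tags}

The resulting scores can then replace raw tag occurrence in a standard tag-based retrieval ranking, which is how the paper embeds learned tag relevance into existing search paradigms.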
 | Ork de Rooij, Cees G M Snoek, Marcel Worring: Balancing Thread Based Navigation for Targeted Video Search. In: CIVR, pp. 485–494, Niagara Falls, Canada, 2008. @inproceedings{RooijCIVR08,
title = {Balancing Thread Based Navigation for Targeted Video Search},
author = {Ork de Rooij and Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/rooij-thread-based-navigation-civr2008.pdf},
year = {2008},
date = {2008-07-01},
booktitle = {CIVR},
pages = {485--494},
address = {Niagara Falls, Canada},
abstract = {Various query methods for video search exist. Because of the semantic gap each method has its limitations. We argue that for effective retrieval query methods need to be combined at retrieval time. However, switching query methods often involves a change in query and browsing interface, which puts a heavy burden on the user. In this paper, we propose a novel method for fast and effective search through large video collections by embedding multiple query methods into a single browsing environment. To that end we introduce the notion of query threads, which contain a shot-based ranking of the video collection according to some feature-based similarity measure. On top of these threads we define several thread-based visualizations, ranging from fast targeted search to very broad exploratory search, with the ForkBrowser as the balance between fast search and video space exploration. We compare the effectiveness and efficiency of the ForkBrowser with the CrossBrowser on the TRECVID 2007 interactive search task. Results show that different query methods are needed for different types of search topics, and that the ForkBrowser requires significantly fewer user interactions to achieve the same result as the CrossBrowser. In addition, both browsers rank among the best interactive retrieval systems currently available.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
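A query thread, as defined in the abstract, is simply a shot-based ranking of the collection under some feature-based similarity measure. The following hypothetical sketch uses plain Euclidean distance as a stand-in for whichever similarity a real thread would be built on.

import numpy as np

def query_thread(seed_idx, shot_feats, length=20):
    """A query thread as a ranked list of shots (illustrative sketch).

    shot_feats : (n, d) per-shot features under some similarity measure
                 (visual, textual, concept scores, ...).
    Returns the indices of the `length` shots most similar to the seed shot.
    """
    d = np.linalg.norm(shot_feats - shot_feats[seed_idx], axis=1)
    ranked = np.argsort(d)
    return ranked[ranked != seed_idx][:length]

# A browser such as the ForkBrowser would display several such threads
# (query result, timeline, visual neighbors, ...) around the current shot.
rng = np.random.default_rng(3)
feats = rng.normal(size=(1000, 50))
print(query_thread(42, feats, length=10))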
 | Koen E A van de Sande, Theo Gevers, Cees G M Snoek: A Comparison of Color Features for Visual Concept Classification. In: CIVR, pp. 141–149, Niagara Falls, Canada, 2008. @inproceedings{SandeCIVR08,
title = {A Comparison of Color Features for Visual Concept Classification},
author = {Koen E A van de Sande and Theo Gevers and Cees G M Snoek},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/sande-colorfeatures-civr2008.pdf
https://www.koen.me/research/colordescriptors/},
year = {2008},
date = {2008-07-01},
booktitle = {CIVR},
pages = {141--149},
address = {Niagara Falls, Canada},
abstract = {Concept classification is important to access visual information on the level of objects and scene types. So far, intensity-based features have been widely used. To increase discriminative power, color features have been proposed only recently. As many features exist, a structured overview is required of color features in the context of concept classification. Therefore, this paper studies 1. the invariance properties and 2. the distinctiveness of color features in a structured way. The invariance properties of color features with respect to photometric changes are summarized. The distinctiveness of color features is assessed experimentally using an image and a video benchmark: the PASCAL VOC Challenge 2007 and the Mediamill Challenge. Because color features cannot be studied independently from the points at which they are extracted, different point sampling strategies based on Harris-Laplace salient points, dense sampling and the spatial pyramid are also studied. From the experimental results, it can be derived that invariance to light intensity changes and light color changes affects concept classification. The results reveal further that the usefulness of invariance is concept-specific.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
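The invariance properties discussed in the abstract can be illustrated with a toy example: an RGB histogram changes when the light intensity is scaled, whereas a histogram over rg chromaticity does not. The paper studies full color descriptors rather than global histograms, so this sketch only demonstrates the notion of photometric invariance; all names and bin counts are assumptions.

import numpy as np

def rgb_hist(img, bins=8):
    """Joint RGB histogram; not invariant to light intensity changes."""
    h, _ = np.histogramdd(img.reshape(-1, 3), bins=bins, range=[(0, 1)] * 3)
    return h.ravel() / h.sum()

def rg_hist(img, bins=8, eps=1e-8):
    """Histogram of rg chromaticity (r = R/(R+G+B), g = G/(R+G+B)).

    Dividing out the intensity makes this representation invariant to a
    global scaling of the light intensity, one of the photometric changes
    analyzed in the paper.
    """
    s = img.sum(axis=2, keepdims=True) + eps
    rg = (img / s)[..., :2].reshape(-1, 2)
    h, _ = np.histogramdd(rg, bins=bins, range=[(0, 1)] * 2)
    return h.ravel() / h.sum()

# Scaling the image intensity changes the RGB histogram but leaves the
# rg-chromaticity histogram (almost) untouched.
rng = np.random.default_rng(4)
img = rng.random((64, 64, 3))
dim = np.clip(0.5 * img, 0, 1)
print(np.abs(rgb_hist(img) - rgb_hist(dim)).sum())   # large difference
print(np.abs(rg_hist(img) - rg_hist(dim)).sum())     # ~0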
 | Ork de Rooij, Cees G M Snoek, Marcel Worring: MediaMill: Fast and Effective Video Search using the ForkBrowser. In: CIVR, pp. 561–561, Niagara Falls, Canada, 2008. @inproceedings{RooijCIVR08b,
title = {MediaMill: Fast and Effective Video Search using the ForkBrowser},
author = {Ork de Rooij and Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/rooij-forkbrowser-civr2008.pdf},
year = {2008},
date = {2008-07-01},
booktitle = {CIVR},
pages = {561--561},
address = {Niagara Falls, Canada},
abstract = {In this technical demonstration we showcase the MediaMill ForkBrowser for video retrieval. It embeds multiple query methods into a single browsing environment. We show that users can switch query methods on demand without the need to adapt to a different interface. This allows for fast and effective search through large video collections.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
 | Koen E A van de Sande, Theo Gevers, Cees G M Snoek: Evaluation of Color Descriptors for Object and Scene Recognition. In: CVPR, Anchorage, Alaska, USA, 2008. @inproceedings{SandeCVPR08,
title = {Evaluation of Color Descriptors for Object and Scene Recognition},
author = {Koen E A van de Sande and Theo Gevers and Cees G M Snoek},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/sande-colordescriptors-cvpr2008.pdf
https://www.koen.me/research/colordescriptors/},
year = {2008},
date = {2008-06-01},
booktitle = {CVPR},
address = {Anchorage, Alaska, USA},
abstract = {Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used. To increase illumination invariance and discriminative power, color descriptors have been proposed only recently. As many descriptors exist, a structured overview of color invariant descriptors in the context of image category recognition is required. Therefore, this paper studies the invariance properties and the distinctiveness of color descriptors in a structured way. The invariance properties of color descriptors are shown analytically using a taxonomy based on invariance properties with respect to photometric transformations. The distinctiveness of color descriptors is assessed experimentally using two benchmarks from the image domain and the video domain.
From the theoretical and experimental results, it can be derived that invariance to light intensity changes and light color changes affects category recognition. The results reveal further that, for light intensity changes, the usefulness of invariance is category-specific.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
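One building block in the taxonomy of color descriptors is the opponent color space used by opponent color descriptors. Below is a small sketch of the transform and of its shift invariance (adding the same offset to all channels leaves O1 and O2 unchanged); the normalization constants are the ones commonly used in this line of work and should be treated as an assumption rather than a quote from the paper.

import numpy as np

def opponent_transform(img):
    """RGB -> opponent color space, as used by opponent color descriptors.

    O1 = (R - G) / sqrt(2)
    O2 = (R + G - 2B) / sqrt(6)
    O3 = (R + G + B) / sqrt(3)

    O1 and O2 are invariant to an equal offset added to all channels
    (a light intensity shift); O3 carries the intensity information.
    """
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    return np.stack([(R - G) / np.sqrt(2),
                     (R + G - 2 * B) / np.sqrt(6),
                     (R + G + B) / np.sqrt(3)], axis=-1)

# Adding a constant to every channel leaves O1 and O2 untouched.
rng = np.random.default_rng(5)
img = rng.random((4, 4, 3))
delta = opponent_transform(img + 0.1) - opponent_transform(img)
print(np.abs(delta[..., :2]).max())  # ~0
print(np.abs(delta[..., 2]).max())   # > 0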
 | Koen E A van de Sande, Theo Gevers, Cees G M Snoek: Color Descriptors for Object Category Recognition. In: Proceedings of the IS&T European Conference on Colour in Graphics, Imaging, and Vision, Terrassa-Barcelona, Spain, 2008. @inproceedings{SandeCGIV08,
title = {Color Descriptors for Object Category Recognition},
author = {Koen E A van de Sande and Theo Gevers and Cees G M Snoek},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/sande-color-descriptors-cgiv2008.pdf
https://www.koen.me/research/colordescriptors/},
year = {2008},
date = {2008-06-01},
booktitle = {Proceedings of the IS&T European Conference on Colour in Graphics, Imaging, and Vision},
address = {Terrassa-Barcelona, Spain},
abstract = {Category recognition is important to access visual information on the level of objects. A common approach is to compute image descriptors first and then to apply machine learning to achieve category recognition from annotated examples. As a consequence, the choice of image descriptors is of great influence on the recognition accuracy. So far, intensity-based (e.g. SIFT) descriptors computed at salient points have been used. However, color has been largely ignored. The question is, can color information improve accuracy of category recognition? Therefore, in this paper, we will extend both salient point detection and region description with color information. The extension of color descriptors is integrated into the framework of category recognition, enabling the selection of both intensity and color variants. Our experiments on an image benchmark show that category recognition benefits from the use of color. Moreover, the combination of intensity and color descriptors yields a 30% improvement over intensity features alone.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
 | Cees G M Snoek, Richard van Balen, Dennis C Koelma, Arnold W M Smeulders, Marcel Worring: Analyzing Video Concept Detectors Visually. In: ICME, pp. 1603–1604, Hannover, Germany, 2008. @inproceedings{SnoekICME08,
title = {Analyzing Video Concept Detectors Visually},
author = {Cees G M Snoek and Richard van Balen and Dennis C Koelma and Arnold W M Smeulders and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-demo-icme2008.pdf},
year = {2008},
date = {2008-06-01},
booktitle = {ICME},
pages = {1603--1604},
address = {Hannover, Germany},
abstract = {In this demonstration we showcase an interactive analysis tool for researchers working on concept-based video retrieval. By visualizing intermediate concept detection analysis stages, the tool aids in understanding the success and failure of video concept detection methods. We demonstrate the tool on the domain of pop concert video.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
 | Cees G M Snoek, Marcel Worring, Ork de Rooij, Koen E A van de Sande, Rong Yan, Alexander G Hauptmann: VideOlympics: Real-Time Evaluation of Multimedia Retrieval Systems. In: IEEE Multimedia, vol. 15, no. 1, pp. 86–91, 2008. @article{SnoekMM08,
title = {VideOlympics: Real-Time Evaluation of Multimedia Retrieval Systems},
author = {Cees G M Snoek and Marcel Worring and Ork de Rooij and Koen E A van de Sande and Rong Yan and Alexander G Hauptmann},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-videolympics-mm.pdf
https://www.youtube.com/watch?v=-QmhEguCoZ0
},
year = {2008},
date = {2008-01-01},
journal = {IEEE Multimedia},
volume = {15},
number = {1},
pages = {86--91},
abstract = {Video search is an experience for the senses. As a result, traditional information retrieval metrics can't fully measure the quality of a video search system. To provide a more interactive assessment of today's video search engines, the authors have organized the VideOlympics as a real-time evaluation showcase where systems compete to answer specific video searches in front of a live audience. At VideOlympics, seeing and hearing is believing.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2007
|
 | Cees G M Snoek, Ivo Everts, Jan C van Gemert, Jan-Mark Geusebroek, Bouke Huurnink, Dennis C Koelma, Michiel van Liempt, Ork de Rooij, Koen E A van de Sande, Arnold W M Smeulders, Jasper R R Uijlings, Marcel Worring: The MediaMill TRECVID 2007 Semantic Video Search Engine. In: TRECVID, Gaithersburg, USA, 2007. @inproceedings{SnoekTRECVID07,
title = {The MediaMill TRECVID 2007 Semantic Video Search Engine},
author = {Cees G M Snoek and Ivo Everts and Jan C van Gemert and Jan-Mark Geusebroek and Bouke Huurnink and Dennis C Koelma and Michiel van Liempt and Ork de Rooij and Koen E A van de Sande and Arnold W M Smeulders and Jasper R R Uijlings and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/mediamill-TRECVID2007-final.pdf},
year = {2007},
date = {2007-11-01},
booktitle = {TRECVID},
address = {Gaithersburg, USA},
abstract = {In this paper we describe our TRECVID 2007 experiments. The MediaMill team participated in two tasks: concept detection and search. For concept detection we extract region-based image features, on grid, keypoint, and segmentation level, which we combine with various supervised learners. In addition, we explore the utility of temporal image features. A late fusion approach of all region-based analysis methods using geometric mean was our most successful run. What is more, using MediaMill Challenge and LSCOM annotations, our visual-only approach generalizes to a set of 572 concept detectors. To handle such a large thesaurus in retrieval, an engine is developed which automatically selects a set of relevant concept detectors based on text matching, ontology querying, and visual concept likelihood. The suggestion engine is evaluated as
part of the automatic search task and forms the entry point for our interactive search experiments. For this task we experiment with two browsers for interactive exploration: the well-known CrossBrowser and the novel ForkBrowser. It was found that, while retrieval performance varies substantially per topic, the ForkBrowser is able to produce the same overall results as the CrossBrowser. However, the ForkBrowser obtains top performance for most topics with less user interaction, indicating the potential of this browser for interactive search. Similar to previous years our best interactive search runs yield high overall performance, ranking 3rd and 4th.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
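The abstract singles out late fusion by geometric mean as the most successful concept detection run. The fusion step itself is compact; below is a hedged sketch with hypothetical score arrays standing in for the per-method detector outputs.

import numpy as np

def geometric_mean_fusion(score_lists, eps=1e-10):
    """Late fusion of per-shot concept scores by geometric mean.

    score_lists : (n_methods, n_shots) array of probabilistic scores in [0, 1],
                  one row per analysis method (e.g. grid, keypoint, and
                  segmentation level features, as in the abstract).
    Returns one fused score per shot; the geometric mean rewards shots that
    all methods agree on and is pulled down by the lowest individual score.
    """
    s = np.asarray(score_lists, dtype=float)
    return np.exp(np.mean(np.log(s + eps), axis=0))

# Toy usage: three methods scoring five shots.
scores = np.array([[0.9, 0.2, 0.6, 0.1, 0.8],
                   [0.8, 0.3, 0.7, 0.2, 0.9],
                   [0.7, 0.1, 0.5, 0.4, 0.9]])
print(geometric_mean_fusion(scores))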
 | Frank J Seinstra, Jan-Mark Geusebroek, Dennis Koelma, Cees G M Snoek, Marcel Worring, Arnold W M Smeulders: High-Performance Distributed Image and Video Content Analysis with Parallel-Horus. In: IEEE Multimedia, vol. 14, no. 4, pp. 64–75, 2007. @article{SeinstraMM07,
title = {High-Performance Distributed Image and Video Content Analysis with Parallel-Horus},
author = {Frank J Seinstra and Jan-Mark Geusebroek and Dennis Koelma and Cees G M Snoek and Marcel Worring and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/seinstra-parallel-horus-mm.pdf},
year = {2007},
date = {2007-10-01},
journal = {IEEE Multimedia},
volume = {14},
number = {4},
pages = {64--75},
abstract = {As the world uses more digital video that requires greater storage space, Grid computing is becoming indispensable for urgent problems in multimedia content analysis. Parallel-Horus, a support tool for applications in multimedia Grid computing, lets users implement multimedia applications as sequential programs for efficient execution on clusters and Grids, based on wide-area multimedia services.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
 | Ork de Rooij, Cees G M Snoek, Marcel Worring: Query on Demand Video Browsing. In: MM, pp. 811–814, Augsburg, Germany, 2007. @inproceedings{RooijACM07,
title = {Query on Demand Video Browsing},
author = {Ork de Rooij and Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/rooij-rotor-acm2007.pdf},
year = {2007},
date = {2007-09-01},
booktitle = {MM},
pages = {811--814},
address = {Augsburg, Germany},
abstract = {This paper describes a novel method for browsing a large collection of news video by linking various forms of related video fragments together as threads. Each thread contains a sequence of shots with high feature-based similarity. Two interfaces are designed which use threads as the basis for browsing. One interface shows a minimal set of threads, and the other as many as possible. Both interfaces are evaluated in the TRECVID interactive retrieval task, where they ranked among the best interactive retrieval systems currently available. The results indicate that the use of threads in interactive video search is very beneficial. We have found that in general the query result and the timeline are the most important threads. However, having several additional threads allows a user to find unique results which cannot easily be found by using query results and time alone.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
 | Arnold W M Smeulders, Jan C van Gemert, Bouke Huurnink, Dennis C Koelma, Ork de Rooij, Koen E A van de Sande, Cees G M Snoek, Cor J Veenman, Marcel Worring: Semantic Video Search. In: ICIAP, Modena, Italy, 2007. @inproceedings{SmeuldersICIAP07,
title = {Semantic Video Search},
author = {Arnold W M Smeulders and Jan C van Gemert and Bouke Huurnink and Dennis C Koelma and Ork de Rooij and Koen E A van de Sande and Cees G M Snoek and Cor J Veenman and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/smeulders-search-iciap2007.pdf},
year = {2007},
date = {2007-09-01},
booktitle = {ICIAP},
address = {Modena, Italy},
abstract = {In this paper we describe the current performance of our MediaMill system as presented in the TRECVID 2006 benchmark for video search engines. The MediaMill team participated in two tasks: concept detection and search. For concept detection we use the MediaMill Challenge as experimental platform. The MediaMill Challenge divides the generic video indexing problem into a visual-only, textual-only, early fusion, late fusion, and combined analysis experiment. We provide a baseline implementation for each experiment together with baseline results. We extract image features, on global, regional, and keypoint level, which we combine with various supervised learners. A late fusion approach of visual-only analysis methods using geometric mean was our most successful run. With this run we conquer the Challenge baseline by more than 50%. Our concept detection experiments have resulted in the best score for three concepts: i.e. desert, flag us, and charts. What is more, using LSCOM annotations, our visual-only approach generalizes well to a set of 491 concept detectors. To handle such a large thesaurus in retrieval, an engine is developed which allows users to select relevant concept detectors based on interactive browsing using advanced visualizations. Similar to previous years our best interactive search runs yield top performance, ranking 2nd and 6th overall.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
 | Cees G M Snoek, Bouke Huurnink, Laura Hollink, Maarten de Rijke, Guus Schreiber, Marcel Worring: Adding Semantics to Detectors for Video Retrieval. In: IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 975–986, 2007. @article{SnoekTMM07b,
title = {Adding Semantics to Detectors for Video Retrieval},
author = {Cees G M Snoek and Bouke Huurnink and Laura Hollink and Maarten de Rijke and Guus Schreiber and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-semantics2detectors-tmm.pdf},
year = {2007},
date = {2007-08-01},
journal = {IEEE Transactions on Multimedia},
volume = {9},
number = {5},
pages = {975--986},
abstract = {In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is very small still compared to ontologies aiming to capture the full vocabulary a user has. We aim to throw a bridge between the two fields by building a multimedia thesaurus, i.e., a set of machine learned concept detectors that is enriched with semantic descriptions and semantic structure obtained from WordNet. Given a multimodal user query, we identify three strategies to select a relevant detector from this thesaurus, namely: text matching, ontology querying, and semantic visual querying. We evaluate the methods against the automatic search task of the TRECVID 2005 video retrieval benchmark, using a news video archive of 85 h in combination with a thesaurus of 363 machine learned concept detectors. We assess the influence of thesaurus size on video search performance, evaluate and compare the multimodal selection strategies for concept detectors, and finally discuss their combined potential using oracle fusion. The set of queries in the TRECVID 2005 corpus is too small for us to be definite in our conclusions, but the results suggest promising new lines of research.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is still very small compared to ontologies aiming to capture the full vocabulary a user has. We aim to bridge the two fields by building a multimedia thesaurus, i.e., a set of machine-learned concept detectors enriched with semantic descriptions and semantic structure obtained from WordNet. Given a multimodal user query, we identify three strategies to select a relevant detector from this thesaurus, namely: text matching, ontology querying, and semantic visual querying. We evaluate the methods against the automatic search task of the TRECVID 2005 video retrieval benchmark, using a news video archive of 85 hours in combination with a thesaurus of 363 machine-learned concept detectors. We assess the influence of thesaurus size on video search performance, evaluate and compare the multimodal selection strategies for concept detectors, and finally discuss their combined potential using oracle fusion. The set of queries in the TRECVID 2005 corpus is too small for us to be definite in our conclusions, but the results suggest promising new lines of research. |
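Of the three selection strategies named above, text matching is the easiest to illustrate. The sketch below scores each detector by term overlap between the query and a short textual description of the detector; the detector names and descriptions are invented for illustration and are not the paper's actual thesaurus.

    # Minimal sketch of detector selection by text matching.
    # Detector names and descriptions are illustrative placeholders.
    import re
    from collections import Counter

    THESAURUS = {
        "aircraft": "aircraft airplane plane flying vehicle aviation",
        "basketball": "basketball sport ball court game players",
        "military": "military army soldier weapon armed forces",
    }

    def tokens(text):
        return re.findall(r"[a-z]+", text.lower())

    def select_detector(query, thesaurus=THESAURUS):
        """Return the detector whose description overlaps most with the query."""
        q = Counter(tokens(query))
        best, best_score = None, 0
        for name, description in thesaurus.items():
            score = sum(q[t] for t in set(tokens(description)))
            if score > best_score:
                best, best_score = name, score
        return best

    print(select_detector("Find shots of one or more military vehicles"))  # -> "military"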
 | Cees G M Snoek, Marcel Worring, Arnold W M Smeulders, Bauke Freiburg: The Role of Visual Content and Style for Concert Video Indexing. In: ICME, pp. 252–255, Beijing, China, 2007. @inproceedings{SnoekICME07b,
title = {The Role of Visual Content and Style for Concert Video Indexing},
author = {Cees G M Snoek and Marcel Worring and Arnold W M Smeulders and Bauke Freiburg},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-fabchannel-icme2007.pdf},
year = {2007},
date = {2007-07-01},
booktitle = {ICME},
pages = {252--255},
address = {Beijing, China},
abstract = {This paper contributes to the automatic indexing of concert video. In contrast to traditional methods, which rely primarily on audio information for summarization applications, we explore how a visual-only concept detection approach could be employed. We investigate how our recent method for news video indexing -- which takes into account the role of content and style -- generalizes to the concert domain. We analyze concert video on three levels of visual abstraction, namely: content, style, and their fusion. Experiments with 12 concept detectors, on 45 hours of visually challenging concert video, show that the automatically learned best approach is concept-dependent. Moreover, these results suggest that the visual modality provides ample opportunity for more effective indexing and retrieval of concert video when used in addition to the auditory modality.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This paper contributes to the automatic indexing of concert video. In contrast to traditional methods, which rely primarily on audio information for summarization applications, we explore how a visual-only concept detection approach could be employed. We investigate how our recent method for news video indexing -- which takes into account the role of content and style -- generalizes to the concert domain. We analyze concert video on three levels of visual abstraction, namely: content, style, and their fusion. Experiments with 12 concept detectors, on 45 hours of visually challenging concert video, show that the automatically learned best approach is concept-dependent. Moreover, these results suggest that the visual modality provides ample opportunity for more effective indexing and retrieval of concert video when used in addition to the auditory modality. |
 | Cees G M Snoek, Marcel Worring: Are Concept Detector Lexicons Effective for Video Search?. In: ICME, pp. 1966–1969, Beijing, China, 2007. @inproceedings{SnoekICME07a,
title = {Are Concept Detector Lexicons Effective for Video Search?},
author = {Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-concept-icme2007.pdf},
year = {2007},
date = {2007-07-01},
booktitle = {ICME},
pages = {1966--1969},
address = {Beijing, China},
abstract = {Until now, systematic studies on the effectiveness of concept detectors for video search have been carried out using less than 20 detectors, or in combination with other retrieval techniques. We investigate whether video search using just large concept detector lexicons is a viable alternative for present day approaches. We demonstrate that increasing the number of concept detectors in a lexicon yields improved video retrieval performance indeed. In addition, we show that combining concept detectors at query time has the potential to boost performance further. We obtain the experimental evidence on the automatic video search task of TRECVID 2005 using 363 machine learned concept detectors.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Until now, systematic studies on the effectiveness of concept detectors for video search have been carried out using fewer than 20 detectors, or in combination with other retrieval techniques. We investigate whether video search using just large concept detector lexicons is a viable alternative to present-day approaches. We demonstrate that increasing the number of concept detectors in a lexicon indeed yields improved video retrieval performance. In addition, we show that combining concept detectors at query time has the potential to boost performance further. We obtain the experimental evidence on the automatic video search task of TRECVID 2005 using 363 machine-learned concept detectors. |
 | Ork de Rooij, Cees G M Snoek, Marcel Worring: MediaMill: Semantic Video Search using the RotorBrowser. In: CIVR, pp. 649, Amsterdam, The Netherlands, 2007. @inproceedings{RooijCIVR07,
title = {MediaMill: Semantic Video Search using the RotorBrowser},
author = {Ork de Rooij and Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/rooij-rotorbrowser-civr2007.pdf},
year = {2007},
date = {2007-07-01},
booktitle = {CIVR},
pages = {649},
address = {Amsterdam, The Netherlands},
abstract = {In this technical demonstration we showcase the current version of the MediaMill system, a search engine that facilitates access to news video archives at a semantic level. The core of the system is a thesaurus of 500 automatically detected semantic concepts. To handle such a large thesaurus in retrieval, an engine is developed which automatically selects a set of relevant concepts based on a textual query, and an novel user interface which uses multi dimensional browsing to visualize the result set.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this technical demonstration we showcase the current version of the MediaMill system, a search engine that facilitates access to news video archives at a semantic level. The core of the system is a thesaurus of 500 automatically detected semantic concepts. To handle such a large thesaurus in retrieval, an engine is developed which automatically selects a set of relevant concepts based on a textual query, and a novel user interface which uses multi-dimensional browsing to visualize the result set. |
 | Ork de Rooij, Cees G M Snoek, Marcel Worring: MediaMill: Video Query on demand using the RotorBrowser. In: ICME, pp. 6, Beijing, China, 2007. @inproceedings{RooijICME07,
title = {MediaMill: Video Query on demand using the RotorBrowser},
author = {Ork de Rooij and Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/rooij-rotorbrowser-icme2007.pdf},
year = {2007},
date = {2007-07-01},
booktitle = {ICME},
pages = {6},
address = {Beijing, China},
abstract = {In this technical demonstration we showcase the RotorBrowser, A visualization within MediaMill system which uses query exploration as the basis for search in video archives.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this technical demonstration we showcase the RotorBrowser, a visualization within the MediaMill system that uses query exploration as the basis for search in video archives. |
 | Marcel Worring, Cees G M Snoek, Ork de Rooij, Giang P Nguyen, Arnold W M Smeulders: The MediaMill Semantic Video Search Engine. In: ICASSP, pp. 1213–1216, Honolulu, Hawaii, USA, 2007. @inproceedings{WorringICASSP07,
title = {The MediaMill Semantic Video Search Engine},
author = {Marcel Worring and Cees G M Snoek and Ork de Rooij and Giang P Nguyen and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/worring-mediamill-icassp2007.pdf},
year = {2007},
date = {2007-04-01},
booktitle = {ICASSP},
volume = {4},
pages = {1213--1216},
address = {Honolulu, Hawaii, USA},
abstract = {In this paper we present the methods underlying the MediaMill semantic video search engine. The basis for the engine is a semantic indexing process which is currently based on a lexicon of 491 concept detectors. To support the user in navigating the collection, the system defines a visual similarity space, a semantic similarity space, a semantic thread space, and browsers to explore them. We compare the different browsers and their utility within the TRECVID benchmark. In 2005, We obtained a top-3 result for 19 out of 24 search topics. In 2006 for 14 out of 24.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this paper we present the methods underlying the MediaMill semantic video search engine. The basis for the engine is a semantic indexing process which is currently based on a lexicon of 491 concept detectors. To support the user in navigating the collection, the system defines a visual similarity space, a semantic similarity space, a semantic thread space, and browsers to explore them. We compare the different browsers and their utility within the TRECVID benchmark. In 2005, we obtained a top-3 result for 19 out of 24 search topics; in 2006, for 14 out of 24. |
 | Cees G M Snoek, Marcel Worring, Dennis C Koelma, Arnold W M Smeulders: A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval. In: IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 280–292, 2007. @article{SnoekTMM07b,
title = {A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval},
author = {Cees G M Snoek and Marcel Worring and Dennis C Koelma and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-lexicon-tmm.pdf},
year = {2007},
date = {2007-02-01},
journal = {IEEE Transactions on Multimedia},
volume = {9},
number = {2},
pages = {280--292},
abstract = {Effective video retrieval is the result of an interplay between interactive query selection, advanced visualization of results, and a goal-oriented human user. Traditional interactive video retrieval approaches emphasize paradigms, such as query-by-keyword and query-by-example, to aid the user in the search for relevant footage. However, recent results in automatic indexing indicate that query-by-concept is becoming a viable resource for interactive retrieval also. We propose in this paper a new video retrieval paradigm. The core of the paradigm is formed by first detecting a large lexicon of semantic concepts. From there, we combine query-by-concept, query-by-example, query-by-keyword, and user interaction into the MediaMill semantic video search engine. To measure the impact of increasing lexicon size on interactive video retrieval performance, we performed two experiments against the 2004 and 2005 NIST TRECVID benchmarks, using lexicons containing 32 and 101 concepts respectively. The results suggest that from all factors that play a role in interactive retrieval, a large lexicon of semantic concepts matters most. Indeed, by exploiting large lexicons, many video search questions are solvable without using query-by-keyword and query-by-example. What is more, we show that the lexicon-driven search engine outperforms all state-of-the-art video retrieval systems in both TRECVID 2004 and 2005.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Effective video retrieval is the result of an interplay between interactive query selection, advanced visualization of results, and a goal-oriented human user. Traditional interactive video retrieval approaches emphasize paradigms, such as query-by-keyword and query-by-example, to aid the user in the search for relevant footage. However, recent results in automatic indexing indicate that query-by-concept is becoming a viable resource for interactive retrieval also. We propose in this paper a new video retrieval paradigm. The core of the paradigm is formed by first detecting a large lexicon of semantic concepts. From there, we combine query-by-concept, query-by-example, query-by-keyword, and user interaction into the MediaMill semantic video search engine. To measure the impact of increasing lexicon size on interactive video retrieval performance, we performed two experiments against the 2004 and 2005 NIST TRECVID benchmarks, using lexicons containing 32 and 101 concepts respectively. The results suggest that from all factors that play a role in interactive retrieval, a large lexicon of semantic concepts matters most. Indeed, by exploiting large lexicons, many video search questions are solvable without using query-by-keyword and query-by-example. What is more, we show that the lexicon-driven search engine outperforms all state-of-the-art video retrieval systems in both TRECVID 2004 and 2005. |
 | Cees G M Snoek, Marcel Worring, Jan-Mark Geusebroek, Dennis C Koelma, Frank J Seinstra, Arnold W M Smeulders: Semantic Video Indexing. In: Blanken, Henk M; de Vries, Arjen P; Blok, Henk Ernst; Feng, Ling (Ed.): Multimedia Retrieval, pp. 225–249, Springer, 2007. @incollection{SnoekMIR07,
title = {Semantic Video Indexing},
author = {Cees G M Snoek and Marcel Worring and Jan-Mark Geusebroek and Dennis C Koelma and Frank J Seinstra and Arnold W M Smeulders},
editor = {Henk M Blanken and Arjen P de Vries and Henk Ernst Blok and Ling Feng},
year = {2007},
date = {2007-01-01},
booktitle = {Multimedia Retrieval},
pages = {225--249},
publisher = {Springer},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
|
2006
|
 | Cees G M Snoek, Marcel Worring, Bouke Huurnink, Jan C van Gemert, Koen E A van de Sande, Dennis C Koelma, Ork de Rooij: MediaMill: Video Search using a Thesaurus of 500 Machine Learned Concepts. In: SAMT, Athens, Greece, 2006. @inproceedings{SnoekSAMT06,
title = {MediaMill: Video Search using a Thesaurus of 500 Machine Learned Concepts},
author = {Cees G M Snoek and Marcel Worring and Bouke Huurnink and Jan C van Gemert and Koen E A van de Sande and Dennis C Koelma and Ork de Rooij},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-demo-samt2006.pdf},
year = {2006},
date = {2006-12-01},
booktitle = {SAMT},
address = {Athens, Greece},
abstract = {In this technical demonstration we showcase the current version of the MediaMill system, a search engine that facilitates access to news video archives at a semantic level. The core of the system is a thesaurus of 500 automatically detected semantic concepts. To handle such a large thesaurus in retrieval, an engine is developed which automatically selects a set of relevant concepts based on the textual query and user-specified example images. The result set can be browsed easily to obtain the final result for the query.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this technical demonstration we showcase the current version of the MediaMill system, a search engine that facilitates access to news video archives at a semantic level. The core of the system is a thesaurus of 500 automatically detected semantic concepts. To handle such a large thesaurus in retrieval, an engine is developed which automatically selects a set of relevant concepts based on the textual query and user-specified example images. The result set can be browsed easily to obtain the final result for the query. |
 | Cees G M Snoek, Jan C van Gemert, Theo Gevers, Bouke Huurnink, Dennis C Koelma, Michiel van Liempt, Ork de Rooij, Koen E A van de Sande, Frank J Seinstra, Arnold W M Smeulders, Andrew H C Thean, Cor J Veenman, Marcel Worring: The MediaMill TRECVID 2006 Semantic Video Search Engine. In: TRECVID, Gaithersburg, USA, 2006. @inproceedings{SnoekTRECVID06,
title = {The MediaMill TRECVID 2006 Semantic Video Search Engine},
author = {Cees G M Snoek and Jan C van Gemert and Theo Gevers and Bouke Huurnink and Dennis C Koelma and Michiel van Liempt and Ork de Rooij and Koen E A van de Sande and Frank J Seinstra and Arnold W M Smeulders and Andrew H C Thean and Cor J Veenman and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/mediamill-TRECVID2006-final.pdf},
year = {2006},
date = {2006-11-01},
booktitle = {TRECVID},
address = {Gaithersburg, USA},
abstract = {In this paper we describe our TRECVID 2006 experiments. The MediaMill team participated in two tasks: concept detection and search. For concept detection we use the MediaMill Challenge as experimental platform. The MediaMill Challenge divides the generic video indexing problem into a visual-only, textual-only, early fusion, late fusion, and combined analysis experiment. We provide a baseline implementation for each experiment together with baseline results, which we made available for the TRECVID community. The Challenge package was downloaded more than 80 times and we anticipate that it has been used by several teams for their 2006 submission. Our Challenge experiments focus specifically on visual-only analysis of video (run id: B_MM). We extract image features, on global, regional, and keypoint level, which we combine with various supervised learners. A late fusion approach of visual-only analysis methods using geometric mean was our most successful run. With this run we conquer the Challenge baseline by more than 50%. Our concept detection experiments have resulted in the best score for three concepts: i.e. desert, flag us, and charts. What is more, using LSCOM annotations, our visual-only approach generalizes well to a set of 491 concept detectors. To handle such a large thesaurus in retrieval, an engine is developed which automatically selects a set of relevant concept detectors based on text matching and ontology querying. The suggestion engine is evaluated as part of the automatic search task (run id: A-MM) and forms the entry point for our interactive search experiments (run id: A-MM). Here we experiment with query by object matching and two browsers for interactive exploration: the CrossBrowser and the novel NovaBrowser. It was found that the NovaBrowser is able to produce the same results as the CrossBrowser, but with less user interaction. Similar to previous years our best interactive search runs yield top performance, ranking 2nd and 6th overall. Again a lot has been learned during this year's TRECVID campaign, we highlight the most important lessons at the end of this paper.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this paper we describe our TRECVID 2006 experiments. The MediaMill team participated in two tasks: concept detection and search. For concept detection we use the MediaMill Challenge as an experimental platform. The MediaMill Challenge divides the generic video indexing problem into a visual-only, textual-only, early fusion, late fusion, and combined analysis experiment. We provide a baseline implementation for each experiment together with baseline results, which we made available for the TRECVID community. The Challenge package was downloaded more than 80 times and we anticipate that it has been used by several teams for their 2006 submission. Our Challenge experiments focus specifically on visual-only analysis of video (run id: B_MM). We extract image features at the global, regional, and keypoint level, which we combine with various supervised learners. A late fusion approach of visual-only analysis methods using the geometric mean was our most successful run. With this run we surpass the Challenge baseline by more than 50%. Our concept detection experiments resulted in the best score for three concepts: desert, flag us, and charts. Moreover, using LSCOM annotations, our visual-only approach generalizes well to a set of 491 concept detectors. To handle such a large thesaurus in retrieval, an engine is developed which automatically selects a set of relevant concept detectors based on text matching and ontology querying. The suggestion engine is evaluated as part of the automatic search task (run id: A-MM) and forms the entry point for our interactive search experiments (run id: A-MM). Here we experiment with query by object matching and two browsers for interactive exploration: the CrossBrowser and the novel NovaBrowser. It was found that the NovaBrowser is able to produce the same results as the CrossBrowser, but with less user interaction. As in previous years, our best interactive search runs yield top performance, ranking 2nd and 6th overall. Again a lot has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper. |
 | Cees G M Snoek, Marcel Worring, Jan C van Gemert, Jan-Mark Geusebroek, Arnold W M Smeulders: The Challenge Problem for Automated Detection of 101 Semantic Concepts in Multimedia. In: MM, pp. 421–430, Santa Barbara, USA, 2006. @inproceedings{SnoekACM06,
title = {The Challenge Problem for Automated Detection of 101 Semantic Concepts in Multimedia},
author = {Cees G M Snoek and Marcel Worring and Jan C van Gemert and Jan-Mark Geusebroek and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-challenge-acm2006.pdf
https://ivi.fnwi.uva.nl/isis/mediamill/challenge/index.php},
year = {2006},
date = {2006-10-01},
booktitle = {MM},
pages = {421--430},
address = {Santa Barbara, USA},
abstract = {We introduce the challenge problem for generic video indexing to gain insight in intermediate steps that affect performance of multimedia analysis methods, while at the same time fostering repeatability of experiments. To arrive at a challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods, by decomposing the generic video indexing problem into 2 unimodal analysis experiments, 2 multimodal analysis experiments, and 1 combined analysis experiment. For each experiment, we evaluate generic video indexing performance on 85 hours of international broadcast news data, from the TRECVID 2005/2006 benchmark, using a lexicon of 101 semantic concepts. By establishing a minimum performance on each experiment, the challenge problem allows for component-based optimization of the generic indexing issue, while simultaneously offering other researchers a reference for comparison during indexing methodology development. To stimulate further investigations in intermediate analysis steps that influence video indexing performance, the challenge offers to the research community a manually annotated concept lexicon, pre-computed low-level multimedia features, trained classifier models, and five experiments together with baseline performance, which are all available at http://www.mediamill.nl/challenge/.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
We introduce the challenge problem for generic video indexing to gain insight into the intermediate steps that affect the performance of multimedia analysis methods, while at the same time fostering repeatability of experiments. To arrive at a challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods by decomposing the generic video indexing problem into two unimodal analysis experiments, two multimodal analysis experiments, and one combined analysis experiment. For each experiment, we evaluate generic video indexing performance on 85 hours of international broadcast news data, from the TRECVID 2005/2006 benchmark, using a lexicon of 101 semantic concepts. By establishing a minimum performance on each experiment, the challenge problem allows for component-based optimization of the generic indexing issue, while simultaneously offering other researchers a reference for comparison during indexing methodology development. To stimulate further investigations in intermediate analysis steps that influence video indexing performance, the challenge offers to the research community a manually annotated concept lexicon, pre-computed low-level multimedia features, trained classifier models, and five experiments together with baseline performance, which are all available at http://www.mediamill.nl/challenge/. |
 | Jan C van Gemert, Cees G M Snoek, Cor J Veenman, Arnold W M Smeulders: The Influence of Cross-Validation on Video Classification Performance. In: MM, pp. 695–698, Santa Barbara, USA, 2006. @inproceedings{GemertACM06,
title = {The Influence of Cross-Validation on Video Classification Performance},
author = {Jan C van Gemert and Cees G M Snoek and Cor J Veenman and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/gemert-crossvalidation-acm2006.pdf},
year = {2006},
date = {2006-10-01},
booktitle = {MM},
pages = {695--698},
address = {Santa Barbara, USA},
abstract = {Digital video is sequential in nature. When video data is used in a semantic concept classification task, the episodes are usually summarized with shots. The shots are annotated as containing, or not containing, a certain concept resulting in a labeled dataset. These labeled shots can subsequently be used by supervised learning methods (classifiers) where they are trained to predict the absence or presence of the concept in unseen shots and episodes. The performance of such automatic classification systems is usually estimated with cross-validation. By taking random samples from the dataset for training and testing as such, part of the shots from an episode are in the training set and another part from the same episode is in the test set. Accordingly, data dependence between training and test set is introduced, resulting in too optimistic performance estimates. In this paper, we experimentally show this bias, and propose how this bias can be prevented using "episode-constrained" cross-validation. Moreover, we show that a 15% higher classifier performance can be achieved by using episode constrained cross-validation for classifier parameter tuning.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Digital video is sequential in nature. When video data is used in a semantic concept classification task, the episodes are usually summarized with shots. The shots are annotated as containing, or not containing, a certain concept, resulting in a labeled dataset. These labeled shots can subsequently be used by supervised learning methods (classifiers), where they are trained to predict the absence or presence of the concept in unseen shots and episodes. The performance of such automatic classification systems is usually estimated with cross-validation. By taking random samples from the dataset for training and testing, part of the shots from an episode end up in the training set while other shots from the same episode end up in the test set. Accordingly, data dependence between the training and test set is introduced, resulting in overly optimistic performance estimates. In this paper, we experimentally show this bias and propose how it can be prevented using "episode-constrained" cross-validation. Moreover, we show that a 15% higher classifier performance can be achieved by using episode-constrained cross-validation for classifier parameter tuning. |
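The constraint described above maps directly onto group-aware cross-validation splitting. Below is a minimal sketch using scikit-learn's GroupKFold, assuming every shot carries an episode identifier; the data is synthetic and the classifier choice is illustrative, not the paper's exact protocol.

    # Sketch of "episode-constrained" cross-validation: shots from the same
    # episode never appear in both the training and the test fold.
    import numpy as np
    from sklearn.model_selection import GroupKFold
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))          # one feature vector per shot
    y = rng.integers(0, 2, size=200)        # concept present / absent
    episode = np.repeat(np.arange(20), 10)  # 20 episodes of 10 shots each

    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=episode):
        clf = SVC().fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    print(np.mean(scores))  # less optimistic estimate: no episode is split across folds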
 | Cees G M Snoek, Marcel Worring, Jan-Mark Geusebroek, Dennis C Koelma, Frank J Seinstra, Arnold W M Smeulders: The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1678–1689, 2006. @article{SnoekPAMI06,
title = {The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing},
author = {Cees G M Snoek and Marcel Worring and Jan-Mark Geusebroek and Dennis C Koelma and Frank J Seinstra and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-pathfinder-pami.pdf
http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-pathfinder-pami-groundtruth.zip},
year = {2006},
date = {2006-10-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume = {28},
number = {10},
pages = {1678--1689},
abstract = {This paper presents the semantic pathfinder architecture for generic indexing of multimedia archives. The semantic pathfinder extracts semantic concepts from video by exploring different paths through three consecutive analysis steps, which we derive from the observation that produced video is the result of an authoring-driven process. We exploit this authoring metaphor for machine-driven understanding. The pathfinder starts with the content analysis step. In this analysis step, we follow a data-driven approach of indexing semantics. The style analysis step is the second analysis step. Here we tackle the indexing problem by viewing a video from the perspective of production. Finally, in the context analysis step, we view semantics in context. The virtue of the semantic pathfinder is its ability to learn the best path of analysis steps on a per-concept basis. To show the generality of this novel indexing approach we develop detectors for a lexicon of 32 concepts and we evaluate the semantic pathfinder against the 2004 NIST TRECVID video retrieval benchmark, using a news archive of 64 hours. Top ranking performance in the semantic concept detection task indicates the merit of the semantic pathfinder for generic indexing of multimedia archives.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
This paper presents the semantic pathfinder architecture for generic indexing of multimedia archives. The semantic pathfinder extracts semantic concepts from video by exploring different paths through three consecutive analysis steps, which we derive from the observation that produced video is the result of an authoring-driven process. We exploit this authoring metaphor for machine-driven understanding. The pathfinder starts with the content analysis step. In this analysis step, we follow a data-driven approach of indexing semantics. The style analysis step is the second analysis step. Here we tackle the indexing problem by viewing a video from the perspective of production. Finally, in the context analysis step, we view semantics in context. The virtue of the semantic pathfinder is its ability to learn the best path of analysis steps on a per-concept basis. To show the generality of this novel indexing approach we develop detectors for a lexicon of 32 concepts and we evaluate the semantic pathfinder against the 2004 NIST TRECVID video retrieval benchmark, using a news archive of 64 hours. Top ranking performance in the semantic concept detection task indicates the merit of the semantic pathfinder for generic indexing of multimedia archives. |
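The core selection idea above, learning the best path per concept, can be illustrated with a few lines of Python. The concepts, path names, and validation scores below are made-up placeholders, and average precision stands in for whatever validation measure is used in practice.

    # Sketch of per-concept path selection: for every concept, keep the
    # analysis path that scores best on held-out validation data.
    validation_ap = {
        "sports":  {"content": 0.41, "style": 0.47, "context": 0.45},
        "weather": {"content": 0.62, "style": 0.60, "context": 0.66},
        "overlay": {"content": 0.33, "style": 0.52, "context": 0.51},
    }

    def select_path(scores_per_concept):
        """Pick, per concept, the analysis path with the highest validation score."""
        return {concept: max(paths, key=paths.get)
                for concept, paths in scores_per_concept.items()}

    print(select_path(validation_ap))
    # {'sports': 'style', 'weather': 'context', 'overlay': 'style'}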
 | Marcel Worring, Cees G M Snoek, Bouke Huurnink, Jan C van Gemert, Dennis Koelma, Ork de Rooij: The MediaMill Large-lexicon Concept Suggestion Engine. In: MM, pp. 785–786, Santa Barbara, USA, 2006. @inproceedings{WorringACM06,
title = {The MediaMill Large-lexicon Concept Suggestion Engine},
author = {Marcel Worring and Cees G M Snoek and Bouke Huurnink and Jan C van Gemert and Dennis Koelma and Ork de Rooij},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/worring-demo-acm2006.pdf},
year = {2006},
date = {2006-10-01},
booktitle = {MM},
pages = {785--786},
address = {Santa Barbara, USA},
abstract = {In this technical demonstration we show the current version of the MediaMill system, a search engine that facilitates access to news video archives at a semantic level. The core of the system is a lexicon of 436 automatically detected semantic concepts. To handle such a large lexicon in retrieval, an engine is developed which automatically selects a set of relevant concepts based on the textual query and example images. The result set can be browsed easily to obtain the final result for the query.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this technical demonstration we show the current version of the MediaMill system, a search engine that facilitates access to news video archives at a semantic level. The core of the system is a lexicon of 436 automatically detected semantic concepts. To handle such a large lexicon in retrieval, an engine is developed which automatically selects a set of relevant concepts based on the textual query and example images. The result set can be browsed easily to obtain the final result for the query. |
 | Marcel Worring, Cees G M Snoek, Ork de Rooij, Giang P Nguyen, Dennis C Koelma: Lexicon-based Browsers for Searching in News Video Archives. In: ICPR, pp. 1256–1259, Hong Kong, China, 2006. @inproceedings{WorringICPR06,
title = {Lexicon-based Browsers for Searching in News Video Archives},
author = {Marcel Worring and Cees G M Snoek and Ork de Rooij and Giang P Nguyen and Dennis C Koelma},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/worring-browsers-icpr2006.pdf},
year = {2006},
date = {2006-08-01},
booktitle = {ICPR},
pages = {1256--1259},
address = {Hong Kong, China},
abstract = {In this paper we present the methods and visualizations used in the MediaMill video search engine. The basis for the engine is a semantic indexing process which derives a lexicon of 101 concepts. To support the user in navigating the collection, the system defines a visual similarity space, a semantic similarity space, a semantic thread space, and browsers to explore them. The search system is evaluated within the TRECVID benchmark. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this paper we present the methods and visualizations used in the MediaMill video search engine. The basis for the engine is a semantic indexing process which derives a lexicon of 101 concepts. To support the user in navigating the collection, the system defines a visual similarity space, a semantic similarity space, a semantic thread space, and browsers to explore them. The search system is evaluated within the TRECVID benchmark. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants. |
 | Cees G M Snoek, Marcel Worring, Dennis C Koelma, Arnold W M Smeulders: Learned Lexicon-driven Interactive Video Retrieval. In: Sundaram, H; others, (Ed.): CIVR, pp. 11–20, Springer-Verlag, Heidelberg, Germany, 2006. @inproceedings{SnoekCIVR06,
title = {Learned Lexicon-driven Interactive Video Retrieval},
author = {Cees G M Snoek and Marcel Worring and Dennis C Koelma and Arnold W M Smeulders},
editor = {H Sundaram and others},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-lexicon-civr2006.pdf},
year = {2006},
date = {2006-07-01},
booktitle = {CIVR},
volume = {4071},
pages = {11--20},
publisher = {Springer-Verlag},
address = {Heidelberg, Germany},
series = {LNCS},
abstract = {We combine in this paper automatic learning of a large lexicon of semantic concepts with traditional video retrieval methods into a novel approach to narrow the semantic gap. The core of the proposed solution is formed by the automatic detection of an unprecedented lexicon of 101 concepts. From there, we explore the combination of query-by-concept, query-by-example, query-by-keyword, and user interaction into the MediaMill semantic video search engine. We evaluate the search engine against the 2005 NIST TRECVID video retrieval benchmark, using an international broadcast news archive of 85 hours. Top ranking results show that the lexicon-driven search engine is highly effective for interactive video retrieval.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
We combine in this paper automatic learning of a large lexicon of semantic concepts with traditional video retrieval methods into a novel approach to narrow the semantic gap. The core of the proposed solution is formed by the automatic detection of an unprecedented lexicon of 101 concepts. From there, we explore the combination of query-by-concept, query-by-example, query-by-keyword, and user interaction into the MediaMill semantic video search engine. We evaluate the search engine against the 2005 NIST TRECVID video retrieval benchmark, using an international broadcast news archive of 85 hours. Top ranking results show that the lexicon-driven search engine is highly effective for interactive video retrieval. |
 | Cees G M Snoek, Marcel Worring, Jan-Mark Geusebroek, Dennis C Koelma, Frank J Seinstra, Arnold W M Smeulders: The Semantic Pathfinder for Generic News Video Indexing. In: ICME, pp. 1469–1472, Toronto, Canada, 2006. @inproceedings{SnoekICME06,
title = {The Semantic Pathfinder for Generic News Video Indexing},
author = {Cees G M Snoek and Marcel Worring and Jan-Mark Geusebroek and Dennis C Koelma and Frank J Seinstra and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-pathfinder-icme2006.pdf},
year = {2006},
date = {2006-07-01},
booktitle = {ICME},
pages = {1469--1472},
address = {Toronto, Canada},
abstract = {This paper presents the semantic pathfinder architecture for generic indexing of video archives. The pathfinder automatically extracts semantic concepts from video based on the exploration of different paths through three consecutive analysis steps, closely linked to the video production process, namely: content analysis, style analysis, and context analysis. The virtue of the semantic pathfinder is its learned ability to find a best path of analysis steps on a per-concept basis. To show the generality of this indexing approach we develop detectors for a lexicon of 32 concepts and we evaluate the semantic pathfinder against the 2004 NIST TRECVID video retrieval benchmark, using a news archive of 64 hours. Top ranking performance indicates the merit of the semantic pathfinder.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This paper presents the semantic pathfinder architecture for generic indexing of video archives. The pathfinder automatically extracts semantic concepts from video based on the exploration of different paths through three consecutive analysis steps, closely linked to the video production process, namely: content analysis, style analysis, and context analysis. The virtue of the semantic pathfinder is its learned ability to find a best path of analysis steps on a per-concept basis. To show the generality of this indexing approach we develop detectors for a lexicon of 32 concepts and we evaluate the semantic pathfinder against the 2004 NIST TRECVID video retrieval benchmark, using a news archive of 64 hours. Top ranking performance indicates the merit of the semantic pathfinder. |
 | Marcel Worring, Cees G M Snoek, Ork de Rooij, Giang P Nguyen, Richard van Balen, Dennis C Koelma: MediaMill: Advanced Browsing in News Video Archives. In: Sundaram, H; others, (Ed.): CIVR, pp. 533–536, Springer-Verlag, Heidelberg, Germany, 2006. @inproceedings{WorringCIVR06,
title = {MediaMill: Advanced Browsing in News Video Archives},
author = {Marcel Worring and Cees G M Snoek and Ork de Rooij and Giang P Nguyen and Richard van Balen and Dennis C Koelma},
editor = {H Sundaram and others},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/worring-demo-civr2006.pdf},
year = {2006},
date = {2006-07-01},
booktitle = {CIVR},
volume = {4071},
pages = {533--536},
publisher = {Springer-Verlag},
address = {Heidelberg, Germany},
series = {LNCS},
abstract = {In this paper we present our Mediamill video search engine. The basis for the engine is a semantic indexing process which derives a lexicon of 101 concepts. To support the user in navigating the collection, the system defines a visual similarity space, a semantic similarity space, a semantic thread space, and browsers to explore them. It extends upon [1] with improved browsing tools. The search system is evaluated within the TRECVID benchmark [2]. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this paper we present our MediaMill video search engine. The basis for the engine is a semantic indexing process which derives a lexicon of 101 concepts. To support the user in navigating the collection, the system defines a visual similarity space, a semantic similarity space, a semantic thread space, and browsers to explore them. It extends [1] with improved browsing tools. The search system is evaluated within the TRECVID benchmark [2]. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants. |
 | Jan C van Gemert, Jan-Mark Geusebroek, Cor J Veenman, Cees G M Snoek, Arnold W M Smeulders: Robust Scene Categorization by Learning Image Statistics in Context. In: CVPR workshop, pp. 105–112, New York, USA, 2006. @inproceedings{GemertSLAM06,
title = {Robust Scene Categorization by Learning Image Statistics in Context},
author = {Jan C van Gemert and Jan-Mark Geusebroek and Cor J Veenman and Cees G M Snoek and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/gemert-scene-slam2006.pdf},
year = {2006},
date = {2006-06-01},
booktitle = {CVPR workshop},
pages = {105--112},
address = {New York, USA},
abstract = {We present a generic and robust approach for scene categorization. A complex scene is described by proto-concepts like vegetation, water, fire, sky etc. These proto-concepts are represented by low level features, where we use natural images statistics to compactly represent color invariant texture information by a Weibull distribution. We introduce the notion of contextures which preserve the context of textures in a visual scene with an occurrence histogram (context) of similarities to proto-concept descriptors (texture). In contrast to a codebook approach, we use the similarity to all vocabulary elements to generalize beyond the code words. Visual descriptors are attained by combining different types of contexts with different texture parameters. The visual scene descriptors are generalized to visual categories by training a support vector machine. We evaluate our approach on 3 different datasets: 1) 50 categories for the TRECVID video dataset; 2) the Caltech 101-object images; 3) 89 categories being the intersection of the Corel photo stock with the Art Explosion photo stock. Results show that our approach is robust over different datasets, while maintaining competitive performance.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
We present a generic and robust approach for scene categorization. A complex scene is described by proto-concepts like vegetation, water, fire, sky, etc. These proto-concepts are represented by low-level features, where we use natural image statistics to compactly represent color-invariant texture information by a Weibull distribution. We introduce the notion of contextures, which preserve the context of textures in a visual scene with an occurrence histogram (context) of similarities to proto-concept descriptors (texture). In contrast to a codebook approach, we use the similarity to all vocabulary elements to generalize beyond the code words. Visual descriptors are attained by combining different types of contexts with different texture parameters. The visual scene descriptors are generalized to visual categories by training a support vector machine. We evaluate our approach on 3 different datasets: 1) 50 categories for the TRECVID video dataset; 2) the Caltech 101-object images; 3) 89 categories being the intersection of the Corel photo stock with the Art Explosion photo stock. Results show that our approach is robust over different datasets, while maintaining competitive performance. |
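A loose sketch of the low-level step mentioned above, summarizing local texture by fitting a Weibull distribution to edge responses, is shown below. The synthetic image, the plain gradient filter, and the scipy fitting routine are stand-ins for illustration only; they are not the paper's color-invariant Wiccest features.

    # Sketch: describe texture compactly by Weibull parameters of edge strength.
    import numpy as np
    from scipy.stats import weibull_min

    rng = np.random.default_rng(1)
    image = rng.random((128, 128))                 # stand-in for one image channel
    gy, gx = np.gradient(image)                    # simple derivative filters
    edge_strength = np.hypot(gx, gy).ravel()
    edge_strength = edge_strength[edge_strength > 1e-6]  # Weibull support is x > 0

    shape, loc, scale = weibull_min.fit(edge_strength, floc=0)
    print(shape, scale)  # two numbers compactly summarizing the texture statistics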
 | Cees G M Snoek, Marcel Worring, Alexander G Hauptmann: Learning Rich Semantics from News Video Archives by Style Analysis. In: ACM Transactions on Multimedia Computing, Communications and Applications, vol. 2, no. 2, pp. 91–108, 2006. @article{SnoekTOMCCAP06,
title = {Learning Rich Semantics from News Video Archives by Style Analysis},
author = {Cees G M Snoek and Marcel Worring and Alexander G Hauptmann},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-style-tomccap.pdf},
year = {2006},
date = {2006-05-01},
journal = {ACM Transactions on Multimedia Computing, Communications and Applications},
volume = {2},
number = {2},
pages = {91--108},
abstract = {We propose a generic and robust framework for news video indexing, which we found on a broadcast news production model. We identify within this model four production phases, each providing useful metadata for annotation. In contrast to semi-automatic indexing approaches, which exploit this information at production time, we adhere to an automatic data-driven approach. To that end, we analyze a digital news video using a separate set of multimodal detectors for each production phase. By combining the resulting production-derived features into a statistical classifier ensemble, the framework facilitates robust classification of several rich semantic concepts in news video; rich meaning that concepts share many similarities in their production process. Experiments on an archive of 120 hours of news video, from the 2003 TRECVID benchmark, show that a combined analysis of production phases yields the best results. In addition, we demonstrate that the accuracy of the proposed style analysis framework for classification of several rich semantic concepts is state-of-the-art.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
We propose a generic and robust framework for news video indexing, which we found on a broadcast news production model. We identify within this model four production phases, each providing useful metadata for annotation. In contrast to semi-automatic indexing approaches, which exploit this information at production time, we adhere to an automatic data-driven approach. To that end, we analyze a digital news video using a separate set of multimodal detectors for each production phase. By combining the resulting production-derived features into a statistical classifier ensemble, the framework facilitates robust classification of several rich semantic concepts in news video; rich meaning that concepts share many similarities in their production process. Experiments on an archive of 120 hours of news video, from the 2003 TRECVID benchmark, show that a combined analysis of production phases yields the best results. In addition, we demonstrate that the accuracy of the proposed style analysis framework for classification of several rich semantic concepts is state-of-the-art. |
| Arnold W M Smeulders, Jan C van Gemert, Jan-Mark Geusebroek, Cees G M Snoek, Marcel Worring: Browsing for the National Dutch Video Archive. In: ISCCSP, Marrakech, Morocco, 2006. @inproceedings{SmeuldersISCCSP06,
title = {Browsing for the National Dutch Video Archive},
author = {Arnold W M Smeulders and Jan C van Gemert and Jan-Mark Geusebroek and Cees G M Snoek and Marcel Worring},
year = {2006},
date = {2006-03-01},
booktitle = {ISCCSP},
address = {Marrakech, Morocco},
abstract = {Pictures have always been a prime carrier of Dutch culture. But pictures take a new form. We live in times of broad- and narrowcasting through Internet, of passive and active viewers, of direct or delayed broadcast, and of digital pictures being delivered in the museum or at home. At the same time, the picture and television archives turn digital. Archives are going to be swamped with information requests unless they swiftly adapt to partially automatic annotation and digital retrieval. Our aim is to provide faster and more complete access to picture archives by digital analysis. Our approach consists of a multi-media analysis of features of pictures in tandem with the language that describes those pictures, under the guidance of a visual ontology. The general scientific paradigm we address is the detection of directly observables fused into semantic features learned from large repositories of digital video. We use invariant, natural-image statisticsbased contextual feature sets for capturing the concepts of images and integrate that as early as possible with text. The system consists of a large for science yet small for practice set of visual concepts permitting the retrieval of semantically formulated queries. We will demonstrate a PC-based, off-line trained state of the art system for browsing broadcast news-archives.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Pictures have always been a prime carrier of Dutch culture. But pictures take a new form. We live in times of broad- and narrowcasting through the Internet, of passive and active viewers, of direct or delayed broadcast, and of digital pictures being delivered in the museum or at home. At the same time, the picture and television archives turn digital. Archives are going to be swamped with information requests unless they swiftly adapt to partially automatic annotation and digital retrieval. Our aim is to provide faster and more complete access to picture archives by digital analysis. Our approach consists of a multi-media analysis of features of pictures in tandem with the language that describes those pictures, under the guidance of a visual ontology. The general scientific paradigm we address is the detection of direct observables fused into semantic features learned from large repositories of digital video. We use invariant, natural-image-statistics-based contextual feature sets for capturing the concepts of images and integrate them with text as early as possible. The system consists of a set of visual concepts that is large for science yet small for practice, permitting the retrieval of semantically formulated queries. We will demonstrate a PC-based, off-line trained, state-of-the-art system for browsing broadcast news archives. |
| Cees G M Snoek, Marcel Worring, Dennis C Koelma, Arnold W M Smeulders: Learned Lexicon-driven Interactive Video Retrieval (reprint). In: Wilkinson, M; Pluim, J (Ed.): Fifth Quinquennial Review 2001-2006, Dutch Society for Pattern Recognition and Image Processing, 2006. @incollection{SnoekNVPHBV06,
title = {Learned Lexicon-driven Interactive Video Retrieval (reprint)},
author = {Cees G M Snoek and Marcel Worring and Dennis C Koelma and Arnold W M Smeulders},
editor = {M Wilkinson and J Pluim},
year = {2006},
date = {2006-01-01},
booktitle = {Fifth Quinquennial Review 2001-2006},
publisher = {Dutch Society for Pattern Recognition and Image Processing},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
|
2005
|
 | Cees G M Snoek, Marcel Worring, Arnold W M Smeulders: Early versus Late Fusion in Semantic Video Analysis. In: MM, pp. 399–402, Singapore, 2005, (SIGMM test of time paper award, honourable mention). @inproceedings{SnoekACM05a,
title = {Early versus Late Fusion in Semantic Video Analysis},
author = {Cees G M Snoek and Marcel Worring and Arnold W M Smeulders},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-earlylate-acm2005.pdf},
year = {2005},
date = {2005-11-01},
booktitle = {MM},
pages = {399--402},
address = {Singapore},
abstract = {Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. In reaching this goal, it requires an analysis of several information streams. At some point in the analysis these streams need to be fused. In this paper, we consider two classes of fusion schemes, namely early fusion and late fusion. The former fuses modalities in feature space, the latter fuses modalities in semantic space. We show by experiment on 184 hours of broadcast video data and for 20 semantic concepts, that late fusion tends to give slightly better performance for most concepts. However, for those concepts where early fusion performs better the difference is more significant.},
note = {SIGMM test of time paper award, honourable mention},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. In reaching this goal, it requires an analysis of several information streams. At some point in the analysis these streams need to be fused. In this paper, we consider two classes of fusion schemes, namely early fusion and late fusion. The former fuses modalities in feature space, the latter fuses modalities in semantic space. We show, by experiment on 184 hours of broadcast video data and for 20 semantic concepts, that late fusion tends to give slightly better performance for most concepts. However, for those concepts where early fusion performs better, the difference is more significant. |
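The distinction drawn above between fusing in feature space and fusing in semantic space can be made concrete with a small sketch. The data below is synthetic, the classifiers are generic scikit-learn models, and the score-averaging rule is just one possible late-fusion combination, not the paper's exact scheme.

    # Sketch contrasting early fusion (concatenate features, one classifier)
    # with late fusion (one classifier per modality, combine their scores).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)
    n = 300
    visual = rng.normal(size=(n, 20))    # visual features per shot
    textual = rng.normal(size=(n, 10))   # textual features per shot
    y = rng.integers(0, 2, size=n)       # concept labels

    # Early fusion: learn on the concatenated feature vector.
    early = LogisticRegression(max_iter=1000).fit(np.hstack([visual, textual]), y)

    # Late fusion: learn per modality, then combine the scores.
    clf_v = LogisticRegression(max_iter=1000).fit(visual, y)
    clf_t = LogisticRegression(max_iter=1000).fit(textual, y)

    def late_fusion_score(v, t):
        return 0.5 * (clf_v.predict_proba(v)[:, 1] + clf_t.predict_proba(t)[:, 1])

    print(early.predict_proba(np.hstack([visual, textual]))[:3, 1])
    print(late_fusion_score(visual, textual)[:3])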
 | Cees G M Snoek, Marcel Worring, Jan van Gemert, Jan-Mark Geusebroek, Dennis Koelma, Giang P Nguyen, Ork de Rooij, Frank Seinstra: MediaMill: Exploring News Video Archives based on Learned Semantics. In: MM, pp. 225–226, Singapore, 2005, (Best technical demonstration award). @inproceedings{SnoekACM05b,
title = {MediaMill: Exploring News Video Archives based on Learned Semantics},
author = {Cees G M Snoek and Marcel Worring and Jan van Gemert and Jan-Mark Geusebroek and Dennis Koelma and Giang P Nguyen and Ork de Rooij and Frank Seinstra},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-demo-acm2005.pdf},
year = {2005},
date = {2005-11-01},
booktitle = {MM},
pages = {225--226},
address = {Singapore},
abstract = {In this technical demonstration we showcase the MediaMill system. A search engine that facilitates access to news video archives at a semantic level. The core of the system is an unprecedented lexicon of 100 automatically detected semantic concepts. Based on this lexicon we demonstrate how users can obtain highly relevant retrieval results using query-by-concept. In addition, we show how the lexicon of concepts can be exploited for novel applications using advanced semantic visualizations. Several aspects of the MediaMill system are evaluated as part of our TRECVID 2005 efforts.},
note = {Best technical demonstration award},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this technical demonstration we showcase the MediaMill system, a search engine that facilitates access to news video archives at a semantic level. The core of the system is an unprecedented lexicon of 100 automatically detected semantic concepts. Based on this lexicon we demonstrate how users can obtain highly relevant retrieval results using query-by-concept. In addition, we show how the lexicon of concepts can be exploited for novel applications using advanced semantic visualizations. Several aspects of the MediaMill system are evaluated as part of our TRECVID 2005 efforts. |
 | Cees G M Snoek, Jan C van Gemert, Jan-Mark Geusebroek, Bouke Huurnink, Dennis C Koelma, Giang P Nguyen, Ork de Rooij, Frank J Seinstra, Arnold W M Smeulders, Cor J Veenman, Marcel Worring: The MediaMill TRECVID 2005 Semantic Video Search Engine. In: TRECVID, Gaithersburg, USA, 2005. @inproceedings{SnoekTRECVID05,
title = {The MediaMill TRECVID 2005 Semantic Video Search Engine},
author = {Cees G M Snoek and Jan C van Gemert and Jan-Mark Geusebroek and Bouke Huurnink and Dennis C Koelma and Giang P Nguyen and Ork de Rooij and Frank J Seinstra and Arnold W M Smeulders and Cor J Veenman and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/UvA-MM_TRECVID2005.pdf},
year = {2005},
date = {2005-11-01},
booktitle = {TRECVID},
address = {Gaithersburg, USA},
abstract = {In this paper we describe our TRECVID 2005 experiments. The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A_CAM) we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information. Experiments indicate that average precision results increase drastically, especially for pan (+51%) and tilt (+28%). For concept detection we propose a generic approach using our semantic pathfinder. Most important novelty compared to last years system is the improved visual analysis using proto-concepts based on Wiccest features. In addition, the path selection mechanism was extended. Based on the semantic pathfinder architecture we are currently able to detect an unprecedented lexicon of 101 semantic concepts in a generic fashion. We performed a large set of experiments (runid: B_vA). The results show that an optimal strategy for generic multimedia analysis is one that learns from the training set on a per-concept basis which tactic to follow. Experiments also indicate that our visual analysis approach is highly promising. The lexicon of 101 semantic concepts forms the basis for our search experiments (runid: B_2_A-MM). We participated in automatic, manual (using only visual information), and interactive search. The lexicon-driven retrieval paradigm aids substantially in all search tasks. When coupled with interaction, exploiting several novel browsing schemes of our semantic video search engine, results are excellent. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants. We exploited the technology developed for the above tasks to explore the BBC rushes. Most intriguing result is that from the lexicon of 101 visual-only models trained for news data 25 concepts perform reasonably well on BBC data also.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this paper we describe our TRECVID 2005 experiments. The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A_CAM) we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information. Experiments indicate that average precision results increase drastically, especially for pan (+51%) and tilt (+28%). For concept detection we propose a generic approach using our semantic pathfinder. The most important novelty compared to last year's system is the improved visual analysis using proto-concepts based on Wiccest features. In addition, the path selection mechanism was extended. Based on the semantic pathfinder architecture we are currently able to detect an unprecedented lexicon of 101 semantic concepts in a generic fashion. We performed a large set of experiments (runid: B_vA). The results show that an optimal strategy for generic multimedia analysis is one that learns from the training set on a per-concept basis which tactic to follow. Experiments also indicate that our visual analysis approach is highly promising. The lexicon of 101 semantic concepts forms the basis for our search experiments (runid: B_2_A-MM). We participated in automatic, manual (using only visual information), and interactive search. The lexicon-driven retrieval paradigm aids substantially in all search tasks. When coupled with interaction, exploiting several novel browsing schemes of our semantic video search engine, results are excellent. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants. We exploited the technology developed for the above tasks to explore the BBC rushes. The most intriguing result is that 25 concepts from the lexicon of 101 visual-only models trained on news data also perform reasonably well on BBC data. |
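The per-concept selection of an analysis tactic described in this abstract can be sketched as a simple model-selection step: for every concept, keep the analysis path that scores highest on a held-out validation set. The path names and average-precision values below are illustrative assumptions, not the semantic pathfinder itself.

# Minimal sketch, assuming per-path validation average precision is available:
# pick, for each concept, the analysis path with the highest validation score.
def select_best_path(validation_ap):
    """validation_ap: dict concept -> {path name: validation average precision}."""
    return {concept: max(paths, key=paths.get) for concept, paths in validation_ap.items()}

validation_ap = {
    "anchor":   {"content": 0.62, "style": 0.71, "context": 0.68},
    "aircraft": {"content": 0.45, "style": 0.31, "context": 0.38},
}
print(select_best_path(validation_ap))  # {'anchor': 'style', 'aircraft': 'content'}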
 | Cees G M Snoek: The Authoring Metaphor to Machine Understanding of Multimedia. Universiteit van Amsterdam, 2005, ISBN: 9057761432. @phdthesis{SnoekPHD05,
title = {The Authoring Metaphor to Machine Understanding of Multimedia},
author = {Cees G M Snoek},
url = {https://pure.uva.nl/ws/files/3927507/38070_snoek_thesis.pdf},
isbn = {9057761432},
year = {2005},
date = {2005-10-01},
school = {Universiteit van Amsterdam},
abstract = {This thesis makes a contribution to the field of multimedia understanding. Our ultimate aim is to structure the digital multimedia chaos by bridging the semantic gap between computable data features on one end and the semantic interpretation of the data by a user on the other end. We distinguish between produced and non-produced multimedia or video documents. We depart from the view that a produced video is the result of an authoring-driven production process. This authoring process serves as a metaphor for machine-driven understanding. We present a step-by-step extrapolation of this authoring metaphor for automatic multimedia understanding. While doing so, we cover in this thesis an extensive overview of the field, a theoretical foundation for authoring-driven multimedia understanding, state-of-the-art benchmark validation, and practical semantic video retrieval applications.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
This thesis makes a contribution to the field of multimedia understanding. Our ultimate aim is to structure the digital multimedia chaos by bridging the semantic gap between computable data features on one end and the semantic interpretation of the data by a user on the other end. We distinguish between produced and non-produced multimedia or video documents. We depart from the view that a produced video is the result of an authoring-driven production process. This authoring process serves as a metaphor for machine-driven understanding. We present a step-by-step extrapolation of this authoring metaphor for automatic multimedia understanding. While doing so, we cover in this thesis an extensive overview of the field, a theoretical foundation for authoring-driven multimedia understanding, state-of-the-art benchmark validation, and practical semantic video retrieval applications. |
 | Cees G M Snoek, Marcel Worring: Multimedia Event-Based Video Indexing using Time Intervals. In: IEEE Transactions on Multimedia, vol. 7, no. 4, pp. 638–647, 2005. @article{SnoekTMM05,
title = {Multimedia Event-Based Video Indexing using Time Intervals},
author = {Cees G M Snoek and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-time-mm.pdf},
year = {2005},
date = {2005-08-01},
journal = {IEEE Transactions on Multimedia},
volume = {7},
number = {4},
pages = {638--647},
abstract = {We propose the Time Interval Multimedia Event (TIME) framework as a robust approach for classification of semantic events in multimodal video documents. The representation used in TIME extends the Allen time relations and allows for proper inclusion of context and synchronization of the heterogeneous information sources involved in multimodal video analysis. To demonstrate the viability of our approach, it was evaluated on the domains of soccer and news broadcasts. For automatic classification of semantic events, we compare three different machine learning techniques, i.e., the C4.5 decision tree, Maximum Entropy, and the Support Vector Machine. The results show that semantic video indexing significantly benefits from using the TIME framework.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
We propose the Time Interval Multimedia Event (TIME) framework as a robust approach for classification of semantic events in multimodal video documents. The representation used in TIME extends the Allen time relations and allows for proper inclusion of context and synchronization of the heterogeneous information sources involved in multimodal video analysis. To demonstrate the viability of our approach, it was evaluated on the domains of soccer and news broadcasts. For automatic classification of semantic events, we compare three different machine learning techniques, i.e., the C4.5 decision tree, Maximum Entropy, and the Support Vector Machine. The results show that semantic video indexing significantly benefits from using the TIME framework. |
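Since the TIME representation builds on Allen's time relations, a small sketch of those relations may help. Intervals are (start, end) pairs and the relation names follow Allen's taxonomy; this illustrates only the underlying interval calculus, not the TIME framework or its classifiers.

# Minimal sketch of Allen's interval relations for proper intervals
# (start < end). The relation returned is the one holding from a to b.
def allen_relation(a, b):
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    if b_start < a_start < b_end < a_end:
        return "overlapped-by"
    return "met-by" if a_start == b_end else "after"

print(allen_relation((0, 4), (4, 9)))   # meets
print(allen_relation((2, 6), (0, 10)))  # during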
 | Cees G M Snoek, Marcel Worring, Jan-Mark Geusebroek, Dennis C Koelma, Frank J Seinstra: On the Surplus Value of Semantic Video Analysis Beyond the Key Frame. In: ICME, Amsterdam, The Netherlands, 2005. @inproceedings{SnoekICME05a,
title = {On the Surplus Value of Semantic Video Analysis Beyond the Key Frame},
author = {Cees G M Snoek and Marcel Worring and Jan-Mark Geusebroek and Dennis C Koelma and Frank J Seinstra},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-surplus-icme2005.pdf},
year = {2005},
date = {2005-07-01},
booktitle = {ICME},
address = {Amsterdam, The Netherlands},
abstract = {Typical semantic video analysis methods aim for classification of camera shots based on features extracted from a single key frame only. In this paper, we sketch a video analysis scenario and evaluate the benefit of analysis beyond the key frame for semantic concept detection performance. We developed detectors for a lexicon of 26 concepts, and evaluated their performance on 120 hours of video data. Results show that, on average, detection performance can increase by almost 40% when the analysis method takes more visual content into account.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Typical semantic video analysis methods aim for classification of camera shots based on features extracted from a single key frame only. In this paper, we sketch a video analysis scenario and evaluate the benefit of analysis beyond the key frame for semantic concept detection performance. We developed detectors for a lexicon of 26 concepts, and evaluated their performance on 120 hours of video data. Results show that, on average, detection performance can increase by almost 40% when the analysis method takes more visual content into account. |
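The idea of analysis beyond the key frame can be sketched by aggregating detector scores over several frames of a shot rather than relying on the key frame alone. The aggregation rules and the scores below are assumptions for illustration; the paper's exact analysis method is not reproduced here.

# Minimal sketch: combine per-frame concept probabilities into one shot-level
# score. Mean and max are two plausible aggregation rules; values are made up.
def shot_score(frame_scores, strategy="mean"):
    if strategy == "max":
        return max(frame_scores)
    return sum(frame_scores) / len(frame_scores)

key_frame_only = [0.35]                  # score of the single key frame
all_i_frames = [0.35, 0.80, 0.72, 0.10]  # scores over the whole shot

print(shot_score(key_frame_only))        # 0.35
print(shot_score(all_i_frames))          # 0.4925 -- more visual evidence used
print(shot_score(all_i_frames, "max"))   # 0.8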
 | Cees G M Snoek, Dennis Koelma, Jeroen van Rest, Nellie Schipper, Frank J Seinstra, Andrew Thean, Marcel Worring: MediaMill: Searching Multimedia Archives based on Learned Semantics. In: ICME, Amsterdam, The Netherlands, 2005. @inproceedings{SnoekICME05b,
title = {MediaMill: Searching Multimedia Archives based on Learned Semantics},
author = {Cees G M Snoek and Dennis Koelma and Jeroen van Rest and Nellie Schipper and Frank J Seinstra and Andrew Thean and Marcel Worring},
url = {http://isis-data.science.uva.nl/cgmsnoek/pub/snoek-mediamill-icme2005.pdf},
year = {2005},
date = {2005-07-01},
booktitle = {ICME},
address = {Amsterdam, The Netherlands},
abstract = {Video is about to conquer the Internet. Real-time delivery of video content is technically possible to any desktop and mobile device, even with modest connections. The main problem hampering massive (re)usage of video content today is the lack of effective content-based tools that provide semantic access. In this contribution we discuss systems for both video analysis and video retrieval that facilitate semantic access to video sources. Both systems were evaluated in the 2004 TRECVID benchmark as top performers in their task.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Video is about to conquer the Internet. Real-time delivery of video content is technically possible to any desktop and mobile device, even with modest connections. The main problem hampering massive (re)usage of video content today is the lack of effective content-based tools that provide semantic access. In this contribution we discuss systems for both video analysis and video retrieval that facilitate semantic access to video sources. Both systems were evaluated in the 2004 TRECVID benchmark as top performers in their task. |