Our TRECVID 2008 workshop paper entitled “The MediaMill TRECVID 2008 Semantic Video Search Engine” is available online. In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors. To that end, our concept detection experiments emphasize in particular the role of sampling, the value of color invariant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. For retrieval, a robust but limited set of concept detectors means we need to rely on as many auxiliary information channels as possible. Therefore, our automatic search experiments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval results further, our interactive search experiments investigate the role of visualizing preview results for a certain browse dimension, and of active learning mechanisms that learn to solve complex search topics by analyzing user browsing behavior. The 2008 edition of the TRECVID benchmark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept detection and interactive search, and a runner-up ranking for automatic retrieval. Once again, we learned a lot during this year’s TRECVID campaign; we highlight the most important lessons at the end of the paper.
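The abstract only names codebook construction and kernel-based learning. For readers new to these terms, here is a minimal sketch of a bag-of-visual-words pipeline, assuming plain k-means to build the codebook, hard assignment of each local descriptor to its nearest codeword, and an exponential chi-square kernel whose `gamma` stands in for a tunable learning parameter. All function names are hypothetical; this is an illustration of the general technique, not the MediaMill implementation.

```python
import math
import random


def nearest(d, centers):
    """Index of the codeword closest to descriptor d (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(d, centers[j]))


def kmeans_codebook(descriptors, k, iterations=10, seed=0):
    """Build a k-word visual codebook with plain (Lloyd-style) k-means."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, centers)].append(d)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(descriptors, codebook):
    """Hard-assign each descriptor to a codeword; return normalized counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel between two histograms (gamma is a free parameter)."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * dist)


# Toy usage: four 2-D "descriptors" forming two obvious groups.
descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
codebook = kmeans_codebook(descriptors, k=2)
print(sorted(bow_histogram(descriptors, codebook)))
```

In a real system the descriptors would be high-dimensional and number in the thousands per image; the codebook size `k`, the descriptors sampled for k-means, and `gamma` are roughly the kinds of knobs whose influence such experiments examine.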


Two of my colleagues have just released, independently, software for computing color descriptors. Jan-Mark Geusebroek released Color Sift, based on a recently accepted CVIU paper, and Koen van de Sande released the ColorDescriptor software, related to our 2008 CVPR and CIVR papers. These color descriptors have proven to be highly effective under many circumstances. See also our performance in the TRECVID 2008 video retrieval benchmark, and our winning position in the PASCAL VOC 2008 object classification competition. This software might be useful for many people, amongst others those working on visual concept detection.

Back home after a week of multimedia. In my opinion, the prevalent themes of ACM Multimedia 2008 and the associated conference on multimedia information retrieval (MIR) were social multimedia retrieval (with a lot of Flickr and geo-tag related work) and affective multimedia retrieval (with a lot of music/video analysis using valence/arousal curves). Overall, the quality of the conference was quite high; it was certainly worth the visit. I especially enjoyed the interaction with friends, the demo sessions, and the keynotes at MIR by Tom Huang, Ramesh Jain, and Shih-Fu Chang. Not surprisingly, they all seemed to agree that the semantic gap is still the major scientific problem to solve. Interestingly, however, all three believe the solution might come from users interacting with multimedia on the Internet. I fully subscribe to their conclusion.

This week I am attending the ACM Multimedia conference in Vancouver. The ‘city of glass’ is proclaimed to be one of the best places to live in the world. This could be due to the many possibilities for outdoor activities in the direct vicinity of the city. One such activity is hiking, which we did today. Together with Xirong, Marcel, and Bouke we went to Lake Garibaldi. It took us some time to get there, but it was certainly worthwhile. The view was magnificent. Luckily the conference only starts tomorrow, so we have some time to recover. I wonder whether there will be interesting developments in the multimedia field to report on in the coming days.


The University of Amsterdam is on the lookout for a professor specializing in historic Dutch multimedia, that is, someone who studies the Dutch media developments in radio, television, and quite likely the Internet over the past century. The vacancy is supported by the Netherlands Institute for Sound and Vision, which, not coincidentally, is also a close collaborator of our group. I wonder if our semantic video search engine would be of any use to the new professor and her/his team. To be continued?

The ACM MIR 2008 paper entitled “Learning Tag Relevance by Neighbor Voting for Social Image Retrieval” by Xirong Li, Cees Snoek, and Marcel Worring is available online. Social image retrieval is important for exploiting the increasing amounts of amateur-tagged multimedia such as Flickr images. Since amateur tagging is known to be uncontrolled, ambiguous, and personalized, a fundamental problem is how to reliably interpret the relevance of a tag with respect to the visual content it is describing. Intuitively, if different persons label similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose a novel algorithm that scalably and reliably learns tag relevance by accumulating votes from visually similar neighbors. Further, by treating the learned tag relevance as tag frequency, we embed it seamlessly into current tag-based social image retrieval paradigms. Preliminary experiments on one million Flickr images demonstrate the potential of the proposed algorithm. Overall comparisons for both single-word queries and multiple-word queries show substantial improvement over the baseline by learning and using tag relevance. Specifically, compared with the baseline using the original tags, retrieval using the improved tags increases mean average precision on average by 24%, from 0.54 to 0.67. Moreover, simulated experiments indicate that performance can be improved further by scaling up the number of images used in the proposed neighbor voting algorithm.
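To make the voting intuition concrete, here is a minimal sketch, assuming brute-force Euclidean nearest neighbors over some visual feature vector and a simple prior that subtracts the votes a tag would get from k randomly drawn images (so frequent tags are not favored merely for being frequent). The function names, the toy data, and this exact scoring form are illustrative assumptions, not a reproduction of the paper's implementation. Ranking images for a query tag by the learned relevance mirrors the idea of using relevance in place of tag frequency.

```python
import math
from collections import Counter


def tag_relevance(image_id, features, tags, k):
    """Neighbor-voting relevance of each tag on one image (sketch).

    features: image id -> visual feature vector
    tags:     image id -> set of user-provided tags
    """
    query = features[image_id]
    others = [i for i in features if i != image_id]
    # k visually nearest neighbors by brute-force Euclidean distance (assumption)
    neighbors = sorted(others, key=lambda i: math.dist(query, features[i]))[:k]
    # Each neighbor casts one vote for every tag it carries
    votes = Counter(t for n in neighbors for t in tags[n])
    # Prior: expected votes from k randomly drawn images (assumption)
    images_per_tag = Counter(t for ts in tags.values() for t in ts)
    n_images = len(tags)
    return {t: votes[t] - k * images_per_tag[t] / n_images for t in tags[image_id]}


def rank_by_tag(query_tag, features, tags, k):
    """Rank images labeled with query_tag by learned relevance, highest first."""
    scored = [
        (tag_relevance(img, features, tags, k)[query_tag], img)
        for img, ts in tags.items()
        if query_tag in ts
    ]
    return [img for _, img in sorted(scored, reverse=True)]


# Toy data: a, b, c look alike; d and e look alike; e is (wrongly) tagged "beach".
features = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [0.0, 0.1],
            "d": [5.0, 5.0], "e": [5.1, 5.0]}
tags = {"a": {"beach"}, "b": {"beach"}, "c": {"beach", "dog"},
        "d": {"city"}, "e": {"city", "beach"}}
print(rank_by_tag("beach", features, tags, k=2))  # "e" ends up last
```

The brute-force neighbor search is only for readability; scaling to a million images would require an efficient (approximate) nearest-neighbor index.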

15. August 2008 · Categories: Science

Google (and YouTube, for that matter) searches video using closed captions, user-provided tags, or text embedded in web pages. Based on visual evidence obtained from the Google TechTalks by Alex Hauptmann and John Smith, I have reason to believe that concept-based video retrieval, i.e. searching based on the visual content, is receiving more and more attention at Google. The video frame on the left shows the audience at Alex’s TechTalk on March 1st, 2006, while the frame on the right shows the audience at John’s TechTalk on October 25, 2007. Note the increase in audience. I wonder how many people will attend the next Google TechTalk on this topic ;)

Video Search at Google

Concept-based video search receives more and more attention at Google. Left: the number of attendees at a talk on the topic in early 2006; right: the number of attendees at a talk in late 2007.

NWO

Good news: my research proposal entitled “SEARCHER: Substituting Experts by Amateurs for Concept-based Video Retrieval” was awarded a prestigious VENI grant within the Innovational Research Incentives Scheme of the Netherlands Organisation for Scientific Research (NWO). See the official press release. I will be working on this project for at least the next three years :)

14. July 2008 · Categories: Science

VideOlympics 2008

We had another VideOlympics at CIVR 2008; check the pictures at Flickr. Credits for the pictures go to Robin Aly and Simon Laing, and to Ork de Rooij for the post-processing. Note the involvement of the audience and the guys playing (i)Bingo on the left ;)

The paper “Balancing Thread Based Navigation for Targeted Video Search” by Ork de Rooij, Cees Snoek, and Marcel Worring is now available online. Various query methods for video search exist. Because of the semantic gap, each method has its limitations. We argue that for effective retrieval, query methods need to be combined at retrieval time. However, switching query methods often involves a change in query and browsing interface, which puts a heavy burden on the user. In this paper, we propose a novel method for fast and effective search through large video collections by embedding multiple query methods into a single browsing environment. To that end, we introduce the notion of query threads, which contain a shot-based ranking of the video collection according to some feature-based similarity measure. On top of these threads we define several thread-based visualizations, ranging from fast targeted search to very broad exploratory search, with the ForkBrowser as the balance between fast search and video space exploration. We compare the effectiveness and efficiency of the ForkBrowser with the CrossBrowser on the TRECVID 2007 interactive search task. Results show that different query methods are needed for different types of search topics, and that the ForkBrowser requires significantly fewer user interactions to achieve the same result as the CrossBrowser. In addition, both browsers rank among the best interactive retrieval systems currently available.
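A query thread, as described above, is a shot-based ranking of the collection under one feature-based similarity measure. The sketch below illustrates that idea, assuming each shot has one vector per feature and that cosine similarity is the measure; the shot identifiers, feature names, and the `fork_view` helper (which simply takes the top of one thread per feature) are hypothetical and do not reproduce the ForkBrowser's actual layout or code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def query_thread(query_shot, shots, feature):
    """Rank all other shots by similarity to query_shot in a single feature space."""
    q = shots[query_shot][feature]
    others = [s for s in shots if s != query_shot]
    return sorted(others, key=lambda s: cosine(q, shots[s][feature]), reverse=True)


def fork_view(query_shot, shots, features, depth):
    """Hypothetical helper: the top `depth` shots of one thread per feature."""
    return {f: query_thread(query_shot, shots, f)[:depth] for f in features}


# Toy collection: each shot has a (made-up) color vector and text vector.
shots = {
    "s1": {"color": [1.0, 0.0], "text": [0.0, 1.0]},
    "s2": {"color": [0.9, 0.1], "text": [1.0, 0.0]},
    "s3": {"color": [0.0, 1.0], "text": [0.1, 0.9]},
}
print(fork_view("s1", shots, ["color", "text"], depth=1))
```

In this toy collection the color thread and the text thread starting from the same shot put different shots first, which is a small-scale illustration of why combining query methods at retrieval time can help.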