
More media attention for our research! This time in the form of a report on next-generation video search in the science-oriented magazine Natuurwetenschap & Techniek / Veen Magazines. Reporter Desiree Hoving managed to summarize the highlights, challenges, and limitations of video retrieval research in a very readable article, covering a lot of the research performed at the University of Amsterdam. For those of you who are familiar with the Dutch language, the report is available online here.


Validating the Detection of Everyday Concepts in Visual Lifelogs

The paper entitled “Validating the Detection of Everyday Concepts in Visual Lifelogs” by Daragh Byrne, Aiden R. Doherty, Cees G. M. Snoek, Gareth J. F. Jones, and Alan F. Smeaton, which will appear at the forthcoming SAMT 2008 conference, is available online now. It reports on experiments performed on Microsoft SenseCam data. The Microsoft SenseCam is a small, lightweight wearable camera used to passively capture photos and other sensor readings from a user’s day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer’s life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, to the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so, it determines the probability of a concept’s presence. We apply detectors for 27 everyday semantic concepts to a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision of detection for each semantic concept and to draw some interesting inferences about the lifestyles of those 5 users. We additionally present future applications of concept detection within the domain of lifelogging.
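To make the two ingredients of this post concrete (a detector that turns low-level features into the probability of a concept’s presence, and precision as the evaluation measure), here is a minimal sketch. It assumes a simple logistic model over a feature vector with hypothetical, hand-picked weights; this is an illustration of the idea, not the detectors used in the paper, which were trained with supervised learning on real SenseCam features. The function names, the 0.5 threshold, and the toy data are all hypothetical.

```python
import math
from typing import Sequence


def concept_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Map a low-level feature vector to P(concept present) with a logistic model.

    Hypothetical stand-in for a trained concept detector: a weighted sum of the
    features plus a bias, squashed into [0, 1] by the logistic function.
    """
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def precision(predicted: Sequence[bool], relevant: Sequence[bool]) -> float:
    """Fraction of images flagged as containing the concept that truly contain it."""
    flagged = [r for p, r in zip(predicted, relevant) if p]
    return sum(flagged) / len(flagged) if flagged else 0.0


if __name__ == "__main__":
    # Hypothetical 2-D features and ground-truth judgements for four lifelog images.
    weights, bias = [2.0, -1.0], 0.0
    images = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.5, 0.5]]
    truth = [True, False, True, False]
    probs = [concept_probability(x, weights, bias) for x in images]
    flagged = [p >= 0.5 for p in probs]  # threshold the probabilities (assumed 0.5)
    print(precision(flagged, truth))  # prints 0.6666666666666666
```

Three of the four toy images are flagged, two of them correctly, so precision is 2/3. Evaluating per concept on a judged subset, as in the paper, amounts to repeating this computation for each of the 27 concepts.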


After 9 years of research on broadcast video analysis and retrieval, I ‘finally’ made my debut on national TV. Yesterday, VARA Nieuwslicht had an item on using human cycles, in the form of computer games and socially tagged content, to help machines understand multimedia. This is indeed the topic of my granted VENI proposal. It was funny to see that the crew spent 2 hours in my office last week, but in the end only 1 minute made it on air. Unfortunately, the demo of our video search engine for the Dutch television archive Beeld en Geluid was not broadcast; maybe next time?

Our TRECVID 2008 workshop paper entitled “The MediaMill TRECVID 2008 Semantic Video Search Engine” is available online. In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors. To that end, our concept detection experiments emphasize in particular the role of sampling, the value of color invariant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. For retrieval, a robust but limited set of concept detectors makes it necessary to rely on as many auxiliary information channels as possible. Therefore, our automatic search experiments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval results further, our interactive search experiments investigate the role of visualizing preview results for a certain browse dimension, and active learning mechanisms that learn to solve complex search topics by analyzing user browsing behavior. The 2008 edition of the TRECVID benchmark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept detection and interactive search, and a runner-up ranking for automatic retrieval. Again, a lot has been learned during this year’s TRECVID campaign; we highlight the most important lessons at the end of the paper.
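For readers wondering what “codebook construction” refers to: in a bag-of-visual-words pipeline, each local descriptor extracted from a keyframe is mapped to an entry of a visual codebook, and the frame is summarized as a histogram over codewords, which then feeds the kernel-based learner. The sketch below assumes the simplest variant, hard assignment to the nearest codeword under squared Euclidean distance, with a hypothetical toy codebook. In practice the codebook is typically learned by clustering many descriptors (e.g. with k-means), and soft-assignment variants exist; this is an illustration, not the MediaMill implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Hard-assign each local descriptor to a codeword and L1-normalize the counts."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1.0
    total = sum(counts)
    # An image without descriptors keeps an all-zero histogram.
    return [c / total for c in counts] if total > 0 else counts


if __name__ == "__main__":
    # Hypothetical 2-D codebook and local descriptors from one keyframe.
    codebook = [[0.0, 0.0], [10.0, 10.0]]
    descriptors = [[1.0, 1.0], [9.0, 9.0], [0.0, 2.0], [11.0, 10.0]]
    print(bow_histogram(descriptors, codebook))  # prints [0.5, 0.5]
```

Normalizing the histogram makes frames with different numbers of detected points comparable, which matters because sampling strategy (another factor studied in the paper) changes how many descriptors each frame yields.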


Two of my colleagues have just released, independently, software for computing color descriptors. Jan-Mark Geusebroek released Color Sift, based on a recently accepted CVIU paper, and Koen van de Sande released the ColorDescriptor software, related to our 2008 CVPR and CIVR papers. These color descriptors have proven to be highly effective under many circumstances; see also our performance in the TRECVID 2008 video retrieval benchmark, and our winning position in the PASCAL VOC 2008 object classification competition. This software might be useful for many people, including those working on visual concept detection.
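As a small taste of what “color invariant” means in this context, one common building block of color descriptors is the opponent color space, which separates intensity from two chromatic channels. The sketch below only shows this per-pixel transform under the standard opponent definitions; it is not code from either released package, which compute full SIFT-style descriptors on top of such color spaces. The function name is hypothetical.

```python
import math
from typing import Tuple


def rgb_to_opponent(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one RGB pixel to opponent color space (O1, O2, O3).

    O1 and O2 carry color information; O3 carries intensity.
    """
    o1 = (r - g) / math.sqrt(2)
    o2 = (r + g - 2 * b) / math.sqrt(6)
    o3 = (r + g + b) / math.sqrt(3)
    return o1, o2, o3


if __name__ == "__main__":
    # Adding the same offset to all channels (a uniform light intensity shift)
    # leaves the chromatic channels O1 and O2 unchanged; only O3 moves.
    base = rgb_to_opponent(0.6, 0.3, 0.1)
    shifted = rgb_to_opponent(0.6 + 0.2, 0.3 + 0.2, 0.1 + 0.2)
    print(base)
    print(shifted)
```

The invariance follows directly from the definitions: an offset c cancels in R − G and in R + G − 2B, while it adds 3c/√3 to O3. Descriptors built only on O1 and O2 are therefore insensitive to such shifts, which is one reason color features can be more robust than plain intensity features.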

Back home after a week of multimedia. In my opinion, the prevalent themes of ACM Multimedia 2008 and the associated conference on multimedia information retrieval (MIR) were social multimedia retrieval (with a lot of Flickr and geo-tag related work) and affective multimedia retrieval (with a lot of music and video analysis using valence/arousal curves). Overall, the quality of the conference was quite high; it was certainly a worthwhile visit. I especially enjoyed the interaction with friends, the demo sessions, and the keynotes at MIR by Tom Huang, Ramesh Jain, and Shih-Fu Chang. Not surprisingly, they all seemed to agree that the semantic gap is still the major scientific problem to solve. Interestingly, however, all three believe the solution might come from users interacting with multimedia on the Internet. I fully subscribe to their conclusion.