
The forthcoming IEEE Transactions on Multimedia paper by Xirong Li, Cees Snoek, and Marcel Worring, entitled “Learning Social Tag Relevance by Neighbor Voting”, is available online now. Social image analysis and retrieval is important for helping people organize and access the increasing amount of user-tagged multimedia. Since user tagging is known to be uncontrolled, ambiguous, and overly personalized, a fundamental problem is how to interpret the relevance of a user-contributed tag with respect to the visual content the tag is describing. Intuitively, if different persons label visually similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose in this paper a neighbor voting algorithm which accurately and efficiently learns tag relevance by accumulating votes from visual neighbors. Under a set of well-defined and realistic assumptions, we prove that our algorithm is a good tag relevance measurement for both image ranking and tag ranking. Three experiments on 3.5 million Flickr photos demonstrate the general applicability of our algorithm in both social image retrieval and image tag suggestion. Our tag relevance learning algorithm substantially improves upon baselines for all the experiments. The results suggest that the proposed algorithm is promising for real-world applications.
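To make the voting intuition concrete, here is a minimal Python sketch on toy data. It is an illustration, not the implementation from the paper: the 2-D feature vectors, the Euclidean distance, the function names, and the subtraction of a tag-frequency prior are all assumptions made for this example.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbor_votes(query_feature, database, k):
    """Count how many of the k nearest visual neighbors carry each tag.

    `database` is a list of (feature_vector, set_of_tags) pairs
    (a hypothetical layout chosen for this sketch).
    """
    neighbors = sorted(database, key=lambda item: euclidean(query_feature, item[0]))[:k]
    votes = Counter()
    for _, tags in neighbors:
        votes.update(tags)  # each neighbor votes once per tag it carries
    return votes


def tag_relevance(query_feature, query_tags, database, k):
    """Score each tag of the query image by neighbor votes.

    Assumption: we subtract the number of votes expected from k randomly
    drawn images, so that globally frequent tags are not favored.
    """
    votes = neighbor_votes(query_feature, database, k)
    n = len(database)
    tag_freq = Counter(tag for _, tags in database for tag in tags)
    return {tag: votes[tag] - k * tag_freq[tag] / n for tag in query_tags}


# Toy collection: three beach-like images and three city-like images.
database = [
    ((0.0, 0.0), {"beach", "sea"}),
    ((0.1, 0.0), {"beach", "holiday"}),
    ((0.0, 0.1), {"beach", "sea"}),
    ((5.0, 5.0), {"city", "holiday"}),
    ((5.1, 5.0), {"city"}),
    ((5.0, 5.1), {"city", "night"}),
]

votes = neighbor_votes((0.05, 0.05), database, k=3)
scores = tag_relevance((0.05, 0.05), {"beach", "holiday"}, database, k=3)
```

In this toy setting the tag "beach" collects votes from all three visual neighbors, whereas "holiday" receives only one, so "beach" ends up with the higher score for this image.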

Today I gave a talk at our neighboring institute: CWI, the national research center for mathematics and computer science in the Netherlands. It was my first talk on a new topic, coined for the moment as socio-video search, where we aim to connect machine-based multimedia tagging with user tagging of multimedia. In particular, I highlighted recent papers by Arjan Setz, Xirong Li, Daragh Byrne, and Aiden Doherty. I would like to thank my CWI colleagues for having me, and also for the lively discussion on the content, which I take as a positive sign. For those who are interested, the slides are available here. More comments welcome!


The paper entitled “Annotating Images by Harnessing Worldwide User-Tagged Photos” by Xirong Li, Cees Snoek, and Marcel Worring, which will appear in the proceedings of the forthcoming ICASSP2009 conference, is available online now. Automatic image tagging is important yet challenging due to the semantic gap and the lack of learning examples to model a tag’s visual diversity. Meanwhile, social user tagging is creating rich multimedia content on the web. In this paper, we propose to combine the two tagging approaches in a search-based framework. For an unlabeled image, we first retrieve its visual neighbors from a large user-tagged image database. We then select relevant tags from the result images to annotate the unlabeled image. To tackle the unreliability and sparsity of user tagging, we introduce a joint-modality tag relevance estimation method which efficiently addresses both textual and visual clues. Experiments on 1.5 million Flickr photos and 10 000 Corel images verify the proposed method.
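The search-based framework can be sketched in a few lines of Python. This is only an illustration of the retrieve-then-select idea: it ranks candidate tags by plain vote counts and does not implement the joint-modality tag relevance estimation described in the paper. The feature vectors, function names, and toy data are assumptions.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance; sufficient for ranking neighbors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def suggest_tags(query_feature, tagged_images, k=3, max_tags=2):
    """Annotate an unlabeled image with tags borrowed from its visual neighbors.

    `tagged_images` is a list of (feature_vector, set_of_tags) pairs
    (a hypothetical layout chosen for this sketch).
    """
    # Step 1: retrieve the k visually nearest user-tagged images.
    neighbors = sorted(
        tagged_images, key=lambda img: squared_distance(query_feature, img[0])
    )[:k]
    # Step 2: count how often each tag occurs among those neighbors.
    counts = Counter(tag for _, tags in neighbors for tag in tags)
    # Step 3: keep the most frequent tags; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:max_tags]]


tagged_images = [
    ((0.0, 0.0), {"dog", "grass"}),
    ((0.2, 0.1), {"dog", "park"}),
    ((0.1, 0.2), {"dog", "grass"}),
    ((9.0, 9.0), {"car", "street"}),
]

suggested = suggest_tags((0.1, 0.1), tagged_images, k=3, max_tags=2)
```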


More media attention for our research! This time in the form of a report on next-generation video search in the science-oriented magazine Natuurwetenschap & Techniek / Veen Magazines. Reporter Desiree Hoving managed to summarize the highlights, challenges, and limitations of video retrieval research in a very readable article, covering a lot of research performed at the University of Amsterdam. For those of you who are familiar with the Dutch language, the report is available online here.


Validating the Detection of Everyday Concepts in Visual Lifelogs

The paper entitled “Validating the Detection of Everyday Concepts in Visual Lifelogs” by Daragh Byrne, Aiden R. Doherty, Cees G. M. Snoek, Gareth J. F. Jones, and Alan F. Smeaton, which will appear in the forthcoming SAMT 2008 conference, is available online now. It reports on experiments performed on Microsoft SenseCam data. The Microsoft SenseCam is a small lightweight wearable camera used to passively capture photos and other sensor readings from a user’s day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer’s life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, on the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept’s presence. We apply detection of 27 everyday semantic concepts on a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision for detection of each semantic concept and to draw some interesting inferences on the lifestyles of those 5 users. We additionally present future applications of concept detection within the domain of lifelogging.
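As a rough illustration of what per-concept precision means here, the following Python sketch thresholds detector probabilities and compares them with manual judgments. The 0.5 threshold, the data layout, the function name, and the toy numbers are assumptions for this example and are not taken from the paper.

```python
def precision_per_concept(scores, labels, threshold=0.5):
    """Compute detection precision for each concept.

    `scores[concept]` holds detector probabilities per image and
    `labels[concept]` holds the matching manual judgments (True/False).
    Both layouts and the threshold are hypothetical.
    """
    result = {}
    for concept, probabilities in scores.items():
        truth = labels[concept]
        predicted = [p >= threshold for p in probabilities]
        true_pos = sum(1 for pred, t in zip(predicted, truth) if pred and t)
        false_pos = sum(1 for pred, t in zip(predicted, truth) if pred and not t)
        detected = true_pos + false_pos
        # Precision is undefined when the detector fires on no image.
        result[concept] = true_pos / detected if detected else None
    return result


# Toy judgments for four images and two concepts.
scores = {
    "indoors": [0.9, 0.8, 0.3, 0.6],
    "people": [0.2, 0.1, 0.4, 0.3],
}
labels = {
    "indoors": [True, True, False, False],
    "people": [False, True, False, False],
}

precision = precision_per_concept(scores, labels)
```

Here the "indoors" detector fires on three images, two of them correctly, giving a precision of 2/3, while the "people" detector never fires, so its precision is left undefined.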


After 9 years of research on broadcast video analysis and retrieval, I ‘finally’ made my debut on national TV. Yesterday, VARA Nieuwslicht had an item on using human cycles, in the form of computer games and socially tagged content, for helping machines understand multimedia. Indeed, this is the topic of my granted VENI proposal. It was funny to see that the crew spent 2 hours in my office last week, but in the end only 1 minute remained. Unfortunately, the demo of our video search engine for the Dutch television archive Beeld en Geluid was not broadcast. Maybe next time?

Our TRECVID 2008 workshop paper entitled “The MediaMill TRECVID 2008 Semantic Video Search Engine” is available online. In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors. To that end, our concept detection experiments emphasize in particular the role of sampling, the value of color invariant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. For retrieval, a robust but limited set of concept detectors makes it necessary to rely on as many auxiliary information channels as possible. Therefore, our automatic search experiments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval results further, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and active learning mechanisms that learn to solve complex search topics by analyzing user browsing behavior. The 2008 edition of the TRECVID benchmark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept detection and interactive search, and a runner-up ranking for automatic retrieval. Again a lot has been learned during this year’s TRECVID campaign; we highlight the most important lessons at the end of this paper.


Back home after a week of multimedia. In my opinion, the prevalent themes of ACM Multimedia 2008 and the associated conference on multimedia information retrieval (MIR) seemed to be social multimedia retrieval (with a lot of Flickr and geo-tag related work) and affective multimedia retrieval (with a lot of music/video analysis using valence/arousal curves). Overall, the quality of the conference was quite high, certainly a worthwhile visit. I especially enjoyed the interaction with friends, the demo sessions, and the keynotes at MIR by Tom Huang, Ramesh Jain, and Shih-Fu Chang. Not surprisingly, they all seemed to agree that the semantic gap is still the major scientific problem to solve. Interestingly, however, all three believe the solution might come from users interacting with multimedia on the Internet. I fully subscribe to their conclusion.

This week I am attending the ACM Multimedia conference in Vancouver. The ‘city of glass’ is proclaimed to be one of the best places to live in the world. This could be due to the many possibilities for outdoor activities in the direct vicinity of the city. One such activity is hiking, which we did today. Together with Xirong, Marcel, and Bouke, I went to Lake Garibaldi. It took us some time to get there, but it was certainly worthwhile. The view was magnificent. Luckily the conference starts tomorrow, so we have some time to recover. I wonder whether there will be some interesting developments in the multimedia field to report in the coming days.


The University of Amsterdam is on the lookout for a professor specializing in historic Dutch multimedia, that is, someone who studies Dutch media developments in radio, television, and quite likely the Internet, over the past century. The vacancy is supported by the Netherlands Institute for Sound and Vision, which, not coincidentally, is also a close collaborator of our group. I wonder if our semantic video search engine would be of any use for the new professor and her/his team. To be continued?