The Chinese Government Award for Outstanding Self-financed Students Abroad was awarded to Xirong Li.

Xirong's PhD thesis, entitled ‘Content-Based Visual Search Learned from Social Media’, reveals the value of socially tagged images for content-based visual search. To learn from social media, Xirong proposed algorithms which automatically determine whether a tag spontaneously assigned to a picture is factually relevant with respect to the visual content. By identifying relevant tags, he has found a way to transform noisy social data into numerous well-labelled examples. This leads to an intelligent search engine which can find unlabelled images on the Internet, a smart phone, or a laptop. The increasing availability of labelled examples also enables the search engine to answer more complex queries, e.g., finding images of horse riders on the beach. Xirong’s work opens up promising avenues for search engines that provide access to the semantics of unlabelled images, without the need for expert labelling. Xirong successfully defended his thesis on 9 March 2012 and is currently an Assistant Professor at Renmin University of China.

The Chinese Government Award for Outstanding Self-financed Students Abroad was founded by the Chinese government in 2003 with the purpose of rewarding academic excellence among self-financed Chinese students studying overseas. Only those with outstanding performance in their PhD studies are considered by the award selection committee. Each year, approximately 500 young Chinese talents worldwide are granted the award.

The ICMR2012 paper Fusing Concept Detection and Geo Context for Visual Search by Xirong Li, Cees Snoek, Marcel Worring and Arnold Smeulders is now available. Given the proliferation of geo-tagged images, the question of how to exploit geo tags and the underlying geo context for visual search is emerging. Based on the observation that the importance of geo context varies over concepts, we propose a concept-based image search engine which fuses visual concept detection and geo context in a concept-dependent manner. Compared to individual content-based and geo-based concept detectors and their uniform combination, concept-dependent fusion shows improvements. Moreover, since the proposed search engine is trained on socially tagged images alone, without the need for human interaction, it scales flexibly to many concepts. Search experiments on 101 popular visual concepts demonstrate the viability of the proposed solution. In particular, for 79 out of the 101 concepts, the learned weights yield improvements over the uniform weights, with a relative gain of at least 5% in terms of average precision.
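To make the idea of concept-dependent fusion concrete, here is a minimal sketch in Python. It is not the method from the paper: it assumes a simple linear late fusion of a visual score and a geo score, and it assumes the per-concept weight is picked by a grid search that maximizes average precision on tagged examples. The function names (`fuse`, `learn_weight`) and the toy data are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision of the ranking induced by scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def fuse(visual, geo, w):
    """Linear late fusion (an assumption): w weighs the visual detector, 1 - w the geo context."""
    return [w * v + (1.0 - w) * g for v, g in zip(visual, geo)]


def learn_weight(visual, geo, labels, steps=10):
    """Pick one weight for a single concept by grid search (an assumption).

    The uniform weight 0.5 is the starting point, so the result is never
    worse than uniform fusion on the examples used for the search.
    """
    best_w = 0.5
    best_ap = average_precision(fuse(visual, geo, best_w), labels)
    for i in range(steps + 1):
        w = i / steps
        ap = average_precision(fuse(visual, geo, w), labels)
        if ap > best_ap:
            best_w, best_ap = w, ap
    return best_w, best_ap


# Toy concept for which the geo context is informative and the visual score is not.
visual = [0.9, 0.1, 0.8, 0.2]
geo = [0.1, 0.9, 0.2, 0.8]
labels = [0, 1, 0, 1]
weight, ap = learn_weight(visual, geo, labels)
print(weight, ap)
```

Running `learn_weight` once per concept gives each concept its own weight, which is the sense in which the fusion is concept-dependent in this sketch.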

The forthcoming paper All Vehicles are Cars: Subclass Preferences in Container Concepts by Daan Vreeswijk, Koen van de Sande, Cees Snoek and Arnold Smeulders is now available. This paper investigates the natural bias humans display when labeling images with a container label like vehicle or carnivore. Using three container concepts as subtree root nodes, and all available concepts between these roots and the images from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset, we analyze the differences between the images labeled at these varying levels of abstraction and the union of their constituting leaf nodes. We find that for many container concepts, a strong preference for one or a few different constituting leaf nodes occurs. These results indicate that care is needed when using hierarchical knowledge in image classification: if the aim is to classify vehicles the way humans do, then cars and buses may be the only correct results. The paper will be presented at ICMR2012 in Hong Kong on June 6.

The forthcoming CVPR paper on Convex Reduction of High-Dimensional Kernels for Visual Classification by Efstratios Gavves, Cees Snoek and Arnold Smeulders is now available. Limiting factors of fast and effective classifiers for large sets of images are their dependence on the number of images analyzed and the dimensionality of the image representation. Considering the growing number of images as a given, we aim to reduce the image feature dimensionality in this paper. We propose reduced linear kernels that use only a portion of the dimensions to reconstruct a linear kernel. We formulate the search for these dimensions as a convex optimization problem, which can be solved efficiently. Unlike existing kernel reduction methods, our reduced kernels are faster and maintain the accuracy benefits from non-linear embedding methods that mimic non-linear SVMs. We show these properties on both the Scenes and PASCAL VOC 2007 datasets. In addition, we demonstrate how our reduced kernels allow Fisher vectors to be compressed for use with non-linear embeddings, leading to high accuracy. What is more, without using any labeled examples, the selected and weighted kernel dimensions appear to correspond to visually meaningful patches in the images.
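As a rough illustration of what a reduced linear kernel computes, the sketch below evaluates a kernel over a weighted subset of feature dimensions. It only shows the evaluation step. How the dimensions and weights are found (the convex optimization described above) is not reproduced here, and the function names and the weights in the example are made up for illustration.

```python
def full_linear_kernel(x, y):
    """Standard linear kernel: the inner product over all dimensions."""
    return sum(a * b for a, b in zip(x, y))


def reduced_linear_kernel(x, y, weights):
    """Linear kernel over selected dimensions only.

    `weights` maps a selected dimension index to its weight (hypothetical
    representation); dimensions that are absent are simply not used.
    """
    return sum(w * x[d] * y[d] for d, w in weights.items())


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
selected = {0: 1.0, 2: 0.5}  # illustrative weights, not learned
print(full_linear_kernel(x, y), reduced_linear_kernel(x, y, selected))
```

The cost of the reduced kernel grows with the number of selected dimensions rather than with the full feature dimensionality, which is where the speed-up in this sketch comes from.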

We are organizing a special session on Socio-Video Semantics at the forthcoming ACM International Conference on Multimedia Retrieval in Hong Kong.

Aims and Scope
All of a sudden video became social. In just five years, individual and mostly inactive consumers transformed into active and connected prosumers, revolutionaries even, who create, share, and comment on massive amounts of video artifacts all over the World Wide Web 2.0. Pronounced manifestations of social video on the Internet include industry initiatives like YouTube, Vimeo, Wikipedia, and Flickr, which manage to attract millions of users daily. It has been predicted that soon 91 percent of Internet data will be video, where smartphones will only accelerate the unstoppable momentum. In order to make sense of the massive amounts of video content, online social platforms rely on what other people say is in the video, which is known to be ambiguous, overly personalized, and limited. Hence, the lack of semantics currently associated with online video is seriously hampering retrieval, repurposing, and usage. In contrast to social video platforms, academic video sensemaking approaches rely on an analysis of the multimedia content. Such content-driven image search is important, if only to verify what people have said is factually in the video, or for (professional) archives which cannot be shared for crowdsourcing. Despite good progress, automated multimedia analysis of video content is still seriously hampered by the semantic gap, or the lack of correspondence between the low-level audiovisual features that machines extract from video and the high-level conceptual interpretations a human gives to multimedia data. For sensemaking, exploiting the social multimedia context of video has largely been ignored in the multimedia community. This special session provides a unique opportunity for high-quality papers connecting the social context of online video to video sensemaking.

Topics of Interest
Topics of interest include (but are not limited to):

Socio-video content analysis

  • Cross-modal (social / visual / audio) socio-video content analysis
  • Contextual models for socio-video analysis
  • Novel features for socio-video analysis
  • Complex event recognition in socio-videos
  • Socio-video copy detection
  • Content-aware ads optimization in socio-video sharing sites
  • Efficient learning and mining algorithms for scalable socio-video content analysis

 

Socio-video browsing and retrieval

  • Socio-video retrieval systems
  • Socio-video summarization
  • Recommender techniques for socio-video browsing
  • Mobile socio-video browsing and retrieval
  • User-centered interface and system design for socio-video browsing and retrieval

 

Socio-video benchmark construction and open-source software

  • Benchmark database construction for socio-video semantic analysis
  • Ontology construction for socio-video semantic analysis
  • Open-source software libraries for socio-video analysis

 

Paper Submission
All papers must be formatted according to the ACM conference style, cannot exceed 8 pages in 9 point font, and must be submitted as PDF files.

ACM ICMR 2012 follows double-blind review. Please make sure that the names and affiliations of the authors are excluded from the document. Also remember to avoid information that may identify the authors.

Either Microsoft Word or LaTeX can be used to prepare the manuscripts (but the final submission file should be in PDF format). The paper templates can be downloaded directly from the ACM ICMR 2012 website:
http://www.icmr2012.org/submission.html

Selected manuscripts will also be invited to a special issue of IEEE Transactions on Multimedia on the same topic.

Important Dates
— Paper submission deadline: January 15, 2012
— Notification of acceptance: March 15, 2012
— Camera-ready manuscript: April 5, 2012

Organizers
Cees G. M. Snoek, University of Amsterdam (Netherlands)
Yu-Gang Jiang, Fudan University (China)

The paper Visual Synonyms for Landmark Image Retrieval by Efstratios Gavves, Cees Snoek and Arnold Smeulders, which is to appear in Computer Vision and Image Understanding, is now available. In this paper, we address the incoherence problem of the visual words in bag-of-words vocabularies. Different from existing work, which assigns words based on closeness in descriptor space, we focus on identifying pairs of independent, distant words – the visual synonyms – that are likely to host image patches of similar visual reality. We focus on landmark images, where the image geometry guides the detection of synonym pairs. Image geometry is used to find those image features that lie at nearly the same physical location, yet are assigned to different words of the visual vocabulary. Defined in this way, we evaluate the validity of visual synonyms. We also examine the closeness of synonyms in the L2-normalized feature space. We show that visual synonyms may successfully be used for vocabulary reduction. Furthermore, we show that by combining the reduced visual vocabularies with synonym augmentation, we perform on par with the state-of-the-art bag-of-words approach, while having a 98% smaller vocabulary.

The paper Personalizing Automated Image Annotation using Cross-Entropy was presented today by Xirong Li et al. at ACM Multimedia 2011. In this paper it is observed that annotating the increasing amounts of user-contributed images in a personalized manner is in great demand. However, this demand is largely ignored by the mainstream of automated image annotation research. In this paper we aim to personalize automated image annotation by jointly exploiting personalized tag statistics and content-based image annotation. We propose a cross-entropy based learning algorithm which personalizes a generic annotation model by learning from a user’s multimedia tagging history. Using cross-entropy-minimization based Monte Carlo sampling, the proposed algorithm optimizes the personalization process in terms of a performance measurement which can be flexibly chosen. Automatic image annotation experiments with 5,315 realistic users in the social web show that the proposed method compares favorably to a generic image annotation method and a method using personalized tag statistics only. For 4,442 users the performance improves, where for 1,088 users the absolute performance gain is at least 0.05 in terms of average precision. The results show the value of the proposed method.
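For readers unfamiliar with cross-entropy based Monte Carlo optimization, the following Python sketch shows the generic technique on a single parameter. It is a textbook-style cross-entropy method under our own assumptions (Gaussian sampling, a fixed elite size) and not the algorithm from the paper. The `performance` callable stands in for a freely chosen performance measure such as average precision, and the toy objective at the bottom is purely illustrative.

```python
import random
import statistics


def cross_entropy_search(performance, mean=0.5, std=0.5,
                         samples=50, elite=10, iterations=30, seed=0):
    """Maximize performance(weight) with a basic cross-entropy method.

    Each round samples candidate weights from a Gaussian, keeps the
    best-scoring ones, and refits the Gaussian to those elite samples.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        candidates = [rng.gauss(mean, std) for _ in range(samples)]
        candidates.sort(key=performance, reverse=True)
        best = candidates[:elite]
        mean = statistics.mean(best)
        std = statistics.pstdev(best)
    return mean


def toy_performance(weight):
    """Hypothetical stand-in for an annotation quality measure; peaks at 0.3."""
    return -(weight - 0.3) ** 2


learned = cross_entropy_search(toy_performance)
print(round(learned, 3))
```

Because the search only needs to evaluate `performance`, the measure being optimized can be swapped without changing the procedure, which matches the flexibility mentioned above.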

This week, Barcelona hosts the IEEE International Conference on Computer Vision. Judging from the paper titles, the focus will be on learning to recognize objects in images.

The master's thesis by Jeroen Steggink entitled Adding Semantics to Image-Region Annotations with the Name-It-Game has now been published in the special issue of the Multimedia Systems journal on Interactive Multimedia Computing. In this paper we present the Name-It-Game, an interactive multimedia game fostering the swift creation of a large data set of region-based image annotations. Compared to existing annotation games, we consider an added semantic structure, by means of the WordNet ontology, the main innovation of the Name-It-Game. Using an ontology-powered game, instead of the more traditional annotation tools, potentially makes region-based image labeling more fun and accessible for every type of user. However, the current games often present the players with hard-to-guess objects. To prevent this from happening in the Name-It-Game, we successfully identify WordNet categories which filter out hard-to-guess objects. To verify the speed of the annotation process, we compare the online Name-It-Game with a desktop tool with similar features. Results show that the Name-It-Game outperforms this tool for semantic region-based image labeling. Lastly, we measure the accuracy of the produced segmentations and compare them with carefully created LabelMe segmentations. Judging from the quantitative and qualitative results, we believe the segmentations are competitive with those of LabelMe, especially when averaged over multiple games. By adding semantics to region-based image annotations, using the Name-It-Game, we have opened up an efficient means to provide precious labels in a playful manner.

The paper Crowdsourcing Visual Detectors for Video Search by Bauke Freiburg, Jaap Kamps, and Cees Snoek, which will appear in the forthcoming ACM Multimedia conference, is now available. In this paper, we study social tagging at the video fragment-level using a combination of automated content understanding and the wisdom of the crowds. We are interested in the question of whether crowdsourcing can be beneficial to a video search engine that automatically recognizes video fragments on a semantic level. To answer this question, we perform a 3-month online field study with a concert video search engine targeted at a dedicated user-community of pop concert enthusiasts. We harvest the feedback of more than 500 active users and perform two experiments. In experiment 1 we measure user incentive to provide feedback; in experiment 2 we determine the tradeoff between feedback quality and quantity when aggregated over multiple users. Results show that users provide sufficient feedback, which becomes highly reliable when a crowd agreement of 67% is enforced.
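One way to read the crowd agreement threshold is as a filter on aggregated votes per video fragment. The Python sketch below is our own assumption about such an aggregation, not the procedure used in the study: agreement is taken to be the share of votes that match the majority answer, and a fragment label is only accepted when that share reaches the threshold. The function name and vote format are hypothetical.

```python
from collections import Counter


def aggregate_feedback(votes, min_agreement=0.67):
    """Return the majority vote if enough users agree, otherwise None.

    `votes` is a list of user judgments for one fragment label, for example
    True (the detector was right) or False (the detector was wrong).
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None


print(aggregate_feedback([True, True, True, False]))  # 3 of 4 agree
print(aggregate_feedback([True, False]))              # split vote
```

Raising `min_agreement` keeps fewer fragments but makes the kept labels more trustworthy, which is the quality versus quantity tradeoff in miniature.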