Published April 25th, 2012

The forthcoming CVPR paper Convex Reduction of High-Dimensional Kernels for Visual Classification by Efstratios Gavves, Cees Snoek and Arnold Smeulders is now available. Fast and effective classifiers for large sets of images are limited by their dependence on the number of images analyzed and the dimensionality of the image representation. Considering the growing number of images as a given, we aim to reduce the image feature dimensionality in this paper. We propose reduced linear kernels that use only a portion of the dimensions to reconstruct a linear kernel. We formulate the search for these dimensions as a convex optimization problem, which can be solved efficiently. Different from existing kernel reduction methods, our reduced kernels are faster and maintain the accuracy benefits of non-linear embedding methods that mimic non-linear SVMs. We show these properties on both the Scenes and PASCAL VOC 2007 datasets. In addition, we demonstrate how our reduced kernels allow Fisher vectors to be compressed for use with non-linear embeddings, leading to high accuracy. What is more, without using any labeled examples, the selected and weighted kernel dimensions appear to correspond to visually meaningful patches in the images.
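To give a feel for what "reconstructing a linear kernel from a portion of the dimensions" could look like, here is a minimal sketch. It is not the formulation from the paper: the objective (squared reconstruction error of the kernel plus an L1 penalty on non-negative per-dimension weights), the projected gradient solver, and all function names are assumptions chosen for illustration only.

```python
def gram_of_dimensions(X):
    """Return H with H[d][e] = (x_d . x_e)^2, where x_d is column d of X."""
    n_dims = len(X[0])
    cols = [[row[d] for row in X] for d in range(n_dims)]
    return [[sum(a * b for a, b in zip(cols[d], cols[e])) ** 2
             for e in range(n_dims)] for d in range(n_dims)]


def objective(H, w, lam):
    """0.5 * ||K - K_w||_F^2 + lam * sum(w), expressed through H.

    K is the full linear kernel and K_w = sum_d w_d * x_d x_d^T, so
    K - K_w = sum_d (1 - w_d) * x_d x_d^T.
    """
    n_dims = len(w)
    r = [1.0 - wd for wd in w]
    quad = sum(r[d] * H[d][e] * r[e]
               for d in range(n_dims) for e in range(n_dims))
    return 0.5 * quad + lam * sum(w)


def reduced_kernel_weights(X, lam=0.1, n_iter=500):
    """Projected gradient descent on the (assumed) convex objective above."""
    H = gram_of_dimensions(X)
    n_dims = len(H)
    # trace(H) upper-bounds the largest eigenvalue of H, so this step is safe.
    step = 1.0 / sum(H[d][d] for d in range(n_dims))
    w = [0.0] * n_dims
    for _ in range(n_iter):
        grad = [lam - sum(H[d][e] * (1.0 - w[e]) for e in range(n_dims))
                for d in range(n_dims)]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, w[d] - step * grad[d]) for d in range(n_dims)]
    return w


# Toy data: 4 images, 3 feature dimensions (values are made up).
X = [[1.0, 0.9, 0.0], [2.0, 2.1, 0.1], [0.5, 0.4, 0.0], [3.0, 3.2, 0.2]]
weights = reduced_kernel_weights(X, lam=0.01, n_iter=200)
```

Dimensions whose weight ends up at zero would simply be dropped from the kernel computation in this toy setup. A larger `lam` pushes more weights to zero at the cost of a worse reconstruction.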
Category: Science
Published December 18th, 2011
We are organizing a special session on Socio-Video Semantics at the forthcoming ACM International Conference on Multimedia Retrieval in Hong Kong.
Aims and Scope
All of a sudden video became social. In just five years, individual and mostly inactive consumers transformed into active and connected prosumers, revolutionaries even, who create, share, and comment on massive amounts of video artifacts all over the world wide web 2.0. Pronounced manifestations of social video on the Internet include initiatives like YouTube, Vimeo, Wikipedia, and Flickr, which manage to attract millions of users daily. It has been predicted that soon 91 percent of Internet data will be video, and smartphones will only accelerate this unstoppable momentum. In order to make sense of the massive amounts of video content, online social platforms rely on what other people say is in the image, which is known to be ambiguous, overly personalized, and limited. Hence, the lack of semantics currently associated with online video is seriously hampering retrieval, repurposing, and usage. In contrast to social video platforms, academic video sensemaking approaches rely on an analysis of the multimedia content. Such content-driven image search is important, if only to verify that what people have said is factually in the video, or for (professional) archives which cannot be shared for crowdsourcing. Despite good progress, automated multimedia analysis of video content is still seriously hampered by the semantic gap, or the lack of correspondence between the low-level audiovisual features that machines extract from video and the high-level conceptual interpretations a human gives to multimedia data. For sensemaking, exploiting the social multimedia context of video has largely been ignored in the multimedia community. This special session provides a unique opportunity for high-quality papers connecting the social context of online video to video sensemaking.
Topics of Interest
Topics of interest include (but are not limited to):
Socio-video content analysis
- Cross-modal (social / visual / audio) socio-video content analysis
- Contextual models for socio-video analysis
- Novel features for socio-video analysis
- Complex event recognition in socio-videos
- Socio-video copy detection
- Content-aware ads optimization in socio-video sharing sites
- Efficient learning and mining algorithms for scalable socio-video content analysis
Socio-video browsing and retrieval
- Socio-video retrieval systems
- Socio-video summarization
- Recommender techniques for socio-video browsing
- Mobile socio-video browsing and retrieval
- User-centered interface and system design for socio-video browsing and retrieval
Socio-video benchmark construction and open-source software
- Benchmark database construction for socio-video semantic analysis
- Ontology construction for socio-video semantic analysis
- Open-source software libraries for socio-video analysis
Paper Submission
All papers must be formatted according to the ACM conference style, cannot exceed 8 pages in 9 point font, and must be submitted as PDF files.
ACM ICMR 2012 follows double-blind review. Please make sure that the names and affiliations of the authors are excluded from the document. Also remember to avoid information that may identify the authors.
Either Microsoft Word or LaTeX can be used to prepare the manuscripts (but the final submission file should be in PDF format). The paper templates can be downloaded directly from the ACM ICMR 2012 website:
http://www.icmr2012.org/submission.html
Selected manuscripts will also be invited for a special issue in IEEE Transactions on Multimedia on the same topic.
Important Dates
- Paper submission deadline: January 15, 2012
- Notification of acceptance: March 15, 2012
- Camera-ready manuscript: April 5, 2012
Organizers
Cees G. M. Snoek, University of Amsterdam (Netherlands)
Yu-Gang Jiang, Fudan University (China)
Category: CfP, Science
Published December 10th, 2011

The paper Visual Synonyms for Landmark Image Retrieval by Efstratios Gavves, Cees Snoek and Arnold Smeulders, which is to appear in Computer Vision and Image Understanding, is now available. In this paper, we address the incoherence problem of the visual words in bag-of-words vocabularies. Different from existing work, which assigns words based on closeness in descriptor space, we focus on identifying pairs of independent, distant words – the visual synonyms – that are likely to host image patches of similar visual reality. We focus on landmark images, where the image geometry guides the detection of synonym pairs. Image geometry is used to find those image features that lie in nearly identical physical locations, yet are assigned to different words of the visual vocabulary. Defined in this way, we evaluate the validity of visual synonyms. We also examine the closeness of synonyms in the L2-normalized feature space. We show that visual synonyms may successfully be used for vocabulary reduction. Furthermore, we show that by combining the reduced visual vocabularies with synonym augmentation, we perform on par with the state-of-the-art bag-of-words approach, while having a 98% smaller vocabulary.
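The idea of finding features that share a physical location but land on different visual words can be illustrated with a small sketch. It assumes the features of two images have already been mapped into a common reference frame (for example via an estimated homography, which is not shown), and it uses a hypothetical `(x, y, word_id)` tuple layout and a hypothetical `radius` threshold; none of these details are taken from the paper.

```python
from collections import Counter
from math import hypot


def count_synonym_candidates(features_a, features_b, radius=2.0):
    """Count word pairs that co-occur at (nearly) the same location.

    Each feature is an (x, y, word_id) tuple in a shared reference frame.
    A pair is counted when two features lie within `radius` of each other
    but were assigned to different visual words.
    """
    counts = Counter()
    for xa, ya, word_a in features_a:
        for xb, yb, word_b in features_b:
            if word_a != word_b and hypot(xa - xb, ya - yb) <= radius:
                # Store the pair in sorted order so (5, 9) and (9, 5) merge.
                counts[tuple(sorted((word_a, word_b)))] += 1
    return counts


# Toy example with made-up coordinates and word ids.
image_a = [(10.0, 10.0, 5), (50.0, 20.0, 7)]
image_b = [(10.5, 10.2, 9), (50.0, 21.0, 7), (80.0, 80.0, 5)]
candidates = count_synonym_candidates(image_a, image_b)
```

Here words 5 and 9 would be flagged once as a candidate pair, while the two features assigned to word 7 are ignored because they already agree. Accumulating such counts over many image pairs would give a ranking of candidate synonym pairs.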
Category: Science
Published November 30th, 2011

The paper Personalizing Automated Image Annotation using Cross-Entropy was presented today by Xirong Li et al. at ACM Multimedia 2011. The paper observes that annotating the increasing amounts of user-contributed images in a personalized manner is in great demand. However, this demand is largely ignored by the mainstream of automated image annotation research. In this paper we aim for personalizing automated image annotation by jointly exploiting personalized tag statistics and content-based image annotation. We propose a cross-entropy based learning algorithm which personalizes a generic annotation model by learning from a user’s multimedia tagging history. Using cross-entropy-minimization based Monte Carlo sampling, the proposed algorithm optimizes the personalization process in terms of a performance measurement which can be flexibly chosen. Automatic image annotation experiments with 5,315 realistic users in the social web show that the proposed method compares favorably to a generic image annotation method and a method using personalized tag statistics only. For 4,442 users the performance improves, and for 1,088 users the absolute performance gain is at least 0.05 in terms of average precision. The results show the value of the proposed method.
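For readers unfamiliar with cross-entropy based Monte Carlo optimization, the following is a generic sketch of the technique, not the algorithm from the paper. It tunes a single hypothetical parameter `alpha` (think of a mixing weight between a generic annotation model and a user's tag statistics) against a black-box score. The Gaussian sampling distribution, the parameter defaults, and the toy score function are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def cross_entropy_maximize(score_fn, mu=0.5, sigma=0.5, n_samples=50,
                           n_elite=10, n_iter=30, seed=0):
    """Generic cross-entropy method for a one-dimensional parameter.

    Repeatedly sample candidates from a Gaussian, keep the best-scoring
    ones (the elite), and refit the Gaussian to the elite samples.
    """
    rng = random.Random(seed)
    for _ in range(n_iter):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        elite = sorted(samples, key=score_fn, reverse=True)[:n_elite]
        mu = mean(elite)
        # Floor the spread so the sampler never degenerates completely.
        sigma = max(pstdev(elite), 1e-6)
    return mu


def toy_performance(alpha):
    """Stand-in for a real performance measure; peaks at alpha = 0.3."""
    return -(alpha - 0.3) ** 2


best_alpha = cross_entropy_maximize(toy_performance)
```

Because the method only needs scores of sampled candidates, `toy_performance` could be swapped for any performance measure, such as average precision on a user's tagging history, without changing the optimizer.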
Category: Science