Cees Snoek

MM13: Querying for Video Events by Semantic Signatures from Few Examples

22. October 2013 · Categories: Science

The ACM Multimedia’13 paper on “Querying for Video Events by Semantic Signatures from Few Examples” by Masoud Mazloom, Amirhossein Habibian and Cees Snoek is now available. We aim to query web video for complex events using only a handful of video query examples, where the standard approach learns a ranker from hundreds of examples. We consider a semantic signature representation, consisting of off-the-shelf concept detectors, to capture the variance in semantic appearance of events. Since it is unknown what similarity metric and query fusion to use in such an event retrieval setting, we perform three experiments on unconstrained web videos from the TRECVID event detection task. The experiments reveal that retrieval with semantic signatures using normalized correlation as similarity metric outperforms a low-level bag-of-words alternative, that multiple queries are best combined using late fusion with an average operator, and that event retrieval is preferred over event classification when fewer than eight positive video examples are available.
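To make the retrieval setup above concrete, here is a minimal sketch of ranking videos by their semantic signatures. It assumes that "normalized correlation" means the correlation of mean-centered signature vectors, and that a signature is simply a list of concept detector scores; both are assumptions for illustration, and the function names and toy data are hypothetical rather than taken from the paper.

```python
import math

def normalized_correlation(a, b):
    """Correlation of two equal-length signatures after mean-centering.

    Assumed definition for illustration; the paper may define it differently.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [x - mean_b for x in b]
    norm = math.sqrt(sum(x * x for x in centered_a)) * math.sqrt(
        sum(x * x for x in centered_b)
    )
    if norm == 0.0:
        return 0.0  # a constant signature carries no correlation information
    return sum(x * y for x, y in zip(centered_a, centered_b)) / norm

def rank_by_late_fusion(queries, collection):
    """Score each video by averaging its similarity to every query example."""
    fused = {}
    for video_id, signature in collection.items():
        scores = [normalized_correlation(q, signature) for q in queries]
        fused[video_id] = sum(scores) / len(scores)  # average operator
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical toy data: four concept detector scores per video.
queries = [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.7, 0.1]]
collection = {
    "video_a": [0.85, 0.15, 0.75, 0.2],
    "video_b": [0.1, 0.9, 0.2, 0.8],
}
ranking = rank_by_late_fusion(queries, collection)
```

Late fusion here means each query example is matched separately and only the resulting scores are combined, as opposed to merging the query signatures before matching.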

MM13: Video2Sentence and Vice Versa

22. October 2013 · Categories: Science

The ACM Multimedia’13 demonstrator paper on “Video2Sentence and Vice Versa” by Amirhossein Habibian and Cees Snoek is now available. In this technical demonstration, we showcase a multimedia search engine that retrieves a video from a sentence, or a sentence from a video. The key novelty is our machine translation capability that exploits a cross-media representation for both the visual and textual modalities using concept vocabularies. We will demonstrate the translations using arbitrary web videos and sentences related to everyday events. What is more, we will provide an automatically generated explanation, in terms of concept detectors, of why a particular video or sentence has been retrieved as the most likely translation.

ICCV13: Codemaps Segment, Classify and Search Objects Locally

21. October 2013 · Categories: Science

The ICCV13 paper entitled “Codemaps Segment, Classify and Search Objects Locally” by Zhenyang Li, Efstratios Gavves, Koen van de Sande, Cees Snoek, and Arnold Smeulders is now also available. In this paper we aim for segmentation and classification of objects. We propose codemaps that are a joint formulation of the classification score and the local neighborhood it belongs to in the image. We obtain the codemap by reordering the encoding, pooling and classification steps over lattice elements. Unlike existing linear decompositions, which emphasize only the efficiency benefits for localized search, we make three novel contributions. As a preliminary, we provide a theoretical generalization of the sufficient mathematical conditions under which image encodings and classification become locally decomposable. As first novelty we introduce ℓ2 normalization for arbitrarily shaped image regions, which is fast enough for semantic segmentation using our Fisher codemaps. Second, using the same lattice across images, we propose kernel pooling, which embeds nonlinearities into codemaps for object classification by explicit or approximate feature mappings. Results demonstrate that ℓ2 normalized Fisher codemaps improve the state-of-the-art in semantic segmentation for PASCAL VOC. For object classification the addition of nonlinearities brings us on par with the state-of-the-art, but is 3x faster. Because of the codemaps’ inherent efficiency, we can reach significant speed-ups for localized search as well. We exploit the efficiency gain for our third novelty: object segment retrieval using a single query image only.
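As a rough illustration of what "locally decomposable" can mean, the sketch below assumes sum pooling over lattice cells and a linear classifier without a bias term. Under those assumptions the score of any region is the sum of per-cell scores, and the ℓ2 norm of the pooled encoding follows from precomputed dot products between cells. This is one way to see the idea, not necessarily the paper's formulation; all function names and the toy data are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_codemap(cell_encodings, weights):
    """Per-cell classification scores: classify first, pool later."""
    return [dot(x, weights) for x in cell_encodings]

def build_gram(cell_encodings):
    """Pairwise dot products between cell encodings, used for the l2 norm."""
    return [[dot(x, y) for y in cell_encodings] for x in cell_encodings]

def region_score(region, codemap, gram):
    """l2-normalized score of an arbitrary set of cells, from cached values only."""
    raw = sum(codemap[c] for c in region)
    squared_norm = sum(gram[c][d] for c in region for d in region)
    return raw / math.sqrt(squared_norm) if squared_norm > 0 else 0.0

def direct_score(region, cell_encodings, weights):
    """Reference: pool the region, l2-normalize, then classify."""
    pooled = [sum(cell_encodings[c][k] for c in region) for k in range(len(weights))]
    norm = math.sqrt(dot(pooled, pooled))
    return dot(pooled, weights) / norm if norm > 0 else 0.0

# Hypothetical toy lattice of four cells with three-dimensional encodings.
cells = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 2.0, 1.0], [1.5, 0.5, 0.5]]
weights = [0.2, -0.1, 0.4]
codemap = build_codemap(cells, weights)
gram = build_gram(cells)
region = [0, 2, 3]  # any subset of cells, not necessarily a rectangle
fast = region_score(region, codemap, gram)
slow = direct_score(region, cells, weights)
```

Both routes give the same score, but the cached route never touches the high-dimensional encodings again once the codemap and dot products are stored, which is where the speed-up for localized search would come from in this simplified setting.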

ICCV13: Fine-Grained Categorization by Alignments

12. October 2013 · Categories: Science

The ICCV13 paper entitled “Fine-Grained Categorization by Alignments” by Efstratios Gavves, Basura Fernando, Cees Snoek, Arnold Smeulders, and Tinne Tuytelaars is now available. The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object into a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching-oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.

MM13: Classifying Tag Relevance with Relevant Positive and Negative Examples

29. September 2013 · Categories: Science

The ACM Multimedia 2013 paper “Classifying Tag Relevance with Relevant Positive and Negative Examples” by Xirong Li and Cees Snoek is now available. Image tag relevance estimation aims to automatically determine whether what people label about images is factually present in the pictorial content. Different from previous works, which either use only positive examples of a given tag or use positive and random negative examples, we argue the importance of relevant positive and relevant negative examples for tag relevance estimation. We propose a system that selects positive and negative examples, deemed most relevant with respect to the given tag, from crowd-annotated images. While applying models for many tags could be cumbersome, our system trains efficient ensembles of Support Vector Machines per tag, enabling fast classification. Experiments on two benchmark sets show that the proposed system compares favorably against five present-day methods. Given extracted visual features, for each image our system can process up to 3,787 tags per second. The new system is both effective and efficient for tag relevance estimation.

Evaluating Multimedia Features and Fusion for Example-based Event Detection

15. September 2013 · Categories: Science

The paper “Evaluating Multimedia Features and Fusion for Example-based Event Detection” by the SESAME project consortium, which will appear in a forthcoming issue of Machine Vision and Applications, is now available. In this paper we consider the problem of detecting events in Internet video, like the ones depicted: making a sandwich, repairing an appliance, birthday party, and parade. Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers based on single data types: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as strong as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as arithmetic mean, perform as well as or better than other, more complex fusion methods. SESAME’s performance in the 2012 TRECVID MED evaluation was one of the best reported.
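The arithmetic-mean fusion mentioned above can be sketched in a few lines. The example assumes every classifier scores the same set of clips and that the scores are already on a comparable scale, such as probabilities between 0 and 1; the classifier names, clip names and score values are hypothetical and not results from the paper.

```python
def fuse_arithmetic_mean(scores_by_classifier):
    """Average the detection scores that each classifier assigns to a clip."""
    classifiers = list(scores_by_classifier.values())
    clip_ids = classifiers[0].keys()  # assumes all classifiers score the same clips
    return {
        clip_id: sum(scores[clip_id] for scores in classifiers) / len(classifiers)
        for clip_id in clip_ids
    }

# Hypothetical per-classifier detection scores for one event.
scores_by_classifier = {
    "visual": {"clip_1": 0.9, "clip_2": 0.4},
    "motion": {"clip_1": 0.7, "clip_2": 0.5},
    "audio": {"clip_1": 0.2, "clip_2": 0.6},
}
fused = fuse_arithmetic_mean(scores_by_classifier)
ranked_clips = sorted(fused, key=fused.get, reverse=True)
```

A fusion rule like this has no parameters to train, which makes it a natural baseline against which learned fusion methods can be compared.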

Xirong Li receives SIGMM Best Ph.D. Thesis Award 2013

03. September 2013 · Categories: Personal, Science

Congratulations to Dr. Xirong Li for receiving the SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Applications 2013. The committee considered Xirong’s dissertation titled “Content-based visual search learned from social media” worthy of the award as it substantially extends the boundaries for developing content-based multimedia indexing and retrieval solutions. In particular, it provides fresh insights into the possibilities for realizing image retrieval solutions in the presence of the vast information that can be drawn from social media.

The committee considered the main innovation of Xirong’s work to be in the development of the theory and algorithms providing answers to the following challenging research questions:
(a) what determines the relevance of a social tag with respect to an image,
(b) how to fuse tag relevance estimators,
(c) which social images are the informative negative examples for concept learning,
(d) how to exploit socially tagged images for visual search and
(e) how to personalize automatic image tagging with respect to a user’s preferences.

The significance of the developed theory and algorithms lies in their power to enable effective and efficient deployment of the information collected from social media, both to enhance the datasets that can be used to learn automatic image indexing mechanisms (visual concept detection) and to make this learning more personalized for the user.

Xirong’s thesis is available from the UvA digital academic repository.

Call for Papers: ACM ICMR 2014 at University of Glasgow

28. August 2013 · Categories: CfP, Science

Vacancy for PhD student in Computer Vision and Machine Learning

19. June 2013 · Categories: Science

[Application for the vacancy below was possible until July 31, 2013.]

We have a vacancy for a PhD student at the University of Amsterdam. The topic of the PhD is to recognize objects in a visual data stream. In such a stream the object classes of interest shift over time. Hence, the traditional approach to learn classifiers for a predefined set of objects is unsuited. A promising approach in classifying unseen objects into a novel category is to learn a semantic attribute image representation. The aim for this PhD is to develop new algorithms to learn such a high-level semantic representation from weakly annotated images and to learn the mapping to an unknown class from freely available (textual) sources. Another project aim is to model the visual data stream to understand which images or novel concepts could become a visual trend.

More information, requirements, appointment and application via this website.

Evaluating Sources and Strategies for Learning Video Concepts from Social Media

17. June 2013 · Categories: Science

The CBMI’13 paper “Evaluating Sources and Strategies for Learning Video Concepts from Social Media” by Svetlana Kordumova, Xirong Li and Cees Snoek is now available. Learning video concept detectors from social media sources, such as Flickr images and YouTube videos, has the potential to address a wide variety of concept queries for video search. While the potential has been recognized by many, and progress on the topic has been impressive, we argue that two key questions, i.e., What visual tagging source is most suited for selecting positive training examples to learn video concepts? and What strategy should be used for selecting positive examples from tagged sources?, remain open. As an initial attempt to answer the two questions, we conduct an experimental study using a video search engine which is capable of learning concept detectors from social media, be it socially tagged videos or socially tagged images. Within the video search engine we investigate six strategies for selecting positive examples. The performance is evaluated on the challenging TRECVID benchmark 2011 with 400 hours of Internet videos. The new experiments lead to novel and nontrivial findings: (1) tagged images are a better source for learning video concepts from the web, (2) selecting tag relevant examples as positives for learning video concepts is always beneficial and it can be done automatically, and (3) the best source and strategy compare favorably against several present-day methods.