21. June 2014 · 2 comments · Categories: Science

The CVPR’14 paper Locality in Generic Instance Search from One Example by Ran Tao, Efstratios Gavves, Cees G. M. Snoek, and Arnold W. M. Smeulders is now available. This paper aims for generic instance search from a single example. Where the state-of-the-art relies on a global image representation for the search, we proceed by including locality at all steps of the method. As the first novelty, we consider many boxes per database image as candidate targets to search locally in the picture using an efficient point-indexed representation. The same representation allows, as the second novelty, the application of very large vocabularies in the powerful Fisher vector and VLAD to search locally in the feature space. As the third novelty, we propose an exponential similarity function to further emphasize locality in the feature space. Locality is advantageous in instance search as it rests on matching unique details. We demonstrate a substantial increase in generic instance search performance from one example on three standard datasets with buildings, logos, and scenes, from 0.443 to 0.620 in mAP.
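As a rough illustration of the first and third novelties, the sketch below scores a database image by its best-matching box and sharpens a cosine similarity with an exponential. The exact similarity used in the paper is not given in this post, so the form `exp(beta * (cos - 1))`, the parameter `beta`, and the function names are assumptions made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def exp_similarity(u, v, beta=10.0):
    """Hypothetical exponential similarity: equals 1 for identical directions
    and decays quickly as the vectors drift apart (larger beta = more local)."""
    return math.exp(beta * (cosine(u, v) - 1.0))

def best_box_score(query, box_descriptors, beta=10.0):
    """Score an image by its best-matching candidate box (assumed scoring rule)."""
    return max(exp_similarity(query, box, beta) for box in box_descriptors)
```

With a large `beta`, only boxes whose descriptors are very close to the query contribute a noticeable score, which is one way to read "locality in the feature space".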

locality-instance-search

correlation

The paper “Recommendations for Recognizing Video Events by Concept Vocabularies” by Amirhossein Habibian and Cees Snoek, appearing in the July issue of Computer Vision and Image Understanding, is now available. Representing videos using vocabularies composed of concept detectors appears promising for generic event recognition. While many have recently shown the benefits of concept vocabularies for recognition, the characteristics of a universal concept vocabulary suited for representing events have not been studied. In this paper, we study how to create an effective vocabulary for arbitrary-event recognition in web video. We consider five research questions related to the number, the type, the specificity, the quality and the normalization of the detectors in concept vocabularies. A rigorous experimental protocol using a pool of 1346 concept detectors trained on publicly available annotations, two large arbitrary web video datasets and a common event recognition pipeline allows us to analyze the performance of various concept vocabulary definitions. From the analysis we arrive at the recommendation that for effective event recognition the concept vocabulary should (i) contain more than 200 concepts, (ii) be diverse by covering object, action, scene, people, animal and attribute concepts, (iii) include both general and specific concepts, (iv) increase the number of concepts rather than improve the quality of the individual detectors, and (v) contain detectors that are appropriately normalized. We consider the recommendations for recognizing video events by concept vocabularies the most important contribution of the paper, as they provide guidelines for future work.

FLAIR

The paper “Fisher and VLAD with FLAIR” by Koen van de Sande, Cees Snoek and Arnold Smeulders will be presented as a poster at the forthcoming CVPR’14 conference in Columbus, Ohio. The paper considers efficient object detection, that is, automatically determining what object appears where in an image. A major computational bottleneck in many current algorithms is the evaluation of arbitrary boxes. Dense local analysis and powerful bag-of-word encodings, such as Fisher vectors and VLAD, lead to improved accuracy at the expense of increased computation time. Where a simplification in the representation is tempting, we exploit novel representations while maintaining accuracy. We start from state-of-the-art, fast selective search, but our method will apply to any initial box-partitioning. By representing the picture as sparse integral images, one per codeword, we achieve a Fast Local Area Independent Representation. FLAIR allows for very fast evaluation of any box encoding and still enables spatial pooling. In FLAIR we achieve exact VLAD difference coding, even with l2 and power-norms. Finally, by multiple codeword assignments, we achieve exact and approximate Fisher vectors with FLAIR. The result is an 18x speedup, which enables us to set a new state-of-the-art on the challenging 2010 PASCAL VOC objects and the fine-grained categorization of the CUB-2011 200 bird species. Plus, we rank number one in the official ImageNet 2013 detection challenge.
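To give a feel for why one integral image per codeword makes box evaluation cheap, here is a minimal sketch. It is not the FLAIR implementation: it uses dense tables and plain codeword counts (a bag-of-words histogram) rather than sparse integral images with VLAD or Fisher statistics, and the function names and the `(x, y, codeword)` point format are assumptions.

```python
def build_integral_images(points, width, height, num_codewords):
    """Build one integral image of point counts per codeword.

    points: iterable of (x, y, codeword) with 0 <= x < width, 0 <= y < height.
    Each table has an extra zero row and column so box queries need no bounds checks.
    """
    tables = [[[0] * (width + 1) for _ in range(height + 1)]
              for _ in range(num_codewords)]
    for x, y, k in points:
        tables[k][y + 1][x + 1] += 1
    for table in tables:
        for y in range(1, height + 1):
            row_sum = 0
            for x in range(1, width + 1):
                row_sum += table[y][x]                    # counts in this row so far
                table[y][x] = table[y - 1][x] + row_sum   # add everything above
    return tables

def box_histogram(tables, x0, y0, x1, y1):
    """Codeword histogram of the half-open box [x0, x1) x [y0, y1).

    Costs four lookups per codeword, independent of the box area.
    """
    return [t[y1][x1] - t[y0][x1] - t[y1][x0] + t[y0][x0] for t in tables]
```

Once the tables are built, any candidate box from selective search (or any other box proposal method) can be encoded without revisiting the local descriptors inside it.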

best-practices

The paper “Best Practices for Learning Video Concept Detectors from Social Media Examples” by Svetlana Kordumova, Xirong Li, and Cees G. M. Snoek, which will appear in a future special issue of Multimedia Tools and Applications, is now available. Learning video concept detectors from social media sources, such as Flickr images and YouTube videos, has the potential to address a wide variety of concept queries for video search. While the potential has been recognized by many, and progress on the topic has been impressive, we argue that key questions about how to learn effective video concept detectors from social media examples remain open. As an initial attempt to answer these questions, we conduct an experimental study using a video search engine which is capable of learning concept detectors from social media examples, be it socially tagged videos or socially tagged images. Within the video search engine we investigate three strategies for positive example selection, three negative example selection strategies and three learning strategies. The performance is evaluated on the challenging TRECVID 2012 benchmark consisting of 600 h of Internet video. From the experiments we derive four best practices: (1) tagged images are a better source for learning video concepts than tagged videos, (2) selecting tag-relevant positive training examples is always beneficial, (3) selecting relevant negative examples is advantageous and should be treated differently for video and image sources, and (4) learning concept detectors with selected relevant training data before learning is better than incorporating the relevance during the learning process. The best practices within our video search engine lead to state-of-the-art performance in the TRECVID 2013 benchmark for concept detection without manually provided annotations.

09. May 2014 · 2 comments · Categories: Science

tubelets-pic

The CVPR 2014 paper entitled Action Localization by Tubelets from Motion by Mihir Jain, Jan C. van Gemert, Hervé Jégou, Patrick Bouthemy, and Cees G. M. Snoek is now available. This paper considers the problem of action localization, where the objective is to determine when and where certain actions appear. We introduce a sampling strategy to produce 2D+t sequences of bounding boxes, called tubelets. Compared to state-of-the-art alternatives, this drastically reduces the number of hypotheses that are likely to include the action of interest. Our method is inspired by a recent technique introduced in the context of image localization. Beyond considering this technique for the first time for videos, we revisit this strategy for 2D+t sequences obtained from super-voxels. Our sampling strategy advantageously exploits a criterion that reflects how action-related motion deviates from background motion. We demonstrate the merit of our approach by extensive experiments on two public datasets: UCF Sports and MSR-II. Our approach significantly outperforms the state-of-the-art on both datasets, while restricting the search of actions to a fraction of possible bounding box sequences.

COSTA

The CVPR 2014 paper entitled COSTA: Co-Occurrence Statistics for Zero-Shot Classification by Thomas Mensink, Efstratios Gavves and Cees Snoek is now available. In this paper we aim for zero-shot classification, that is, visual recognition of an unseen class by using knowledge transfer from known classes. Our main contribution is COSTA, which exploits co-occurrences of visual concepts in images for knowledge transfer. These inter-dependencies arise naturally between concepts, and are easy to obtain from existing annotations or web-search hit counts. We estimate a classifier for a new label as a weighted combination of related classes, using the co-occurrences to define the weight. We propose various metrics to leverage these co-occurrences, and a regression model for learning a weight for each related class. We also show that our zero-shot classifiers can serve as priors for few-shot learning. Experiments on three multi-labeled datasets reveal that our proposed zero-shot methods are approaching and occasionally outperforming fully supervised SVMs. We conclude that co-occurrence statistics suffice for zero-shot classification.
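The idea of estimating a classifier for an unseen label as a weighted combination of known classifiers can be sketched as follows. This is a simplified illustration, not the COSTA code: it assumes linear classifiers stored as weight vectors and uses plain normalized co-occurrence counts as weights, which is only one of many possible metrics. The function name and the data layout are hypothetical.

```python
def zero_shot_classifier(known_classifiers, cooccurrence):
    """Combine known linear classifiers into a classifier for an unseen label.

    known_classifiers: dict mapping known label -> weight vector (list of floats).
    cooccurrence: dict mapping known label -> co-occurrence count with the new
        label, e.g. from existing annotations or web-search hit counts.
    """
    total = sum(cooccurrence.values())
    if total <= 0:
        raise ValueError("the new label co-occurs with no known label")
    dim = len(next(iter(known_classifiers.values())))
    w_new = [0.0] * dim
    for label, count in cooccurrence.items():
        weight = count / total  # assumed metric: normalized co-occurrence
        for i, value in enumerate(known_classifiers[label]):
            w_new[i] += weight * value
    return w_new

def score(weights, features):
    """Linear classifier score for one feature vector."""
    return sum(w * f for w, f in zip(weights, features))
```

For example, a hypothetical unseen label "beach" that co-occurs three times with "sea" and once with "sand" would inherit three quarters of the "sea" classifier and one quarter of the "sand" classifier.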

Multimedia in Asia
The Pacific-Rim Conference on Multimedia (PCM) is the major annual international conference in Asia organized as a forum for the dissemination of state-of-the-art technological advances and research results in the fields of theoretical, experimental, and applied multimedia analysis and processing. The conference calls for research papers reporting original investigation results and industrial-track papers reporting real-world multimedia applications and system development experiences. The conference also solicits proposals for tutorials on enabling multimedia technologies, and calls for dedicated special session proposals focusing on emerging challenges in the multimedia arena.

An Unforgettable Experience
PCM 2014 will be held in Kuching, Sarawak, Malaysia. Sarawak, also known as the “Land of the Hornbills”, offers an unforgettable mix of culture- and nature-based travel experiences. Tourists can indulge themselves in a wide range of attractions including enjoying the charms of historic Kuching, a beautiful waterfront city; exploring the massive cave systems of Mulu National Park; jungle trekking and wildlife watching at Bako National Park; and traveling upriver to visit the Iban and experience longhouse life.
Topics of Interests

Submissions
PCM 2014 is seeking high-quality submissions in the broad field of multimedia. PCM 2014 accepts full, short and demo papers. There will be awards for full, short and demo papers. See the website for further details: http://conference.cs.cityu.edu.hk/pcm2014/

compositeconcepts

The ICMR2014 paper Composite Concept Discovery for Zero-Shot Video Event Detection by Amirhossein Habibian, Thomas Mensink and Cees Snoek is now available. We consider automated detection of events in video without the use of any visual training examples. A common approach is to represent videos as classification scores obtained from a vocabulary of pre-trained concept classifiers. Where others construct the vocabulary by training individual concept classifiers, we propose to train classifiers for combinations of concepts composed by Boolean logic operators. We call these concept combinations composite concepts and contribute an algorithm that automatically discovers them from existing video-level concept annotations. We discover composite concepts by jointly optimizing the accuracy of concept classifiers and their effectiveness for detecting events. We demonstrate that by combining concepts into composite concepts, we can train more accurate classifiers for the concept vocabulary, which leads to improved zero-shot event detection. Moreover, we demonstrate that by using different logic operators, namely “AND” and “OR”, we discover different types of composite concepts, which are complementary for zero-shot event detection. We perform a search for 20 events in 41K web videos from two test sets of the challenging TRECVID Multimedia Event Detection 2013 corpus. The experiments demonstrate the superior performance of the discovered composite concepts, compared to present-day alternatives, for zero-shot event detection.
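To make the notion of a composite concept concrete, the sketch below derives “AND” and “OR” training labels for concept pairs from video-level annotations. It only enumerates candidate pairs; the discovery step described above, which jointly optimizes classifier accuracy and event detection effectiveness, is not reproduced here. The function name and the annotation format (one set of concept names per video) are assumptions.

```python
from itertools import combinations

def composite_labels(annotations, concepts):
    """Build binary labels for pairwise AND/OR composite concepts.

    annotations: list with one set of concept names per video.
    concepts: list of concept names to combine.
    Returns a dict mapping a composite name to a 0/1 label per video.
    """
    composites = {}
    for a, b in combinations(concepts, 2):
        composites[f"{a} AND {b}"] = [int(a in video and b in video)
                                      for video in annotations]
        composites[f"{a} OR {b}"] = [int(a in video or b in video)
                                     for video in annotations]
    return composites
```

An “AND” composite yields fewer but more specific positives, while an “OR” composite pools the positives of both concepts; each resulting label list could then be used to train one classifier for the vocabulary.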

stopframes

The ICMR2014 paper by Amirhossein Habibian and Cees Snoek entitled Stop-Frame Removal Improves Web Video Classification is now available. Web videos available on sharing sites like YouTube are becoming an alternative to manually annotated training data, which are necessary for creating video classifiers. However, when looking into web videos, we observe that they contain several irrelevant frames that may randomly appear in any video, e.g., blank and overexposed frames. We call these irrelevant frames stop-frames and propose a simple algorithm to identify and exclude them during classifier training. Stop-frames might appear in any video, so it is hard to recognize their category. Therefore we identify stop-frames as those frames which are commonly misclassified by any concept classifier. Our experiments demonstrate that using our algorithm improves classification accuracy by 60% and 24% in terms of mean average precision for an event and a concept detection benchmark.
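One possible reading of "commonly misclassified by any concept classifier" is sketched below: a frame is flagged as a stop-frame when a large fraction of concept classifiers get it wrong. This is an interpretation for illustration only, not the algorithm from the paper; the threshold `min_error_rate`, the function name, and the input layout (per-frame binary predictions and labels, one entry per concept) are all assumptions.

```python
def find_stop_frames(predictions, labels, min_error_rate=0.5):
    """Return indices of frames misclassified by many concept classifiers.

    predictions: per frame, a list of 0/1 predictions, one per concept classifier.
    labels: per frame, the 0/1 labels for the same concepts (e.g. inherited
        from the video the frame belongs to).
    min_error_rate: hypothetical threshold on the fraction of wrong classifiers.
    """
    stop_frames = []
    for index, (predicted, truth) in enumerate(zip(predictions, labels)):
        errors = sum(p != t for p, t in zip(predicted, truth))
        if errors / len(truth) >= min_error_rate:
            stop_frames.append(index)
    return stop_frames
```

Frames returned by such a filter would be excluded before training the final classifiers.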

The ICMR2014 paper Few-Example Video Event Retrieval Using Tag Propagation by Masoud Mazloom, Xirong Li and Cees Snoek is now available. An emerging topic in multimedia retrieval is to detect a complex event in video using only a handful of video examples. Unlike existing work, which learns a ranker from positive video examples and hundreds of negative examples, we aim to query web video for events using zero or only a few visual examples. To that end, we propose in this paper a tag-based video retrieval system which propagates tags from a tagged video source to an unlabeled video collection without the need for any training examples. Our algorithm is based on weighted frequency neighbor voting using concept vector similarity. Once tags are propagated to unlabeled video, we can rely on off-the-shelf language models to rank these videos by tag similarity. We study the behavior of our tag-based video event retrieval system by performing three experiments on web videos from the TRECVID multimedia event detection corpus, with zero, one and multiple query examples, and show that it beats a recent alternative.
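A simplified version of neighbor voting for tag propagation might look like the sketch below. It weights each neighbor's vote by cosine similarity between concept vectors; the exact weighting used in the paper is not given in this post, so this weighting, the neighborhood size `k`, and the function names are assumptions.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length concept vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def propagate_tags(target_vector, tagged_videos, k=2):
    """Propagate tags to an unlabeled video from its k most similar tagged videos.

    tagged_videos: list of (concept_vector, tags) pairs from the tagged source.
    Returns (tag, vote) pairs, highest vote first; ties are broken alphabetically.
    """
    scored = [(cosine(target_vector, vector), tags)
              for vector, tags in tagged_videos]
    scored.sort(key=lambda item: item[0], reverse=True)
    votes = defaultdict(float)
    for similarity, tags in scored[:k]:
        for tag in tags:
            votes[tag] += similarity  # similarity-weighted vote (assumed weighting)
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))
```

The propagated tags can then be matched against a textual event query, which is where an off-the-shelf language model would come in.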

TagPropagation