The CVPR 2014 paper COSTA: Co-Occurrence Statistics for Zero-Shot Classification by Thomas Mensink, Efstratios Gavves, and Cees Snoek is now available. In this paper we aim for zero-shot classification, that is, visual recognition of an unseen class using knowledge transfer from known classes. Our main contribution is COSTA, which exploits co-occurrences of visual concepts in images for knowledge transfer. These inter-dependencies arise naturally between concepts, and are easy to obtain from existing annotations or web-search hit counts. We estimate a classifier for a new label as a weighted combination of related classes, using the co-occurrences to define the weights. We propose various metrics to leverage these co-occurrences, and a regression model for learning a weight for each related class. We also show that our zero-shot classifiers can serve as priors for few-shot learning. Experiments on three multi-labeled datasets reveal that our proposed zero-shot methods approach, and occasionally outperform, fully supervised SVMs. We conclude that co-occurrence statistics suffice for zero-shot classification.
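The core idea of building a zero-shot classifier as a co-occurrence-weighted combination of known classifiers can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the simple sum normalization of the co-occurrence scores, and the toy data are all assumptions.

```python
import numpy as np

def zero_shot_classifier(W_known, cooc):
    """Sketch: combine known classifiers into one for an unseen label.

    W_known: (K, D) array, weight vectors of K known linear classifiers.
    cooc:    (K,) array, co-occurrence scores between the unseen label
             and each known class (e.g. annotation or hit counts).
    Returns a (D,) weight vector for the unseen label.
    """
    weights = cooc / (cooc.sum() + 1e-12)  # normalize co-occurrences (assumed choice)
    return weights @ W_known               # weighted combination of classifiers

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))        # three known classifiers, 5-dim features
c = np.array([2.0, 1.0, 1.0])          # hypothetical co-occurrence counts
w_new = zero_shot_classifier(W, c)
print(w_new.shape)                     # → (5,)
```

Because the combination is linear, the resulting vector can be used to score images exactly like any of the known classifiers, and it can also serve as a prior when a few labeled examples become available.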
Multimedia in Asia
The Pacific-Rim Conference on Multimedia (PCM) is the major annual international conference in Asia organized as a forum for the dissemination of state-of-the-art technological advances and research results in the fields of theoretical, experimental, and applied multimedia analysis and processing. The conference calls for research papers reporting original investigation results and industrial-track papers reporting real-world multimedia applications and system development experiences. The conference also solicits proposals for tutorials on enabling multimedia technologies, and calls for dedicated special session proposals focusing on emerging challenges in the multimedia arena.
An Unforgettable Experience
PCM 2014 will be held in Kuching, Sarawak, Malaysia. Sarawak, also known as the “Land of the Hornbills”, offers an unforgettable mix of culture- and nature-based travel experiences. Tourists can indulge in a wide range of attractions, including enjoying the charms of historic Kuching, a beautiful waterfront city; exploring the massive cave systems of Mulu National Park; jungle trekking and wildlife watching at Bako National Park; and traveling upriver to visit the Iban and experience longhouse life.
Topics of Interest
PCM 2014 is seeking high-quality submissions in the broad field of multimedia. PCM 2014 accepts full, short, and demo papers, with awards in each category. See the website for further details: http://conference.cs.cityu.edu.hk/pcm2014/
The ICMR2014 paper Composite Concept Discovery for Zero-Shot Video Event Detection by Amirhossein Habibian, Thomas Mensink and Cees Snoek is now available. We consider automated detection of events in video without the use of any visual training examples. A common approach is to represent videos as classification scores obtained from a vocabulary of pre-trained concept classifiers. Where others construct the vocabulary by training individual concept classifiers, we propose to train classifiers for combinations of concepts composed by Boolean logic operators. We call these concept combinations composite concepts and contribute an algorithm that automatically discovers them from existing video-level concept annotations. We discover composite concepts by jointly optimizing the accuracy of concept classifiers and their effectiveness for detecting events. We demonstrate that by combining concepts into composite concepts, we can train more accurate classifiers for the concept vocabulary, which leads to improved zero-shot event detection. Moreover, we demonstrate that by using different logic operators, namely “AND” and “OR”, we discover different types of composite concepts, which are complementary for zero-shot event detection. We perform a search for 20 events in 41K web videos from two test sets of the challenging TRECVID Multimedia Event Detection 2013 corpus. The experiments demonstrate the superior performance of the discovered composite concepts, compared to present-day alternatives, for zero-shot event detection.
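To illustrate how Boolean operators turn existing video-level annotations into composite-concept labels, here is a minimal sketch. It assumes annotations are given as a binary matrix; the function name and toy data are illustrative, and the discovery step (choosing which concepts to combine) is not shown.

```python
import numpy as np

def composite_label(annotations, indices, op="AND"):
    """Sketch: derive a composite-concept label from concept annotations.

    annotations: (N, C) binary array, concept labels for N videos.
    indices:     list of concept column indices to combine.
    op:          "AND" (video must carry every concept) or
                 "OR"  (video carries at least one concept).
    Returns an (N,) binary label vector for the composite concept.
    """
    subset = annotations[:, indices]
    if op == "AND":
        return subset.all(axis=1).astype(int)
    return subset.any(axis=1).astype(int)

# Toy annotations for three videos over three concepts.
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]])
print(composite_label(A, [0, 1], "AND"))  # → [1 0 0]
print(composite_label(A, [0, 1], "OR"))   # → [1 1 1]
```

The “AND” labels are sparser and more specific, while the “OR” labels are broader, which matches the intuition that the two operators yield complementary composite concepts.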
The ICMR2014 paper by Amirhossein Habibian and Cees Snoek entitled Stop-Frame Removal Improves Web Video Classification is now available. Web videos available in sharing sites like YouTube are becoming an alternative to manually annotated training data, which are necessary for creating video classifiers. However, when looking into web videos, we observe that they contain several irrelevant frames that may randomly appear in any video, e.g., blank and over-exposed frames. We call these irrelevant frames stop-frames and propose a simple algorithm to identify and exclude them during classifier training. Since stop-frames might appear in any video, it is hard to recognize their category. Therefore we identify stop-frames as those frames which are commonly misclassified by any concept classifier. Our experiments demonstrate that using our algorithm improves classification accuracy, in terms of mean average precision, by 60% on an event detection benchmark and by 24% on a concept detection benchmark.
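One simple way to read the stop-frame criterion is: a frame that no concept classifier scores confidently is likely an irrelevant blank or over-exposed frame. The sketch below is an assumption-laden illustration of that reading, not the paper's algorithm; the max-score criterion and the threshold value are both hypothetical.

```python
import numpy as np

def find_stop_frames(scores, threshold=0.2):
    """Sketch: flag frames that every concept classifier misclassifies.

    scores:    (F, C) array, concept classifier scores for F frames.
    threshold: assumed confidence cutoff; a frame whose best score
               stays below it is treated as a stop-frame.
    Returns an (F,) boolean mask of stop-frames.
    """
    return scores.max(axis=1) < threshold

# Toy scores for three frames over two concept classifiers.
S = np.array([[0.90, 0.10],
              [0.05, 0.10],   # blank/over-exposed frame: low everywhere
              [0.30, 0.80]])
print(find_stop_frames(S))    # → [False  True False]
```

Frames flagged this way would then simply be excluded from the training set before the video classifiers are learned.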
The ICMR2014 paper Few-Example Video Event Retrieval Using Tag Propagation by Masoud Mazloom, Xirong Li and Cees Snoek is now available. An emerging topic in multimedia retrieval is to detect a complex event in video using only a handful of video examples. Different from existing work which learns a ranker from positive video examples and hundreds of negative examples, we aim to query web video for events using zero or only a few visual examples. To that end, we propose in this paper a tag-based video retrieval system which propagates tags from a tagged video source to an unlabeled video collection without the need of any training examples. Our algorithm is based on weighted frequency neighbor voting using concept vector similarity. Once tags are propagated to unlabeled video we can rely on off-the-shelf language models to rank these videos by tag similarity. We study the behavior of our tag-based video event retrieval system by performing three experiments on web videos from the TRECVID multimedia event detection corpus, using zero, one, and multiple query examples, and show that it beats a recent alternative.
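The weighted neighbor-voting step can be sketched as follows: the tags of the most similar source videos, under cosine similarity of their concept vectors, vote for the unlabeled video, each vote weighted by similarity. This is a minimal sketch under assumed names and data structures, not the paper's implementation.

```python
import numpy as np

def propagate_tags(query_vec, source_vecs, source_tags, k=2):
    """Sketch: propagate tags to an unlabeled video by neighbor voting.

    query_vec:   (D,) concept vector of the unlabeled video.
    source_vecs: (N, D) concept vectors of the tagged source videos.
    source_tags: list of N tag sets, one per source video.
    k:           number of nearest neighbors that vote.
    Returns tags ranked by their similarity-weighted vote.
    """
    sims = source_vecs @ query_vec / (
        np.linalg.norm(source_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    votes = {}
    for i in np.argsort(sims)[::-1][:k]:      # k most similar source videos
        for tag in source_tags[i]:
            votes[tag] = votes.get(tag, 0.0) + sims[i]
    return sorted(votes, key=votes.get, reverse=True)

# Toy example: the query resembles the first two source videos.
vids = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
tags = [{"birthday"}, {"party"}, {"skateboard"}]
print(propagate_tags(np.array([1.0, 0.0]), vids, tags, k=2))
# → ['birthday', 'party']
```

Once every unlabeled video carries a ranked tag list like this, an off-the-shelf language model can score the tags against an event query, which is what makes retrieval with zero visual examples possible.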