Multimedia in Asia
The Pacific-Rim Conference on Multimedia (PCM) is the major annual international conference in Asia organized as a forum for the dissemination of state-of-the-art technological advances and research results in the fields of theoretical, experimental, and applied multimedia analysis and processing. The conference calls for research papers reporting original investigation results and industrial-track papers reporting real-world multimedia applications and system development experiences. The conference also solicits proposals for tutorials on enabling multimedia technologies, and calls for dedicated special session proposals focusing on emerging challenges in the multimedia arena.

An Unforgettable Experience
PCM 2014 will be held in Kuching, Sarawak, Malaysia. Sarawak, also known as the “Land of the Hornbills”, offers an unforgettable mix of culture- and nature-based travel experiences. Tourists can indulge themselves in a wide range of attractions, including enjoying the charms of historic Kuching, a beautiful waterfront city; exploring the massive cave systems of Mulu National Park; jungle trekking and wildlife watching at Bako National Park; and traveling upriver to visit the Iban and experience longhouse life.
Topics of Interest

Submissions
PCM 2014 is seeking high-quality submissions in the broad field of multimedia. PCM 2014 accepts full, short and demo papers, and there will be awards in each of these three categories. See the website for further details: http://conference.cs.cityu.edu.hk/pcm2014/

The ICMR2014 paper Composite Concept Discovery for Zero-Shot Video Event Detection by Amirhossein Habibian, Thomas Mensink and Cees Snoek is now available. We consider automated detection of events in video without the use of any visual training examples. A common approach is to represent videos as classification scores obtained from a vocabulary of pre-trained concept classifiers. Where others construct the vocabulary by training individual concept classifiers, we propose to train classifiers for combinations of concepts composed by Boolean logic operators. We call these concept combinations composite concepts and contribute an algorithm that automatically discovers them from existing video-level concept annotations. We discover composite concepts by jointly optimizing the accuracy of concept classifiers and their effectiveness for detecting events. We demonstrate that by combining concepts into composite concepts, we can train more accurate classifiers for the concept vocabulary, which leads to improved zero-shot event detection. Moreover, we demonstrate that by using different logic operators, namely “AND” and “OR”, we discover different types of composite concepts, which are complementary for zero-shot event detection. We perform a search for 20 events in 41K web videos from two test sets of the challenging TRECVID Multimedia Event Detection 2013 corpus. The experiments demonstrate the superior performance of the discovered composite concepts, compared to present-day alternatives, for zero-shot event detection.
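
As a rough illustration of how Boolean operators can turn existing video-level annotations into labels for a composite concept, the Python sketch below combines binary annotations with AND and OR. The concept names, the toy annotations and the helper `composite_labels` are hypothetical. The paper's discovery algorithm, which jointly optimizes classifier accuracy and event detection effectiveness, is not reproduced here.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical video-level annotations: concept name -> one 0/1 label per video.
annotations: Dict[str, List[int]] = {
    "bike": [1, 0, 1, 0],
    "ride": [1, 1, 0, 0],
}


def composite_labels(
    concepts: List[str],
    op: Callable[[Iterable[bool]], bool],
    annotations: Dict[str, List[int]],
) -> List[int]:
    """Combine per-video labels of several concepts with a Boolean operator.

    Pass `all` for an AND-composite and `any` for an OR-composite.
    """
    columns = [annotations[c] for c in concepts]
    # zip(*columns) yields, per video, the labels of all selected concepts.
    return [int(op(bool(v) for v in row)) for row in zip(*columns)]


# Labels on which a classifier for the composite concept could be trained.
and_labels = composite_labels(["bike", "ride"], all, annotations)
or_labels = composite_labels(["bike", "ride"], any, annotations)
```

In this toy setting the AND-composite is positive only for videos annotated with both concepts, while the OR-composite pools the positives of either one, which hints at why the two operators can yield different types of composites.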

The ICMR2014 paper by Amirhossein Habibian and Cees Snoek entitled Stop-Frame Removal Improves Web Video Classification is now available. Web videos available on sharing sites like YouTube are becoming an alternative to the manually annotated training data that is necessary for creating video classifiers. However, when looking into web videos, we observe that they contain several irrelevant frames that may randomly appear in any video, e.g., blank and overexposed frames. We call these irrelevant frames stop-frames and propose a simple algorithm to identify and exclude them during classifier training. Stop-frames might appear in any video, so it is hard to recognize their category. Therefore we identify stop-frames as those frames that are commonly misclassified by any concept classifier. Our experiments demonstrate that using our algorithm improves classification accuracy by 60% and 24% in terms of mean average precision for an event detection and a concept detection benchmark, respectively.
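
To make the idea of "frames that are commonly misclassified by any concept classifier" concrete, here is a minimal Python sketch. It is an assumption-laden illustration rather than the paper's algorithm: the function `find_stop_frames`, the use of the symmetric difference to count errors, and the `min_error_rate` threshold are all hypothetical choices.

```python
from typing import List, Set


def find_stop_frames(
    predicted: List[Set[str]],
    labels: List[Set[str]],
    n_concepts: int,
    min_error_rate: float = 0.5,
) -> List[int]:
    """Return indices of frames that many concept classifiers get wrong.

    predicted[i]: concepts whose classifier fired on frame i.
    labels[i]:    concepts frame i inherits from its video annotation.
    A frame is flagged when the fraction of classifiers that misclassify it
    (false positives plus misses) reaches min_error_rate (assumed threshold).
    """
    stop_frames = []
    for i, (pred, true) in enumerate(zip(predicted, labels)):
        errors = len(pred ^ true)  # symmetric difference counts both error types
        if errors / n_concepts >= min_error_rate:
            stop_frames.append(i)
    return stop_frames


# Toy example with four hypothetical concepts: dog, car, beach, cake.
predicted = [{"dog"}, {"dog", "car", "beach"}, {"car", "beach"}]
labels = [{"dog"}, {"cake"}, {"car"}]
stop_frames = find_stop_frames(predicted, labels, n_concepts=4)
```

Frames flagged this way would then be left out when training the classifiers.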

The ICMR2014 paper Few-Example Video Event Retrieval Using Tag Propagation by Masoud Mazloom, Xirong Li and Cees Snoek is now available. An emerging topic in multimedia retrieval is to detect a complex event in video using only a handful of video examples. Different from existing work, which learns a ranker from positive video examples and hundreds of negative examples, we aim to query web video for events using zero or only a few visual examples. To that end, we propose in this paper a tag-based video retrieval system which propagates tags from a tagged video source to an unlabeled video collection without the need for any training examples. Our algorithm is based on weighted frequency neighbor voting using concept vector similarity. Once tags are propagated to unlabeled video, we can rely on off-the-shelf language models to rank these videos by tag similarity. We study the behavior of our tag-based video event retrieval system by performing three experiments on web videos from the TRECVID multimedia event detection corpus, with zero, one and multiple query examples, and show that it beats a recent alternative.
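
The neighbor voting step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: cosine similarity stands in for the concept vector similarity, each neighbor's vote is weighted by that similarity, and the names `cosine`, `propagate_tags` and the parameter `k` are hypothetical. The exact weighting used in the paper may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two concept score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_tags(
    query: List[float],
    tagged: List[Tuple[List[float], List[str]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Propagate tags to an unlabeled video from its k most similar tagged videos."""
    neighbors = sorted(tagged, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    votes: Dict[str, float] = defaultdict(float)
    for vector, tags in neighbors:
        weight = cosine(query, vector)  # similarity-weighted vote (assumption)
        for tag in tags:
            votes[tag] += weight
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


# Toy tagged source: (concept vector, tags) pairs.
tagged_source = [
    ([1.0, 0.0], ["dog", "park"]),
    ([0.9, 0.1], ["dog"]),
    ([0.0, 1.0], ["cake"]),
]
ranked_tags = propagate_tags([1.0, 0.0], tagged_source, k=2)
```

The resulting weighted tag list is what a text-based ranking model could then match against an event description.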

The ICMR2014 paper ISOMER: Informative Segment Observations for Multimedia Event Recounting by Chen Sun, Brian Burns, Ram Nevatia, Cees G. M. Snoek, Bob Bolles, Greg Myers, Wen Wang and Eric Yeh is now available. This paper describes a system for multimedia event detection and recounting. The goal is to detect a high-level event class in unconstrained web videos and generate an event-oriented summarization for display to users. For this purpose, we detect informative segments and collect observations for them, leading to our ISOMER system. We combine a large collection of both low-level and semantic-level visual and audio features for event detection. For event recounting, we propose a novel approach to identify event-oriented discriminative video segments and their descriptions with a linear SVM event classifier. User-friendly concepts including objects, actions, scenes, speech and optical character recognition are used in generating descriptions. We also develop several mapping and filtering strategies to cope with noisy concept detectors. Our system performed competitively in the TRECVID 2013 Multimedia Event Detection task with nearly 100,000 videos and was the highest performer in the TRECVID 2013 Multimedia Event Recounting task.
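
One plausible way to use a linear classifier for recounting is to score each video segment with the event weights and report the concept that contributes most to the best segments. The Python sketch below illustrates that idea only; the weights, concept names and the helper `recount` are hypothetical, and the ISOMER system's actual mapping and filtering strategies are not shown.

```python
from typing import List, Tuple


def segment_score(weights: List[float], features: List[float]) -> float:
    """Linear event score of one segment (bias term omitted for brevity)."""
    return sum(w * x for w, x in zip(weights, features))


def recount(
    weights: List[float],
    concept_names: List[str],
    segments: List[List[float]],
    top_k: int = 1,
) -> List[Tuple[int, str]]:
    """Return (segment index, most supportive concept) for the top_k segments."""
    order = sorted(
        range(len(segments)),
        key=lambda i: segment_score(weights, segments[i]),
        reverse=True,
    )[:top_k]
    evidence = []
    for i in order:
        # Per-concept contribution to the linear score is weight * detector score.
        contributions = [
            (w * x, name) for w, x, name in zip(weights, segments[i], concept_names)
        ]
        evidence.append((i, max(contributions)[1]))
    return evidence


# Hypothetical weights of a linear event classifier over three concept detectors.
weights = [2.0, -1.0, 0.5]
concept_names = ["cake", "car", "singing"]
segments = [[0.1, 0.9, 0.2], [0.8, 0.1, 0.6], [0.3, 0.3, 0.3]]
evidence = recount(weights, concept_names, segments)
```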

We organize the first edition of the ASCI course on computer vision by learning. Forty PhD students from the Netherlands receive updates on invariants, deep nets, action localization, object tracking, attribute representations, and on Monday an invited tutorial by Shih-Fu Chang from Columbia University.

Four papers were accepted by the leading IEEE Conference on Computer Vision and Pattern Recognition (CVPR). This is a new Dutch record. CVPR is the only conference in the Google Scholar top 100 of most-cited sources, which otherwise consists only of journals. The list starts with Nature, followed by many major journals from other fields such as PLoS One at 36, Nature Neuroscience at 73, and Astronomy and Astrophysics at 99. It is no surprise that the only conference is in computer science, as progress in this field is fast.

The accepted papers are:

  • Locality in Generic Instance Search from One Example. Ran Tao, Efstratios Gavves, Cees G. M. Snoek, and Arnold W.M. Smeulders.
  • Fisher and VLAD with FLAIR. Koen E.A. van de Sande, Cees G.M. Snoek, and Arnold W.M. Smeulders.
  • Co-Occurrence Statistics for Zero-Shot Classification. Thomas Mensink, Efstratios Gavves, and Cees G.M. Snoek.
  • Action Localization by Tubelets from Motion. Mihir Jain, Jan C. van Gemert, Patrick Bouthemy, Hervé Jégou and Cees G.M. Snoek.

The papers will be presented during the IEEE Conference on Computer Vision and Pattern Recognition, from 24 to 27 June 2014 in Columbus, Ohio, USA.

Arnold Smeulders, Laurens van der Maaten and I are organizing a new Ph.D. course on Computer Vision by Learning. The first edition will take place from March 25 to March 31 in Amsterdam. This ASCI course is especially meant for Ph.D. students who have basic familiarity with computer vision, image processing, and pattern recognition and want to upgrade their knowledge and machinery to the state of the art, with direct utility in their own research. The course focuses on the challenges of computer vision by learning. We address the theoretical foundations of machine learning in conjunction with computer vision and present algorithms that achieve state-of-the-art performance while maintaining efficient execution with minimal supervision. We explain and emphasize machine learning for vision tasks like concept detection with deep learning, fine-grained categorization using kernel pooling, semantic segmentation with conditional random fields, object tracking by structured SVMs, event recognition by random forests and retrieval from a single image by metric learning. We give an overview of the latest developments and future trends in the field on the basis of several recent challenges, including the TRECVID and ImageNet competitions, the leading competitions for visual search engines based on computer vision by learning, and we indicate how to obtain improvements in the near future. The course will close with an invited tutorial by the renowned Prof. Shih-Fu Chang from Columbia University, USA.

UvA-Euvision Team Presents at ImageNet Workshop

Amidst fierce competition, the UvA-Euvision team participated in the new ImageNet object detection task, where the goal is to tell what object is in an image and where it is located. The organizers defined 200 basic-level categories for this task (e.g. accordion, airplane, ant, antelope and apple). The categories were carefully chosen considering different factors such as object scale, level of image clutter, average number of object instances, and several others.

The number of categories won by the University of Amsterdam – Euvision Technologies team is 130, out of 200.

The purpose of the workshop is to present the methods and results of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2013. Challenge participants with the most successful and innovative entries are invited to present, and the UvA-Euvision team is amongst them.

The ImageNet 2013 Detection Task

To summarize our participation, for task 1, the ILSVRC2013 detection task on 200 classes, we submit two runs. Our runs utilize a new way of efficient encoding. The method is currently under submission, therefore we cannot include identifying details on this part. The submission utilizes selective search (Uijlings et al., IJCV 2013) to create many candidate boxes per image. These boxes are represented by extracting densely sampled color SIFT descriptors (van de Sande et al., PAMI 2010) at multiple scales. The box is then encoded with our new efficient coding. The method is faster than bag-of-words with hard assignment and outperforms it in terms of accuracy. Each box is encoded with a multi-level spatial pyramid. Training follows a standard negative mining procedure based on previous work. The first run is context-free. The 200 models are trained independently of one another. The second run utilizes a convolutional network, trained on the DET dataset, to compute a prior for the presence of an object in the image.

The ImageNet 2013 Classification Task

For task 2, the ILSVRC2013 classification task on 1,000 classes, we submit two runs. Our showcase run performs all evaluations of the test set on an iPhone 5s at a rate of 2 images per second, whereas on the iPhone 4 it has a performance of 1 image per 10 seconds. The results in the main run are based on the fusion of convolutional networks. The networks are compatible with the networks that won this task last year (Krizhevsky et al., NIPS 2012), where our networks have 76M free parameters. The parameters have been trained for 300 epochs on a single GPU. For training in both runs we have used the ImageNet 1,000 dataset. No (pre-)training on other datasets has been performed.
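
The post does not say how the convolutional networks are fused. A common and simple option is to average the class probabilities of the individual networks and then report the top-scoring classes. The Python sketch below shows that option only as an assumption; the helpers `fuse` and `top_k`, the labels and the probabilities are all made up for illustration.

```python
from typing import List


def fuse(per_network_probs: List[List[float]]) -> List[float]:
    """Late fusion by averaging class probabilities over networks (assumed scheme)."""
    n_networks = len(per_network_probs)
    return [sum(column) / n_networks for column in zip(*per_network_probs)]


def top_k(scores: List[float], labels: List[str], k: int = 5) -> List[str]:
    """Return the k labels with the highest fused score, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [labels[i] for i in order]


# Toy example with three classes and two hypothetical networks.
labels = ["cat", "dog", "car"]
network_a = [0.6, 0.3, 0.1]
network_b = [0.2, 0.6, 0.2]
fused = fuse([network_a, network_b])
best_two = top_k(fused, labels, k=2)
```

With k set to 5, the same `top_k` call would produce the kind of top-5 list that the classification task evaluates.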

Demo on iPhone Available

At the ILSVRC2013 workshop we will release an app in the App Store performing instant interactive photo classification (take a picture, see the top 5 ImageNet scores). This app uses the same engine as our Impala app that is already available at: https://itunes.apple.com/us/app/impala/id736620048 . The Impala user interface was designed around the experience that the iPhone works for you, but it can still be optimized. The current results reflect the match of the training data with the personal data on the iPhone.

December 7 in Sydney, Australia

The ImageNet workshop is held on December 7 in Sydney. The workshop is organized in conjunction with the International Conference on Computer Vision.

AUTOMATIC IMAGE CLASSIFICATION ON YOUR PHONE

Impala by Euvision is the first app in the world that automatically sorts the photos on your phone. You do not have to manually label each and every one of them. Impala “looks” into your images and videos and recognizes what they are about.
