{"id":1571,"date":"2016-10-26T16:20:08","date_gmt":"2016-10-26T15:20:08","guid":{"rendered":"http:\/\/www.ceessnoek.info\/?p=1571"},"modified":"2016-10-26T20:18:06","modified_gmt":"2016-10-26T19:18:06","slug":"spot-on-action-localization-from-pointly-supervised-proposals","status":"publish","type":"post","link":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/","title":{"rendered":"Spot On: Action Localization from Pointly-Supervised Proposals"},"content":{"rendered":"<p>The ECCV 2016 paper\u00a0<em>Spot On: Action Localization from Pointly-Supervised Proposals<\/em> by <a href=\"https:\/\/staff.fnwi.uva.nl\/p.s.m.mettes\/\">Pascal Mettes<\/a>, Jan van Gemert and Cees Snoek is <a href=\"http:\/\/isis-data.science.uva.nl\/cgmsnoek\/pub\/mettes-pointly-eccv2016.pdf\">now\u00a0available<\/a>. We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier demanding carefully annotated box annotations at train time. Annotating action boxes in video is cumbersome, tedious, and error-prone. Rather than annotating boxes, we propose to annotate actions in video with points on a sparse subset of frames only. We introduce an overlap measure between action proposals and points and incorporate them all into the objective of a non-convex Multiple Instance Learning optimization. Experimental evaluation on the UCF Sports and UCF 101 datasets shows that (i) spatio-temporal proposals can be used to train classifiers while retaining the localization performance, (ii) point annotations yield results comparable to box annotations while being significantly faster to annotate, (iii) with a minimum amount of supervision, our approach is competitive with the state-of-the-art. 
Finally, we introduce spatio-temporal action annotations on the train and test videos of Hollywood2, resulting in Hollywood2Tubes, available at <a href=\"http:\/\/tinyurl.com\/hollywood2tubes\">tinyurl.com\/hollywood2tubes<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1572\" src=\"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png\" alt=\"screen-shot-2016-10-26-at-5-18-08-pm\" width=\"1380\" height=\"332\" srcset=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png 1380w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM-300x72.png 300w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM-768x185.png 768w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM-1024x246.png 1024w\" sizes=\"auto, (max-width: 1380px) 100vw, 1380px\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The ECCV 2016 paper\u00a0Spot On: Action Localization from Pointly-Supervised Proposals by Pascal Mettes, Jan van Gemert and Cees Snoek is now\u00a0available. We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier demanding carefully annotated box annotations at train time. 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-1571","post","type-post","status-publish","format-standard","hentry","category-science"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Spot On: Action Localization from Pointly-Supervised Proposals - Cees Snoek<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spot On: Action Localization from Pointly-Supervised Proposals - Cees Snoek\" \/>\n<meta property=\"og:description\" content=\"The ECCV 2016 paper\u00a0Spot On: Action Localization from Pointly-Supervised Proposals by Pascal Mettes, Jan van Gemert and Cees Snoek is now\u00a0available. We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier demanding carefully annotated box annotations at train time. 
[&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/\" \/>\n<meta property=\"og:site_name\" content=\"Cees Snoek\" \/>\n<meta property=\"article:published_time\" content=\"2016-10-26T15:20:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-10-26T19:18:06+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png\" \/>\n<meta name=\"author\" content=\"Cees\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Cees\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/\",\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/\",\"name\":\"Spot On: Action Localization from Pointly-Supervised Proposals - Cees 
Snoek\",\"isPartOf\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png\",\"datePublished\":\"2016-10-26T15:20:08+00:00\",\"dateModified\":\"2016-10-26T19:18:06+00:00\",\"author\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#primaryimage\",\"url\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png\",\"contentUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png\",\"width\":1380,\"height\":332},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.ceessnoek.info\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Spot On: Action Localization from Pointly-Supervised 
Proposals\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.ceessnoek.info\/#website\",\"url\":\"https:\/\/www.ceessnoek.info\/\",\"name\":\"Cees Snoek\",\"description\":\"research on video and image ai\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.ceessnoek.info\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\",\"name\":\"Cees\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"caption\":\"Cees\"},\"sameAs\":[\"http:\/\/www.CeesSnoek.info\"],\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Spot On: Action Localization from Pointly-Supervised Proposals - Cees Snoek","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/","og_locale":"en_US","og_type":"article","og_title":"Spot On: Action Localization from Pointly-Supervised Proposals - Cees Snoek","og_description":"The ECCV 2016 paper\u00a0Spot On: Action Localization from Pointly-Supervised Proposals by Pascal Mettes, Jan van Gemert and Cees Snoek is now\u00a0available. We strive for spatio-temporal localization of actions in videos. 
The state-of-the-art relies on action proposals at test time and selects the best one with a classifier demanding carefully annotated box annotations at train time. [&hellip;]","og_url":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/","og_site_name":"Cees Snoek","article_published_time":"2016-10-26T15:20:08+00:00","article_modified_time":"2016-10-26T19:18:06+00:00","og_image":[{"url":"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png","type":"","width":"","height":""}],"author":"Cees","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Cees","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/","url":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/","name":"Spot On: Action Localization from Pointly-Supervised Proposals - Cees 
Snoek","isPartOf":{"@id":"https:\/\/www.ceessnoek.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#primaryimage"},"image":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#primaryimage"},"thumbnailUrl":"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png","datePublished":"2016-10-26T15:20:08+00:00","dateModified":"2016-10-26T19:18:06+00:00","author":{"@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1"},"breadcrumb":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#primaryimage","url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png","contentUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2016\/10\/Screen-Shot-2016-10-26-at-5.18.08-PM.png","width":1380,"height":332},{"@type":"BreadcrumbList","@id":"https:\/\/www.ceessnoek.info\/index.php\/spot-on-action-localization-from-pointly-supervised-proposals\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ceessnoek.info\/"},{"@type":"ListItem","position":2,"name":"Spot On: Action Localization from Pointly-Supervised Proposals"}]},{"@type":"WebSite","@id":"https:\/\/www.ceessnoek.info\/#website","url":"https:\/\/www.ceessnoek.info\/","name":"Cees Snoek","description":"research on video and image 
ai","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ceessnoek.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1","name":"Cees","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","caption":"Cees"},"sameAs":["http:\/\/www.CeesSnoek.info"],"url":"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/1571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/comments?post=1571"}],"version-history":[{"count":3,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/1571\/revisions"}],"predecessor-version":[{"id":1575,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/1571\/revisions\/1575"}],"wp:attachment":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/media?parent=1571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/categories?post=1571"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\
/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/tags?post=1571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}