{"id":2170,"date":"2020-03-31T07:53:47","date_gmt":"2020-03-31T06:53:47","guid":{"rendered":"http:\/\/www.ceessnoek.info\/?p=2170"},"modified":"2020-03-31T07:54:01","modified_gmt":"2020-03-31T06:54:01","slug":"cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions","status":"publish","type":"post","link":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/","title":{"rendered":"CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize Actions"},"content":{"rendered":"\n<p>The CVPR 2020 paper <em>ActionBytes: Learning from Trimmed Videos to Localize Actions<\/em> by Mihir Jain, Amir Ghodrati and Cees Snoek is <a href=\"http:\/\/isis-data.science.uva.nl\/cgmsnoek\/pub\/jain-actionbytes-cvpr2020.pdf\">now available<\/a>. This paper tackles the problem of localizing actions in long untrimmed videos. Different from existing works, which all use annotated untrimmed videos during training, we learn only from short trimmed videos. This enables learning from large-scale datasets originally designed for action classification. We propose a method to train an action localization network that segments a video into interpretable fragments, which we call ActionBytes. Our method jointly learns to cluster ActionBytes and trains the localization network using the cluster assignments as pseudo-labels. By doing so, we train on short trimmed videos that become untrimmed for ActionBytes. In isolation, or when merged, the ActionBytes also serve as effective action proposals. Experiments demonstrate that our boundary-guided training generalizes to unknown action classes and localizes actions in long videos of Thumos14, MultiThumos, and ActivityNet1.2. 
Furthermore, we show the advantage of ActionBytes for zero-shot localization as well as for traditional weakly supervised localization, which trains on long videos, and achieve state-of-the-art results.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"375\" src=\"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020-1024x375.png\" alt=\"\" class=\"wp-image-2171\" srcset=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020-1024x375.png 1024w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020-300x110.png 300w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020-768x281.png 768w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020.png 1454w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The CVPR 2020 paper ActionBytes: Learning from Trimmed Videos to Localize Actions by Mihir Jain, Amir Ghodrati and Cees Snoek is now available. This paper tackles the problem of localizing actions in long untrimmed videos. Different from existing works, which all use annotated untrimmed videos during training, we learn only from short trimmed videos. 
This [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-2170","post","type-post","status-publish","format-standard","hentry","category-science"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize Actions - Cees Snoek<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize Actions - Cees Snoek\" \/>\n<meta property=\"og:description\" content=\"The CVPR 2020 paper ActionBytes: Learning from Trimmed Videos to Localize Actions by Mihir Jain, Amir Ghodrati and Cees Snoek is now available. This paper tackles the problem of localizing actions in long untrimmed videos. Different from existing works, which all use annotated untrimmed videos during training, we learn only from short trimmed videos. 
This [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/\" \/>\n<meta property=\"og:site_name\" content=\"Cees Snoek\" \/>\n<meta property=\"article:published_time\" content=\"2020-03-31T06:53:47+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-03-31T06:54:01+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020-1024x375.png\" \/>\n<meta name=\"author\" content=\"Cees\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Cees\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/\",\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/\",\"name\":\"CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize Actions - Cees 
Snoek\",\"isPartOf\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020-1024x375.png\",\"datePublished\":\"2020-03-31T06:53:47+00:00\",\"dateModified\":\"2020-03-31T06:54:01+00:00\",\"author\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#primaryimage\",\"url\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020.png\",\"contentUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020.png\",\"width\":1454,\"height\":532},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.ceessnoek.info\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize 
Actions\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.ceessnoek.info\/#website\",\"url\":\"https:\/\/www.ceessnoek.info\/\",\"name\":\"Cees Snoek\",\"description\":\"research on video and image ai\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.ceessnoek.info\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\",\"name\":\"Cees\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"caption\":\"Cees\"},\"sameAs\":[\"http:\/\/www.CeesSnoek.info\"],\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize Actions - Cees Snoek","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/","og_locale":"en_US","og_type":"article","og_title":"CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize Actions - Cees Snoek","og_description":"The CVPR 2020 paper ActionBytes: Learning from Trimmed Videos to Localize Actions by Mihir Jain, Amir Ghodrati and Cees Snoek is now available. 
This paper tackles the problem of localizing actions in long untrimmed videos. Different from existing works, which all use annotated untrimmed videos during training, we learn only from short trimmed videos. This [&hellip;]","og_url":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/","og_site_name":"Cees Snoek","article_published_time":"2020-03-31T06:53:47+00:00","article_modified_time":"2020-03-31T06:54:01+00:00","og_image":[{"url":"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020-1024x375.png","type":"","width":"","height":""}],"author":"Cees","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Cees","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/","url":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/","name":"CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize Actions - Cees 
Snoek","isPartOf":{"@id":"https:\/\/www.ceessnoek.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#primaryimage"},"image":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#primaryimage"},"thumbnailUrl":"http:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020-1024x375.png","datePublished":"2020-03-31T06:53:47+00:00","dateModified":"2020-03-31T06:54:01+00:00","author":{"@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1"},"breadcrumb":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#primaryimage","url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020.png","contentUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2020\/03\/jain-actionbytes-cvpr2020.png","width":1454,"height":532},{"@type":"BreadcrumbList","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-4-4-actionbytes-learning-from-trimmed-videos-to-localize-actions\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ceessnoek.info\/"},{"@type":"ListItem","position":2,"name":"CVPR 4\/4: ActionBytes: Learning from Trimmed Videos to Localize Actions"}]},{"@type":"WebSite","@id":"https:\/\/www.ceessnoek.info\/#website","url":"https:\/\/www.ceessnoek.info\/","name":"Cees Snoek","description":"research on video and image 
ai","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ceessnoek.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1","name":"Cees","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","caption":"Cees"},"sameAs":["http:\/\/www.CeesSnoek.info"],"url":"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/2170","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/comments?post=2170"}],"version-history":[{"count":1,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/2170\/revisions"}],"predecessor-version":[{"id":2172,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/2170\/revisions\/2172"}],"wp:attachment":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/media?parent=2170"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/categories?post=2170"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\
/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/tags?post=2170"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}