{"id":3886,"date":"2021-08-17T15:22:40","date_gmt":"2021-08-17T14:22:40","guid":{"rendered":"https:\/\/www.ceessnoek.info\/?p=3886"},"modified":"2021-08-17T15:36:06","modified_gmt":"2021-08-17T14:36:06","slug":"iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale","status":"publish","type":"post","link":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/","title":{"rendered":"ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller Scale"},"content":{"rendered":"\n<p>The ICCV 2021 paper &#8220;Motion-Augmented Self-Training for Video Recognition at Smaller Scale&#8221; by Kirill Gavrilyuk, Mihir Jain, Ilia Karmanov\u00a0and Cees Snoek is\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2108.03656\">n<\/a><a href=\"https:\/\/isis-data.science.uva.nl\/cgmsnoek\/pub\/gavrilyuk-motionfit-iccv2021.pdf\">ow available.<\/a> The goal of this paper is to self-train a 3D convolutional neural network on an unlabeled video collection for deployment on small-scale video collections. As smaller video datasets benefit more from motion than appearance, we strive to train our network using optical flow, but avoid its computation during inference. We propose the first motion-augmented self-training regime, which we call MotionFit. We start with supervised training of a motion model on a small, labeled video collection. With the motion model we generate pseudo-labels for a large unlabeled video collection, which enables us to transfer knowledge by learning to predict these pseudo-labels with an appearance model. Moreover, we introduce a multi-clip loss as a simple yet efficient way to improve the quality of the pseudo-labeling, even without additional auxiliary tasks. We also take into consideration the temporal granularity of videos during self-training of the appearance model, which was missed in previous works. 
As a result we obtain a strong motion-augmented representation model suited for video downstream tasks like action recognition and clip retrieval. On small-scale video datasets, MotionFit outperforms alternatives for knowledge transfer by 5%-8%, video-only self-supervision by 1%-7% and semi-supervised learning by 9%-18% using the same amount of class labels.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"699\" src=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit-1024x699.png\" alt=\"\" class=\"wp-image-3840\" srcset=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit-1024x699.png 1024w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit-300x205.png 300w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit-768x524.png 768w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit.png 1052w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The ICCV 2021 paper &#8220;Motion-Augmented Self-Training for Video Recognition at Smaller Scale&#8221; by Kirill Gavrilyuk, Mihir Jain, Ilia Karmanov\u00a0and Cees Snoek is\u00a0now available. The goal of this paper is to self-train a 3D convolutional neural network on an unlabeled video collection for deployment on small-scale video collections. 
As smaller video datasets benefit more from motion [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-3886","post","type-post","status-publish","format-standard","hentry","category-science"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller Scale - Cees Snoek<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller Scale - Cees Snoek\" \/>\n<meta property=\"og:description\" content=\"The ICCV 2021 paper &#8220;Motion-Augmented Self-Training for Video Recognition at Smaller Scale&#8221; by Kirill Gavrilyuk, Mihir Jain, Ilia Karmanov\u00a0and Cees Snoek is\u00a0now available. The goal of this paper is to self-train a 3D convolutional neural network on an unlabeled video collection for deployment on small-scale video collections. 
As smaller video datasets benefit more from motion [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/\" \/>\n<meta property=\"og:site_name\" content=\"Cees Snoek\" \/>\n<meta property=\"article:published_time\" content=\"2021-08-17T14:22:40+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-08-17T14:36:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit-1024x699.png\" \/>\n<meta name=\"author\" content=\"Cees\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Cees\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/\",\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/\",\"name\":\"ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller Scale - Cees 
Snoek\",\"isPartOf\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit-1024x699.png\",\"datePublished\":\"2021-08-17T14:22:40+00:00\",\"dateModified\":\"2021-08-17T14:36:06+00:00\",\"author\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#primaryimage\",\"url\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit.png\",\"contentUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit.png\",\"width\":1052,\"height\":718},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.ceessnoek.info\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller 
Scale\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.ceessnoek.info\/#website\",\"url\":\"https:\/\/www.ceessnoek.info\/\",\"name\":\"Cees Snoek\",\"description\":\"research on video and image ai\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.ceessnoek.info\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\",\"name\":\"Cees\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"caption\":\"Cees\"},\"sameAs\":[\"http:\/\/www.CeesSnoek.info\"],\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller Scale - Cees Snoek","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/","og_locale":"en_US","og_type":"article","og_title":"ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller Scale - Cees Snoek","og_description":"The ICCV 2021 paper &#8220;Motion-Augmented Self-Training for Video Recognition at Smaller Scale&#8221; by Kirill Gavrilyuk, Mihir Jain, Ilia Karmanov\u00a0and Cees Snoek is\u00a0now available. The goal of this paper is to self-train a 3D convolutional neural network on an unlabeled video collection for deployment on small-scale video collections. As smaller video datasets benefit more from motion [&hellip;]","og_url":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/","og_site_name":"Cees Snoek","article_published_time":"2021-08-17T14:22:40+00:00","article_modified_time":"2021-08-17T14:36:06+00:00","og_image":[{"url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit-1024x699.png","type":"","width":"","height":""}],"author":"Cees","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Cees","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/","url":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/","name":"ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller Scale - Cees Snoek","isPartOf":{"@id":"https:\/\/www.ceessnoek.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#primaryimage"},"image":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#primaryimage"},"thumbnailUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit-1024x699.png","datePublished":"2021-08-17T14:22:40+00:00","dateModified":"2021-08-17T14:36:06+00:00","author":{"@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1"},"breadcrumb":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motion-augmented-self-training-for-video-recognition-at-smaller-scale\/#primaryimage","url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit.png","contentUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/05\/gavrilyuk-motionfit.png","width":1052,"height":718},{"@type":"BreadcrumbList","@id":"https:\/\/www.ceessnoek.info\/index.php\/iccv-2021-motio
n-augmented-self-training-for-video-recognition-at-smaller-scale\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ceessnoek.info\/"},{"@type":"ListItem","position":2,"name":"ICCV 2021: Motion-Augmented Self-Training for Video Recognition at Smaller Scale"}]},{"@type":"WebSite","@id":"https:\/\/www.ceessnoek.info\/#website","url":"https:\/\/www.ceessnoek.info\/","name":"Cees Snoek","description":"research on video and image ai","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ceessnoek.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1","name":"Cees","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","caption":"Cees"},"sameAs":["http:\/\/www.CeesSnoek.info"],"url":"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/3886","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/comments?post=3886"}],"version-history":[{"count":2,"href":"https:\/\/www.ceessnoek.info\/index.ph
p\/wp-json\/wp\/v2\/posts\/3886\/revisions"}],"predecessor-version":[{"id":3891,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/3886\/revisions\/3891"}],"wp:attachment":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/media?parent=3886"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/categories?post=3886"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/tags?post=3886"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}