{"id":4050,"date":"2022-06-13T08:43:29","date_gmt":"2022-06-13T07:43:29","guid":{"rendered":"https:\/\/www.ceessnoek.info\/?p=4050"},"modified":"2022-06-13T08:44:13","modified_gmt":"2022-06-13T07:44:13","slug":"cvpr-2022-audio-adaptive-activity-recognition-across-video-domains","status":"publish","type":"post","link":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/","title":{"rendered":"CVPR 2022: Audio-Adaptive Activity Recognition Across Video Domains"},"content":{"rendered":"\n<p>The CVPR 2022 cam-ready for\u00a0<em>Audio-Adaptive Activity Recognition Across Video Domains<\/em> by <a href=\"https:\/\/xiaobai1217.github.io\">Yunhua Zhang<\/a>, Hazel Doughty, Ling Shao, Cees G. M. Snoek is <a href=\"https:\/\/arxiv.org\/abs\/2203.14240\">now available<\/a>. This paper strives for activity recognition under domain shift, for example caused by change of scenery or camera viewpoint. The leading approaches reduce the shift in activity appearance by adversarial training and self-supervised learning. Different from these vision-focused works we leverage activity sounds for domain adaptation as they have less variance across domains and can reliably indicate which activities are not happening. We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation as well as addressing shifts in the semantic distribution. To further eliminate domain-specific features and include domain-invariant activity sounds for recognition, an audio-infused recognizer is proposed, which effectively models the cross-modal interaction across domains. We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically. Experiments on this dataset, EPIC-Kitchens and CharadesEgo show the effectiveness of our approach. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png\"><img loading=\"lazy\" decoding=\"async\" width=\"828\" height=\"786\" src=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png\" alt=\"\" class=\"wp-image-3998\" srcset=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png 828w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive-300x285.png 300w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive-768x729.png 768w\" sizes=\"auto, (max-width: 828px) 100vw, 828px\" \/><\/a><\/figure>\n\n\n\n<p>Project page:\u00a0<a href=\"https:\/\/xiaobai1217.github.io\/Domain\">https:\/\/xiaobai1217.github.io\/DomainAdaptation<\/a><em>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The CVPR 2022 cam-ready for\u00a0Audio-Adaptive Activity Recognition Across Video Domains by Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek is now available. This paper strives for activity recognition under domain shift, for example caused by change of scenery or camera viewpoint. 
The leading approaches reduce the shift in activity appearance by adversarial training [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-4050","post","type-post","status-publish","format-standard","hentry","category-science"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>CVPR 2022: Audio-Adaptive Activity Recognition Across Video Domains - Cees Snoek<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"CVPR 2022: Audio-Adaptive Activity Recognition Across Video Domains - Cees Snoek\" \/>\n<meta property=\"og:description\" content=\"The CVPR 2022 cam-ready for\u00a0Audio-Adaptive Activity Recognition Across Video Domains by Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek is now available. This paper strives for activity recognition under domain shift, for example caused by change of scenery or camera viewpoint. 
The leading approaches reduce the shift in activity appearance by adversarial training [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/\" \/>\n<meta property=\"og:site_name\" content=\"Cees Snoek\" \/>\n<meta property=\"article:published_time\" content=\"2022-06-13T07:43:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-06-13T07:44:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png\" \/>\n<meta name=\"author\" content=\"Cees\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Cees\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/\",\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/\",\"name\":\"CVPR 2022: Audio-Adaptive Activity Recognition Across Video Domains - Cees 
Snoek\",\"isPartOf\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png\",\"datePublished\":\"2022-06-13T07:43:29+00:00\",\"dateModified\":\"2022-06-13T07:44:13+00:00\",\"author\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#primaryimage\",\"url\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png\",\"contentUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png\",\"width\":828,\"height\":786},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.ceessnoek.info\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"CVPR 2022: Audio-Adaptive Activity Recognition Across Video 
Domains\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.ceessnoek.info\/#website\",\"url\":\"https:\/\/www.ceessnoek.info\/\",\"name\":\"Cees Snoek\",\"description\":\"research on video and image ai\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.ceessnoek.info\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\",\"name\":\"Cees\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"caption\":\"Cees\"},\"sameAs\":[\"http:\/\/www.CeesSnoek.info\"],\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"CVPR 2022: Audio-Adaptive Activity Recognition Across Video Domains - Cees Snoek","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/","og_locale":"en_US","og_type":"article","og_title":"CVPR 2022: Audio-Adaptive Activity Recognition Across Video Domains - Cees Snoek","og_description":"The CVPR 2022 cam-ready for\u00a0Audio-Adaptive Activity Recognition Across Video Domains by Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek is now available. 
This paper strives for activity recognition under domain shift, for example caused by change of scenery or camera viewpoint. The leading approaches reduce the shift in activity appearance by adversarial training [&hellip;]","og_url":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/","og_site_name":"Cees Snoek","article_published_time":"2022-06-13T07:43:29+00:00","article_modified_time":"2022-06-13T07:44:13+00:00","og_image":[{"url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png","type":"","width":"","height":""}],"author":"Cees","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Cees","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/","url":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/","name":"CVPR 2022: Audio-Adaptive Activity Recognition Across Video Domains - Cees 
Snoek","isPartOf":{"@id":"https:\/\/www.ceessnoek.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#primaryimage"},"image":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#primaryimage"},"thumbnailUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png","datePublished":"2022-06-13T07:43:29+00:00","dateModified":"2022-06-13T07:44:13+00:00","author":{"@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1"},"breadcrumb":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#primaryimage","url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png","contentUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/03\/yunhua-audio-adaptive.png","width":828,"height":786},{"@type":"BreadcrumbList","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-audio-adaptive-activity-recognition-across-video-domains\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ceessnoek.info\/"},{"@type":"ListItem","position":2,"name":"CVPR 2022: Audio-Adaptive Activity Recognition Across Video Domains"}]},{"@type":"WebSite","@id":"https:\/\/www.ceessnoek.info\/#website","url":"https:\/\/www.ceessnoek.info\/","name":"Cees Snoek","description":"research on video and image 
ai","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ceessnoek.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1","name":"Cees","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","caption":"Cees"},"sameAs":["http:\/\/www.CeesSnoek.info"],"url":"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/4050","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/comments?post=4050"}],"version-history":[{"count":1,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/4050\/revisions"}],"predecessor-version":[{"id":4051,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/4050\/revisions\/4051"}],"wp:attachment":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/media?parent=4050"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/categories?post=4050"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\
/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/tags?post=4050"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}