{"id":4047,"date":"2022-06-13T08:36:34","date_gmt":"2022-06-13T07:36:34","guid":{"rendered":"https:\/\/www.ceessnoek.info\/?p=4047"},"modified":"2022-06-13T08:38:07","modified_gmt":"2022-06-13T07:38:07","slug":"cvpr-2022-tuber-tubelet-transformer-for-video-action-detection","status":"publish","type":"post","link":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/","title":{"rendered":"CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection"},"content":{"rendered":"\n<p>The CVPR 2022 cam-ready for <em>TubeR: Tubelet Transformer for Video Action Detection<\/em> by <a href=\"https:\/\/staff.fnwi.uva.nl\/j.zhao3\/\">Jiaojiao Zhao<\/a> et al. is <a href=\"https:\/\/arxiv.org\/abs\/2104.00969\">now available<\/a>. We propose TubeR: a simple solution for spatio-temporal video action detection. Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly detect an action tubelet in a video by simultaneously performing action localization and recognition from a single representation. TubeR learns a set of tubelet-queries and utilizes a tubelet-attention module to model the dynamic spatio-temporal nature of a video clip, which effectively reinforces the model capacity compared to using actor-positional hypotheses in the spatio-temporal space. For videos containing transitional states or scene changes, we propose a context-aware classification head to utilize short-term and long-term context to strengthen action classification, and an action switch regression head for detecting the precise temporal action extent. TubeR directly produces action tubelets with variable lengths and even maintains good results for long video clips. TubeR outperforms the previous state-of-the-art on commonly used action detection datasets AVA, UCF101-24 and JHMDB51-21. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1034\" height=\"660\" src=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png\" alt=\"\" class=\"wp-image-4014\" srcset=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png 1034w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber-300x191.png 300w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber-1024x654.png 1024w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber-768x490.png 768w\" sizes=\"auto, (max-width: 1034px) 100vw, 1034px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The CVPR 2022 cam-ready for TubeR: Tubelet Transformer for Video Action Detection by Jiaojiao Zhao et al. is now available. We propose TubeR: a simple solution for spatio-temporal video action detection. 
Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-4047","post","type-post","status-publish","format-standard","hentry","category-science"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection - Cees Snoek<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection - Cees Snoek\" \/>\n<meta property=\"og:description\" content=\"The CVPR 2022 cam-ready for TubeR: Tubelet Transformer for Video Action Detection by Jiaojiao Zhao et al. is now available. We propose TubeR: a simple solution for spatio-temporal video action detection. 
Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/\" \/>\n<meta property=\"og:site_name\" content=\"Cees Snoek\" \/>\n<meta property=\"article:published_time\" content=\"2022-06-13T07:36:34+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-06-13T07:38:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png\" \/>\n<meta name=\"author\" content=\"Cees\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Cees\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/\",\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/\",\"name\":\"CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection - Cees 
Snoek\",\"isPartOf\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png\",\"datePublished\":\"2022-06-13T07:36:34+00:00\",\"dateModified\":\"2022-06-13T07:38:07+00:00\",\"author\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#primaryimage\",\"url\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png\",\"contentUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png\",\"width\":1034,\"height\":660},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.ceessnoek.info\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.ceessnoek.info\/#website\",\"url\":\"https:\/\/www.ceessnoek.info\/\",\"name\":\"Cees Snoek\",\"description\":\"research on video and image 
ai\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.ceessnoek.info\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\",\"name\":\"Cees\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"caption\":\"Cees\"},\"sameAs\":[\"http:\/\/www.CeesSnoek.info\"],\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection - Cees Snoek","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/","og_locale":"en_US","og_type":"article","og_title":"CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection - Cees Snoek","og_description":"The CVPR 2022 cam-ready for TubeR: Tubelet Transformer for Video Action Detection by Jiaojiao Zhao et al. is now available. We propose TubeR: a simple solution for spatio-temporal video action detection. 
Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly [&hellip;]","og_url":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/","og_site_name":"Cees Snoek","article_published_time":"2022-06-13T07:36:34+00:00","article_modified_time":"2022-06-13T07:38:07+00:00","og_image":[{"url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png","type":"","width":"","height":""}],"author":"Cees","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Cees","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/","url":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/","name":"CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection - Cees 
Snoek","isPartOf":{"@id":"https:\/\/www.ceessnoek.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#primaryimage"},"image":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#primaryimage"},"thumbnailUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png","datePublished":"2022-06-13T07:36:34+00:00","dateModified":"2022-06-13T07:38:07+00:00","author":{"@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1"},"breadcrumb":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#primaryimage","url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png","contentUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2022\/04\/jj-tuber.png","width":1034,"height":660},{"@type":"BreadcrumbList","@id":"https:\/\/www.ceessnoek.info\/index.php\/cvpr-2022-tuber-tubelet-transformer-for-video-action-detection\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ceessnoek.info\/"},{"@type":"ListItem","position":2,"name":"CVPR 2022: TubeR: Tubelet Transformer for Video Action Detection"}]},{"@type":"WebSite","@id":"https:\/\/www.ceessnoek.info\/#website","url":"https:\/\/www.ceessnoek.info\/","name":"Cees Snoek","description":"research on video and image 
ai","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ceessnoek.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1","name":"Cees","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","caption":"Cees"},"sameAs":["http:\/\/www.CeesSnoek.info"],"url":"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/4047","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/comments?post=4047"}],"version-history":[{"count":2,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/4047\/revisions"}],"predecessor-version":[{"id":4049,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/4047\/revisions\/4049"}],"wp:attachment":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/media?parent=4047"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/categories?post=4047"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\
/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/tags?post=4047"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}