{"id":3883,"date":"2021-08-12T12:27:41","date_gmt":"2021-08-12T11:27:41","guid":{"rendered":"https:\/\/www.ceessnoek.info\/?p=3883"},"modified":"2021-08-12T12:27:43","modified_gmt":"2021-08-12T11:27:43","slug":"mm-2021-skeleton-contrastive-3d-action-representation-learning","status":"publish","type":"post","link":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/","title":{"rendered":"MM 2021: Skeleton-Contrastive 3D Action Representation Learning"},"content":{"rendered":"\n<p>The ACM Multimedia 2021 paper Skeleton-Contrastive 3D Action Representation Learning by <a href=\"https:\/\/fmthoker.github.io\">Fida Thoker<\/a>, <a href=\"https:\/\/hazeldoughty.github.io\">Hazel Doughty<\/a> and Cees Snoek is <a href=\"https:\/\/arxiv.org\/abs\/2108.03656\">now available<\/a>. This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our proposal is built upon learning invariances to input skeleton representations and various skeleton augmentations via a noise contrastive estimation. In particular, we propose inter-skeleton contrastive learning, which learns from multiple different input skeleton representations in a cross-contrastive manner. In addition, we contribute several skeleton-specific spatial and temporal augmentations which further encourage the model to learn the spatio-temporal dynamics of skeleton data. By learning similarities between different skeleton representations as well as augmented views of the same sequence, the network is encouraged to learn higher-level semantics of the skeleton data than when only using the augmented views. Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets with multiple downstream tasks, including action recognition, action retrieval and semi-supervised learning. <a href=\"https:\/\/github.com\/fmthoker\/skeleton-contrast\">Code is available<\/a> as well.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"509\" src=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive-1024x509.png\" alt=\"\" class=\"wp-image-3873\" srcset=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive-1024x509.png 1024w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive-300x149.png 300w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive-768x382.png 768w, https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive.png 1082w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The ACM Multimedia 2021 paper Skeleton-Contrastive 3D Action Representation Learning by Fida Thoker, Hazel Doughty and Cees Snoek is now available. This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our proposal is built upon learning invariances to input skeleton representations and various skeleton augmentations via a noise contrastive [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3883","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>MM 2021: Skeleton-Contrastive 3D Action Representation Learning - Cees Snoek<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"MM 2021: Skeleton-Contrastive 3D Action Representation Learning - Cees Snoek\" \/>\n<meta property=\"og:description\" content=\"The ACM Multimedia 2021 paper Skeleton-Contrastive 3D Action Representation Learning by Fida Thoker, Hazel Doughty and Cees Snoek is now available. This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our proposal is built upon learning invariances to input skeleton representations and various skeleton augmentations via a noise contrastive [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Cees Snoek\" \/>\n<meta property=\"article:published_time\" content=\"2021-08-12T11:27:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-08-12T11:27:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive-1024x509.png\" \/>\n<meta name=\"author\" content=\"Cees\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Cees\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/\",\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/\",\"name\":\"MM 2021: Skeleton-Contrastive 3D Action Representation Learning - Cees Snoek\",\"isPartOf\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive-1024x509.png\",\"datePublished\":\"2021-08-12T11:27:41+00:00\",\"dateModified\":\"2021-08-12T11:27:43+00:00\",\"author\":{\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#primaryimage\",\"url\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive.png\",\"contentUrl\":\"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive.png\",\"width\":1082,\"height\":538},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.ceessnoek.info\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"MM 2021: Skeleton-Contrastive 3D Action Representation Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.ceessnoek.info\/#website\",\"url\":\"https:\/\/www.ceessnoek.info\/\",\"name\":\"Cees Snoek\",\"description\":\"research on video and image ai\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.ceessnoek.info\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1\",\"name\":\"Cees\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g\",\"caption\":\"Cees\"},\"sameAs\":[\"http:\/\/www.CeesSnoek.info\"],\"url\":\"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"MM 2021: Skeleton-Contrastive 3D Action Representation Learning - Cees Snoek","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/","og_locale":"en_US","og_type":"article","og_title":"MM 2021: Skeleton-Contrastive 3D Action Representation Learning - Cees Snoek","og_description":"The ACM Multimedia 2021 paper Skeleton-Contrastive 3D Action Representation Learning by Fida Thoker, Hazel Doughty and Cees Snoek is now available. This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our proposal is built upon learning invariances to input skeleton representations and various skeleton augmentations via a noise contrastive [&hellip;]","og_url":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/","og_site_name":"Cees Snoek","article_published_time":"2021-08-12T11:27:41+00:00","article_modified_time":"2021-08-12T11:27:43+00:00","og_image":[{"url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive-1024x509.png","type":"","width":"","height":""}],"author":"Cees","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Cees","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/","url":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/","name":"MM 2021: Skeleton-Contrastive 3D Action Representation Learning - Cees Snoek","isPartOf":{"@id":"https:\/\/www.ceessnoek.info\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#primaryimage"},"image":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive-1024x509.png","datePublished":"2021-08-12T11:27:41+00:00","dateModified":"2021-08-12T11:27:43+00:00","author":{"@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1"},"breadcrumb":{"@id":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#primaryimage","url":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive.png","contentUrl":"https:\/\/www.ceessnoek.info\/wp-content\/uploads\/2021\/07\/fida-skeleton-contrastive.png","width":1082,"height":538},{"@type":"BreadcrumbList","@id":"https:\/\/www.ceessnoek.info\/index.php\/mm-2021-skeleton-contrastive-3d-action-representation-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ceessnoek.info\/"},{"@type":"ListItem","position":2,"name":"MM 2021: Skeleton-Contrastive 3D Action Representation Learning"}]},{"@type":"WebSite","@id":"https:\/\/www.ceessnoek.info\/#website","url":"https:\/\/www.ceessnoek.info\/","name":"Cees Snoek","description":"research on video and image ai","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ceessnoek.info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/4bca975b7c432aeb5dced40bdbc204c1","name":"Cees","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ceessnoek.info\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/756ccb993852c1e8e3af39a228d11a7305b2a937750f26dc5799d5df019b0f51?s=96&d=mm&r=g","caption":"Cees"},"sameAs":["http:\/\/www.CeesSnoek.info"],"url":"https:\/\/www.ceessnoek.info\/index.php\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/3883","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/comments?post=3883"}],"version-history":[{"count":1,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/3883\/revisions"}],"predecessor-version":[{"id":3884,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/posts\/3883\/revisions\/3884"}],"wp:attachment":[{"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/media?parent=3883"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/categories?post=3883"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ceessnoek.info\/index.php\/wp-json\/wp\/v2\/tags?post=3883"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}