{"id":28903,"date":"2022-04-09T06:00:00","date_gmt":"2022-04-09T13:00:00","guid":{"rendered":"https:\/\/insidebigdata.com\/?p=28903"},"modified":"2023-06-23T12:41:52","modified_gmt":"2023-06-23T19:41:52","slug":"research-highlights-deep-neural-networks-and-tabular-data-a-survey","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/","title":{"rendered":"Research Highlights: Deep Neural Networks and Tabular Data: A Survey"},"content":{"rendered":"\n<p><strong>Title of Paper:<\/strong> <a href=\"https:\/\/arxiv.org\/pdf\/2110.01889.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Deep Neural Networks and Tabular Data: A Survey<\/a><\/p>\n\n\n\n<p><strong>Author(s):<\/strong> Vadim Borisov, Tobias Leemann, Kathrin Se\u00dfler, Johannes Haug, Martin Pawelczyk and Gjergji Kasneci<\/p>\n\n\n\n<p><strong>Abstract<\/strong>: Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their application to modeling tabular data (inference or generation) remains highly challenging. This work provides an overview of state-of-the-art deep learning methods for tabular data. We start by categorizing them into three groups: data transformations, specialized architectures, and regularization models. We then provide a comprehensive overview of the main approaches in each group. A discussion of deep learning approaches for generating tabular data is complemented by strategies for explaining deep models on tabular data. Our primary contribution is to address the main research streams and existing methodologies in this area, while highlighting relevant challenges and open research questions. 
We also provide an empirical comparison of traditional machine learning methods with deep learning approaches on real tabular data sets of different sizes and with different learning objectives. Our results indicate that algorithms based on gradient-boosted tree ensembles still outperform the deep learning models. To the best of our knowledge, this is the first in-depth look at deep learning approaches for tabular data. This work can serve as a valuable starting point and guide for researchers and practitioners interested in deep learning with tabular data.<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"422\" height=\"566\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/04\/Research_highlights_2.png\" alt=\"\" class=\"wp-image-28904\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/04\/Research_highlights_2.png 422w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/04\/Research_highlights_2-112x150.png 112w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/04\/Research_highlights_2-224x300.png 224w\" sizes=\"(max-width: 422px) 100vw, 422px\" \/><\/figure><\/div>\n\n\n<p><em>Sign up for the free insideBIGDATA&nbsp;<a rel=\"noreferrer noopener\" href=\"http:\/\/insidebigdata.com\/newsletter\/\" target=\"_blank\">newsletter<\/a>.<\/em><\/p>\n\n\n\n<p><em>Join us on Twitter:&nbsp;@InsideBigData1 \u2013 <a href=\"https:\/\/twitter.com\/InsideBigData1\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/twitter.com\/InsideBigData1<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this regular column, we take a look at highlights for important research topics of the day for big data, data science, machine learning, AI and deep learning. It\u2019s important to keep connected with the research arm of the field in order to see where we\u2019re headed. 
In this edition, we feature a new paper showing that for tabular data, algorithms based on gradient-boosted tree ensembles still outperform the deep learning models. Enjoy!<\/p>\n","protected":false},"author":37,"featured_media":22835,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[526,182,87,180,67,56,84,1303,1],"tags":[741,264,1127,277,652,1126,96],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Research Highlights: Deep Neural Networks and Tabular Data: A Survey - insideBIGDATA<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Research Highlights: Deep Neural Networks and Tabular Data: A Survey - insideBIGDATA\" \/>\n<meta property=\"og:description\" content=\"In this regular column, we take a look at highlights for important research topics of the day for big data, data science, machine learning, AI and deep learning. It\u2019s important to keep connected with the research arm of the field in order to see where we\u2019re headed. In this edition, we feature a new paper showing that for tabular data, algorithms based on gradient-boosted tree ensembles still outperform the deep learning models. 
Enjoy!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/\" \/>\n<meta property=\"og:site_name\" content=\"insideBIGDATA\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/insidebigdata\" \/>\n<meta property=\"article:published_time\" content=\"2022-04-09T13:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-06-23T19:41:52+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/06\/Data-Scientist-shutterstock_768047488.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"300\" \/>\n\t<meta property=\"og:image:height\" content=\"200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Daniel Gutierrez\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AMULETAnalytics\" \/>\n<meta name=\"twitter:site\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Gutierrez\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/\",\"url\":\"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/\",\"name\":\"Research Highlights: Deep Neural Networks and Tabular Data: A Survey - insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2022-04-09T13:00:00+00:00\",\"dateModified\":\"2023-06-23T19:41:52+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\"},\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Research Highlights: Deep Neural Networks and Tabular Data: A Survey\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning 
Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\",\"name\":\"Daniel Gutierrez\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"caption\":\"Daniel Gutierrez\"},\"description\":\"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \\\"data scientist\\\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.\",\"sameAs\":[\"http:\/\/www.insidebigdata.com\",\"https:\/\/twitter.com\/@AMULETAnalytics\"],\"url\":\"https:\/\/insidebigdata.com\/author\/dangutierrez\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Research Highlights: Deep Neural Networks and Tabular Data: A Survey - insideBIGDATA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/","og_locale":"en_US","og_type":"article","og_title":"Research Highlights: Deep Neural Networks and Tabular Data: A Survey - insideBIGDATA","og_description":"In this regular column, we take a look at highlights for important research topics of the day for big data, data science, machine learning, AI and deep learning. It\u2019s important to keep connected with the research arm of the field in order to see where we\u2019re headed. In this edition, we feature a new paper showing that for tabular data, algorithms based on gradient-boosted tree ensembles still outperform the deep learning models. Enjoy!","og_url":"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2022-04-09T13:00:00+00:00","article_modified_time":"2023-06-23T19:41:52+00:00","og_image":[{"width":300,"height":200,"url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/06\/Data-Scientist-shutterstock_768047488.jpg","type":"image\/jpeg"}],"author":"Daniel Gutierrez","twitter_card":"summary_large_image","twitter_creator":"@AMULETAnalytics","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Daniel Gutierrez","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/","url":"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/","name":"Research Highlights: Deep Neural Networks and Tabular Data: A Survey - insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2022-04-09T13:00:00+00:00","dateModified":"2023-06-23T19:41:52+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed"},"breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2022\/04\/09\/research-highlights-deep-neural-networks-and-tabular-data-a-survey\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"Research Highlights: Deep Neural Networks and Tabular Data: A Survey"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed","name":"Daniel 
Gutierrez","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","caption":"Daniel Gutierrez"},"description":"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \"data scientist\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.","sameAs":["http:\/\/www.insidebigdata.com","https:\/\/twitter.com\/@AMULETAnalytics"],"url":"https:\/\/insidebigdata.com\/author\/dangutierrez\/"}]}},"jetpack_featured_media_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/06\/Data-Scientist-shutterstock_768047488.jpg","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-7wb","jetpack-related-posts":[{"id":29923,"url":"https:\/\/insidebigdata.com\/2022\/07\/27\/research-highlights-why-do-tree-based-models-still-outperform-deep-learning-on-tabular-data\/","url_meta":{"origin":28903,"position":0},"title":"Research Highlights: Why Do Tree-based Models Still Outperform Deep Learning on Tabular Data?","date":"July 27, 2022","format":false,"excerpt":"In this regular column we take a look at highlights for breaking research topics of the day in the areas of big data, data science, machine learning, AI and deep learning. 
For data scientists, it\u2019s important to keep connected with the research arm of the field in order to understand\u2026","rel":"","context":"In &quot;Academic&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2022\/07\/Research_highlights_4.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":26685,"url":"https:\/\/insidebigdata.com\/2021\/07\/19\/best-of-arxiv-org-for-ai-machine-learning-and-deep-learning-june-2021\/","url_meta":{"origin":28903,"position":1},"title":"Best of arXiv.org for AI, Machine Learning, and Deep Learning \u2013 June 2021","date":"July 19, 2021","format":false,"excerpt":"In this recurring monthly feature, we will filter all the recent research papers appearing in the arXiv.org preprint server for subjects relating to AI, machine learning and deep learning \u2013 from disciplines including statistics, mathematics and computer science \u2013 and provide you with a useful \u201cbest of\u201d list for the\u2026","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/arxiv.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":26235,"url":"https:\/\/insidebigdata.com\/2021\/05\/17\/best-of-arxiv-org-for-ai-machine-learning-and-deep-learning-april-2021\/","url_meta":{"origin":28903,"position":2},"title":"Best of arXiv.org for AI, Machine Learning, and Deep Learning \u2013 April 2021","date":"May 17, 2021","format":false,"excerpt":"In this recurring monthly feature, we will filter all the recent research papers appearing in the arXiv.org preprint server for subjects relating to AI, machine learning and deep learning \u2013 from disciplines including statistics, mathematics and computer science \u2013 and provide you with a useful \u201cbest of\u201d list for the\u2026","rel":"","context":"In &quot;AI Deep 
Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/arxiv.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":21207,"url":"https:\/\/insidebigdata.com\/2018\/10\/07\/introduction-deep-learning-neural-networks\/","url_meta":{"origin":28903,"position":3},"title":"An Introduction to Deep Learning and Neural Networks","date":"October 7, 2018","format":false,"excerpt":"In this contributed article, Agile SEO technical writer and editor Limor Wainstein outlines how deep learning, neural networks, and machine learning are not interchangeable terms. This article helps to clarify the definitions for you with an introduction to deep learning and neural networks.","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2018\/10\/Neural-Network-diagram.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":32788,"url":"https:\/\/insidebigdata.com\/2023\/07\/05\/research-highlights-scaling-mlps-a-tale-of-inductive-bias\/","url_meta":{"origin":28903,"position":4},"title":"Research Highlights: Scaling MLPs: A Tale of Inductive Bias","date":"July 5, 2023","format":false,"excerpt":"Multi-layer Perceptrons (MLPs) are the most fundamental type of neural network, so they play an important role in many machine learning systems and are the most theoretically studied type of neural network. 
A new paper from researchers at ETH Zurich pushes the limits of pure MLPs, and shows that scaling\u2026","rel":"","context":"In &quot;Data Science&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/12\/Machine_Learning_shutterstock_344688470.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":21994,"url":"https:\/\/insidebigdata.com\/2019\/01\/16\/best-of-arxiv-org-for-ai-machine-learning-and-deep-learning-december-2018\/","url_meta":{"origin":28903,"position":5},"title":"Best of arXiv.org for AI, Machine Learning, and Deep Learning \u2013 December 2018","date":"January 16, 2019","format":false,"excerpt":"In this recurring monthly feature, we will filter all the recent research papers appearing in the arXiv.org preprint server for subjects relating to AI, machine learning and deep learning \u2013 from disciplines including statistics, mathematics and computer science \u2013 and provide you with a useful \u201cbest of\u201d list for the\u2026","rel":"","context":"In &quot;AI Deep 
Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/arxiv.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/28903"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/37"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=28903"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/28903\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media\/22835"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=28903"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=28903"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=28903"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}