{"id":28496,"date":"2022-02-17T06:00:00","date_gmt":"2022-02-17T14:00:00","guid":{"rendered":"https:\/\/insidebigdata.com\/?p=28496"},"modified":"2023-06-23T12:43:30","modified_gmt":"2023-06-23T19:43:30","slug":"research-highlights-autodc-automated-data-centric-processing","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/","title":{"rendered":"Research Highlights: AutoDC: Automated Data-centric Processing"},"content":{"rendered":"<div class=\"wp-block-image is-style-default\">\n<figure class=\"alignright size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/02\/Hypergiant_research_highlights.png\" alt=\"\" class=\"wp-image-28498\" width=\"241\" height=\"300\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/02\/Hypergiant_research_highlights.png 482w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/02\/Hypergiant_research_highlights-120x150.png 120w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/02\/Hypergiant_research_highlights-241x300.png 241w\" sizes=\"(max-width: 241px) 100vw, 241px\" \/><\/figure><\/div>\n\n\n<p>Most AutoML solutions are developed with a model-centric approach. A research paper accepted into last year\u2019s highly selective NeurIPS conference, &#8220;<a href=\"https:\/\/arxiv.org\/abs\/2111.12548\" target=\"_blank\" rel=\"noreferrer noopener\">AutoDC: Automated Data-centric Processing<\/a>,&#8221; describes the development of&nbsp;an automated data-centric tool (AutoDC) that was found to save an estimated 80% of the manual time needed for data set improvement, a typically bespoke and costly process.<\/p>\n\n\n\n<p>One of the authors, Zac Yung-Chun Liu,&nbsp;Chief Data Scientist at <a href=\"https:\/\/www.hypergiant.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hypergiant<\/a>&nbsp;and 
part-time&nbsp;<a href=\"https:\/\/profiles.stanford.edu\/zacycliu\" target=\"_blank\" rel=\"noreferrer noopener\">research associate at Stanford<\/a>,&nbsp;is an expert in the field of AutoML. The paper offers a perspective on practical data-centric AutoML solutions and explains how this new no-code and low-code solution is one of many R&amp;D efforts that will push the field forward.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most AutoML solutions are developed with a model-centric approach. A research paper accepted into last year\u2019s highly selective NeurIPS conference, &#8220;AutoDC: Automated Data-centric Processing,&#8221; describes the development of\u00a0an automated data-centric tool (AutoDC) that was found to save an estimated 80% of the manual time needed for data set improvement, a typically bespoke and costly process.<\/p>\n","protected":false},"author":37,"featured_media":28497,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[526,115,182,87,180,67,56,84,1303,1],"tags":[740,277,96],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Research Highlights: AutoDC: Automated Data-centric Processing - insideBIGDATA<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Research Highlights: AutoDC: Automated Data-centric Processing - insideBIGDATA\" \/>\n<meta property=\"og:description\" content=\"Most AutoML solutions are developed with a model-centric approach. A research paper accepted into last year\u2019s highly selective NeurIPS conference, &quot;AutoDC: Automated Data-centric Processing,&quot; describes the development of\u00a0an automated data-centric tool (AutoDC) that was found to save an estimated 80% of the manual time needed for data set improvement, a typically bespoke and costly process.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/\" \/>\n<meta property=\"og:site_name\" content=\"insideBIGDATA\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/insidebigdata\" \/>\n<meta property=\"article:published_time\" content=\"2022-02-17T14:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-06-23T19:43:30+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/02\/Hypergiant_logo.png\" \/>\n\t<meta property=\"og:image:width\" content=\"157\" \/>\n\t<meta property=\"og:image:height\" content=\"129\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Daniel Gutierrez\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AMULETAnalytics\" \/>\n<meta name=\"twitter:site\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Gutierrez\" 
\/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/\",\"url\":\"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/\",\"name\":\"Research Highlights: AutoDC: Automated Data-centric Processing - insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2022-02-17T14:00:00+00:00\",\"dateModified\":\"2023-06-23T19:43:30+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\"},\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Research Highlights: AutoDC: Automated Data-centric Processing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning 
Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\",\"name\":\"Daniel Gutierrez\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"caption\":\"Daniel Gutierrez\"},\"description\":\"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \\\"data scientist\\\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.\",\"sameAs\":[\"http:\/\/www.insidebigdata.com\",\"https:\/\/twitter.com\/@AMULETAnalytics\"],\"url\":\"https:\/\/insidebigdata.com\/author\/dangutierrez\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Research Highlights: AutoDC: Automated Data-centric Processing - insideBIGDATA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/","og_locale":"en_US","og_type":"article","og_title":"Research Highlights: AutoDC: Automated Data-centric Processing - insideBIGDATA","og_description":"Most AutoML solutions are developed with a model-centric approach. A research paper accepted into last year\u2019s highly selective NeurIPS conference, \"AutoDC: Automated Data-centric Processing,\" describes the development of\u00a0an automated data-centric tool (AutoDC) that was found to save an estimated 80% of the manual time needed for data set improvement, a typically bespoke and costly process.","og_url":"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2022-02-17T14:00:00+00:00","article_modified_time":"2023-06-23T19:43:30+00:00","og_image":[{"width":157,"height":129,"url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/02\/Hypergiant_logo.png","type":"image\/png"}],"author":"Daniel Gutierrez","twitter_card":"summary_large_image","twitter_creator":"@AMULETAnalytics","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Daniel Gutierrez","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/","url":"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/","name":"Research Highlights: AutoDC: Automated Data-centric Processing - insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2022-02-17T14:00:00+00:00","dateModified":"2023-06-23T19:43:30+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed"},"breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2022\/02\/17\/research-highlights-autodc-automated-data-centric-processing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"Research Highlights: AutoDC: Automated Data-centric Processing"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed","name":"Daniel 
Gutierrez","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","caption":"Daniel Gutierrez"},"description":"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \"data scientist\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.","sameAs":["http:\/\/www.insidebigdata.com","https:\/\/twitter.com\/@AMULETAnalytics"],"url":"https:\/\/insidebigdata.com\/author\/dangutierrez\/"}]}},"jetpack_featured_media_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2022\/02\/Hypergiant_logo.png","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-7pC","jetpack-related-posts":[{"id":31238,"url":"https:\/\/insidebigdata.com\/2022\/12\/28\/automl-the-future-of-machine-learning\/","url_meta":{"origin":28496,"position":0},"title":"AutoML- The Future of Machine Learning","date":"December 28, 2022","format":false,"excerpt":"In this contributed article, Ankush Gupta and Kavya Shree of FischerJordan, explore the scope, use cases and challenges of AutoML and how data scientists and AutoML can have a future together. 
The authors discuss the causes driving the use of AutoML, the benefits and challenges associated, and major providers in\u2026","rel":"","context":"In &quot;Big Data&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2022\/12\/FischerJordan_1.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":32588,"url":"https:\/\/insidebigdata.com\/2023\/06\/10\/what-is-automated-machine-learning-automl-how-it-works-and-best-practices\/","url_meta":{"origin":28496,"position":1},"title":"What is Automated Machine Learning (AutoML): How it Works and Best Practices","date":"June 10, 2023","format":false,"excerpt":"In this contributed article, AI, and computer vision enthusiast Melanie Johnson believes that as AutoML continues to progress, it holds the promise of enhancing efficiency and accuracy in machine learning tasks. However, it is crucial to strike a balance between automation and human expertise, leveraging AutoML as a valuable tool\u2026","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/12\/Machine_Learning_shutterstock_344688470.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":23654,"url":"https:\/\/insidebigdata.com\/2019\/12\/08\/qeexo-automl-demo-automating-machine-learning-for-embedded-devices\/","url_meta":{"origin":28496,"position":2},"title":"Qeexo AutoML Demo: Automating Machine Learning for Embedded Devices","date":"December 8, 2019","format":false,"excerpt":"Qeexo spun out of Carnegie Mellon University, has for a long time developed multi-touch technology for handset manufacturers which does ML on the device level. 
It has applied this approach to a new AutoML technology that allows companies to embed ML into hardware and conduct learning on sensor data.","rel":"","context":"In &quot;Big Data&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/12\/Machine_Learning_shutterstock_344688470.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":31500,"url":"https:\/\/insidebigdata.com\/2023\/01\/30\/why-automl-isnt-enough-to-democratize-data-science\/","url_meta":{"origin":28496,"position":3},"title":"Why AutoML Isn\u2019t Enough to Democratize Data Science\u00a0","date":"January 30, 2023","format":false,"excerpt":"In this contributed article, Noam\u00a0Brezis, co-founder and CTO of\u00a0Pecan AI,\u00a0explores that because AutoML was born out of academia, in its current incarnation it is only built to simplify the model building process. This is likely the reason why existing AutoML solutions are finding challenges with scaling. Plus, these types of\u2026","rel":"","context":"In &quot;Big Data&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/04\/DataScience_shutterstock_1054542323.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":23497,"url":"https:\/\/insidebigdata.com\/2019\/10\/30\/automl-in-practice\/","url_meta":{"origin":28496,"position":4},"title":"AutoML in Practice","date":"October 30, 2019","format":false,"excerpt":"The compelling Oct. 15, 2019 presentation below is on behalf of one of my favorite Meetup groups: LA Machine Learning. The talk, \"AutoML in Practice,\" is by Danny D. Leybzon, a Solutions Architect at Qubole, a cloud-native big data platform. 
Automated Machine Learning (AutoML) is one of the hottest topics\u2026","rel":"","context":"In &quot;Featured&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/03\/machine-learning_SHUTTERSTOCK.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":27462,"url":"https:\/\/insidebigdata.com\/2021\/10\/22\/is-data-a-differentiator-for-your-business-if-so-traditional-ocr-cannot-be-an-answer\/","url_meta":{"origin":28496,"position":5},"title":"Is Data a Differentiator for Your Business? If So, Traditional OCR Cannot Be An Answer","date":"October 22, 2021","format":false,"excerpt":"In this contributed article, Ankur Goyal, CEO and co-founder of Impira, discusses how to make the most of your OCR investment with AutoML. Automated Machine Learning is a nascent AI technology that exposes the power of machine learning (ML) to a much broader audience than data scientists and technologists.","rel":"","context":"In &quot;Big Data&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2021\/10\/algorithms_shutterstock_531875605.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/28496"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/37"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=28496"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/28496\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media\/28497"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=28496"}],"wp:term":[{"taxonomy":"cat
egory","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=28496"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=28496"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}