{"id":25052,"date":"2020-09-29T06:00:00","date_gmt":"2020-09-29T13:00:00","guid":{"rendered":"https:\/\/insidebigdata.com\/?p=25052"},"modified":"2020-09-30T08:37:21","modified_gmt":"2020-09-30T15:37:21","slug":"how-companies-can-gain-value-from-small-data","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/","title":{"rendered":"How Companies Can Gain Value From Small Data"},"content":{"rendered":"\n<p>Big data is all the rage today, and rightfully so. State-of-the-art language models powered by big data, like <a href=\"https:\/\/arxiv.org\/abs\/2005.14165\" target=\"_blank\" rel=\"noreferrer noopener\">GPT-3<\/a>, can write beautiful prose, create realistic news articles, translate text, write functional code, and more. Further, state-of-the-art vision models trained on massive datasets are bringing us closer to level 5, or fully autonomous, <a href=\"https:\/\/www.reuters.com\/article\/us-tesla-autonomous\/tesla-very-close-to-level-5-autonomous-driving-technology-musk-says-idUSKBN24A0HE\" target=\"_blank\" rel=\"noreferrer noopener\">self-driving cars<\/a>.<\/p>\n\n\n\n<p>While big data can fuel astonishing results, organizations can gain value from \u201csmall data\u201d as well. In this article, I\u2019ll highlight four ways to circumvent the need for big data.<\/p>\n\n\n\n<p><strong>1. Exploratory Analysis<\/strong><\/p>\n\n\n\n<p>Whether you\u2019re working with big or small data, you should understand your data before you try to gain deep insights from it. This includes calculating simple descriptive statistics, such as the count, mean, quartiles, minimum, and maximum.<\/p>\n\n\n\n<p>Slightly more complex analyses include histograms, scatterplots, pie charts, and so forth. Further, correlation analyses can help you test hypotheses about how variables in the data are related. 
You\u2019ll also want to assess data quality and deal with problems like missing values and outliers.<\/p>\n\n\n\n<p>Anything that helps you understand the data itself should be done at this stage.<\/p>\n\n\n\n<p><strong>2. Basic Machine Learning Models<\/strong><\/p>\n\n\n\n<p>Machine learning is a lot more than just deep learning. Alternative techniques like decision trees are far simpler, more explainable, and more resource-efficient, and they work well with less data.<\/p>\n\n\n\n<p>Slightly more complex techniques, like random forests and support vector machines, also work well on smaller datasets, while still being much easier to set up than neural networks.<\/p>\n\n\n\n<p>Highly complex techniques like deep learning shine for tasks like image classification and natural language processing. For these kinds of problems, having more data is almost always better. That being said, there are even ways to combine these approaches, such as <a href=\"https:\/\/github.com\/alvinwan\/neural-backed-decision-trees\">neural-backed decision trees<\/a>, which offer high accuracy on tasks like image classification while maintaining the relative simplicity and explainability of decision trees.<\/p>\n\n\n\n<p><strong>3. Transfer Learning<\/strong><\/p>\n\n\n\n<p>Another method is transfer learning, which allows you to take the knowledge a model learned from one dataset and apply it to another dataset. As a result, you don\u2019t have to start from scratch, and you can train machine learning models with far less data.<\/p>\n\n\n\n<p>For example, companies can currently beta test OpenAI\u2019s GPT-3 model, which can generate natural language for many tasks from a prompt alone, without needing any task-specific training data at all. This is an example of zero-shot learning. 
To increase the model\u2019s accuracy for your specific use case, you can include a handful of your own examples in the prompt, known as few-shot learning.<\/p>\n\n\n\n<p>In either case, the model has already been trained on a very large corpus of Internet text, so you get a capable language model out of the box.<\/p>\n\n\n\n<p>For other tasks, like image classification, you can apply transfer learning using pre-trained models like VGG16 or ResNet50.<\/p>\n\n\n\n<p><strong>4. AutoML<\/strong><\/p>\n\n\n\n<p>Another way to deploy AI quickly, without needing big data, is to use turnkey automated machine learning solutions, some of which build on models pre-trained on big datasets.<\/p>\n\n\n\n<p>Products include Google Cloud\u2019s AutoML, Salesforce Einstein AutoML, Microsoft Azure AI, and Amazon AutoGluon. With so many options to choose from, AutoML is a great way to implement AI in your organization, even if you don\u2019t have big data.<\/p>\n\n\n\n<p><strong>Conclusion<\/strong><\/p>\n\n\n\n<p>It\u2019s a common misconception that machine learning needs big data. 
Statisticians have been working with small data for decades, and techniques like exploratory analysis, classical machine learning, transfer learning, and AutoML are great ways to gain insights from any dataset, no matter the size.<\/p>\n\n\n\n<p><strong>About the Author<\/strong><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignleft size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"125\" height=\"128\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2020\/09\/Shanif-Dhanani.jpeg\" alt=\"\" class=\"wp-image-25053\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2020\/09\/Shanif-Dhanani.jpeg 125w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2020\/09\/Shanif-Dhanani-50x50.jpeg 50w\" sizes=\"(max-width: 125px) 100vw, 125px\" \/><\/figure><\/div>\n\n\n\n<p><em>Shanif Dhanani is a former Twitter data scientist and engineer turned CEO of <a rel=\"noreferrer noopener\" href=\"https:\/\/www.apteo.co\/\" target=\"_blank\">Apteo<\/a>. Apteo is a no-code analytics platform anyone can use in a matter of minutes to extract deep insights from their data.<\/em><\/p>\n\n\n\n<p><em>Sign up for the free insideBIGDATA&nbsp;<a rel=\"noreferrer noopener\" href=\"http:\/\/insidebigdata.com\/newsletter\/\" target=\"_blank\">newsletter<\/a>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this contributed article, Shanif Dhanani, CEO of Apteo, highlights four ways to circumvent the need for big data. While big data can fuel astonishing results, organizations can gain value from \u201csmall data\u201d as well. 
<\/p>\n","protected":false},"author":10513,"featured_media":22835,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[526,115,182,87,180,67,56,97,1],"tags":[740,935,277,934,936,96],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How Companies Can Gain Value From Small Data - insideBIGDATA<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How Companies Can Gain Value From Small Data - insideBIGDATA\" \/>\n<meta property=\"og:description\" content=\"In this contributed article, Shanif Dhanani, CEO of Apteo, highlights four ways to circumvent the need for big data. 
While big data can fuel astonishing results, organizations can gain value from \u201csmall data\u201d as well.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/\" \/>\n<meta property=\"og:site_name\" content=\"insideBIGDATA\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/insidebigdata\" \/>\n<meta property=\"article:published_time\" content=\"2020-09-29T13:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-09-30T15:37:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/06\/Data-Scientist-shutterstock_768047488.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"300\" \/>\n\t<meta property=\"og:image:height\" content=\"200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Editorial Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:site\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Editorial Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/\",\"url\":\"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/\",\"name\":\"How Companies Can Gain Value From Small Data - insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2020-09-29T13:00:00+00:00\",\"dateModified\":\"2020-09-30T15:37:21+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9\"},\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How Companies Can Gain Value From Small Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9\",\"name\":\"Editorial Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g\",\"caption\":\"Editorial Team\"},\"sameAs\":[\"http:\/\/www.insidebigdata.com\"],\"url\":\"https:\/\/insidebigdata.com\/author\/editorial\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How Companies Can Gain Value From Small Data - insideBIGDATA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/","og_locale":"en_US","og_type":"article","og_title":"How Companies Can Gain Value From Small Data - insideBIGDATA","og_description":"In this contributed article, Shanif Dhanani, CEO of Apteo, highlights four ways to circumvent the need for big data. 
While big data can fuel astonishing results, organizations can gain value from \u201csmall data\u201d as well.","og_url":"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2020-09-29T13:00:00+00:00","article_modified_time":"2020-09-30T15:37:21+00:00","og_image":[{"width":300,"height":200,"url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/06\/Data-Scientist-shutterstock_768047488.jpg","type":"image\/jpeg"}],"author":"Editorial Team","twitter_card":"summary_large_image","twitter_creator":"@insideBigData","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Editorial Team","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/","url":"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/","name":"How Companies Can Gain Value From Small Data - insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2020-09-29T13:00:00+00:00","dateModified":"2020-09-30T15:37:21+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9"},"breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2020\/09\/29\/how-companies-can-gain-value-from-small-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"How Companies Can Gain Value From Small 
Data"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9","name":"Editorial Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g","caption":"Editorial Team"},"sameAs":["http:\/\/www.insidebigdata.com"],"url":"https:\/\/insidebigdata.com\/author\/editorial\/"}]}},"jetpack_featured_media_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/06\/Data-Scientist-shutterstock_768047488.jpg","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-6w4","jetpack-related-posts":[{"id":12317,"url":"https:\/\/insidebigdata.com\/2014\/11\/09\/ask-data-scientist-importance-exploratory-data-analysis\/","url_meta":{"origin":25052,"position":0},"title":"Ask a Data Scientist: The Importance of Exploratory Data Analysis","date":"November 9, 2014","format":false,"excerpt":"Q: What is the role of exploratory data analysis in data science?","rel":"","context":"In &quot;Ask a Data Scientist&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":9648,"url":"https:\/\/insidebigdata.com\/2014\/06\/05\/data-munging-exploratory-data-analysis-feature-engineering\/","url_meta":{"origin":25052,"position":1},"title":"Data Munging, Exploratory Data Analysis, and Feature Engineering","date":"June 5, 
2014","format":false,"excerpt":"To help our audience leverage the power of machine learning, the editors of insideBIGDATA have created this weekly article series called \u201cThe insideBIGDATA Guide to Machine Learning.\u201d This is our fourth installment, \"Data Munging, Exploratory Data Analysis, and Feature Engineering.\"","rel":"","context":"In &quot;Featured&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2014\/05\/inisde-big-data-guide-to-machine-learning.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":25543,"url":"https:\/\/insidebigdata.com\/2021\/01\/27\/book-review-hands-on-exploratory-data-analysis-with-python\/","url_meta":{"origin":25052,"position":2},"title":"Book Review: Hands-On Exploratory Data Analysis with Python","date":"January 27, 2021","format":false,"excerpt":"The new data science title \"Hands-On Exploratory Data Analysis with Python,\" by Suresh Kumar Mukhiya and Usman Ahmed from Packt Publishing is a welcome addition to the growing list of books directed to help newbie data scientists improve their skills. I'm always on the lookout for texts that can help\u2026","rel":"","context":"In &quot;Analytics&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2021\/01\/Packt_Hands-on-EDA-with-Python.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":9167,"url":"https:\/\/insidebigdata.com\/2014\/05\/13\/insidebigdata-guide-machine-learning\/","url_meta":{"origin":25052,"position":3},"title":"insideBIGDATA Guide to Machine Learning","date":"May 13, 2014","format":false,"excerpt":"As the primary facilitator of data science and big data, machine learning has garnered much interest by a broad range of industries as a way to increase value of enterprise data assets. 
In this article series we\u2019ll examine the principles underlying machine learning based on the R statistical environment.","rel":"","context":"In &quot;Featured&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2014\/05\/inisde-big-data-guide-to-machine-learning.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":32878,"url":"https:\/\/insidebigdata.com\/2023\/07\/25\/video-highlights-generative-ai-with-large-language-models\/","url_meta":{"origin":25052,"position":4},"title":"Video Highlights: Generative AI with Large Language Models","date":"July 25, 2023","format":false,"excerpt":"At an unprecedented pace, Large Language Models like GPT-4 are transforming the world in general and the field of data science in particular. This two-hour training video presentation by Jon Krohn, Co-Founder and Chief Data Scientist at the machine learning company Nebula, introduces deep learning transformer architectures including LLMs.","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2023\/06\/GenerativeAI_shutterstock_2313909647_special.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":8435,"url":"https:\/\/insidebigdata.com\/2014\/03\/29\/big-data-humor-power-pie\/","url_meta":{"origin":25052,"position":5},"title":"Big Data Humor: Power of the Pie","date":"March 29, 2014","format":false,"excerpt":"Pie charts are typically deemed the least useful type of plot to data scientists during the Exploratory Data Analysis phase of a machine learning project, but in this case I think it works! 
\u00a0 \u00a0 Sign up for the free insideBIGDATA newsletter.","rel":"","context":"In &quot;Opinion&quot;","img":{"alt_text":"Humor_pacman","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2014\/03\/Humor_pacman.jpg?resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/25052"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/10513"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=25052"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/25052\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media\/22835"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=25052"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=25052"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=25052"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}