{"id":17178,"date":"2017-02-18T05:00:59","date_gmt":"2017-02-18T13:00:59","guid":{"rendered":"http:\/\/insidebigdata.com\/?p=17178"},"modified":"2017-02-19T11:01:06","modified_gmt":"2017-02-19T19:01:06","slug":"machine-learning-matters","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/","title":{"rendered":"Machine Learning: Why it Matters?"},"content":{"rendered":"<p>Are you into Machine Learning OR are you &#8220;just&#8221; a Statistician? Have you been asked this question yet? If you are in a career, or looking to get into one, that has anything to do with deriving insights from data, you probably know what I am talking about.<\/p>\n<p>The year 2016 saw over three dozen machine learning startups acquired by tech giants; another several dozen machine learning startups raised aggregate funding to the tune of $4 billion worldwide. Is it a blip or a bubble? Definitely not. In times when automation is key, it was but imperative that we figure out methods of data analysis &amp; model building that automate data analysis &amp; model building. Sounds tautological? It is. And in a way that is what machine learning is &#8230; err &#8230; rather does. It picks up right where traditional statistical models stop. It&#8217;s all about building algorithms that learn iteratively from data. The more data you feed them, the better the results they churn out.<\/p>\n<p>While conceptually machine learning has been around for more than 80 years (recent history dates it back to World War II and Turing), the recent frenzy around it can be attributed to the overall advances in, and affordability of, computing power. 
While manually getting these models to improve themselves through numerous iterations may seem tedious, if not impossible, a modern computer fed with the algorithm can get these models to learn, grow, change, and develop by themselves in a matter of seconds &#8230; and we are already talking &#8220;real-time!&#8221; What&#8217;s more, they can look for insights without being told exactly where to look, a.k.a. dealing with unstructured data (think social media, web searches). The algorithm iterates, learns new stuff, adapts, and then repeats the whole process all over again, learning from new data every time.\u00a0It really embodies the adage that practice makes perfect.<\/p>\n<p>Now if you put this in the context of self-driving cars, or the recommender engines in Netflix or Amazon &#8211; you can see why such algorithms, which generate decisions out of data in real time without human intervention, would be key to where we are headed both in terms of technology and user experience.\u00a0 It is machine learning that has turned the &#8220;hype&#8221; around the importance of &#8220;big data&#8221; into a reality. When availability of more data could have caused concerns around its usability for deriving meaningful insights, it was machine learning that came to the rescue. 
Let&#8217;s just say that compared to traditional statistical methods, which dealt with static models, machine learning is more in tune with the current times and their needs.<\/p>\n<p>The discussion becomes a bit more exciting and a little more tangible when we start considering some problems where machine learning is a clear improvement over traditional statistical methods (although a strong caveat here would be &#8230; a lot of machine learning techniques are really enhancements or extensions of their &#8220;statistical&#8221; counterparts).<\/p>\n<p>Let&#8217;s start with <em>Pattern Recognition<\/em>: it is the essence of solving a lot of business problems that rely on regularities in the gathered data to make predictions. It is also what is called &#8220;supervised learning&#8221; in machine learning parlance. While a traditional classification and regression model can give you a prediction, it is a &#8220;closed-form&#8221; solution, which of course is static. A machine learning technique called gradient boosting takes the same approach but iteratively and continuously searches for a &#8220;local minimum&#8221; and adapts as it learns. Ever had your credit card declined when using it at a gas station that&#8217;s not your &#8220;usual&#8221; one? In most cases those are supervised (or semi-supervised) learning models at work for you.<\/p>\n<p>But given that supervised learning deals prominently with &#8220;historical data&#8221; or a &#8220;training set,&#8221; it has to live with its own limitations of being unable to predict in situations where you have no past data to predict the future. While the traditional k-means or hierarchical clustering can in theory be applied, when you are dealing with a deluge of transactional data, you need to apply unsupervised learning techniques like ANN or GMM to explore the surpassed data and find structure within it. 
So the next time you see the &#8220;Also recommended for you&#8221; while shopping on Amazon.com, know that every new item you searched for, every new item you saved, and every new item you bought was factored into those recommendations by those unsupervised machine learning algorithms within a matter of a few seconds.<\/p>\n<p>If you are dealing with discrete outcome variables, logistic regression techniques naturally come to mind, but they fall short when it comes to dealing with complex data sets &#8211; hence the need to look into Support Vector Regressions or Hierarchical Bayesian models. If you are looking at a large but well-behaved data set, the good ol&#8217; logit will still work great; but with the big bad ugly ones you probably have to resort to the much talked about Bagged Regression technique, which by the way is nothing more than a thousand logits estimated through random sample draws from the mother data set, which are then averaged to minimize bias and variance.<\/p>\n<p>Of course, we are often looking at blurred lines when applying supervised, semi-supervised, and unsupervised learning techniques to business problems. When you first open a YouTube account, the recommended content served on your home page has to be based on in-session click-stream based unsupervised models. With repeat visits, as the system learns the depth and spread of your viewing preference, semi-supervised dynamic models with temporal aspects akin to Hidden-Markov-Models probably kick in. (I am not commenting on what algorithms YouTube actually applies to these problems &#8211; rather I am stating what could potentially be applied.)<\/p>\n<p>In conclusion&#8230;<\/p>\n<p>One way to view machine learning is as though it were statistics on steroids: keep building thousands of trees with simpler assumptions and then average them up to derive predictions yielding lower bias and variance. 
\u00a0Machine learning is concerned\u00a0more with the accuracy of final predictions than with the laundry list of underlying distributions and\u00a0asymptotic tests in statistical methods. That doesn&#8217;t necessarily mean that the math is not complex &#8211; it just means that the intent is much simpler to understand.\u00a0Contrary to the common myth, NOT all machine learning techniques are adaptive (actually only the generative algorithms are). And no, you will not be expected to dive into the deep end of writing algorithms which require &#8220;real time&#8221; on Day 1.<\/p>\n<p>Enough said?<\/p>\n<p><em><img decoding=\"async\" loading=\"lazy\" class=\"alignleft wp-image-13461\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/07\/Smita-Adhikary.jpg\" alt=\"\" width=\"101\" height=\"132\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/07\/Smita-Adhikary.jpg 151w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/07\/Smita-Adhikary-150x197.jpg 150w\" sizes=\"(max-width: 101px) 100vw, 101px\" \/>Contributed by: Smita Adhikary, Managing Consultant at <a href=\"http:\/\/www.bigdataanalyticshires.com\/\" target=\"_blank\">Big Data Analytics Hires<\/a> &#8211; a\u00a0talent search and recruiting firm focused primarily on Data Science and Decision Science professionals.\u00a0Having started her career as a &#8220;quant&#8221; more than a decade ago building scorecards and statistical models for banks and credit card companies and having spent many years in management consulting, she has witnessed from very close quarters the transformation brought about by the advent of \u201cBig Data\u201d in the skill-sets desired in &#8220;quants.&#8221; Like most &#8220;quants&#8221; she holds a Masters in Economics and like a lot of management consultants an MBA from Kellogg School of Management.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><em>Sign up for the free insideBIGDATA\u00a0<a href=\"http:\/\/insidebigdata.com\/newsletter\/\" 
target=\"_blank\">newsletter<\/a>.<\/em><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this contributed article, Smita Adhikary, Managing Consultant at Big Data Analytics Hires, provides a whirlwind overview of machine learning technology and why it&#8217;s important to increasing the value of enterprise data assets. <\/p>\n","protected":false},"author":10513,"featured_media":13461,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[115,170,87,180,67,56,97,1],"tags":[277,96],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Machine Learning: Why it Matters? - insideBIGDATA<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Machine Learning: Why it Matters? 
- insideBIGDATA\" \/>\n<meta property=\"og:description\" content=\"In this contributed article, Smita Adhikary, Managing Consultant at Big Data Analytics Hires, provides a whirlwind overview of machine learning technology and why it&#039;s important to increasing the value of enterprise data assets.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/\" \/>\n<meta property=\"og:site_name\" content=\"insideBIGDATA\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/insidebigdata\" \/>\n<meta property=\"article:published_time\" content=\"2017-02-18T13:00:59+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-02-19T19:01:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/07\/Smita-Adhikary.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"151\" \/>\n\t<meta property=\"og:image:height\" content=\"197\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Editorial Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:site\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Editorial Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/\",\"url\":\"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/\",\"name\":\"Machine Learning: Why it Matters? 
- insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2017-02-18T13:00:59+00:00\",\"dateModified\":\"2017-02-19T19:01:06+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9\"},\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Machine Learning: Why it Matters?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9\",\"name\":\"Editorial Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g\",\"caption\":\"Editorial 
Team\"},\"sameAs\":[\"http:\/\/www.insidebigdata.com\"],\"url\":\"https:\/\/insidebigdata.com\/author\/editorial\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Machine Learning: Why it Matters? - insideBIGDATA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/","og_locale":"en_US","og_type":"article","og_title":"Machine Learning: Why it Matters? - insideBIGDATA","og_description":"In this contributed article, Smita Adhikary, Managing Consultant at Big Data Analytics Hires, provides a whirlwind overview of machine learning technology and why it's important to increasing the value of enterprise data assets.","og_url":"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2017-02-18T13:00:59+00:00","article_modified_time":"2017-02-19T19:01:06+00:00","og_image":[{"width":151,"height":197,"url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/07\/Smita-Adhikary.jpg","type":"image\/jpeg"}],"author":"Editorial Team","twitter_card":"summary_large_image","twitter_creator":"@insideBigData","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Editorial Team","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/","url":"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/","name":"Machine Learning: Why it Matters? 
- insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2017-02-18T13:00:59+00:00","dateModified":"2017-02-19T19:01:06+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9"},"breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2017\/02\/18\/machine-learning-matters\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"Machine Learning: Why it Matters?"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9","name":"Editorial Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g","caption":"Editorial 
Team"},"sameAs":["http:\/\/www.insidebigdata.com"],"url":"https:\/\/insidebigdata.com\/author\/editorial\/"}]}},"jetpack_featured_media_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/07\/Smita-Adhikary.jpg","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-4t4","jetpack-related-posts":[{"id":4240,"url":"https:\/\/insidebigdata.com\/2013\/09\/20\/machine-learning-spread-big-data-spending-skyrockets\/","url_meta":{"origin":17178,"position":0},"title":"Machine Learning to Spread as Big Data Spending Skyrockets","date":"September 20, 2013","format":false,"excerpt":"Machine learning and its application in advanced analytics is one area that will make both the public and private sectors data-savvier than anything we\u2019ve seen so far.","rel":"","context":"In &quot;Enterprise&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6639,"url":"https:\/\/insidebigdata.com\/2014\/01\/13\/data-science-101-machine-learning-part-2\/","url_meta":{"origin":17178,"position":1},"title":"Data Science 101: Machine Learning, Part 2","date":"January 13, 2014","format":false,"excerpt":"The \"How Machine Learning Works\" lecture series continues by building on fundamental definitions of statistics. This is needed for any rigorous analysis of models or machine learning algorithms.","rel":"","context":"In &quot;Data Science 101&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6805,"url":"https:\/\/insidebigdata.com\/2014\/01\/24\/data-science-101-machine-learning-part-4\/","url_meta":{"origin":17178,"position":2},"title":"Data Science 101: Machine Learning, Part 4","date":"January 24, 2014","format":false,"excerpt":"The \"How Machine Learning Works\" lecture series continues by building on top of the Bayesian classifier developed in Part 3 of the series. 
We'll build an expectation-maximization (EM) algorithm that locally maximizes the likelihood function.","rel":"","context":"In &quot;Data Science 101&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":21328,"url":"https:\/\/insidebigdata.com\/2018\/10\/25\/big-data-machine-learning-essential-cyber-security\/","url_meta":{"origin":17178,"position":3},"title":"Why Big Data and Machine Learning are Essential for Cyber Security","date":"October 25, 2018","format":false,"excerpt":"In this contributed article, Shachar Shamir, COO of Ranky, suggests that big data and machine learning are essential for cyber security. Using machine learning to automate attack detection and response, companies can have a quick and robust cyber defense system, one where security professionals work side-by-side sophisticated automated tools.","rel":"","context":"In &quot;Big Data&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":12727,"url":"https:\/\/insidebigdata.com\/2015\/02\/15\/skytree-automates-data-science-new-enterprise-class-machine-learning-platform\/","url_meta":{"origin":17178,"position":4},"title":"Skytree Automates Data Science with New Enterprise Class Machine Learning Platform","date":"February 15, 2015","format":false,"excerpt":"Skytree\u00ae, a leader in enterprise machine learning on big data, has announced Skytree InfinityTM 15.1 to make machine learning accessible to more users and increase the productivity of data scientists. 
Skytree Infinity 15.1 pairs its high performance, scalable and highly accurate machine learning algorithms with increased automation and ease of\u2026","rel":"","context":"In &quot;Featured&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":22422,"url":"https:\/\/insidebigdata.com\/2019\/04\/07\/splice-machine-launches-ml-manager-beta-program-to-meet-the-growing-demand-for-operational-ai\/","url_meta":{"origin":17178,"position":5},"title":"Splice Machine Launches ML Manager Beta Program to Meet the Growing Demand for Operational AI","date":"April 7, 2019","format":false,"excerpt":"Splice Machine, the operational artificial intelligence (AI) data platform, announced the launch of a beta program for ML Manager, a native data science and machine learning platform. Operating on top of Splice Machine's data platform, ML Manager empowers data science teams to maximize the performance of their machine learning models\u2026","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/17178"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/10513"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=17178"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/17178\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media\/13461"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=17178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=17178"},{"taxonomy":"post_tag","e
mbeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=17178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}