{"id":20464,"date":"2018-05-31T08:30:05","date_gmt":"2018-05-31T15:30:05","guid":{"rendered":"https:\/\/insidebigdata.com\/?p=20464"},"modified":"2018-06-01T08:54:14","modified_gmt":"2018-06-01T15:54:14","slug":"data-science-101-handling-missing-data-revisited","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/","title":{"rendered":"Data Science 101: Handling Missing Data (Revisited)"},"content":{"rendered":"<p><img decoding=\"async\" loading=\"lazy\" class=\"alignright size-full wp-image-6406\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/DataScience101.jpg\" alt=\"\" width=\"361\" height=\"54\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/DataScience101.jpg 361w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/DataScience101-300x44.jpg 300w\" sizes=\"(max-width: 361px) 100vw, 361px\" \/>I recently received the following question on data science methods from an avid reader of insideBIGDATA who hails from Taiwan. I think the topics are very relevant to many folks in our audience so I decided to run it here in our <em>Data Science 101<\/em> channel. If you have more thoughts on the subject, or would like to contribute to the discussion, please leave a comment below.<\/p>\n<p><strong>Question: <\/strong>I&#8217;ve been reading your articles on insideBIGDATA and have learned so much!<strong>\u00a0<\/strong>I started my own pet project, trying to predict whether the person has diabetes based on a few features. 
Here are the top 5 rows of the data set.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-20465\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig1.png\" alt=\"\" width=\"700\" height=\"185\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig1.png 700w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig1-300x79.png 300w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig1-150x40.png 150w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>This is a small data set (only 768 people). I have two questions:<\/p>\n<ol>\n<li>374 cases are missing &#8216;Insulin&#8217;. That&#8217;s more than 50% missing. I personally believe that &#8216;Insulin&#8217; could be a great and important predictor, but if this many cases are missing, is it best to discard this feature completely?<\/li>\n<li>&#8216;SkinThickness&#8217; is missing from 227 cases. This is a large percentage too, but I found that there&#8217;s a strong correlation between &#8216;SkinThickness&#8217; and &#8216;BMI&#8217; (see figure below), which makes sense too. Am I introducing data leakage if I fill in missing &#8216;SkinThickness&#8217; values based on BMI? My plan is to have ranges of BMI, say 20-25, 25-30, 30-35, etc., and get the mean for each range. Depending on which range the person&#8217;s BMI falls in, I&#8217;ll fill in the missing &#8216;SkinThickness&#8217; with the mean corresponding to that range. 
Is this a good approach?<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-20467\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig2.png\" alt=\"\" width=\"681\" height=\"318\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig2.png 681w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig2-300x140.png 300w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig2-150x70.png 150w\" sizes=\"(max-width: 681px) 100vw, 681px\" \/><\/p>\n<p><strong>Answer:<\/strong> Thank you for your question; I think many data science practitioners can relate to the situation you describe. Let&#8217;s take a look at part #1 first. Often you&#8217;ll commence work on a new data science project only to find the source data set incomplete, i.e., some predictors that could help predict the response variable contain null values. This is where data science gets nuanced, since there is no absolute right or wrong answer. In fact, you&#8217;ll likely try a number of different approaches during the <a href=\"https:\/\/insidebigdata.com\/2014\/11\/12\/ask-data-scientist-data-science-process\/\" target=\"_blank\" rel=\"noopener\">data science process<\/a>.<\/p>\n<p>First, you might delete all incomplete observations, though this reduces the power of the analysis by shrinking the effective sample size. Removing observations with missing values can also bias the model. 
Another disadvantage to this approach is that the subjects with missing values may be different from the subjects without missing values (e.g., when values are not missing at random), so you are left with a non-representative sample after removing the observations with missing values.<\/p>\n<p>Rather than removing the incomplete observations outright, you can explore the effect of removing data by using a mechanism built into your language to temporarily ignore incomplete observations for the operation being performed. For example, in R some functions (like cor() for computing the correlation coefficient for two variables) accept the use=&quot;complete.obs&quot; argument, which leaves the incomplete observations out of that one calculation without changing the data set.<\/p>\n<p>Another obvious option is to delete the incomplete observations and then get more &#8220;complete&#8221; data! Sadly, that&#8217;s not always possible. In this case it&#8217;s hard to justify removing the &#8216;Insulin&#8217; predictor altogether, since it is likely to be a strong predictor of the response.<\/p>\n<p>Now let&#8217;s examine part #2. This question frames the process of &#8220;imputation&#8221; very well. Many times, you can fill in missing values of a continuous predictor with a function such as impute() from R&#8217;s e1071 package, which sets missing values to the mean or median of the observed values of that variable. 
You can use this approach directly on the &#8216;SkinThickness&#8217; variable, or you can experiment with your idea of binning &#8216;BMI&#8217; into ranges and filling each missing &#8216;SkinThickness&#8217; value with the mean for its range, which takes advantage of the correlation you discovered between the two variables. Because &#8216;BMI&#8217; is a predictor rather than the response, this does not by itself leak the outcome into your features; just compute the range means from the training data only, so that nothing from the test set influences the imputed values.<\/p>\n<p>Please take a look at a couple of articles I wrote a few years back when we partnered with Intel on our &#8220;Ask a Data Scientist&#8221; series &#8211; &#8220;<a href=\"https:\/\/insidebigdata.com\/2014\/10\/29\/ask-data-scientist-handling-missing-data\/\" target=\"_blank\" rel=\"noopener\">Handling Missing Data<\/a>&#8221; and also &#8220;<a href=\"https:\/\/insidebigdata.com\/2014\/11\/26\/ask-data-scientist-data-leakage\/\" target=\"_blank\" rel=\"noopener\">Data Leakage<\/a>.&#8221;<\/p>\n<p>Finally, here is a good flow chart to follow for missing data issues from an excellent article that thoroughly details the subject: &#8220;<a href=\"https:\/\/towardsdatascience.com\/how-to-handle-missing-data-8646b18db0d4\" target=\"_blank\" rel=\"noopener\">How to Handle Missing Data<\/a>,&#8221; by Alvira Swalin, a Master&#8217;s student in data science.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-20468\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig3.png\" alt=\"\" width=\"700\" height=\"618\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig3.png 700w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig3-300x265.png 300w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig3-150x132.png 150w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><em><img decoding=\"async\" loading=\"lazy\" class=\"alignleft wp-image-5792\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2013\/11\/Dan_office.jpg\" alt=\"\" width=\"139\" height=\"109\" \/>Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist of insideBIGDATA. 
In addition to being a tech journalist, Daniel is also a practicing data scientist, author, educator and sits on a number of advisory boards for various start-up companies.\u00a0<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><em>Sign up for the free insideBIGDATA\u00a0<a href=\"http:\/\/insidebigdata.com\/newsletter\/\" target=\"_blank\" rel=\"noopener\">newsletter<\/a>.<\/em><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently received the following question on data science methods from an avid reader of insideBIGDATA who hails from Taiwan. I think the topics are very relevant to many folks in our audience so I decided to run it here in our Data Science 101 channel. The issue of missing data is one most data scientists see quite frequently.<\/p>\n","protected":false},"author":37,"featured_media":6406,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[182,170,90,87,180,56,1],"tags":[133,261,96],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Data Science 101: Handling Missing Data (Revisited) - insideBIGDATA<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Science 101: Handling Missing Data (Revisited) - insideBIGDATA\" \/>\n<meta property=\"og:description\" content=\"I recently received the following question on data science methods from an avid reader of insideBIGDATA who hails from Taiwan. 
I think the topics are very relevant to many folks in our audience so I decided to run it here in our Data Science 101 channel. The issue of missing data is one most data scientists see quite frequently.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/\" \/>\n<meta property=\"og:site_name\" content=\"insideBIGDATA\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/insidebigdata\" \/>\n<meta property=\"article:published_time\" content=\"2018-05-31T15:30:05+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-06-01T15:54:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/DataScience101.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"361\" \/>\n\t<meta property=\"og:image:height\" content=\"54\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Daniel Gutierrez\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AMULETAnalytics\" \/>\n<meta name=\"twitter:site\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Gutierrez\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/\",\"url\":\"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/\",\"name\":\"Data Science 101: Handling Missing Data (Revisited) - insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2018-05-31T15:30:05+00:00\",\"dateModified\":\"2018-06-01T15:54:14+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\"},\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Science 101: Handling Missing Data (Revisited)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\",\"name\":\"Daniel Gutierrez\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"caption\":\"Daniel Gutierrez\"},\"description\":\"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \\\"data scientist\\\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.\",\"sameAs\":[\"http:\/\/www.insidebigdata.com\",\"https:\/\/twitter.com\/@AMULETAnalytics\"],\"url\":\"https:\/\/insidebigdata.com\/author\/dangutierrez\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Science 101: Handling Missing Data (Revisited) - insideBIGDATA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/","og_locale":"en_US","og_type":"article","og_title":"Data Science 101: Handling Missing Data (Revisited) - insideBIGDATA","og_description":"I recently received the following question on data science methods from an avid reader of insideBIGDATA who hails from Taiwan. I think the topics are very relevant to many folks in our audience so I decided to run it here in our Data Science 101 channel. The issue of missing data is one most data scientists see quite frequently.","og_url":"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2018-05-31T15:30:05+00:00","article_modified_time":"2018-06-01T15:54:14+00:00","og_image":[{"width":361,"height":54,"url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/DataScience101.jpg","type":"image\/jpeg"}],"author":"Daniel Gutierrez","twitter_card":"summary_large_image","twitter_creator":"@AMULETAnalytics","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Daniel Gutierrez","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/","url":"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/","name":"Data Science 101: Handling Missing Data (Revisited) - insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2018-05-31T15:30:05+00:00","dateModified":"2018-06-01T15:54:14+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed"},"breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"Data Science 101: Handling Missing Data (Revisited)"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed","name":"Daniel 
Gutierrez","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","caption":"Daniel Gutierrez"},"description":"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \"data scientist\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.","sameAs":["http:\/\/www.insidebigdata.com","https:\/\/twitter.com\/@AMULETAnalytics"],"url":"https:\/\/insidebigdata.com\/author\/dangutierrez\/"}]}},"jetpack_featured_media_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2013\/12\/DataScience101.jpg","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-5k4","jetpack-related-posts":[{"id":8698,"url":"https:\/\/insidebigdata.com\/2014\/04\/13\/data-science-101-k-means-clustering\/","url_meta":{"origin":20464,"position":0},"title":"Data Science 101: k-means Clustering","date":"April 13, 2014","format":false,"excerpt":"In this edition of insideBIGDATA's Data Science 101 series, I'm going to offer up a short instructional video describing the use of the popular unsupervised learning algorithm, k-means clustering.","rel":"","context":"In &quot;Data Science 
101&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":32220,"url":"https:\/\/insidebigdata.com\/2023\/04\/28\/data-science-101-the-data-science-venn-diagram\/","url_meta":{"origin":20464,"position":1},"title":"Data Science 101: The Data Science Venn Diagram","date":"April 28, 2023","format":false,"excerpt":"Welcome to insideBIGDATA\u2019s\u00a0Data Science 101\u00a0channel bringing you perspectives for the topics of the day in data science, machine learning, AI and deep learning. Many of the video presentations come from my lectures for my\u00a0Introduction to Data Science\u00a0class I teach at UCLA Extension. In today\u2019s slide-based video presentation I discuss\u00a0The Data\u2026","rel":"","context":"In &quot;Big Data&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/04\/DataScience_shutterstock_1054542323.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":12653,"url":"https:\/\/insidebigdata.com\/2015\/01\/27\/data-science-101-machine-learning-basics\/","url_meta":{"origin":20464,"position":2},"title":"Data Science 101: Machine Learning &#8211; The Basics","date":"January 27, 2015","format":false,"excerpt":"The next installment of insideBIGDATA's Data Science 101 series comes from our friends over at LinkedIn.","rel":"","context":"In &quot;Data Science&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6396,"url":"https:\/\/insidebigdata.com\/2013\/12\/18\/data-science-101-build-model\/","url_meta":{"origin":20464,"position":3},"title":"Data Science 101: How to Build a Model","date":"December 18, 2013","format":false,"excerpt":"As part of insideBIGDATA's new Data Science 101 series of educational articles tailored to the data science aficionados out there trying to retool themselves to take advantage of this growth area of technology, we bring you a series of well-crafted instructional videos from our friends over at Salford 
Systems.","rel":"","context":"In &quot;Big Data Hardware&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":31724,"url":"https:\/\/insidebigdata.com\/2023\/02\/27\/data-science-101-the-data-science-process\/","url_meta":{"origin":20464,"position":4},"title":"Data Science 101: The Data Science Process","date":"February 27, 2023","format":false,"excerpt":"Welcome to insideBIGDATA's Data Science 101 channel brining you perspectives for the topics of the day in data science, machine learning, AI and deep learning. Many of the video presentations come from my lectures for my Introduction to Data Science class I teach at UCLA Extension. In today's slide-based video\u2026","rel":"","context":"In &quot;Big Data&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/04\/DataScience_shutterstock_1054542323.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":13621,"url":"https:\/\/insidebigdata.com\/2015\/08\/31\/data-science-101-an-introduction-to-scikit-learn-machine-learning-in-python\/","url_meta":{"origin":20464,"position":5},"title":"Data Science 101: An Introduction to scikit-learn &#8211; Machine Learning in Python","date":"August 31, 2015","format":false,"excerpt":"The tutorial presentation below offers an introduction to the scikit-learn package and to the central concepts of Machine Learning.","rel":"","context":"In &quot;Data 
Science&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/20464"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/37"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=20464"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/20464\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media\/6406"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=20464"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=20464"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=20464"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}