{"id":10795,"date":"2015-01-17T06:00:14","date_gmt":"2015-01-17T14:00:14","guid":{"rendered":"http:\/\/inside-bigdata.com\/?p=10795"},"modified":"2015-01-17T09:15:24","modified_gmt":"2015-01-17T17:15:24","slug":"data-science-101-using-statistics-predict-ab-testing","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/","title":{"rendered":"Data Science 101: Using Statistics to Predict AB Testing"},"content":{"rendered":"<p>The Wikipedia fundraising team was performing up to 100 AB tests per week. It wasn&#8217;t enough to find the gains needed. The team needed to use statistics to interpret their AB tests accurately, but also to estimate the smallest acceptable sample sizes so that they could test more frequently. They were not comfortable trusting methods proposed by other practitioners or academics who could not prove that those methods would work accurately for the team&#8217;s data.<\/p>\n<p>The talk below presents simple methods that can accurately predict future performance from AB test results, and that allow you to determine the smallest acceptable sample size. Using four years of AB testing data, you&#8217;ll see how these methods really work. By taking a walk through four years of data, the presenters show how their current methods would have predicted the actual outcomes of long tests with much smaller samples, and the implications of various decisions about statistical confidence, power, sample size and choice of statistical method.<\/p>\n<p>The presentation will be useful to anyone who performs AB tests, manages teams that rely on AB testing, or holds responsibility for implementing or maintaining AB testing platforms.<\/p>\n<p>The first presenter is Zack Exley, a practice director at the global IT consultancy ThoughtWorks. 
Previously, Zack was the Chief Community Officer and Chief Revenue Officer at the Wikimedia Foundation (Wikipedia), where he built an aggressive AB testing program that achieved a 350% increase in revenue while radically reducing the length of the annual fundraiser. Also presenting is Sahar Massachi, a startup founder, online organizer, and software developer recently out of Brandeis with a BS and MA in Computer Science. After founding a startup to provide community organizations with better mobile tools, he joined the Wikipedia fundraising team to help complete a review of their AB testing methods, analyzing hundreds of past AB tests and preparing them for publication.<\/p>\n<p>&nbsp;<\/p>\n<p><span class=\"embed-youtube\" style=\"text-align:center; display: block;\"><iframe loading=\"lazy\" class=\"youtube-player\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/VKVIuYEUqn8?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en-US&#038;autohide=2&#038;wmode=transparent\" allowfullscreen=\"true\" style=\"border:0;\" sandbox=\"allow-scripts allow-same-origin allow-popups allow-presentation\"><\/iframe><\/span><\/p>\n<p>&nbsp;<\/p>\n<p><em><a href=\"http:\/\/sps.northwestern.edu\/info\/predictive-analytics.php?utm_source=InsideBigData&amp;utm_medium=DSC101_text&amp;utm_term=oct-dec&amp;utm_content=MSPA&amp;utm_campaign=MSPA_IBD15&amp;src=FY15InsdieBigData_DSC101_text\" target=\"_blank\">Earn your master\u2019s in predictive analytics completely online from Northwestern University.<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The talk below presents simple methods that can accurately predict future performance from AB test results, and that allow you to determine the smallest acceptable sample size. Using four years of AB testing data, you&#8217;ll see how these methods really work. 
<\/p>\n","protected":false},"author":37,"featured_media":12622,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[182,170,87,180,56,1,85],"tags":[261,96],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Data Science 101: Using Statistics to Predict AB Testing - insideBIGDATA<\/title>\n<meta name=\"description\" content=\"Data Science 101: Using Statistics to Predict AB Testing\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Science 101: Using Statistics to Predict AB Testing - insideBIGDATA\" \/>\n<meta property=\"og:description\" content=\"Data Science 101: Using Statistics to Predict AB Testing\" \/>\n<meta property=\"og:url\" content=\"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/\" \/>\n<meta property=\"og:site_name\" content=\"insideBIGDATA\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/insidebigdata\" \/>\n<meta property=\"article:published_time\" content=\"2015-01-17T14:00:14+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2015-01-17T17:15:24+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/01\/Slide1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1020\" \/>\n\t<meta property=\"og:image:height\" content=\"765\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Daniel 
Gutierrez\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AMULETAnalytics\" \/>\n<meta name=\"twitter:site\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Gutierrez\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/\",\"url\":\"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/\",\"name\":\"Data Science 101: Using Statistics to Predict AB Testing - insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2015-01-17T14:00:14+00:00\",\"dateModified\":\"2015-01-17T17:15:24+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\"},\"description\":\"Data Science 101: Using Statistics to Predict AB Testing\",\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Science 101: Using Statistics to Predict AB 
Testing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\",\"name\":\"Daniel Gutierrez\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"caption\":\"Daniel Gutierrez\"},\"description\":\"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \\\"data scientist\\\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.\",\"sameAs\":[\"http:\/\/www.insidebigdata.com\",\"https:\/\/twitter.com\/@AMULETAnalytics\"],\"url\":\"https:\/\/insidebigdata.com\/author\/dangutierrez\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Science 101: Using Statistics to Predict AB Testing - insideBIGDATA","description":"Data Science 101: Using Statistics to Predict AB Testing","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/","og_locale":"en_US","og_type":"article","og_title":"Data Science 101: Using Statistics to Predict AB Testing - insideBIGDATA","og_description":"Data Science 101: Using Statistics to Predict AB Testing","og_url":"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2015-01-17T14:00:14+00:00","article_modified_time":"2015-01-17T17:15:24+00:00","og_image":[{"width":1020,"height":765,"url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/01\/Slide1.jpg","type":"image\/jpeg"}],"author":"Daniel Gutierrez","twitter_card":"summary_large_image","twitter_creator":"@AMULETAnalytics","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Daniel Gutierrez","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/","url":"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/","name":"Data Science 101: Using Statistics to Predict AB Testing - insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2015-01-17T14:00:14+00:00","dateModified":"2015-01-17T17:15:24+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed"},"description":"Data Science 101: Using Statistics to Predict AB Testing","breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2015\/01\/17\/data-science-101-using-statistics-predict-ab-testing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"Data Science 101: Using Statistics to Predict AB Testing"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed","name":"Daniel 
Gutierrez","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","caption":"Daniel Gutierrez"},"description":"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \"data scientist\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.","sameAs":["http:\/\/www.insidebigdata.com","https:\/\/twitter.com\/@AMULETAnalytics"],"url":"https:\/\/insidebigdata.com\/author\/dangutierrez\/"}]}},"jetpack_featured_media_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/01\/Slide1.jpg","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-2O7","jetpack-related-posts":[{"id":8329,"url":"https:\/\/insidebigdata.com\/2014\/03\/26\/data-science-101-time-series-r\/","url_meta":{"origin":10795,"position":0},"title":"Data Science 101: Forecasting Time Series Using R","date":"March 26, 2014","format":false,"excerpt":"An integral tool found in data science is Time Series Forecasting. 
Here is a useful instructional video on the subject from one of the authors of a free eBook available on OTexts - \"Forecasting: Principles and Practice.\" The presentation \"Forecasting Time Series Using R\" is made by Professor of Statistics\u2026","rel":"","context":"In &quot;Analytics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":20464,"url":"https:\/\/insidebigdata.com\/2018\/05\/31\/data-science-101-handling-missing-data-revisited\/","url_meta":{"origin":10795,"position":1},"title":"Data Science 101: Handling Missing Data (Revisited)","date":"May 31, 2018","format":false,"excerpt":"I recently received the following question on data science methods from an avid reader of insideBIGDATA who hails from Taiwan. I think the topics are very relevant to many folks in our audience so I decided to run it here in our Data Science 101 channel. The issue of missing\u2026","rel":"","context":"In &quot;Data Science&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2018\/05\/Reader_question_fig3.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":12820,"url":"https:\/\/insidebigdata.com\/2015\/03\/03\/data-science-101-automating-analytics\/","url_meta":{"origin":10795,"position":2},"title":"Data Science 101: Automating Analytics","date":"March 3, 2015","format":false,"excerpt":"For the latest installment of our Data Science 101 series, we have Bjorn Johansson, Executive Director Operations at Ericsson, presenting several initiatives where Ericsson is applying new methods to accelerate business transformation work.","rel":"","context":"In &quot;Analytics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6411,"url":"https:\/\/insidebigdata.com\/2013\/12\/26\/data-science-101-probability-monte-carlo-methods\/","url_meta":{"origin":10795,"position":3},"title":"Data Science 101: Probability and Monte Carlo Methods","date":"December 26, 
2013","format":false,"excerpt":"Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results, i.e. by running simulations many times in succession in order to calculate those same probabilities with machine learning just like actually playing and recording your results in a real casino\u2026","rel":"","context":"In &quot;Analytics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":7949,"url":"https:\/\/insidebigdata.com\/2014\/03\/07\/data-science-101-deep-learning-methods-applications\/","url_meta":{"origin":10795,"position":4},"title":"Data Science 101: Deep Learning Methods and Applications","date":"March 7, 2014","format":false,"excerpt":"Microsoft Research, the research arm of the software giant, is a hotbed of data science and machine learning research. Microsoft has the resources to hire the best and brightest researchers from around the globe. A recent publication is available for download (PDF): \"Deep Learning: Methods and Applications\" by Li Deng\u2026","rel":"","context":"In &quot;Book Review&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6580,"url":"https:\/\/insidebigdata.com\/2014\/01\/08\/data-science-101-machine-learning-part-1\/","url_meta":{"origin":10795,"position":5},"title":"Data Science 101: Machine Learning, Part 1","date":"January 8, 2014","format":false,"excerpt":"BloomReach engineer Srinath Sridhar walks through probability, Bayesian models and machine learning in this 5 part video series.","rel":"","context":"In &quot;Data Science 
101&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/10795"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/37"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=10795"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/10795\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media\/12622"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=10795"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=10795"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=10795"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}