{"id":33219,"date":"2023-08-29T03:00:00","date_gmt":"2023-08-29T10:00:00","guid":{"rendered":"https:\/\/insidebigdata.com\/?p=33219"},"modified":"2023-08-29T09:17:11","modified_gmt":"2023-08-29T16:17:11","slug":"the-countles-worth-of-fine-tuning-foundation-models","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/","title":{"rendered":"The Countless Worth of Fine-Tuning Foundation Models"},"content":{"rendered":"\n<p>Foundation models have surged to the fore of modern machine learning applications for numerous reasons. Their generative capabilities\u2014including videos, images, and text\u2014are unrivaled. They readily perform a multiplicity of tasks, as epitomized by the utility of Large Language Models (LLMs) in everything from transcription to summarization.<\/p>\n\n\n\n<p>But most importantly, they\u2019re readily repurposed from one dataset, application, and domain to another, without requiring undue time, financial resources, and energy.<\/p>\n\n\n\n<p>Applying these models to related tasks\u2014such as employing a model designed to classify cars to classify vans\u2014is known as fine-tuning. The capacity for organizations to deploy foundation models for mission critical enterprise use cases ultimately hinges on model fine-tuning, which is vital for building applications with them.<\/p>\n\n\n\n<p>According to&nbsp;<a href=\"https:\/\/monsterapi.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">Monster API<\/a>&nbsp;CEO Saurabh Vij, \u201cMost developers either want to access foundation models, or fine-tune them for domain specific tasks, or for their own custom datasets.\u201d<\/p>\n\n\n\n<p>In addition to expediting the time required to repurpose foundation models for a developer\u2019s own particular needs, model fine-tuning furthers the democratization of data science. Contemporary solutions in this space utilize codeless fine-tuning options in cloud platforms (which include both GPUs and foundation models on demand) that are accessible through APIs.<\/p>\n\n\n\n<p>Users can quickly avail themselves of pre-trained models, fine-tune them for their specific use cases, and broaden the accessibility and impact of advanced machine learning throughout the enterprise.<\/p>\n\n\n\n<p><strong>Model Fine-Tuning<\/strong><\/p>\n\n\n\n<p>Conceptually, model fine-tuning is one of the many forms of&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2211.04347\" target=\"_blank\" rel=\"noreferrer noopener\">transfer learning<\/a>&nbsp;in which \u201cyou\u2019re taking a pre-trained model and then you\u2019re now re-training it for another task,\u201d Vij explained. \u201cThe best part is you don\u2019t need to spend hundreds of thousands of dollars to do this. You can do this for 200 to 300 dollars.\u201d When building an object detection model from scratch to identify cars, for example, users would have to give it hundreds of thousands, if not millions, of examples for its initial training.<\/p>\n\n\n\n<p>However, to apply that model to a similar task like recognizing trucks, one could simply fine-tune it and avoid lengthy (and costly) re-training from scratch. \u201cWhat we understand is there is so many common features, like headlights, the structure, wheels,\u201d Vij noted. \u201cAll those things are similar for both use cases. So, what if we could just remove one or two layers of the car\u2019s model and add new layers for the trucks? 
**Parameter-Efficient Fine-Tuning**

There is an abundance of approaches to fine-tuning foundation models. [Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2305.16742) is one such method: it trains only a small subset of parameters while easing the storage and communication constraints typical of full fine-tuning. When asked about the cost-effective method of fine-tuning mentioned earlier, Vij characterized it as "a type of transfer learning, which is Parameter-Efficient Fine-Tuning, and one of the techniques is LoRA, which is quickly becoming an industry standard."

[Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) is renowned for its ability to reduce costs and increase efficiency when fine-tuning machine learning models. Part of the reason fine-tuning becomes expensive with other approaches is that "deep learning models are very large models and they have a lot of memory requirements," Vij commented. "There's a lot of parameters in them and their weights are very heavy."

LoRA is emerging as a preferred technique for fine-tuning these models, partially because it's an alternative to full fine-tuning. According to Vij, "With LoRA, what happens is you don't need to fine-tune the entire model, but you can fine-tune a specific part of the model, which can be adapted to learn for the new dataset." LoRA enables users to quickly fine-tune models to identify the salient traits of a new dataset without building a new model. Its efficiency comes largely from training small low-rank update matrices instead of the full weight matrices, which sharply reduces the required computation. "We're using this approach because the computations reduce a lot," Vij revealed.
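For concreteness, here is a minimal sketch of what LoRA fine-tuning can look like with the open-source Hugging Face peft library; this is one common way to apply the technique, not necessarily the stack described in the article, and the base model and configuration values are illustrative.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Small stand-in base model so the example runs anywhere; a LLaMA-style model
# would be loaded the same way (and would typically target q_proj/v_proj).
model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection layer in GPT-2
)

# The base weights stay frozen; only the injected low-rank adapters train.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Because only the adapter weights are updated, the resulting artifact is small and can be kept separate from, or merged back into, the frozen base model.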
**Foundation Models**

Once they adopt the proper techniques, organizations have the latitude to fine-tune a number of foundation models to meet their enterprise AI needs, particularly if they require generative capabilities. Some of the more useful models include the following (a brief usage sketch follows the list):

- **Stable Diffusion:** [Stable Diffusion](https://stablediffusionweb.com/) is an image generation model that organizations can steer through natural language prompts. "Out of the box, it can create multiple images," Vij confirmed. "It has been trained on a large dataset of text-image pairs so that it can understand the context for the text prompts you provide."
- **Whisper AI:** This speech-to-text model transcribes audio files. "It's also adding a little more functionality, such as sentiment analysis and summarization," Vij added. "But Whisper AI's specific focus is on transcription, so any audio can be transcribed." The model provides transcriptions in up to 30 different languages.
- **LLaMA and StableLM:** Large Language Model Meta AI (LLaMA) is a collection of LLMs that are widely used for text-to-text applications. "You write an instruction and it generates an output for you," Vij remarked. StableLM is an open-source alternative.
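As a rough illustration of how such models are invoked before any fine-tuning, the sketch below runs transcription and instruction-style generation through the Hugging Face transformers pipelines; the model IDs and audio file name are assumptions for the example, not prescriptions from the article.

```python
from transformers import pipeline

# Speech-to-text with Whisper: transcribe an audio file (hypothetical path).
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-small")
print(transcriber("meeting.wav")["text"])

# Instruction-style text generation with an open LLM such as StableLM.
generator = pipeline("text-generation", model="stabilityai/stablelm-tuned-alpha-3b")
print(generator("Summarize why fine-tuning matters:", max_new_tokens=60)[0]["generated_text"])
```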
**The Bottom Line**

Whether it's some iteration of GPT, LLaMA, or any other foundation model, the enterprise merit of these constructs revolves around the ability to fine-tune them. LoRA and other Parameter-Efficient Fine-Tuning approaches produce this result cost-effectively and efficiently, particularly when accessed through no-code cloud frameworks that supply GPUs alongside the models.

"In 2022, you had around 60,000 fine-tuned models," Vij reflected. "But over the past seven to eight months, that number has grown to 200,000, and we can anticipate this growth to reach 1,000,000 models in one year."

**About the Author**

*Jelani Harper is an editorial consultant serving the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance, and analytics.*
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/\",\"url\":\"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/\",\"name\":\"The Countless Worth of Fine-Tuning Foundation Models - insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2023-08-29T10:00:00+00:00\",\"dateModified\":\"2023-08-29T16:17:11+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/35a290930284d4cdbf002d457f3d5d87\"},\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"The Countless Worth of Fine-Tuning Foundation Models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/35a290930284d4cdbf002d457f3d5d87\",\"name\":\"Contributor\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/36bffd267e38ed3f525205f67270e91b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/36bffd267e38ed3f525205f67270e91b?s=96&d=mm&r=g\",\"caption\":\"Contributor\"},\"url\":\"https:\/\/insidebigdata.com\/author\/contributor\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"The Countless Worth of Fine-Tuning Foundation Models - insideBIGDATA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/","og_locale":"en_US","og_type":"article","og_title":"The Countless Worth of Fine-Tuning Foundation Models - insideBIGDATA","og_description":"In this contributed article, editorial consultant Jelani Harper points out that whether it\u2019s some iteration of GPT, LLaMA, or any other foundation model, the enterprise merit obtained from successfully employing these constructs revolves around the ability to fine-tune them. 
LoRA and other Parameter-Efficient Fine-Tuning approaches produce this result in a manner that\u2019s cost-effective and efficient\u2014particularly when accessed through no-code cloud frameworks supplying GPUs and access to the models.","og_url":"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2023-08-29T10:00:00+00:00","article_modified_time":"2023-08-29T16:17:11+00:00","og_image":[{"width":1100,"height":550,"url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2023\/08\/Neural_net_shutterstock_1615182352_special.jpg","type":"image\/jpeg"}],"author":"Contributor","twitter_card":"summary_large_image","twitter_creator":"@insideBigData","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Contributor","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/","url":"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/","name":"The Countless Worth of Fine-Tuning Foundation Models - insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2023-08-29T10:00:00+00:00","dateModified":"2023-08-29T16:17:11+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/35a290930284d4cdbf002d457f3d5d87"},"breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2023\/08\/29\/the-countles-worth-of-fine-tuning-foundation-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"The Countless Worth of Fine-Tuning Foundation Models"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/35a290930284d4cdbf002d457f3d5d87","name":"Contributor","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/36bffd267e38ed3f525205f67270e91b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/36bffd267e38ed3f525205f67270e91b?s=96&d=mm&r=g","caption":"Contributor"},"url":"https:\/\/insidebigdata.com\/author\/contributor\/"}]}},"jetpack_featured_media_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2023\/08\/Neural_net_shutterstock_1615182352_special.jpg","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-8DN","jetpack-related-posts":[{"id":32512,"url":"https:\/\/insidebigdata.com\/2023\/06\/03\/video-highlights-fine-tune-gpt-j-6b-in-under-3-hours-on-ipus\/","url_meta":{"origin":33219,"position":0},"title":"Video 
Highlights: Fine Tune GPT-J 6B in Under 3 Hours on IPUs","date":"June 3, 2023","format":false,"excerpt":"Did you know you can run GPT-J 6B on Graphcore IPU in the cloud? Following the now infamous leaked Google memo, there's been a real storm in the AI world recently around smaller, open source language models, like GPT-J, that are cheaper and faster to fine-tune, run and perform just\u2026","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/05\/Deep_Learning_shutterstock_386816095.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":30906,"url":"https:\/\/insidebigdata.com\/2022\/11\/19\/snorkel-ai-accelerates-foundation-model-adoption-with-data-centric-ai\/","url_meta":{"origin":33219,"position":1},"title":"Snorkel AI Accelerates Foundation Model Adoption with Data-centric AI","date":"November 19, 2022","format":false,"excerpt":"Snorkel AI, the data-centric AI platform company, today introduced Data-centric Foundation Model Development for enterprises to unlock complex, performance-critical use cases with GPT-3, RoBERTa, T5, and other foundation models. With this launch, enterprise data science and machine learning teams can overcome adaptation and deployment challenges by creating large, domain-specific datasets\u2026","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":33298,"url":"https:\/\/insidebigdata.com\/2023\/09\/06\/insidebigdata-ai-news-briefs-9-8-2023\/","url_meta":{"origin":33219,"position":2},"title":"insideBIGDATA AI News Briefs \u2013 9\/8\/2023","date":"September 6, 2023","format":false,"excerpt":"Welcome insideBIGDATA AI News Briefs, our timely new feature bringing you the latest industry insights and perspectives surrounding the field of AI including deep learning, large language models, generative AI, and transformers. We\u2019re working tirelessly to dig up the most timely and curious tidbits underlying the day\u2019s most popular technologies.\u2026","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2023\/07\/AI-News-Briefs-column-banner.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":31714,"url":"https:\/\/insidebigdata.com\/2023\/02\/24\/research-highlights-a-comprehensive-survey-on-pretrained-foundation-models-a-history-from-bert-to-chatgpt\/","url_meta":{"origin":33219,"position":3},"title":"Research Highlights: A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT","date":"February 24, 2023","format":false,"excerpt":"The Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. 
A pretrained foundation model, such as BERT, GPT-3, MAE, DALLE-E, and ChatGPT, is trained on large-scale data which provides a reasonable parameter initialization for a wide range of downstream applications.","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2023\/02\/LLM_paper.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":23720,"url":"https:\/\/insidebigdata.com\/2019\/12\/19\/machine-learning-for-all-the-democratizing-of-a-technology\/","url_meta":{"origin":33219,"position":4},"title":"Machine Learning for All: the Democratizing of a Technology","date":"December 19, 2019","format":false,"excerpt":"Our friends over at H2O.ai have produced a short new eBook \"Machine learning for all: the democratizing of a technology\" which covers machine learning features and automatic AI solutions, and how organizations can benefit from using them.","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2019\/12\/H20.ai_WPCover1_2019-12-10_10-46-11.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":24908,"url":"https:\/\/insidebigdata.com\/2020\/08\/24\/interview-andy-horng-co-founder-and-head-of-ai-cultivate\/","url_meta":{"origin":33219,"position":5},"title":"Interview: Andy Horng, Co-Founder and Head of AI, Cultivate","date":"August 24, 2020","format":false,"excerpt":"I recently caught up with Andy Horng, Co-Founder and Head of AI at Cultivate, to get a sense for the technology underlying the company's AI-powered leadership development platform. NLP plays an important role, and as a result they're using the RoBERTa language model for very good results.","rel":"","context":"In &quot;AI Deep Learning&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2020\/08\/Andy-Horng.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/33219"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/10531"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=33219"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/33219\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media\/33064"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=33219"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=33219"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=33219"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}