{"id":31714,"date":"2023-02-24T06:00:00","date_gmt":"2023-02-24T14:00:00","guid":{"rendered":"https:\/\/insidebigdata.com\/?p=31714"},"modified":"2023-06-23T12:37:43","modified_gmt":"2023-06-23T19:37:43","slug":"research-highlights-a-comprehensive-survey-on-pretrained-foundation-models-a-history-from-bert-to-chatgpt","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2023\/02\/24\/research-highlights-a-comprehensive-survey-on-pretrained-foundation-models-a-history-from-bert-to-chatgpt\/","title":{"rendered":"Research Highlights: A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT"},"content":{"rendered":"<div class=\"wp-block-image\">\n<figure class=\"alignright size-full is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2023\/02\/LLM_paper.png\" alt=\"\" class=\"wp-image-31719\" width=\"344\" height=\"369\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2023\/02\/LLM_paper.png 400w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2023\/02\/LLM_paper-280x300.png 280w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2023\/02\/LLM_paper-140x150.png 140w\" sizes=\"(max-width: 344px) 100vw, 344px\" \/><\/figure><\/div>\n\n\n<p>The Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A pretrained foundation model, such as BERT, GPT-3, MAE, DALLE-E, and ChatGPT, is trained on large-scale data which provides a reasonable parameter initialization for a wide range of downstream applications. The idea of pretraining behind PFMs plays an important role in the application of large models. Different from previous methods that apply convolution and recurrent modules for feature extractions, the generative pre-training (GPT) method applies Transformer as the feature extractor and is trained on large datasets with an autoregressive paradigm. Similarly, the BERT apples transformers to train on large datasets as a contextual language model. Recently, the ChatGPT shows promising success on large language models, which applies an autoregressive language model with zero shot or few show prompting. With the extraordinary success of PFMs, AI has made waves in a variety of fields over the past few years. Considerable methods, datasets, and evaluation metrics have been proposed in the literature, the need is raising for an updated survey. 
This study (https://arxiv.org/pdf/2302.09419.pdf) provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs in text, image, graph, and other data modalities.
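The zero-shot and few-shot prompting that the paragraph credits to ChatGPT-style models boils down to specifying the task entirely in the input text. A minimal sketch is below; the task, prompts, and the complete() stub are illustrative assumptions, not examples from the survey, and any causal-LM completion endpoint could stand in for the stub.

```python
# Zero-shot vs. few-shot prompting for an autoregressive language model.
# The only difference between the two regimes is the prompt itself:
# few-shot prepends worked examples, with no gradient updates either way.

ZERO_SHOT = (
    "Classify the sentiment of the review as positive or negative.\n"
    'Review: "The battery died after two days."\n'
    "Sentiment:"
)

FEW_SHOT = (
    "Classify the sentiment of the review as positive or negative.\n"
    'Review: "Loved it, works perfectly." Sentiment: positive\n'
    'Review: "Broke within a week." Sentiment: negative\n'
    'Review: "The battery died after two days." Sentiment:'
)

def complete(prompt: str) -> str:
    """Placeholder for a causal-LM completion call: the model continues
    the prompt, so the task is specified entirely in-context."""
    raise NotImplementedError("plug in an LLM client here")

if __name__ == "__main__":
    print(ZERO_SHOT)
    print("---")
    print(FEW_SHOT)
```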