{"id":32866,"date":"2023-07-21T03:00:00","date_gmt":"2023-07-21T10:00:00","guid":{"rendered":"https:\/\/insidebigdata.com\/?p=32866"},"modified":"2023-07-17T14:10:05","modified_gmt":"2023-07-17T21:10:05","slug":"ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/","title":{"rendered":"Video Highlights: Ultimate Guide To Scaling ML Models &#8211; Megatron-LM | ZeRO | DeepSpeed | Mixed Precision"},"content":{"rendered":"\n<p>In this video presentation, Aleksa Gordi\u0107&nbsp;explains what it takes to scale ML models up to trillions of parameters! He covers the fundamental ideas behind recent large models such as Meta&#8217;s OPT-175B, BigScience&#8217;s BLOOM-176B, EleutherAI&#8217;s GPT-NeoX-20B and GPT-J, OpenAI&#8217;s GPT-3, Google&#8217;s PaLM, and DeepMind&#8217;s Chinchilla and Gopher. Topics include data parallelism, model parallelism in both its pipeline form (e.g. GPipe and PipeDream) and its tensor form (Megatron-LM), activation checkpointing, mixed precision training, and ZeRO (Zero Redundancy Optimizer) from Microsoft&#8217;s DeepSpeed library. Along the way, he highlights many of the key research papers behind these techniques. The video presentation is sponsored by <a href=\"https:\/\/www.assemblyai.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">AssemblyAI<\/a>. 
<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\"  id=\"_ytid_16794\"  width=\"480\" height=\"270\"  data-origwidth=\"480\" data-origheight=\"270\" src=\"https:\/\/www.youtube.com\/embed\/hc0u4avAkuM?enablejsapi=1&#038;autoplay=0&#038;cc_load_policy=0&#038;cc_lang_pref=&#038;iv_load_policy=1&#038;loop=0&#038;modestbranding=0&#038;rel=1&#038;fs=1&#038;playsinline=0&#038;autohide=2&#038;theme=dark&#038;color=red&#038;controls=1&#038;\" class=\"__youtube_prefs__  epyt-is-override  no-lazyload\" title=\"YouTube player\"  allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen data-no-lazy=\"1\" data-skipgform_ajax_framebjll=\"\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Papers: <\/p>\n\n\n\n<p>\u2705 Megatron-LM paper: <a href=\"https:\/\/www.youtube.com\/redirect?event=video_description&amp;redir_token=QUFFLUhqbllibkdEUGMyanVyakpnbFJBMF9UQzhTVWRDZ3xBQ3Jtc0tua3BOUFBCZkR0SkFOVXB5bkhxalBVMmUxUW1WM1JPSlJXZGRfNllkR09aeWo3RUpFZUFRSkZfWjVnQ3N2NXZYWVdiMER3a1pRN3BvLU5SQ2V3WWp6cGpxTTJraVYzSlJXWm1FRnp3Z2tWSHNTczF3QQ&amp;q=https%3A%2F%2Farxiv.org%2Fabs%2F1909.08053&amp;v=hc0u4avAkuM\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/arxiv.org\/abs\/1909.08053<\/a> <\/p>\n\n\n\n<p>\u2705 ZeRO (DeepSpeed) paper: <a href=\"https:\/\/www.youtube.com\/redirect?event=video_description&amp;redir_token=QUFFLUhqa3ZVbDdCUExsSjRhbnBEN3hRNkZuRUVvLU02d3xBQ3Jtc0ttTUJzRG5kOE9iaGE0dllrZUFGWGlfNmxmWi0yWmhuRGRJRXJBRGNrUzRrbGdZMnhTaEdfWTB6OGxMOEtTZXBiUENFaGVwUFM0V283cTlhS19SdzRVT0F5UkpqYWRJTzJTOUlwRHZrYVhiTDRoa1lOZw&amp;q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.02054v3&amp;v=hc0u4avAkuM\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/arxiv.org\/abs\/1910.02054v3<\/a> <\/p>\n\n\n\n<p>\u2705 Mixed precision training paper: <a 
href=\"https:\/\/www.youtube.com\/redirect?event=video_description&amp;redir_token=QUFFLUhqbFpjWDFlM0xvZ2ticV9sanBwOTVidE5pTXUwUXxBQ3Jtc0ttdUtNSF80T0ZDcXFocHoxNnNmcGVFQk1XMnRfWjA4bUNWb1ZmQTBheHVHbk5wNXM4SGFQbWxQN0FreFBoLThlYThkYUpMelNVWm1jeXBOekNOUUVPMmd3TGw0UUR5bk5ld0NCYnRjLU5pdWF4dUgxbw&amp;q=https%3A%2F%2Farxiv.org%2Fabs%2F1710.03740&amp;v=hc0u4avAkuM\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/arxiv.org\/abs\/1710.03740<\/a> <\/p>\n\n\n\n<p>\u2705 GPipe (pipeline parallelism) paper: <a href=\"https:\/\/www.youtube.com\/redirect?event=video_description&amp;redir_token=QUFFLUhqbHVsNUYzc1N4WGJLQ3pMczJyc0o1TENZeWo3d3xBQ3Jtc0ttbmFHVWVEcjlmdk42YU42TGxiZl9PVUlDbzhXaksyUnVSUFlGUWpRMXFJVEEyV2lSdFlVTzJoeVJJbDV0b3lpTVhxa2FnLWxnV0Jnb3N2TlFIdERXeEZzYkM2X1VJb21IcDF5bnZFTXV0MzdYVGZHcw&amp;q=https%3A%2F%2Farxiv.org%2Fabs%2F1811.06965&amp;v=hc0u4avAkuM\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/arxiv.org\/abs\/1811.06965<\/a><\/p>\n\n\n\n<p>Articles: <\/p>\n\n\n\n<p>\u2705 Collective ops: <a href=\"https:\/\/www.youtube.com\/redirect?event=video_description&amp;redir_token=QUFFLUhqblNEdUI1SVYzeUUtUmlrdmI5eVQ5bVJHalBGUXxBQ3Jtc0tsSWNCTTcwZnJ5NnFfQV8tRFB3S1RLaFRWaGdWa0ZEaHBtNnF0VEF1QW9Sc3kyUzVzZmhtWHh0bnV5UTk4bDYxX1hRUTY1SVdrcEdmbmFqYldJRW45OU5JZ090RkM0eF9MUV9BaXMyNmR2RlBoSG4wTQ&amp;q=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FCollective_operation&amp;v=hc0u4avAkuM\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/en.wikipedia.org\/wiki\/Collect&#8230;<\/a> <\/p>\n\n\n\n<p>\u2705 IEEE float16 format: <a href=\"https:\/\/www.youtube.com\/redirect?event=video_description&amp;redir_token=QUFFLUhqbG9CakdxSVFoRmZ6M0UzVXJ6eHN2N1FVM0NBd3xBQ3Jtc0trbjF5bDlHVTFOVFV0d3hBWHd3dS1xbndBMGlRMFF0SHFLUGFiOEFKc1Z4M2FaajdrNno4eE5GMHhCcl9uc0dSa21TMjAtTFlhRHc4ZkoySWNFTGsyYW10WlZSTFJKYURMR2hkSUpFN2llSU5faldrVQ&amp;q=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FHalf-precision_floating-point_format&amp;v=hc0u4avAkuM\" target=\"_blank\" rel=\"noreferrer 
noopener\">https:\/\/en.wikipedia.org\/wiki\/Half-pr&#8230;<\/a> <\/p>\n\n\n\n<p>\u2705 Google Brain&#8217;s bfloat16 format: <a href=\"https:\/\/www.youtube.com\/redirect?event=video_description&amp;redir_token=QUFFLUhqbEpvR1FjQ1EtNlpMMHFfRml6TkF4UXdnbEJTd3xBQ3Jtc0tuS1d4cnltUENCbGdjTGlNWjllVFY4MURyMlgzQmhvVmVIZ24tYWlOcGdYTEtyeHZNQzhOa1FpWVVaR28wWmVYX3NtbmdqNl9tMEZvSjRfNEFwc1pmRXJVLVBrTHZ0Ulp2aVNCa0RHbHd1T0FycHFYRQ&amp;q=https%3A%2F%2Fcloud.google.com%2Fblog%2Fproducts%2Fai-machine-learning%2Fbfloat16-the-secret-to-high-performance-on-cloud-tpus&amp;v=hc0u4avAkuM\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/cloud.google.com\/blog\/product&#8230;<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this video presentation, Aleksa Gordi\u0107\u00a0explains what it takes to scale ML models up to trillions of parameters! 
He covers the fundamental ideas behind recent large models such as Meta&#8217;s OPT-175B, BigScience&#8217;s BLOOM-176B, EleutherAI&#8217;s GPT-NeoX-20B and GPT-J, OpenAI&#8217;s GPT-3, Google&#8217;s PaLM, and DeepMind&#8217;s Chinchilla and Gopher.<\/p>\n","protected":false},"author":10513,"featured_media":23655,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[526,115,182,90,180,67,268,56,1303,1,85],"tags":[437,264,277,96],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Video Highlights: Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision - insideBIGDATA<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Video Highlights: Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision - insideBIGDATA\" \/>\n<meta property=\"og:description\" content=\"In this video presentation, Aleksa Gordi\u0107\u00a0explains what it takes to scale ML models up to trillions of parameters! 
He covers the fundamental ideas behind recent large models such as Meta&#039;s OPT-175B, BigScience&#039;s BLOOM-176B, EleutherAI&#039;s GPT-NeoX-20B and GPT-J, OpenAI&#039;s GPT-3, Google&#039;s PaLM, and DeepMind&#039;s Chinchilla and Gopher.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/\" \/>\n<meta property=\"og:site_name\" content=\"insideBIGDATA\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/insidebigdata\" \/>\n<meta property=\"article:published_time\" content=\"2023-07-21T10:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-17T21:10:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/12\/Machine_Learning_shutterstock_344688470.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"300\" \/>\n\t<meta property=\"og:image:height\" content=\"212\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Editorial Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:site\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Editorial Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/\",\"url\":\"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/\",\"name\":\"Video Highlights: Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision - insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2023-07-21T10:00:00+00:00\",\"dateModified\":\"2023-07-17T21:10:05+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9\"},\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Video Highlights: Ultimate Guide To Scaling ML Models &#8211; Megatron-LM | ZeRO | DeepSpeed | Mixed Precision\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning 
Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9\",\"name\":\"Editorial Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g\",\"caption\":\"Editorial Team\"},\"sameAs\":[\"http:\/\/www.insidebigdata.com\"],\"url\":\"https:\/\/insidebigdata.com\/author\/editorial\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Video Highlights: Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision - insideBIGDATA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/","og_locale":"en_US","og_type":"article","og_title":"Video Highlights: Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision - insideBIGDATA","og_description":"In this video presentation, Aleksa Gordi\u0107\u00a0explains what it takes to scale ML models up to trillions of parameters! 
He covers the fundamental ideas behind recent large models such as Meta's OPT-175B, BigScience's BLOOM-176B, EleutherAI's GPT-NeoX-20B and GPT-J, OpenAI's GPT-3, Google's PaLM, and DeepMind's Chinchilla and Gopher.","og_url":"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2023-07-21T10:00:00+00:00","article_modified_time":"2023-07-17T21:10:05+00:00","og_image":[{"width":300,"height":212,"url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/12\/Machine_Learning_shutterstock_344688470.jpg","type":"image\/jpeg"}],"author":"Editorial Team","twitter_card":"summary_large_image","twitter_creator":"@insideBigData","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Editorial Team","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/","url":"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/","name":"Video Highlights: Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision - 
insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2023-07-21T10:00:00+00:00","dateModified":"2023-07-17T21:10:05+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9"},"breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2023\/07\/21\/ultimate-guide-to-scaling-ml-models-megatron-lm-zero-deepspeed-mixed-precision\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"Video Highlights: Ultimate Guide To Scaling ML Models &#8211; Megatron-LM | ZeRO | DeepSpeed | Mixed Precision"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2949e412c144601cdbcc803bd234e1b9","name":"Editorial Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e137ce7ea40e38bd4d25bb7860cfe3e4?s=96&d=mm&r=g","caption":"Editorial 
Team"},"sameAs":["http:\/\/www.insidebigdata.com"],"url":"https:\/\/insidebigdata.com\/author\/editorial\/"}]}},"jetpack_featured_media_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/12\/Machine_Learning_shutterstock_344688470.jpg","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-8y6","_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/32866"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/10513"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=32866"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/32866\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media\/23655"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=32866"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=32866"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=32866"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}