{"id":31764,"date":"2023-03-03T06:00:00","date_gmt":"2023-03-03T14:00:00","guid":{"rendered":"https:\/\/insidebigdata.com\/?p=31764"},"modified":"2023-06-23T12:37:20","modified_gmt":"2023-06-23T19:37:20","slug":"research-highlights-sparsegpt-prune-llms-accurately-in-one-shot","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2023\/03\/03\/research-highlights-sparsegpt-prune-llms-accurately-in-one-shot\/","title":{"rendered":"Research Highlights: SparseGPT: Prune LLMs Accurately in One-Shot"},"content":{"rendered":"\n<p><strong><em>Achieve 50% Sparsity With One-Shot Learning Without Any Retraining<\/em><\/strong><\/p>\n\n\n\n<p>It might come as a surprise, but large language models are a great match for sparsification. Why? They give up less accuracy as compared to the amount of weights that are being eliminated (set to 0). This is an encouraging finding from <a href=\"https:\/\/neuralmagic.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Neural Magic<\/a>&#8216;s collaboration with the Institute of Science and Technology Austria (ISTA) because it makes it possible to run billion parameter models more efficiently, with significantly less hardware.<\/p>\n\n\n\n<p>A new research <a href=\"https:\/\/arxiv.org\/pdf\/2301.00774.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">paper<\/a> shows that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. When executing SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, we can reach 60% sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. 
SparseGPT also generalizes to semi-structured (2:4 and 4:8) sparsity patterns and is compatible with weight quantization approaches.

[Figure from the paper: https://insidebigdata.com/wp-content/uploads/2023/02/Neural_Magic_paper_fig.png]
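The 2:4 pattern means that in every consecutive group of four weights at most two are nonzero, which yields exactly 50% sparsity in a layout that sparse tensor hardware can exploit (4:8 is the analogous four-out-of-eight constraint). A hedged sketch of applying such a mask by magnitude, again not SparseGPT's actual selection rule, might look like this:

```python
import numpy as np

def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    """Keep the 2 largest-magnitude weights in every consecutive group of 4.

    Enforces the 2:4 semi-structured pattern (exactly 50% sparsity).
    SparseGPT picks which weights survive via its reconstruction objective;
    the magnitude rule here only illustrates the shape of the mask.
    """
    rows, cols = weights.shape
    assert cols % 4 == 0, "row length must be a multiple of 4"
    out = weights.copy()
    groups = out.reshape(rows, cols // 4, 4)        # view: (rows, n_groups, 4)
    # Indices of the two smallest-magnitude entries in each group of four.
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    np.put_along_axis(groups, drop, 0.0, axis=-1)   # zero them in place
    return out

# Toy usage: every group of four columns ends up with exactly two zeros.
W = np.random.default_rng(1).normal(size=(2, 8))
print(prune_2_of_4(W))
```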