{"version":"1.0","provider_name":"insideBIGDATA","provider_url":"https:\/\/insidebigdata.com","author_name":"Editorial Team","author_url":"https:\/\/insidebigdata.com\/author\/editorial\/","title":"Research Highlights: R&R: Metric-guided Adversarial Sentence Generation - insideBIGDATA","type":"rich","width":600,"height":338,"html":"<blockquote class=\"wp-embedded-content\" data-secret=\"jX1ENNvZxp\"><a href=\"https:\/\/insidebigdata.com\/2022\/12\/02\/research-highlights-rr-metric-guided-adversarial-sentence-generation\/\">Research Highlights: R&#038;R: Metric-guided Adversarial Sentence Generation<\/a><\/blockquote><iframe sandbox=\"allow-scripts\" security=\"restricted\" src=\"https:\/\/insidebigdata.com\/2022\/12\/02\/research-highlights-rr-metric-guided-adversarial-sentence-generation\/embed\/#?secret=jX1ENNvZxp\" width=\"600\" height=\"338\" title=\"&#8220;Research Highlights: R&#038;R: Metric-guided Adversarial Sentence Generation&#8221; &#8212; insideBIGDATA\" data-secret=\"jX1ENNvZxp\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" class=\"wp-embedded-content\"><\/iframe><script type=\"text\/javascript\">\n\/*! 
This file is auto-generated *\/\n!function(c,d){\"use strict\";var e=!1,o=!1;if(d.querySelector)if(c.addEventListener)e=!0;if(c.wp=c.wp||{},c.wp.receiveEmbedMessage);else if(c.wp.receiveEmbedMessage=function(e){var t=e.data;if(!t);else if(!(t.secret||t.message||t.value));else if(\/[^a-zA-Z0-9]\/.test(t.secret));else{for(var r,s,a,i=d.querySelectorAll('iframe[data-secret=\"'+t.secret+'\"]'),n=d.querySelectorAll('blockquote[data-secret=\"'+t.secret+'\"]'),o=new RegExp(\"^https?:$\",\"i\"),l=0;l<n.length;l++)n[l].style.display=\"none\";for(l=0;l<i.length;l++)if(r=i[l],e.source!==r.contentWindow);else{if(r.removeAttribute(\"style\"),\"height\"===t.message){if(1e3<(s=parseInt(t.value,10)))s=1e3;else if(~~s<200)s=200;r.height=s}if(\"link\"===t.message)if(s=d.createElement(\"a\"),a=d.createElement(\"a\"),s.href=r.getAttribute(\"src\"),a.href=t.value,!o.test(a.protocol));else if(a.host===s.host)if(d.activeElement===r)c.top.location.href=t.value}}},e)c.addEventListener(\"message\",c.wp.receiveEmbedMessage,!1),d.addEventListener(\"DOMContentLoaded\",t,!1),c.addEventListener(\"load\",t,!1);function t(){if(o);else{o=!0;for(var e,t,r,s=-1!==navigator.appVersion.indexOf(\"MSIE 10\"),a=!!navigator.userAgent.match(\/Trident.*rv:11\\.\/),i=d.querySelectorAll(\"iframe.wp-embedded-content\"),n=0;n<i.length;n++){if(!(r=(t=i[n]).getAttribute(\"data-secret\")))r=Math.random().toString(36).substr(2,10),t.src+=\"#?secret=\"+r,t.setAttribute(\"data-secret\",r);if(s||a)(e=t.cloneNode(!0)).removeAttribute(\"security\"),t.parentNode.replaceChild(e,t);t.contentWindow.postMessage({message:\"ready\",secret:r},\"*\")}}}}(window,document);\n<\/script>\n","thumbnail_url":"https:\/\/insidebigdata.com\/wp-content\/uploads\/2019\/10\/NLP_shutterstock_299138114.jpg","thumbnail_width":300,"thumbnail_height":200,"description":"Large language models are a hot topic in AI research right now. But there\u2019s a hotter, more significant problem looming: we might run out of data to train them on ... 
as early as 2026.\u00a0Kalyan Veeramachaneni and the team at the MIT Data-to-AI Lab may have found a solution: in their new paper on Rewrite and Rollback (\u201cR&R: Metric-Guided Adversarial Sentence Generation\u201d), the R&R framework can tweak low-quality data (from sources like Twitter and 4chan) and turn it into high-quality data (text from sources like Wikipedia and industry websites) by rewriting meaningful sentences, thereby increasing the amount of the right type of data available to train and test language models."}