{"id":13317,"date":"2015-06-30T06:00:08","date_gmt":"2015-06-30T13:00:08","guid":{"rendered":"http:\/\/insidebigdata.com\/?p=13317"},"modified":"2015-06-29T15:48:10","modified_gmt":"2015-06-29T22:48:10","slug":"big-data-technology-for-scientific-research","status":"publish","type":"post","link":"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/","title":{"rendered":"Big Data Technology for Scientific Research"},"content":{"rendered":"<p><em>This article is the third in\u00a0<a href=\"http:\/\/insidebigdata.com\/2015\/06\/23\/insidebigdata-guide-to-scientific-research\/\" target=\"_blank\">an editorial\u00a0series<\/a> with a goal to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.<br \/>\n<\/em><\/p>\n<p>In the <a href=\"http:\/\/insidebigdata.com\/2015\/06\/25\/primary-motivators-of-big-data-vis-a-vis-scientific-research\/\" target=\"_blank\">last article<\/a>, we took a look at the primary motivators for scientific researchers to engage the big data technology stack. 
The complete\u00a0<em>insideBIGDATA Guide to Scientific Research<\/em> is available\u00a0for download from the <a href=\"http:\/\/insidebigdata.com\/white-paper\/insidebigdata-guide-to-scientific-research\/\" target=\"_blank\">insideBIGDATA White Paper Library<\/a>.<\/p>\n<p><strong><a href=\"http:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/insideBIGDATA_Guide_Research_feature.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignright size-full wp-image-13285\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/insideBIGDATA_Guide_Research_feature.jpg\" alt=\"insideBIGDATA_Guide_Research_feature\" width=\"218\" height=\"279\" \/><\/a>Big Data Technology for Scientific Research<\/strong><\/p>\n<p>The continued and rapid evolution of big data technology and services has formed a fertile foundation for scientific applications in the past several years. We\u2019ve seen big data hardware and software solutions broaden access to analytics and methods of statistical learning like never before. One thing researchers have learned along the way is that management of computing resources is one of the primary questions to be answered with big data. It is not just a case of determining the scale of resources needed for a project, but also how to configure them, all within the available budget. For example, running a large project on fewer machines might save on hardware costs but will likely lengthen the project timeline. In some cases, scientific big data are being stored in the cloud instead of on conventional hardware in a research lab. Instead of having to invest in the infrastructure of an on-premises High Performance Computing (HPC) cluster to analyze the data, some researchers are using HPC and big data methods in the cloud. 
One disadvantage of the cloud is that large data sets must first be transferred to cloud storage.<\/p>\n<p>In this section we\u2019ll examine specific areas of big data technology that scientific researchers are deploying in order to see significant increases in their ability to manage the scientific data deluge.<\/p>\n<p><strong>Colliding Worlds of HPC and Big Data<\/strong><\/p>\n<p>In order to address the needs of data-centric scientific research projects, the profiles of traditional HPC and big data are merging and becoming closely intertwined. The compute nature of HPC is finding significant benefit from big data analytics and its ability to process high-volume, high-velocity data sets. The current most effective software platform for big data analytics, Hadoop, has a classic architecture consisting of HDFS and MapReduce running on commodity cluster nodes, whereas the HPC environment has a different architecture in which compute is distinct from the storage solution. You\u2019d like to leverage your current investment in HPC by doing big data analytics on the architecture you already have. This is where the two worlds come together.<\/p>\n<p>An area of scientific research that\u2019s benefiting from HPC and big data merging is genomics. The advancement of this area of research depends on the availability of HPC because of the compute resources needed to process genomic data sets. But these problems also require solving big data issues like analyzing the data, making sense of what\u2019s in the data, identifying what patterns emerge from the data, and more. Essentially, every aspect of what genomic researchers do is becoming an opportunity to capture, analyze and use big data. This is the same net effect seen from the perspective of many other scientific disciplines.<\/p>\n<p><strong>Data Sources and Data Integration<\/strong><\/p>\n<p>Scientific researchers routinely collect extremely large data sets, primarily for computational analysis with an HPC system. 
These data sets can also be analyzed with big data tools to look for valuable insights with data visualization tools or advanced analysis algorithms. The difference between HPC and big data analytics is primarily that HPC is CPU-bound, whereas big data analytical problems are I\/O-bound. As the two environments continue to merge, researchers can apply big data analytics against a primarily HPC data set without moving the data set from the HPC environment to a Hadoop cluster with HDFS.<\/p>\n<p>A paradigm shift in big data analytics relative to scientific research use cases, as well as other use cases where there is value from analyzing HPC data sets, necessitates a new direction away from the common HDFS architecture, and towards using MapReduce on the Lustre parallel file system. The Intel\u00ae Enterprise Edition for Lustre software unleashes the performance and scalability of Lustre for HPC workloads, including the big data applications becoming common within today\u2019s research labs. Dell built a reference architecture in which Lustre is used in place of HDFS. Further, Dell layers analytics on top of this architecture as well.<\/p>\n<p>A key component to connecting the Hadoop and Lustre ecosystems is the Intel Hadoop Adapter for Lustre plug-in (Intel HAL). Intel HAL is bundled with the Intel Enterprise Edition for Lustre software. It allows users to run MapReduce jobs directly on a Lustre file system. The immediate benefit is that Lustre is able to deliver fast, stable and easily managed storage for the MapReduce applications directly. A potential long-term benefit of using Lustre as the underlying Hadoop storage is higher usable capacity than HDFS, which by default stores three replicas of each block, along with the performance benefits of running Lustre on InfiniBand connectivity.<\/p>\n<p>Researchers are interested in multiple additional classes of data sets. Many of them may be on premises and consist of multiple types of data. 
Other data sources may come from publicly available sites, or purchased from one of the many services that provide data sources to labs. Dell\u2019s Data Integration Platform as a Service called Boomi could be used for such applications.<\/p>\n<p><strong>Hadoop<\/strong><\/p>\n<p>The Hadoop distributed computing architecture increasingly is being deployed for scientific applications requiring big data capabilities, specifically managing, collecting and analyzing the data. Dell\u2122 Apache\u2122 Hadoop\u00ae solutions for big data provide an open source, end-to-end scalable infrastructure that allows you to:<\/p>\n<ul>\n<li>Simultaneously store and process large data sets in a distributed environment\u2014across servers and storage\u2014for extensive, structured and unstructured statistical learning and analysis<\/li>\n<li>Accommodate a wide range of analytic, exploration, query and transformation workloads<\/li>\n<li>Tailor and deploy validated reference architectures<\/li>\n<li>Reduce project costs<\/li>\n<li>Drive important insights from scientific data<\/li>\n<\/ul>\n<p>Take the complexity out of analyzing research data sets. With Dell\u2019s extensive Hadoop-ready library of analytics\u00a0 solutions, you can easily create \u201cwhat if\u201d scenario dashboards, generate graphs for relationship analysis and innovate over legacy systems. Dell has teamed up with Cloudera and Intel to provide the most comprehensive,\u00a0 easy-to-implement big data solutions on the market for research applications.<\/p>\n<p>Dell\u2019s tested and validated Reference Architectures include Dell PowerEdge servers with Intel\u00ae Xeon\u00ae processors, Dell Networking and Cloudera Enterprise. This broad compatibility can help your research group build robust Hadoop solutions to collect, manage, analyze and store data while leveraging existing tools and resources. 
The Intel\u00ae Xeon\u00ae powered Dell | Cloudera solution can give your research group everything it needs to tackle big data challenges including software, hardware, networking and services.<\/p>\n<p><strong>Spark<\/strong><\/p>\n<p>Apache Spark is another distributed processing environment that\u2019s gained much interest in the scientific community. Spark is an open-source platform for large-scale distributed computing. MapReduce is a widely adopted programming model that divides a large computation into two steps: a Map step, in which data are partitioned and analyzed in parallel, and a Reduce step, in which intermediate results are combined or summarized. Many analyses can be expressed in this model, but systems like Hadoop have a key weakness: data must be loaded from disk storage for each operation, which can slow iterative computations (including many machine learning algorithms) and makes interactive, exploratory analysis difficult. Spark extends and generalizes the MapReduce model while addressing this weakness by introducing a primitive for data sharing called a resilient distributed dataset (RDD). With Spark, a data set or intermediate result can be cached in memory across cluster nodes, so iterative computations run faster than with Hadoop MapReduce and interactive analysis becomes practical.<\/p>\n<p>One example of a research project taking advantage of Spark comes from the Howard Hughes Medical Institute. The project\u2019s goal is to understand brain function by monitoring and interpreting the activity of large networks of neurons during behavior. An hour of brain imaging for a mouse can yield 50-100 gigabytes of data. The researchers developed a library of analytical tools called Thunder, which is based on Spark using the Python API along with existing libraries for scientific computing and visualization. The core of Thunder is expressing different neuroscience analyses in the language of RDD operations. 
Many computations such as summary statistics, regression and clustering can be parallelized using MapReduce.<\/p>\n<p><a href=\"http:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/Thunder_Spark.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-13318\" src=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/Thunder_Spark.jpg\" alt=\"Thunder_Spark\" width=\"490\" height=\"335\" srcset=\"https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/Thunder_Spark.jpg 490w, https:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/Thunder_Spark-300x205.jpg 300w\" sizes=\"(max-width: 490px) 100vw, 490px\" \/><\/a><strong>Statistical Analysis Software<\/strong><\/p>\n<p>The scientific community is fortunate to have many quality statistical environments and analytics tools for developing analytical pipelines that connect data to models and then to predictions, including SAS, SPSS, MATLAB and Statistica. On the open source front, many researchers employ tools like R and Python, each offering vast libraries of statistical functions and machine learning algorithms.<\/p>\n<p>It is important to transform complex and time-consuming manipulation of scientific data sets into a fast and intuitive process. Statistica Big Data Analytics from Dell combines search and analytics in a single, unified environment. Statistica is an advanced content mining and analytics solution that is fully integrated, configurable and cloud-enabled. 
It deploys in minutes and brings together natural language processing, machine learning, search and advanced visualization.<\/p>\n<p>If you prefer, the\u00a0complete\u00a0<em>insideBIGDATA Guide to Scientific Research <\/em>is\u00a0available\u00a0for\u00a0download in PDF from the<a href=\"http:\/\/insidebigdata.com\/white-paper\/insidebigdata-guide-to-scientific-research\/\" target=\"_blank\">\u00a0insideBIGDATA White Paper Library<\/a>, courtesy of Dell and Intel.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is the third in an editorial series with a goal to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.<\/p>\n","protected":false},"author":37,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[65,115,64,182,87,180,109,210,67,56,77,84,1,58],"tags":[95],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Big Data Technology for Scientific Research - insideBIGDATA<\/title>\n<meta name=\"description\" content=\"Big Data Technology for Scientific Research\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Big Data Technology for Scientific Research - insideBIGDATA\" \/>\n<meta property=\"og:description\" content=\"Big Data Technology for Scientific Research\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/\" \/>\n<meta property=\"og:site_name\" content=\"insideBIGDATA\" \/>\n<meta property=\"article:publisher\" content=\"http:\/\/www.facebook.com\/insidebigdata\" \/>\n<meta property=\"article:published_time\" content=\"2015-06-30T13:00:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2015-06-29T22:48:10+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/insideBIGDATA_Guide_Research_feature.jpg\" \/>\n<meta name=\"author\" content=\"Daniel Gutierrez\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AMULETAnalytics\" \/>\n<meta name=\"twitter:site\" content=\"@insideBigData\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Gutierrez\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/\",\"url\":\"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/\",\"name\":\"Big Data Technology for Scientific Research - insideBIGDATA\",\"isPartOf\":{\"@id\":\"https:\/\/insidebigdata.com\/#website\"},\"datePublished\":\"2015-06-30T13:00:08+00:00\",\"dateModified\":\"2015-06-29T22:48:10+00:00\",\"author\":{\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\"},\"description\":\"Big Data Technology for Scientific 
Research\",\"breadcrumb\":{\"@id\":\"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/insidebigdata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Big Data Technology for Scientific Research\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/insidebigdata.com\/#website\",\"url\":\"https:\/\/insidebigdata.com\/\",\"name\":\"insideBIGDATA\",\"description\":\"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/insidebigdata.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed\",\"name\":\"Daniel Gutierrez\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g\",\"caption\":\"Daniel Gutierrez\"},\"description\":\"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. 
He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \\\"data scientist\\\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.\",\"sameAs\":[\"http:\/\/www.insidebigdata.com\",\"https:\/\/twitter.com\/@AMULETAnalytics\"],\"url\":\"https:\/\/insidebigdata.com\/author\/dangutierrez\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Big Data Technology for Scientific Research - insideBIGDATA","description":"Big Data Technology for Scientific Research","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/","og_locale":"en_US","og_type":"article","og_title":"Big Data Technology for Scientific Research - insideBIGDATA","og_description":"Big Data Technology for Scientific Research","og_url":"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/","og_site_name":"insideBIGDATA","article_publisher":"http:\/\/www.facebook.com\/insidebigdata","article_published_time":"2015-06-30T13:00:08+00:00","article_modified_time":"2015-06-29T22:48:10+00:00","og_image":[{"url":"http:\/\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/insideBIGDATA_Guide_Research_feature.jpg"}],"author":"Daniel Gutierrez","twitter_card":"summary_large_image","twitter_creator":"@AMULETAnalytics","twitter_site":"@insideBigData","twitter_misc":{"Written by":"Daniel Gutierrez","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/","url":"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/","name":"Big Data Technology for Scientific Research - insideBIGDATA","isPartOf":{"@id":"https:\/\/insidebigdata.com\/#website"},"datePublished":"2015-06-30T13:00:08+00:00","dateModified":"2015-06-29T22:48:10+00:00","author":{"@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed"},"description":"Big Data Technology for Scientific Research","breadcrumb":{"@id":"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/insidebigdata.com\/2015\/06\/30\/big-data-technology-for-scientific-research\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/insidebigdata.com\/"},{"@type":"ListItem","position":2,"name":"Big Data Technology for Scientific Research"}]},{"@type":"WebSite","@id":"https:\/\/insidebigdata.com\/#website","url":"https:\/\/insidebigdata.com\/","name":"insideBIGDATA","description":"Your Source for AI, Data Science, Deep Learning &amp; Machine Learning Strategies","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/insidebigdata.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/2540da209c83a68f4f5922848f7376ed","name":"Daniel 
Gutierrez","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/insidebigdata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5780282e7e567e2a502233e948464542?s=96&d=mm&r=g","caption":"Daniel Gutierrez"},"description":"Daniel D. Gutierrez is a Data Scientist with Los Angeles-based AMULET Analytics, a service division of AMULET Development Corp. He's been involved with data science and Big Data long before it came in vogue, so imagine his delight when the Harvard Business Review recently deemed \"data scientist\" as the sexiest profession for the 21st century. Previously, he taught computer science and database classes at UCLA Extension for over 15 years, and authored three computer industry books on database technology. He also served as technical editor, columnist and writer at a major computer industry monthly publication for 7 years. Follow his data science musings at @AMULETAnalytics.","sameAs":["http:\/\/www.insidebigdata.com","https:\/\/twitter.com\/@AMULETAnalytics"],"url":"https:\/\/insidebigdata.com\/author\/dangutierrez\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p9eA3j-3sN","jetpack-related-posts":[{"id":13326,"url":"https:\/\/insidebigdata.com\/2015\/07\/02\/big-data-and-open-science-data\/","url_meta":{"origin":13317,"position":0},"title":"Big Data and Open Science Data","date":"July 2, 2015","format":false,"excerpt":"This article is the fourth in an editorial series with a goal to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.","rel":"","context":"In &quot;Big 
Data&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/insideBIGDATA_Guide_Research_feature.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":13292,"url":"https:\/\/insidebigdata.com\/2015\/06\/25\/primary-motivators-of-big-data-vis-a-vis-scientific-research\/","url_meta":{"origin":13317,"position":1},"title":"Primary Motivators of Big Data vis-\u00e0-vis Scientific Research","date":"June 25, 2015","format":false,"excerpt":"This article is the second in an editorial series with a goal to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.","rel":"","context":"In &quot;Academic&quot;","img":{"alt_text":"LHC_data_collection","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/LHC_data_collection.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":13345,"url":"https:\/\/insidebigdata.com\/2015\/07\/07\/case-studies-big-data-and-scientific-research\/","url_meta":{"origin":13317,"position":2},"title":"Case Studies: Big Data and Scientific Research","date":"July 7, 2015","format":false,"excerpt":"This is the fifth and final article in an editorial series with a goal to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.","rel":"","context":"In &quot;Big Data&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/insideBIGDATA_Guide_Research_feature.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":13281,"url":"https:\/\/insidebigdata.com\/2015\/12\/01\/insidebigdata-guide-to-scientific-research\/","url_meta":{"origin":13317,"position":3},"title":"insideBIGDATA Guide to Scientific Research","date":"December 
1, 2015","format":false,"excerpt":"In this new insideBIGDATA Guide to Scientific Research, the goal is to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.","rel":"","context":"In &quot;Academic&quot;","img":{"alt_text":"insideBIGDATA_Guide_Research_SKA","src":"https:\/\/i0.wp.com\/insidebigdata.com\/wp-content\/uploads\/2015\/06\/insideBIGDATA_Guide_Research_SKA.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":5892,"url":"https:\/\/insidebigdata.com\/2013\/11\/14\/nasa-brings-earth-science-big-data-cloud-aws\/","url_meta":{"origin":13317,"position":4},"title":"NASA Brings Earth Science Big Data to the Cloud with AWS","date":"November 14, 2013","format":false,"excerpt":"In a significant coupling of scientific research and big data, NASA and Amazon Web Services Inc. (AWS) are making a large collection of NASA climate and Earth science satellite data available to research and educational users through the AWS cloud.","rel":"","context":"In &quot;Academic&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2256,"url":"https:\/\/insidebigdata.com\/2013\/01\/09\/joint-computing-institute-to-tackle-big-data-in-the-northwest\/","url_meta":{"origin":13317,"position":5},"title":"Joint Computing Institute to Tackle Big Data in the Northwest","date":"January 9, 2013","format":false,"excerpt":"This week PNNL announced that the lab is launching the new Northwest Institute for Advanced Computing in cooperation with the University of Washington. 
Researchers associated with the institute will work to ensure the next generation of computers and the methods used to run them can address challenges ranging from climate\u2026","rel":"","context":"In &quot;Academic&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/13317"}],"collection":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/users\/37"}],"replies":[{"embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/comments?post=13317"}],"version-history":[{"count":0,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/posts\/13317\/revisions"}],"wp:attachment":[{"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/media?parent=13317"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/categories?post=13317"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/insidebigdata.com\/wp-json\/wp\/v2\/tags?post=13317"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}