RStudio provides the premier open source and enterprise-ready professional software for R, including RStudio Desktop, RStudio Server, RStudio Connect, Shiny Server, and shinyapps.io. The tidyverse, shiny, ggplot2, ggvis, dplyr, knitr, R Markdown, and packrat are R packages from RStudio that every data scientist will want to use to enhance the value, reproducibility, and appearance of their work.

Date: Wednesdays in August (9th, 16th, 23rd, and 30th)
Time: 11:00 a.m. EDT

Description:

R is well suited to handling data that fits in memory, but additional tools are needed when the amount of data you want to analyze grows beyond the limits of your machine’s RAM. A variety of solutions to this problem have appeared over the years; one of the latest options is Apache Spark™. Spark is a cluster computing tool that enables analysis of massive, distributed data across dozens or hundreds of servers.

RStudio recently announced a new open-source package called sparklyr that facilitates a connection between R and Spark using a full-fledged dplyr backend with support for the entirety of Spark’s MLlib library. Due to Spark’s ability to interact with distributed data with little latency, it is becoming an attractive tool for interfacing with large datasets in an interactive environment. In addition to handling the storage of data, Spark also incorporates a variety of other tools including stream processing, computing on graphs, and a distributed machine learning framework. Some of these tools are available to R programmers via the sparklyr package.

In this four-part series, we’ll discuss how to leverage Spark’s capabilities in a modern R environment.

The Sparklyr Series:

  1. Introducing an R interface for Apache Spark by Edgar Ruiz – Wednesday, August 9th @ 11 a.m. EDT
  2. Extending Spark using sparklyr and R by Javier Luraschi – Wednesday, August 16th @ 11 a.m. EDT
  3. Advanced Features by Javier Luraschi – Wednesday, August 23rd @ 11 a.m. EDT
  4. Understanding Spark and sparklyr deployment modes by Edgar Ruiz – Wednesday, August 30th @ 11 a.m. EDT


Logistics:

Only 1,000 live attendees are allowed in the webinar, on a first-come, first-served basis. Many people who register typically do not attend, which is why registration does not guarantee access. If for any reason you cannot make the webinar or cannot get in, we will provide links to the recording and all materials within 48 hours.

Sparklyr Series Webinar Registration:


Presenter:

Javier Luraschi, Software Engineer - Javier is a software engineer with experience in technologies ranging from desktop, web, mobile, and backend to augmented reality and deep learning applications. He previously worked for Microsoft Research and SAP and holds a double degree in Mathematics and Software Engineering.

Presenter:

Edgar Ruiz, Solutions Engineer - Edgar has a background in deploying enterprise reporting and Business Intelligence solutions. He has posted multiple articles and blog posts sharing analytics insights and server infrastructure for Data Science. He lives with his family near Biloxi, MS.



Webinar Recordings:

We try to record every webinar we host and post all materials on our website.
http://www.rstudio.com/resources/webinars/

Slides & Code:

We've started a GitHub repository with all webinar materials. Speakers for this webinar and all future webinars will add their materials to the repository.
https://github.com/rstudio/webinars


Live in August at 11am EDT
Approximately 45 minutes of presentation followed by 15 minutes of Q&A.