RStudio provides the premier open-source and enterprise-ready professional software for data scientists moving to R from less flexible, proprietary, and expensive analytic platforms. Shiny, ggvis, dplyr, knitr, R Markdown, and packrat are R packages from RStudio that every data scientist will want to use to enhance the value, reproducibility, and appearance of their work.

Date: November 9th
Time: 11:00 AM EDT

Description:

The internet is a treasure trove of data, if you know how to collect it. In this two-part series of webinars, we will examine easy ways to collect different types of data from the web with R.

In Part 1 (November 9th), we will use the httr package to collect data that is provided through web APIs. APIs are a popular and efficient way to share data online. If someone purposefully collected the data you want in order to share it online, there is a good chance that they are sharing it through an API. Unfortunately, not all APIs work the same way, and how they are implemented depends largely on the developer. In this webinar, we will look at the basic components of HTTP, the protocol that underlies web APIs. You will learn how to make HTTP requests to an API with the httr package, and how to follow best practices when making HTTP GET requests to APIs with R.
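As a rough preview of the kind of request Part 1 covers, here is a minimal sketch of an HTTP GET call with httr. The endpoint, query parameters, and contact address are hypothetical placeholders rather than a real API, and the checks shown are one reasonable reading of "best practices", not necessarily the ones presented in the webinar.

```r
library(httr)

# Hypothetical endpoint, used only for illustration.
base_url <- "https://api.example.com/v1/records"

# Build the full request URL from the base URL and named query parameters.
# This step does not contact the server.
request_url <- modify_url(base_url, query = list(year = 2016, format = "json"))

# Hypothetical helper: send a GET request, fail loudly on HTTP errors,
# and parse the body only if the server says it is JSON.
fetch_records <- function(url) {
  resp <- GET(
    url,
    user_agent("my-r-script (me@example.com)"),  # identify yourself to the API
    timeout(10)                                   # give up after 10 seconds
  )
  stop_for_status(resp)                           # turn 4xx/5xx into an R error
  if (http_type(resp) != "application/json") {
    stop("API did not return JSON", call. = FALSE)
  }
  jsonlite::fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}
```

Calling fetch_records(request_url) would send the request. Because the endpoint above is made up, substitute the URL and parameters documented by the API you actually want to use.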

In Part 2 (November 30th), we will use the rvest package to extract data from the web that is not provided through an API. How do you collect data that the web developer hasn't packaged nicely in an API for your consumption? By searching for the data in the page's HTML structure and extracting it in a surgical way. The rvest package contains several tools that make this process easy and automatable. We will examine these tools along with the background knowledge of HTML and CSS that they depend on.
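As a rough preview of the approach Part 2 describes, here is a minimal sketch that uses rvest with CSS selectors to pull a table and a paragraph out of a page. The HTML is an invented inline snippet so the example runs without a network connection; a real page would be read by passing its URL to read_html() instead.

```r
library(rvest)

# Invented HTML standing in for a downloaded web page.
page <- read_html('<html><body>
  <table id="scores">
    <tr><th>name</th><th>score</th></tr>
    <tr><td>Ann</td><td>90</td></tr>
    <tr><td>Bob</td><td>85</td></tr>
  </table>
  <p class="note">Updated weekly</p>
</body></html>')

# CSS selector "p.note": the <p> element with class "note".
note <- html_text(html_node(page, "p.note"))

# CSS selector "#scores": the element with id "scores", parsed into a data frame.
scores <- html_table(html_node(page, "#scores"))
```

The selectors are the surgical part: "#scores" and "p.note" pick out exactly the elements of interest and ignore everything else on the page.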


Logistics:

Only 1,000 live attendees are allowed in the webinar, on a first-come, first-served basis. It is typical for many people who register not to attend, which is why registration does not guarantee access. If for any reason you cannot make the webinar or cannot get in, we will provide links to the recording as well as all materials within 48 hours.

Extracting Data from the Web Part 1 Webinar Registration:


Presenter:

Garrett Grolemund, Data Scientist and Master Instructor - Garrett is the Editor-in-Chief of the Shiny Development Center (shiny.rstudio.com), the official source of documentation, articles, and how-to examples for Shiny. He wrote the popular lubridate package and is the author of Hands-On Programming with R and the upcoming book, R for Data Science, from O’Reilly Media. He holds a PhD in Statistics and specializes in data visualization.


Webinar Recordings:

We try to record every webinar we host and post all materials on our website.
http://www.rstudio.com/resources/webinars/

Slides & Code:

We've started a GitHub repository with all webinar materials. Speakers for this webinar and all future webinars will add their materials to the repository.
https://github.com/rstudio/webinars


Live on November 9th at 11am EDT
Approximately 45 minutes of presentation followed by 15 minutes of Q&A.