How to parse a website with R

Parsing, or web scraping, refers to extracting the required data from websites. The rvest package in R provides functions for this.

Steps to parse a webpage

We can parse a webpage with R in the following three steps:

  1. Import the rvest library.

  2. Read the HTML code.

  3. Scrape the required data from the HTML code.

Example

Here is R code that scrapes the title of a Wiki page.

library(rvest)
# Read the HTML
webpage <- read_html("https://en.wikipedia.org/wiki/Web_scraping")
# Scrape data with a CSS selector
data <- html_node(webpage, ".mw-page-title-main")
# Convert the data to text
text <- html_text(data)
print(text)

Explanation

  • Line 1: We import the rvest library.

  • Line 3: We use the read_html() function to download and parse the HTML from the Wiki URL provided as a parameter.

  • Line 5: We scrape the page's title from the HTML code stored in webpage. In this case, the CSS selector for the title is .mw-page-title-main.

  • Line 7: We convert the value stored in data to readable form, i.e., text.

Try changing the CSS selector at line 5 to 'p'. Because html_node() returns only the first match, this scrapes the first paragraph. To scrape all the paragraphs, use html_nodes() instead.
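
The difference between the two functions can be sketched as follows. This is a minimal illustration that assumes a small inline HTML string in place of the Wiki page, so it runs without network access. The HTML content and variable names are made up for the example.

```r
library(rvest)

# Hypothetical inline HTML standing in for a downloaded page
page <- read_html("<html><body><p>First paragraph.</p><p>Second paragraph.</p></body></html>")

# html_node() returns only the first element that matches the selector
first_p <- html_text(html_node(page, "p"))

# html_nodes() returns every element that matches the selector
all_p <- html_text(html_nodes(page, "p"))

print(first_p)
print(all_p)
```

Here first_p holds a single string, while all_p is a character vector with one entry per paragraph.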

Note: If the CSS selector used in the example doesn't work, inspect the element in your browser and verify the selector.
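
One way to notice a selector that matches nothing is to check the scraped text for NA, since html_text() returns NA when html_node() finds no match. The sketch below assumes an inline HTML string, and scrape_text() is an illustrative helper written for this example, not part of rvest.

```r
library(rvest)

# Hypothetical inline HTML standing in for a downloaded page
page <- read_html("<html><body><h1 class='title'>Web scraping</h1></body></html>")

# scrape_text() is an illustrative helper, not an rvest function.
# It warns when the selector matches no element.
scrape_text <- function(doc, selector) {
  text <- html_text(html_node(doc, selector))
  if (is.na(text)) {
    warning("No element matches selector: ", selector)
  }
  text
}

found <- scrape_text(page, ".title")
# Misspelled selector: nothing matches, so the result is NA
not_found <- suppressWarnings(scrape_text(page, ".titel"))

print(found)
print(not_found)
```

An NA result is a signal to inspect the page again and correct the selector.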

Copyright ©2025 Educative, Inc. All rights reserved