How to get all the image URLs from a Wikipedia page using Python

If you are building an application that fetches the images from the Wikipedia page on a particular topic, you might reach for BeautifulSoup to scrape the page. However, the third-party Python package wikipedia (installable with pip install wikipedia) can fetch all the image URLs in just a few lines of code.

We will be using the page() function from the wikipedia package. Let’s take a look at the details of this function.

Parameters

The page() function can accept the following parameters:

  • title: The title of the Wikipedia page you want to load.
  • pageid: The numeric page ID of the Wikipedia page you want to load. Pass either title or pageid.
  • auto_suggest: An optional parameter (True by default) that lets Wikipedia's search suggest a valid page title for the query.
  • redirect: An optional parameter (True by default) that follows redirects instead of raising RedirectError.

Let’s see what the page() function returns.

Return value

The page() function returns a WikipediaPage object. This object exposes properties such as categories, content, coordinates, and images, and methods such as html().
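To see a few of these properties together, here is a small hypothetical helper, page_overview, that reads them from a WikipediaPage object. It only touches attributes (title, url, images, categories), so it can also be tried with a stand-in object. In the package versions I'm familiar with, properties such as images and categories make an extra API request the first time they are read and are cached afterwards.

```python
def page_overview(page, max_categories=3):
    """Summarise a WikipediaPage-like object using a few of its properties."""
    return {
        "title": page.title,
        "url": page.url,
        # Reading .images may trigger a network request on a real page.
        "image_count": len(page.images),
        "categories": page.categories[:max_categories],
    }
```

With the package installed, you could call it as page_overview(wp.page("New York City", auto_suggest=False)).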

Fetch image URLs

Now, let’s see how we can fetch the image URLs from our desired Wikipedia page.

import wikipedia as wp
query = "New York city"
wp_page = wp.page(query)
list_img_urls = wp_page.images
print(list_img_urls)
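The images list usually mixes photos with SVG icons and logos. A minimal sketch for keeping only raster images filters the URLs by file extension. The helper filter_image_urls and the extension set are assumptions chosen for illustration; they are not part of the wikipedia package.

```python
import posixpath
from urllib.parse import urlparse

# Extensions treated as "photos" in this sketch; adjust as needed.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def filter_image_urls(urls, extensions=PHOTO_EXTENSIONS):
    """Keep URLs whose path ends in one of `extensions` (case-insensitive)."""
    kept = []
    for url in urls:
        # Look only at the URL path, so query strings don't affect the check.
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in extensions:
            kept.append(url)
    return kept
```

Applied to the earlier result, photos = filter_image_urls(list_img_urls) drops the .svg entries.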

Explanation

  • In line 1, we import the wikipedia package under the alias wp.
  • In line 2, we define the search term. (If we passed just New York instead, a DisambiguationError could be raised, because that term can refer to several different pages.)
  • In line 3, we call page() to get the WikipediaPage object.
  • In line 4, we read the images property of the WikipediaPage object to get all the image URLs.
  • In line 5, we print the URLs. The output is a Python list of URL strings.
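The disambiguation case can also be handled in code. This is a minimal sketch, assuming the package's DisambiguationError, which carries the candidate titles in its options attribute. The hypothetical helper image_urls_for falls back to the first listed option; that choice is arbitrary, and an interactive app might show the options to the user instead. A second DisambiguationError or a PageError from the retry is not handled here.

```python
def image_urls_for(query):
    """Return image URLs for `query`, retrying once if the title is ambiguous."""
    import wikipedia  # third-party package: pip install wikipedia

    try:
        page = wikipedia.page(query, auto_suggest=False)
    except wikipedia.DisambiguationError as err:
        # err.options lists the titles offered on the disambiguation page;
        # picking the first one is an assumption made for this sketch.
        page = wikipedia.page(err.options[0], auto_suggest=False)
    return page.images
```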

Fetching image URLs this way is simpler than scraping the page with BeautifulSoup: there is no HTML to clean up and no <img> tags to select, because the wikipedia package returns the image URLs directly.
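For contrast, here is roughly what the manual route involves. This sketch uses the standard library's html.parser instead of BeautifulSoup only to stay dependency-free, and the names ImgSrcCollector and img_srcs_from_html are hypothetical. It collects the src of every <img> tag; on a rendered Wikipedia page these typically include thumbnails and interface icons and use protocol-relative URLs, so filtering and cleanup are still left to you.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Turn protocol-relative URLs (//host/path) into https URLs.
            self.srcs.append("https:" + src if src.startswith("//") else src)


def img_srcs_from_html(html):
    """Return the src of each <img> tag in `html`, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

You could feed it the HTML string returned by wp_page.html() to compare its output with wp_page.images.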
