How to get all the image URLs from a Wikipedia page using Python

If you are building an application that fetches the images from the Wikipedia page for a particular topic, you might reach for BeautifulSoup to scrape them. However, the Python package wikipedia can fetch all the image URLs with just a few lines of code.

We will be using the page() function from the wikipedia package. Let’s take a look at the details of this function.

Parameters

The page() function can accept the following parameters:

  • title: The title of the Wikipedia page that you want to load.
  • pageid: The numeric page ID of the Wikipedia page that you want to load. Either title or pageid must be provided.
  • auto_suggest: An optional parameter (True by default) that lets Wikipedia find a valid page title for the query.
  • redirect: An optional parameter (True by default) that allows redirection without raising RedirectError.
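
The parameters above can also be passed explicitly as keyword arguments. The following is a minimal sketch, assuming the wikipedia package is installed. The name load_page is a hypothetical wrapper introduced only for illustration, and the import sits inside the function just so the sketch can be defined before the package is available.

```python
def load_page(title=None, pageid=None):
    """Load a page by exact title or by numeric page ID.

    `load_page` is a hypothetical wrapper, not part of the wikipedia package.
    """
    if title is None and pageid is None:
        raise ValueError("Pass either a title or a pageid")

    import wikipedia  # third-party package: pip install wikipedia

    # auto_suggest=False asks for the title exactly as written;
    # redirect=True follows redirects instead of raising RedirectError.
    return wikipedia.page(
        title=title,
        pageid=pageid,
        auto_suggest=False,
        redirect=True,
    )
```

Turning auto_suggest off is a design choice in this sketch rather than a requirement: it keeps the lookup tied to the title you typed, at the cost of failing when that exact title does not exist.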

Let’s see what the page() function returns.

Return value

The page() function returns an object of the WikipediaPage class. On this object, you can use the properties and methods of the class, such as categories, content, coordinates, and html().
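
As a rough illustration of working with the returned object, the sketch below gathers a few attributes into a plain dictionary. The function describe_page and the sample object are hypothetical; the sample stands in for a real WikipediaPage so that the code runs without network access, and its values are made up.

```python
from types import SimpleNamespace


def describe_page(page):
    """Collect a few WikipediaPage attributes into a plain dict.

    `describe_page` is a hypothetical helper for illustration only.
    """
    return {
        "title": page.title,
        "url": page.url,
        "category_count": len(page.categories),
        "image_count": len(page.images),
        "preview": page.content[:40],
    }


# Stand-in object with made-up values; in real use, pass the result of page().
sample = SimpleNamespace(
    title="Example",
    url="https://en.wikipedia.org/wiki/Example",
    categories=["Category A", "Category B"],
    images=["https://example.org/a.jpg"],
    content="Example text that stands in for the article body.",
)
print(describe_page(sample))
```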

Fetch image URLs

Now, let’s see how we can fetch the image URLs from our desired Wikipedia page.

import wikipedia as wp
query = "New York city"
wp_page = wp.page(query)
list_img_urls = wp_page.images
print(list_img_urls)

Explanation

  • In line 1, we import the required package.
  • In line 2, we define the term that we want to search. (Remember that if we pass New York instead, a DisambiguationError can occur, as the term could refer to several different pages.)
  • In line 3, we call page() to get the WikipediaPage object.
  • In line 4, we use the images property of the WikipediaPage class to get all the image URLs.
  • In line 5, we print the URLs. In the output, we can see that all the URLs are printed as a list.
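
The DisambiguationError mentioned in the explanation can be caught rather than avoided. The following is a minimal sketch, assuming the wikipedia package is installed. The name fetch_image_urls is a hypothetical helper, and the import is placed inside the function only so that the sketch can be defined before the package is available.

```python
def fetch_image_urls(query):
    """Return (image_urls, alternatives) for a query.

    `fetch_image_urls` is a hypothetical helper, not part of the wikipedia package.
    """
    import wikipedia  # third-party package: pip install wikipedia

    try:
        wp_page = wikipedia.page(query)
    except wikipedia.exceptions.DisambiguationError as err:
        # The query matched several pages; err.options lists candidate titles.
        return [], list(err.options)
    # The query resolved to a single page, so there are no alternatives.
    return list(wp_page.images), []
```

When the query is ambiguous, the first element is an empty list and the second holds candidate titles that the application can show to the user or retry with.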

In this way, the wikipedia module makes it easy to fetch data from Wikipedia without BeautifulSoup, as it removes the need to clean the scraped data and select the <img> tags yourself.
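
Depending on the application, the returned list may still contain items you do not want, such as SVG icons alongside photographs. The sketch below shows one possible way to keep only certain file types using the standard library. The name keep_photos, the chosen extensions, and the sample URLs are all assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

# Assumed set of extensions to keep; adjust to the application's needs.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def keep_photos(urls, extensions=PHOTO_EXTENSIONS):
    """Return only the URLs whose path ends in one of the given extensions."""
    kept = []
    for url in urls:
        # Look at the path only, so query strings do not hide the extension.
        path = urlparse(url).path
        ext = os.path.splitext(path)[1].lower()
        if ext in extensions:
            kept.append(url)
    return kept


# Made-up URLs standing in for the list returned by the images property.
sample_urls = [
    "https://example.org/images/Skyline.JPG",
    "https://example.org/images/Icon.svg",
    "https://example.org/images/Map.png?width=200",
]
print(keep_photos(sample_urls))
```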
