If you are creating an application that is going to fetch the images from a Wikipedia page for a particular topic, you might be using BeautifulSoup for this task. However, there is a Python package named wikipedia that can help you fetch all the image URLs with just a few lines of code.
We will be using the page() function from the wikipedia package. Let’s take a look at the details of this function.
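The page() function can accept the following parameters:
- title: the title of the page to load.
- pageid: the numeric page ID of the page to load (use either title or pageid, not both).
- auto_suggest: whether to let Wikipedia suggest a valid page title for the query (defaults to True).
- redirect: whether to follow redirects instead of raising a RedirectError (defaults to True).
- preload: whether to load the content, summary, images, references, and links when the object is created (defaults to False).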
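As a minimal sketch of how these parameters are passed, the snippet below wraps page() in a small helper. The helper name load_page is hypothetical and not part of the package; title, auto_suggest, and redirect are keyword arguments of wikipedia.page(). The import sits inside the function only so that the definition can be read and loaded without the package installed.

```python
def load_page(title, auto_suggest=False):
    """Return a WikipediaPage for `title` (hypothetical helper, not part of the package)."""
    import wikipedia  # third-party package: pip install wikipedia

    # auto_suggest=False requests the exact title rather than a suggested one;
    # redirect=True follows redirects instead of raising RedirectError.
    return wikipedia.page(title=title, auto_suggest=auto_suggest, redirect=True)
```

Calling load_page("New York City") would then return the page object for that exact title, assuming the page exists and the network is reachable.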
Let’s see what the page() function returns.
The page() function returns an object of the WikipediaPage class.
On this object, you can use various methods and properties of the class like categories, content, coordinates, html(), etc.
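To illustrate, here is a small sketch that reads a few of these attributes from a WikipediaPage object and collects them into a dictionary. The helper name summarize_page and the 200-character cut-off are assumptions made for this example; title, url, categories, and content are properties of the WikipediaPage class.

```python
def summarize_page(wp_page):
    """Collect a few WikipediaPage attributes into a plain dict (hypothetical helper)."""
    return {
        "title": wp_page.title,                   # page title
        "url": wp_page.url,                       # full URL of the page
        "n_categories": len(wp_page.categories),  # number of categories
        "intro": wp_page.content[:200],           # first 200 characters of the plain text
    }
```

You could pass it the object returned by page(), for example summarize_page(wikipedia.page("New York City")).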
Now, let’s see how we can fetch the image URLs from our desired Wikipedia page.
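```python
import wikipedia as wp

# Search term for the page we want
query = "New York city"

# Fetch the page; this returns a WikipediaPage object
wp_page = wp.page(query)

# The images property holds a list of image URLs
list_img_urls = wp_page.images
print(list_img_urls)
```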
Explanation
First, we import the wikipedia package.
Next, we define the search query. (If you specify only New York, a DisambiguationError can occur, as the term could refer to several similar pages.)
Then, we call page() and get the WikipediaPage object.
Finally, we use the images property of the WikipediaPage class to get all the image URLs.
So, in this way, it becomes very easy to fetch data from Wikipedia using the wikipedia module instead of BeautifulSoup, as it eliminates the need to select the <img> tags and clean the extracted data.
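If you want to guard against the DisambiguationError mentioned above, one possible approach is sketched below. The helper name fetch_image_urls and the fallback strategy (retrying with the first suggested title) are assumptions for illustration, not something the package prescribes; DisambiguationError and its options attribute come from the wikipedia package. Note that the first option may itself be ambiguous, in which case the error is raised again.

```python
def fetch_image_urls(query):
    """Return the image URLs for `query` (hypothetical helper).

    Assumption: on an ambiguous query we simply retry with the first
    candidate title that Wikipedia suggests.
    """
    import wikipedia  # third-party package: pip install wikipedia

    try:
        wp_page = wikipedia.page(query)
    except wikipedia.DisambiguationError as err:
        # err.options lists the candidate page titles for the ambiguous term
        wp_page = wikipedia.page(err.options[0], auto_suggest=False)

    return list(wp_page.images)
```

In a real application, you may prefer to show err.options to the user and let them pick the intended page rather than always taking the first one.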