Scikit-learn is a popular machine learning library for Python. It implements many of the most fundamental algorithms used in supervised and unsupervised learning.
To use scikit-learn, import the library under its package name, sklearn, as shown below.
import sklearn
The Iris dataset is one of the most popular datasets in data science. It is considered the ‘Hello World’ of machine learning and can be used to learn classification algorithms.
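The Iris dataset contains 150 samples from 3 species of Iris flowers (setosa, versicolor, and virginica). Each sample is described by four measurements (sepal length, sepal width, petal length, and petal width) together with its species label.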
The ‘scikit-learn’ package already comes with the Iris dataset preloaded.
Use the following statement to import the datasets package from sklearn. This gives us access to other datasets as well.
from sklearn import datasets  # this imports the package 'datasets' from sklearn
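As a rough illustration of the other datasets reachable through this import, the sketch below calls three more loaders that ship with scikit-learn (load_wine, load_digits, and load_breast_cancer) and prints the shape of each feature array. The choice of these three loaders, and the names loaders and shapes, are only for illustration.

```python
from sklearn import datasets

# A few of the other small datasets bundled with scikit-learn
loaders = {
    "wine": datasets.load_wine,
    "digits": datasets.load_digits,
    "breast_cancer": datasets.load_breast_cancer,
}

shapes = {}
for name, loader in loaders.items():
    X, y = loader(return_X_y=True)  # features and target as NumPy arrays
    shapes[name] = X.shape
    print(name, X.shape, y.shape)
```

Each loader follows the same calling convention as the Iris loader used in the rest of this article, so the steps shown here should carry over to those datasets.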
To load the Iris data as NumPy arrays (the features X and the target y), set the return_X_y parameter to True.
from sklearn import datasets
iris_X, iris_y = datasets.load_iris(return_X_y=True)  # loads the dataset as NumPy arrays
# to view the iris_X array
print(iris_X)
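Printing the whole array is hard to read, so a quick look at shapes and labels may be more useful. The following is a minimal sketch of how the loaded arrays could be inspected. The second call to load_iris without return_X_y is an addition for illustration: it returns a Bunch object that also carries the feature and class names.

```python
import numpy as np
from sklearn import datasets

iris_X, iris_y = datasets.load_iris(return_X_y=True)

print(type(iris_X))       # NumPy ndarray
print(iris_X.shape)       # (samples, features)
print(iris_y.shape)       # one label per sample
print(np.unique(iris_y))  # the integer class codes

# Without return_X_y, load_iris returns a Bunch object that also
# carries the human-readable names.
iris = datasets.load_iris()
print(iris.feature_names)
print(list(iris.target_names))
```

The integer codes in iris_y index into iris.target_names, so code 0 corresponds to the first name in that list.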
To load the feature data (X) as a pandas DataFrame and the target data (y) as a pandas Series, set the as_frame parameter to True as well.
from sklearn import datasets
iris_X, iris_y = datasets.load_iris(return_X_y=True, as_frame=True)
# the X, y data is returned as a DataFrame and a Series respectively
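One benefit of the DataFrame form is that the usual pandas inspection methods become available. The sketch below assumes pandas is installed and that the installed scikit-learn version supports as_frame. The particular methods shown (head, describe, value_counts) are just one reasonable way to take a first look at the data.

```python
from sklearn import datasets

iris_X, iris_y = datasets.load_iris(return_X_y=True, as_frame=True)

print(iris_X.head())          # first five rows, with named feature columns
print(iris_X.describe())      # summary statistics per feature
print(iris_y.value_counts())  # number of samples per class code
```

The column names come from the dataset itself, so no manual labeling should be needed before exploring the data.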
The as_frame functionality is not available in sklearn version 0.22 and older, so if you run into an error (such as TypeError: load_iris() got an unexpected keyword argument 'as_frame'), you can upgrade your sklearn library using this code:
!pip install scikit-learn==0.24
in your Jupyter notebook, or
pip install --upgrade scikit-learn
in your terminal.
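Before upgrading, it can help to check which version is actually installed. The snippet below is a small sketch of such a check. The variable supports_as_frame and the simple major/minor parsing are illustrative choices, and the 0.23 threshold follows from the statement above that versions 0.22 and older lack as_frame.

```python
import sklearn

print(sklearn.__version__)

# Compare only the major and minor components, e.g. "0.24.2" -> (0, 24)
major, minor = (int(part) for part in sklearn.__version__.split(".")[:2])
supports_as_frame = (major, minor) >= (0, 23)
print("as_frame available:", supports_as_frame)
```

If the check prints False, running one of the upgrade commands and restarting the Python session or notebook kernel should make the newer version visible.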