Scikit-learn, or sklearn, is a Python library designed for solving machine learning problems. Its sklearn.datasets module includes a number of predefined datasets.
The fetch_kddcup99() function loads the kddcup99 dataset into the program. The kddcup99 dataset is designed for classification problems.
sklearn.datasets.fetch_kddcup99(*, subset=None, data_home=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True, return_X_y=False, as_frame=False)
subset: Selects which subset of the kddcup99 dataset to load. By default, its value is None, which loads the whole dataset. It can also be 'SA', 'SF', 'http', or 'smtp'.
data_home: The directory where sklearn saves downloaded datasets. By default, the data is stored in the ~/scikit_learn_data directory.
shuffle: A boolean value that specifies whether to shuffle the downloaded data. By default, it's False.
percent10: A boolean value that specifies whether to load only 10% of the whole dataset. By default, it's True.
download_if_missing: If set to True, the dataset is downloaded and stored locally when it is not already available. Its default value is True.
as_frame: If True, the data and target are returned as pandas objects, packed in a Bunch object. Its default value is False.

This function returns data in the form of a dictionary-like object.
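The parameters above can be combined in one call. The following is a minimal sketch, not the only way to use the function. It assumes scikit-learn is installed and that the dataset can be downloaded on first use. The names load_subset, seed, and to_binary_labels are hypothetical helpers introduced here for illustration. It also uses return_X_y=True, which makes the function return a (data, target) tuple instead of the dictionary-like object, and random_state, which fixes the shuffling order.

```python
def load_subset(subset="SA", seed=0):
    """Fetch one kddcup99 subset as a shuffled (X, y) pair.

    load_subset and seed are illustrative names, not part of scikit-learn.
    """
    # Imported here so the pure-Python helper below works without scikit-learn.
    from sklearn.datasets import fetch_kddcup99

    return fetch_kddcup99(
        subset=subset,      # None, 'SA', 'SF', 'http', or 'smtp'
        shuffle=True,       # shuffle the rows after loading
        random_state=seed,  # make the shuffle reproducible
        percent10=True,     # load only the 10% version of the data
        return_X_y=True,    # return (data, target) instead of a Bunch
    )


def to_binary_labels(target):
    """Map byte-string labels to 0 (normal traffic) or 1 (anything else)."""
    return [0 if label == b"normal." else 1 for label in target]
```

Calling X, y = load_subset("http") would download the data if it is missing and return the http subset. The target labels are byte strings such as b'normal.', so to_binary_labels(y) is one simple way to turn them into a normal-versus-attack flag.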
# load some required libraries
import numpy as np
from sklearn.datasets import fetch_kddcup99
import pandas as pd

# load kddcup99 dataset
dataset = fetch_kddcup99(percent10=True)

# show dataset on console
print(dataset)
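The code first imports NumPy (as np), pandas (as pd), and fetch_kddcup99 from sklearn. The fetch_kddcup99(percent10=True) call loads the kddcup99 dataset into the program. The percent10=True argument loads only 10% of the whole dataset. Finally, print(dataset) shows the dataset on the console.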
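Printing the whole object produces a lot of output, so a short summary is often easier to read. The sketch below is one possible way to do that. It assumes the returned object exposes data and target entries by key, as the dictionary-like return value does with the default as_frame=False. The names summarize and load_and_summarize are hypothetical, not part of scikit-learn.

```python
from collections import Counter


def summarize(bunch):
    """Return the sample count, feature count, and label frequencies."""
    data, target = bunch["data"], bunch["target"]
    return {
        "n_samples": len(data),
        # Each row of data holds the features of one sample.
        "n_features": len(data[0]) if len(data) else 0,
        "label_counts": Counter(target),
    }


def load_and_summarize():
    """Download (if needed) the 10% dataset and print a short summary."""
    # Imported here so summarize() can be used without scikit-learn.
    from sklearn.datasets import fetch_kddcup99

    info = summarize(fetch_kddcup99(percent10=True))
    print("samples:", info["n_samples"])
    print("features:", info["n_features"])
    # Show the five most frequent labels and how often each occurs.
    for label, count in info["label_counts"].most_common(5):
        print(label, count)
```

Calling load_and_summarize() triggers the download on first use. After that, the data is read from the local data_home directory.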