Scikit-learn, or sklearn, is a Python library designed for solving machine learning problems. Its sklearn.datasets module includes a number of predefined datasets, along with functions to load them.
The fetch_kddcup99() function loads the kddcup99 dataset into the program, downloading it first if necessary. The kddcup99 dataset contains network connection records labeled either as normal traffic or as a type of attack, and it is typically used for classification and anomaly detection problems.
sklearn.datasets.fetch_kddcup99(*, subset=None, data_home=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True, return_X_y=False, as_frame=False)
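subset: Selects a subset of the kddcup99 dataset to load. It can be 'SA', 'SF', 'http', or 'smtp'. By default, its value is None, which loads the entire dataset.
data_home: The directory in which sklearn downloads and caches datasets. By default, sklearn data is stored in the ~/scikit_learn_data directory.
shuffle: A boolean value that specifies whether to shuffle the loaded data. By default, it's False.
random_state: Controls the random number generation used when shuffling, so that results can be reproduced. By default, its value is None.
percent10: A boolean value that specifies whether to load only 10% of the whole dataset. By default, it's True.
download_if_missing: If set to True, the dataset is downloaded and stored locally when it is not already available. If set to False, an error is raised instead of downloading. Its default value is True.
return_X_y: If True, the function returns a (data, target) tuple instead of a bunch object. By default, it's False.
as_frame: If True, the data is returned as a pandas DataFrame and the target as a pandas Series, packed in the bunch object. By default, it's False.
By default, this function returns the data in the form of a dictionary-like bunch object.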
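The parameters above can be combined in a single call. The following is a minimal sketch, not a definitive recipe: the helper name load_kddcup_subset and its fetch argument are hypothetical and exist only for illustration (fetch lets a stand-in loader replace the real download). The keyword arguments passed to the loader are the ones listed in the signature.

```python
def load_kddcup_subset(subset="SA", fetch=None):
    """Return (X, y) for one kddcup99 subset.

    `fetch` is a hypothetical hook for this sketch only: it lets a stand-in
    loader replace the real download. When it is None, scikit-learn's
    fetch_kddcup99 is used.
    """
    if fetch is None:
        # Imported lazily so the helper can be defined without scikit-learn.
        from sklearn.datasets import fetch_kddcup99 as fetch

    # return_X_y=True yields a (data, target) tuple instead of a bunch object.
    X, y = fetch(
        subset=subset,      # None, 'SA', 'SF', 'http', or 'smtp'
        percent10=True,     # load only the 10% sample
        shuffle=True,       # shuffle the rows ...
        random_state=0,     # ... in a reproducible way
        return_X_y=True,
    )
    return X, y
```

Calling load_kddcup_subset("http") downloads the data on first use (unless it is already cached under data_home) and returns the features and labels for the http subset.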
# load some required libraries
import numpy as np
from sklearn.datasets import fetch_kddcup99
import pandas as pd

# load kddcup99 dataset
dataset = fetch_kddcup99(percent10=True)

# show dataset on console
print(dataset)
The import statements load the np, pd, and sklearn libraries into the program.
fetch_kddcup99(percent10=True) loads the kddcup99 dataset into the program. The percent10=True argument loads only 10% of the whole dataset.
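Printing the whole bunch object produces a lot of output, so a short summary is often easier to read. The sketch below assumes the object exposes data and target attributes, as the bunch returned with as_frame=False does. The summarize helper is a hypothetical name introduced here, not part of sklearn.

```python
from collections import Counter


def summarize(dataset, top=3):
    """Summarize a bunch-like object that has `data` and `target` attributes.

    Returns (number of samples, number of features, most common labels).
    Assumes `data` is row-indexable, i.e. loaded with as_frame=False.
    """
    n_samples = len(dataset.data)
    n_features = len(dataset.data[0])
    # Count how often each label occurs and keep the `top` most frequent.
    label_counts = Counter(dataset.target).most_common(top)
    return n_samples, n_features, label_counts
```

For example, summarize(fetch_kddcup99(percent10=True)) would report the size of the 10% sample and its most frequent labels.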