What is the sklearn.datasets.fetch_kddcup99 method in Python?

Overview

Scikit-learn (sklearn) is a Python library for machine learning. Its sklearn.datasets module provides functions for loading and fetching a number of standard datasets.

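The fetch_kddcup99() method loads the kddcup99 dataset into the program, downloading it first if it is not already stored locally. The dataset consists of network connection records, each labeled either as normal traffic or as a specific type of attack, so it is commonly used for classification and for anomaly (intrusion) detection.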

Syntax


sklearn.datasets.fetch_kddcup99(*,
subset=None,
data_home=None,
shuffle=False,
random_state=None,
percent10=True,
download_if_missing=True,
return_X_y=False,
as_frame=False)

Parameters

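  • subset: This selects a subset of the kddcup99 dataset. It can be 'SA', 'SF', 'http', or 'smtp'. By default, its value is None, which loads the entire dataset.
  • data_home: This is the directory where scikit-learn caches downloaded datasets. By default, the data is stored in the ~/scikit_learn_data directory.
  • shuffle: This is a boolean value that specifies whether to shuffle the dataset. By default, it's False.
  • random_state: This seeds the random number generator used for shuffling and for selecting abnormal samples when subset='SA'. By default, it's None.
  • percent10: This is a boolean value that specifies whether to load only 10% of the whole dataset. By default, it's True.
  • download_if_missing: If set to True, the dataset is downloaded and stored locally when it is not already available. If set to False, an OSError is raised instead. Its default value is True.
  • return_X_y: If set to True, the method returns a (data, target) tuple instead of a Bunch object. By default, it's False.
  • as_frame: If set to True, the data is returned as a pandas DataFrame and the target as a pandas Series, packed in the Bunch object. By default, it's False.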
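As a rough illustration of how these parameters combine, the following sketch fetches a single subset as an (X, y) pair and counts its labels. The helper names count_labels and load_kddcup99_subset are hypothetical and not part of scikit-learn, and the sketch assumes that the labels arrive as byte strings such as b'normal.', which is the case when as_frame is False.

```python
from collections import Counter


def count_labels(labels):
    """Count how often each label occurs, decoding byte strings to text."""
    decoded = [
        label.decode("utf-8") if isinstance(label, bytes) else str(label)
        for label in labels
    ]
    return Counter(decoded)


def load_kddcup99_subset(subset="http", seed=0):
    """Fetch one subset as (X, y). Downloads the data on first use."""
    # Imported lazily so count_labels() can be used without scikit-learn.
    from sklearn.datasets import fetch_kddcup99

    X, y = fetch_kddcup99(
        subset=subset,      # 'SA', 'SF', 'http', 'smtp', or None
        shuffle=True,       # shuffle the samples
        random_state=seed,  # make the shuffle reproducible
        percent10=True,     # use the 10% sample of the dataset
        return_X_y=True,    # return (data, target) instead of a Bunch
    )
    return X, y
```

Calling load_kddcup99_subset("http") and passing the returned y to count_labels() gives the number of samples per label in that subset. The first call needs network access, because the data is downloaded and cached in data_home.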

Return value

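This method returns a dictionary-like Bunch object. Its main attributes are data (the feature matrix), target (the label of each sample), feature_names, target_names, and DESCR (a description of the dataset). If return_X_y is True, it returns a (data, target) tuple instead.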
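The returned object can be inspected like a dictionary. The sketch below is a minimal example of that: summarize_bunch and fetch_and_summarize are hypothetical helper names, and the code assumes as_frame=False, so that data is an array-like of rows rather than a DataFrame.

```python
def summarize_bunch(bunch):
    """Return the sample count, feature count, and available keys."""
    data = bunch["data"]  # a Bunch supports key access as well as attribute access
    n_samples = len(data)
    n_features = len(data[0]) if n_samples else 0
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "keys": sorted(bunch.keys()),
    }


def fetch_and_summarize():
    """Fetch the 10% sample and summarize it. Downloads the data on first use."""
    # Imported lazily so summarize_bunch() can be used without scikit-learn.
    from sklearn.datasets import fetch_kddcup99

    bunch = fetch_kddcup99(percent10=True)
    return summarize_bunch(bunch)
```

Because summarize_bunch() only relies on key access and len(), it also works on a plain dictionary, which makes it easy to try without downloading anything.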

Example

# load some required libraries
import numpy as np
from sklearn.datasets import fetch_kddcup99
import pandas as pd
# load kddcup99 dataset
dataset = fetch_kddcup99(percent10=True)
# show dataset on console
print(dataset)


Explanation

  • Lines 2–4: We import numpy, the fetch_kddcup99() method from sklearn.datasets, and pandas.
  • Line 6: Here, fetch_kddcup99(percent10=True) loads the kddcup99 dataset into the program. The percent10=True argument loads only 10% of the whole dataset.
  • Line 8: We print the loaded dataset to the console.
