In scikit-learn, the sklearn.datasets.make_classification() function generates random data for an n-class classification problem. Let's take a closer look at the syntax, parameters, and return values of the function.
Here is the syntax of the function:
sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)
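As a quick check of the defaults in the signature above, the sketch below calls the function with no dataset arguments and inspects what comes back. The seed value 0 and the variable names are arbitrary choices made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Use the default arguments; random_state=0 is an arbitrary seed chosen
# here only so that repeated runs give the same data.
X, y = make_classification(random_state=0)

print(X.shape)       # (100, 20) with the default n_samples and n_features
print(y.shape)       # (100,)
print(np.unique(y))  # the distinct class labels

# Calling again with the same seed should reproduce the same dataset.
X_again, y_again = make_classification(random_state=0)
print(np.array_equal(X, X_again) and np.array_equal(y, y_again))
```

Leaving random_state at None would typically give different data on every call, so a fixed seed is useful whenever results need to be compared.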
n_samples: This is the number of samples. Its value type is int, and its default value is 100.
n_features: This is the total number of features. Its value type is int, and its default value is 20.
n_informative: This is the number of informative features. Its value type is int, and its default value is 2.
n_redundant: This is the number of redundant features. These features are generated as random linear combinations of the informative features. Its value type is int, and its default value is 2.
n_repeated: This is the number of duplicated features, drawn randomly from the informative and redundant features. Its value type is int, and its default value is 0.
n_classes: This is the number of classes (or labels) of the classification problem. Its value type is int, and its default value is 2.
n_clusters_per_class: This is the number of clusters per class. Its value type is int, and its default value is 2.
weights: This is the proportion of samples assigned to each class. Its value type is array-like of shape (n_classes,) or (n_classes - 1,), and its default value is None, in which case the classes are balanced.
flip_y: This is the fraction of samples whose class is assigned randomly. Larger values introduce more noise in the labels. Its value type is float, and its default value is 0.01.
class_sep: This is the factor by which the size of the hypercube is multiplied. Larger values spread out the clusters and make the classification task easier. Its value type is float, and its default value is 1.0.
hypercube: This is a boolean value. If it's set to True, the clusters are placed on the vertices of a hypercube. If it's set to False, the clusters are placed on the vertices of a random polytope. Its default value is True.
shift: This shifts the features by the specified value. Its value type is float, and its default value is 0.0.
scale: This multiplies the features by the specified value. Its value type is float, and its default value is 1.0.
shuffle: This shuffles the samples and the features. Its value type is bool, and its default value is True.
random_state: This controls the generation of random numbers used to create the dataset. Its value type is int, a RandomState instance, or None, and its default value is None. Pass an int to get reproducible output across multiple function calls.
The function returns the following two values:
X: This holds the generated samples in the form of an n-dimensional array of shape (n_samples, n_features).
y: This holds the integer labels for the class membership of each sample in the form of an n-dimensional array of shape (n_samples,).
In the code snippet below, we use the make_classification() function.
# import library
from sklearn.datasets import make_classification

# create features and target
features, target = make_classification(
    n_samples=100,
    n_features=10,
    n_informative=10,
    n_redundant=0,
    n_classes=2,
    weights=[0.3, 0.7],
    random_state=42,
)

# print features and target
print("Features:")
print(features[:5])
print("Targets:")
print(target[:5])
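To see how the weights argument in the call above shapes the result, the following sketch repeats the same call and counts the samples per class. It is only an illustration: the use of np.bincount and the name class_counts are choices made here, and because flip_y keeps its default of 0.01, the observed proportions may differ slightly from the requested 0.3 and 0.7.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same call as in the example above.
features, target = make_classification(
    n_samples=100,
    n_features=10,
    n_informative=10,
    n_redundant=0,
    n_classes=2,
    weights=[0.3, 0.7],
    random_state=42,
)

# The returned arrays follow the documented shapes.
print(features.shape)  # (n_samples, n_features)
print(target.shape)    # (n_samples,)

# Count how many samples fall in each class and convert to proportions.
class_counts = np.bincount(target)
print(class_counts)
print(class_counts / class_counts.sum())
```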
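In the example that prints the first five features and targets, we first import the make_classification() function from the sklearn library. We then call the make_classification() function with parameters to create the features and target.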
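The roles of n_informative, n_redundant, and n_repeated can also be checked numerically. The sketch below is a minimal illustration under one assumption worth stating: with shuffle=False, the scikit-learn documentation describes the columns of X as ordered informative first, then redundant, then repeated, then noise. The sizes, the seed, and the variable names are arbitrary choices for this example.

```python
import numpy as np
from sklearn.datasets import make_classification

n_informative, n_redundant, n_repeated = 3, 2, 1

X, y = make_classification(
    n_samples=200,
    n_features=8,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    shuffle=False,   # keep the documented column order
    random_state=0,  # arbitrary seed for repeatability
)

informative = X[:, :n_informative]
redundant = X[:, n_informative:n_informative + n_redundant]
repeated = X[:, n_informative + n_redundant]

# Fit redundant = informative @ coef by least squares. If the redundant
# columns are linear combinations of the informative ones, the fit is exact.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
is_linear_combination = np.allclose(informative @ coef, redundant)
print(is_linear_combination)

# The repeated column should duplicate one of the useful columns before it.
is_duplicate = any(
    np.array_equal(repeated, X[:, j])
    for j in range(n_informative + n_redundant)
)
print(is_duplicate)
```

Both checks should print True when shift and scale are left at their defaults. With shuffle=True, the same relationships hold, but the columns are permuted, so they can no longer be located by position.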