What is sklearn.datasets.load_svmlight_files in scikit-learn?

sklearn.datasets.load_svmlight_files loads datasets from multiple files stored in the svmlight (libsvm) format. Each file is returned as a sparse feature matrix together with its array of labels.

sklearn.datasets.load_svmlight_files is equivalent to mapping load_svmlight_file over a list of files, except that the results are concatenated into a single flat list and all the sample vectors are constrained to have the same number of features.
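
The difference from calling load_svmlight_file on each file can be sketched with two tiny hand-written files. The file names and contents below are made up for illustration; the second file deliberately uses fewer feature indices than the first.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_file, load_svmlight_files

# Hypothetical svmlight data with one-based column indices.
train_text = "1 1:0.5 4:2.0\n0 2:1.5\n"   # highest index is 4
test_text = "1 1:1.0 2:3.0\n"             # highest index is 2

with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.svm")
    test_path = os.path.join(tmp, "test.svm")
    with open(train_path, "w") as f:
        f.write(train_text)
    with open(test_path, "w") as f:
        f.write(test_text)

    # Loading each file on its own infers the feature count per file.
    X_a, _ = load_svmlight_file(train_path)
    X_b, _ = load_svmlight_file(test_path)

    # Loading both files together forces a shared feature count.
    X_train, y_train, X_test, y_test = load_svmlight_files([train_path, test_path])

separate_shapes = (X_a.shape, X_b.shape)
joint_shapes = (X_train.shape, X_test.shape)
print("separate:", separate_shapes)
print("joint:   ", joint_shapes)
```

With this data, the separately loaded matrices should end up with different column counts, while the jointly loaded ones share the same width, which is what a model trained on one and evaluated on the other needs.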

Files that contain pairwise preference constraints (known as "qid" in the svmlight format) can also be loaded. These constraints are ignored unless query_id is set to True.
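
As a minimal sketch of the query_id behavior, the snippet below writes a made-up ranking file whose lines carry qid fields and loads it with and without query_id=True. The file name and values are illustrative only.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_files

# Hypothetical ranking data: two samples belong to query 1, one to query 2.
text = "1 qid:1 1:0.5 2:1.0\n0 qid:1 1:0.2\n1 qid:2 2:0.7\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ranking.svm")
    with open(path, "w") as f:
        f.write(text)

    # Default: the qid fields are skipped, so only X and y come back.
    without_qid = load_svmlight_files([path])

    # query_id=True: a third array with one query ID per sample is returned.
    X, y, qid = load_svmlight_files([path], query_id=True)

print(len(without_qid))
print(qid.tolist())
```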

Syntax

sklearn.datasets.load_svmlight_files(files, *, n_features=None, dtype=<class 'numpy.float64'>, multilabel=False, zero_based='auto', query_id=False, offset=0, length=-1)

Parameters

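files: the files to load, given as a list of paths, file-like objects, or integer file descriptors. Paths ending in ".gz" or ".bz2" are decompressed on the fly. File-like objects and file descriptors must be opened in binary mode.
n_features: the number of features to use. If None, it is inferred from the largest column index found across all files. It can be set higher than the actual number of features in any of the files.
dtype: the data type of the dataset to be loaded.
multilabel: set to True when each sample may carry several labels.
zero_based: whether the column indices in the files are zero-based (True) or one-based (False). One-based indices are converted to zero-based. With the default 'auto', a heuristic check is applied to the file contents to decide.
query_id: when True, the query_id array for each file is returned as well.
offset: skips the first offset bytes by seeking forward, then discards the following bytes up to the next newline character.
length: if strictly positive, stops reading new lines of data once the position in the file reaches the (offset + length) byte threshold.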
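
To illustrate a few of these parameters, here is a small sketch that loads the same made-up file twice: once with the defaults and once with n_features, dtype, and zero_based set explicitly. The file name and its contents are assumptions for the example.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import load_svmlight_files

# Hypothetical data: column indices start at 1 and the largest one is 4.
text = "1 1:0.5 4:2.0\n-1 2:1.5 3:0.25\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.svm")
    with open(path, "w") as f:
        f.write(text)

    # Defaults: zero_based="auto" should treat this file as one-based,
    # so index 1 becomes column 0.
    X_auto, y_auto = load_svmlight_files([path])

    # Explicit settings: keep the indices exactly as written (zero-based),
    # pad the matrix to 10 columns, and store values as 32-bit floats.
    X_wide, y_wide = load_svmlight_files(
        [path], n_features=10, dtype=np.float32, zero_based=True
    )

print(X_auto.shape, X_auto.dtype)
print(X_wide.shape, X_wide.dtype)
```

Note that the same value 0.5 lands in column 0 in the first call and in column 1 in the second, which is why zero_based is worth setting explicitly when the indexing convention of the files is known.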

Return value

The return value is a flat list [X1, y1, ..., Xn, yn], where each (Xi, yi) pair is the result of load_svmlight_file(files[i]).
If query_id is set to True, a query ID array is returned for each file as well. The list then becomes [X1, y1, q1, ..., Xn, yn, qn], where each (Xi, yi, qi) triplet is the result of load_svmlight_file(files[i]).
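
Because the result is one flat list rather than a list of tuples, it can be handy to regroup it per file. The sketch below does this with slicing and zip; the three file names and their contents are invented for the example.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_files

# Hypothetical contents for three small files (one-based column indices).
contents = {
    "a.svm": "1 1:1.0 2:2.0\n0 3:1.0\n",
    "b.svm": "0 1:4.0\n",
    "c.svm": "1 2:5.0 3:6.0\n1 1:7.0\n0 2:8.0\n",
}

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(text)
        paths.append(path)

    # One call, one flat list: [X1, y1, X2, y2, X3, y3].
    flat = load_svmlight_files(paths)

# Regroup the flat list into (X, y) pairs, one per input file.
pairs = list(zip(flat[0::2], flat[1::2]))
for path, (X_i, y_i) in zip(paths, pairs):
    print(os.path.basename(path), X_i.shape, y_i.tolist())
```

With query_id=True the same idea applies with a step of 3 instead of 2.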

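Example

import os
import tempfile

import numpy as np
from sklearn import datasets
from sklearn.datasets import dump_svmlight_file, load_svmlight_files

# Load the iris data. load_svmlight_files expects files, not in-memory
# objects, so the data is first written to disk in svmlight format.
iris = datasets.load_iris()
X = iris.data
y = iris.target

def svmlight_load_files_test():
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "iris.svmlight")
        dump_svmlight_file(X, y, path)

        # Loading the same file twice returns a flat list: X1, y1, X2, y2.
        X_trn, y_trn, X_tst, y_tst = load_svmlight_files([path] * 2, dtype=np.float32)
        np.testing.assert_array_equal(X_trn.toarray(), X_tst.toarray())
        np.testing.assert_array_equal(y_trn, y_tst)
        assert X_trn.dtype == np.float32
        assert X_tst.dtype == np.float32

        # Three files give six return values; every matrix has the requested dtype.
        X1, y1, X2, y2, X3, y3 = load_svmlight_files([path] * 3, dtype=np.float64)
        assert X1.dtype == X2.dtype
        assert X2.dtype == X3.dtype
        assert X3.dtype == np.float64

svmlight_load_files_test()
print(X)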

Demo Code

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)
