sklearn.datasets.load_svmlight_files loads datasets from several files that are stored in svmlight (libsvm) format.
sklearn.datasets.load_svmlight_files is equivalent to mapping load_svmlight_file over a list of files, except that the results are concatenated into a single flat list and all sample vectors are constrained to have the same number of features.
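The difference from mapping load_svmlight_file can be sketched with two tiny files. This is a minimal illustration, not a definitive recipe: the file names and their contents are invented for the example, and the default zero_based='auto' heuristic is assumed.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_file, load_svmlight_files

# Hypothetical sample data in svmlight format: "<label> <index>:<value> ..."
# The first file uses feature indices up to 3, the second up to 5.
FILE_A = "1 1:0.5 3:1.0\n-1 2:2.0\n"
FILE_B = "1 5:4.0\n"

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.svmlight")  # made-up file name
    path_b = os.path.join(tmp, "b.svmlight")  # made-up file name
    with open(path_a, "w") as f:
        f.write(FILE_A)
    with open(path_b, "w") as f:
        f.write(FILE_B)

    # Loaded one by one, each matrix is only as wide as its own file requires.
    Xa_single, ya_single = load_svmlight_file(path_a)
    Xb_single, yb_single = load_svmlight_file(path_b)

    # Loaded together, the result is one flat list and every X has the same width.
    Xa, ya, Xb, yb = load_svmlight_files([path_a, path_b])

print(Xa_single.shape, Xb_single.shape)  # widths differ between the two files
print(Xa.shape, Xb.shape)                # widths agree
```

Loading the files together is what makes the matrices directly usable as, say, a training set and a test set for the same estimator.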
Files that contain pairwise preference constraints (the qid field in svmlight format) can also be loaded. These constraints are ignored unless query_id=True.
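A short sketch of the query_id behaviour, assuming a small ranking-style file whose name and contents are made up for this example:

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_files

# Hypothetical ranking data: "<label> qid:<query id> <index>:<value> ..."
RANKING = "2 qid:1 1:0.1 2:0.3\n1 qid:1 1:0.4\n3 qid:2 2:0.9\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ranking.svmlight")  # made-up file name
    with open(path, "w") as f:
        f.write(RANKING)

    # Default (query_id=False): the qid fields are dropped, one (X, y) pair per file.
    X, y = load_svmlight_files([path])

    # query_id=True: one (X, y, q) triplet per file.
    Xq, yq, q = load_svmlight_files([path], query_id=True)

print(y)  # labels, one per row
print(q)  # query ids, one per row
```

The feature matrix and labels are the same in both calls; only the extra query ID array changes what is returned.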
sklearn.datasets.load_svmlight_files(files, *, n_features=None, dtype=<class 'numpy.float64'>, multilabel=False, zero_based='auto', query_id=False, offset=0, length=-1)
files: the paths of the files to load.
n_features: the number of features to use. If None, it is inferred from the largest column index found in any of the files.
dtype: the data type of the loaded dataset.
multilabel: if True, each sample may carry several labels.
zero_based: whether the column indices in the files are zero-based (True) or one-based (False). One-based indices are converted to zero-based. With 'auto', a heuristic check decides.
query_id: if True, the query ID array of each file is returned as well.
offset: skips the first offset bytes, then discards the following bytes up to the next newline character.
length: if strictly positive, stops reading new lines of data once the position in the file has reached the (offset + length) bytes threshold.
The function returns a flat list [X1, y1, ..., Xn, yn], where each (Xi, yi) pair is the result of load_svmlight_file(files[i]).
If query_id is set to True, the query ID arrays are returned as well: [X1, y1, q1, ..., Xn, yn, qn], where each (Xi, yi, qi) tuple is the result of load_svmlight_file(files[i], query_id=True).
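A few of the parameters above can be tried on a small multilabel file. This is only a sketch under stated assumptions: the file name and contents are invented, and the shapes depend on the index conventions described for zero_based and n_features.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_files

# Hypothetical data: comma-separated labels in front of the features
# mark a sample that has several labels.
MULTI = "0,2 1:1.0 4:2.0\n1 2:3.0\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multi.svmlight")  # made-up file name
    with open(path, "w") as f:
        f.write(MULTI)

    # multilabel=True: y holds one tuple of labels per sample.
    X, y = load_svmlight_files([path], multilabel=True)

    # zero_based=True: indices are used as written, so index 4 is the fifth column.
    X_zero, _ = load_svmlight_files([path], multilabel=True, zero_based=True)

    # n_features: force a fixed width that is larger than the file itself needs.
    X_wide, _ = load_svmlight_files([path], multilabel=True, n_features=10)

print(y)
print(X.shape, X_zero.shape, X_wide.shape)
```

Passing n_features explicitly can be useful when the files are slices of a larger dataset and some slices never mention the highest feature indices.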
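import os
import tempfile

import numpy as np
from numpy.testing import assert_array_equal
from sklearn import datasets
from sklearn.datasets import dump_svmlight_file, load_svmlight_files

df = datasets.load_iris()
X = df.data[:, :5]
Y = df.target

def svmlight_loadfiles_test():
    # load_svmlight_files expects file paths, so write the iris data to disk first
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "iris.svmlight")
        dump_svmlight_file(X, Y, path)
        # Loading the same file twice yields two identical (X, y) pairs
        X_trn, y_trn, X_tst, y_tst = load_svmlight_files([path] * 2, dtype=np.float32)
        assert_array_equal(X_trn.toarray(), X_tst.toarray())
        assert_array_equal(y_trn, y_tst)
        assert X_trn.dtype == np.float32
        assert X_tst.dtype == np.float32
        # Three files give six return values, all with the requested dtype
        X1, y1, X2, y2, X3, y3 = load_svmlight_files([path] * 3, dtype=np.float64)
        assert X1.dtype == X2.dtype
        assert X2.dtype == X3.dtype
        assert X3.dtype == np.float64

svmlight_loadfiles_test()
print(X)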