sklearn.datasets.load_svmlight_files loads datasets from several files that are stored in the SVMlight (libsvm) format.
sklearn.datasets.load_svmlight_files is equivalent to mapping load_svmlight_file over a list of files, except that the results are concatenated into a single flat list and all the sample vectors are constrained to have the same number of features.
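The shared feature count can be seen in a small experiment. The following is a minimal sketch, assuming two tiny SVMlight files whose names and contents are invented for illustration: loaded one at a time, each file gets its own inferred width, while loading them together gives both matrices the same number of columns.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_file, load_svmlight_files

# Two made-up SVMlight files with different maximum feature indices.
narrow = "1 1:0.5 2:1.5\n-1 2:2.0\n"
wide = "1 1:1.0 5:3.0\n-1 3:0.5\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for name, text in [("narrow.txt", narrow), ("wide.txt", wide)]:
        path = os.path.join(tmp_dir, name)
        with open(path, "w") as fh:
            fh.write(text)
        paths.append(path)

    # Loading each file on its own infers a per-file feature count.
    separate_shapes = [load_svmlight_file(p)[0].shape for p in paths]

    # Loading them together returns a flat list [X_a, y_a, X_b, y_b]
    # in which both matrices share one feature count.
    X_a, y_a, X_b, y_b = load_svmlight_files(paths)
    joint_shapes = [X_a.shape, X_b.shape]

print(separate_shapes)  # [(2, 2), (2, 5)]
print(joint_shapes)     # [(2, 5), (2, 5)]
```

A shared width matters when the files hold, for example, a training and a test split that must be fed to the same model.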
Files that contain pairwise preference constraints (known as "qid" in the SVMlight format) can also be loaded. These constraints are ignored unless query_id=True.
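As a rough illustration of the query_id behaviour, the sketch below writes two small ranking-style files (their names and contents are made up for this example) and loads them with and without query_id=True.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_files

# Made-up ranking data: each line carries a "qid:<n>" token after the label.
train_text = "2 qid:1 1:0.3 2:0.1\n1 qid:1 1:0.2\n1 qid:2 2:0.7\n"
test_text = "1 qid:3 1:0.9\n2 qid:3 2:0.4\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for name, text in [("train.txt", train_text), ("test.txt", test_text)]:
        path = os.path.join(tmp_dir, name)
        with open(path, "w") as fh:
            fh.write(text)
        paths.append(path)

    # Default: the qid tokens are skipped and only (X, y) pairs come back.
    without_qid = load_svmlight_files(paths)

    # query_id=True: each file contributes an (X, y, q) triplet instead.
    X1, y1, q1, X2, y2, q2 = load_svmlight_files(paths, query_id=True)

print(len(without_qid))  # 4
print(q1.tolist())       # [1, 1, 2]
print(q2.tolist())       # [3, 3]
```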
sklearn.datasets.load_svmlight_files(files, *, n_features=None, dtype=<class 'numpy.float64'>, multilabel=False, zero_based='auto', query_id=False, offset=0, length=-1)
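Parameters:
files: the paths of the files to load.
n_features: the number of features to use. If None, it is inferred from the largest column index found in any of the files.
dtype: the data type of the loaded dataset.
multilabel: set to True when a sample may carry several labels.
zero_based: whether the column indices in the files are zero-based (True) or one-based (False). One-based indices are converted to zero-based; 'auto' applies a heuristic check to decide.
query_id: when True, the query ID array of each file is returned as well.
offset: skips the first offset bytes by seeking forward, then discards the following bytes up to the next newline character.
length: if strictly positive, stops reading new lines of data once the position in the file has reached the (offset + length) bytes threshold.

Return value:
A flat list [X1, y1, ..., Xn, yn], where each (Xi, yi) pair is the result of load_svmlight_file(files[i]). If query_id is set to True, the query ID arrays are returned too, so the list becomes [X1, y1, q1, ..., Xn, yn, qn] and each file contributes an (Xi, yi, qi) triplet.

Example:
import os
import tempfile

import numpy as np
from numpy.testing import assert_array_equal
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data  # iris has four feature columns
y = iris.target

def svmlight_loadfiles_test():
    # load_svmlight_files expects file paths, so write the data to disk first
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "iris.svmlight")
        datasets.dump_svmlight_file(X, y, path)

        # Loading the same file twice yields two identical (X, y) pairs
        X_trn, y_trn, X_tst, y_tst = datasets.load_svmlight_files([path] * 2, dtype=np.float32)
        assert_array_equal(X_trn.toarray(), X_tst.toarray())
        assert_array_equal(y_trn, y_tst)
        assert X_trn.dtype == np.float32
        assert X_tst.dtype == np.float32

        # Three files produce a flat list of six arrays
        X1, y1, X2, y2, X3, y3 = datasets.load_svmlight_files([path] * 3, dtype=np.float64)
        assert X1.dtype == X2.dtype
        assert X2.dtype == X3.dtype
        assert X3.dtype == np.float64

svmlight_loadfiles_test()
print(X)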
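The multilabel parameter described above can be illustrated in the same way. This is only a sketch under stated assumptions: the file name and its two lines are invented, and labels are written as a comma-separated list before the features, which is how multilabel SVMlight files are laid out.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_files

# Made-up multilabel data: the first sample has labels 0 and 2,
# the second sample has the single label 1.
multilabel_text = "0,2 1:1.0 3:2.5\n1 2:4.0\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "multilabel.txt")
    with open(path, "w") as fh:
        fh.write(multilabel_text)

    # With multilabel=True, y is a list of tuples rather than a NumPy array.
    X, y = load_svmlight_files([path], multilabel=True)

print(X.shape)  # (2, 3)
print(y)        # [(0.0, 2.0), (1.0,)]
```

Because the labels come back as tuples of varying length, they usually need a further step (for instance a label binarizer) before being passed to an estimator.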