-------------------------------------
--- Python interface of LIBLINEAR ---
-------------------------------------
Table of Contents
=================

- Introduction
- Installation via PyPI
- Installation via Sources
- Quick Start
- Quick Start with Scipy
- Design Description
- Data Structures
- Utility Functions
- Additional Information
Introduction
============

Python (http://www.python.org/) is a programming language suitable for rapid
development. This tool provides a simple Python interface to LIBLINEAR, a
library for large linear classification (http://www.csie.ntu.edu.tw/~cjlin/liblinear).
The interface is easy to use, as the usage is the same as that of LIBLINEAR. The
interface is developed with the built-in Python library "ctypes."
Installation via PyPI
=====================

To install the interface from PyPI, execute the following command:

> pip install -U liblinear-official
Installation via Sources
========================

Alternatively, you may install the interface from sources by
generating the LIBLINEAR shared library.

Depending on your use cases, you can choose between local-directory
and system-wide installation.
- Local-directory installation:

    > make lib

  This generates a .so file in the LIBLINEAR main directory and you
  can run the interface in the current python directory.
  For Windows, the shared library liblinear.dll is ready in the
  directory `..\windows' and you can directly run the interface in
  the current python directory. You can copy liblinear.dll to the
  system directory (e.g., `C:\WINDOWS\system32\') to make it
  available system-wide. To regenerate liblinear.dll, please
  follow the instructions for building Windows binaries in the
  LIBLINEAR README.
- System-wide installation:

    > make lib
    > make install

  Please note that you must keep the sources after the installation.

  For Windows, to run the above command, Microsoft Visual C++ and
  other tools are needed.
In addition, DON'T use the following FAILED command:

> python setup.py install (failed to run at the python directory)
Quick Start
===========

"Quick Start with Scipy" is in the next section.
There are two levels of usage. The high-level one uses utility
functions in liblinearutil.py and commonutil.py (shared with LIBSVM
and imported by svmutil.py). The usage is the same as that of the
LIBLINEAR MATLAB interface.
>>> from liblinear.liblinearutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale')
>>> m = train(y[:200], x[:200], '-c 4')
>>> p_label, p_acc, p_val = predict(y[200:], x[200:], m)
# Construct problem in python format
# Dense data
>>> y, x = [1,-1], [[1,0,1], [-1,0,-1]]
# Sparse data
>>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
>>> prob = problem(y, x)
>>> param = parameter('-s 0 -c 4 -B 1')
>>> m = train(prob, param)
# Other utility functions
>>> save_model('heart_scale.model', m)
>>> m = load_model('heart_scale.model')
>>> p_label, p_acc, p_val = predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)
# Getting online help
>>> help(train)
The low-level use directly calls C interfaces imported by liblinear.py. Note that
all arguments and return values are in ctypes format. You need to handle them
carefully.
>>> from liblinear.liblinear import *
>>> prob = problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
>>> param = parameter('-c 4')
>>> m = liblinear.train(prob, param) # m is a ctype pointer to a model
# Convert a Python-format instance to feature_nodearray, a ctypes structure
>>> x0, max_idx = gen_feature_nodearray({1:1, 3:1})
>>> label = liblinear.predict(m, x0)
Quick Start with Scipy
======================

Make sure you have Scipy installed to proceed in this section.
If numba (http://numba.pydata.org) is installed, some operations will be much faster.
There are two levels of usage. The high-level one uses utility functions
in liblinearutil.py and the usage is the same as the LIBLINEAR MATLAB interface.
>>> from liblinear.liblinearutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale', return_scipy = True) # y: ndarray, x: csr_matrix
>>> m = train(y[:200], x[:200, :], '-c 4')
>>> p_label, p_acc, p_val = predict(y[200:], x[200:, :], m)
# Construct problem in Scipy format
# Dense data: numpy ndarray
>>> y, x = scipy.asarray([1,-1]), scipy.asarray([[1,0,1], [-1,0,-1]])
# Sparse data: scipy csr_matrix((data, (row_ind, col_ind)))
>>> y, x = scipy.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 1, -1, -1], ([0, 0, 1, 1], [0, 2, 0, 2])))
>>> prob = problem(y, x)
>>> param = parameter('-s 0 -c 4 -B 1')
>>> m = train(prob, param)
# Apply data scaling in Scipy format
>>> y, x = svm_read_problem('../heart_scale', return_scipy=True)
>>> scale_param = csr_find_scale_param(x, lower=0)
>>> scaled_x = csr_scale(x, scale_param)
# Other utility functions
>>> save_model('heart_scale.model', m)
>>> m = load_model('heart_scale.model')
>>> p_label, p_acc, p_val = predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)
# Getting online help
>>> help(train)
The low-level use directly calls C interfaces imported by liblinear.py. Note that
all arguments and return values are in ctypes format. You need to handle them
carefully.
>>> from liblinear.liblinear import *
>>> prob = problem(scipy.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 1, -1, -1], ([0, 0, 1, 1], [0, 2, 0, 2]))))
>>> param = parameter('-c 4')
>>> m = liblinear.train(prob, param) # m is a ctype pointer to a model
# Convert a tuple of ndarray (index, data) to feature_nodearray, a ctypes structure
# Note that index starts from 0, though the following example will be changed to 1:1, 3:1 internally
>>> x0, max_idx = gen_feature_nodearray((scipy.asarray([0,2]), scipy.asarray([1,1])))
>>> label = liblinear.predict(m, x0)
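To give a rough picture of the ctypes values the low-level calls exchange, here is a small illustrative sketch. It mirrors, rather than imports, the library; the exact field types and helper names in liblinear.py may differ, so treat this only as a model of the layout.

```python
import ctypes

# Hypothetical mirror of LIBLINEAR's feature_node: an (index, value)
# pair, with arrays terminated by a sentinel node whose index is -1.
class feature_node(ctypes.Structure):
    _fields_ = [("index", ctypes.c_int64), ("value", ctypes.c_double)]

def to_nodearray(xi):
    """Convert a {index: value} dict to a sentinel-terminated ctypes array."""
    items = sorted(xi.items())
    arr = (feature_node * (len(items) + 1))()
    for k, (idx, val) in enumerate(items):
        arr[k].index, arr[k].value = idx, val
    arr[len(items)].index = -1  # terminator expected by the C code
    return arr, (items[-1][0] if items else 0)
```

In real use, gen_feature_nodearray performs this conversion (plus bias handling) for you.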
Design Description
==================

There are two files, liblinear.py and liblinearutil.py, which respectively
correspond to the low-level and high-level use of the interface.
In liblinear.py, we adopt the Python built-in library "ctypes," so that
Python can directly access C structures and interface functions defined
in linear.h.
While advanced users can use structures/functions in liblinear.py, to
avoid handling ctypes structures, liblinearutil.py provides some easy-to-use
functions. The usage is similar to that of the LIBLINEAR MATLAB interface.
Data Structures
===============

Three data structures derived from linear.h are feature_node, problem, and
parameter. They all contain fields with the same names as those in
linear.h. Access these fields carefully because you are directly using a C
structure instead of a Python object. The following description introduces
additional fields and methods.
Before using the data structures, execute the following command to load the
LIBLINEAR shared library:

>>> from liblinear.liblinear import *
- class feature_node:

Construct a feature_node.

>>> node = feature_node(idx, val)

idx: an integer indicating the feature index.

val: a float indicating the feature value.

Show the index and the value of a node.

>>> print(node)
- Function: gen_feature_nodearray(xi [,feature_max=None])

Generate a feature vector from a Python list/tuple/dictionary, numpy ndarray or tuple of (index, data):

>>> xi_ctype, max_idx = gen_feature_nodearray({1:1, 3:1, 5:-2})

xi_ctype: the returned feature_nodearray (a ctypes structure)

max_idx: the maximal feature index of xi

feature_max: if feature_max is assigned, features with indices larger than
             feature_max are removed.
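The effect of feature_max can be pictured with a plain-Python sketch. This is illustrative only; the real function additionally builds the ctypes array described above.

```python
# Hypothetical sketch of the feature_max pruning described above:
# features whose index exceeds feature_max are dropped.
def prune_features(xi, feature_max=None):
    if feature_max is None:
        return dict(xi)
    return {i: v for i, v in xi.items() if i <= feature_max}
```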
- class problem:

Construct a problem instance

>>> prob = problem(y, x [,bias=-1])

y: a Python list/tuple/ndarray of l labels (type must be int/double).

x: 1. a list/tuple of l training instances. Feature vector of
      each training instance is a list/tuple or dictionary.

   2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

bias: if bias >= 0, instance x becomes [x; bias]; if < 0, no bias term
      is added (default -1).

You can also modify the bias value by

>>> prob.set_bias(1)

Note that if your x contains sparse data (i.e., dictionary), the internal
ctypes data format is still sparse.
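The bias handling described above amounts to appending one extra feature to every instance; a minimal sketch, assuming a dense list representation:

```python
# Sketch: with bias >= 0, each instance x becomes [x; bias],
# i.e., one extra feature holding the bias value is appended.
def augment_bias(x, bias=-1):
    return x + [bias] if bias >= 0 else x
```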
- class parameter:

Construct a parameter instance

>>> param = parameter('training_options')

If 'training_options' is empty, LIBLINEAR default values are applied.

Set param to LIBLINEAR default values.

>>> param.set_to_default_values()

Parse a string of options.

>>> param.parse_options('training_options')

Show values of parameters.

>>> print(param)
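Conceptually, parsing a training-options string means splitting it into flag/value pairs. A simplified sketch (it assumes every option takes a value; the real parser also validates flags and handles argument-less options such as '-q'):

```python
# Simplified sketch of option-string parsing; not the library's parser.
def parse_options_sketch(s):
    toks = s.split()
    return {toks[i].lstrip('-'): toks[i + 1] for i in range(0, len(toks), 2)}
```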
- class model:

There are two ways to obtain an instance of model:

>>> model_ = train(y, x)
>>> model_ = load_model('model_file_name')

Note that the returned structure of the interface functions
liblinear.train and liblinear.load_model is a ctypes pointer to a
model, which is different from the model object returned
by train and load_model in liblinearutil.py. We provide a
function toPyModel for the conversion:

>>> model_ptr = liblinear.train(prob, param)
>>> model_ = toPyModel(model_ptr)

If you obtain a model in a way other than the above approaches,
handle it carefully to avoid memory leaks or segmentation faults.
Some interface functions to access LIBLINEAR models are wrapped as
members of the class model:

>>> nr_feature = model_.get_nr_feature()
>>> nr_class = model_.get_nr_class()
>>> class_labels = model_.get_labels()
>>> is_prob_model = model_.is_probability_model()
>>> is_regression_model = model_.is_regression_model()
The decision function is W*x + b, where
W is an nr_class-by-nr_feature matrix, and
b is a vector of size nr_class.
To access W_kj (i.e., the coefficient for the k-th class and the j-th feature)
and b_k (i.e., the bias for the k-th class), use the following functions.

>>> W_kj = model_.get_decfun_coef(feat_idx=j, label_idx=k)
>>> b_k = model_.get_decfun_bias(label_idx=k)

We also provide a function to extract w_k (i.e., the k-th row of W) and
b_k directly as follows.

>>> [w_k, b_k] = model_.get_decfun(label_idx=k)

Note that w_k is a Python list of length nr_feature, which means that
w_k[0] = W_k1.
For regression models, W is just a vector of length nr_feature. Either
set label_idx=0 or omit the label_idx parameter to access the coefficients.

>>> W_j = model_.get_decfun_coef(feat_idx=j)
>>> b = model_.get_decfun_bias()
>>> [W, b] = model_.get_decfun()
For one-class SVM models, label_idx is ignored and b=-rho is
returned from get_decfun(). That is, the decision function is
w*x + b = w*x - rho.

>>> rho = model_.get_decfun_rho()
>>> [W, b] = model_.get_decfun()
Note that in get_decfun_coef, get_decfun_bias, and get_decfun, feat_idx
starts from 1, while label_idx starts from 0. If label_idx is not in the
valid range (0 to nr_class-1), then a NaN will be returned; and if feat_idx
is not in the valid range (1 to nr_feature), then a zero value will be
returned. For regression models, label_idx is ignored.
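Given W and b extracted with the accessors above, classification reduces to an argmax over W*x + b. A small sketch under that assumption (a dense list representation is assumed; this is not how the library itself predicts):

```python
# Sketch: multiclass prediction from rows w_k of W and biases b_k,
# e.g. as obtained via model_.get_decfun(label_idx=k) for each k.
def predict_from_decfun(W, b, x):
    scores = [sum(w_kj * x_j for w_kj, x_j in zip(w_k, x)) + b_k
              for w_k, b_k in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)  # label index
```

The returned value is a label index (0 to nr_class-1); map it back through model_.get_labels() to obtain the actual label.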
Utility Functions
=================

To use utility functions, type

>>> from liblinear.liblinearutil import *

The above command loads
train()            : train a linear model
predict()          : predict testing data
svm_read_problem() : read the data from a LIBSVM-format file
load_model()       : load a LIBLINEAR model
save_model()       : save a model to a file
evaluations()      : evaluate prediction results
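The LIBSVM format read by svm_read_problem is plain text with one instance per line: a label followed by sparse index:value pairs. A minimal parser sketch (illustrative only; the real function also supports Scipy output via return_scipy):

```python
# Minimal sketch of parsing the LIBSVM text format:
# "<label> <index>:<value> <index>:<value> ..." per line.
def read_problem_sketch(lines):
    y, x = [], []
    for line in lines:
        label, *feats = line.split()
        y.append(float(label))
        x.append({int(i): float(v) for i, v in (f.split(':') for f in feats)})
    return y, x
```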
- Function: train

There are three ways to call train():
>>> model = train(y, x [, 'training_options'])
>>> model = train(prob [, 'training_options'])
>>> model = train(prob, param)

y: a list/tuple/ndarray of l training labels (type must be int/double).

x: 1. a list/tuple of l training instances. Feature vector of
      each training instance is a list/tuple or dictionary.

   2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

training_options: a string in the same form as that for LIBLINEAR command
                  mode.

prob: a problem instance generated by calling
      problem(y, x).

param: a parameter instance generated by calling
       parameter('training_options')
model: the returned model instance. See linear.h for details of this
       structure. If '-v' is specified, cross validation is
       conducted and the returned model is just a scalar: cross-validation
       accuracy for classification and mean-squared error for regression.

       If the '-C' option is specified, best parameters are found
       by cross validation. The parameter selection utility is supported
       only by -s 0, -s 2 (for finding C) and -s 11 (for finding C, p).
       The returned structure is a triple with the best C, the best p,
       and the corresponding cross-validation accuracy or mean squared
       error. The returned best p for -s 0 and -s 2 is set to -1 because
       the p parameter is not used by classification models.
To train the same data many times with different
parameters, the second and the third ways should be faster.
Examples:

>>> y, x = svm_read_problem('../heart_scale')
>>> prob = problem(y, x)
>>> param = parameter('-s 3 -c 5 -q')
>>> m = train(y, x, '-c 5')
>>> m = train(prob, '-w1 5 -c 5')
>>> m = train(prob, param)
>>> CV_ACC = train(y, x, '-v 3')
>>> best_C, best_p, best_rate = train(y, x, '-C -s 0') # best_p is only for -s 11
>>> m = train(y, x, '-c {0} -s 0'.format(best_C)) # use the same solver: -s 0
- Function: predict

To predict testing data with a model, use

>>> p_labs, p_acc, p_vals = predict(y, x, model [,'predicting_options'])
y: a list/tuple/ndarray of l true labels (type must be int/double).
   It is used for calculating the accuracy. Use [] if true labels are
   unavailable.

x: 1. a list/tuple of l testing instances. Feature vector of
      each testing instance is a list/tuple or dictionary.

   2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

predicting_options: a string of predicting options in the same format as
                    that of LIBLINEAR.

model: a model instance.
p_labels: a list of predicted labels

p_acc: a tuple including accuracy (for classification), mean
       squared error, and squared correlation coefficient (for
       regression).
p_vals: a list of decision values or probability estimates (if '-b 1'
        is specified). If k is the number of classes, for decision values,
        each element includes results of predicting k binary-class
        SVMs. If k = 2 and solver is not MCSVM_CS, only one decision value
        is returned. For probabilities, each element contains k values
        indicating the probability that the testing instance is in each class.
        Note that the order of classes here is the same as the 'model.label'
        field in the model structure.
>>> m = train(y, x, '-c 5')
>>> p_labels, p_acc, p_vals = predict(y, x, m)
- Functions: svm_read_problem/load_model/save_model

See the usage by examples:

>>> y, x = svm_read_problem('data.txt')
>>> m = load_model('model_file')
>>> save_model('model_file', m)
- Function: evaluations

Calculate some evaluations using the true values (ty) and the predicted
values (pv):

>>> (ACC, MSE, SCC) = evaluations(ty, pv, useScipy)

ty: a list/tuple/ndarray of true values.

pv: a list/tuple/ndarray of predicted values.

useScipy: convert ty, pv to ndarray, and use scipy functions to do the evaluation.

ACC: accuracy.

MSE: mean squared error.

SCC: squared correlation coefficient.
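The three measures can be written out directly. Here is a pure-Python sketch of what evaluations computes (a sketch of the formulas, not the library code; the library's scipy path may differ in floating-point details):

```python
# Sketch of the three evaluation measures: accuracy (classification),
# mean squared error and squared correlation coefficient (regression).
def evaluations_sketch(ty, pv):
    n = len(ty)
    acc = 100.0 * sum(t == p for t, p in zip(ty, pv)) / n
    mse = sum((t - p) ** 2 for t, p in zip(ty, pv)) / n
    mt, mp = sum(ty) / n, sum(pv) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(ty, pv))
    vt = sum((t - mt) ** 2 for t in ty)
    vp = sum((p - mp) ** 2 for p in pv)
    scc = cov * cov / (vt * vp) if vt * vp > 0 else float('nan')
    return acc, mse, scc
```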
- Functions: csr_find_scale_param/csr_scale

Scale data in csr format.

>>> param = csr_find_scale_param(x [, lower=l, upper=u])
>>> x = csr_scale(x, param)

x: a csr_matrix of data.

l: x scaling lower limit; default -1.

u: x scaling upper limit; default 1.

The scaling process is: x * diag(coef) + ones(l, 1) * offset'

param: a dictionary of scaling parameters, where param['coef'] = coef and param['offset'] = offset.

coef: a scipy array of scaling coefficients.

offset: a scipy array of scaling offsets.
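In the formula above, coef and offset implement per-feature min-max scaling into [lower, upper]. A dense pure-Python sketch of the same computation (the real functions operate on a csr_matrix and preserve sparsity; constant columns are handled here with an assumed coefficient of 1.0):

```python
# Sketch of per-feature min-max scaling: coef[j] and offset[j] map
# column j of X linearly into [lower, upper], matching
# x * diag(coef) + ones(l, 1) * offset'.
def find_scale_param_sketch(X, lower=-1.0, upper=1.0):
    coef, offset = [], []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        c = (upper - lower) / (hi - lo) if hi > lo else 1.0
        coef.append(c)
        offset.append(lower - lo * c)
    return {'coef': coef, 'offset': offset}

def scale_sketch(X, param):
    return [[v * c + o for v, c, o in zip(row, param['coef'], param['offset'])]
            for row in X]
```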
Additional Information
======================

This interface was originally written by Hsiang-Fu Yu from the Department of
Computer Science, National Taiwan University. If you find this tool useful,
please cite LIBLINEAR as follows:

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A Library for Large Linear Classification, Journal of
Machine Learning Research 9(2008), 1871-1874. Software available at
http://www.csie.ntu.edu.tw/~cjlin/liblinear
For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>,
or check the FAQ page:

http://www.csie.ntu.edu.tw/~cjlin/liblinear/faq.html