LIBLINEAR is a simple package for solving large-scale regularized linear
classification and regression. It currently supports
- L2-regularized logistic regression/L2-loss support vector classification/L1-loss support vector classification
- L1-regularized L2-loss support vector classification/L1-regularized logistic regression
- L2-regularized L2-loss support vector regression/L1-loss support vector regression.
This document explains the usage of LIBLINEAR.
To get started, please read the ``Quick Start'' section first.
There are some large datasets for which training with and without
nonlinear mappings gives similar performance. Without using kernels, one can
efficiently train a much larger set via linear classification/regression.
These data usually have a large number of features. Document classification
is an example.
where f is the primal function and pos/neg are # of
positive/negative data (default 0.01)
-s 11
|f'(w)|_2 <= eps*|f'(w0)|_2 (default 0.001)
-s 1, 3, 4 and 7
Dual maximal violation <= eps; similar to libsvm (default 0.1)
-s 5 and 6
The primal-dual relationship implies that -s 1 and -s 2 give the same
model, -s 0 and -s 7 give the same, and -s 11 and -s 12 give the same.
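For example, since -s 0 solves the primal and -s 7 the dual form of the
same L2-regularized logistic regression problem, the following two
commands should produce essentially the same model (up to the stopping
tolerance):

> train -s 0 data_file
> train -s 7 data_file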
We implement 1-vs-the rest multi-class strategy for classification.
In training i vs. non_i, their C parameters are (weight from -wi)*C
and C, respectively. If there are only two classes, we train only one
model. Thus weight1*C vs. weight2*C is used. See examples below.
> train -C data_file
Conduct cross validation many times by L2-loss SVM
and find the parameter C which achieves the best cross
validation accuracy.
> train -C -s 0 -v 3 -c 0.5 -e 0.0001 data_file
For parameter selection by -C, users can specify other
solvers (currently -s 0 and -s 2 are supported) and
a different number of CV folds. Further, users can use
the -c option to specify the smallest C value of the
search range. This setting is useful when users want
to rerun the parameter selection procedure from a
specified C under a different setting, such as a stricter
stopping tolerance -e 0.0001 in the above example.
> train -c 10 -w1 2 -w2 5 -w3 2 four_class_data_file
- Function: model* train(const struct problem *prob,
const struct parameter *param);
This function constructs and returns a linear classification
or regression model according to the given training data and
parameters.
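A minimal usage sketch, assuming the functions declared in linear.h;
the file name "example.model" is only a placeholder:

    #include <stdio.h>
    #include "linear.h"

    /* Typical training call: validate parameters, train, save, and free. */
    void train_and_save(const struct problem *prob, const struct parameter *param)
    {
        const char *error_msg = check_parameter(prob, param);
        if(error_msg)
        {
            fprintf(stderr, "ERROR: %s\n", error_msg);
            return;
        }

        struct model *model_ = train(prob, param);
        save_model("example.model", model_);   /* placeholder file name */
        free_and_destroy_model(&model_);
    }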
struct problem describes the problem:

    struct problem
    {
        int l, n;
        double *y;
        struct feature_node **x;
        double bias;
    };

where `l' is the number of training data. If bias >= 0, we assume
that one additional feature is added to the end of each data
instance. `n' is the number of features (including the bias feature
if bias >= 0). `y' is an array containing the target values (integers
in classification, real numbers in regression), and `x' is an array
of pointers, each of which points to a sparse representation (array
of feature_node) of one training vector.
For example, if we have the following training data:
[ ] -> (2,0.1) (4,1.4) (5,0.5) (6,1) (-1,?)
[ ] -> (1,-0.1) (2,-0.2) (3,0.1) (4,1.1) (5,0.1) (6,1) (-1,?)
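A small sketch of how such a sparse instance can be set up in C, assuming
the feature_node and problem declarations from linear.h; the target value 2
used below is only an illustration:

    #include "linear.h"

    /* Sparse representation of (2,0.1) (4,1.4) (5,0.5) (6,1) (-1,?):
       index 6 is the bias feature and index -1 terminates the array. */
    static struct feature_node row[] =
    {
        {2, 0.1}, {4, 1.4}, {5, 0.5}, {6, 1.0}, {-1, 0.0}
    };

    static double y[1] = {2};                   /* illustrative target value */
    static struct feature_node *x[1] = {row};

    struct problem prob = {1, 6, y, x, 1};      /* l = 1, n = 6, bias = 1 */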
struct parameter describes the parameters of a linear classification
or regression model:
struct parameter
L2R_L1LOSS_SVR_DUAL L2-regularized L1-loss support vector regression (dual)
C is the cost of constraints violation.
p is the epsilon in the epsilon-insensitive loss function of support vector regression.
eps is the stopping criterion.
nr_weight, weight_label, and weight are used to change the penalty
for some classes (if the weight for a class is not changed, it is set to 1).
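A minimal sketch of filling in a parameter struct for -s 0 (L2-regularized
logistic regression), assuming the fields declared in linear.h; the
function name init_param is hypothetical and the values mirror the
defaults described in this document:

    #include <stddef.h>
    #include "linear.h"

    /* Set up parameters for L2-regularized logistic regression (-s 0). */
    void init_param(struct parameter *param)
    {
        param->solver_type = L2R_LR;
        param->C = 1;              /* cost of constraint violation */
        param->eps = 0.01;         /* stopping tolerance */
        param->p = 0.1;            /* epsilon of the SVR loss; unused by L2R_LR */
        param->nr_weight = 0;      /* keep all class penalties at C */
        param->weight_label = NULL;
        param->weight = NULL;
        param->init_sol = NULL;    /* no user-supplied initial solution */
    }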
param describes the parameters used to obtain the model.
nr_class and nr_feature are the number of classes and features,
respectively. nr_class = 2 for regression.
The nr_feature*nr_class array w gives feature weights. We use one
against the rest for multi-class classification, so each feature
index corresponds to nr_class weight values.
The format of prob is the same as that for train().
- Function: void find_parameter_C(const struct problem *prob,
const struct parameter *param, int nr_fold, double start_C,
double max_C, double *best_C, double *best_rate);
This function is similar to cross_validation. However, instead of
conducting cross validation under a specified parameter C, it
conducts cross validation many times under parameters C = start_C,
2*start_C, 4*start_C, 8*start_C, ..., and finds the best one with
the highest cross validation accuracy.

If start_C <= 0, then this procedure calculates a small enough C
for prob as the start_C. The procedure stops when the models of
all folds become stable or C reaches max_C. The best C and the
corresponding accuracy are assigned to *best_C and *best_rate,
respectively.
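A minimal usage sketch, assuming prob and param are already set up; the
function name select_best_C, the 5 folds, and the upper bound 1024 for C
are only illustrative choices:

    #include <stdio.h>
    #include "linear.h"

    /* Search for the best C with 5-fold cross validation. */
    void select_best_C(const struct problem *prob, struct parameter *param)
    {
        double best_C, best_rate;

        /* start_C <= 0 lets the procedure choose a small enough starting C */
        find_parameter_C(prob, param, 5, -1.0, 1024.0, &best_C, &best_rate);
        printf("best C = %g, CV accuracy = %g%%\n", best_C, 100.0*best_rate);

        param->C = best_C;   /* then train the final model with this C */
    }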
For a classification model, the predicted class for x is returned.
For a regression model, the function value of x calculated using
the model is returned.
- Function: double predict_values(const struct model *model_,
const struct feature_node *x, double* dec_values);
This function gives nr_w decision values in the array dec_values.
nr_w=1 if regression is applied or the number of classes is two. An
exception is multi-class SVM by Crammer and Singer (-s 4), where nr_w = 2
if there are two classes. For all other situations, nr_w is the
number of classes.
We implement one-vs-the rest multi-class strategy (-s 0,1,2,3,5,6,7)
and multi-class SVM by Crammer and Singer (-s 4) for multi-class SVM.
The class with the highest decision value is returned.
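A sketch of calling predict_values and reading the decision values,
assuming model_ came from train() or load_model(); the function name
show_decision_values is hypothetical:

    #include <stdio.h>
    #include <stdlib.h>
    #include "linear.h"

    void show_decision_values(const struct model *model_, const struct feature_node *x)
    {
        int nr_class = get_nr_class(model_);
        /* nr_w as described above: 1 for regression or two classes,
           except for -s 4 (MCSVM_CS), where it equals nr_class. */
        int nr_w = (nr_class == 2 && model_->param.solver_type != MCSVM_CS) ? 1 : nr_class;
        double *dec_values = malloc(sizeof(double)*nr_w);
        double predicted = predict_values(model_, x, dec_values);
        int i;

        printf("predicted label/value: %g\n", predicted);
        for(i = 0; i < nr_w; i++)
            printf("dec_values[%d] = %g\n", i, dec_values[i]);
        free(dec_values);
    }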
- Function: void set_print_string_function(void (*print_func)(const char *));
Users can specify their output format by a function. Use
set_print_string_function(NULL);
for default printing to stdout.
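For example, a sketch of silencing the training output; the name
print_null is only illustrative:

    #include "linear.h"

    /* A do-nothing print function to suppress LIBLINEAR's messages. */
    static void print_null(const char *s) { (void)s; }

    void quiet_mode(void)
    {
        set_print_string_function(&print_null);  /* suppress output */
        /* ... training calls ... */
        set_print_string_function(NULL);         /* restore default stdout printing */
    }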
Building Windows Binaries
=========================
g[i] = w[i] + 2*g[i];
}
// A coordinate descent algorithm for
// multi-class support vector machines by Crammer and Singer
//
// min_{\alpha} 0.5 \sum_m ||w_m(\alpha)||^2 + \sum_i \sum_m e^m_i alpha^m_i
// s.t. \alpha^m_i <= C^m_i \forall m,i , \sum_m \alpha^m_i=0 \forall i
//
// where e^m_i = 0 if y_i = m,
// e^m_i = 1 if y_i != m,
// C^m_i = C if m = y_i,
// C^m_i = 0 if m != y_i,
// and w_m(\alpha) = \sum_i \alpha^m_i x_i
//
// Given:
// x, y, C
// eps is the stopping tolerance
//
double eps_shrink = max(10.0*eps, 1.0); // stopping tolerance for shrinking
bool start_from_all = true;
// Initial alpha can be set here. Note that
// sum_m alpha[i*nr_class+m] = 0, for all i=1,...,l-1
// alpha[i*nr_class+m] <= C[GETI(i)] if prob->y[i] == m
// alpha[i*nr_class+m] <= 0 if prob->y[i] != m
delete [] active_size_i;
}
// A coordinate descent algorithm for
// L1-loss and L2-loss SVM dual problems
//
// min_\alpha 0.5(\alpha^T (Q + D)\alpha) - e^T \alpha,
// s.t. 0 <= \alpha_i <= upper_bound_i,
//
// where Qij = yi yj xi^T xj and
// D is a diagonal matrix
//
// In L1-SVM case:
// 	upper_bound_i = Cp if y_i = 1
// 	upper_bound_i = Cn if y_i = -1
// 	D_ii = 0
// In L2-SVM case:
// 	upper_bound_i = INF
// 	D_ii = 1/(2*Cp) if y_i = 1
// 	D_ii = 1/(2*Cn) if y_i = -1
//
// Given:
// x, y, Cp, Cn
// eps is the stopping tolerance
//
// solution will be put in w
//
// See Algorithm 3 of Hsieh et al., ICML 2008
#undef GETI
}
// A coordinate descent algorithm for
// L1-loss and L2-loss epsilon-SVR dual problem
//
// min_\beta 0.5\beta^T (Q + diag(lambda)) \beta - p \sum_{i=1}^l|\beta_i| + \sum_{i=1}^l yi\beta_i,
// s.t. -upper_bound_i <= \beta_i <= upper_bound_i,
//
// where Qij = xi^T xj and
// D is a diagonal matrix
//
// In L1-SVM case:
// 	upper_bound_i = C
// 	lambda_i = 0
// In L2-SVM case:
// 	upper_bound_i = INF
// 	lambda_i = 1/(2*C)
//
// Given:
// x, y, p, C
// eps is the stopping tolerance
//
// solution will be put in w
//
// See Algorithm 4 of Ho and Lin, 2012
#undef GETI
#define GETI(i) (0)
}
// A coordinate descent algorithm for
// the dual of L2-regularized logistic regression problems
//
// min_\alpha 0.5(\alpha^T Q \alpha) + \sum \alpha_i log (\alpha_i) + (upper_bound_i - \alpha_i) log (upper_bound_i - \alpha_i),
// s.t. 0 <= \alpha_i <= upper_bound_i,
//
// where Qij = yi yj xi^T xj and
// upper_bound_i = Cp if y_i = 1
// upper_bound_i = Cn if y_i = -1
//
// Given:
// x, y, Cp, Cn
// eps is the stopping tolerance
//
delete [] index;
}
// A coordinate descent algorithm for
// L1-regularized L2-loss support vector classification
//
// min_w \sum |wj| + C \sum max(0, 1-yi w^T xi)^2,
//
// Given:
// x, y, Cp, Cn
// eps is the stopping tolerance
//
delete [] xj_sq;
}
// A coordinate descent algorithm for
// L1-regularized logistic regression problems
//
// min_w \sum |wj| + C \sum log(1+exp(-yi w^T xi)),
//
// Given:
// x, y, Cp, Cn
// eps is the stopping tolerance
//
}
//
// Labels are ordered by their first occurrence in the training set.
// However, for two-class sets with -1/+1 labels and -1 appears first,
// we swap labels to ensure that internally the binary SVM has positive data corresponding to the +1 instances.
//
if (nr_class == 2 && label[0] == -1 && label[1] == 1)
param1.C = param1.C*ratio;
}
if(param1.C > max_C && max_C > start_C)
info("warning: maximum C reached.\n");
free(fold_start);
free(perm);
}
// use inline here for better performance (around 20% faster than the non-inline one)
static inline double get_w_value(const struct model *model_, int idx, int label_idx)
{
int nr_class = model_->nr_class;
int solver_type = model_->param.solver_type;
return 0;
if(check_regression_model(model_))
return w[idx];
else
{
if(label_idx < 0 || label_idx >= nr_class)
return 0;
&& param->solver_type != L2R_L1LOSS_SVR_DUAL)
return "unknown solver type";
if(param->init_sol != NULL
&& param->solver_type != L2R_LR && param->solver_type != L2R_L2LOSS_SVC)
return "Initial-solution specification supported only for solver L2R_LR and L2R_L2LOSS_SVC";
nr_class otherwise.
If the '-v' option is specified, cross validation is conducted and the
returned model is just a scalar: cross-validation accuracy for
classification and mean-squared error for regression. If the '-C' option
is specified, the best parameter C is found by cross validation. The
returned model is a two dimensional vector, where the first value is
the best C and the second value is the corresponding cross-validation
accuracy. The parameter selection utility is supported only by -s 0
and -s 2.
Other Utilities
===============
A matlab function libsvmread reads files in LIBSVM format:
[label_vector, instance_matrix] = libsvmread('data.txt');
Two outputs are labels and instances, which can then be used as inputs
of svmtrain or svmpredict.
A matlab function libsvmwrite writes a Matlab matrix to a file in LIBSVM format:
libsvmwrite('data.txt', label_vector, instance_matrix)
The instance_matrix must be a sparse matrix (type must be double).
For windows, `libsvmread.mexw64' and `libsvmwrite.mexw64' are ready in
the directory `..\windows'.
These codes are prepared by Rong-En Fan and Kai-Wei Chang from National
Taiwan University.
Use the best parameter to train (only supported by -s 0 and -s 2):
matlab> best = train(heart_scale_label, heart_scale_inst, '-C -s 0');
matlab> model = train(heart_scale_label, heart_scale_inst, sprintf('-c %f -s 0', best(1))); % use the same solver: -s 0
Additional Information
======================