What does the score output by Spark MLlib's SVM mean?



I don't understand the output of the SVM classifier in Spark MLlib. I want to transform the score into a probability, so that I get the probability of a data point belonging to a given class (the class the SVM was trained on, i.e., a one-vs-rest setup for a multiclass problem) (see also this thread). It is not clear what the score means. Is it the distance to the hyperplane? How do I get a probability from it?

2 Answers

import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Run training algorithm to build the model
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)

// Clear the default threshold so predict() returns raw margin scores instead of 0/1 labels.
model.clearThreshold()

// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}

// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()

println("Area under ROC = " + auROC)

// Save and load model
model.save(sc, "myModelPath")
val sameModel = SVMModel.load(sc, "myModelPath")

If you use the SVM module in MLlib as above, it gives you the AUC, i.e., the area under the ROC curve, which serves as an accuracy-style quality measure for the classifier. Hope this helps.
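As a side note, model.clearThreshold() above is what makes predict() return the raw margin score. If you want hard 0/1 labels again, you can set a threshold back on the model. A minimal sketch, reusing the model and test variables from the code above:

// Restore a decision threshold: scores >= 0.0 are predicted as 1.0, the rest as 0.0.
model.setThreshold(0.0)

val hardPredictions = test.map { point =>
  (model.predict(point.features), point.label)
}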



The value is the distance from the separating hyperplane. It is not a probability, and an SVM in general does not give you a probability. However, as noted in @cfh's comment, you can try to learn a probability based on this margin, but that is separate from the SVM itself.
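One common way to learn such a probability is Platt scaling: fit a one-dimensional logistic regression that maps the raw SVM margin to P(label = 1). Below is a minimal sketch reusing the model and test RDD from the first answer; in practice you would calibrate on a separate held-out split rather than the test set, and the variable and function names here are only illustrative.

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint

// Use the raw SVM margin as the single input feature of a logistic regression.
val calibrationData = test.map { point =>
  LabeledPoint(point.label, Vectors.dense(model.predict(point.features)))
}.cache()

val calibrator = new LogisticRegressionWithLBFGS().run(calibrationData)
calibrator.clearThreshold() // predict() now returns P(label = 1) instead of 0/1

// Probability that a point belongs to the positive class.
def positiveProbability(features: Vector): Double =
  calibrator.predict(Vectors.dense(model.predict(features)))

The learned sigmoid only re-expresses the margin as a probability; it does not change the SVM's decision boundary.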

