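This guide walks through measuring classifier performance in Python by plotting precision-recall (P-R) curves on the iris dataset.
First, import the required packages: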
import matplotlib.pyplot as plt
import numpy as np
from itertools import cycle
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
from sklearn import datasets
from sklearn.preprocessing import label_binarize
Next, load the iris dataset:
iris = datasets.load_iris()
X = iris.data
y = iris.target
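A P-R curve is defined for a binary problem, so the three-class labels must be binarized into one indicator column per class. The label_binarize function does this: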
y_bin = label_binarize(y, classes=[0, 1, 2])
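To make the binarized layout concrete, here is a small sketch on a hand-made label list. The name `toy_labels` is purely illustrative and not part of the walkthrough.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical toy labels, used only to illustrate the output layout.
toy_labels = [0, 1, 2, 1]
toy_bin = label_binarize(toy_labels, classes=[0, 1, 2])

# Each row holds a single 1, in the column of that sample's class.
print(toy_bin)
print(toy_bin.shape)  # one row per sample, one column per class
```

Column `i` of the result can then be treated as the ground truth of a separate "class `i` versus the rest" binary problem.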
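Next, split the data into a training set and a test set. This is a single hold-out split rather than cross-validation, and it is done with the train_test_split function:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y_bin, test_size=.5,
                                                    random_state=0)
Note: test_size=.5 splits the samples evenly, 50% for training and 50% for testing.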
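A random split does not guarantee that each class is equally represented in both halves. As an optional variation, the sketch below assumes you want to preserve the class mix and passes the integer labels through the `stratify` parameter of `train_test_split`; the `test_counts` name is illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

iris = datasets.load_iris()
X, y = iris.data, iris.target
y_bin = label_binarize(y, classes=[0, 1, 2])

# Assumption: stratify on the integer labels so both halves keep the class mix.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_bin, test_size=0.5, random_state=0, stratify=y
)

# Column sums of the indicator matrix are the per-class sample counts.
test_counts = y_test.sum(axis=0)
print(X_train.shape, X_test.shape)
print(test_counts)
```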
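This example uses scikit-learn's RandomForestClassifier. Because the labels are a binarized matrix, the model is wrapped in OneVsRestClassifier so that predict_proba returns a single array with one column per class. Import the model and fit it on the training set:
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
# Fit one binary random forest per class on the training data
classifier = OneVsRestClassifier(RandomForestClassifier(random_state=0))
classifier.fit(X_train, y_train)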
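The shape of the `predict_proba` output depends on how the labels were passed to `fit`, which matters for the column indexing used later. The sketch below contrasts a forest fitted directly on the binarized matrix with one wrapped in `OneVsRestClassifier`. Variable names and the small `n_estimators` value are illustrative choices.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

iris = datasets.load_iris()
X = iris.data
y_bin = label_binarize(iris.target, classes=[0, 1, 2])

# Fitted directly on a 2-D indicator matrix, the forest treats the task as
# multi-output, and predict_proba returns a list with one array per column.
direct = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y_bin)
direct_proba = direct.predict_proba(X)
print(type(direct_proba), len(direct_proba))

# Wrapped in OneVsRestClassifier, predict_proba returns a single array
# of shape (n_samples, n_classes) that can be sliced as y_score[:, i].
ovr = OneVsRestClassifier(
    RandomForestClassifier(n_estimators=10, random_state=0)
).fit(X, y_bin)
ovr_proba = ovr.predict_proba(X)
print(ovr_proba.shape)
```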
A P-R curve plots precision against recall as the decision threshold varies. The precision_recall_curve function computes these values for each class:
y_score = classifier.predict_proba(X_test)
precision = dict()
recall = dict()
n_classes = 3
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i],
                                                        y_score[:, i])
    plt.plot(recall[i], precision[i], lw=2, label='class {}'.format(i))
plt.xlabel("recall")
plt.ylabel("precision")
plt.legend(loc="best")
plt.title("Precision-Recall Curve")
plt.show()
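Here, predict_proba returns the predicted class probabilities for the test set. The loop then goes through each class, computes its precision and recall values across thresholds, and plots the resulting P-R curve.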
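Each point on a P-R curve is the precision and recall obtained at one decision threshold. As a minimal sketch of that idea, the code below computes both values by hand on hypothetical toy data; the helper `precision_recall_at` is illustrative and not a scikit-learn function.

```python
import numpy as np

# Hypothetical toy data: 1 marks a positive sample.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])


def precision_recall_at(y_true, scores, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    predicted = scores >= threshold
    actual = y_true == 1
    tp = np.sum(predicted & actual)   # true positives
    fp = np.sum(predicted & ~actual)  # false positives
    fn = np.sum(~predicted & actual)  # false negatives
    return tp / (tp + fp), tp / (tp + fn)


for threshold in (0.35, 0.4, 0.8):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, raising the threshold from 0.35 to 0.8 lowers recall from 1.0 to 0.5 while precision rises from about 0.67 to 1.0, which is the trade-off the curve visualizes.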
To summarize each curve in a single number, compute the average precision (AP). The average_precision_score function does this:
# Compute average precision score
avg_precision = dict()
for i in range(n_classes):
    avg_precision[i] = average_precision_score(y_test[:, i],
                                                y_score[:, i])
    print('Average precision-recall score for class {}: {:.2f}'.format(
        i, avg_precision[i]))
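scikit-learn documents average precision as the sum of (R_n - R_{n-1}) * P_n over thresholds, that is, precision weighted by each increase in recall. The sketch below checks this by hand on hypothetical toy data with no tied scores, where the sum reduces to the mean of the precision values measured at each positive sample in the ranking.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical toy data: 1 marks a positive sample.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Rank samples from highest to lowest score.
order = np.argsort(-scores)
ranked_truth = y_true[order]

# Precision after each ranked sample: positives so far / samples so far.
precision_at_k = np.cumsum(ranked_truth) / np.arange(1, len(ranked_truth) + 1)

# With no tied scores, AP is the mean of precision_at_k at the positives.
manual_ap = precision_at_k[ranked_truth == 1].mean()
sklearn_ap = average_precision_score(y_true, scores)
print(manual_ap, sklearn_ap)
```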
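Below are two variations on the example.
In the first variation, test_size is set to 0.3, so 30% of the samples are held out for testing: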
from sklearn.multiclass import OneVsRestClassifier
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_bin, test_size=.3,
                                                    random_state=0)
# Wrap the forest so predict_proba returns one column per class
classifier = OneVsRestClassifier(RandomForestClassifier(random_state=0))
classifier.fit(X_train, y_train)
y_score = classifier.predict_proba(X_test)
# Plot one P-R curve per class
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i],
                                                        y_score[:, i])
    plt.plot(recall[i], precision[i], lw=2, label='class {}'.format(i))
plt.xlabel("recall")
plt.ylabel("precision")
plt.legend(loc="best")
plt.title("Precision-Recall Curve")
plt.show()
# Compute the average precision for each class
avg_precision = dict()
for i in range(n_classes):
    avg_precision[i] = average_precision_score(y_test[:, i],
                                                y_score[:, i])
    print('Average precision-recall score for class {}: {:.2f}'.format(
        i, avg_precision[i]))
In the second variation, the classifier is an SVC model:
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y_bin, test_size=.5,
                                                    random_state=0)
# SVC does not accept a 2-D label matrix, so fit one binary SVC per class
classifier = OneVsRestClassifier(SVC(probability=True))
classifier.fit(X_train, y_train)
y_score = classifier.predict_proba(X_test)
# Plot one P-R curve per class
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i],
                                                        y_score[:, i])
    plt.plot(recall[i], precision[i], lw=2, label='class {}'.format(i))
plt.xlabel("recall")
plt.ylabel("precision")
plt.legend(loc="best")
plt.title("Precision-Recall Curve")
plt.show()
# Compute the average precision for each class
avg_precision = dict()
for i in range(n_classes):
    avg_precision[i] = average_precision_score(y_test[:, i],
                                                y_score[:, i])
    print('Average precision-recall score for class {}: {:.2f}'.format(
        i, avg_precision[i]))
This variation passes probability=True to SVC so that predict_proba is available and returns predicted probabilities.
That completes the guide to measuring classifier performance in Python by plotting P-R curves on the iris dataset. We hope it helps.