Original title: An SVM (Support Vector Machine) Example
Source: 博客园 (Cnblogs). Original link: https://www.cnblogs.com/weijiazheng/p/10903863.html
Regression Analysis of Boston Housing Prices
1. Import the Boston housing dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston

boston = load_boston()
print(boston.keys())
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
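Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the import above fails on newer versions. As a minimal sketch, a bundled regression dataset such as `load_diabetes` can stand in for the same workflow (the variable names `X_alt`/`y_alt` are illustrative):

```python
from sklearn.datasets import load_diabetes

# load_diabetes ships with scikit-learn, so no download is needed;
# it returns the same Bunch structure (data, target, feature_names, ...)
diabetes = load_diabetes()
X_alt, y_alt = diabetes.data, diabetes.target
print(X_alt.shape)  # (442, 10): 442 samples, 10 features
print(y_alt.shape)  # (442,)
```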
2. Build the model with SVR
from sklearn.model_selection import train_test_split

X, y = boston.data, boston.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)
print('\n\n\n')
print('Code output')
print('====================================\n')
print(X_train.shape)
print(X_test.shape)
print('\n====================================')
print('\n\n\n')

Code output
====================================
(379, 13)
(127, 13)
====================================
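With no explicit `test_size`, `train_test_split` holds out 25% of the samples by default, which is why the 506 rows split into 379 for training and 127 for testing above. A small sketch on made-up data shows the same behavior:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(40).reshape(20, 2)  # 20 samples, 2 features
y_demo = np.arange(20)

# random_state fixes the shuffle, so the split is reproducible
a_train, a_test, b_train, b_test = train_test_split(X_demo, y_demo, random_state=8)
print(a_train.shape, a_test.shape)  # (15, 2) (5, 2) -- default test_size=0.25
```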
from sklearn.svm import SVR

for kernel in ['linear', 'rbf']:
    svr = SVR(kernel=kernel, gamma='auto')
    svr.fit(X_train, y_train)
    print(kernel, 'kernel, training set score: {:.3f}'.format(svr.score(X_train, y_train)))
    print(kernel, 'kernel, test set score: {:.3f}'.format(svr.score(X_test, y_test)))

linear kernel, training set score: 0.709
linear kernel, test set score: 0.696
rbf kernel, training set score: 0.145
rbf kernel, test set score: 0.001
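The near-zero RBF score has a concrete cause: the RBF kernel computes exp(-gamma * ||x - z||^2), so when one raw feature spans hundreds of units it dominates the squared distance and kernel values collapse toward 0. A sketch with made-up feature values (not actual Boston data) illustrates this:

```python
import numpy as np

gamma = 0.1

# Raw scale: one small feature, one large feature (values are illustrative)
x_raw = np.array([0.02, 300.0])
z_raw = np.array([0.05, 450.0])
k_raw = np.exp(-gamma * np.sum((x_raw - z_raw) ** 2))
print(k_raw)  # effectively 0: exp(-0.1 * 150**2) underflows

# After standardizing, both features contribute comparably and the
# kernel yields a usable similarity between 0 and 1
x_std = np.array([-0.5, -0.8])
z_std = np.array([0.3, 0.9])
k_std = np.exp(-gamma * np.sum((x_std - z_std) ** 2))
print(k_std)
```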
Plot the minimum and maximum of each feature to see how widely the raw magnitudes vary:

plt.plot(X.min(axis=0), 'v', label='min')
plt.plot(X.max(axis=0), '^', label='max')
plt.yscale('log')
plt.legend(loc='best')
plt.xlabel('features')
plt.ylabel('feature magnitude')
plt.show()

The plot shows that the feature magnitudes span several orders of magnitude, which is why the RBF kernel performs so poorly on the raw data.
3. Preprocess the data with StandardScaler
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

plt.plot(X_train_scaled.min(axis=0), 'v', label='train set min')
plt.plot(X_train_scaled.max(axis=0), '^', label='train set max')
plt.plot(X_test_scaled.min(axis=0), 'v', label='test set min')
plt.plot(X_test_scaled.max(axis=0), '^', label='test set max')
plt.legend(loc='best')
plt.xlabel('scaled features')
plt.ylabel('scaled feature magnitude')
plt.show()
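Under the hood, StandardScaler learns the mean and standard deviation of the training data only, then applies (x - mean) / std to whatever set it transforms; fitting on the training set and reusing that scaler for the test set avoids leaking test statistics. A quick check on synthetic data confirms the formula:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
train_demo = rng.uniform(0, 100, size=(50, 3))
test_demo = rng.uniform(0, 100, size=(10, 3))

# Fit on the training data only; transform both sets with those statistics
sc = StandardScaler().fit(train_demo)

# StandardScaler uses the population std (ddof=0), matching np.std's default
manual = (test_demo - train_demo.mean(axis=0)) / train_demo.std(axis=0)
print(np.allclose(sc.transform(test_demo), manual))  # True
```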
4. Retrain the models after preprocessing
for kernel in ['linear', 'rbf']:
    svr = SVR(kernel=kernel)
    svr.fit(X_train_scaled, y_train)
    print('After preprocessing,', kernel, 'kernel training set score: {:.3f}'.format(svr.score(X_train_scaled, y_train)))
    print('After preprocessing,', kernel, 'kernel test set score: {:.3f}'.format(svr.score(X_test_scaled, y_test)))

After preprocessing, linear kernel training set score: 0.706
After preprocessing, linear kernel test set score: 0.698
After preprocessing, rbf kernel training set score: 0.665
After preprocessing, rbf kernel test set score: 0.695
svr = SVR(C=100, gamma=0.1)
svr.fit(X_train_scaled, y_train)
print('Tuned "rbf"-kernel SVR training set score: {:.3f}'.format(svr.score(X_train_scaled, y_train)))
print('Tuned "rbf"-kernel SVR test set score: {:.3f}'.format(svr.score(X_test_scaled, y_test)))

Tuned "rbf"-kernel SVR training set score: 0.966
Tuned "rbf"-kernel SVR test set score: 0.894
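Rather than picking C=100 and gamma=0.1 by hand, the search can be automated with GridSearchCV, which cross-validates every combination in a parameter grid. A minimal sketch on synthetic regression data (the dataset and grid values below are illustrative, not the article's actual tuning procedure):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the scaled housing data
X_grid, y_grid = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=8)
X_grid = StandardScaler().fit_transform(X_grid)

# 3 x 3 grid of C and gamma, each combination scored with 5-fold CV
param_grid = {'C': [1, 10, 100], 'gamma': [0.01, 0.1, 1]}
search = GridSearchCV(SVR(kernel='rbf'), param_grid, cv=5)
search.fit(X_grid, y_grid)
print(search.best_params_)
```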
Summary:
By preprocessing the data and tuning the parameters, we raised the test-set score of the "rbf"-kernel SVR model from 0.001 to 0.894.
This shows that the SVM algorithm is highly sensitive to data preprocessing and parameter settings.
Adapted from the book 《深入浅出python机器学习》.