Linear Classifier - Tumor Prediction
Source: Internet  Editor: 程序博客网  Date: 2024/06/02 22:37
Tumor prediction data: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
```python
# coding=utf-8
import pandas as pd
import numpy as np
# -------------
# use train_test_split to split the data
# (sklearn.cross_validation was removed in scikit-learn 0.20; it now lives in model_selection)
from sklearn.model_selection import train_test_split
# -------------
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import SGDClassifier
# -------------
from sklearn.metrics import classification_report

# ------------- download data
# create the feature list
column_names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
                'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
# use the pandas.read_csv function to read the data from the internet
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names=column_names)
# replace '?' with the standard missing-value representation
data = data.replace(to_replace='?', value=np.nan)
# drop every row that has a missing value in one or more dimensions
data = data.dropna(how='any')
# output the number of samples and dimensions of the data
print(data.shape)

# ------------- prepare training and testing data
# randomly select 25% of the data for testing and 75% for training
X_train, X_test, y_train, y_test = train_test_split(
    data[column_names[1:10]], data[column_names[10]], test_size=0.25, random_state=33)
# check the class counts in the training data
print(y_train.value_counts())
# check the class counts in the testing data
print(y_test.value_counts())

# ------------- use linear classification models to make predictions
# standardize the data so that every dimension has mean 0 and variance 1;
# this keeps the result from being dominated by a dimension with large feature values
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)
# initialize LogisticRegression and SGDClassifier
lr = LogisticRegression()
sgdc = SGDClassifier()
# call fit on LogisticRegression to train the model parameters
lr.fit(X_train, y_train)
# use the trained model lr to predict on X_test and store the result in lr_y_predict
lr_y_predict = lr.predict(X_test)
# call fit on SGDClassifier to train the model parameters
sgdc.fit(X_train, y_train)
# use the trained model sgdc to predict on X_test and store the result in sgdc_y_predict
sgdc_y_predict = sgdc.predict(X_test)

# ------------- performance analysis
# use the score function provided by the LR model to get the accuracy
print('Accuracy of LR Classifier:', lr.score(X_test, y_test))
# get the other three metrics (precision, recall, f1-score)
print(classification_report(y_test, lr_y_predict, target_names=['Benign', 'Malignant']))
# use the score function provided by the SGD model to get the accuracy
print('Accuracy of SGD Classifier:', sgdc.score(X_test, y_test))
# get the other three metrics (precision, recall, f1-score)
print(classification_report(y_test, sgdc_y_predict, target_names=['Benign', 'Malignant']))
```

Result:
```
(683, 11)
2    344
4    168
Name: Class, dtype: int64
2    100
4     71
Name: Class, dtype: int64
Accuracy of LR Classifier: 0.988304093567
             precision    recall  f1-score   support

     Benign       0.99      0.99      0.99       100
  Malignant       0.99      0.99      0.99        71

avg / total       0.99      0.99      0.99       171

Accuracy of SGD Classifier: 0.982456140351
             precision    recall  f1-score   support

     Benign       1.00      0.97      0.98       100
  Malignant       0.96      1.00      0.98        71

avg / total       0.98      0.98      0.98       171
```
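To make the report columns concrete, here is a minimal sketch of how precision, recall and f1-score follow from per-class counts. The counts are an assumption: they are inferred from the rounded SGD report above (97 of 100 benign and 71 of 71 malignant samples classified correctly), not copied from the original run, and the helper `precision_recall_f1` is a hypothetical name used only for illustration.

```python
# Counts inferred from the rounded SGD report (an assumption, see lead-in).
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

benign_correct, benign_total = 97, 100
malignant_correct, malignant_total = 71, 71

benign_missed = benign_total - benign_correct           # benign predicted as malignant
malignant_missed = malignant_total - malignant_correct  # malignant predicted as benign

# A false positive for one class is a missed sample of the other class.
benign = precision_recall_f1(benign_correct, malignant_missed, benign_missed)
malignant = precision_recall_f1(malignant_correct, benign_missed, malignant_missed)
accuracy = (benign_correct + malignant_correct) / (benign_total + malignant_total)

print("Benign    P/R/F1: %.2f %.2f %.2f" % benign)
print("Malignant P/R/F1: %.2f %.2f %.2f" % malignant)
print("Accuracy: %.4f" % accuracy)
```

Under these assumed counts the three benign samples flagged as malignant lower benign recall and malignant precision, while malignant recall stays at 1.00; for tumor screening, malignant recall is usually the number to watch.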
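LogisticRegression vs. SGDClassifier: the former solves for its parameters precisely over the full training set, which takes longer but gives slightly better model performance (accuracy 0.988 in the results above); the latter estimates the parameters with stochastic gradient descent, which is faster but gives slightly lower performance (accuracy 0.982). In general, once the training data reaches the order of 100,000 samples or more, the time cost makes SGD the recommended way to estimate the model parameters.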
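The trade-off can be sketched with a hand-written logistic regression in NumPy. This is an illustration under stated assumptions, not scikit-learn's internal implementation: the data is synthetic (two Gaussian blobs), and `fit_batch`, `fit_sgd`, the learning rates and the iteration counts are hypothetical choices. The batch version reads every sample for each update, while the SGD version updates after each single sample, which is why it scales better to large training sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: two Gaussian blobs standing in for two classes.
n = 500
X = np.vstack([rng.normal(-2.0, 1.0, size=(n, 2)),
               rng.normal(+2.0, 1.0, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# 75% / 25% split, then standardize with statistics from the training part only.
idx = rng.permutation(len(y))
train, test = idx[:750], idx[750:]
mean, std = X[train].mean(axis=0), X[train].std(axis=0)
X_train_s, X_test_s = (X[train] - mean) / std, (X[test] - mean) / std
y_train, y_test = y[train], y[test]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Append a constant column so the last weight acts as the intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_batch(X, y, lr=0.5, n_iter=200):
    """Full-batch gradient descent: every update reads all samples."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w

def fit_sgd(X, y, lr=0.05, n_epochs=5, seed=0):
    """Stochastic gradient descent: every update reads a single sample."""
    order_rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        for i in order_rng.permutation(len(y)):
            w -= lr * (sigmoid(Xb[i] @ w) - y[i]) * Xb[i]
    return w

def accuracy(w, X, y):
    """Fraction of samples whose predicted class matches the label."""
    return float(np.mean((add_bias(X) @ w > 0) == (y == 1)))

acc_batch = accuracy(fit_batch(X_train_s, y_train), X_test_s, y_test)
acc_sgd = accuracy(fit_sgd(X_train_s, y_train), X_test_s, y_test)
print("batch gradient descent accuracy:", acc_batch)
print("stochastic gradient descent accuracy:", acc_sgd)
```

On data this easy both variants should land close together; the gap described above shows up mainly in how noisy the SGD estimate is and in how much cheaper each of its updates becomes as the sample count grows.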