Using PCA and LDA together in a machine learning pipeline

Bea · Updated: 2024-09-21


'''
Using PCA and LDA together in a machine learning pipeline
'''
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
# PCA and LDA, the two transformers compared in this script
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# import the Iris dataset from scikit-learn
from sklearn.datasets import load_iris
# load the Iris dataset
iris = load_iris()
# create X and y variables to hold the features and the response column
iris_X, iris_y = iris.data, iris.target
# Create a PCA module to keep a single component
single_pca = PCA(n_components=1)
# Create an LDA module to keep a single component
single_lda = LinearDiscriminantAnalysis(n_components=1)
# Instantiate a KNN model
knn = KNeighborsClassifier(n_neighbors=3)
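# run a cross validation on the KNN without any feature transformation
knn_average = cross_val_score(knn, iris_X, iris_y).mean()
# This is the baseline: KNN on the raw features (about 98% accuracy in the original run;
# the exact figure depends on the default number of CV folds in your scikit-learn version)
print("KNN baseline accuracy:", knn_average)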
# Let's use LDA, keeping only the most discriminative component
lda_pipeline = Pipeline([('lda', single_lda), ('knn', knn)])
lda_average = cross_val_score(lda_pipeline, iris_X, iris_y).mean()
print("LDA (1 component) + KNN accuracy:", lda_average)
# create a pipeline that performs PCA, keeping the single highest-variance component
pca_pipeline = Pipeline([('pca', single_pca), ('knn', knn)])
pca_average = cross_val_score(pca_pipeline, iris_X, iris_y).mean()
print("PCA (1 component) + KNN accuracy:", pca_average)
# try LDA with 2 components (the maximum for 3 classes)
lda_pipeline = Pipeline([('lda', LinearDiscriminantAnalysis(n_components=2)), ('knn', knn)])
lda_average = cross_val_score(lda_pipeline, iris_X, iris_y).mean()
# in the original run this was just as good as using the raw data
print("LDA (2 components) + KNN accuracy:", lda_average)
# compare these feature transformation tools to a feature selection tool
from sklearn.feature_selection import SelectKBest
# try every k except 4, which would keep all columns
# (by default SelectKBest ranks features with the ANOVA F-test, f_classif)
for k in [1, 2, 3]:
    # make the pipeline
    select_pipeline = Pipeline([('select', SelectKBest(k=k)), ('knn', knn)])
    # cross-validate the pipeline
    select_average = cross_val_score(select_pipeline, iris_X, iris_y).mean()
    print(k, "best feature has accuracy:", select_average)
'''
Use GridSearchCV to find the best combination of:
scaling the data (with or without centering / unit variance)
the number of PCA components
the number of LDA components
the number of KNN neighbors
'''
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

def get_best_model_and_accuracy(model, params, X, y):
    grid = GridSearchCV(model,  # the model (or pipeline) to grid search
                        params,  # the parameter grid to try
                        error_score=0.)  # if a parameter combination raises an error, score it as 0 and continue
    grid.fit(X, y)  # cross-validate every parameter combination
    # the best mean cross-validated accuracy
    print("Best Accuracy: {}".format(grid.best_score_))
    # the parameters that produced the best accuracy
    print("Best Parameters: {}".format(grid.best_params_))
    # the average time it took to fit a model to the data (in seconds)
    avg_time_fit = round(grid.cv_results_['mean_fit_time'].mean(), 3)
    print("Average Time to Fit (s): {}".format(avg_time_fit))
    # the average time it took to score held-out data (in seconds);
    # this hints at how the model would perform in real-time prediction
    avg_time_score = round(grid.cv_results_['mean_score_time'].mean(), 3)
    print("Average Time to Score (s):{}".format(avg_time_score))
iris_params = {
    'preprocessing__scale__with_std': [True, False],
    'preprocessing__scale__with_mean': [True, False],
    'preprocessing__pca__n_components': [1, 2, 3, 4],
    # LDA allows at most min(n_features, n_classes - 1) components, i.e. 2 for the 3 Iris classes.
    # With a single PCA component, n_components=2 is invalid; recent scikit-learn versions raise
    # an error for it, which error_score=0. turns into a score of 0.
    'preprocessing__lda__n_components': [1, 2],
    'clf__n_neighbors': range(1, 9)}
# make a larger pipeline: scale -> PCA -> LDA, followed by KNN
preprocessing = Pipeline([('scale', StandardScaler()), ('pca', PCA()), ('lda', LinearDiscriminantAnalysis())])
iris_pipeline = Pipeline(steps=[('preprocessing', preprocessing), ('clf', KNeighborsClassifier())])
get_best_model_and_accuracy(iris_pipeline, iris_params, iris_X, iris_y)
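# Output recorded in the original run. The fractional scores suggest an older scikit-learn
# that defaulted to 3-fold CV; recent versions default to 5-fold, so re-running may give
# slightly different numbers.
'''output: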
1 best feature has accuracy: 0.9538398692810457
2 best feature has accuracy: 0.9607843137254902
3 best feature has accuracy: 0.9738562091503268
Best Accuracy: 0.9866666666666667
Best Parameters: {'clf__n_neighbors': 3, 'preprocessing__lda__n_components': 2, 'preprocessing__pca__n_components': 3, 'preprocessing__scale__with_mean': True, 'preprocessing__scale__with_std': False}
Average Time to Fit (s): 0.003
Average Time to Score (s):0.003
'''
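The script caps LDA at two components on Iris while PCA accepts up to four. A minimal standalone sketch, separate from the pipeline, shows where that difference comes from: PCA is fit on the features alone and ranks directions by variance, whereas LDA also uses the labels and can return at most min(n_features, n_classes - 1) discriminant axes. The name `max_lda_components` is introduced here purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]
n_classes = len(np.unique(y))

# hypothetical helper name: LDA yields at most min(n_features, n_classes - 1) axes
max_lda_components = min(n_features, n_classes - 1)

# PCA is unsupervised: it only sees X and orders directions by captured variance
pca = PCA(n_components=n_features).fit(X)
# LDA is supervised: it uses y and orders directions by between-class separation
lda = LinearDiscriminantAnalysis(n_components=max_lda_components).fit(X, y)

X_pca = pca.transform(X)
X_lda = lda.transform(X)

print("max LDA components:", max_lda_components)
print("PCA explained variance ratio:", pca.explained_variance_ratio_.round(3))
print("LDA explained variance ratio:", lda.explained_variance_ratio_.round(3))
print("shapes:", X_pca.shape, X_lda.shape)
```

Comparing the two ratio arrays shows how much of each method's total is concentrated in its first component, which is what the single-component pipelines above rely on.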

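The SelectKBest loop reports accuracies but not which columns survive. The sketch below fits `SelectKBest` with its default score function `f_classif` (the ANOVA F-test, passed explicitly here) on the full dataset and reads the kept columns from `get_support()`. Inside the cross-validated pipeline the selector is refit on each training fold, so this standalone fit is only for inspection; the dictionary `selected` is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names

selected = {}
for k in [1, 2, 3]:
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    mask = selector.get_support()  # boolean mask over the original columns
    selected[k] = [name for name, keep in zip(feature_names, mask) if keep]
    print(k, selected[k])

# the per-feature F statistics behind the ranking (identical for every k)
for name, score in zip(feature_names, selector.scores_):
    print(f"{name}: F = {score:.1f}")
```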

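The grid search relies on `error_score=0.` to get past invalid combinations such as one PCA component followed by two LDA components. An alternative, sketched here as a suggestion rather than the original author's method, is to pass `GridSearchCV` a list of parameter dicts so those pairs are never tried. The pipeline is flattened to one level (so parameter names drop the `preprocessing__` prefix) and the grid is deliberately smaller than `iris_params` to keep it quick.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA()),
    ('lda', LinearDiscriminantAnalysis()),
    ('clf', KNeighborsClassifier()),
])

# Each dict is expanded separately, so LDA never requests more components than
# min(PCA output columns, n_classes - 1) allows.
param_grid = [
    {'pca__n_components': [1],
     'lda__n_components': [1],
     'clf__n_neighbors': [3, 5]},
    {'pca__n_components': [2, 3, 4],
     'lda__n_components': [1, 2],
     'clf__n_neighbors': [3, 5]},
]

grid = GridSearchCV(pipeline, param_grid)  # every combination is valid, so no error_score needed
grid.fit(X, y)

n_candidates = len(grid.cv_results_['params'])
print("candidates tried:", n_candidates)
print("best accuracy:", grid.best_score_)
print("best parameters:", grid.best_params_)
```

Listing only valid combinations avoids wasted fits and keeps failed candidates from showing up as zeros in `cv_results_`.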
Author: 夜已.入深