TensorFlow (Part 1): Scikit-Learn Transformers, a Hands-On Project Walkthrough

Bambi · Updated: 2024-09-21 · 625 views

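This article walks through example projects built with scikit-learn. The usage, syntax, and parameters of scikit-learn are described in detail in my earlier blog post, "TensorFlow (Part 1): Scikit-Learn Transformers", at the following link: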

https://huxiaoyang.blog.csdn.net/article/details/105645392

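Hands-on projects

Project 1: The Boston dataset

Project goal:

Use sklearn to preprocess the Boston data and reduce its dimensionality.

Project steps:

The Boston workflow can be divided into four stages: data acquisition, data splitting, data preprocessing, and dimensionality reduction.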

1.1: Data acquisition

Load the Boston dataset that ships with sklearn. The code is as follows:

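    # Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
    # so this snippet needs an older scikit-learn release.
    from sklearn.datasets import load_boston

    boston = load_boston()
    print("data.shape:", boston.data.shape)
    print("target.shape", boston.target.shape)
    print("names.shape", boston.feature_names.shape)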

Output:
[Figure: screenshot of the printed output]

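1.2: Data splitting (test set 0.2)

Use train_test_split to divide the data into a training set and a test set, with the test set taking 20% of the samples. The code is as follows: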

    # Split the data into training and test sets
    from sklearn.model_selection import train_test_split

    bostonDataTrain, bostonDataTest, bostonTargetTrain, bostonTargetTest = train_test_split(
        boston.data, boston.target, test_size=0.2, random_state=42
    )
    print("bostonDataTrain:", bostonDataTrain.shape)
    print("bostonDataTest:", bostonDataTest.shape)
    print("bostonTargetTrain:", bostonTargetTrain.shape)
    print("bostonTargetTest:", bostonTargetTest.shape)

Output:
[Figure: screenshot of the printed output]
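The sizes of the two subsets follow from simple arithmetic, which can be sketched in plain Python. This sketch assumes that the Boston dataset has 506 samples and that train_test_split rounds the test count up when test_size is a fraction, giving the rest to the training set. split_sizes is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Hypothetical helper: the test count is rounded up and the
    training set receives all remaining samples.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# Assumed sample count for the Boston dataset: 506
n_train, n_test = split_sizes(506, 0.2)
print("train:", n_train, "test:", n_test)  # train: 404 test: 102
```

The two numbers should match the first dimension of the shapes printed by the splitting step.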

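1.3: Data preprocessing (MinMaxScaler)

Apply min-max normalization to the dataset with MinMaxScaler. The scaler is fitted on the training set only and then applied to both subsets. The code is as follows: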

    # Data preprocessing
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    # Fit the scaler on the training data only
    Scaler = MinMaxScaler().fit(bostonDataTrain)
    # Apply the scaler to bostonDataTrain
    bostonTrainScaler = Scaler.transform(bostonDataTrain)
    # Apply the same scaler to bostonDataTest
    bostonTestScaler = Scaler.transform(bostonDataTest)
    print("after transforming")
    print("np.var of bostonTrainScaler:", np.var(bostonTrainScaler))
    print("np.mean of bostonTrainScaler:", np.mean(bostonTrainScaler))
    print("np.var of bostonTestScaler:", np.var(bostonTestScaler))
    print("np.mean of bostonTestScaler:", np.mean(bostonTestScaler))

Output:
[Figure: screenshot of the printed output]
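To make the fit/transform distinction concrete, here is a minimal NumPy sketch of min-max scaling with the default target range of 0 to 1. It is an illustration of the formula (x - min) / (max - min), not scikit-learn's implementation, and the small arrays are made-up values rather than Boston data.

```python
import numpy as np

# Made-up values for illustration (not taken from the Boston dataset)
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[0.0, 40.0], [2.5, 15.0]])

# "fit": learn the per-column minimum and range from the training set only
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

# "transform": apply the training statistics to both subsets
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range

print(train_scaled)  # every column spans exactly 0 to 1
print(test_scaled)   # the first row falls outside 0 to 1
```

Because the statistics come from the training set, test values smaller than the training minimum or larger than the training maximum map outside the 0 to 1 interval. That is expected and avoids leaking test information into the scaler.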

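1.4: Dimensionality reduction (down to 10 dimensions)

Fit PCA on the scaled training set, then use pca.transform to reduce both the training set and the test set to 10 dimensions. The code is as follows: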

    # PCA transform
    from sklearn.decomposition import PCA

    # Fit the PCA on bostonTrainScaler
    pca = PCA(n_components=10).fit(bostonTrainScaler)
    # Apply the PCA to bostonTrainScaler
    bostonTrainPca = pca.transform(bostonTrainScaler)
    # Apply the PCA to bostonTestScaler
    bostonTestPca = pca.transform(bostonTestScaler)
    print("after pca.transform:")
    print("bostonTrainPca.shape:", bostonTrainPca.shape)
    print("bostonTestPca.shape:", bostonTestPca.shape)

Output:
[Figure: screenshot of the printed output]
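The scaling and PCA steps of this project can also be chained in a single scikit-learn Pipeline, so that both are fitted on the training set in one call. The following is a sketch under stated assumptions, not the author's code: because load_boston is unavailable in recent scikit-learn releases, it uses random data with an assumed Boston-like shape of 506 samples and 13 features, and the step names "scale" and "pca" are arbitrary labels.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in with an assumed Boston-like shape (506 x 13)
rng = np.random.default_rng(42)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Chain the two transformers; the step names are arbitrary labels
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("pca", PCA(n_components=10)),
])

# Fit both steps on the training set, then reuse them on the test set
X_train_pca = pipeline.fit_transform(X_train)
X_test_pca = pipeline.transform(X_test)

print("train:", X_train_pca.shape)
print("test:", X_test_pca.shape)
```

Using a pipeline keeps the "fit on train, transform on both" rule in one place, which makes it harder to fit a transformer on the test set by accident.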

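Project 2: Wine and wine quality

Project goal:

Use the wine and wine_quality datasets to analyze the origin of the wines and to predict wine quality scores.

Project steps:

After reviewing the dataset descriptions, the wine workflow can be divided into five stages: data acquisition, data separation, data splitting (test set 0.1), data preprocessing (StandardScaler), and dimensionality reduction (down to 5 dimensions).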

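2.1: Data acquisition

Read the datasets with pandas. Note that the wine dataset can be read directly, whereas the wine_quality dataset uses ';' as its delimiter, so that character must be passed as the sep argument of read_csv. The code is as follows: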

    import pandas as pd

    wine = pd.read_csv('wine.csv')
    # winequality.csv uses ';' as its field separator
    wineQuality = pd.read_csv('winequality.csv', sep=';')
    print(wine.head(3))
    print(wineQuality.head(3))

Output:
[Figure: screenshot of the printed output]
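The effect of the sep argument is easy to check on a tiny in-memory sample. The sketch below uses a hypothetical two-row excerpt in a semicolon-separated layout; the column names and values are illustrative and are not read from the author's winequality.csv.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample (illustrative values only)
sample = "fixed acidity;pH;quality\n7.4;3.51;5\n7.8;3.2;5\n"

# With the default sep=',' each whole line ends up in a single column
wrong = pd.read_csv(io.StringIO(sample))
# With sep=';' the three fields are parsed into separate columns
right = pd.read_csv(io.StringIO(sample), sep=';')

print(wrong.shape)          # (2, 1)
print(right.shape)          # (2, 3)
print(list(right.columns))  # ['fixed acidity', 'pH', 'quality']
```

A DataFrame with a single, very long column name is usually the first sign that the delimiter was not specified correctly.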

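2.2: Data separation

Use iloc to separate the features from the target. In wine the target is the first column, while in wine_quality it is the last column. The code is as follows: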

    # Separate features from targets
    wineData = wine.iloc[:, 1:]
    wineTarget = wine.iloc[:, 0]
    wineQualityData = wineQuality.iloc[:, :-1]
    wineQualityTarget = wineQuality.iloc[:, -1]
    print('wineData:\n', wineData.head(2))
    print("wineTarget:\n", wineTarget.head(2))
    print("wineQualityData:\n", wineQualityData.head(2))
    print("wineQualityTarget\n", wineQualityTarget.head(2))

Output:
[Figure: screenshot of the printed output]
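The two iloc patterns differ only in where the target column sits. A small sketch with hypothetical miniature tables (the column names and values are made up for illustration) shows what each slice selects:

```python
import pandas as pd

# Hypothetical miniature tables; names and values are illustrative only
wine_like = pd.DataFrame(
    {"Class": [1, 2], "Alcohol": [14.2, 12.4], "Ash": [2.4, 2.1]}
)
quality_like = pd.DataFrame(
    {"pH": [3.5, 3.2], "alcohol": [9.4, 9.8], "quality": [5, 6]}
)

# Target in the first column: features are everything after it
X_wine = wine_like.iloc[:, 1:]
y_wine = wine_like.iloc[:, 0]

# Target in the last column: features are everything before it
X_quality = quality_like.iloc[:, :-1]
y_quality = quality_like.iloc[:, -1]

print(list(X_wine.columns), y_wine.name)
print(list(X_quality.columns), y_quality.name)
```

Selecting a single column position returns a Series, while a slice returns a DataFrame, which matches the usual features-matrix and target-vector layout.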

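2.3: Data splitting (test set 0.1)

Split each dataset into a training set and a test set, with the test set taking 10% of the samples. The code is as follows: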

    # Split each dataset into training and test sets
    from sklearn.model_selection import train_test_split

    wineDataTrain, wineDataTest, wineTargetTrain, wineTargetTest = train_test_split(
        wineData, wineTarget, test_size=0.1, random_state=23
    )
    wineQualityDataTrain, wineQualityDataTest, wineQualityTargetTrain, wineQualityTargetTest = train_test_split(
        wineQualityData, wineQualityTarget, test_size=0.1, random_state=23
    )
    print("wineDataTrain", wineDataTrain.shape)
    print("wineDataTest", wineDataTest.shape)
    print("wineTargetTrain", wineTargetTrain.shape)
    print("wineTargetTest", wineTargetTest.shape)
    print('-' * 100)
    print("wineQualityDataTrain", wineQualityDataTrain.shape)
    print("wineQualityDataTest", wineQualityDataTest.shape)
    print("wineQualityTargetTrain", wineQualityTargetTrain.shape)
    print("wineQualityTargetTest", wineQualityTargetTest.shape)

Output:
[Figure: screenshot of the printed output]

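2.4: Data preprocessing (StandardScaler)

Standardize both datasets with StandardScaler, fitting each scaler on its own training set. The code is as follows: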

    # Data preprocessing
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Fit the scaler on wineDataTrain
    stdScaler = StandardScaler().fit(wineDataTrain)
    # Apply stdScaler to the training set
    wineDataTrainScaler = stdScaler.transform(wineDataTrain)
    # Apply stdScaler to the test set
    wineDataTestScaler = stdScaler.transform(wineDataTest)

    # Fit a new scaler on wineQualityDataTrain (the name stdScaler is reused)
    stdScaler = StandardScaler().fit(wineQualityDataTrain)
    # Apply stdScaler to the training set
    wineQualityDataTrainScaler = stdScaler.transform(wineQualityDataTrain)
    # Apply stdScaler to the test set
    wineQualityDataTestScaler = stdScaler.transform(wineQualityDataTest)

    print("after transforming:")
    print("np.var of :wineDataTrainScaler", np.var(wineDataTrainScaler))
    print("np.mean of :wineDataTrainScaler", np.mean(wineDataTrainScaler))
    print("np.var of :wineDataTestScaler", np.var(wineDataTestScaler))
    print("np.mean of :wineDataTestScaler", np.mean(wineDataTestScaler))
    print('-' * 100)
    print("np.var of :wineQualityDataTrainScaler", np.var(wineQualityDataTrainScaler))
    print("np.mean of :wineQualityDataTrainScaler", np.mean(wineQualityDataTrainScaler))
    print("np.var of :wineQualityDataTestScaler", np.var(wineQualityDataTestScaler))
    print("np.mean of :wineQualityDataTestScaler", np.mean(wineQualityDataTestScaler))

Output:
[Figure: screenshot of the printed output]
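The variance and mean printed in this step can be understood from the standardization formula (x - mean) / std, applied per column with statistics taken from the training set. The NumPy sketch below is an illustration of that formula with made-up numbers, not scikit-learn's implementation; it uses the population standard deviation (ddof=0), which is also what StandardScaler uses.

```python
import numpy as np

# Made-up values for illustration (not taken from the wine datasets)
train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0], [6.0, 700.0]])
test = np.array([[2.0, 200.0], [4.0, 900.0]])

# "fit": per-column mean and population standard deviation (ddof=0)
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": apply the training statistics to both subsets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

# Training set: overall mean is about 0 and overall variance about 1
print(np.mean(train_scaled), np.var(train_scaled))
# Test set: close to, but generally not exactly, 0 and 1
print(np.mean(test_scaled), np.var(test_scaled))
```

For the training set, np.mean and np.var over the whole array come out at approximately 0 and 1 because every column has mean 0 and variance 1. The test set only approximates those values, since it is scaled with statistics it did not contribute to.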

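2.5: Dimensionality reduction (down to 5 dimensions)

Reduce both standardized datasets to 5 dimensions with PCA, fitting each PCA on its own training set. The code is as follows: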

    # PCA transform
    from sklearn.decomposition import PCA

    # Fit the PCA on wineDataTrainScaler
    pca = PCA(n_components=5).fit(wineDataTrainScaler)
    # Apply the PCA to the training set
    wineDataTrainPca = pca.transform(wineDataTrainScaler)
    # Apply the PCA to the test set
    wineDataTestPca = pca.transform(wineDataTestScaler)

    # Fit a new PCA on wineQualityDataTrainScaler (the name pca is reused)
    pca = PCA(n_components=5).fit(wineQualityDataTrainScaler)
    # Apply the PCA to the training set
    wineQualityDataTrainPca = pca.transform(wineQualityDataTrainScaler)
    # Apply the PCA to the test set
    wineQualityDataTestPca = pca.transform(wineQualityDataTestScaler)

    print('after pca.transform:')
    print("wineDataTrainPca.shape", wineDataTrainPca.shape)
    print("wineDataTestPca.shape", wineDataTestPca.shape)
    print('-' * 100)
    print("wineQualityDataTrainPca.shape", wineQualityDataTrainPca.shape)
    print("wineQualityDataTestPca.shape", wineQualityDataTestPca.shape)

Output:
[Figure: screenshot of the printed output]
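The output shapes confirm the number of components but not how much information 5 components keep. One way to check this is the explained_variance_ratio_ attribute of a fitted PCA. The sketch below is an assumption-laden illustration: it uses scikit-learn's bundled load_wine dataset as a stand-in, which is not necessarily identical to the author's wine.csv, and repeats the same split, scaling, and PCA settings.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data: scikit-learn's bundled wine dataset (an assumption;
# it may differ from the wine.csv file used in this article)
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=23
)

# Fit the scaler and the PCA on the training set only
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=5).fit(scaler.transform(X_train))

# Share of the training variance captured by each component, and in total
ratios = pca.explained_variance_ratio_
print("per-component ratio:", ratios.round(3))
print("total retained:", round(float(ratios.sum()), 3))

# The fitted scaler and PCA are then reused on the test set
X_test_pca = pca.transform(scaler.transform(X_test))
print("test shape:", X_test_pca.shape)
```

If the total retained share is lower than desired, n_components can be increased. The ratios are sorted from the largest to the smallest component.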

Project 3:

Project goal:
Project steps:


Author: hyhooo



