shuxuemoxing_iris visualization

Isabella · Updated: 2024-11-01

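Playing with the Iris Dataset
The Iris dataset is a classic dataset that is frequently used as an example in statistical learning and machine learning.
It contains 150 records in 3 classes, with 50 records per class.
Each record has 4 features: sepal length, sepal width, petal length, and petal width.
These 4 features can be used to predict which of three species (iris-setosa, iris-versicolour, iris-virginica) a flower belongs to.
Setosa is the bristle-pointed iris, versicolour is the blue flag iris, and virginica is the Virginia iris.
The images below show setosa, versicolour, and virginica.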

[Image: iris]

[Image: sepal1]

from sklearn import datasets
iris = datasets.load_iris()

dir(iris)

['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']

iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

iris.data  # feature matrix: 150 samples x 4 features

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
        ......
       [5.9, 3. , 5.1, 1.8]])

iris.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

import numpy as np
# Append the class label as a fifth column; hstack upcasts the integer labels to float
iris_data_and_target = np.hstack((iris.data, iris.target.reshape(-1, 1)))
iris_data_and_target

array([[5.1, 3.5, 1.4, 0.2, 0. ],
       [4.9, 3. , 1.4, 0.2, 0. ],
       [4.7, 3.2, 1.3, 0.2, 0. ],
       [4.6, 3.1, 1.5, 0.2, 0. ],
       [5. , 3.6, 1.4, 0.2, 0. ],
       [5.4, 3.9, 1.7, 0.4, 0. ],
       [4.6, 3.4, 1.4, 0.3, 0. ],
       [5. , 3.4, 1.5, 0.2, 0. ],
       [4.4, 2.9, 1.4, 0.2, 0. ],
       [4.9, 3.1, 1.5, 0.1, 0. ],
       [5.4, 3.7, 1.5, 0.2, 0. ],
       [4.8, 3.4, 1.6, 0.2, 0. ],
       [4.8, 3. , 1.4, 0.1, 0. ],
       [4.3, 3. , 1.1, 0.1, 0. ],
       [5.8, 4. , 1.2, 0.2, 0. ],
       [5.7, 4.4, 1.5, 0.4, 0. ],
       [5.4, 3.9, 1.3, 0.4, 0. ],
       [5.1, 3.5, 1.4, 0.3, 0. ],
       [5.7, 3.8, 1.7, 0.3, 0. ],
       [5.1, 3.8, 1.5, 0.3, 0. ],
        ......
       [5.9, 3. , 5.1, 1.8, 2. ]])


iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
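
The integer codes in `iris.target` are positions in `iris.target_names`, so a code can be turned back into a species name by indexing. The sketch below shows the idea with plain Python lists; the list `sample_targets` is a made-up handful of labels, not the real column. With the NumPy arrays returned by scikit-learn, the same lookup can be written as `iris.target_names[iris.target]`.

```python
# Class names in the order scikit-learn reports them in iris.target_names
target_names = ["setosa", "versicolor", "virginica"]

# A few example integer labels, using the same encoding as iris.target
# (hypothetical sample, for illustration only)
sample_targets = [0, 0, 1, 2, 1]

# Look up each integer code in the list of names
sample_species = [target_names[code] for code in sample_targets]
print(sample_species)
```

Readable names like these can be handy for legends and for a `Species` column that is easier to interpret than 0, 1, 2.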

print(iris.DESCR)

.. _iris_dataset:
Iris plants dataset
--------------------
**Data Set Characteristics:**
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:
    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================
    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988
The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.
This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.
.. topic:: References
   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments".  IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

import pandas as pd
iris_df = pd.DataFrame(iris.data, columns=['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']) 
iris_df['Species'] = iris.target  # integer class labels: 0, 1, 2
iris_df.head(10)

   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0            5.1           3.5            1.4           0.2        0
1            4.9           3.0            1.4           0.2        0
2            4.7           3.2            1.3           0.2        0
3            4.6           3.1            1.5           0.2        0
4            5.0           3.6            1.4           0.2        0
5            5.4           3.9            1.7           0.4        0
6            4.6           3.4            1.4           0.3        0
7            5.0           3.4            1.5           0.2        0
8            4.4           2.9            1.4           0.2        0
9            4.9           3.1            1.5           0.1        0

import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei']  # font with CJK glyphs; only needed if labels contain Chinese text
plt.rcParams['axes.unicode_minus']=False  # render the minus sign correctly with that font
plt.figure(figsize=(12,7))
x=iris.data
y=iris.target
# Columns 0 and 1 of iris.data are sepal length and sepal width
p1=plt.scatter(x[y==0,0],x[y==0,1],color='red',s=50)    # class 0 (setosa)
p2=plt.scatter(x[y==1,0],x[y==1,1],color='blue',s=50)   # class 1 (versicolor)
p3=plt.scatter(x[y==2,0],x[y==2,1],color='green',s=50)  # class 2 (virginica)
plt.legend([p1,p2,p3],['setosa','versicolor','virginica'],loc='upper left',fontsize=15)
plt.title('Iris sepal length and sepal width',fontsize=20)
plt.xlabel('sepal length (cm)',fontsize=20)
plt.ylabel('sepal width (cm)',fontsize=20)
plt.show()

[Figure: scatter plot of sepal length against sepal width, colored by species]

import numpy as np
import matplotlib.pyplot as plt

x=iris.data
y=iris.target
# Feature names in the column order of iris.data
labels=['sepal length','sepal width','petal length','petal width']
# All six pairs of feature columns
idx=np.array([[0,1],[0,2],[0,3],[1,2],[1,3],[2,3]])
fig=plt.figure(figsize=(15,20))
for i in range(1,7):
    a,b=idx[i-1]  # the two feature columns shown in this panel
    ax=fig.add_subplot(3,2,i)
    p1=ax.scatter(x[y==0,a],x[y==0,b],color='red')    # class 0 (setosa)
    p2=ax.scatter(x[y==1,a],x[y==1,b],color='blue')   # class 1 (versicolor)
    p3=ax.scatter(x[y==2,a],x[y==2,b],color='green')  # class 2 (virginica)
    ax.legend([p1,p2,p3],['setosa','versicolor','virginica'],loc='upper left',fontsize=15)
    ax.set_title(labels[a]+' vs. '+labels[b],fontsize=15)
    ax.set_xlabel(labels[a]+' (cm)',fontsize=15)
    ax.set_ylabel(labels[b]+' (cm)',fontsize=15)
plt.show()

[Figure: 3x2 grid of scatter plots, one per pair of features, colored by species]

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
iris = datasets.load_iris()
x = iris.data[:,[0,1]] # features: sepal length and sepal width (columns 0 and 1)
y = iris.target # class labels (0, 1, 2)
x_min, x_max = x[:,0].min() - 0.5, x[:,0].max() + 0.5
y_min, y_max = x[:,1].min() - 0.5, x[:,1].max() + 0.5
# MESH
cmap_light = ListedColormap(['#AAAAFF','#AAFFAA','#FFAAAA'])
h = 0.2
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max,h))
knn = KNeighborsClassifier()
knn.fit(x,y)
Z = knn.predict(np.c_[xx.ravel(),yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(figsize=(10,6))
plt.pcolormesh(xx,yy,Z,cmap = cmap_light)
# Plot the training points
plt.scatter(x[:,0],x[:,1],c = y)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.show()

[Figure: KNN decision regions over sepal length and sepal width, with the training points overlaid]
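
The colored regions above come from `KNeighborsClassifier()` with its default settings, which classify a point by a uniform majority vote among its 5 nearest training points under Euclidean distance. The following is a minimal pure-Python sketch of that voting rule, not scikit-learn's implementation: the function name `knn_predict` and the toy points are assumptions made for illustration, and tie-breaking is left to whatever `Counter.most_common` returns first.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbours wins
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy (sepal length, sepal width) points; hypothetical values, not rows of the dataset
train_points = [(5.0, 3.4), (4.9, 3.1), (5.1, 3.5), (6.4, 2.9), (6.6, 3.0), (6.3, 2.8)]
train_labels = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(train_points, train_labels, (5.0, 3.3), k=3)
pred_b = knn_predict(train_points, train_labels, (6.5, 3.0), k=3)
print(pred_a, pred_b)
```

Evaluating such a rule at every node of the `np.meshgrid` grid is what produces the decision regions drawn by `pcolormesh`.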

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib versions

x=iris.data
y=iris.target
# Feature names in the column order of iris.data
labels=['sepal length','sepal width','petal length','petal width']
# All four ways to choose three of the four feature columns
idx=np.array([[0,1,2],[0,1,3],[0,2,3],[1,2,3]])
fig=plt.figure(figsize=(17,17))
for i in range(1,5):
    a,b,c=idx[i-1]  # the three feature columns shown in this panel
    ax=fig.add_subplot(2,2,i,projection='3d')
    p1=ax.scatter3D(x[y==0,a],x[y==0,b],x[y==0,c],color='red',s=50)    # class 0 (setosa)
    p2=ax.scatter3D(x[y==1,a],x[y==1,b],x[y==1,c],color='blue',s=50)   # class 1 (versicolor)
    p3=ax.scatter3D(x[y==2,a],x[y==2,b],x[y==2,c],color='green',s=50)  # class 2 (virginica)
    ax.legend([p1,p2,p3],['setosa','versicolor','virginica'],loc='upper left',fontsize=10)
    ax.set_title(', '.join([labels[a],labels[b],labels[c]]),fontsize=12)
    ax.set_xlabel(labels[a]+' (cm)',fontsize=12)
    ax.set_ylabel(labels[b]+' (cm)',fontsize=12)
    ax.set_zlabel(labels[c]+' (cm)',fontsize=12)
plt.show()

[Figure: 2x2 grid of 3D scatter plots, one per triple of features, colored by species]

seaborn's jointplot function draws the scatter plot of two variables and the histogram of each variable in a single figure.

import seaborn as sns
# Scatter plot of sepal length vs. sepal width with marginal histograms
sns.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=iris_df, height=5, color='green')

[Figure: joint plot of sepal length and sepal width with marginal histograms]

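Box plot statistics

Upper fence: Q3 + 1.5×IQR
Upper adjacent value: the largest observation that does not exceed the upper fence
Whiskers: the lines from the upper and lower quartiles to the upper and lower adjacent values
Upper quartile (Q3): the value at the 75% position when the data are sorted in ascending order
Median: the value at the 50% position when the data are sorted in ascending order
Interquartile range (IQR): Q3 − Q1, the distance between the lower and upper quartiles
Lower quartile (Q1): the value at the 25% position when the data are sorted in ascending order
Lower adjacent value: the smallest observation that is not below the lower fence
Lower fence: Q1 − 1.5×IQR
Outliers: observed values that lie beyond the upper or lower fence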
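
To make these quantities concrete, here is a small pure-Python sketch that computes them for a made-up sample, using the usual convention of lower fence = Q1 − 1.5×IQR and upper fence = Q3 + 1.5×IQR. The helper `quantile` and the list `values` are assumptions for illustration; the helper interpolates linearly between order statistics, which is one common definition of a quantile, and plotting libraries may use a slightly different one.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

# Hypothetical sample, chosen to contain one low and one high outlier
values = sorted([2.0, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.3, 4.4])

q1 = quantile(values, 0.25)      # lower quartile
median = quantile(values, 0.50)
q3 = quantile(values, 0.75)      # upper quartile
iqr = q3 - q1                    # interquartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Adjacent values: the most extreme observations still inside the fences
inside = [v for v in values if lower_fence <= v <= upper_fence]
lower_adjacent, upper_adjacent = min(inside), max(inside)

# Outliers: observations beyond either fence
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("Q1, median, Q3:", q1, median, q3)
print("adjacent values:", lower_adjacent, upper_adjacent)
print("outliers:", outliers)
```

In a box plot of this sample, the box would span Q1 to Q3, the whiskers would end at the two adjacent values, and the outliers would be drawn as individual points.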

[Figure: boxplot2]

For reference, compare this with the normal distribution:

[Figure: boxplot3]

fig,axes = plt.subplots(2,2,figsize=(15,16))
sns.boxplot(x="Species", y="SepalLengthCm", data=iris_df,ax=axes[0,0])
sns.boxplot(x="Species", y="SepalWidthCm", data=iris_df,ax=axes[0,1])
sns.boxplot(x="Species", y="PetalLengthCm", data=iris_df,ax=axes[1,0])
sns.boxplot(x="Species", y="PetalWidthCm", data=iris_df,ax=axes[1,1])

[Figure: box plots of each feature grouped by species]

fig,axes = plt.subplots(figsize=(12,8))
sns.boxplot(data=iris_df.drop('Species',axis=1))

[Figure: box plots of the four features on a shared axis]

# The same box plots drawn with pandas
iris_df.boxplot(by="Species", figsize=(12, 8))
iris_df.drop('Species',axis=1).boxplot(figsize=(12, 8))

[Figure: pandas box plots, grouped by species and for all features together]

iris_df.drop('Species',axis=1).plot(kind='box',subplots=True,layout=(2,2),sharex=False,figsize=(15,8))

SepalLengthCm       AxesSubplot(0.125,0.536818;0.352273x0.343182)
SepalWidthCm     AxesSubplot(0.547727,0.536818;0.352273x0.343182)
PetalLengthCm          AxesSubplot(0.125,0.125;0.352273x0.343182)
PetalWidthCm        AxesSubplot(0.547727,0.125;0.352273x0.343182)
dtype: object

[Figure: one box plot per feature in a 2x2 grid]

Violin plots (similar to box plots)

fig,axes = plt.subplots(2,2,figsize=(15,16))
sns.violinplot(y=iris_df['SepalLengthCm'],ax=axes[0,0])
sns.violinplot(y=iris_df['SepalWidthCm'],ax=axes[0,1])
sns.violinplot(y=iris_df['PetalLengthCm'],ax=axes[1,0])
sns.violinplot(y=iris_df['PetalWidthCm'],ax=axes[1,1])

[Figure: violin plots of the four features]

fig,axes = plt.subplots(2,2,figsize=(15,16))
sns.violinplot(x=iris_df['Species'],y=iris_df['SepalLengthCm'],ax=axes[0,0],linewidth=5,width=0.7)
sns.violinplot(x=iris_df['Species'],y=iris_df['SepalWidthCm'],ax=axes[0,1])
sns.violinplot(x=iris_df['Species'],y=iris_df['PetalLengthCm'],ax=axes[1,0])
sns.violinplot(x=iris_df['Species'],y=iris_df['PetalWidthCm'],ax=axes[1,1])

[Figure: violin plots of each feature grouped by species]

fig,axes = plt.subplots(2,2,figsize=(15,16))
sns.violinplot(y=iris_df['Species'],x=iris_df['SepalLengthCm'],ax=axes[0,0],orient='h')
sns.violinplot(y=iris_df['Species'],x=iris_df['SepalWidthCm'],ax=axes[0,1],orient='h')
sns.violinplot(y=iris_df['Species'],x=iris_df['PetalLengthCm'],ax=axes[1,0],orient='h')
sns.violinplot(y=iris_df['Species'],x=iris_df['PetalWidthCm'],ax=axes[1,1],orient='h')

[Figure: horizontal violin plots of each feature grouped by species]
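
What distinguishes a violin plot from a box plot is its outline, which is a kernel density estimate of the data mirrored around the axis. The sketch below shows a Gaussian kernel density estimate in pure Python under simplifying assumptions: the function `gaussian_kde`, the sample values, and the hand-picked bandwidth of 0.3 are all illustrative, and seaborn chooses its own bandwidth rather than a fixed one like this.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical sample with two clusters, loosely resembling petal lengths
samples = [1.3, 1.4, 1.4, 1.5, 1.7, 4.5, 4.7, 4.9, 5.1]
density = gaussian_kde(samples, bandwidth=0.3)  # bandwidth picked by hand

# Evaluate the estimate on a grid from 0.0 to 7.0
step = 0.05
grid = [i * step for i in range(141)]
curve = [density(x) for x in grid]

# A density should integrate to about 1 (checked here with a Riemann sum)
area = sum(curve) * step
print(round(area, 3))
```

Plotting `curve` against `grid` and reflecting it around the axis gives the violin shape; the two clusters in the sample show up as two bulges.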

Parallel coordinates plot

A parallel coordinates plot is a common way to visualize high-dimensional data.

from pandas.plotting import parallel_coordinates
plt.subplots(1,1,figsize=(12,8))
parallel_coordinates(iris_df,'Species')

[Figure: parallel coordinates plot of the four features, colored by species]

Hexbin plot (the darker the color, the more data points fall in that bin)

iris_df.plot.hexbin(x='SepalLengthCm',y='SepalWidthCm',gridsize=25,figsize=(12,8))
