Andrew Ng Exercise 3: Multi-class Classification

Paula

Updated: 2024-09-21


[This topic is for study purposes only]

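In this exercise we recognize the digits $0 \sim 9$ using several logistic-regression classifiers.

Data preprocessing

Each example is a $20 \times 20$ pixel image, and there are $5000$ examples in total.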

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat

The dataset is stored in MATLAB's native `.mat` format, so to load it in Python we use a SciPy utility, `scipy.io.loadmat`.

data = loadmat('ex3data1.mat')

The loaded data is split into $X$ and $y$, with shapes $(5000, 400)$ and $(5000, 1)$ respectively; each row of $X$ holds the 400 pixel values of one image.
The following statement shows their sizes:

data['X'].shape, data['y'].shape
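Each row of $X$ is one $20 \times 20$ image flattened into 400 values. The sketch below turns a row back into an image. It assumes the pixels were flattened in MATLAB's column-major order (hence `order='F'`); `row_to_image` is a hypothetical helper, and a random array stands in for `data['X']` so the snippet runs without the `.mat` file.

```python
import numpy as np

def row_to_image(row):
    """Reshape one 400-pixel row into a 20x20 image (hypothetical helper).

    order='F' assumes the row was flattened column by column, as MATLAB does.
    """
    return np.reshape(row, (20, 20), order='F')

# Stand-in for data['X']: 5000 random "images" with the same shape as the dataset
X_fake = np.random.default_rng(0).random((5000, 400))
img = row_to_image(X_fake[0])
print(img.shape)  # (20, 20)
# With column-major order, pixel (r, c) comes from index c * 20 + r of the row
print(img[3, 5] == X_fake[0][5 * 20 + 3])  # True
```

With the real data, `plt.imshow(row_to_image(data['X'][0]), cmap='gray')` should show the first digit upright if the column-major assumption holds.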

Cost function

We solve the problem with logistic regression.
First we write the function $g$, the commonly used logistic function, which is S-shaped (a sigmoid):
$$g(z)=\frac{1}{1+e^{-z}}$$
Applying $g$ to $\theta^T x$ gives the hypothesis of the logistic-regression model:
$$h_\theta(x)=\frac{1}{1+e^{-\theta^T x}}$$
The function $g$:

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
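A quick sanity check of `sigmoid` on a few inputs, illustrating properties that follow from the formula: $g(0)=0.5$, outputs lie strictly between 0 and 1 and increase with $z$, and $g(-z)=1-g(z)$. The function is repeated so the snippet runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(z)
print(sigmoid(0.0))  # 0.5
# g(-z) = 1 - g(z): the curve is symmetric about the point (0, 0.5)
print(np.allclose(sigmoid(-z), 1 - s))  # True
# Outputs stay strictly between 0 and 1 and increase with z
print(np.all((s > 0) & (s < 1)), np.all(np.diff(s) > 0))  # True True
```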

The regularized cost function:
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
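We can split the formula into its left (cross-entropy) part and its right (regularization) part, after first converting everything to numpy matrices.
The terms are multiplied element-wise, so we use `np.multiply`. Here $X$ is $5000 \times 401$ and `theta` is $1 \times 401$ (400 pixels plus the intercept column that `one_vs_all` inserts), so the predictions are computed as `X*theta.T`.
Because these are `np.matrix` objects, `*` directly denotes the matrix product.
`np.power(a, 2)` squares every element, and in general `np.power(a, k)` raises every element to the $k$-th power.
Because the regularization term does not include the constant term $\theta_0$, we slice it off with `theta[:, 1:]`.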
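A small check of the `np.matrix` behaviour described above, using toy shapes instead of $5000 \times 401$: `*` is a matrix product, `np.multiply` is element-wise, and for a $1 \times (n+1)$ parameter matrix the intercept has to be excluded with a column slice. The values are made up for illustration.

```python
import numpy as np

X = np.matrix([[1.0, 2.0, 3.0],
               [1.0, 4.0, 5.0]])        # 2 samples: intercept column + 2 features
theta = np.matrix([[0.5, -1.0, 2.0]])   # 1 x 3 row of parameters

print((X * theta.T).shape)  # (2, 1): one prediction per sample (matrix product)

y = np.matrix([[1.0], [0.0]])
print(np.multiply(y, X * theta.T))  # element-wise product, still 2 x 1

print(theta[1:].shape)     # (0, 3): slices rows, so nothing is left
print(theta[:, 1:].shape)  # (1, 2): drops theta_0, keeps theta_1..theta_n
print(np.sum(np.power(theta[:, 1:], 2)))  # 5.0 = (-1)^2 + 2^2
```

The row slice `theta[1:]` does not raise an error; it silently yields an empty matrix, so a regularization term built from it would always be zero.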

def cost(theta, X, y, learningRate):
    # INPUT: parameters theta, data X, labels y, regularization parameter lambda (named learningRate here)
    # OUTPUT: regularized cross-entropy cost for the current parameters
    # STEP1: convert theta, X, y to numpy matrices
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    # STEP2: cross-entropy part of the cost (without regularization)
    m = len(X)
    cross_cost = np.multiply(-y, np.log(sigmoid(X*theta.T)))
    cross_cost = cross_cost - np.multiply(1-y, np.log(1-sigmoid(X*theta.T)))
    # STEP3: regularization part; theta is 1 x (n+1), so theta[:, 1:] skips theta_0
    # (theta[1:] would slice rows and return an empty matrix)
    reg = (learningRate/(2*m))*np.sum(np.power(theta[:, 1:], 2))
    # STEP4: add the two parts to get the total cost
    whole_cost = np.sum(cross_cost)/m + reg
    return whole_cost
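A cheap way to test the cost function, sketched here on random data rather than the real dataset: with $\theta = 0$ every prediction is $g(0)=0.5$, so $J(0)=\log 2 \approx 0.693$ for any labels and any $\lambda$; and the cost with $\lambda=1$ minus the cost with $\lambda=0$ should equal $\frac{1}{2m}\sum_{j\ge 1}\theta_j^2$. The `cost` below repeats the function above (with the `theta[:, 1:]` slice) so the snippet is self-contained.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y, learningRate):
    # Same computation as cost() above, with the regularization slice theta[:, 1:]
    theta, X, y = np.matrix(theta), np.matrix(X), np.matrix(y)
    m = len(X)
    h = sigmoid(X * theta.T)
    cross = np.multiply(-y, np.log(h)) - np.multiply(1 - y, np.log(1 - h))
    reg = (learningRate / (2 * m)) * np.sum(np.power(theta[:, 1:], 2))
    return np.sum(cross) / m + reg

rng = np.random.default_rng(0)
X = np.insert(rng.random((50, 4)), 0, values=np.ones(50), axis=1)  # 50 x 5 with intercept
y = (rng.random((50, 1)) > 0.5).astype(float)                      # random 0/1 labels

# With theta = 0 every prediction is 0.5, so J = log(2) for any labels and any lambda
print(np.isclose(cost(np.zeros(5), X, y, 1.0), np.log(2)))  # True

# The regularization term only sees theta_1..theta_n
theta = np.array([0.0, 1.0, -1.0, 0.5, 2.0])
reg_part = cost(theta, X, y, 1.0) - cost(theta, X, y, 0.0)
print(np.isclose(reg_part, (1 + 1 + 0.25 + 4) / (2 * 50)))  # True
```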

Computing the gradient

(The gradient is passed to the optimizer so that the TNC truncated Newton method can be used.)

To minimize this cost with gradient descent, we have to treat two cases separately, because $\theta_0$ is not regularized:
$$\theta_0 := \theta_0 - a\frac{1}{m}\sum_{i=1}^{m}\left[h_\theta(x^{(i)})-y^{(i)}\right]x_0^{(i)}$$
$$\theta_j := \theta_j - a\left(\frac{1}{m}\sum_{i=1}^{m}\left[h_\theta(x^{(i)})-y^{(i)}\right]x_j^{(i)}+\frac{\lambda}{m}\theta_j\right)\quad (j \ge 1)$$
Here we simply treat the learning rate $a$ as $1$: the `gradient` function returns only the partial derivatives inside the updates, since the optimizer, not a fixed learning rate, decides how far to step.

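As before, we first convert everything to matrices.
Next we get the number of parameters; the usual approach is to flatten `theta`: `num = int(np.asarray(theta).ravel().shape[0])` (calling `.ravel()` directly on an `np.matrix` keeps it 2-D, so `shape[0]` would be 1).
Then we compute the error directly, i.e. $h_\theta(x^{(i)})-y^{(i)}$.

For the term $\left[h_\theta(x^{(i)})-y^{(i)}\right]x_j^{(i)}$, every entry of column $j$ of $X$ has to be multiplied by the corresponding error and summed, and we need `num` such results.
This is exactly `X.T*error`: row $j$ of `X.T` contains $x_j^{(i)}$ for all samples $i$, so its product with `error` gives the sum for parameter $j$.
Finally, the $0$-th term is not regularized, so we correct it by overwriting `grad[0,0]` with the unregularized value.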
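The vectorization argument can be checked numerically. This sketch uses plain ndarrays with `@` (which behaves like `*` on `np.matrix`) and random stand-in values for `X` and `error`, and compares an explicit double loop over $j$ and $i$ with `X.T @ error`.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
X = np.insert(rng.random((m, n)), 0, values=np.ones(m), axis=1)  # m x (n+1)
error = rng.random((m, 1)) - 0.5  # stands in for h_theta(x) - y, one entry per sample

# Explicit double loop: for each parameter j, sum error_i * x_j^(i) over all samples i
loop = np.zeros(n + 1)
for j in range(n + 1):
    for i in range(m):
        loop[j] += error[i, 0] * X[i, j]

# Vectorized: row j of X.T holds x_j^(i) for every i, so X.T @ error does all sums at once
vec = (X.T @ error).ravel()
print(np.allclose(loop, vec))  # True
```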

def gradient(theta, X, y, learningRate):
    # INPUT: parameters theta, data X, labels y, regularization parameter lambda (named learningRate here)
    # OUTPUT: gradient of the regularized cost for the current parameters
    # STEP1: convert theta, X, y to numpy matrices
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    # STEP2: number of parameters (n + 1); np.matrix.ravel() stays 2-D, so flatten an ndarray view
    # (kept for reference, not used below)
    parameters = int(np.asarray(theta).ravel().shape[0])
    # STEP3: prediction error h_theta(x) - y, shape (m, 1)
    error = sigmoid(X*theta.T) - y
    # STEP4: gradient for all parameters, including the regularization term
    grad = ((X.T*error).T + learningRate*theta)/len(X)
    # STEP5: theta_0 is not regularized, so recompute grad[0, 0] without the penalty
    grad[0, 0] = np.sum(np.multiply(error, X[:, 0]))/len(X)
    return np.array(grad).ravel()
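A standard way to verify an analytic gradient is to compare it with central finite differences of the cost, $\frac{J(\theta+\epsilon e_k)-J(\theta-\epsilon e_k)}{2\epsilon}$. This sketch repeats `sigmoid`, `cost` and `gradient` from above so it is self-contained; `numerical_gradient` is a hypothetical helper, and the data are small random arrays rather than the digits.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y, learningRate):
    theta, X, y = np.matrix(theta), np.matrix(X), np.matrix(y)
    m = len(X)
    h = sigmoid(X * theta.T)
    cross = np.multiply(-y, np.log(h)) - np.multiply(1 - y, np.log(1 - h))
    return np.sum(cross) / m + (learningRate / (2 * m)) * np.sum(np.power(theta[:, 1:], 2))

def gradient(theta, X, y, learningRate):
    theta, X, y = np.matrix(theta), np.matrix(X), np.matrix(y)
    error = sigmoid(X * theta.T) - y
    grad = ((X.T * error).T + learningRate * theta) / len(X)
    grad[0, 0] = np.sum(np.multiply(error, X[:, 0])) / len(X)  # theta_0 is not regularized
    return np.array(grad).ravel()

def numerical_gradient(f, theta, eps=1e-5):
    """Central differences: (f(theta + eps*e_k) - f(theta - eps*e_k)) / (2*eps)."""
    num = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        num[k] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return num

rng = np.random.default_rng(2)
X = np.insert(rng.random((20, 3)), 0, values=np.ones(20), axis=1)
y = (rng.random((20, 1)) > 0.5).astype(float)
theta = rng.normal(size=4)
lam = 1.0

analytic = gradient(theta, X, y, lam)
numeric = numerical_gradient(lambda t: cost(t, X, y, lam), theta)
print(np.max(np.abs(analytic - numeric)))  # should be very close to 0
```

SciPy's `scipy.optimize.check_grad` performs a similar comparison and returns the 2-norm of the difference.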

Classifier

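Because there are $10$ classes, we simply train $10$ separate logistic-regression classifiers (one-vs-all).
Each classifier decides between "class $i$" and "not class $i$". We wrap the training in one function that computes the final weights of each of the $10$ classifiers and returns them as a $10 \times (n+1)$ array, where $n$ is the number of features (here $400$ pixels) and the extra column is for the intercept. Since the classifiers do not affect each other, each one is optimized independently with `scipy.optimize.minimize` to find its best parameters.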

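`all_theta` records the optimal parameters. For each class $i$, the conversion below turns the labels into $1$ (class $i$) or $0$ (any other class) for that round of training: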

y_i = np.array([1 if label == i else 0 for label in y])
y_i = np.reshape(y_i, (rows, 1))
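The list comprehension can also be written as one vectorized comparison. This sketch uses a short made-up label column in place of `data['y']` and checks that both forms agree; it also shows, as an optional extra, how broadcasting builds all ten binary label columns at once.

```python
import numpy as np

# Stand-in labels shaped like data['y']: a (rows, 1) column of values in 1..10
y = np.array([[10], [1], [3], [3], [7]])
rows = y.shape[0]
i = 3

# List comprehension from the text
y_i_loop = np.reshape(np.array([1 if label == i else 0 for label in y]), (rows, 1))

# Vectorized equivalent: compare the whole column at once, then cast booleans to 0/1
y_i_vec = (y == i).astype(int)
print(y_i_vec.ravel())  # [0 0 1 1 0]
print(np.array_equal(y_i_loop, y_i_vec))  # True

# All ten binary label columns in one step (column k corresponds to class k + 1)
Y = (y == np.arange(1, 11)).astype(int)  # shape (rows, 10) via broadcasting
print(Y.shape)  # (5, 10)
```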

Then we run the optimizer:
fmin = minimize(fun=cost, x0=theta, args=(tmpX, y_i, learning_rate), method='TNC', jac=gradient)
`fun` is the cost function, `x0` is the initial value, `args` holds the extra arguments passed to `cost` and `gradient` (training features, training labels, regularization parameter), `method='TNC'` selects the truncated Newton method, and `jac` is the gradient function.
`fmin.x` contains the resulting parameters.
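To see `minimize` with `method='TNC'` and `jac` in isolation, here is a sketch on a tiny one-feature problem instead of the digits. The `cost` and `gradient` here are plain-ndarray rewrites of the formulas above (1-D `theta` and `y`), not the exact functions from this post, and the data are made up.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y, lam):
    # Regularized cross-entropy; theta[0] (the intercept) is not penalized
    m = len(X)
    h = sigmoid(X @ theta)
    cross = -(y @ np.log(h)) - (1 - y) @ np.log(1 - h)
    return cross / m + lam / (2 * m) * np.sum(theta[1:] ** 2)

def gradient(theta, X, y, lam):
    m = len(X)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += lam / m * theta[1:]  # regularize every parameter except theta_0
    return grad

# Toy data with an intercept column: negatives on the left, positives on the right
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

fmin = minimize(fun=cost, x0=np.zeros(2), args=(X, y, 1.0), method='TNC', jac=gradient)
print(fmin.x)  # learned [theta_0, theta_1]
pred = (sigmoid(X @ fmin.x) >= 0.5).astype(int)
print(pred)  # [0 0 1 1]
```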

The complete classifier function is:

from scipy.optimize import minimize
def one_vs_all(X, y, num_labels, learning_rate):
    rows = X.shape[0]
    params = X.shape[1]
    # k X (n + 1) array for the parameters of each of the k classifiers
    all_theta = np.zeros((num_labels, params + 1))
    # insert a column of ones at the beginning for the intercept term
    tmpX = np.insert(X, 0, values=np.ones(rows), axis=1)
    # labels are 1-indexed instead of 0-indexed
    for i in range(1, num_labels + 1):
        theta = np.zeros(params + 1)
        y_i = np.array([1 if label == i else 0 for label in y])
        y_i = np.reshape(y_i, (rows, 1))
        # minimize the objective function
        fmin = minimize(fun=cost, x0=theta, args=(tmpX, y_i, learning_rate), method='TNC', jac=gradient)
        all_theta[i-1,:] = fmin.x
    return all_theta

A column of ones (for the intercept term) is inserted with `tmpX = np.insert(X, 0, values=np.ones(rows), axis=1)`.
`axis=1` means the insertion happens along the columns: at column $0$, a column filled with $1$s is added.
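A small illustration of that `np.insert` call on a made-up $3 \times 2$ array, compared with explicit horizontal stacking, plus what `axis=0` would do instead.

```python
import numpy as np

X = np.array([[2.0, 3.0],
              [4.0, 5.0],
              [6.0, 7.0]])
rows = X.shape[0]

tmpX = np.insert(X, 0, values=np.ones(rows), axis=1)  # new column 0 filled with ones
print(tmpX)
# [[1. 2. 3.]
#  [1. 4. 5.]
#  [1. 6. 7.]]

# Same result with explicit horizontal stacking
print(np.array_equal(tmpX, np.hstack([np.ones((rows, 1)), X])))  # True

# axis=0 would instead insert a new row of ones (one value per column)
print(np.insert(X, 0, values=np.ones(X.shape[1]), axis=0).shape)  # (4, 2)
```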

Checking the results

The statement below lists which labels occur, so the same code can be extended to problems with more classes.

np.unique(data['y'])  # see which labels there are
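`np.unique` can also report how often each label occurs, and the number of distinct labels is what `one_vs_all` needs as `num_labels`. The label column below is a made-up stand-in for `data['y']`. In this dataset the labels run from 1 to 10 (the course maps the digit 0 to label 10), which is why the training loop uses `range(1, num_labels + 1)`.

```python
import numpy as np

# Stand-in for data['y']: a (rows, 1) column of labels
y = np.array([[10], [1], [1], [2], [10], [10]])

labels, counts = np.unique(y, return_counts=True)  # np.unique flattens the column
print(labels)  # [ 1  2 10]
print(counts)  # [2 1 3]
num_labels = len(labels)  # number of classes to pass to one_vs_all
print(num_labels)  # 3
```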

Finally, we predict the labels of a data set.

def predict_all(X, all_theta):
    # INPUT: data X, trained parameters all_theta (one row per class)
    # OUTPUT: predicted labels, shape (rows, 1)
    # STEP1: get the matrix dimensions
    rows = X.shape[0]
    params = X.shape[1]
    num_labels = all_theta.shape[0]
    # STEP2: insert a column of ones into X for the intercept term
    tmpX = np.insert(X, 0, values=np.ones(rows), axis=1)
    # STEP3: convert X and all_theta to numpy matrices
    tmpX = np.matrix(tmpX)
    all_theta = np.matrix(all_theta)
    # STEP4: probability that each sample belongs to each class, shape (rows, num_labels)
    h = sigmoid(tmpX*all_theta.T)
    # STEP5: index of the class with the highest probability for each sample
    h_argmax = np.argmax(h, axis=1)
    # STEP6: column indices are 0-based while the labels start at 1, so add 1
    h_argmax = h_argmax + 1
    return h_argmax

`np.argmax(h, axis=1)` returns, for each row, the index of the column holding the largest value (the most likely class for each sample).
To find the maximum over the rows of each column instead, use `axis=0`.
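A toy probability matrix (values made up) makes the difference between the two axes concrete, and shows the `+ 1` shift from column index to label used in `predict_all`.

```python
import numpy as np

# Rows = samples, columns = classes 1..3 (stand-in probabilities from three classifiers)
h = np.array([[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.3],
              [0.2, 0.9, 0.4]])

print(np.argmax(h, axis=1))      # [1 0 1]  best column (class index) for each row (sample)
print(np.argmax(h, axis=1) + 1)  # [2 1 2]  0-based index shifted to 1-based label
print(np.argmax(h, axis=0))      # [1 2 2]  best row (sample) for each column (class)

# With np.matrix, as in predict_all, the result stays a (rows, 1) column
print(np.argmax(np.matrix(h), axis=1).shape)  # (3, 1)
```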

Accuracy

We simply feed the training data back in and compare the predictions with the true labels.

y_pred = predict_all(data['X'], all_theta)
correct = [1 if a == b else 0 for (a, b) in zip(y_pred, data['y'])]
accuracy = (sum(map(int, correct)) / float(len(correct)))
print ('accuracy = {0}%'.format(accuracy * 100))

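As before, `correct` is built with a list comprehension; `zip` iterates over the predictions and the labels in parallel.
`map(f, iterable)` applies the function `f` to every element of `iterable` (in Python 3 it returns a lazy iterator rather than a list, which `sum` then consumes).
We count the correct predictions and divide by the total number of examples.
`{0}` is a Python format-string placeholder; the number gives the position of the argument passed to `format`.
The resulting accuracy should be 94.46%.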
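The same accuracy can be computed without a Python loop by flattening both arrays and averaging the element-wise comparison. This sketch uses a few made-up predictions (as a `(rows, 1)` matrix, like `predict_all` returns) and labels, and checks that both versions agree.

```python
import numpy as np

# Stand-ins: predictions shaped like predict_all's output, labels shaped like data['y']
y_pred = np.matrix([[1], [2], [10], [3]])
y = np.array([[1], [2], [10], [4]])

# Loop version from the text
correct = [1 if a == b else 0 for (a, b) in zip(y_pred, y)]
acc_loop = sum(map(int, correct)) / float(len(correct))

# Vectorized: flatten both to 1-D, compare element-wise, average the booleans
acc_vec = np.mean(np.asarray(y_pred).ravel() == y.ravel())
print(acc_loop, acc_vec)  # 0.75 0.75
```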

Author: mxYlulu

