Solving Logistic Regression with Gradient Descent

Ella · Updated: 2024-09-21

Contents
- Case overview
- Data visualization
- Building the classifier
  - sigmoid function: mapping to a probability
  - model function: returning predicted values
  - cost: computing the loss from the parameters
  - gradient: computing the gradient for each parameter
  - descent: updating the parameters
- Accuracy

Case overview

References
- Logistic regression functions
- Python Data Analysis and Machine Learning: Logistic Regression Case Study
Case content

We have a dataset containing students' scores on two exams.

Using this data, we build a logistic regression model to predict a student's probability of admission.

Data content: each applicant's scores on the two exams and the admission decision.

```python
# Imports
import os
import warnings                      # warning handling

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
# In newer scikit-learn this class lives in sklearn.exceptions
from sklearn.linear_model.coordinate_descent import ConvergenceWarning  # warning handling

%matplotlib inline

# Chinese font for plot text (pass as fontproperties=my_font / prop=my_font)
my_font = FontProperties(fname="/usr/share/fonts/chinese/simsun.ttc")
# Make the minus sign display correctly
mpl.rcParams["axes.unicode_minus"] = False
# Silence convergence warnings
warnings.filterwarnings(action='ignore', category=ConvergenceWarning)

# Load the data from a local file
path = '/root/zhj/python3/code/data/LogiReg_data.txt'
# header=0 tells pandas to treat row 0 (the first line of the file) as the column names.
# When header is not given but names is supplied, pandas behaves as if header=None,
# i.e. no line is consumed as a header and the columns take the supplied names.
pdData = pd.read_csv(path, header=0, names=['test1', 'test2', 'result'])

# To see the difference the header argument makes, compare the resulting shapes
pdData1 = pd.read_csv(path, names=['test1', 'test2', 'result'])
pdData2 = pd.read_csv(path, header=None, names=['test1', 'test2', 'result'])

print("First 10 rows")
print("=" * 31)
print(pdData.head(10))
print("=" * 31)
print("Data shapes")
print("shape with header=0:", pdData.shape)
print("shape with default header:", pdData1.shape)
print("shape with header=None:", pdData2.shape)
```

```
First 10 rows
===============================
       test1      test2  result
0  30.286711  43.894998       0
1  35.847409  72.902198       0
2  60.182599  86.308552       1
3  79.032736  75.344376       1
4  45.083277  56.316372       0
5  61.106665  96.511426       1
6  75.024746  46.554014       1
7  76.098787  87.420570       1
8  84.432820  43.533393       1
9  95.861555  38.225278       0
===============================
Data shapes
shape with header=0: (99, 3)
shape with default header: (100, 3)
shape with header=None: (100, 3)
```

Because the file has no header line, header=0 makes pandas consume the first data record as column names (which `names` then overwrites), so that variant ends up with 99 rows instead of 100.

Data visualization

```python
# Split the data into two classes according to result
positive = pdData[pdData['result'] == 1]   # rows with result == 1
negative = pdData[pdData['result'] == 0]   # rows with result == 0

# Figure size and resolution
fig, ax = plt.subplots(figsize=(20, 8), dpi=80)

# Scatter plots -- s: marker size (default 20); c: colour; marker: marker shape; label: legend label
ax.scatter(positive['test1'], positive['test2'], s=30, c='b', marker='o', label='合格')    # admitted
ax.scatter(negative['test1'], negative['test2'], s=30, c='r', marker='v', label='不合格')  # not admitted

# Legend (rendered with the Chinese font)
ax.legend(prop=my_font)
ax.set_xlabel('test1 Score')  # x axis
ax.set_ylabel('test2 Score')  # y axis

plt.show()
```

(Figure output_4_0.png: scatter plot of the two exam scores, admitted vs. not admitted)

Building the classifier

sigmoid function: mapping to a probability

See also: an introduction to the sigmoid function.
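For reference, this is the function implemented by the code below, and the hypothesis it will later be plugged into (standard logistic-regression notation, added here; the symbols g, z, h_theta and x are not spelled out in the original post):

$$
g(z) = \frac{1}{1 + e^{-z}}, \qquad
h_\theta(x) = g(\theta^{\mathsf T} x) = \frac{1}{1 + e^{-\theta^{\mathsf T} x}}
$$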

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Plot the sigmoid curve
# Integers from -10 to 10 (start inclusive, stop exclusive), step 1, i.e. [-10, -9, ..., 8, 9]
nums = np.arange(-10, 10, step=1)
print(nums)

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(nums, sigmoid(nums), 'r')
plt.show()
```

```
[-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9]
```

(Figure output_7_1.png: the sigmoid curve over the range [-10, 10))
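As a quick sanity check of the function just defined (a hypothetical usage example, not part of the original notebook): sigmoid is exactly 0.5 at zero and saturates towards 0 and 1 at the extremes.

```python
print(sigmoid(0))                     # 0.5
print(sigmoid(np.array([-10, 10])))   # roughly [4.54e-05, 0.99995]
```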

model function: returning predicted values

```python
def model(X, theta):
    # Predicted probability: sigmoid of the linear score X @ theta
    return sigmoid(np.matmul(X, theta))

# Add a column of ones so that theta[0] plays the role of the intercept
pdData.insert(0, 'Ones', 1)
pdData.head()

# Convert the DataFrame to a NumPy array
# (as_matrix() is deprecated in newer pandas; .to_numpy() is the modern equivalent)
orig_data = pdData.as_matrix()
print(orig_data)

cols = orig_data.shape[1]
print(cols)

X = orig_data[:, 0:cols-1]        # features: the Ones column plus the two exam scores
y = orig_data[:, cols-1:cols]     # label: the admission result
theta = np.zeros([cols-1, 1])     # parameters, initialised to zero
```

Output (the printed 99 × 4 array is abridged here):

```
[[  1.          30.28671077  43.89499752   0.        ]
 [  1.          35.84740877  72.90219803   0.        ]
 [  1.          60.18259939  86.3085521    1.        ]
 ...
 [  1.          55.34001756  64.93193801   1.        ]
 [  1.          74.775893    89.5298129    1.        ]]
4
```

```python
X[:5]
# array([[  1.        ,  30.28671077,  43.89499752],
#        [  1.        ,  35.84740877,  72.90219803],
#        [  1.        ,  60.18259939,  86.3085521 ],
#        [  1.        ,  79.03273605,  75.34437644],
#        [  1.        ,  45.08327748,  56.31637178]])

y[:5]
# array([[ 0.],
#        [ 0.],
#        [ 1.],
#        [ 1.],
#        [ 0.]])

theta
# array([[ 0.],
#        [ 0.],
#        [ 0.]])

X.shape, y.shape, theta.shape
# ((99, 3), (99, 1), (3, 1))
```

cost: computing the loss from the parameters

The goal is to minimise the average loss.
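For reference, these are the average cross-entropy loss that costFunction computes below and the per-parameter gradient that gradient computes (standard notation added here, with m the number of samples, matching len(X) in the code):

$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log h_\theta\big(x^{(i)}\big) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big]
$$

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}
$$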

```python
def costFunction(X, y, theta):
    # np.multiply is element-wise multiplication (same as *)
    left = np.multiply(-y, np.log(model(X, theta)))
    right = np.multiply((1 - y), np.log(1 - model(X, theta)))
    return np.sum(left - right) / (len(X))

costFunction(X, y, theta)
```

```
0.69314718055994529
```

gradient: computing the gradient for each parameter

```python
def gradient(X, y, theta):
    grad = np.zeros(theta.shape)
    # X.T @ (h(x) - y), averaged over the batch
    error = np.matmul(X.T, (model(X, theta) - y))
    grad = error / len(X)
    return grad

gradient(X, y, theta)
```

```
array([[ -0.10606061],
       [-12.30538878],
       [-11.77067239]])
```

descent: updating the parameters

```python
# First define three stopping strategies: iteration count, change in cost, gradient norm
STOP_ITER = 0
STOP_COST = 1
STOP_GRAD = 2

def stopCriterion(type, value, threshold):
    # Evaluate the chosen stopping strategy
    if type == STOP_ITER:
        return value > threshold
    elif type == STOP_COST:
        return abs(value[-1] - value[-2]) < threshold
    elif type == STOP_GRAD:
        return np.linalg.norm(value) < threshold

import time

def descent(data, theta, batchSize, stopType, thresh, alpha):
    # Solve by gradient descent
    init_time = time.time()
    i = 0  # iteration counter
    k = 0  # index of the current batch
    X, y = shuffleData(data)
    grad = np.zeros(theta.shape)          # gradient
    costs = [costFunction(X, y, theta)]   # loss history

    while True:
        grad = gradient(X[k:k+batchSize], y[k:k+batchSize], theta)
        k += batchSize                    # move on by one batch of batchSize samples
        if k >= n:                        # n is the global sample count set before running
            k = 0
            X, y = shuffleData(data)      # reshuffle
        theta = theta - alpha * grad      # parameter update
        costs.append(costFunction(X, y, theta))  # recompute the loss
        i += 1

        if stopType == STOP_ITER:
            value = i
        elif stopType == STOP_COST:
            value = costs
        elif stopType == STOP_GRAD:
            value = grad
        if stopCriterion(stopType, value, thresh):
            break

    return theta, i - 1, costs, grad, time.time() - init_time

import numpy.random

# Shuffle the data
def shuffleData(data):
    np.random.shuffle(data)
    cols = data.shape[1]
    X = data[:, 0:cols-1]
    y = data[:, cols-1:]
    return X, y

def runExpe(data, theta, batchSize, stopType, thresh, alpha):
    #import pdb; pdb.set_trace();
    theta, iter, costs, grad, dur = descent(data, theta, batchSize, stopType, thresh, alpha)
    name = "Original" if (data[:, 1] > 2).sum() > 1 else "Scaled"
    name += " data - learning rate: {} - ".format(alpha)
    if batchSize == n:
        strDescType = "Gradient"
    elif batchSize == 1:
        strDescType = "Stochastic"
    else:
        strDescType = "Mini-batch ({})".format(batchSize)
    name += strDescType + " descent - Stop: "
    if stopType == STOP_ITER:
        strStop = "{} iterations".format(thresh)
    elif stopType == STOP_COST:
        strStop = "costs change < {}".format(thresh)
    else:
        strStop = "gradient norm < {}".format(thresh)
    name += strStop
    print("***{}\nTheta: {} - Iter: {} - Last cost: {:03.2f} - Duration: {:03.2f}s".format(
        name, theta, iter, costs[-1], dur))
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(np.arange(len(costs)), costs, 'r')
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Cost')
    ax.set_title(name.upper() + ' - Error vs. Iteration')
    return theta

# First experiment: batch gradient descent, i.e. every update uses all samples
n = 100   # sample count used by descent (the loaded data actually has 99 rows because of header=0)
runExpe(orig_data, theta, n, STOP_ITER, thresh=5000, alpha=0.000001)
```

```
***Original data - learning rate: 1e-06 - Gradient descent - Stop: 5000 iterations
Theta: [[-0.00026672]
 [ 0.00668449]
 [ 0.00450233]] - Iter: 5000 - Last cost: 0.63 - Duration: 0.95s

array([[-0.00026672],
       [ 0.00668449],
       [ 0.00450233]])
```

(Figure output_24_2.png: cost vs. iteration for batch gradient descent, 5000 iterations, learning rate 1e-06)

```python
# Stop on the change in cost, with threshold 1e-6
runExpe(orig_data, theta, n, STOP_COST, thresh=0.000001, alpha=0.001)
```

```
***Original data - learning rate: 0.001 - Gradient descent - Stop: costs change < 1e-06
Theta: [[-5.09232519]
 [ 0.04627476]
 [ 0.04185042]] - Iter: 109155 - Last cost: 0.38 - Duration: 20.83s

array([[-5.09232519],
       [ 0.04627476],
       [ 0.04185042]])
```

(Figure output_25_2.png: cost vs. iteration when stopping on cost change < 1e-06)

```python
# Compare different flavours of gradient descent: stochastic descent (batch size 1)
runExpe(orig_data, theta, 1, STOP_ITER, thresh=5000, alpha=0.001)
```

```
***Original data - learning rate: 0.001 - Stochastic descent - Stop: 5000 iterations
Theta: [[ nan]
 [ nan]
 [ nan]] - Iter: 5000 - Last cost: nan - Duration: 0.22s

array([[ nan],
       [ nan],
       [ nan]])
```

(Figure output_26_3.png: cost vs. iteration for stochastic gradient descent, learning rate 0.001)

Very unstable: the cost even blows up to NaN, most likely because the noisy single-sample updates drive the predictions into saturation, where log(0) appears in the cost. Let's try a smaller learning rate.

```python
runExpe(orig_data, theta, 1, STOP_ITER, thresh=15000, alpha=0.000002)
```

```
***Original data - learning rate: 2e-06 - Stochastic descent - Stop: 15000 iterations
Theta: [[ nan]
 [ nan]
 [ nan]] - Iter: 15000 - Last cost: nan - Duration: 1.12s

array([[ nan],
       [ nan],
       [ nan]])
```

(Figure output_28_3.png: cost vs. iteration for stochastic gradient descent, learning rate 2e-06)

Stochastic descent is fast, but unstable, and it needs a very small learning rate. Next, try mini-batch descent with a batch size of 16.

```python
runExpe(orig_data, theta, 16, STOP_ITER, thresh=15000, alpha=0.001)
```

```
***Original data - learning rate: 0.001 - Mini-batch (16) descent - Stop: 15000 iterations
Theta: [[-1.03569128]
 [ 0.02012935]
 [ 0.00863927]] - Iter: 15000 - Last cost: 0.56 - Duration: 1.16s

array([[-1.03569128],
       [ 0.02012935],
       [ 0.00863927]])
```

(Figure output_30_2.png: cost vs. iteration for mini-batch (16) descent on the original data)

The cost still fluctuates quite a lot, so let's try standardising the data: for each attribute (column), subtract its mean and divide by its standard deviation. The result is that, for every attribute/column, the values are centred around 0 with variance 1.
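For intuition, the sklearn.preprocessing.scale call used below is roughly equivalent to the following manual computation (a minimal sketch assuming NumPy arrays; the `standardize` helper and the commented usage line are illustrative and not part of the original notebook, and the library additionally handles edge cases such as zero-variance columns):

```python
import numpy as np

def standardize(X):
    """Column-wise standardisation: zero mean, unit variance per column."""
    mean = X.mean(axis=0)   # per-column mean
    std = X.std(axis=0)     # per-column (population) standard deviation, as pp.scale uses
    return (X - mean) / std

# Hypothetical usage on the two exam-score columns:
# scaled_scores = standardize(orig_data[:, 1:3])
```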

```python
from sklearn import preprocessing as pp

# Standardise the two score columns (columns 1 and 2); the Ones column and the label stay as they are
scaled_data = orig_data.copy()
scaled_data[:, 1:3] = pp.scale(orig_data[:, 1:3])

runExpe(scaled_data, theta, n, STOP_ITER, thresh=5000, alpha=0.001)
```

```
***Scaled data - learning rate: 0.001 - Gradient descent - Stop: 5000 iterations
Theta: [[ 0.32653044]
 [ 0.84802277]
 [ 0.78686591]] - Iter: 5000 - Last cost: 0.38 - Duration: 1.02s

array([[ 0.32653044],
       [ 0.84802277],
       [ 0.78686591]])
```

(Figure output_32_2.png: cost vs. iteration for batch gradient descent on the standardised data)

With the raw data the cost only got down to about 0.6, whereas now it drops below 0.4, so preprocessing the data is very important.

```python
runExpe(scaled_data, theta, n, STOP_GRAD, thresh=0.02, alpha=0.001)
```

```
***Scaled data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.02
Theta: [[ 1.10868347]
 [ 2.57412148]
 [ 2.41283358]] - Iter: 58762 - Last cost: 0.22 - Duration: 12.75s

array([[ 1.10868347],
       [ 2.57412148],
       [ 2.41283358]])
```

(Figure output_34_2.png: cost vs. iteration when stopping on gradient norm < 0.02, standardised data)

More iterations drive the loss down further.

```python
runExpe(scaled_data, theta, 16, STOP_GRAD, thresh=0.002*2, alpha=0.001)
```

```
***Scaled data - learning rate: 0.001 - Mini-batch (16) descent - Stop: gradient norm < 0.004
Theta: [[ 1.07757538]
 [ 2.51112557]
 [ 2.34671716]] - Iter: 54137 - Last cost: 0.22 - Duration: 5.31s

array([[ 1.07757538],
       [ 2.51112557],
       [ 2.34671716]])
```

(Figure output_36_2.png: cost vs. iteration for mini-batch (16) descent on standardised data, stopping on gradient norm < 0.004)

```python
# Set the decision threshold: predict 1 when the predicted probability is at least 0.5
def predict(X, theta):
    return [1 if x >= 0.5 else 0 for x in model(X, theta)]
```

Accuracy

```python
scaled_X = scaled_data[:, :3]
y = scaled_data[:, 3]
predictions = predict(scaled_X, theta)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0
           for (a, b) in zip(predictions, y)]
accuracy = sum(map(int, correct)) * 100 // len(correct)   # percentage of correct predictions
print('accuracy = {0}%'.format(accuracy))
```

```
accuracy = 60%
```
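One caveat: runExpe returns the fitted parameters, but none of the calls above assign that return value back, and descent does not modify theta in place, so the accuracy above is computed with whatever theta currently holds, possibly still the all-zero initial value (for which every prediction is 1). To evaluate the parameters actually learned on the scaled data, one could presumably capture the return value first; a minimal sketch under that assumption (the name fitted_theta is hypothetical, and the functions defined above are assumed to be in scope):

```python
# Re-run the mini-batch experiment and keep the fitted parameters
fitted_theta = runExpe(scaled_data, theta, 16, STOP_GRAD, thresh=0.002 * 2, alpha=0.001)

# Re-extract features and labels (descent shuffles scaled_data in place) and score the fit
scaled_X = scaled_data[:, :3]
y = scaled_data[:, 3]
predictions = predict(scaled_X, fitted_theta)
correct = [int(p == t) for p, t in zip(predictions, y)]
print('accuracy = {0}%'.format(sum(correct) * 100 // len(correct)))
```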
Author: Zeng Hongju



Tags: logistic regression · gradient descent · regression · gradient
