XGBoost Built-in Modeling Interface in Detail (Part 1)

Olinda · Updated: 2024-11-11

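Features of the built-in modeling interface
1. Cross-validation
2. Cross-validation with preprocessing
3. Custom loss function and evaluation metric
4. Prediction using only the first n trees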
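
Item 4 is listed here but not exercised by the script in this part. As a rough illustration of the idea only: a boosted model's raw score is the sum of its trees' outputs, so predicting with the first n trees means truncating that sum before applying the sigmoid. The helper `predict_with_first_n` and the leaf values below are made up for the sketch and are not part of XGBoost. In XGBoost itself, recent releases expose this through the `iteration_range` argument of `Booster.predict`, while older releases used `ntree_limit`.

```python
import math


def sigmoid(margin):
    """Map a raw margin (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def predict_with_first_n(tree_margins, n=None, base_margin=0.0):
    """Hypothetical helper: sum the first n per-tree outputs, then apply the sigmoid.

    tree_margins holds one sample's raw leaf outputs in boosting order.
    n=None uses every tree.
    """
    used = tree_margins if n is None else tree_margins[:n]
    return sigmoid(base_margin + sum(used))


# Made-up leaf outputs for a single sample from a 5-round model
tree_margins = [0.40, 0.25, 0.15, -0.05, 0.10]

full = predict_with_first_n(tree_margins)
first_two = predict_with_first_n(tree_margins, n=2)
print("all 5 trees:   %.4f" % full)
print("first 2 trees: %.4f" % first_two)
```

Comparing the truncated and full predictions on a validation set is a common way to judge whether the later boosting rounds still help.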

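#!/usr/bin/python
# Built-in modeling interface: cross-validation and advanced features
# (cross-validation with preprocessing, custom loss function and evaluation metric)
import warnings
warnings.filterwarnings("ignore")

import pickle

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# LibSVM-format sample data. Newer XGBoost releases expect the file format
# in the URI, e.g. './data/agaricus.txt.train?format=libsvm'.
dtrain = xgb.DMatrix('./data/agaricus.txt.train')
dtest = xgb.DMatrix('./data/agaricus.txt.test')

# Basic example: read data from a CSV file and do binary classification
# Read the data with pandas
data = pd.read_csv('./data/Pima-Indians-Diabetes.csv')

# Split into training and test sets
train, test = train_test_split(data)

# Convert to DMatrix format
feature_columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
                   'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']
target_column = 'Outcome'
# Pull out the numpy arrays to initialize the DMatrix objects
xgtrain = xgb.DMatrix(train[feature_columns].values, train[target_column].values)
xgtest = xgb.DMatrix(test[feature_columns].values, test[target_column].values)

# Parameter settings
# ('silent' has been deprecated since XGBoost 1.0; 'verbosity': 0 replaces it)
param = {'max_depth': 5,
         'eta': 0.1,
         'verbosity': 0,
         'subsample': 0.7,
         'colsample_bytree': 0.7,
         'objective': 'binary:logistic'}

# Set up a watchlist to monitor the model during training
watchlist = [(xgtest, 'eval'), (xgtrain, 'train')]
num_round = 10
bst = xgb.train(param, xgtrain, num_round, evals=watchlist)

# 1. Cross-validation
print(xgb.cv(param, dtrain, num_round, nfold=5, metrics={'error'}, seed=0))


# 2. Cross-validation with preprocessing
# Compute the negative-to-positive sample ratio on each fold's training set
# and use it to reweight the positive class
def fpreproc(dtrain, dtest, param):
    label = dtrain.get_label()
    ratio = float(np.sum(label == 0)) / np.sum(label == 1)
    param['scale_pos_weight'] = ratio
    return (dtrain, dtest, param)


# Run the preprocessing first (compute the class weight), then cross-validate
print(xgb.cv(param, dtrain, num_round, nfold=5,
             metrics={'auc'}, seed=0, fpreproc=fpreproc))

# 3. Custom loss function and evaluation metric
print("Running cross-validation with a custom loss function")


# A custom loss function must return the first and second derivatives
# of the loss with respect to the raw prediction (margin)
def logregobj(preds, dtrain):
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # margin -> probability
    grad = preds - labels
    hess = preds * (1.0 - preds)
    return grad, hess


# A custom evaluation metric measures the gap between predictions and labels.
# With a custom objective, preds are raw margins, so the threshold is 0.0.
def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    return 'error', float(sum(labels != (preds > 0.0))) / len(labels)


watchlist = [(dtest, 'eval'), (dtrain, 'train')]
param = {'max_depth': 3, 'eta': 0.1, 'verbosity': 0}
num_round = 5

# Train with the custom loss function
# (XGBoost 1.6+ offers custom_metric as the replacement for feval)
bst = xgb.train(param, dtrain, num_round, evals=watchlist,
                obj=logregobj, feval=evalerror)

# Cross-validation
xgb.cv(param, dtrain, num_round, nfold=5, seed=0,
       obj=logregobj, feval=evalerror)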
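
The `logregobj` function in the script returns `preds - labels` as the gradient and `preds * (1.0 - preds)` as the Hessian. A quick way to convince yourself these are the derivatives of the log loss with respect to the raw margin is a finite-difference check. The sketch below uses only the standard library; the helper names and test points are chosen for illustration and are not part of XGBoost.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logloss(margin, label):
    """Binary log loss as a function of the raw margin."""
    p = sigmoid(margin)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def analytic_grad_hess(margin, label):
    """Same formulas as logregobj, for a single sample."""
    p = sigmoid(margin)
    return p - label, p * (1.0 - p)


def numeric_grad_hess(margin, label, eps=1e-4):
    """Central finite differences for the first and second derivatives."""
    f_plus = logloss(margin + eps, label)
    f_zero = logloss(margin, label)
    f_minus = logloss(margin - eps, label)
    grad = (f_plus - f_minus) / (2.0 * eps)
    hess = (f_plus - 2.0 * f_zero + f_minus) / (eps * eps)
    return grad, hess


# Arbitrary (margin, label) test points
cases = [(-2.0, 0.0), (-0.5, 1.0), (0.0, 1.0), (1.5, 0.0), (3.0, 1.0)]

max_grad_gap = 0.0
max_hess_gap = 0.0
for margin, label in cases:
    g_a, h_a = analytic_grad_hess(margin, label)
    g_n, h_n = numeric_grad_hess(margin, label)
    max_grad_gap = max(max_grad_gap, abs(g_a - g_n))
    max_hess_gap = max(max_hess_gap, abs(h_a - h_n))

print("largest gradient gap:", max_grad_gap)
print("largest Hessian gap: ", max_hess_gap)
```

The same reasoning explains the `preds > 0.0` threshold in `evalerror`: a margin of 0 corresponds to a probability of 0.5 after the sigmoid.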
Author: 小菜鸡一号



