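Differences from the built-in (native) modeling approach used in the previous two posts:
Estimator-style modeling: initialize the model: xgb_classifier = xgb.XGBClassifier(parameters)
Fit the model: xgb_classifier.fit(x, y)
Predict with the model: xgb_classifier.predict(test_x)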
Built-in (native) modeling: set the parameters:
param = {'max_depth': 5, 'eta': 0.1, 'verbosity': 0, 'subsample': 0.7, 'colsample_bytree': 0.7, 'objective': 'binary:logistic'}
Set up a watchlist to monitor the model during training:
watchlist = [(xgtest, 'eval'), (xgtrain, 'train')]
num_round = 10
bst = xgb.train(param, xgtrain, num_round, evals=watchlist)
Predict with the model: preds = bst.predict(xgtest)
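One practical difference between the two prediction calls: with the 'binary:logistic' objective, bst.predict returns probabilities, whereas XGBClassifier.predict returns class labels. A minimal sketch of turning probabilities into labels and an error rate follows. The preds and labels arrays are made-up values standing in for bst.predict(xgtest) and the true test labels, and the 0.5 cutoff is an assumption, not something the original posts prescribe.

```python
import numpy as np

# Hypothetical stand-in for the output of bst.predict(xgtest):
# one probability of the positive class per test row.
preds = np.array([0.12, 0.81, 0.47, 0.66])

# Hypothetical true labels for the same rows.
labels = np.array([0, 1, 1, 1])

# Assumed decision threshold of 0.5: probabilities above it become class 1.
pred_labels = (preds > 0.5).astype(int)

# Fraction of rows where the thresholded prediction disagrees with the label.
error_rate = float((pred_labels != labels).mean())
print('Error rate: %f' % error_rate)
```

The same error-rate expression works unchanged on the output of xgb_classifier.predict, since that is already a label array.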
Advantages of the built-in approach: 1. Support for custom loss functions (see the next section)
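To give a rough idea of what a custom loss looks like, here is a minimal sketch, not the version from the next section. A custom objective for xgb.train is a function taking (preds, dtrain) and returning the per-row gradient and Hessian of the loss. The names logistic_grad_hess and logregobj are illustrative choices, and the example simply re-implements the log loss by hand. With a custom objective, preds are raw scores before the sigmoid is applied.

```python
import numpy as np


def logistic_grad_hess(raw_scores, labels):
    """Gradient and Hessian of the log loss with respect to raw scores."""
    probs = 1.0 / (1.0 + np.exp(-raw_scores))  # sigmoid of the raw score
    grad = probs - labels                      # first derivative
    hess = probs * (1.0 - probs)               # second derivative
    return grad, hess


def logregobj(preds, dtrain):
    """Wrapper in the (preds, dtrain) -> (grad, hess) form used by xgb.train."""
    return logistic_grad_hess(preds, dtrain.get_label())


# Check the math on hand-made values; no training is needed for this.
grad, hess = logistic_grad_hess(np.array([0.0, 2.0]), np.array([1.0, 0.0]))
print(grad, hess)
```

Assuming the xgtrain and watchlist objects from above, it could then be passed as bst = xgb.train(param, xgtrain, num_round, evals=watchlist, obj=logregobj).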
# Estimator-style modeling (sklearn interface)
#!/usr/bin/python
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import pickle
import xgboost as xgb
from sklearn.model_selection import train_test_split
import joblib
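# Basic example: read data from a CSV file and do binary classification
# Load the data with pandas
data = pd.read_csv('data/Pima-Indians-Diabetes.csv')
# Split the data into training and test sets
train,test = train_test_split(data)
# Extract the features X and the target y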
feature_columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']
target_column = 'Outcome'
train_X = train[feature_columns].values
train_y = train[target_column].values
test_X = test[feature_columns].values
test_y = test[target_column].values
# Initialize the model
xgb_classifier = xgb.XGBClassifier(n_estimators=20,
                                   max_depth=4,
                                   learning_rate=0.1,
                                   subsample=0.7,
                                   colsample_bytree=0.7)
# Fit the model
xgb_classifier.fit(train_X,train_y)
# Predict with the model
preds = xgb_classifier.predict(test_X)
# Compute the error rate
print('Error rate: %f' % ((preds != test_y).sum() / float(test_y.shape[0])))
# Save the model
joblib.dump(xgb_classifier,'data/0003.model')