XGBoost Study Notes

Esta · Updated: 2024-11-13


1. XGBoost parameter reference

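I have been competing on Kaggle lately, so here are some notes on XGBoost. They skip the theory and cover only how to use the xgboost library and how to tune some of its parameters.
See: XGBoost parameter reference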
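As a quick orientation before the examples, here is a minimal sketch of a parameter dictionary with a one-line note on what each entry controls. The values mirror the ones used in the cross-validation example later in these notes. The helper `suggest_scale_pos_weight` and the toy `labels` list are hypothetical, introduced only for illustration; the negative-to-positive ratio it computes is a common starting point for `scale_pos_weight`, not a rule.

```python
from collections import Counter


def suggest_scale_pos_weight(labels):
    """Hypothetical helper: ratio of negative (0) to positive (1) labels."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("labels contain no positive examples")
    return counts[0] / counts[1]


# Toy, imbalanced label list (assumption: 90 negatives, 9 positives)
labels = [0] * 90 + [1] * 9

params = {
    "objective": "binary:logistic",  # binary classification, outputs probabilities
    "eval_metric": "auc",            # metric reported during training and CV
    "max_depth": 3,                  # maximum depth of each tree
    "learning_rate": 0.01,           # shrinkage applied to each new tree (alias: eta)
    "subsample": 0.8,                # fraction of rows sampled for each tree
    "min_child_weight": 1,           # minimum sum of instance weights in a child node
    "scale_pos_weight": suggest_scale_pos_weight(labels),  # up-weights the positive class
}

print(params["scale_pos_weight"])  # 10.0
```

Smaller `learning_rate` values usually need more boosting rounds, which is why the two are normally tuned together.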

2. XGBoost in practice

Take the classic Titanic dataset as an example:

import pandas as pd
from xgboost import XGBClassifier
# GridSearchCV lives in sklearn.model_selection; the old
# sklearn.cross_validation and sklearn.grid_search modules no longer exist
from sklearn.model_selection import GridSearchCV

train = pd.read_csv("datasets/titanic_train.csv")
test = pd.read_csv("datasets/titanic_test.csv")

# Data cleaning: fill missing values and encode categorical columns as integers
def clean(titanic):
    titanic["age"] = titanic["Age"].fillna(titanic["Age"].median())
    # 1 if the passenger is a child (under 15), else 0; a missing age gives 0
    titanic["child"] = titanic["Age"].apply(lambda x: 1 if x < 15 else 0)
    titanic["sex"] = titanic["Sex"].apply(lambda x:1 if x=="male" else 0)
    titanic["Embarked"] = titanic["Embarked"].fillna("S")
    def embark(Embark):
        if Embark == "S":
            return 1
        elif Embark == "C":
            return 2
        else:
            return 3
    titanic["embarked"] = titanic["Embarked"].apply(embark)
    titanic["family"] = titanic["SibSp"]+titanic["Parch"]+1
    # 1 if a cabin is recorded; a missing value (NaN, or the placeholder "N") gives 0
    titanic["cabin"] = titanic["Cabin"].apply(lambda x: 0 if pd.isna(x) or x == "N" else 1)
    def getname(Name):
        if "Mrs" in str(Name):
            return 2
        elif "Mr" in str(Name):
            return 1
        else:
            return 0
    titanic["name"] = titanic["Name"].apply(getname)
    titanic["fare"] = titanic["Fare"].fillna(titanic["Fare"].median())
    return titanic
train_data = clean(train)
test_data = clean(test)
features = ["Pclass","sex","child","family","fare","embarked","cabin"]
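# verbosity=0 silences training logs (the older silent=True flag is deprecated)
clf = XGBClassifier(learning_rate=0.1, max_depth=2, verbosity=0, objective="binary:logistic")
# Grid over the number of trees and the tree depth
param_test = {
    "n_estimators": [30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50],
    "max_depth": [2, 3, 4, 5, 6, 7],
}
grid_search = GridSearchCV(estimator=clf, param_grid=param_test, scoring="accuracy", cv=5)
grid_search.fit(train_data[features], train_data["Survived"])
# cv_results_ holds the per-combination scores (grid_scores_ was removed from scikit-learn)
print(grid_search.cv_results_["mean_test_score"], grid_search.best_params_, grid_search.best_score_)
# predict() uses the best estimator, refit on the full training set
predict_data = grid_search.predict(test_data[features])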

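Using cross-validation in XGBoost
To guard against overfitting, use cross-validation with early stopping to choose the number of boosting rounds: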
import pandas as pd
import xgboost as xgb
# pathUtils and genData come from the author's own project, not from xgboost
from Utils import pathUtils
from Process.genBasicData import genData

train_data = genData(pathUtils.train_path)
test_data = genData(pathUtils.test_path)
param = {
    "max_depth": 3,
    "learning_rate": 0.01,
    "verbosity": 0,
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "scale_pos_weight": 10,
    "subsample": 0.8,
    "min_child_weight": 1,
}
# Every column except the ID and the label is a feature
features = [c for c in train_data.columns if c not in ["ID", "y"]]
dtrain = xgb.DMatrix(train_data[features], label=train_data["y"])
dtest = xgb.DMatrix(test_data[features])
# 10-fold CV; stop once the test AUC has not improved for 30 rounds
cv_res = xgb.cv(param, dtrain, num_boost_round=2000, early_stopping_rounds=30,
                nfold=10, metrics="auc", show_stdv=True)
print(cv_res)
# cv_res.shape[0] is the best number of boosting rounds
bst = xgb.train(param, dtrain, num_boost_round=cv_res.shape[0])
y_pre = bst.predict(dtest)
res = pd.concat([test_data[["ID"]], pd.DataFrame(y_pre, columns=["pred"])], axis=1)
res.to_csv(pathUtils.predict_root_path + "cv_res.csv", index=False)
Author: 攻城猿bilibili
