[Small Tool] An HDFS Path Information Collection Script


Contents: Preface / Shell Source / Execution / Generated Result / Notes

Preface

After taking the hit of the resource cutbacks, I finally accepted reality and decided to calm down and deal with the cluster in front of me, which looks well planned on the surface but is unreasonable almost everywhere.
On the Friday of my first week on the job, only a few days ago, one of the machines in our intelligent-data department went down because of a disk failure (according to the ops team a disk had died and the machine could no longer boot). At that point about 85 TB of the cluster's 100-odd TB of disk space had already been written, so storage was genuinely tight. When I noticed the node was down, my first reaction was to worry that HDFS failure recovery would fill up the remaining disks. A colleague suggested temporarily lowering the replication factor to 2 to avoid exactly that, but everyone was busy running around and the issue was still untouched by the end of the day (to be clear, I could not handle it myself because, as a new hire, I did not yet have the necessary permissions).
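For reference, a minimal sketch of how the replication factor of existing data could be lowered temporarily, assuming sufficient permissions; the warehouse path is only an example, not the path from our cluster:

# Example only: set replication to 2 for everything under the path and wait until it takes effect
# Newly written files still follow dfs.replication in hdfs-site.xml, so the default may need changing too
hdfs dfs -setrep -w 2 /user/hive/warehouse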
Sure enough, by Saturday night the cluster had ground to a halt: HDFS block recovery had filled every disk to more than 90%, and once usage crosses the resource threshold a node stops offering itself to YARN, so the cluster's compute resources were locked up and all scheduled jobs stopped.
It was past eleven at night when I was called in to fix it. Anyone who has dealt with this kind of situation knows there are really only two options: add capacity or delete data, and compared with expanding the cluster, deleting data is both simpler and faster. But every table in the company's Hive warehouse is an external table, so dropping tables in Hive frees nothing. I ended up hunting for large, deletable files with clumsy one-off commands such as hadoop dfs -du /path | awk '{s+=$1}END{print s/1024/1024/1024,"G"}', which cost a lot of time and effort before the incident was finally resolved. As a lesson learned, I wrote this path-information script right afterwards so that my colleagues and I can quickly see how files are distributed across HDFS and analyze this kind of problem faster.
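As an aside, a couple of equivalent ad-hoc checks; the path is a placeholder, and hdfs dfs is simply the non-deprecated form of hadoop dfs:

# Per-child usage in human-readable units
hdfs dfs -du -h /path
# Same total as the awk one-liner above, in GB
hdfs dfs -du /path | awk '{s+=$1} END {print s/1024/1024/1024, "G"}'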

Shell Source

#!/bin/bash
# Work directory: $(dirname $0) is the parent directory of the script being executed
workdir=$(cd $(dirname $0); pwd)
date=`date +%Y-%m-%d-%H:%M:%S`

init(){
    # Remove old output first so that each run produces fresh results
    rm -rf $workdir/hdfs_detail.txt
    touch $workdir/hdfs_detail.txt
    chmod 777 $workdir/hdfs_detail.txt
    rm -rf $workdir/path.txt
    touch $workdir/path.txt
    chmod 777 $workdir/path.txt
    echo "========================================================================================================" >> $workdir/hdfs_detail.txt
    echo " ___ _ __ _ __ __ " >> $workdir/hdfs_detail.txt
    echo " | | /\\ \"--------| | | / / | |------\" \"--------\" \\ \\ / / " >> $workdir/hdfs_detail.txt
    echo " | | / \\ / \"-------\" | | / / | |----\" | | \"----\" | \\ \\ / / " >> $workdir/hdfs_detail.txt
    echo " | | / /\\ \\ | | | | / / | | | | | | | | \\ \\ / / " >> $workdir/hdfs_detail.txt
    echo " | | / / \\ \\ | | | | / / | |----/ / | | | | \\ ^ / " >> $workdir/hdfs_detail.txt
    echo " | | / / \\ \\ | | | || | | |-----\" | | | | | | " >> $workdir/hdfs_detail.txt
    echo " | | / /------\\ \\ | | | | \\ \\ | | \\ \\ | | | | | | " >> $workdir/hdfs_detail.txt
    echo " __ | | / /--------\\ \\ | | | | \\ \\ | | \\ \\ | | | | | | " >> $workdir/hdfs_detail.txt
    echo " \\ \\___/ / / / \\ \\ \\ \"-------\" | | \\ \\ | | \\ \\ | \"----\" | | | TM。" >> $workdir/hdfs_detail.txt
    echo " \\_____/ /_/ \\_\\ \"--------| |_| \\_\\ |_| \\_\\ \"--------\" |_| 毛利老弟 " >> $workdir/hdfs_detail.txt
    echo "========================================================================================================" >> $workdir/hdfs_detail.txt
    echo "-----------------------------------------[HDFS明细探知]-------------------------------------------------" >> $workdir/hdfs_detail.txt
    echo "[Init Time]:$date" >> $workdir/hdfs_detail.txt
    echo "--" >> $workdir/hdfs_detail.txt
    echo "--" >> $workdir/hdfs_detail.txt
    # Note: sed -i '/1111/a\2222' a.txt appends 2222 after every line in a.txt that matches 1111
}

hdfs_collect(){
    echo " ----[ 汇总 ]---- " >> $workdir/hdfs_detail.txt
    echo "" >> $workdir/hdfs_detail.txt
    echo "| 大小 | 占用 | 当前目录 |" >> $workdir/hdfs_detail.txt
    hadoop dfs -ls / | awk '{print $8}' >> $workdir/path.txt
    hadoop dfs -du / | awk '{S+=$1}{M+=$2}END{printf "%-12s%-6s%-12s%-6s%-10s\n", S/1024/1024/1024/1024,"(T)",M/1024/1024/1024/1024,"(T)","根目录"}' >> $workdir/hdfs_detail.txt
    hadoop dfs -du / | awk '{printf "%-12s%-6s%-12s%-6s%-10s\n", $1/1024/1024/1024,"(G)",$2/1024/1024/1024,"(G)",$3}' >> $workdir/hdfs_detail.txt
    echo "" >> $workdir/hdfs_detail.txt
    echo "" >> $workdir/hdfs_detail.txt
}

hdfs_detail(){
    echo " ----[ 明细 ]---- " >> $workdir/hdfs_detail.txt
    echo "" >> $workdir/hdfs_detail.txt
    # Level-1 directories
    cat $workdir/path.txt | while read line
    do
        # Skip empty lines and a few cron bookkeeping paths
        if [ ${#line} != 0 ] && [ $line != "/auto_cron_flag" ] && [ $line != "/auto_cron_logs" ] && [ $line != "/auto_cron_script" ]; then
            # Size of each directory under the root
            hadoop dfs -du $line | awk '{S+=$1}{M+=$2}END{printf "%-0s%-12s%-6s%-12s%-6s%-10s\n","-- ", S/1024/1024/1024,"(G)",M/1024/1024/1024,"(G)","'$line'"}' >> $workdir/hdfs_detail.txt
            rm -rf $workdir/path1.txt
            touch $workdir/path1.txt
            chmod 777 $workdir/path1.txt
            hadoop fs -ls $line | awk '{print $8}' >> $workdir/path1.txt
            # Level-2 directories
            cat $workdir/path1.txt | while read line1
            do
                # Skip empty lines
                if [ ${#line1} != 0 ]; then
                    hadoop dfs -du $line1 | awk '{S+=$1}{M+=$2}END{printf "%-0s%-12s%-6s%-12s%-6s%-10s\n"," -- ", S/1024/1024/1024,"(G)",M/1024/1024/1024,"(G)","'$line1'"}' >> $workdir/hdfs_detail.txt
                    rm -rf $workdir/path2.txt
                    touch $workdir/path2.txt
                    chmod 777 $workdir/path2.txt
                    hadoop fs -ls $line1 | awk '{print $8}' >> $workdir/path2.txt
                    # Level-3 directories
                    cat $workdir/path2.txt | while read line2
                    do
                        # Skip empty lines
                        if [ ${#line2} != 0 ]; then
                            hadoop dfs -du $line2 | awk '{S+=$1}{M+=$2}END{printf "%-0s%-12s%-6s%-12s%-6s%-10s\n","  -- ", S/1024/1024/1024,"(G)",M/1024/1024/1024,"(G)","'$line2'"}' >> $workdir/hdfs_detail.txt
                            rm -rf $workdir/path3.txt
                            touch $workdir/path3.txt
                            chmod 777 $workdir/path3.txt
                            hadoop fs -ls $line2 | awk '{print $8}' >> $workdir/path3.txt
                            # Level-4 directories
                            cat $workdir/path3.txt | while read line3
                            do
                                # Skip empty lines
                                if [ ${#line3} != 0 ]; then
                                    hadoop dfs -du $line3 | awk '{S+=$1}{M+=$2}END{printf "%-0s%-12s%-6s%-12s%-6s%-10s\n","   -- ", S/1024/1024/1024,"(G)",M/1024/1024/1024,"(G)","'$line3'"}' >> $workdir/hdfs_detail.txt
                                fi
                            done
                        fi
                    done
                fi
            done
            echo "" >> $workdir/hdfs_detail.txt
        fi
    done
    rm -rf $workdir/path.txt
    rm -rf $workdir/path1.txt
    rm -rf $workdir/path2.txt
    rm -rf $workdir/path3.txt
}

init
hdfs_collect
hdfs_detail
echo "SUCCESS"

Execution

Copy and paste the source code into hdfs_detail.sh, then run:

sh hdfs_detail.sh
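If the report should be refreshed regularly, a minimal sketch of a crontab entry could look like the following; the 06:30 schedule and the /opt/scripts path are assumptions, and the cron user needs the Hadoop client on its PATH:

# Example only: regenerate hdfs_detail.txt every day at 06:30
30 6 * * * /bin/sh /opt/scripts/hdfs_detail.sh >> /opt/scripts/hdfs_detail_cron.log 2>&1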

Generated Result

After the run finishes (SUCCESS is printed on completion; note that interrupting the script with Ctrl+C also ends with SUCCESS being printed), a file named hdfs_detail.txt should appear in the same directory.
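If that unconditional SUCCESS bothers you, one small optional change, not part of the original script, is to trap the interrupt signal near the top of hdfs_detail.sh:

# Print a different marker and stop when the script is interrupted with Ctrl+C
trap 'echo "INTERRUPTED"; exit 1' INT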
The generated output looks roughly like this:

========================================================================================================
                 (ASCII-art banner printed by init(), omitted here)
========================================================================================================
-----------------------------------------[HDFS明细探知]-------------------------------------------------
[Init Time]:2019-11-06-16:10:25
--
--
 ----[ 汇总 ]----

| 大小 | 占用 | 当前目录 |
22.5673      (T)   67.8291      (T)   根目录
0            (G)   0            (G)   /*******
0.000814012  (G)   0.00244204   (G)   /*******
13.9856      (G)   41.9567      (G)   /*******
7.47824      (G)   22.4347      (G)   /*******
114.452      (G)   343.355      (G)   /*******
0            (G)   0            (G)   /*******
0            (G)   0            (G)   /*******
20357.2      (G)   61137.6      (G)   /*******
0.898082     (G)   3.06924      (G)   /*******
0            (G)   0            (G)   /*******
0.851672     (G)   2.55501      (G)   /*******
2614.86      (G)   7907.95      (G)   /*******
1.67638e-08  (G)   0.375        (G)   /*******


 ----[ 明细 ]----

-- 0            (G)   0            (G)   /*******

-- 114.452      (G)   343.355      (G)   /*******
 -- 114.452     (G)   343.355      (G)   /*******/Docker
  -- 79.268     (G)   237.804      (G)   /*******/Docker/serv172_20_23_22
   -- 0.0769568 (G)   0.23087      (G)   /*******/Docker/serv172_20_23_22/docker_allinone_serv172_20_23_22.tar.gz.20190228201901
   -- 26.5116   (G)   79.5347      (G)   /*******/Docker/serv172_20_23_22/docker_allinone_serv172_20_23_22.tar.gz.20191104093001
   -- 26.3217   (G)   78.965       (G)   /*******/Docker/serv172_20_23_22/docker_allinone_serv172_20_23_22.tar.gz.20191105093001
   -- 26.3578   (G)   79.0735      (G)   /*******/Docker/serv172_20_23_22/docker_allinone_serv172_20_23_22.tar.gz.20191106093001
  -- 35.1836    (G)   105.551      (G)   /*******/Docker/serv172_20_2_24
   -- 11.2747   (G)   33.824       (G)   /*******/Docker/serv172_20_2_24/docker_allinone_serv172_20_2_24.tar.gz.20191104093001
   -- 11.2812   (G)   33.8437      (G)   /*******/Docker/serv172_20_2_24/docker_allinone_serv172_20_2_24.tar.gz.20191105093001
   -- 11.7929   (G)   35.3786      (G)   /*******/Docker/serv172_20_2_24/docker_allinone_serv172_20_2_24.tar.gz.20191106093001
   -- 0.0625    (G)   0.1875       (G)   /*******/Docker/serv172_20_2_24/docker_tomcat.tar.gz.20190228201901
   -- 0.257463  (G)   0.772389     (G)   /*******/Docker/serv172_20_2_24/docker_tomcat.tar.gz.20191104093001
   -- 0.257463  (G)   0.772389     (G)   /*******/Docker/serv172_20_2_24/docker_tomcat.tar.gz.20191105093001
   -- 0.257463  (G)   0.772389     (G)   /*******/Docker/serv172_20_2_24/docker_tomcat.tar.gz.20191106093001

-- 0            (G)   0            (G)   /*******

-- 0            (G)   0            (G)   /*******
 -- 0           (G)   0            (G)   /*******/cmtest

Notes

The script only drills four levels down from the root (deep enough to reach the files of a partitioned table); I did not take it any further. If you need more depth you can keep adding levels following the same pattern (or see the recursive sketch below), just keep an eye on the performance cost. Feel free to leave a comment if you run into any problems along the way.
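For those who do want arbitrary depth, here is a minimal sketch of a recursive variant; the function name walk_dir, the indentation scheme, and the depth limit of 4 are my own choices, not part of the original script, and output goes to stdout so redirect it as needed:

#!/bin/bash
# walk_dir <path> <current depth> <max depth>
walk_dir(){
    local path=$1 depth=$2 max=$3
    [ "$depth" -gt "$max" ] && return
    # Same size aggregation as the original script; the "--" prefix is indented by depth
    local indent=$(printf '%*s' "$((depth - 1))" '')
    hadoop dfs -du "$path" | awk -v p="$path" -v i="$indent" \
        '{S+=$1; M+=$2} END {printf "%s-- %-12s%-6s%-12s%-6s%-10s\n", i, S/1024/1024/1024, "(G)", M/1024/1024/1024, "(G)", p}'
    # Do not descend into plain files
    hadoop fs -test -d "$path" || return
    hadoop fs -ls "$path" | awk '{print $8}' | while read -r child
    do
        [ -n "$child" ] && walk_dir "$child" "$((depth + 1))" "$max"
    done
}

walk_dir / 1 4    # four levels below the root, like the original script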


Author: Jack_Roy



Tags: program, tool, hdfs
