Hive 蚂蚁森林面试题训练

Bea ·
更新时间:2024-09-21
· 609 次阅读

背景说明:
以下表记录了用户每天的蚂蚁森林低碳生活领取的记录流水。
table_name:user_carbon
user_id data_dt low_carbon
用户 日期 减少碳排放(g)

蚂蚁森林植物换购表,用于记录申领环保植物所需要减少的碳排放量
table_name: plant_carbon
plant_id plant_name low_carbon
植物编号 植物名 换购植物所需要的碳

----题目
1.蚂蚁森林植物申领统计
问题:假设2017年1月1日开始记录低碳数据(user_low_carbon),假设2017年10月1日之前满足申领条件的用户都申领了一颗p004-胡杨,
剩余的能量全部用来领取“p002-沙柳” 。
统计在10月1日累计申领“p002-沙柳” 排名前10的用户信息;以及他比后一名多领了几颗沙柳。
得到的统计结果如下表样式:
user_id plant_count less_count(比后一名多领了几颗沙柳)
u_101 1000 100
u_088 900 400
u_103 500 …

1,将日期修改成date日期
select
user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon;t1

2,选择使用者在2017-10-1以前,前11的总排放量,并排序.
select
user_id,sum(low_carbon) sumcarbon
from
(select
user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon)t1
where
dat<=‘2017-10-1’
group by
user_id
order by
sumcarbon desc
limit 11;

3,在2的基础上进行排序
select
user_id,sumcarbon,rank()over(order by sumcarbon desc) rown
from
(select
user_id,sum(low_carbon) sumcarbon
from
(select
user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon)t1
where
dat<=‘2017-10-1’
group by
user_id
order by
sumcarbon desc) t2
limit 11;t3

4,计算每人领取树的数量

select
user_id,floor((sumcarbon-215)/19) treenum
from
(select
user_id,sumcarbon,rank()over(order by sumcarbon desc) rown
from
(select
user_id,sum(low_carbon) sumcarbon
from
(select
user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon)t1
where
dat<=‘2017-10-1’
group by
user_id
order by
sumcarbon desc) t2
limit 11) t3
order by treenum desc;

5,计算辅助列
select
user_id, treenum,lead(treenum,1,‘9999’) over(order by treenum desc) treenum2
from
(select
user_id,floor((sumcarbon-215)/19) treenum
from
(select
user_id,sumcarbon,rank()over(order by sumcarbon desc) rown
from
(select
user_id,sum(low_carbon) sumcarbon
from
(select
user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon)t1
where
dat<=‘2017-10-1’
group by
user_id
order by
sumcarbon desc) t2
limit 11) t3
order by treenum desc) t4
order by treenum desc;t5

6,计算索要结果
select
user_id, treenum,treenum-treenum2 diff
from
(select
user_id, treenum,lead(treenum,1,‘9999’) over(order by treenum desc) treenum2
from
(select
user_id,floor((sumcarbon-215)/19) treenum
from
(select
user_id,sumcarbon,rank()over(order by sumcarbon desc) rown
from
(select
user_id,sum(low_carbon) sumcarbon
from
(select
user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon)t1
where
dat<=‘2017-10-1’
group by
user_id
order by
sumcarbon desc) t2
limit 11) t3
order by treenum desc) t4
order by treenum desc) t6
limit 10;

2、蚂蚁森林低碳用户排名分析
问题:查询user_low_carbon表中每日流水记录,条件为:
用户在2017年,连续三天(或以上)的天数里,
每天减少碳排放(low_carbon)都超过100g的用户低碳流水。
需要查询返回满足以上条件的user_low_carbon表中的记录流水。
例如用户u_002符合条件的记录如下,因为2017/1/2~2017/1/5连续四天的碳排放量之和都大于等于100g:
seq(key) user_id data_dt low_carbon
xxxxx10 u_002 2017/1/2 150
xxxxx11 u_002 2017/1/2 70
xxxxx12 u_002 2017/1/3 30
xxxxx13 u_002 2017/1/3 80
xxxxx14 u_002 2017/1/4 150
xxxxx14 u_002 2017/1/5 101
备注:统计方法不限于sql、procedure、python,java等

1,求出每个用户每日碳排放量.并根据日期你进行升序排列,利用上一题的t1;
select
user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon;t1

select
user_id,dat,sum(low_carbon) sumcarbon
from
(select user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon )t1
group by
user_id,dat
having
sumcarbon>=100;t2

2,求出每个用户有几个日期达到要求
select
user_id ,dat,count(user_id) over(partition by user_id) cou
from
(select
user_id,dat,sum(low_carbon) sumcarbon
from
(select user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon )t1
group by
user_id,dat
having
sumcarbon>=100)t2;

select
user_id,dat,cou
from
(select
user_id ,dat,count(user_id) over(partition by user_id) cou
from
(select
user_id,dat,sum(low_carbon) sumcarbon
from
(select user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon )t1
group by
user_id,dat
having
sumcarbon>=100)t2)t3
where cou >=3;

3, 计算出每一行的前两行数值和后两行数值

select
user_id,dat,lag(dat,2) over(partition by user_id) a,lag(dat,1) over(partition by user_id) b,lead(dat,1) over(partition by user_id) c,lead(dat,2) over(partition by user_id) d
from
(select
user_id,dat,cou
from
(select
user_id ,dat,count(user_id) over(partition by user_id) cou
from
(select
user_id,dat,sum(low_carbon) sumcarbon
from
(select user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon )t1
group by
user_id,dat
having
sumcarbon>=100)t2)t3
where cou >=3)
t4;

4,得出结果
select
user_id,dat,datediff(dat,a) ,datediff(dat,b),datediff(dat,c) ,datediff(dat,d)
from
(select
user_id,dat,lag(dat,2) over(partition by user_id) a,lag(dat,1) over(partition by user_id) b,lead(dat,1) over(partition by user_id) c,lead(dat,2) over(partition by user_id) d
from
(select
user_id,dat,cou
from
(select
user_id ,dat,count(user_id) over(partition by user_id) cou
from
(select
user_id,dat,sum(low_carbon) sumcarbon
from
(select user_id,date_format(regexp_replace(data_dt,’/’,’-’),‘YYYY-MM-DD’) dat,low_carbon
from
user_carbon )t1
group by
user_id,dat
having
sumcarbon>=100)t2)t3
where cou >=3)
t4)
t5
where
(datediff(dat,a)=2 and datediff(dat,b)=1)
or
(datediff(dat,b)=1 and datediff(dat,c)=-1)
or
(datediff(dat,c)=-1 and datediff(dat,d)=-2)
;


作者:阿拉祖



面试题 蚂蚁森林 面试 训练 hive

需要 登录 后方可回复, 如果你还没有账号请 注册新账号