Implementing row_number as a Hive function

Source: Internet. Editor: 程序博客网. Time: 2024/06/10 11:37

Requirement:

For a batch of users, query the time and IP of each user's last three logins.

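The requirement boils down to taking the top n rows per group. The approach is to sort the data set by the uid field (ascending or descending) and by the time field descending, then walk through the sorted rows and apply the row_number function to the uid field: as long as uid is the same as in the previous row, the row_number value increases by 1, and when uid changes it restarts at 1. Finally, keep only the rows where row_number <= 3.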
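The sort-then-scan idea described above can be sketched in plain Java, outside Hive. This is only an illustration under assumptions: the `Login` class, its field names, and the sample data are hypothetical and not part of the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopNPerGroup {
    // Hypothetical stand-in for one login row (field names are assumptions).
    static final class Login {
        final String uid;
        final long time;
        final String ip;

        Login(String uid, long time, String ip) {
            this.uid = uid;
            this.time = time;
            this.ip = ip;
        }
    }

    // Returns at most n of the most recent logins for every uid.
    static List<Login> lastN(List<Login> logins, int n) {
        List<Login> sorted = new ArrayList<>(logins);
        // Step 1: sort by uid ascending, then by time descending.
        sorted.sort(Comparator.comparing((Login l) -> l.uid)
                .thenComparing(Comparator.comparingLong((Login l) -> l.time).reversed()));

        // Step 2: scan the sorted rows, numbering them within each uid.
        List<Login> result = new ArrayList<>();
        String previousUid = null;
        int rowNumber = 0;
        for (Login l : sorted) {
            if (!l.uid.equals(previousUid)) {
                // uid changed: start a new group.
                previousUid = l.uid;
                rowNumber = 0;
            }
            rowNumber++;
            // Step 3: keep only the first n rows of each group.
            if (rowNumber <= n) {
                result.add(l);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Login> logins = Arrays.asList(
                new Login("u1", 10L, "10.0.0.1"),
                new Login("u2", 5L, "10.0.1.1"),
                new Login("u1", 40L, "10.0.0.4"),
                new Login("u1", 20L, "10.0.0.2"),
                new Login("u1", 30L, "10.0.0.3"));
        for (Login l : lastN(logins, 3)) {
            System.out.println(l.uid + " " + l.time + " " + l.ip);
        }
    }
}
```

With this sample data, the oldest u1 login (time 10) is dropped, and u2 keeps its single row.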

row_number implementation

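import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Numbers consecutive rows that share the same key columns, restarting at 1
 * whenever any key column changes. The input must already be distributed and
 * sorted by those columns.
 */
public class RowNumber extends UDF {
    // Maximum number of key columns that can be compared.
    private static int MAX_VALUE = 50;
    // Key column values of the previous row. Static, so the state is shared
    // by every use of the function inside one task JVM.
    private static String comparedColumn[] = new String[MAX_VALUE];
    // Number that the next row of the current group will receive.
    private static int rowNum = 1;

    public int evaluate(Object... args) {
        String columnValue[] = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            // String.valueOf turns a NULL column into "null" instead of
            // throwing a NullPointerException (possible after the LEFT JOIN).
            columnValue[i] = String.valueOf(args[i]);
        }
        // Very first row: remember its key columns.
        if (rowNum == 1) {
            System.arraycopy(columnValue, 0, comparedColumn, 0, columnValue.length);
        }
        for (int i = 0; i < columnValue.length; i++) {
            if (!comparedColumn[i].equals(columnValue[i])) {
                // A key column changed: remember the new keys and restart at 1.
                System.arraycopy(columnValue, 0, comparedColumn, 0, columnValue.length);
                rowNum = 1;
                return rowNum++;
            }
        }
        // Same keys as the previous row: return the next number in the group.
        return rowNum++;
    }
}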

Usage

add jar /xx/xx/user_function/mr-function-1.0.0.jar;
create temporary function row_number as 'com.xxx.xxx.RowNumber';

SELECT c.user_id, c.time, c.ip
FROM (
    SELECT b.*
    FROM uids_20150831 a
    LEFT JOIN (SELECT * FROM login WHERE date >= 20150101) b
        ON a.user_id = b.user_id
    DISTRIBUTE BY user_id
    SORT BY user_id ASC, time DESC
) c
WHERE row_number(c.user_id) <= 3;
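The DISTRIBUTE BY and SORT BY clauses matter because a function of this kind only compares each row with the row immediately before it: all rows of one user must reach the same reducer and arrive next to each other. The following standalone sketch (plain Java, no Hive classes; `SequentialNumberer` is a hypothetical name used only for illustration) mimics that counter and shows how the numbering degrades when rows of different users are interleaved.

```java
import java.util.Arrays;
import java.util.List;

class SequentialNumberer {
    // Key of the previous row, or null before the first row.
    private String previousKey = null;
    // Number given to the most recent row of the current group.
    private int rowNum = 0;

    // Returns 1 when the key differs from the previous row, otherwise previous number + 1.
    int next(String key) {
        if (!key.equals(previousKey)) {
            previousKey = key;
            rowNum = 0;
        }
        return ++rowNum;
    }

    // Numbers a whole sequence of keys in the order given.
    static int[] number(List<String> keys) {
        SequentialNumberer numberer = new SequentialNumberer();
        int[] out = new int[keys.size()];
        for (int i = 0; i < keys.size(); i++) {
            out[i] = numberer.next(keys.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows grouped by user: numbering restarts per user.
        System.out.println(Arrays.toString(
                number(Arrays.asList("u1", "u1", "u2", "u2")))); // prints [1, 2, 1, 2]
        // Rows interleaved: every row looks like the start of a new group.
        System.out.println(Arrays.toString(
                number(Arrays.asList("u1", "u2", "u1", "u2")))); // prints [1, 1, 1, 1]
    }
}
```

In the interleaved case every row gets the number 1, so a filter such as `<= 3` would no longer limit the rows per user.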

References:
http://blog.csdn.net/liuj2511981/article/details/8741276
http://blog.csdn.net/yfkiss/article/details/7885262
