Using Hive: Creating Tables and Importing/Exporting Data (Part 1)

Source: Internet. Editor: 程序博客网. Time: 2024/06/10 04:43

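1. Learning resources

Official Hive site: the documentation is not split by version. Everything is collected in a single document, and each feature is marked with the versions it applies to.

DDL: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

I. DDL and DML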

1) Create/Drop/Alter/Use Database

Create a database and list the databases:

    create database if not exists test_db;
    show databases;

Table creation syntax (abridged):

    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
    [stored as file_format]

Note: file_format is the storage format of the table files. The default is TextFile; sequencefile (a binary file format used on HDFS) is another option.

eg1: create a student table and load the local file student into it.

    create table if not exists student(
      num int,
      name string
    )
    row format delimited fields terminated by '\t'
    stored as textfile;

    load data local inpath '/opt/tools/student' into table student;

Contents of the local file (vi /opt/tools/student):

    1       xiaohong
    2       xiaowang

Load log:

    Loading data to table default.student
    Table default.student stats: [numFiles=1, numRows=0, totalSize=22, rawDataSize=0]
    OK
    Time taken: 0.971 seconds
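
The heading also mentions Use, Alter and Drop, which the example above does not show. A minimal sketch of those statements follows, reusing the test_db database created above; the property name note and its value are made up for illustration.

```sql
-- Switch the current session to test_db.
use test_db;

-- Attach a key/value property to the database
-- (the key 'note' and its value are hypothetical).
alter database test_db set dbproperties ('note' = 'sandbox');

-- Show the database description, including its properties.
describe database extended test_db;

-- Switch back before dropping; cascade also drops any tables inside it.
use default;
drop database if exists test_db cascade;
```

Without cascade, drop database fails when the database still contains tables.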

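2) Truncate Table and Drop Table

    TRUNCATE TABLE table_name [PARTITION partition_spec];  -- clears the data; the table structure remains
    drop table table_name;                                  -- removes both the table files and the table structure

To check whether the metadata has also been removed from MySQL, look at the TBLS table in the metastore database.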
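
The metastore check can be sketched as the query below. It is meant for the MySQL client, not the Hive CLI. It assumes the metastore lives in a MySQL database named metastore (the actual name depends on the installation) and uses the standard TBLS and DBS tables of the metastore schema.

```sql
-- Run inside the MySQL client. The database name `metastore` is an assumption.
use metastore;

-- One row per Hive table; after "drop table student" this should return no rows,
-- while after "truncate table student" the row is still there.
select d.NAME as db_name,
       t.TBL_NAME,
       t.TBL_TYPE
from TBLS t
join DBS d on t.DB_ID = d.DB_ID
where t.TBL_NAME = 'student';
```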

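3) Three ways to create a table

* Subquery: the result of the subquery is stored in the new table.

    create table if not exists student2 as select * from student;

* like: copies only the structure, not the data.

    create table student_3 like student2;

* Plain table definition:

    create table student1(
        id int,
        name string
    )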
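
To see the difference between the first two approaches, the following sketch can be run after the statements above. It only uses the tables student2 and student_3 from this section.

```sql
-- Created with "as select": structure plus the rows of student.
select count(*) from student2;

-- Created with "like": same columns, but no rows yet.
select count(*) from student_3;

-- Compare the table definitions (columns, location, storage settings).
-- A table created with "as select" gets default storage settings unless
-- they are specified, so its field delimiter may differ from the source table.
desc formatted student2;
desc formatted student_3;
```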

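4) Table types

Internal table, also called a managed table: dropping it deletes both the metadata and the table's directory.
External table: dropping it deletes only the metadata; the table's directory is kept.
- Only the storage path of the data is recorded; the data itself is not moved.
- Keeps the data files safe and makes the data easy to share.
  create external table if not exists table_name(
  id int,
  name string
  )

Note: create the managed table first, then the external table.
Modifying data is rarely needed. update works only on managed tables (in Hive it requires a transactional table, which must be managed); external tables cannot be updated.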

i) Managed tables

    create table dept(
        deptno int,
        dname string,
        loc string
    )
    row format delimited fields terminated by '\t';

    load data local inpath '/opt/tools/dept.txt' into table dept;

    create table emp(
        empno int,
        ename string,
        job string,
        mgr int,
        hiredate string,
        sal double,
        comm double,
        deptno int
    )
    row format delimited fields terminated by '\t';

    load data local inpath '/opt/tools/emp.txt' into table emp;

The local file is copied to HDFS:

    Copying data from file:/opt/tools/dept.txt
    Copying file: file:/opt/tools/dept.txt
    Loading data to table default.dept
    Table default.dept stats: [numFiles=1, numRows=0, totalSize=79, rawDataSize=0]
    OK

ii) External tables

Several tables can work on the same files. Use location to specify the directory of the data files.

    create external table emp_ext(
        empno int,
        ename string,
        job string,
        mgr int,
        hiredate string,
        sal double,
        comm double,
        deptno int
    )
    row format delimited fields terminated by '\t';

    load data local inpath '/opt/tools/emp.txt' into table emp_ext;

(load data inpath moves the original HDFS file into the table directory.)
(load data local inpath copies the local file into the table directory on HDFS.)
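
The different drop behavior of the two table types can be checked from the Hive CLI with a sketch like the one below. It assumes the default warehouse directory /user/hive/warehouse; the real path is whatever hive.metastore.warehouse.dir points to on your cluster.

```sql
-- The "Table Type" field shows MANAGED_TABLE or EXTERNAL_TABLE.
desc formatted emp;
desc formatted emp_ext;

-- List the external table's directory (warehouse path is an assumption).
dfs -ls /user/hive/warehouse/emp_ext;

-- Dropping an external table removes only its metadata...
drop table emp_ext;

-- ...so the directory and emp.txt should still be listed here.
dfs -ls /user/hive/warehouse/emp_ext;
```

Running the same steps against the managed table emp would remove its directory as well, so only try that on data you can reload.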

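5) Partitioned tables

Filter first, then query, which improves efficiency (compare partitioning in MapReduce, where specifying multiple reducers can improve efficiency). The data is divided into partitions by category: a logical partition column is defined when the table is created, and a query that filters on that column scans only the data in the matching partitions instead of the whole table. This improves query performance: a regular table reads the data first and then filters, while a partitioned table filters first and then reads.

Regular table: client submits SQL -> read the metastore -> locate the table's storage on HDFS -> load the data files -> filter

Partitioned table: client submits SQL -> read the metastore -> locate the table's storage on HDFS -> filter -> load the data files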

eg:

    create table emp_part(
        empno int,
        ename string,
        job string,
        mgr int,
        hiredate string,
        sal double,
        comm double,
        deptno int
    )
    PARTITIONED BY(date string)  -- the partition column name must not duplicate a regular column name
    row format delimited fields terminated by '\t';

    load data local inpath "/opt/tools/emp.txt" into table emp_part partition(date="20171104");  -- the partition must be specified
    load data local inpath "/opt/tools/emp.txt" into table emp_part partition (date="20171105");

    select * from emp_part where date = "20171104";

Note: date is a reserved keyword in Hive 1.2.0 and later, so on newer versions this column name needs backquotes or a different name.

Load log:

    hive (default)> load data local inpath "/opt/tools/emp.txt" into table emp_part partition(date="20171104");
    Loading data to table default.emp_part partition (date=20171104)
    Partition default.emp_part{date=20171104} stats: [numFiles=1, numRows=0, totalSize=656, rawDataSize=0]
    OK

An external table with two partition columns:

    create external table emp_epart(
        empno int,
        ename string,
        job string,
        mgr int,
        hiredate string,
        sal double,
        comm double,
        deptno int
    )
    partitioned by (data string, hour string)
    row format delimited fields terminated by '\t';
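
The two-level table emp_epart is only created above, not used. A short sketch of how it could be loaded and queried follows; the partition values '20171104' and '10' are illustrative, and the column names data and hour are taken from the definition as written.

```sql
-- Both partition columns must be given when loading.
load data local inpath '/opt/tools/emp.txt'
into table emp_epart partition (data='20171104', hour='10');

-- List the partitions registered in the metastore.
show partitions emp_epart;

-- Filtering on the partition columns reads only the matching directory.
select * from emp_epart where data = '20171104' and hour = '10';

-- Remove one partition. Because the table is external,
-- only the partition metadata goes; its files stay on HDFS.
alter table emp_epart drop partition (data='20171104', hour='10');
```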

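6) Loading data

overwrite: replaces the existing data (compared with a plain load, it adds a delete step first).

    load data [local] inpath '/tmp/' [overwrite] into table table_name;

Without overwrite, load does not replace anything; it simply adds files.
[Note] location is specified when the table is created to set where its data files are stored. It is not a data-loading statement. It is mostly used with external tables, and several external tables can share the same data.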

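7) Importing data (6 ways)

1. Load from the local file system (the file is copied into the table directory):

    load data local inpath '/opt/emp.txt' into table emp;

2. Load from HDFS with load data inpath (the file is moved into the table directory).

3. Load and overwrite the existing data:

    load data local inpath '/opt/emp.txt' overwrite into table emp;

4. Subquery. Suitable for saving a result set or an intermediate result set:

    create table emp_as as select * from emp;

5. insert. First create the target table (emp_like here), then:

    insert into table emp_like select * from emp;

6. location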

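Notes:

- A plain insert into ... select does not run any reduce tasks; a reduce phase appears only with operations such as group by.
- Choose between load local and load from HDFS depending on the data volume.
- Inserting through a subquery and inserting through insert store the data files in the same way.
- load is the most common operation.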
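
The insert route can be sketched end to end as follows. Creating emp_like with like is one possible way to prepare the target table, and the filter deptno = 10 is only an illustrative condition.

```sql
-- Prepare an empty target table with the same structure as emp.
create table if not exists emp_like like emp;

-- Append the rows of emp (a map-only job, per the note above).
insert into table emp_like select * from emp;

-- Replace the contents instead of appending
-- (the deptno value is a hypothetical example).
insert overwrite table emp_like select * from emp where deptno = 10;

-- A query with group by is the kind that adds a reduce phase.
select deptno, count(*) from emp_like group by deptno;
```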

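8) Export methods

1. insert (insert overwrite [local] directory 'path' select sql;)
eg:
- Export the data files to the local file system:
  insert overwrite local directory '/opt/' select * from emp; -- the default field separator is \001 (a control character). The table data is saved locally. Give an output directory, not a file. The directory may already exist, but it is deleted and recreated, so anything already in it is lost.
- Specify the separator:
  insert overwrite local directory '/opt/nihao01' row format delimited fields terminated by '\t' select * from emp; -- runs as a map task
- Save the data to HDFS:
  insert overwrite directory '/' row format delimited fields terminated by '\t' select * from emp; -- raised an error for the author. According to the Hive documentation, custom separators for non-local directory writes did not work in Hive 0.11.0 to 1.1.0 and were fixed in 1.2.0 (HIVE-5672).
  insert overwrite directory '/opt/nihao01' select * from emp; -- the result files are moved into the target directory
2. Use the HDFS shell command dfs -get to fetch the data files.
3. Use hive -e (inline query) or hive -f (script file) and redirect the output to a local file.
4. sqoop (a bridge for exchanging data between Hive and relational databases, for both import and export)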

Note: Hive also has its own export and import statements:

    export table tb_name to 'hdfs_path';
    import table tb_name from 'hdfs_path';
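
As a concrete instance of the two statements, the sketch below exports the emp table and imports it back under a new name. The HDFS path /tmp/hive_export/emp and the table name emp_imported are hypothetical.

```sql
-- Write the table's data and metadata to an HDFS directory
-- (the path is an assumption for illustration).
export table emp to '/tmp/hive_export/emp';

-- Recreate the table from that directory under a new name.
import table emp_imported from '/tmp/hive_export/emp';

-- The imported table should hold the same rows as emp.
select count(*) from emp_imported;
```

Because the export directory carries the metadata along with the data, the target table does not need to be created beforehand.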

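9) location examples

location specifies the directory that holds a table's data files. It must be a directory, not a single file. Once the table is created, placing data files in that directory is enough for a select on the table to return the data. (location is mostly used with external tables: the data files do not have to be moved, and pointing the table at their location is enough to work with the data.)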

1. The data files are on HDFS (files that already exist there):

    create external table emp_ext(
        empno int,
        ename string,
        job string,
        mgr int,
        hiredate string,
        sal double,
        comm double,
        deptno int
    )
    row format delimited fields terminated by '\t'
    location "/nihao";

2. The data files are on the local file system:

    create external table emp_ext1(
        empno int,
        ename string,
        job string,
        mgr int,
        hiredate string,
        sal double,
        comm double,
        deptno int
    )
    row format delimited fields terminated by '\t'
    location "file:///opt/datas";

A managed table can specify location as well:

    create table nihao21(
        id int,
        name string
    )
    row format delimited fields terminated by '\t'
    location '/nihao/';
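
The claim that a file placed in the location directory becomes queryable without any load statement can be tried from the Hive CLI with a sketch like this. It assumes emp_ext was created with location "/nihao" as in example 1 and reuses the /opt/tools/emp.txt file from earlier sections.

```sql
-- Make sure the directory exists, then copy the local file into it.
dfs -mkdir -p /nihao;
dfs -put /opt/tools/emp.txt /nihao/;

-- No load statement is needed: the table reads whatever is in its location.
select * from emp_ext limit 5;
```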
