Getting Started with Hive, Part 1
Source: Internet  Editor: 程序博客网  Time: 2024/05/19 22:44
RUNNING HIVE
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
$ export HIVE_HOME=<hive-install-dir>

# Running Hive CLI
$ $HIVE_HOME/bin/hive

# Running HiveServer2 and Beeline
$ $HIVE_HOME/bin/hiveserver2
$ $HIVE_HOME/bin/beeline -u jdbc:hive2://$HS2_HOST:$HS2_PORT
$ $HIVE_HOME/bin/beeline -u jdbc:hive2://
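The two Beeline invocations differ only in the JDBC URL: one names a HiveServer2 host and port, the other uses the bare `jdbc:hive2://` form, which starts Beeline and HiveServer2 in the same process. A minimal sketch of how such a URL could be assembled follows. The helper name `beeline_url` and the host `hs2.example.com` are hypothetical, and the fallback port 10000 assumes HiveServer2's default setting has not been changed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive): build the URL passed to `beeline -u`.
# With no host argument it returns the bare URL used for embedded mode.
beeline_url() {
    host="$1"
    port="${2:-10000}"   # assumption: HiveServer2 listens on its default port
    if [ -z "$host" ]; then
        printf 'jdbc:hive2://\n'
    else
        printf 'jdbc:hive2://%s:%s\n' "$host" "$port"
    fi
}

# Print the URLs that would be handed to beeline
beeline_url "hs2.example.com" 10000
beeline_url
```

The result could then be used as `$HIVE_HOME/bin/beeline -u "$(beeline_url "$HS2_HOST" "$HS2_PORT")"`.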
DDL
Data Definition Language
# Creating Hive Tables
hive> CREATE TABLE pokes (foo INT, bar STRING);
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

# Browsing through Tables
hive> SHOW TABLES;
hive> SHOW TABLES '.*s'; # follows Java regular expressions.
hive> DESCRIBE invites;

# Altering and Dropping Tables
hive> ALTER TABLE events RENAME TO 3koobecaf;
hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
hive> ALTER TABLE invites REPLACE COLUMNS (foo INT, bar STRING, baz INT COMMENT 'baz replaces new_col2');
hive> ALTER TABLE invites REPLACE COLUMNS (foo INT COMMENT 'only keep the first column');
hive> DROP TABLE pokes;
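To get a feel for what a pattern such as '.*s' selects, the match can be tried locally before running SHOW TABLES. This is only an approximation: it uses grep's extended regular expressions rather than Java's, and it assumes the pattern has to match the whole table name. The helper name `filter_tables` and the table name `profile` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the names that fully match an extended regex.
# grep -x requires the whole line (here: the whole table name) to match.
filter_tables() {
    pattern="$1"
    shift
    printf '%s\n' "$@" | grep -E -x -- "$pattern"
}

# Names ending in "s" are kept; "profile" is dropped.
filter_tables '.*s' pokes invites profile
```

For simple patterns like this one the two regex dialects agree; more advanced Java constructs may behave differently under grep.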
Metadata Store
Metadata is in an embedded Derby database whose disk storage location is determined by the Hive configuration variable named javax.jdo.option.ConnectionURL. By default this location is ./metastore_db (see conf/hive-default.xml).
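Because ./metastore_db is a relative path, the embedded Derby database is created in whatever directory Hive is started from, so launching Hive from a different directory produces a different, empty metastore. The sketch below shows where the directory name comes from inside the connection URL. The helper `derby_db_name` is hypothetical, and the URL shown is the value commonly found in hive-default.xml; treat it as an assumption and check your own configuration.

```shell
#!/bin/sh
# Hypothetical helper: extract the databaseName attribute from a Derby JDBC URL.
# The URL is a ';'-separated list, so split it into lines and pick one attribute.
derby_db_name() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -n 's/^databaseName=//p'
}

# Assumed default value of javax.jdo.option.ConnectionURL
url='jdbc:derby:;databaseName=metastore_db;create=true'

# The database directory resolves against the current working directory
printf '%s/%s\n' "$(pwd)" "$(derby_db_name "$url")"
```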
DML
Data Manipulation Language
# Loading data from local files
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');

# The following command loads data from an HDFS file/directory into the table.
# Note that loading data from HDFS moves the file/directory, so the operation is almost instantaneous.
hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
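The difference between the two forms of LOAD DATA can be imitated with ordinary files: with LOCAL the source file is copied and stays where it was, while without LOCAL the HDFS source is moved into the table's directory. The following is a simulation only, with a temporary directory standing in for both the local disk and HDFS. The helper names are hypothetical, the partition path assumes the default warehouse layout under /user/hive/warehouse, and the effect of OVERWRITE (removing existing data first) is not modelled.

```shell
#!/bin/sh
# Simulation only: plain directories stand in for the local disk and HDFS.
work=$(mktemp -d)
warehouse="$work/user/hive/warehouse"

# Hypothetical helper: partition directory in the assumed default layout.
partition_dir() {   # usage: partition_dir TABLE KEY VALUE
    printf '%s/%s/%s=%s\n' "$warehouse" "$1" "$2" "$3"
}

# Like LOAD DATA LOCAL INPATH: the source is copied and remains in place.
load_local() {      # usage: load_local SRC DEST_DIR
    mkdir -p "$2" && cp "$1" "$2/"
}

# Like LOAD DATA INPATH: the source is moved and vanishes from its old path.
load_hdfs() {       # usage: load_hdfs SRC DEST_DIR
    mkdir -p "$2" && mv "$1" "$2/"
}

dest=$(partition_dir invites ds 2008-08-15)
printf 'sample row\n' > "$work/local_kv2.txt"
printf 'sample row\n' > "$work/hdfs_kv2.txt"
load_local "$work/local_kv2.txt" "$dest"
load_hdfs "$work/hdfs_kv2.txt" "$dest"

# Both files are now in the partition directory; only the "HDFS" source is gone.
ls "$dest"
# Remove "$work" when finished experimenting.
```

A move within one file system only renames the file, which is why the HDFS variant finishes almost immediately regardless of file size.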
SQL
Some example queries are shown below. They are also available in build/dist/examples/queries.
More are available in the Hive sources at ql/src/test/queries/positive.
# SELECTS and FILTERS
hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;
hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites, a.pokes FROM profiles a;
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;

# GROUP BY
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(*) WHERE a.foo > 0 GROUP BY a.bar;
hive> INSERT OVERWRITE TABLE events SELECT a.bar, count(*) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;

# JOIN
hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;

# MULTITABLE INSERT
FROM src
INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 and src.key < 200
INSERT OVERWRITE TABLE dest3 PARTITION(ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 and src.key < 300
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300;

# STREAMING
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';
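In the streaming example, Hive writes each selected row to the script's standard input as one line with tab-separated columns and reads the script's standard output back in the same format; with '/bin/cat' the rows simply pass through unchanged. A transform script can therefore be tried out locally with a pipe before it is used in a query. The sketch below is a hypothetical transform named `swap_columns` that swaps the two columns; the sample rows are made up.

```shell
#!/bin/sh
# Hypothetical transform: read tab-separated (foo, bar) rows on stdin
# and write them back as (bar, foo), also tab-separated.
swap_columns() {
    awk -F '\t' 'BEGIN { OFS = "\t" } { print $2, $1 }'
}

# Feed two sample rows through the transform, the way Hive would stream them.
printf '238\tval_238\n86\tval_86\n' | swap_columns
```

Saved as a standalone script, something like this could take the place of '/bin/cat' in the USING clause, assuming the script is made available to the cluster (for example with Hive's ADD FILE command).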
Simple Example Use Cases