hive的安装+msyql远程元数据库+hive的进阶优化

来源:互联网 发布:互盾数据恢复软件破解 编辑:程序博客网 时间:2024/06/11 02:43
1、安装mysql-5.5
(1)、rpm包安装mysql
[root@itr-mastertest01 installpackage]# scp MySQL-client-5.5.40-1.linux2.6.x86_64.rpm MySQL-server-5.5.40-1.linux2.6.x86_64.rpm itr-mastertest02:/usr/local/
[root@itr-mastertest02 installpackage]# rpm -qa | grep mysql  [查询是否自带mysql]
[root@itr-mastertest02 conf]# rpm -e mysql-libs-5.1.71-1.el6.x86_64 --nodeps[不验证依赖卸载]
[root@itr-mastertest02 installpackage]# rpm -qa | grep mysql   [再次查看卸载是否成功]
[root@itr-mastertest02 local]# rpm -i MySQL-server-5.5.40-1.linux2.6.x86_64.rpm[安装服务端]
[root@itr-mastertest02 local]# mysqld_safe &[后台启动,jobs查看]
[root@itr-mastertest02 local]# rpm -i MySQL-client-5.5.40-1.linux2.6.x86_64.rpm [安装客户端]


(2)、修改数据库配置信息
[root@itr-mastertest02 local]# mysql_secure_installation
NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MySQL
            SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!


            In order to log into MySQL to secure it, we'll need the current
            password for the root user.  If you've just installed MySQL, and
            you haven't set the root password yet, the password will be blank,
            so you should just press enter here.


            Enter current password for root (enter for none):   //当前密码,第一次使用为null
                OK, successfully used password, moving on...


                Setting the root password ensures that nobody can log into the MySQL
                root user without the proper authorisation.


                Set root password? [Y/n] Y      //设置密码
                New password:                   /admin
                Re-enter new password:          //admin
                Password updated successfully!
                Reloading privilege tables..
                ... Success!


                By default, a MySQL installation has an anonymous user, allowing anyone
                to log into MySQL without having to have a user account created for
                them.  This is intended only for testing, and to make the installation
                go a bit smoother.  You should remove them before moving into a
                production environment.


                Remove anonymous users? [Y/n] n     //是否删除匿名用户
                ... skipping.


                Normally, root should only be allowed to connect from 'localhost'.  This
                ensures that someone cannot guess at the root password from the network.


                Disallow root login remotely? [Y/n] n       //是否允许远程登录
                ... skipping.


                By default, MySQL comes with a database named 'test' that anyone can
                access.  This is also intended only for testing, and should be removed
                before moving into a production environment.


                Remove test database and access to it? [Y/n] n  //是否删除测试数据库
                ... skipping.


                Reloading the privilege tables will ensure that all changes made so far
                will take effect immediately.


                Reload privilege tables now? [Y/n] Y        //重新加载权限表
                ... Success!


                Cleaning up...


                All done!  If you've completed all of the above steps, your MySQL
                installation should now be secure.


                Thanks for using MySQL!


(3)、登录mysql数据库验证
[root@itr-mastertest02 local]# mysql -uroot -padmin


(4)创建数据库并授权
[root@itr-mastertest02 conf]#  mysql -uroot -padmin
mysql> create database hive;
mysql> grant all on hive.* to 'root'@'%' identified by 'admin'; [授权root在任何位置(%)远程可以登陆]
mysql> flush privileges;[刷新权限]


(5)使用的是SQLyog数据库可视化软件连接验证


2、hive-0.13.1的安装
(1)、解压
[root@itr-mastertest01 installpackage]# tar -zxvf apache-hive-0.13.1-bin.tar.gz -C /usr/local/


(2)、scp hive-0.13.1 [hive安装在itr-mastertest02]
[root@itr-mastertest01 local]# scp -rq hive-0.13.1 itr-mastertest02:/usr/local/


(3)、cp配置文件
[root@itr-mastertest02 conf]# cp hive-exec-log4j.properties.template hive-exec-log4j.properties
[root@itr-mastertest02 conf]# cp hive-log4j.properties.template hive-log4j.properties
[root@itr-mastertest02 conf]# cp hive-env.sh.template hive-env.sh
[root@itr-mastertest02 conf]# cp hive-default.xml.template hive-site.xml


4、配置hive-config.sh
vi /usr/local/hive-0.13.1/bin/hive-config.sh
export JAVA_HOME=/usr/local/jdk1.7.0_45
export HIVE_HOME=/usr/local/hive-0.13.1
export HADOOP_HOME=/usr/local/hadoop-2.4.0


(5)、配置hive-site.xml [mysql做metastore]
//连接地址,基本配置
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://itr-mastertest02:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
//mysql驱动
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
//用户名
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
//用户密码
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>admin</value>
</property>
//默认数据库位置
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>




//关闭本地metastore, hive配置远程metastore
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
  <description>controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM</description>
</property>
//hive远程metastore的thrift地址        
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://itr-mastertest02:9083</value>
</property>




//hiveserver2的配置
<property>  
<name>hive.support.concurrency</name>   
<description>Enable Hive's Table Lock Manager Service</description>   
<value>true</value>  
</property> 
<property>  
<name>hive.zookeeper.quorum</name>  
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>itr-mastertest01,itr-mastertest02,itr-nodetest01</value>
</property> 
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
  <description>The port of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>
<property>  
<name>hive.server2.thrift.bind.host</name>
<value>itr-mastertest02</value>
<description>Bind host on which to run the HiveServer2 Thrift interface. Can    be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>  
</property> 




//关闭推测式执行,一些优化项
<property>
 <name>hive.mapred.reduce.tasks.speculative.execution</name>
 <value>false</value>
</property>
<property>
 <name>mapreduce.reduce.speculative</name>
 <value>false</value>
</property>
小表mapjoin
<property>
 <name>hive.ignore.mapjoin.hint</name>
 <value>false</value>
</property>
<property>
 <name>hive.mapjoin.smalltable.filesize</name>
 <value>500000000</value>
</property>
并行执行
<property>
 <name>hive.exec.parallel</name>
 <value>true</value>
</property>
<property>
 <name>hive.exec.parallel.thread.number</name>
 <value>16</value>
</property>
客户端显示
<property>
 <name>hive.cli.print.current.db</name>
 <value>true</value>
</property>
<property>
 <name>hive.cli.print.header</name>
 <value>true</value>
</property>
关闭自动统计
<property>
 <name>hive.stats.autogather</name>
 <value>false</value>
</property>


(6)、cp mysql-connector-java-5.1.26-bin.jar 到hive-0.13.1/lib


(7)、启动metastore
    [root@itr-mastertest02 local]# hive-0.13.1/bin/hive --service metastore &       [jobs查看]


(8)、启动hive
    [root@itr-mastertest02 local]# hive-0.13.1/bin/hive


(9)、启动hiveserver2 [jdbc服务]
    hive-0.13.1/bin/hive --service hiveserver2 start


(10)、测试hiveserver2
    使用beeline控制台连接hiveserver2:
    /usr/local/hive-0.13.1/beeline> !connect jdbc:hive2://hadoop0:10000 root admin org.apache.hive.jdbc.HiveDriver


(11)、hive安装目录加入环境变量中
    [root@itr-mastertest02 local]# vi /etc/profile


附录1:hiveserver1和hiveserver2的区别
Hiveserver1 和hiveserver2的JDBC区别:
HiveServer version    Connection URL             Driver Class
HiveServer2           jdbc:hive2://:              org.apache.hive.jdbc.HiveDriver
HiveServer1           jdbc:hive://:               org.apache.hadoop.hive.jdbc.HiveDriver


参考地址:https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients


3、hive的语法介绍
(1) 创建表
    hive (default)> create table t1(int id);     [创建表]
    hive (default)> create table t2(id int,name string) row format delimited fields terminated by '\t';   [指定行分割符]




(2) 加载数据
    hive (default)> load data local inpath '/usr/local/testdata/id' into table t1;  [linux本地加载数据到hive,copy数据]
    hive (default)> load data local inpath '/usr/local/name' into table t2;         [linux本地加载数据到hive,copy数据]


(3) 删除表
    hive (default)> drop table t1;      [受控表managed table,从hdfs删除数据,metastore元数据信息删除]
    mysql> select * from TBLS;  [TBL_TYPE字段记录表类型MANAGED_TABLE or other]


(4) 受控表(MANAGED_TABLE)包括内部表,分区表,桶表。


(5) 分区表[数据在不同文件目录中,扫描范围]
    hive (default)> create table t3(id int,name string) partitioned by(grade int) row format delimited fields terminated by '\t';    [创建分区表]
    hive (default)> load data local inpath '/usr/local/name' into table t3 partition(grade=1);  [加载数据]
    hive (default)> select * from t3 where grade=1;     [分区字段可以作用过滤条件,是一个文件夹的标识]


    hive (default)> create table enroll(id int,name string) partitioned by(year int,month int) row format delimited fields terminated by '\t';
    load data local inpath '/usr/local/name' into table enroll partition(year=2014,month=11-12);


(6) 桶表(bucket table)
    表链接中使用,提高效率
    hive (default)> create table bucket_test(id int,name string) clustered by(id)  into 3 buckets;   [按照id分为3个桶]
    hive (default)> set hive.enforce.bucketing=true;        [启用桶表]


    insert overwrite table buckets select ... from ...      [插入数据]
    hive (default)> insert overwrite table bucket_test select id,name from t2;


(7) 外部表EXTERNAL_TABLE[删除表,只删除表定义,对HDFS的数据不会删除]{外部分区表}
    [root@itr-mastertest02 local]# hadoop fs -put name /testdata/external_teble
    hive (default)> create external table external_test(id int,name string) row format delimited fields terminated by '\t' location '/testdata/external_table';
    hive (default)> select id,name from external_test order by id desc;


4、命令行工具
(1) [root@itr-mastertest02 local]# hive
(2) [root@itr-mastertest02 local]# hive -e "select * from t2";
(3) [root@itr-mastertest02 local]# hive -e "select * from t2" >> a
(4) [root@itr-mastertest02 local]# hive -S -e "select * from t2" >> a
(5) [root@itr-mastertest02 local]# hive --hiveconf hive.querylog.location=/usr/local/hive-0.13.1/logs
(6) hive (default)> set hive.querylog.location;
    hive.querylog.location=/tmp/root
(7) hive -f file
(8) source file
(9) /root/.hiverc 和 /root/.hivehistory 


搜狗数据
use sougoulibs;

create external table sogou(dt string, websession string, word string, s_seq int, c_seq int, website string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile location '/labs/sogou/';


大数据讨论群:288396468

0 0