HBase-1.2.1 and Phoenix-4.7.0 Distributed Installation Guide
Source: Internet  Editor: 程序博客网  Time: 2024/05/19 23:14
Contents
1. Preface
2. Concepts
2.1. Region name
3. Conventions
4. Related ports
5. Downloading HBase
6. Installation steps
6.1. Edit conf/regionservers
6.2. Edit conf/hbase-site.xml
6.2.1. hbase.master.info.port
6.2.2. hbase.master.info.bindAddress
6.3. Edit conf/hbase-env.sh
7. System settings
8. Starting HBase
9. Basic HBase commands
10. Splitting a Region
11. Merging Regions
12. Backup HMaster configuration
13. Access control configuration
13.1. Configuration changes
13.2. Permission management
13.2.1. Granting permissions
13.2.2. Revoking permissions
13.2.3. Changing permissions
13.2.4. Viewing permissions
14. Common hbase shell commands
15. Common errors
16. Starting the HBase thrift2 server
17. Starting the HBase REST server
17.1. Cluster-Wide
17.2. Namespace
17.3. Table
17.4. Get
17.5. Scan
17.6. Put
18. Related documents
Appendix 1: Metadata
Appendix 2: Installing Phoenix
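1. Preface
This article installs HBase-1.2.1 on top of Hadoop-2.7.2. For the installation of Hadoop-2.7.2, see 《Hadoop-2.7.2分布式安装手册》 (Hadoop-2.7.2 Distributed Installation Manual). The installation environment is 64-bit SuSE Linux 10.1.
The installation follows the quickstart.html file provided by the HBase project. It can be found in the docs/getting_started directory or read online at http://hbase.apache.org/book/quickstart.html
An external ZooKeeper is used. For the installation of ZooKeeper, see 《ZooKeeper-3.4.6分布式安装指南》 (ZooKeeper-3.4.6 Distributed Installation Guide).
For distributed installation, see http://hbase.apache.org/book/standalone_dist.html#distributed and, for configuring HBase with an external ZooKeeper, see http://hbase.apache.org/book/zookeeper.html
All of the online documents are also available in the docs directory of the unpacked binary package.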
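2. Concepts
2.1. Region name
A Region name identifies a Region. Its format is: table name, StartKey, Region ID, where the Region ID is generated when the Region is created and consists of a timestamp followed by the encoded name. For example:
test,83--G40V6UdCnEHKSKqR_yjJo798594847946710200000795,1461323021820.d4cc7afbc2d6bf3843c121fedf4d696d.
Here test is the table name, the string between the two commas is the StartKey, and the part after the last comma is the Region ID (note that it contains two dots). For the first Region of a table the StartKey is empty, so the name looks like this:
t_user,,1461549916081.f4e17b0d99f2d77da44ccb184812c345.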
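The name layout described above can be taken apart mechanically. The following is a minimal sketch in Python, assuming the three-part format shown here (no replica suffix) and a start key made of printable characters; the function name parse_region_name is made up for illustration and is not part of HBase.

```python
def parse_region_name(region_name):
    """Split 'table,startkey,timestamp.encoded.' into its parts."""
    # Table names cannot contain a comma, so the first comma ends the table name.
    table, rest = region_name.split(",", 1)
    # The start key may itself contain commas, so split the Region ID off from the right.
    start_key, region_id = rest.rsplit(",", 1)
    # The Region ID is '<timestamp>.<encoded name>.' including the trailing dot.
    timestamp, encoded_name = region_id.rstrip(".").split(".", 1)
    return {
        "table": table,
        "start_key": start_key,
        "timestamp": int(timestamp),
        "encoded_name": encoded_name,
    }


# The first Region of table t_user from the example above: the start key is empty.
print(parse_region_name("t_user,,1461549916081.f4e17b0d99f2d77da44ccb184812c345."))
```

Splitting from both ends rather than on every comma keeps the sketch correct when the start key contains commas.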
3. Conventions
Hadoop-2.7.2 is assumed to be installed in /data/hadoop/current, which is actually a symbolic link to /data/hadoop/hadoop-2.7.2.
The HBase installation directory is /data/hadoop/hbase, which is actually a symbolic link to hbase-1.2.1-hadoop2.
4. Related ports
Port    Description
2888    ZooKeeper; on the Leader, listens for connections from Followers
3888    ZooKeeper; used for Leader election
2181    ZooKeeper; listens for client connections
16010   hbase.master.info.port; HTTP port of the HMaster
16000   hbase.master.port; RPC port of the HMaster
16030   hbase.regionserver.info.port; HTTP port of the HRegionServer
16020   hbase.regionserver.port; RPC port of the HRegionServer
8080    hbase.rest.port; port of the HBase REST server
9095    hbase.thrift.info.port; HTTP port of the HBase Thrift server
5. Downloading HBase
Official site: http://hbase.apache.org/ (the download link for HBase can be found there).
A mirror site in China: http://mirror.bit.edu.cn/apache/hbase/ (HBase-1.2.1 is at http://mirror.bit.edu.cn/apache/hbase/hbase-1.2.1/). Download hbase-1.2.1-hadoop2-bin.tar.gz.
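6. Installation steps
6.1. Edit conf/regionservers
The regionservers file is similar to Hadoop's slaves file. This change does not need to be made on the RegionServer machines.
List the IP address or host name of every HRegionServer in the regionservers file, exactly one per line (never several on one line). The configuration used in this article is: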
hadoop@VM_40_171_sles10_64:~/hbase/conf> cat regionservers
10.12.154.77
10.12.154.78
10.12.154.79
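6.2. Edit conf/hbase-site.xml
The same change must be made on all machines. Configure one machine first and then copy the file to the others with scp, for example: scp hbase-site.xml hadoop@10.12.154.79:/data/hadoop/hbase/conf/
hbase-site.xml is the HBase configuration file. The default hbase-site.xml is empty, as shown below: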
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<configuration>
</configuration>
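That is fine; just use it. Do not start from hbase-default.xml in the docs directory, which is painful to read.
Edit hbase-site.xml and add the following content (taken from standalone_dist.html; search for "Fully-distributed"):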
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://172.25.40.171:9001/hbase</value>
<description>The directory shared by RegionServers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>DEVNET-154-77,DEVNET-154-70,DEVNET-154-79</value>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
By default this is set to localhost for local and pseudo-distributed modes
of operation. For a fully-distributed setup, this should be set to a full
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>
<property>
<name>hbase.master.maxclockskew</name>
<value>600000</value>
<description>Time(ms) difference of regionserver from master</description>
</property>
</configuration>
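The value of "hbase.zookeeper.quorum" is the list of host names or IP addresses of the ZooKeeper cluster nodes. hdfs://172.25.40.171:9001 corresponds to "dfs.namenode.rpc-address" in hdfs-site.xml.
If HDFS is deployed in cluster mode (with a nameservice), change hbase.rootdir to the cluster form, for example: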
<property>
<name>hbase.rootdir</name>
<value>hdfs://test/hbase</value>
</property>
That is, the value is fs.defaultFS from core-site.xml followed by the hbase directory. The test in the example above is the value of dfs.nameservices in hdfs-site.xml.
For more information, see: http://hbase.apache.org/book/config.files.html
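Because the same hbase-site.xml has to be present on every machine, it can help to read the properties back and compare them. The following is a minimal sketch in Python using only the standard library; the function name load_hbase_site and the inline sample are illustrative assumptions, not part of HBase.

```python
import xml.etree.ElementTree as ET


def load_hbase_site(xml_text):
    """Return the <property> entries of an hbase-site.xml document as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Strip whitespace so values written on their own line still compare equal.
            props[name.strip()] = (value or "").strip()
    return props


# Illustrative sample; a real file would be read from conf/hbase-site.xml.
SAMPLE = """<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://test/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
</configuration>"""

props = load_hbase_site(SAMPLE)
print(props["hbase.rootdir"], props["hbase.cluster.distributed"])
```

To check a real installation, the text of /data/hadoop/hbase/conf/hbase-site.xml on each host could be passed to the same function and the resulting dictionaries compared.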
6.2.1. hbase.master.info.port
Specifies the HTTP port of the HMaster.
6.2.2. hbase.master.info.bindAddress
Specifies the IP address that the HMaster HTTP service binds to. If it is not set, an IPv6 address may be used.
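6.3. Edit conf/hbase-env.sh
The same change must be made on all machines. Configure one machine first and then copy the file to the others with scp, for example: scp hbase-env.sh hadoop@10.12.154.79:/data/hadoop/hbase/conf/. Make the following changes:
1) Set JAVA_HOME
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/data/jdk
Here /data/jdk is the JDK installation directory.
2) Set HBASE_MANAGES_ZK
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
If HBASE_MANAGES_ZK is true, HBase uses its bundled ZooKeeper. Deploying ZooKeeper separately is recommended, so that it can also serve other systems.
3) Set HBASE_CLASSPATH
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/data/hadoop/current/etc/hadoop
This setting may look confusing: why does a CLASSPATH point to Hadoop's conf directory? It lets HBase find the Hadoop configuration; the name is admittedly not well chosen.
Alternatively, consider creating a symbolic link to Hadoop's hdfs-site.xml in the HBase conf directory.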
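7. System settings
This step only has to be completed before HBase is started, but it requires root. Add two kinds of entries, nofile and nproc, to /etc/security/limits.conf, for example:
hadoop - nofile 32768
hadoop hard nproc 320000
hadoop soft nproc 320000
nofile is the maximum number of files a single process may open, and nproc is the maximum number of processes. Replace "hadoop" with the actual user name.
For the limits to take effect, make sure /etc/pam.d/login contains the following line:
session required pam_limits.so
If HBase is started by crond, also add the line above to /etc/pam.d/crond.
After the change there is no need to reboot; logging in again is enough. Use "ulimit -a" to compare the values before and after.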
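Besides "ulimit -a", the limits seen by a process can be read programmatically. The following is a minimal sketch in Python for Linux, using the standard resource module; the thresholds are simply the example values from limits.conf above, and the helper names are made up for illustration.

```python
import resource


def limit_is_at_least(limit, wanted):
    """Return True if the soft limit (first item of the pair) is at least wanted."""
    soft, _hard = limit
    return soft == resource.RLIM_INFINITY or soft >= wanted


def check_hbase_limits(min_nofile=32768, min_nproc=320000):
    """Compare the current process limits with the example values used above."""
    nofile = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "nofile": (nofile, limit_is_at_least(nofile, min_nofile)),
        "nproc": (nproc, limit_is_at_least(nproc, min_nproc)),
    }


if __name__ == "__main__":
    for name, (limit, ok) in check_hbase_limits().items():
        print(name, limit, "OK" if ok else "too low")
```

The script has to run as the user that starts HBase (hadoop in this article), in a fresh login session, for the result to reflect the new settings.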
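8. Starting HBase
Go to the HBASE_HOME/bin directory and run start-hbase.sh to start HBase. Use the jps command that ships with the JDK to check that the HMaster and HRegionServer processes are up, and check the log files for errors.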
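The jps check can be scripted. The following is a minimal sketch in Python, assuming the usual jps output of one "PID ClassName" pair per line; the PIDs in the sample and the function names are invented for illustration.

```python
def running_java_processes(jps_output):
    """Collect the process names from jps output ('<pid> <name>' per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_output, expected=("HMaster", "HRegionServer")):
    """Return the expected HBase daemons that do not appear in the jps output."""
    running = running_java_processes(jps_output)
    return [name for name in expected if name not in running]


# Hypothetical jps output from a node that runs only the HMaster.
sample = "4242 HMaster\n4310 Jps\n"
print(missing_daemons(sample))
```

On a real node the text would come from running jps, for example through subprocess.run(["jps"], capture_output=True, text=True).stdout. Note that a node is normally expected to run only one of the two daemons unless it hosts both roles.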
9. Basic HBase commands
Run "hbase shell" to enter the command-line interface. For details, see the official quickstart.html.
# List the existing tables
list
hbase(main):003:0> create 'test', 'cf' # create table test with one column family cf
0 row(s) in 1.2200 seconds
hbase(main):003:0> list 'test'
..
1 row(s) in 0.0550 seconds
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1' # insert value1 into column a of column family cf in table test
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds
hbase(main):007:0> scan 'test' # scan table test
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1288380727188, value=value1
row2 column=cf:b, timestamp=1288380738440, value=value2
row3 column=cf:c, timestamp=1288380747365, value=value3
3 row(s) in 0.0590 seconds
hbase(main):008:0> get 'test', 'row1' # get one row from table test
COLUMN CELL
cf:a timestamp=1288380727188, value=value1
1 row(s) in 0.0400 seconds
# Get the data of one column
get 'test', 'row1', 'cf1:col1'
# or
get 'test', 'row1', {COLUMN=>'cf1:col1'}
hbase(main):012:0> disable 'test'
0 row(s) in 1.0930 seconds
hbase(main):013:0> drop 'test'
0 row(s) in 0.0770 seconds
# Empty a table
truncate 'test'
# Count the rows of a table
count 'test'
# Delete one column value in a row
delete 't1','row1','cf1:col1'
# Delete an entire row
deleteall 't1','row1'
# Exit hbase shell
hbase(main):014:0> exit
A second way to count the rows of a table:
bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'test'
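10. Splitting a Region
The simplest way to split a Region is the Split function of the HBase web UI: enter the Region Key of the Region to split. For example, to split the Region named "test,03333333,1467613810867.38b8ef87bbf2f1715998911aafc8c7b3.", enter test,03333333,1467613810867 and click Split.
38b8ef87bbf2f1715998911aafc8c7b3 is the ENCODED name of the Region. It is an MD5 value, namely the result of md5(test,03333333,1467613810867).
In hbase shell the operation is: split 'regionName', 'splitKey'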
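The relationship between the Region Key and the ENCODED name stated above can be sketched in a few lines of Python. This is only an illustration of the description in this section: it assumes a start key made of printable characters and a default (non-replica) Region, and the result has not been checked here against a live cluster. The function name is made up.

```python
import hashlib


def encoded_region_name(table, start_key, region_id):
    """MD5 hex digest of 'table,startkey,regionid', as described above."""
    prefix = "{},{},{}".format(table, start_key, region_id)
    return hashlib.md5(prefix.encode("utf-8")).hexdigest()


# Region Key taken from the split example above.
print(encoded_region_name("test", "03333333", 1467613810867))
```

The digest is always 32 hexadecimal characters, which matches the length of the suffix in the Region names shown in this article.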
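11. Merging Regions
Pre-splitting a table can leave some Regions that are too small or empty. In that case, consider merging the empty and undersized Regions.
Regions can be merged with the tool org.apache.hadoop.hbase.util.Merge, but this requires the cluster to be stopped, for example:
$ ./hbase org.apache.hadoop.hbase.util.Merge
For hadoop 0.21+, Usage: bin/hbase org.apache.hadoop.hbase.util.Merge [-Dfs.defaultFS=hdfs://nn:port] <table-name> <region-1> <region-2>
hbase shell has a built-in command for merging regions: merge_region.
hbase shell implements many of its commands by calling Ruby scripts under the lib/ruby directory. The command scripts are all written in Ruby and live in lib/ruby/shell/commands. They cannot be run directly, because they are only Ruby modules for the individual functions; they must be run from inside hbase shell. The file name is the command name, and running a command without arguments prints its usage, for example: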
hbase(main):001:0> merge_region
ERROR: wrong number of arguments (0 for 2)
Here is some help for this command:
Merge two regions. Passing 'true' as the optional third parameter will force
a merge ('force' merges regardless else merge will fail unless passed
adjacent regions. 'force' is for expert use only).
NOTE: You must pass the encoded region name, not the full region name so
this command is a little different from other region operations. The encoded
region name is the hash suffix on region names: e.g. if the region name were
TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396. then
the encoded region name portion is 527db22f95c8a9e0116f0cc13c680396
Examples:
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true
In fact, the encoded Region name ENCODED_REGIONNAME is an MD5 value. An online merge example:
hbase(main):003:0> merge_region '000d96eef8380430d650c6936b9cef7d','b27a07c88dbbc070f716ee87fab15106'
0 row(s) in 0.0730 seconds
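12. Backup HMaster configuration
There can be zero or more backup HMasters. Their configuration is identical to that of the active HMaster, so it is enough to copy an already configured HMaster installation to the backup machine and start it with the same command. Once started, it can run HBase shell commands in the same way.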
13. Access control configuration
13.1. Configuration changes
To enable HBase access control, add the following two properties to hbase-site.xml:
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>
org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController
</value>
</property>
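13.2. Permission management
Permissions can be managed from HBase shell at two levels, table (Table) and column family (Column Family). superuser is the super user:
13.2.1. Granting permissions
grant <user> <permissions> <table> [ <column family> [ <column qualifier> ] ]
permissions is zero or more letters from the set R, W, C and A (R: read, W: write, C: create, A: admin).
13.2.2. Revoking permissions
revoke <user> <table> [ <column family> [ <column qualifier> ] ]
13.2.3. Changing permissions
alter 'tablename', {OWNER => 'username'}
13.2.4. Viewing permissions
To see which permissions users have on a table: user_permission <table>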
14. Common hbase shell commands
All of the following commands are run directly in hbase shell:
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.util.Bytes
# Result includes all columns
scan 'test',{STARTROW =>'2016081100AA1600011516', STOPROW =>'2016081124ZZ1600011516',LIMIT=>2, FILTER=>SingleColumnValueFilter.new(Bytes.toBytes('cf1'),Bytes.toBytes('id'),CompareFilter::CompareOp.valueOf('EQUAL'),Bytes.toBytes('1299840901201608111600011516'))}
# Result includes all columns except the column used in the filter
import org.apache.hadoop.hbase.filter.SingleColumnValueExcludeFilter
scan 'test',{STARTROW =>'2016081100AA1600011516', STOPROW =>'2016081124ZZ1600011516',LIMIT=>2, FILTER=>SingleColumnValueExcludeFilter.new(Bytes.toBytes('cf1'),Bytes.toBytes('id'),CompareFilter::CompareOp.valueOf('EQUAL'),Bytes.toBytes('1299840901201608111600011516'))}
# Create a pre-split table (the splits apply to the whole table, not to a column family, hence the separate {})
create 'test',{NAME => 'cf1', VERSIONS => 1},{SPLITS_FILE => 'splits.txt'}
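15. Common errors
The following errors were encountered while working through this article:
1) Error 1: Host key not found from database
The error below means that password-less SSH login to DEVNET-154-70, DEVNET-154-77 and DEVNET-154-79 is not working. Assuming the user name is hadoop, try ssh hadoop@DEVNET-154-70 to check whether password-less login works: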
./start-hbase.sh
DEVNET-154-70: Host key not found from database.
DEVNET-154-70: Key fingerprint:
DEVNET-154-70: xihad-rotuf-lykeh-mapup-kylin-kybub-sohid-bucaf-gafyg-vecuc-tyxux
DEVNET-154-70: You can get a public key's fingerprint by running
DEVNET-154-70: % ssh-keygen -F publickey.pub
DEVNET-154-70: on the keyfile.
DEVNET-154-70: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
DEVNET-154-77: Host key not found from database.
DEVNET-154-77: Key fingerprint:
DEVNET-154-77: xuhog-tavip-donon-vuvac-tycyh-sysyz-zacur-didoz-fugif-vosar-ruxyx
DEVNET-154-77: You can get a public key's fingerprint by running
DEVNET-154-77: % ssh-keygen -F publickey.pub
DEVNET-154-77: on the keyfile.
DEVNET-154-77: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
DEVNET-154-79: Host key not found from database.
DEVNET-154-79: Key fingerprint:
DEVNET-154-79: xolim-mysyg-bozes-zilyz-futaf-tatig-zaryn-pilaf-betyf-meduf-tixux
DEVNET-154-79: You can get a public key's fingerprint by running
DEVNET-154-79: % ssh-keygen -F publickey.pub
DEVNET-154-79: on the keyfile.
DEVNET-154-79: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
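2) Error 2: Failed deleting my ephemeral node
A likely cause is an earlier misconfiguration, for example HBase was once started with its bundled ZooKeeper and later restarted against an external ZooKeeper.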
2014-04-22 16:26:17,452 WARN [regionserver60020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/DEVNET-154-79,60020,1398155173411 already deleted, retry=false
2014-04-22 16:26:17,453 WARN [regionserver60020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/DEVNET-154-79,60020,1398155173411
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:156)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1273)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1262)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1273)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1003)
at java.lang.Thread.run(Thread.java:744)
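3) Error 3: Master rejected startup because clock is out of sync
This comes from the RegionServer log: the HMaster refuses the RegionServer's connection because the clocks of the HMaster and the RegionServer differ by more than 30 seconds. There are two fixes: synchronize the clocks, or raise hbase.master.maxclockskew in hbase-site.xml on the HMaster side.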
2014-04-22 16:34:36,701 FATAL [regionserver60020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server DEVNET-154-79,60020,1398155672511 has been rejected; Reported time is too far out of sync with master. Time difference of 175968ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:316)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:216)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1281)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:284)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1998)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:839)
at java.lang.Thread.run(Thread.java:744)
Change hbase.master.maxclockskew so that a skew of 10 minutes is tolerated:
<property>
<name>hbase.master.maxclockskew</name>
<value>600000</value>
<description>Time(ms) difference of regionserver from master</description>
</property>
4) Error 4: UnknownHostException: mycluster
The following error occurs because the underlying HDFS changed the dfs.nameservices setting in hdfs-site.xml. hbase.rootdir in hbase-site.xml must be updated accordingly:
2015-12-01 15:33:23,200 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2636)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:64)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2651)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2634)
... 5 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:258)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:602)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:547)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:1002)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:565)
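For the clock-skew rejection in error 3 above, the measured skew and the allowed maximum can be extracted from the log message before deciding whether to synchronize clocks or raise hbase.master.maxclockskew. The following is a minimal sketch in Python, assuming the message wording shown in that log excerpt; the function name is made up for illustration.

```python
import re

# Matches the tail of the ClockOutOfSyncException message shown in error 3.
_SKEW_PATTERN = re.compile(r"Time difference of (\d+)ms > max allowed of (\d+)ms")


def parse_clock_skew(log_line):
    """Return (measured_ms, allowed_ms), or None if the line does not match."""
    match = _SKEW_PATTERN.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("Reported time is too far out of sync with master. "
        "Time difference of 175968ms > max allowed of 30000ms")
print(parse_clock_skew(line))  # prints (175968, 30000)
```

In this example the measured skew of about 176 seconds is well below the 600000 ms value configured in this article, so that setting would have let the RegionServer start; synchronizing the clocks remains the cleaner fix.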
16. Starting the HBase thrift2 server
bin/hbase-daemon.sh start thrift2 --framed --hsha --workers 100
--hsha selects the HshaServer, and --workers sets the number of HshaServer worker threads. For more information, see:
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html
The default port is 9090, and the corresponding HTTP port is 9095.
17. Starting the HBase REST server
bin/hbase-daemon.sh start rest -p 8080
Simple access examples (assuming the HBase REST server was started on 10.143.136.232):
1) View the HBase version:
http://10.143.136.232:8080/version/cluster
2) View the cluster status:
http://10.143.136.232:8080/status/cluster
3) List all non-system tables:
http://10.143.136.232:8080/
4) List all regions of table test:
http://10.143.136.232:8080/test/regions
5) Get the whole row with rowkey 100000797550117 (the returned values must be base64-decoded):
http://10.143.136.232:8080/test/100000797550117
6) Get the data of column field0 in column family cf1 for rowkey 100000797550117 (the returned values must be base64-decoded):
http://10.143.136.232:8080/test/100000797550117/cf1:field0
For more, see:
http://hbase.apache.org/book.html#_rest
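Since row keys, column names and values come back base64-encoded, a small decoding helper is useful. The following is a minimal sketch in Python. It assumes the JSON representation (requested with an "Accept: application/json" header) has the same Row / key / Cell / column / $ layout as the JSON Put body shown in section 17.6, and that all values are UTF-8 text; the sample response is constructed locally and is not real server output.

```python
import base64
import json


def _b64decode_text(value):
    """Decode a base64 string to text, assuming the bytes are UTF-8."""
    return base64.b64decode(value).decode("utf-8")


def decode_cellset(json_text):
    """Turn a JSON CellSet into a list of (row_key, {column: value}) pairs."""
    rows = []
    for row in json.loads(json_text).get("Row", []):
        cells = {
            _b64decode_text(cell["column"]): _b64decode_text(cell["$"])
            for cell in row.get("Cell", [])
        }
        rows.append((_b64decode_text(row["key"]), cells))
    return rows


def _b64encode_text(text):
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical response body, built here so the sketch runs without a server.
sample = json.dumps({"Row": [{
    "key": _b64encode_text("row1"),
    "Cell": [{"column": _b64encode_text("cf:a"),
              "timestamp": 1458586888395,
              "$": _b64encode_text("value1")}],
}]})
print(decode_cellset(sample))
```

Values that are not text (for example serialized numbers) would need a different final decoding step than UTF-8.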
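17.1. Cluster-Wide
Endpoint: /version/cluster
HTTP Verb: GET
Description: View the HBase version
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/version/cluster"

Endpoint: /status/cluster
HTTP Verb: GET
Description: View the cluster status
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/status/cluster"

Endpoint: /
HTTP Verb: GET
Description: List all non-system tables
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/"

Note: these URLs can also be opened directly in a browser, for example: http://10.143.136.232:8080/version/cluster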
17.2. Namespace
Endpoint: /namespaces
HTTP Verb: GET
Description: List all namespaces
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/"

Endpoint: /namespaces/namespace
HTTP Verb: GET
Description: Describe the given namespace
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace
HTTP Verb: POST
Description: Create a new namespace
Example:
curl -vi -X POST \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace/tables
HTTP Verb: GET
Description: List all tables in the given namespace
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns/tables"

Endpoint: /namespaces/namespace
HTTP Verb: PUT
Description: Modify an existing namespace
Example:
curl -vi -X PUT \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace
HTTP Verb: DELETE
Description: Delete a namespace; the namespace must already be empty
Example:
curl -vi -X DELETE \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns"

Note: the placeholder parts of an endpoint (such as namespace) must be replaced with actual values.
17.3. Table
Endpoint: /table/schema
HTTP Verb: GET
Description: View the schema of the given table
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP Verb: POST
Description: Create a new table from a schema, or modify the schema of an existing table
Example:
curl -vi -X POST \
-H "Accept: text/xml" \
-H "Content-Type: text/xml" \
-d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' \
"http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP Verb: PUT
Description: Update an existing table from a schema
Example:
curl -vi -X PUT \
-H "Accept: text/xml" \
-H "Content-Type: text/xml" \
-d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' \
"http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP Verb: DELETE
Description: Delete the table
Example:
curl -vi -X DELETE \
-H "Accept: text/xml" \
"http://example.com:8000/users/schema"

Endpoint: /table/regions
HTTP Verb: GET
Description: List all regions of the table
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/regions"
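17.4. Get
Endpoint: /table/row/column:qualifier/timestamp
HTTP Verb: GET
Description: Get the value of the given column in the given column family of the given table at the given timestamp. The returned value is base64-encoded, so it must be base64-decoded before use.
Examples:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1"
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1/cf:a/1458586888395"

Endpoint: /table/row/column:qualifier
HTTP Verb: GET
Description: Get the value of the given column in the given column family of the given table
Examples:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1/cf:a"
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1/cf:a/"

Endpoint: /table/row/column:qualifier/?v=number_of_versions
HTTP Verb: GET
Description: Get the given number of versions of the given column in the given column family of the given table
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1/cf:a?v=2"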
17.5. Scan
Endpoint: /table/scanner/
HTTP Verb: PUT
Description: Create a scanner
Example:
curl -vi -X PUT \
-H "Accept: text/xml" \
-H "Content-Type: text/xml" \
-d '<Scanner batch="1"/>' \
"http://example.com:8000/users/scanner/"

Endpoint: /table/scanner/
HTTP Verb: PUT
Description: Create a scanner with a Filter. The filter can be written in a text file, in a format such as:
<Scanner batch="100">
<filter>
{
"type": "PrefixFilter",
"value": "u123"
}
</filter>
</Scanner>
Example:
curl -vi -X PUT \
-H "Accept: text/xml" \
-H "Content-Type:text/xml" \
-d @filter.txt \
"http://example.com:8000/users/scanner/"

Endpoint: /table/scanner/scanner-id
HTTP Verb: GET
Description: Fetch the next batch of data; if no data is left, the HTTP status code returned is 204
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/scanner/145869072824375522207"

Endpoint: /table/scanner/scanner-id
HTTP Verb: DELETE
Description: Delete the given scanner and release its resources
Example:
curl -vi -X DELETE \
-H "Accept: text/xml" \
"http://example.com:8000/users/scanner/145869072824375522207"
17.6. Put
Endpoint: /table/row_key
HTTP Verb: PUT
Description: Write one row to the given table. Note that the row key, the column family and column name, and the column value must all be base64-encoded.
Examples:
curl -vi -X PUT \
-H "Accept: text/xml" \
-H "Content-Type: text/xml" \
-d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93NQo="><Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell></Row></CellSet>' \
"http://example.com:8000/users/fakerow"
curl -vi -X PUT \
-H "Accept: text/json" \
-H "Content-Type: text/json" \
-d '{"Row":[{"key":"cm93NQo=", "Cell": [{"column":"Y2Y6ZQo=", "$":"dmFsdWU1Cg=="}]}]}' \
"http://example.com:8000/users/fakerow"
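Building the base64-encoded body by hand is error-prone, so it can be generated instead. The following is a minimal sketch in Python that produces a JSON body in the same layout as the second curl example above; the function names are made up for illustration, and sending the request is left to curl or an HTTP client.

```python
import base64
import json


def b64(text):
    """Base64-encode a text value (UTF-8) and return it as an ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_put_body(row_key, column, value):
    """Build the JSON body for a single-cell Put; column is 'family:qualifier'."""
    return json.dumps({
        "Row": [{
            "key": b64(row_key),
            "Cell": [{"column": b64(column), "$": b64(value)}],
        }]
    })


print(build_put_body("row5", "cf:e", "value5"))
```

Note that the sample values in the curl examples above (cm93NQo=, Y2Y6ZQo=, dmFsdWU1Cg==) decode to row5, cf:e and value5 each followed by a newline character, which suggests they were produced with echo without -n; encoding the bare strings as in this sketch avoids storing the stray newline.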
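18. Related documents
《HBase-1.2.1分布式安装指南》 (HBase-1.2.1 Distributed Installation Guide)
《Hive 0.12.0安装指南》 (Hive 0.12.0 Installation Guide)
《ZooKeeper-3.4.6分布式安装指南》 (ZooKeeper-3.4.6 Distributed Installation Guide)
《Hadoop 2.3.0源码反向工程》 (Hadoop 2.3.0 Source Code Reverse Engineering)
《在Linux上编译Hadoop-2.7.2》 (Compiling Hadoop-2.7.2 on Linux)
《Accumulo-1.5.1安装指南》 (Accumulo-1.5.1 Installation Guide)
《Drill 1.0.0安装指南》 (Drill 1.0.0 Installation Guide)
《Shark 0.9.1安装指南》 (Shark 0.9.1 Installation Guide)
For more, follow the technical blog: http://aquester.cublog.cn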
Appendix 1: Metadata
The directory structure of HBase in ZooKeeper:
[zk: localhost:2181(CONNECTED) 24] ls /hbase
[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, region-in-transition, online-snapshot, acl, master, running, recovering-regions, draining, namespace, hbaseid, table]
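Starting with version 0.96, root-region-server was replaced by meta-region-server and the original root was removed. Like the old root, the new meta has only one Region and is never split into several.
Version 0.96 also introduced namespaces and removed the -ROOT- table. The former .META. table was replaced by the hbase:meta table, where hbase is the namespace name. A namespace is similar to a database name in MySQL and is used to group tables logically.
Client DML operations on HBase do not need to contact the master, but DDL operations depend on it; list in hbase shell also depends on the master.
On the web UI of the active HBase master, three system tables are visible: hbase:acl, hbase:meta and hbase:namespace. Note that the metadata of hbase:acl and hbase:namespace is also stored in hbase:meta, which can be observed by running scan 'hbase:meta' in hbase shell.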
hbase(main):015:0* scan 'hbase:meta',{LIMIT=>10}
hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:regioninfo, timestamp=1460426830411, value={ENCODED => 0bbdf170c309223c0ce830facdff9edd, NAME => 'hbase:acl,,1460426731436.0bbdf
facdff9edd. 170c309223c0ce830facdff9edd.', STARTKEY => '', ENDKEY => ''}
hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:seqnumDuringOpen, timestamp=1461653766642, value=\x00\x00\x00\x00\x00\x00\x002
facdff9edd.
hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:server, timestamp=1461653766642, value=hadoop-034:16020
facdff9edd.
hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:serverstartcode, timestamp=1461653766642, value=1461653610096
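The first column is the Region name. serverstartcode is the start code (start timestamp) of the Region server hosting the region; server is the host name (or IP) and port of the Region server. The structure of regioninfo is:
1) ENCODED is the MD5 value derived from the Region name
2) NAME is the Region name
3) STARTKEY, if empty, means this is the first Region
4) ENDKEY, if it is also empty, means the table has only one Region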
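Appendix 2: Installing Phoenix
Installing Phoenix is very simple. The official site has instructions (http://phoenix.incubator.apache.org/download.html), and the binary package can be downloaded from http://www.apache.org/dyn/closer.cgi/incubator/phoenix/ (this article uses phoenix-4.7.0-incubating.tar.gz). Note the compatibility with HBase:
Phoenix version   HBase version
Phoenix 2.x       HBase 0.94.x
Phoenix 3.x       HBase 0.94.x
Phoenix 4.x       HBase 0.98.1+
The installation steps are:
1) Upload phoenix-4.7.0-incubating.tar.gz to the Phoenix client machine; it is assumed to be installed under /data/hadoop
2) Unpack phoenix-4.7.0-incubating.tar.gz, which creates the directory phoenix-4.7.0-incubating
3) Create a symbolic link: ln -s phoenix-4.7.0-incubating phoenix
4) Add /data/hadoop/phoenix/hadoop-2/phoenix-4.7.0-incubating-client.jar to the CLASSPATH
5) Copy phoenix-core-4.7.0-incubating.jar from the phoenix/common directory into the CLASSPATH of every HBase region server, for example the HBase lib directory
6) Restart the HBase cluster
Running Phoenix is also very simple. The command format is:
sqlline.py zookeeper file.sql
Example: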
hadoop@VM-40-171-sles10-64:~/phoenix/bin> ./sqlline.py 10.12.154.78
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:10.12.154.78 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:10.12.154.78
Connected to: Phoenix (version 4.0)
Driver: org.apache.phoenix.jdbc.PhoenixDriver (version 4.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
53/53 (100%) Done
Done
sqlline version 1.1.2
0: jdbc:phoenix:10.12.154.78> select * from test;
Error: ERROR 1012 (42M03): Table undefined. tableName=TEST (state=42M03,code=1012)
0: jdbc:phoenix:10.12.154.78> create table test ( a int, b string);
Error: ERROR 601 (42P00): Syntax error. Unsupported sql type: INT (state=42P00,code=601)
0: jdbc:phoenix:10.12.154.78> create table test (a integer, b integer);
Error: ERROR 509 (42888): The table does not have a primary key. tableName=TEST (state=42888,code=509)
0: jdbc:phoenix:10.12.154.78> create table test (a integer primary key, b integer) ;
No rows affected (1.424 seconds)
0: jdbc:phoenix:10.12.154.78> UPSERT INTO TEST VALUES (1, 1);
1 row affected (0.099 seconds)
0: jdbc:phoenix:10.12.154.78> UPSERT INTO TEST VALUES (2, 12);
1 row affected (0.02 seconds)
0: jdbc:phoenix:10.12.154.78> select * from test;
+------------+------------+
| A | B |
+------------+------------+
| 1 | 1 |
| 2 | 12 |
+------------+------------+
2 rows selected (0.116 seconds)
0: jdbc:phoenix:10.12.154.78>
For the syntax, see http://phoenix.incubator.apache.org/language/index.html and, for the data types, see http://phoenix.incubator.apache.org/language/datatypes.html