HBase-1.2.1 and Phoenix-4.7.0 Distributed Installation Guide

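1. Preface

This guide installs HBase-1.2.1 on top of Hadoop-2.7.2. For the Hadoop-2.7.2 installation, see "Hadoop-2.7.2 Distributed Installation Manual". The installation environment is 64-bit SuSE Linux 10.1.

The steps follow the official HBase quickstart.html, which can be found in the docs/getting_started directory or read online at http://hbase.apache.org/book/quickstart.html.

The installation uses an external ZooKeeper. For the ZooKeeper installation, see "ZooKeeper-3.4.6 Distributed Installation Guide".

For distributed installation, see http://hbase.apache.org/book/standalone_dist.html#distributed. For configuring HBase with an external ZooKeeper, see http://hbase.apache.org/book/zookeeper.html.

All of the online documents are also available in the docs directory of the unpacked binary package.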

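2. Concepts

2.1. Region name

A region name identifies a region. Its format is: table name,StartKey,RegionID, where the RegionID consists of a numeric ID followed by the encoded (MD5) name of the region. For example: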

test,83--G40V6UdCnEHKSKqR_yjJo798594847946710200000795,1461323021820.d4cc7afbc2d6bf3843c121fedf4d696d.

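In the example above, test is the table name, the string between the two commas (83--G40V6UdCnEHKSKqR_yjJo798594847946710200000795) is the StartKey, and the trailing 1461323021820.d4cc7afbc2d6bf3843c121fedf4d696d. is the Region ID (note that it contains two dots). For the first region of a table the StartKey is empty, so the name looks like this: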

t_user,,1461549916081.f4e17b0d99f2d77da44ccb184812c345.
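
The format described above can be taken apart mechanically. The following is a minimal Python sketch, not HBase code: the names RegionName and parse_region_name are made up for illustration, and it assumes that the numeric part of the Region ID is a millisecond timestamp and that the table name contains no comma.

```python
from typing import NamedTuple


class RegionName(NamedTuple):
    table: str
    start_key: str
    timestamp: int   # numeric part of the Region ID (assumed to be milliseconds)
    encoded: str     # hash suffix of the Region ID


def parse_region_name(name):
    """Split "table,startkey,timestamp.encoded." into its parts."""
    # The first comma ends the table name; the start key may itself
    # contain commas, so the Region ID starts after the last comma.
    table, rest = name.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    timestamp, encoded = region_id.rstrip(".").split(".", 1)
    return RegionName(table, start_key, int(timestamp), encoded)
```

Applied to the second example, the start key comes back as an empty string, which is how the first region of a table can be recognized.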

3. Conventions

Hadoop-2.7.2 is assumed to be installed in /data/hadoop/current, where /data/hadoop/current is a symbolic link to /data/hadoop/hadoop-2.7.2.

HBase is installed in /data/hadoop/hbase, where /data/hadoop/hbase is a symbolic link to hbase-1.2.1-hadoop2.

4. Ports

Port     Description
2888     ZooKeeper; on the Leader, listens for connections from Followers
3888     ZooKeeper; used for Leader election
2181     ZooKeeper; listens for client connections
16010    hbase.master.info.port, HTTP port of the HMaster
16000    hbase.master.port, RPC port of the HMaster
16030    hbase.regionserver.info.port, HTTP port of the HRegionServer
16020    hbase.regionserver.port, RPC port of the HRegionServer
8080     hbase.rest.port, port of the HBase REST server
9095     hbase.thrift.info.port, HTTP port of the HBase Thrift server
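
Once the cluster is running, a quick way to see which of these ports accept connections is a plain TCP connect. The sketch below is only an illustration: the helper names are made up, the port selection is a subset of the table above, and the host to probe is left to the caller.

```python
import socket

# A subset of the ports listed above.
HBASE_PORTS = {
    2181: "ZooKeeper client port",
    16000: "HMaster RPC",
    16010: "HMaster HTTP",
    16020: "HRegionServer RPC",
    16030: "HRegionServer HTTP",
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host):
    """Map each known port to whether it currently accepts connections on host."""
    return {port: is_port_open(host, port) for port in HBASE_PORTS}
```

Calling report with the address of an HMaster or HRegionServer machine returns a dictionary keyed by port number. A closed port only shows that nothing is listening there; it does not say why.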

5. Downloading HBase

Official site: http://hbase.apache.org/, which links to the HBase downloads.

A mirror site in China: http://mirror.bit.edu.cn/apache/hbase/. HBase-1.2.1 can be downloaded from http://mirror.bit.edu.cn/apache/hbase/hbase-1.2.1/. Download hbase-1.2.1-hadoop2-bin.tar.gz.

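6. Installation steps

6.1. Edit conf/regionservers

The regionservers file is similar to Hadoop's slaves file. This change does not need to be made on the RegionServer machines.

List the IP address or host name of every HRegionServer in the regionservers file, exactly one per line; several entries on one line are not allowed. The configuration used in this guide: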

hadoop@VM_40_171_sles10_64:~/hbase/conf> cat regionservers

10.12.154.77

10.12.154.78

10.12.154.79
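
The one-host-per-line rule is easy to check before the file is distributed. The following Python sketch is an illustration only (parse_regionservers is a made-up helper, not part of HBase); it skips blank lines and rejects any line that holds more than one entry.

```python
def parse_regionservers(text):
    """Return the hosts listed in a regionservers file, one host per line.

    Raises ValueError if a line holds more than one entry.
    """
    hosts = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no host
        if len(line.split()) != 1 or "," in line:
            raise ValueError(
                "line {}: expected exactly one host, got {!r}".format(lineno, line)
            )
        hosts.append(line)
    return hosts
```

Reading the file with open("conf/regionservers").read() and passing the text to parse_regionservers returns the host list, or raises on the first malformed line.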

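6.2. Edit conf/hbase-site.xml

The same change is needed on every machine. Configure one machine first, then copy the file to the others with scp, for example: scp hbase-site.xml hadoop@10.12.154.79:/data/hadoop/hbase/conf/.

hbase-site.xml is the HBase configuration file. The default hbase-site.xml is empty, as shown below: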

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

/**

* Licensed to the Apache Software Foundation (ASF) under one

* or more contributor license agreements. See the NOTICE file

* distributed with this work for additional information

* regarding copyright ownership. The ASF licenses this file

* to you under the Apache License, Version 2.0 (the

* "License"); you may not use this file except in compliance

* with the License. You may obtain a copy of the License at

* http://www.apache.org/licenses/LICENSE-2.0

* Unless required by applicable law or agreed to in writing, software

* distributed under the License is distributed on an "AS IS" BASIS,

* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

* See the License for the specific language governing permissions and

* limitations under the License.

*/

-->

<configuration>

</configuration>

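That is fine; use it as the starting point. Do not start from hbase-default.xml under the docs directory, which is long and hard to read.

Edit hbase-site.xml and add the following (taken from standalone_dist.html; search for "Fully-distributed"):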

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://172.25.40.171:9001/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>DEVNET-154-77,DEVNET-154-70,DEVNET-154-79</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed modes
      of operation. For a fully-distributed setup, this should be set to a full
      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
      this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>

  <property>
    <name>hbase.master.maxclockskew</name>
    <!-- the value line (in milliseconds) is missing from the source -->
    <description>Time(ms) difference of regionserver from master</description>
  </property>
</configuration>

"hbase.zookeeper.quorum" lists the nodes of the ZooKeeper ensemble by host name or IP address. hdfs://172.25.40.171:9001 corresponds to "dfs.namenode.rpc-address" in hdfs-site.xml.

If HDFS runs in cluster mode, set hbase.rootdir to the cluster form, for example:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://test/hbase</value>
</property>

That is, the value is fs.defaultFS from core-site.xml followed by the hbase directory. In the example above, test is the value of dfs.nameservices in hdfs-site.xml.
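
Because the file has to be identical on every machine, generating it from a few inputs can help avoid copy errors. The following Python sketch is one possible way to do that, under the assumption that only the three properties discussed above are needed; build_hbase_site and to_xml are illustrative names, not part of any HBase tool.

```python
import xml.etree.ElementTree as ET


def build_hbase_site(default_fs, zk_quorum, distributed=True):
    """Build an hbase-site.xml element tree.

    default_fs: value of fs.defaultFS from core-site.xml, e.g. "hdfs://test"
    zk_quorum:  list of ZooKeeper host names or IP addresses
    """
    props = {
        # hbase.rootdir is fs.defaultFS followed by the hbase directory
        "hbase.rootdir": default_fs.rstrip("/") + "/hbase",
        "hbase.cluster.distributed": "true" if distributed else "false",
        "hbase.zookeeper.quorum": ",".join(zk_quorum),
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def to_xml(root):
    """Serialize the tree with an XML declaration in front."""
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

The string returned by to_xml can be written to conf/hbase-site.xml and then copied to the other machines with scp as described above.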

For more information, see http://hbase.apache.org/book/config.files.html.

6.2.1. hbase.master.info.port

Specifies the HTTP port of the HMaster.

6.2.2. hbase.master.info.bindAddress

Specifies the IP address that the HMaster HTTP service binds to. If it is not set, an IPv6 address may be used.

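6.3. Edit conf/hbase-env.sh

The same change is needed on every machine. Configure one machine first, then copy the file to the others with scp, for example: scp hbase-env.sh hadoop@10.12.154.79:/data/hadoop/hbase/conf/. Make the following changes:

1) Set JAVA_HOME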

# The java implementation to use. Java 1.6 required.

export JAVA_HOME=/data/jdk

Here /data/jdk is the JDK installation directory.

2) Set HBASE_MANAGES_ZK

# Tell HBase whether it should manage it's own instance of Zookeeper or not.

export HBASE_MANAGES_ZK=false

If HBASE_MANAGES_ZK is true, HBase uses its bundled ZooKeeper. Deploying ZooKeeper separately is recommended, so that the same ZooKeeper can also serve other systems.

3) Set HBASE_CLASSPATH

# Extra Java CLASSPATH elements. Optional.

export HBASE_CLASSPATH=/data/hadoop/current/etc/hadoop

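This setting can look confusing: why does a CLASSPATH point to Hadoop's configuration directory? It lets HBase find the Hadoop configuration; the variable name is simply not very descriptive.

Alternatively, consider creating a symbolic link to Hadoop's hdfs-site.xml in HBase's conf directory.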

7. System settings

This only has to be done before HBase is started, but it requires root. Add two kinds of entries, nofile and nproc, to /etc/security/limits.conf, for example:

hadoop - nofile 32768

hadoop hard nproc 320000

hadoop soft nproc 320000

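nofile sets the number of files a single process may open, and nproc sets the maximum number of processes. Replace "hadoop" with the actual user name.

For the limits to take effect, make sure /etc/pam.d/login contains the following line: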

session required pam_limits.so

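If HBase is started by crond, the line above must also be added to /etc/pam.d/crond.

No reboot is needed after the change; logging in again is enough. Use "ulimit -a" to compare the values before and after.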
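
Besides "ulimit -a", the limits seen by a process can be read programmatically. The sketch below uses Python's standard resource module (available on Unix-like systems) and compares the soft limits of the current process against the values from the limits.conf example; check_limits is a made-up helper name and the thresholds are simply the numbers used above.

```python
import resource


def check_limits(min_nofile=32768, min_nproc=320000):
    """Compare the current process's soft limits with the limits.conf values.

    Returns {"nofile": (soft_limit, ok), "nproc": (soft_limit, ok)}.
    """
    results = {}
    for name, res, wanted in (
        ("nofile", resource.RLIMIT_NOFILE, min_nofile),
        ("nproc", resource.RLIMIT_NPROC, min_nproc),
    ):
        soft, _hard = resource.getrlimit(res)
        # RLIM_INFINITY means "unlimited", which always satisfies the minimum
        ok = soft == resource.RLIM_INFINITY or soft >= wanted
        results[name] = (soft, ok)
    return results
```

The check has to run as the user that starts HBase, in a session opened after limits.conf was changed, otherwise it reports the old values.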

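8. Starting HBase

Go to the HBASE_HOME/bin directory and run start-hbase.sh to start HBase. Use the jps command shipped with the JDK to check that the HMaster and HRegionServer processes are running, and check the log files for errors.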
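
The jps check can be scripted. jps prints one line per Java process in the form "pid name", so a small parser is enough. The following is a sketch with made-up helper names; it works on text that was already captured from jps, so how the output is collected (locally or over ssh) is left open.

```python
def running_java_processes(jps_output):
    """Return the set of process names from jps output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_output, expected=("HMaster", "HRegionServer")):
    """Return the expected daemon names that do not appear in the jps output."""
    running = running_java_processes(jps_output)
    return [name for name in expected if name not in running]
```

On a machine that runs only a RegionServer, pass expected=("HRegionServer",). An empty result means every expected daemon was found.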

9. Basic HBase commands

Run "hbase shell" to enter the command-line interface. For details, see the official quickstart.html.

# List the tables

list

hbase(main):003:0> create 'test', 'cf' # create table test with one column family, cf

0 row(s) in 1.2200 seconds

hbase(main):003:0> list 'test'

1 row(s) in 0.0550 seconds

hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1' # put value1 into column a of column family cf in table test

0 row(s) in 0.0560 seconds

hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'

0 row(s) in 0.0370 seconds

hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'

0 row(s) in 0.0450 seconds

hbase(main):007:0> scan 'test' # scan table test

ROW COLUMN+CELL

row1 column=cf:a, timestamp=1288380727188, value=value1

row2 column=cf:b, timestamp=1288380738440, value=value2

row3 column=cf:c, timestamp=1288380747365, value=value3

3 row(s) in 0.0590 seconds

hbase(main):008:0> get 'test', 'row1' # get one row from table test

COLUMN CELL

cf:a timestamp=1288380727188, value=value1

1 row(s) in 0.0400 seconds

# Get the data of one column

get 'test', 'row1', 'cf1:col1'

# Or

get 'test', 'row1', {COLUMN=>'cf1:col1'}

hbase(main):012:0> disable 'test'

0 row(s) in 1.0930 seconds

hbase(main):013:0> drop 'test'

0 row(s) in 0.0770 seconds

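# Truncate a table

truncate 'test'

# Count the rows of a table

count 'test'

# Delete one column value from a row

delete 't1','row1','cf1:col1'

# Delete an entire row

deleteall 't1','row1'

# Exit the hbase shell

hbase(main):014:0> exit

A second way to count the rows of a table: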

bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'test'

10. Splitting a region

The simplest way to split a region is the Split function of the HBase web UI: enter the key of the region to split. For example, to split the region named "test,03333333,1467613810867.38b8ef87bbf2f1715998911aafc8c7b3.", enter test,03333333,1467613810867 and click Split.

38b8ef87bbf2f1715998911aafc8c7b3 is the ENCODED name of the region. It is an MD5 value, namely the result of md5(test,03333333,1467613810867).

In the hbase shell, the operation is: split 'regionName', 'splitKey'.
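
The relationship between a region name and its ENCODED name, as stated above, can be reproduced with any MD5 implementation. The following Python sketch follows that statement; encoded_region_name is a made-up helper, and it assumes the region name consists of plain text (binary start keys are not handled).

```python
import hashlib


def encoded_region_name(region_name):
    """Return the hex MD5 of "table,startkey,id".

    A trailing ".<32 hex characters>." suffix, if present, is dropped first,
    so either form of the region name can be passed in.
    """
    parts = region_name.rstrip(".").rsplit(".", 1)
    if len(parts) == 2 and len(parts[1]) == 32:
        try:
            int(parts[1], 16)        # only strip a suffix that is really hex
            region_name = parts[0]
        except ValueError:
            pass
    return hashlib.md5(region_name.encode("utf-8")).hexdigest()
```

Comparing the result for test,03333333,1467613810867 with the suffix shown in the web UI is a quick way to confirm that the right region was picked.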

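11. Merging regions

Pre-splitting can leave some regions very small or empty. In that case, consider merging the empty and undersized regions.

Regions can be merged with the tool org.apache.hadoop.hbase.util.Merge, but this requires stopping the cluster, for example: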

$ ./hbase org.apache.hadoop.hbase.util.Merge

For hadoop 0.21+, Usage: bin/hbase org.apache.hadoop.hbase.util.Merge [-Dfs.defaultFS=hdfs://nn:port] <table-name> <region-1> <region-2>

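The hbase shell has a built-in command for merging regions: merge_region.

The hbase shell implements many of its commands by calling Ruby scripts under lib/ruby. The command scripts are all written in Ruby and live in lib/ruby/shell/commands. They cannot be run directly, because they are only Ruby modules for the individual features; they have to be run from inside the hbase shell. The file name is the command name, and running a command without arguments prints its usage, for example: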

hbase(main):001:0> merge_region

ERROR: wrong number of arguments (0 for 2)

Here is some help for this command:

Merge two regions. Passing 'true' as the optional third parameter will force

a merge ('force' merges regardless else merge will fail unless passed

adjacent regions. 'force' is for expert use only).

NOTE: You must pass the encoded region name, not the full region name so

this command is a little different from other region operations. The encoded

region name is the hash suffix on region names: e.g. if the region name were

TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396. then

the encoded region name portion is 527db22f95c8a9e0116f0cc13c680396

Examples:

hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'

hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true

The encoded region name ENCODED_REGIONNAME is in fact an MD5 value. An online merge example:

hbase(main):003:0> merge_region '000d96eef8380430d650c6936b9cef7d','b27a07c88dbbc070f716ee87fab15106'

0 row(s) in 0.0730 seconds

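12. Backup HMaster configuration

There can be zero or more backup HMasters. Their configuration is identical to that of the active HMaster, so it is enough to copy an already configured HMaster installation to the new machine and start it with the same command. Once it is started, HBase shell commands can be run there as well.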

13. Access control configuration

13.1. Configuration changes

To enable HBase access control, add the following two properties to hbase-site.xml:

<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>

13.2. Permission management

Permissions can be managed from the HBase shell at two levels, table and column family. superuser is the super user.

13.2.1. Granting permissions

grant <user> <permissions> <table> [ <column family> [ <column qualifier> ] ]

permissions is either 0 or a combination of the letters R, W, C and A (R: read, W: write, C: create, A: admin).

13.2.2. Revoking permissions

revoke <user> <table> [ <column family> [ <column qualifier> ] ]

13.2.3. Changing permissions

alter 'tablename', {OWNER => 'username'}

13.2.4. Viewing permissions

To see which permissions users have: user_permission <table>.

14. Common hbase shell commands

The following commands are run directly in the hbase shell:

import org.apache.hadoop.hbase.filter.SingleColumnValueFilter

import org.apache.hadoop.hbase.filter.CompareFilter

import org.apache.hadoop.hbase.util.Bytes

# Return all columns

scan 'test',{STARTROW =>'2016081100AA1600011516', STOPROW =>'2016081124ZZ1600011516',LIMIT=>2, FILTER=>SingleColumnValueFilter.new(Bytes.toBytes('cf1'),Bytes.toBytes('id'),CompareFilter::CompareOp.valueOf('EQUAL'),Bytes.toBytes('1299840901201608111600011516'))}

# Return all columns except the one being filtered on

import org.apache.hadoop.hbase.filter.SingleColumnValueExcludeFilter

scan 'test',{STARTROW =>'2016081100AA1600011516', STOPROW =>'2016081124ZZ1600011516',LIMIT=>2, FILTER=>SingleColumnValueExcludeFilter.new(Bytes.toBytes('cf1'),Bytes.toBytes('id'),CompareFilter::CompareOp.valueOf('EQUAL'),Bytes.toBytes('1299840901201608111600011516'))}

# Create a pre-split table (the splits apply to the whole table, not to a column family, hence the separate {})

create 'test',{NAME => 'cf1', VERSIONS => 1},{SPLITS_FILE => 'splits.txt'}
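
The splits.txt file holds one split key per line. How the keys are chosen depends on the row key design; as an illustration only, the Python sketch below generates evenly spaced, zero-padded hexadecimal keys, which suits row keys that start with a uniformly distributed hex prefix. Both function names are made up for this example.

```python
def hex_split_keys(num_regions, width=8):
    """Return num_regions - 1 evenly spaced, zero-padded hex split keys."""
    if num_regions < 2:
        return []
    space = 16 ** width  # size of the key space for the given width
    return [
        format(i * space // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


def write_splits_file(path, keys):
    """Write one split key per line, the layout SPLITS_FILE expects."""
    with open(path, "w") as f:
        for key in keys:
            f.write(key + "\n")
```

For example, write_splits_file("splits.txt", hex_split_keys(4)) produces three keys and therefore four regions. Row keys with a different prefix distribution need a different key generator.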

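15. Common errors

The following errors were encountered while working through this guide:

1) Error 1: Host key not found from database

The error below means that password-less login to DEVNET-154-70, DEVNET-154-77 and DEVNET-154-79 is not working. Assuming the user name is hadoop, try ssh hadoop@DEVNET-154-70 to check whether password-less login works: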

./start-hbase.sh

DEVNET-154-70: Host key not found from database.

DEVNET-154-70: Key fingerprint:

DEVNET-154-70: xihad-rotuf-lykeh-mapup-kylin-kybub-sohid-bucaf-gafyg-vecuc-tyxux

DEVNET-154-70: You can get a public key's fingerprint by running

DEVNET-154-70: % ssh-keygen -F publickey.pub

DEVNET-154-70: on the keyfile.

DEVNET-154-70: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument

DEVNET-154-77: Host key not found from database.

DEVNET-154-77: Key fingerprint:

DEVNET-154-77: xuhog-tavip-donon-vuvac-tycyh-sysyz-zacur-didoz-fugif-vosar-ruxyx

DEVNET-154-77: You can get a public key's fingerprint by running

DEVNET-154-77: % ssh-keygen -F publickey.pub

DEVNET-154-77: on the keyfile.

DEVNET-154-77: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument

DEVNET-154-79: Host key not found from database.

DEVNET-154-79: Key fingerprint:

DEVNET-154-79: xolim-mysyg-bozes-zilyz-futaf-tatig-zaryn-pilaf-betyf-meduf-tixux

DEVNET-154-79: You can get a public key's fingerprint by running

DEVNET-154-79: % ssh-keygen -F publickey.pub

DEVNET-154-79: on the keyfile.

DEVNET-154-79: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument

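2) Error 2: Failed deleting my ephemeral node

A possible cause is an earlier misconfiguration, for example HBase was first started with its bundled ZooKeeper and was later restarted against an external ZooKeeper.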

2014-04-22 16:26:17,452 WARN [regionserver60020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/DEVNET-154-79,60020,1398155173411 already deleted, retry=false

2014-04-22 16:26:17,453 WARN [regionserver60020] regionserver.HRegionServer: Failed deleting my ephemeral node

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/DEVNET-154-79,60020,1398155173411

at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:156)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1273)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1262)

at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1273)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1003)

at java.lang.Thread.run(Thread.java:744)

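3) Error 3: Master rejected startup because clock is out of sync

This message comes from the RegionServer log: the HMaster rejected the RegionServer's connection because the clocks on the HMaster and the RegionServer differ by more than 30 seconds. There are two fixes: synchronize the clocks, or raise hbase.master.maxclockskew in hbase-site.xml on the HMaster.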

2014-04-22 16:34:36,701 FATAL [regionserver60020] regionserver.HRegionServer: Master rejected startup because clock is out of sync

org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server DEVNET-154-79,60020,1398155672511 has been rejected; Reported time is too far out of sync with master. Time difference of 175968ms > max allowed of 30000ms

at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:316)

at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:216)

at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1281)

at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)

at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)

at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)

at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:744)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:408)

at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)

at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)

at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:284)

at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1998)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:839)

at java.lang.Thread.run(Thread.java:744)

To tolerate a clock difference of up to 10 minutes, set hbase.master.maxclockskew as follows:

<property>
  <name>hbase.master.maxclockskew</name>
  <value>600000</value>
  <description>Time(ms) difference of regionserver from master</description>
</property>
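
The rule behind this error can be written down in a few lines. The Python sketch below is a simplified model of the comparison reported in the log, not HBase's actual code; clock_skew_ok is a made-up name and the 30000 ms default is the limit shown in the error message.

```python
def clock_skew_ok(master_ms, regionserver_ms, max_skew_ms=30000):
    """Return True if the two clocks differ by no more than max_skew_ms."""
    return abs(master_ms - regionserver_ms) <= max_skew_ms
```

With the difference of 175968 ms from the log above, the check fails under the 30000 ms limit and passes once the limit is raised to 600000 ms. Synchronizing the clocks remains the cleaner fix, since raising the limit only hides the drift.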

4) Error 4: UnknownHostException: mycluster

The error below occurs because dfs.nameservices in hdfs-site.xml was changed in the underlying HDFS. hbase.rootdir in hbase-site.xml has to be updated to match:

2015-12-01 15:33:23,200 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting

java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer

at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2636)

at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:64)

at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)

at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2651)

Caused by: java.lang.reflect.InvocationTargetException

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2634)

... 5 more

Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster

at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)

at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:258)

at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153)

at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:602)

at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:547)

at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139)

at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)

at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)

at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)

at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)

at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)

at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:1002)

at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:565)

16. Starting the HBase thrift2 server

bin/hbase-daemon.sh start thrift2 --framed --hsha --workers 100

--hsha selects the HshaServer, and --workers sets the number of HshaServer worker threads. For more information, see:

https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html

The default port is 9090, and the corresponding HTTP port is 9095.

17. Starting the HBase REST server

bin/hbase-daemon.sh start rest -p 8080

Simple access examples (assuming the HBase REST server was started on 10.143.136.232):

1) Show the HBase version

http://10.143.136.232:8080/version/cluster

2) Show the cluster status

http://10.143.136.232:8080/status/cluster

3) List all non-system tables

http://10.143.136.232:8080/

4) List all regions of table test

http://10.143.136.232:8080/test/regions

5) Get the whole row with rowkey 100000797550117 (the returned values must be base64-decoded)

http://10.143.136.232:8080/test/100000797550117

6) Get column field0 of column family cf1 for rowkey 100000797550117 (the returned value must be base64-decoded)

http://10.143.136.232:8080/test/100000797550117/cf1:field0

For more, see:

http://hbase.apache.org/book.html#_rest

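17.1. Cluster-Wide

Endpoint: /version/cluster
HTTP verb: GET
Description: Show the HBase version
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/version/cluster"

Endpoint: /status/cluster
HTTP verb: GET
Description: Show the cluster status
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/status/cluster"

Endpoint: /
HTTP verb: GET
Description: List all non-system tables
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/"

Note: these URLs can also be opened directly in a browser, for example http://10.143.136.232:8080/version/cluster.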

17.2. Namespace

Endpoint: /namespaces
HTTP verb: GET
Description: List all namespaces
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/namespaces/"

Endpoint: /namespaces/namespace
HTTP verb: GET
Description: Show the description of the given namespace
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace
HTTP verb: POST
Description: Create a new namespace
Example:

curl -vi -X POST \
  -H "Accept: text/xml" \
  "example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace/tables
HTTP verb: GET
Description: List all tables in the given namespace
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/namespaces/special_ns/tables"

Endpoint: /namespaces/namespace
HTTP verb: PUT
Description: Modify an existing namespace
Example:

curl -vi -X PUT \
  -H "Accept: text/xml" \
  "http://example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace
HTTP verb: DELETE
Description: Delete a namespace; the namespace must already be empty
Example:

curl -vi -X DELETE \
  -H "Accept: text/xml" \
  "example.com:8000/namespaces/special_ns"

Note: in the endpoints above, the namespace after /namespaces/ is a placeholder for the actual namespace name.

17.3. Table

Endpoint: /table/schema
HTTP verb: GET
Description: Show the schema of the given table
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP verb: POST
Description: Create a new table from a schema, or modify the schema of an existing table
Example:

curl -vi -X POST \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' \
  "http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP verb: PUT
Description: Update an existing table with a schema
Example:

curl -vi -X PUT \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' \
  "http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP verb: DELETE
Description: Delete the table
Example:

curl -vi -X DELETE \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/schema"

Endpoint: /table/regions
HTTP verb: GET
Description: List all regions of the table
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/regions"

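17.4. Get

Endpoint: /table/row/column:qualifier/timestamp
HTTP verb: GET
Description: Get the value of the given column of the given column family in the given table at the given timestamp. The returned value is base64-encoded and must be base64-decoded before use.
Examples:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1"

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1/cf:a/1458586888395"

Endpoint: /table/row/column:qualifier
HTTP verb: GET
Description: Get the value of the given column of the given column family in the given table
Examples:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1/cf:a"

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1/cf:a/"

Endpoint: /table/row/column:qualifier/?v=number_of_versions
HTTP verb: GET
Description: Get the given number of versions of the given column of the given column family in the given table
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1/cf:a?v=2"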

17.5. Scan

Endpoint: /table/scanner/
HTTP verb: PUT
Description: Create a scanner
Example:

curl -vi -X PUT \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d '<Scanner batch="1"/>' \
  "http://example.com:8000/users/scanner/"

Endpoint: /table/scanner/
HTTP verb: PUT
Description: Create a scanner with a filter. The filter can be written to a text file in the following format:

<Scanner>
  <filter>
    {
      "type": "PrefixFilter",
      "value": "u123"
    }
  </filter>
</Scanner>

Example:

curl -vi -X PUT \
  -H "Accept: text/xml" \
  -H "Content-Type:text/xml" \
  -d @filter.txt \
  "http://example.com:8000/users/scanner/"

Endpoint: /table/scanner/scanner-id
HTTP verb: GET
Description: Fetch the next batch of data. If no data is left, the HTTP status code is 204.
Example:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/scanner/145869072824375522207"

Endpoint: /table/scanner/scanner-id
HTTP verb: DELETE
Description: Delete the given scanner and release its resources
Example:

curl -vi -X DELETE \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/scanner/145869072824375522207"

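The three scanner calls form a small life cycle: create, read until 204, delete. The Python sketch below strings them together with the standard urllib module. It is an illustration under stated assumptions: BASE is the placeholder host from the curl examples, all function names are made up, and the scanner URL is assumed to come back in the Location header of the create response, which should be verified against the REST server in use.

```python
import urllib.request

BASE = "http://example.com:8000"  # placeholder host taken from the curl examples


def create_scanner_request(table, batch=1):
    """Build the PUT request that creates a scanner."""
    body = '<Scanner batch="{}"/>'.format(batch).encode("utf-8")
    return urllib.request.Request(
        "{}/{}/scanner/".format(BASE, table),
        data=body,
        method="PUT",
        headers={"Accept": "text/xml", "Content-Type": "text/xml"},
    )


def next_batch_request(scanner_url):
    """Build the GET request that fetches the next batch."""
    return urllib.request.Request(scanner_url, method="GET", headers={"Accept": "text/xml"})


def delete_scanner_request(scanner_url):
    """Build the DELETE request that releases the scanner."""
    return urllib.request.Request(scanner_url, method="DELETE", headers={"Accept": "text/xml"})


def scan_all(table, batch=1):
    """Create a scanner, read batches until HTTP 204, then delete the scanner."""
    with urllib.request.urlopen(create_scanner_request(table, batch)) as resp:
        scanner_url = resp.headers["Location"]  # assumption: scanner URL is returned here
    chunks = []
    try:
        while True:
            with urllib.request.urlopen(next_batch_request(scanner_url)) as resp:
                if resp.status == 204:  # no data left
                    break
                chunks.append(resp.read())
    finally:
        # always release the scanner, even if reading failed
        urllib.request.urlopen(delete_scanner_request(scanner_url)).close()
    return chunks
```

The returned chunks are raw XML documents whose keys and values are still base64-encoded.
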
17.6. Put

Endpoint: /table/row_key
HTTP verb: PUT
Description: Write one row to the given table. Note that the row key, the column family and column name, and the column value must all be base64-encoded.
Examples:

curl -vi -X PUT \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93NQo="><Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell></Row></CellSet>' \
  "http://example.com:8000/users/fakerow"

curl -vi -X PUT \
  -H "Accept: text/json" \
  -H "Content-Type: text/json" \
  -d '{"Row":[{"key":"cm93NQo=", "Cell": [{"column":"Y2Y6ZQo=", "$":"dmFsdWU1Cg=="}]}]}' \
  "example.com:8000/users/fakerow"

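Building the request body by hand means base64-encoding every field. The Python sketch below produces a JSON body with the same shape as the second curl example; b64 and build_put_body are made-up helper names. One detail worth noting: the sample values in the curl examples decode to text with a trailing newline (for example "row5" followed by a newline), whereas this sketch encodes the strings exactly as given.

```python
import base64
import json


def b64(text):
    """Base64-encode a text value (UTF-8 is an assumption)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_put_body(row_key, cells):
    """Build the JSON body for a PUT; cells maps "family:qualifier" to a value."""
    return json.dumps({
        "Row": [{
            "key": b64(row_key),
            "Cell": [
                {"column": b64(column), "$": b64(value)}
                for column, value in cells.items()
            ],
        }]
    })
```

For example, build_put_body("row5", {"cf:e": "value5"}) yields a body that can be passed to curl with -d, or sent with any HTTP client, using the same headers as above.
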
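18. Related documents

"HBase-1.2.1 Distributed Installation Guide"

"Hive 0.12.0 Installation Guide"

"ZooKeeper-3.4.6 Distributed Installation Guide"

"Hadoop 2.3.0 Source Code Reverse Engineering"

"Compiling Hadoop-2.7.2 on Linux"

"Accumulo-1.5.1 Installation Guide"

"Drill 1.0.0 Installation Guide"

"Shark 0.9.1 Installation Guide"

For more, follow the technical blog: http://aquester.cublog.cn.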

Appendix 1: Metadata

Directory layout of hbase in ZooKeeper:

[zk: localhost:2181(CONNECTED) 24] ls /hbase

[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, region-in-transition, online-snapshot, acl, master, running, recovering-regions, draining, namespace, hbaseid, table]

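Starting with version 0.96, root-region-server was replaced by meta-region-server. The old root was removed, and the new meta, like the old root, has exactly one region and never more than one.

Version 0.96 also introduced namespaces and removed the -ROOT- table. The former .META. table was replaced by the hbase:meta table, where hbase is the namespace name. A namespace is comparable to a database name in MySQL and is used to group tables logically.

Client DML operations on hbase do not need to contact the master, but DDL operations depend on it. The list command in the hbase shell depends on the master as well.

The web UI of the active hbase master shows three system tables: hbase:acl, hbase:meta and hbase:namespace. Note that the metadata of hbase:acl and hbase:namespace is also stored in hbase:meta, which can be observed by running scan 'hbase:meta' in the hbase shell.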

hbase(main):015:0* scan 'hbase:meta',{LIMIT=>10}

hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:regioninfo, timestamp=1460426830411, value={ENCODED => 0bbdf170c309223c0ce830facdff9edd, NAME => 'hbase:acl,,1460426731436.0bbdf

facdff9edd. 170c309223c0ce830facdff9edd.', STARTKEY => '', ENDKEY => ''}

hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:seqnumDuringOpen, timestamp=1461653766642, value=\x00\x00\x00\x00\x00\x00\x002

facdff9edd.

hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:server, timestamp=1461653766642, value=hadoop-034:16020

facdff9edd.

hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:serverstartcode, timestamp=1461653766642, value=1461653610096

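The first column is the region name. serverstartcode is the time at which the region server loaded the region; server is the address (host and port) of the region server; regioninfo has the following structure:

1) ENCODED: the MD5 value of the region name

2) NAME: the region name

3) STARTKEY: empty for the first region of a table

4) ENDKEY: if it is empty as well, the table has only one region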

Appendix 2: Installing Phoenix

Installing Phoenix is simple. The official site has instructions (http://phoenix.incubator.apache.org/download.html), and binary packages can be downloaded from http://www.apache.org/dyn/closer.cgi/incubator/phoenix/. This guide uses phoenix-4.7.0-incubating.tar.gz. Note the compatibility with HBase versions:

Phoenix version    HBase version
Phoenix 2.x        HBase 0.94.x
Phoenix 3.x        HBase 0.94.x
Phoenix 4.x        HBase 0.98.1+

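Installation steps:

1) Upload phoenix-4.7.0-incubating.tar.gz to the Phoenix client machine; the installation directory is assumed to be /data/hadoop

2) Unpack phoenix-4.7.0-incubating.tar.gz, which creates the phoenix-4.7.0-incubating directory

3) Create a symbolic link: ln -s phoenix-4.7.0-incubating phoenix

4) Add /data/hadoop/phoenix/hadoop-2/phoenix-4.7.0-incubating-client.jar to the CLASSPATH

5) Copy phoenix-core-4.7.0-incubating.jar from the phoenix/common directory into the CLASSPATH of every HBase region server, for example HBase's lib directory

6) Restart the HBase cluster

Running Phoenix is also simple. The command format is:

sqlline.py zookeeper file.sql

Example: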

hadoop@VM-40-171-sles10-64:~/phoenix/bin> ./sqlline.py 10.12.154.78

Setting property: [isolation, TRANSACTION_READ_COMMITTED]

issuing: !connect jdbc:phoenix:10.12.154.78 none none org.apache.phoenix.jdbc.PhoenixDriver

Connecting to jdbc:phoenix:10.12.154.78

Connected to: Phoenix (version 4.0)

Driver: org.apache.phoenix.jdbc.PhoenixDriver (version 4.0)

Autocommit status: true

Transaction isolation: TRANSACTION_READ_COMMITTED

Building list of tables and columns for tab-completion (set fastconnect to true to skip)...

53/53 (100%) Done

Done

sqlline version 1.1.2

0: jdbc:phoenix:10.12.154.78> select * from test;

Error: ERROR 1012 (42M03): Table undefined. tableName=TEST (state=42M03,code=1012)

0: jdbc:phoenix:10.12.154.78> create table test ( a int, b string);

Error: ERROR 601 (42P00): Syntax error. Unsupported sql type: INT (state=42P00,code=601)

0: jdbc:phoenix:10.12.154.78> create table test (a integer, b integer);

Error: ERROR 509 (42888): The table does not have a primary key. tableName=TEST (state=42888,code=509)

0: jdbc:phoenix:10.12.154.78> create table test (a integer primary key, b integer) ;

No rows affected (1.424 seconds)

0: jdbc:phoenix:10.12.154.78> UPSERT INTO TEST VALUES (1, 1);

1 row affected (0.099 seconds)

0: jdbc:phoenix:10.12.154.78> UPSERT INTO TEST VALUES (2, 12);

1 row affected (0.02 seconds)

0: jdbc:phoenix:10.12.154.78> select * from test;

+------------+------------+

| A | B |

+------------+------------+

| 1 | 1 |

| 2 | 12 |

+------------+------------+

2 rows selected (0.116 seconds)

0: jdbc:phoenix:10.12.154.78>

For the syntax, see http://phoenix.incubator.apache.org/language/index.html. For the data types, see http://phoenix.incubator.apache.org/language/datatypes.html.

0 0