MongoDB sharding: unsynchronized clocks on two servers cause "caught exception while doing balance: error checking clock skew of cluster"


Tue Nov 29 09:16:11 [Balancer] SyncClusterConnection connecting to [192.168.150.116:27012]
Tue Nov 29 09:16:11 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27015]
Tue Nov 29 09:16:11 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27016]
Tue Nov 29 09:16:11 [Balancer] ~ScopedDbConnection: _conn != null
Tue Nov 29 09:16:11 [Balancer] caught exception while doing balance: error checking clock skew of cluster 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 :: caused by :: 13650 clock skew of the cluster 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 is too far out of bounds to allow distributed locking.
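
When a mongos log is long, it can help to pull out only the clock-skew failures. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt above; the function name `find_clock_skew_errors` and the regular expression are illustrative and not part of MongoDB.

```python
import re

# Matches the balancer error shown in the log excerpt above.
# The pattern is an assumption based on that excerpt, not an official format.
SKEW_ERROR = re.compile(
    r"error checking clock skew of cluster (?P<hosts>\S+) "
    r":: caused by :: (?P<code>\d+) "
)


def find_clock_skew_errors(lines):
    """Return (timestamp, hosts, code) for every clock-skew balancer error."""
    hits = []
    for line in lines:
        match = SKEW_ERROR.search(line)
        if match is None:
            continue
        # The log prefixes each line with e.g. "Tue Nov 29 09:16:11".
        timestamp = " ".join(line.split()[:4])
        hosts = match.group("hosts").split(",")
        hits.append((timestamp, hosts, int(match.group("code"))))
    return hits
```

Passing an open log file (or any iterable of lines) to `find_clock_skew_errors` yields one tuple per failed balancing round, which makes it easy to see when the errors started and when they stopped.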

mongos log on 192.168.150.116
Tue Nov 29 09:43:33 [Balancer] SyncClusterConnection connecting to [192.168.150.116:27012]
Tue Nov 29 09:43:33 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27015]
Tue Nov 29 09:43:33 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27016]
Tue Nov 29 09:43:33 [Balancer] ~ScopedDbConnection: _conn != null
Tue Nov 29 09:43:33 [Balancer] caught exception while doing balance: error checking clock skew of cluster 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 :: caused by :: 13650 clock skew of the cluster 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 is too far out of bounds to allow distributed locking.
Tue Nov 29 09:44:03 [Balancer] SyncClusterConnection connecting to [192.168.150.116:27012]
Tue Nov 29 09:44:03 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27015]
Tue Nov 29 09:44:03 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27016]
Tue Nov 29 09:44:03 [LockPinger] creating distributed lock ping thread for 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 and process WebServer:27013:1322457524:1804289383 (sleeping for 30000ms)
Tue Nov 29 09:44:03 [LockPinger] SyncClusterConnection connecting to [192.168.150.116:27012]
Tue Nov 29 09:44:03 [LockPinger] SyncClusterConnection connecting to [192.168.150.100:27015]
Tue Nov 29 09:44:03 [LockPinger] SyncClusterConnection connecting to [192.168.150.100:27016]
Tue Nov 29 09:44:04 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed438e30be277fd8006c95a
Tue Nov 29 09:44:04 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 09:44:14 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed438ee0be277fd8006c95b
Tue Nov 29 09:44:14 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 09:44:25 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed438f80be277fd8006c95c
Tue Nov 29 09:44:25 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 09:44:35 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed439030be277fd8006c95d
Tue Nov 29 09:44:35 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 09:44:45 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed4390d0be277fd8006c95e
Tue Nov 29 09:44:45 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 09:44:56 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed439170be277fd8006c95f

mongos log on 192.168.150.100
Tue Nov 29 09:44:11 [Balancer] SyncClusterConnection connecting to [192.168.150.116:27012]
Tue Nov 29 09:44:11 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27015]
Tue Nov 29 09:44:11 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27016]
Tue Nov 29 09:44:11 [Balancer] ~ScopedDbConnection: _conn != null
Tue Nov 29 09:44:11 [Balancer] caught exception while doing balance: error checking clock skew of cluster 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 :: caused by :: 13650 clock skew of the cluster 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 is too far out of bounds to allow distributed locking.
Tue Nov 29 09:44:18 [Balancer] SyncClusterConnection connecting to [192.168.150.116:27012]
Tue Nov 29 09:44:18 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27015]
Tue Nov 29 09:44:18 [Balancer] SyncClusterConnection connecting to [192.168.150.100:27016]
Tue Nov 29 09:44:18 [LockPinger] creating distributed lock ping thread for 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 and process localhost.localdomain:27010:1322457595:1804289383 (sleeping for 30000ms)
Tue Nov 29 09:44:18 [LockPinger] SyncClusterConnection connecting to [192.168.150.116:27012]
Tue Nov 29 09:44:18 [LockPinger] SyncClusterConnection connecting to [192.168.150.100:27015]
Tue Nov 29 09:44:18 [LockPinger] SyncClusterConnection connecting to [192.168.150.100:27016]
Tue Nov 29 09:44:18 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed438f215a9197441ec2236
Tue Nov 29 09:44:18 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 09:44:29 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed438fc15a9197441ec2237
Tue Nov 29 09:44:29 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 09:44:39 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed4390715a9197441ec2238
Tue Nov 29 09:44:39 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.


Before synchronizing the clocks
The clock on 116 was 38 seconds behind the clock on 100.

Synchronize the clocks of the two servers, with 100 taking its time from 116.
On 116, edit /etc/ntp.conf as root and add the following:
## add for rac
server 127.127.1.0
fudge  127.127.1.0 stratum 11
driftfile /var/lib/ntp/drift
broadcastdelay 0.008
Then on 100, edit /etc/ntp.conf as root and add the following:
## add for rac
server 192.168.150.116 prefer
driftfile /var/lib/ntp/drift
broadcastdelay 0.008
Then run the following command on both servers to start the NTP service:
/etc/init.d/ntpd start
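
To confirm that 100 can actually read the time served by 116, one option is to send a bare SNTP request over UDP and decode the reply. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes UDP port 123 on the NTP server is reachable and not blocked by a firewall.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request():
    """Build a 48-byte SNTP client request (LI=0, version=3, mode=3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp as Unix seconds (float)."""
    if len(packet) < 48:
        raise ValueError("NTP packet is shorter than 48 bytes")
    # The transmit timestamp occupies bytes 40-47: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_ntp_time(host, port=123, timeout=2.0):
    """Ask `host` for its time over UDP; raises socket.timeout if no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (host, port))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)
```

Run on 100, `query_ntp_time("192.168.150.116") - time.time()` (after `import time`) gives a rough estimate of how far 116 is ahead of the local clock. It ignores network delay, so treat it as a sanity check rather than a precise measurement.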

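After synchronizing the clocks
The clock on 116 was 15 seconds behind the clock on 100, and according to the logs the exception was no longer being reported.
Once synchronization had stabilized, the clock on 100 was 2 seconds behind the clock on 116.
What I still do not understand is how far apart the two servers' clocks can be in a sharded setup before the exception is raised. I could not find the threshold documented anywhere.
The safest approach seems to be keeping the clocks on both servers synchronized, so that the problem never occurs.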
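
The threshold question can at least be framed in code. The following is a sketch of how a clock-skew check could work in principle; it is not MongoDB's implementation, and `tolerance_seconds` is a hypothetical parameter. The observations in this post only suggest that the real limit lies somewhere between 15 seconds (no exception) and 38 seconds (exception).

```python
def estimate_offset(local_before, remote_time, local_after):
    """Offset of a remote clock relative to ours, in seconds.

    Assumes the network delay is symmetric, so the remote reading
    corresponds to the midpoint of the local send/receive times.
    """
    return remote_time - (local_before + local_after) / 2.0


def max_pairwise_skew(offsets):
    """Largest difference between any two clocks, given each clock's offset."""
    return max(offsets) - min(offsets)


def is_skew_acceptable(offsets, tolerance_seconds):
    """True if no two clocks differ by more than the given tolerance."""
    return max_pairwise_skew(offsets) <= tolerance_seconds


# Illustration with the numbers from this post, taking 100 as the reference
# clock. The tolerance of 30 is an arbitrary value between the two observed
# cases, not a documented MongoDB constant.
before_sync = [0.0, -38.0]  # 116 was 38 seconds behind 100
after_sync = [0.0, -15.0]   # 116 was 15 seconds behind 100
print(is_skew_acceptable(before_sync, tolerance_seconds=30))
print(is_skew_acceptable(after_sync, tolerance_seconds=30))
```

With this illustrative tolerance, the first check fails and the second passes, matching what the logs showed before and after NTP was started.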

Finally, check the logs
mongos log on 192.168.150.116
Tue Nov 29 10:08:59 [LockPinger] cluster 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 pinged successfully at Tue Nov 29 10:08:59 2011 by distributed lock pinger '192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016/WebServer:27013:1322457524:1804289383', sleeping for 30000ms
Tue Nov 29 10:09:09 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed43ec50be277fd8006c9ea
Tue Nov 29 10:09:09 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 10:09:19 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed43ecf0be277fd8006c9eb
Tue Nov 29 10:09:19 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 10:09:29 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed43ed90be277fd8006c9ec
Tue Nov 29 10:09:30 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 10:09:40 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed43ee40be277fd8006c9ed
Tue Nov 29 10:09:40 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 10:09:50 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed43eee0be277fd8006c9ee
Tue Nov 29 10:09:50 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 10:10:00 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed43ef80be277fd8006c9ef
Tue Nov 29 10:10:00 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.
Tue Nov 29 10:10:11 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' acquired, ts : 4ed43f020be277fd8006c9f0
Tue Nov 29 10:10:11 [Balancer] distributed lock 'balancer/WebServer:27013:1322457524:1804289383' unlocked.

mongos log on 192.168.150.100
Tue Nov 29 10:08:57 [LockPinger] cluster 192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016 pinged successfully at Tue Nov 29 10:08:57 2011 by distributed lock pinger '192.168.150.116:27012,192.168.150.100:27015,192.168.150.100:27016/localhost.localdomain:27010:1322457595:1804289383', sleeping for 30000ms
Tue Nov 29 10:09:07 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed43ec315a9197441ec22c6
Tue Nov 29 10:09:07 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 10:09:18 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed43ecd15a9197441ec22c7
Tue Nov 29 10:09:18 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 10:09:28 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed43ed815a9197441ec22c8
Tue Nov 29 10:09:28 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 10:09:38 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed43ee215a9197441ec22c9
Tue Nov 29 10:09:38 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 10:09:49 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed43eec15a9197441ec22ca
Tue Nov 29 10:09:49 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 10:09:59 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed43ef715a9197441ec22cb
Tue Nov 29 10:09:59 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 10:10:09 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed43f0115a9197441ec22cc
Tue Nov 29 10:10:09 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
Tue Nov 29 10:10:19 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' acquired, ts : 4ed43f0b15a9197441ec22cd
Tue Nov 29 10:10:20 [Balancer] distributed lock 'balancer/localhost.localdomain:27010:1322457595:1804289383' unlocked.
