cassandra节点异常数据处理——HintedHandOff
来源:互联网 发布:淘宝天猫lee鼎汉店真假 编辑:程序博客网 时间:2024/06/11 11:02
cassandra的HintedHandOff的功能主要是当节点掉线,或者各个副本节点中网络闪断的时候,此时数据的保存,等节点间恢复正常以后,副本数据就可以一致了。
1、Hints表的表结构
当副本节点传输数据失败以后,数据就会被保存在hints系统表中。
hints系统表的表结构为,compositeCF,rowkey为target_id,即节点的token值的uuid;
compositeName分别为hint_id 由产生hint数据的时间生成的uuid;和message_version为产生hint数据的时候数据的版本号;
普通字段有:mutation主要存在需要传输的数据RowMutation。
这里每一条hint数据就相当于一个Column
*
* CREATE TABLE hints (
* target_id uuid,
* hint_id timeuuid,
* message_version int,
* mutation blob,
* PRIMARY KEY (target_id, hint_id, message_version)
* ) WITH COMPACT STORAGE;
当数据传输失败以后,就会直接调用RowMutation中的hintFor方法,将数据转化为需要插入hints表中的数据格式。
/** * Returns mutation representing a Hints to be sent to <code>address</code> * as soon as it becomes available. See HintedHandoffManager for more details. */ public static RowMutation hintFor(RowMutation mutation, UUID targetId) throws IOException { RowMutation rm = new RowMutation(Table.SYSTEM_KS, UUIDType.instance.decompose(targetId)); UUID hintId = UUIDGen.getTimeUUID(); // determine the TTL for the RowMutation // this is set at the smallest GCGraceSeconds for any of the CFs in the RM // this ensures that deletes aren't "undone" by delivery of an old hint int ttl = Integer.MAX_VALUE; for (ColumnFamily cf : mutation.getColumnFamilies()) ttl = Math.min(ttl, cf.metadata().getGcGraceSeconds()); // serialize the hint with id and version as a composite column name QueryPath path = new QueryPath(SystemTable.HINTS_CF, null, HintedHandOffManager.comparator.decompose(hintId, MessagingService.current_version)); rm.add(path, ByteBuffer.wrap(FBUtilities.serialize(mutation, serializer, MessagingService.current_version)), System.currentTimeMillis(), ttl); return rm; }
2、Hint数据的产生
cassandra这种nosql型数据库,为了保证数据的安全性,需要将一份数据存储在几个副本中,防止一个节点掉线或者节点数据有损坏的时候,数据还可以通过其他的节点访问和恢复,这样又不会影响当前的业务,最后也可以恢复数据。
副本间数据进行传输的时候,当发送失败,以后判断是否记录hint数据的条件有两个:
(1)系统是否开启hintedHandOff功能,通过cassandra.yaml文件中的hinted_handoff_enabled配置项决定,默认开启
(2)Gossiper节点监控工具判断节点掉线时间是否超过了记录hint的最大时间,记录hint的最大时间为cassandra.yaml文件中的max_hint_window_in_ms决定,文件中的默认值是3hour,但是如果没有该配置项则是1hour
public static boolean shouldHint(InetAddress ep) { if (!DatabaseDescriptor.hintedHandoffEnabled()) { HintedHandOffManager.instance.metrics.incrPastWindow(ep); return false; } boolean hintWindowExpired = Gossiper.instance.getEndpointDowntime(ep) > DatabaseDescriptor.getMaxHintWindow(); if (hintWindowExpired) { HintedHandOffManager.instance.metrics.incrPastWindow(ep); logger.trace("not hinting {} which has been down {}ms", ep, Gossiper.instance.getEndpointDowntime(ep)); } return !hintWindowExpired; }
当决定写hint数据的时候,就会将需要写的hint数据提交给HintRunnable线程,完成记录hint数据的责任。此时系统会记录正在运行的hint任务个数,以及异常节点产生的hint任务。
public static Future<Void> submitHint(final RowMutation mutation, final InetAddress target, final AbstractWriteResponseHandler responseHandler, final ConsistencyLevel consistencyLevel) { // local write that time out should be handled by LocalMutationRunnable assert !target.equals(FBUtilities.getBroadcastAddress()) : target; HintRunnable runnable = new HintRunnable(target) { public void runMayThrow() throws IOException { logger.debug("Adding hint for {}", target); writeHintForMutation(mutation, target); // Notify the handler only for CL == ANY if (responseHandler != null && consistencyLevel == ConsistencyLevel.ANY) responseHandler.response(null); } }; return submitHint(runnable); } private static Future<Void> submitHint(HintRunnable runnable) { totalHintsInProgress.incrementAndGet(); hintsInProgress.get(runnable.target).incrementAndGet(); return (Future<Void>) StageManager.getStage(Stage.MUTATION).submit(runnable); }
系统记录的正在运行的hint任务个数,以及异常节点产生的hint任务是为了防止正常节点因为处理hint数据过多导致系统内存溢出,也需要保护在线的节点,所以如果正在运行的hint任务不能超过1024 * FBUtilities.getAvailableProcessors();
// avoid OOMing due to excess hints. we need to do this check even for "live" nodes, since we can // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead. // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to // a small number of nodes causing problems, so we should avoid shutting down writes completely to // healthy nodes. Any node with no hintsInProgress is considered healthy. if (totalHintsInProgress.get() > maxHintsInProgress && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination))) { throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get()); }
3、Hint的发送机制
hint发送触发的原因有
(1)当节点启动加入集群的时候,就会启动一个后台进程,每个10min就会进行一次hint数据的处理。
(2)节点状态发送改变的时候,变为live状态以后,也会触发发送该节点hint数据的请求。
首先查询出hints表里面的数据,(隐患:这里会查询出所有的数据,如果hint表中数据过多,则会OutMemery),将每个节点的数据顺序处理。
然后只需要用其rowkey换成为节点ip,调用HintedHandoff线程池中的分发线程进行数据分发,HintedHandoff线程池中的线程个数由cassandra.yaml文件中的max_hints_delivery_threads: 2决定。
Runnable runnable = new Runnable() { public void run() { scheduleAllDeliveries(); metrics.log(); } }; StorageService.optionalTasks.scheduleWithFixedDelay(runnable, 10, 10, TimeUnit.MINUTES);单个节点的数据发送,如果判断hintedHandOff暂停了以后,则也不会进行发送,这个暂停可以由nodetool中的PAUSEHANDOFF命令暂停,由RESUMEHANDOFF 命令恢复。
在得知节点的schema与本节点的schema版本一致,并且是在线节点以后则开始hint数据的发送
(1)查询每128个条记录作为一批需要发送的hint数据,
(2)将每条数据的Mutation列,发序列化为对应的Rowmutation,构造成消息,
(3)发送成功以后,则写入一个deleteColumn,删除对应的记录
(4)等制定节点的所有数据发送完成以后,就会强制flush hints系统表,将hint的所有SStable进行一次全量SSTable的compact。
private void deliverHintsToEndpointInternal(InetAddress endpoint) throws IOException, DigestMismatchException, InvalidRequestException, InterruptedException { ColumnFamilyStore hintStore = Table.open(Table.SYSTEM_KS).getColumnFamilyStore(SystemTable.HINTS_CF); if (hintStore.isEmpty()) return; // nothing to do, don't confuse users by logging a no-op handoff // check if hints delivery has been paused if (hintedHandOffPaused) { logger.debug("Hints delivery process is paused, aborting"); return; } logger.debug("Checking remote({}) schema before delivering hints", endpoint); try { waitForSchemaAgreement(endpoint); } catch (TimeoutException e) { return; } if (!FailureDetector.instance.isAlive(endpoint)) { logger.debug("Endpoint {} died before hint delivery, aborting", endpoint); return; } // 1. Get the key of the endpoint we need to handoff // 2. For each column, deserialize the mutation and send it to the endpoint // 3. Delete the subcolumn if the write was successful // 4. Force a flush // 5. Do major compaction to clean up all deletes etc. // find the hints for the node using its token. UUID hostId = Gossiper.instance.getHostId(endpoint); logger.info("Started hinted handoff for host: {} with IP: {}", hostId, endpoint); final ByteBuffer hostIdBytes = ByteBuffer.wrap(UUIDGen.decompose(hostId)); DecoratedKey epkey = StorageService.getPartitioner().decorateKey(hostIdBytes); final AtomicInteger rowsReplayed = new AtomicInteger(0); ByteBuffer startColumn = ByteBufferUtil.EMPTY_BYTE_BUFFER; int pageSize = PAGE_SIZE; // read less columns (mutations) per page if they are very large if (hintStore.getMeanColumns() > 0) { int averageColumnSize = (int) (hintStore.getMeanRowSize() / hintStore.getMeanColumns()); pageSize = Math.min(PAGE_SIZE, DatabaseDescriptor.getInMemoryCompactionLimit() / averageColumnSize); pageSize = Math.max(2, pageSize); // page size of 1 does not allow actual paging b/c of >= behavior on startColumn logger.debug("average hinted-row column size is {}; using pageSize of {}", averageColumnSize, pageSize); } // rate limit is in bytes per second. Uses Double.MAX_VALUE if disabled (set to 0 in cassandra.yaml). int throttleInKB = DatabaseDescriptor.getHintedHandoffThrottleInKB(); RateLimiter rateLimiter = RateLimiter.create(throttleInKB == 0 ? Double.MAX_VALUE : throttleInKB * 1024); while (true) { // check if hints delivery has been paused during the process if (hintedHandOffPaused) { logger.debug("Hints delivery process is paused, aborting"); break; } QueryFilter filter = QueryFilter.getSliceFilter(epkey, new QueryPath(SystemTable.HINTS_CF), startColumn, ByteBufferUtil.EMPTY_BYTE_BUFFER, false, pageSize); ColumnFamily hintsPage = ColumnFamilyStore.removeDeleted(hintStore.getColumnFamily(filter), (int)(System.currentTimeMillis() / 1000)); if (pagingFinished(hintsPage, startColumn)) { if (ByteBufferUtil.EMPTY_BYTE_BUFFER.equals(startColumn)) { // we've started from the beginning and could not find anything (only maybe tombstones) break; } else { // restart query from the first column until we read an empty row; // that will tell us everything was delivered successfully with no timeouts startColumn = ByteBufferUtil.EMPTY_BYTE_BUFFER; continue; } } for (final IColumn hint : hintsPage.getSortedColumns()) { // Skip tombstones: // if we iterate quickly enough, it's possible that we could request a new page in the same millisecond // in which the local deletion timestamp was generated on the last column in the old page, in which // case the hint will have no columns (since it's deleted) but will still be included in the resultset // since (even with gcgs=0) it's still a "relevant" tombstone. if (!hint.isLive()) continue; if (hintedHandOffPaused) { logger.debug("Hints delivery process is paused, aborting"); break; } startColumn = hint.name(); ByteBuffer[] components = comparator.split(hint.name()); int version = Int32Type.instance.compose(components[1]); DataInputStream in = new DataInputStream(ByteBufferUtil.inputStream(hint.value())); RowMutation rm; try { rm = RowMutation.serializer.deserialize(in, version); } catch (UnknownColumnFamilyException e) { logger.debug("Skipping delivery of hint for deleted columnfamily", e); deleteHint(hostIdBytes, hint.name(), hint.maxTimestamp()); continue; } MessageOut<RowMutation> message = rm.createMessage(); rateLimiter.acquire(message.serializedSize(MessagingService.current_version)); WrappedRunnable callback = new WrappedRunnable() { public void runMayThrow() throws IOException { rowsReplayed.incrementAndGet(); deleteHint(hostIdBytes, hint.name(), hint.maxTimestamp()); } }; IAsyncCallback responseHandler = new WriteResponseHandler(endpoint, WriteType.UNLOGGED_BATCH, callback); MessagingService.instance().sendRR(message, endpoint, responseHandler); } // check if node is still alive and we should continue delivery process if (!FailureDetector.instance.isAlive(endpoint)) { logger.debug("Endpoint {} died during hint delivery, aborting", endpoint); return; } } try { compact().get(); } catch (Exception e) { throw new RuntimeException(e); } logger.info(String.format("Finished hinted handoff of %s rows to endpoint %s", rowsReplayed, endpoint)); if (hintedHandOffPaused) { logger.info("Hints delivery process is paused, not delivering further hints"); } }
4、Hint数据的清理
hint数据发送完成以后,会产生对应的DeleteColumn,然后进行Flush,和Compact。
- cassandra节点异常数据处理——HintedHandOff
- Cassandra HintedHandoff
- Cassandra学习笔记 --- 关于Cassandra的节点通讯机制——Gossip
- 关于Cassandra的节点通讯机制——Gossip
- cassandra节点异常:java.lang.OutOfMemoryError: Direct buffer memory
- cassandra异常
- Cassandra节点管理
- cassandra虚拟节点
- Cassandra单节点安装
- 异常数据处理
- Cassandra配置多节点集群
- Cassandra 1.2.0 win7下安装双节点cassandra集群
- 动力节点——异常(二十)
- Cassandra-1.2.0多节点集群搭建
- Cassandra 1.2.0 集群 删除节点
- cassandra 1.2节点数据负载均衡
- Cassandra集群各个节点之间的通讯
- windows 下Cassandra 单节点安装配置
- ACM之遗失的袜子
- 结构体单步调试和撞错体验
- 2013ACM多校联合(1)_CSUST
- linux动态链接库
- ACM之车位选择
- cassandra节点异常数据处理——HintedHandOff
- 框架和工具类
- -bash: scp: command not found问题解决
- Delphi下程序内存泄露报告
- 怎样在Win7系统建立并开启Wifi热点
- 题目105:单词替换
- drupal 7 menu_alter
- poj 3356 AGTC(lcs 变形题)
- Delphi 自定义窗体类名