[Apache Nutch Series] Nutch 2.0 Configuration and Installation Exception Roundup
Source: Internet  Editor: 程序博客网  Time: 2024/06/11 20:58
1. java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:108)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 11 more
The official documentation says:
N.B. It's possible to encounter the following exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration; this is caused by the fact that sometimes the hbase TEST jar is deployed in the lib dir. To resolve this just copy the lib over from your installed HBase dir into the build lib dir. (This issue is currently in progress).
Fix:
Copy all the jars under $HBASE_HOME/lib into $NUTCH_HOME/runtime/local/lib, then run the crawl again.
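After copying the jars, it can help to confirm that the classes named in the stack trace are actually visible before starting a crawl. The following is a minimal sketch, not part of Nutch: the class name `ClasspathCheck` and the helper `isOnClasspath` are made up for illustration, and the check only means something when it runs with the same classpath as `bin/nutch` (for example with `runtime/local/lib/*` on `-cp`).

```java
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Both names are taken from the stack trace above.
        String[] required = {
            "org.apache.hadoop.hbase.HBaseConfiguration",
            "org.apache.gora.hbase.store.HBaseStore"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

A "MISSING" line for HBaseConfiguration suggests the HBase jars are still not in the lib directory that Nutch loads from.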
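2. java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V
Official HBase JIRA bug number: HBASE-8273
This problem was introduced by HBASE-5357, which changed HColumnDescriptor.setMaxVersions to return HColumnDescriptor instead of void and thereby changed the method's signature. Code compiled against the old API still references setMaxVersions(I)V (one int parameter, void return), and a method with that exact signature no longer exists in the new library, so the call fails at runtime.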
Note from the Cloudera website:
Column family manipulations are binary-incompatible between CDH4.2 and CDH4.0/CDH4.1
Because of HBASE-5357, code compiled against CDH4.0 and CDH4.1 will fail with java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V, if used with the CDH4.2 libraries. The reason is that the setter methods in HColumnDescriptor were modified to return HColumnDescriptor instead of void, which changes their signature. Code that only does data manipulations, using the HTable class, will still work without recompilation.
Bug: HBASE-8273
Severity: Medium
Anticipated Resolution: None planned; use workaround.
Workaround: Code compiled against CDH4.0 and 4.1 that uses HColumnDescriptor must be recompiled against CDH4.2 in order to work with the CDH4.2 libraries. Code compiled against CDH4.0 and CDH4.1 running with those libraries does not have this problem.
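Cause: in my setup Hadoop and HBase both start without problems, which means the fault lies in the gora-hbase plugin.
Fix:
Recompile the code in the gora-hbase plugin that uses HColumnDescriptor.
The specific classes that need recompiling will be listed in a later update.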
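Why a change of return type alone breaks already-compiled code can be seen from how the JVM names methods: the descriptor recorded in the calling class includes the return type. The sketch below uses only the JDK (`java.lang.invoke.MethodType`); `FakeColumnDescriptor` is a hypothetical stand-in for illustration and is not the real HBase class.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    /** Hypothetical stand-in for the new return type; NOT org.apache.hadoop.hbase.HColumnDescriptor. */
    static final class FakeColumnDescriptor {}

    /** Builds the JVM method descriptor for the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Old API shape: void setMaxVersions(int) has descriptor (I)V
        System.out.println("old: setMaxVersions" + descriptor(void.class, int.class));
        // New API shape: the setter returns the descriptor object, so the
        // trailing V is replaced by an object type and the lookup no longer matches.
        System.out.println("new: setMaxVersions" + descriptor(FakeColumnDescriptor.class, int.class));
    }
}
```

Because the two descriptors differ, a caller compiled against the old jar cannot link against the new one until it is recompiled, which matches the workaround quoted above.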
3. java.lang.ClassNotFoundException: org.apache.gora.hbase.store.HBaseStore
hadoop@nutch1:/data/projects/apache-nutch-2.2.1/runtime/local$ bin/nutch crawl urls/seed.txt -dir crawl -depth 3 -topN 5
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.gora.hbase.store.HBaseStore
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:188)
    at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:89)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:73)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Fix:
Method 1: download gora-0.3, build the gora-hbase module in that directory to produce gora-hbase.jar, and put the jar into $NUTCH_HOME/runtime/local/lib.
Method 2: edit $NUTCH_HOME/ivy/ivy.xml
Uncomment <dependency org="org.apache.gora" name="gora-hbase" rev="0.3" conf="*->default" /> and rebuild Nutch. Ivy will then pull in the gora-hbase plugin for you.
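Whichever method is used, the result should be a gora-hbase jar sitting in the local runtime's lib directory. A small sketch for verifying that is shown here; the class name `LibJarScan` and the default path are assumptions for illustration, and the path should be adjusted to the actual $NUTCH_HOME/runtime/local/lib.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class LibJarScan {
    /** Lists the file names of jars in libDir whose names start with the given prefix. */
    static List<String> findJars(Path libDir, String prefix) throws IOException {
        List<String> matches = new ArrayList<>();
        // The glob is matched against file names only, e.g. "gora-hbase*.jar"
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, prefix + "*.jar")) {
            for (Path jar : jars) {
                matches.add(jar.getFileName().toString());
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: run from $NUTCH_HOME, or pass the lib directory as the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "runtime/local/lib");
        if (!Files.isDirectory(libDir)) {
            System.out.println("Not a directory: " + libDir);
            return;
        }
        List<String> found = findJars(libDir, "gora-hbase");
        System.out.println(found.isEmpty()
                ? "No gora-hbase jar in " + libDir
                : "Found: " + found);
    }
}
```

An empty result after a rebuild suggests the ivy.xml dependency is still commented out or the build did not complete.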
4. java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
    at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
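Line 100 of GeneratorReducer reads:
    batchId = new Utf8(conf.get(GeneratorJob.BATCH_ID));
This line reads GeneratorJob.BATCH_ID, i.e. the generate.batch.id property. That property was never set, so conf.get returns null and the Utf8 constructor throws the NullPointerException.
Fix: add the following three lines so that generate.batch.id is set: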
    // generate batchId
    int randomSeed = Math.abs(new Random().nextInt());
    String batchId = (curTime / 1000) + "-" + randomSeed;
    getConf().set(BATCH_ID, batchId);
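The same idea can be tried outside Nutch. The sketch below is a standalone illustration under stated assumptions: a plain `Map` stands in for the Hadoop `Configuration`, and the names `BatchIdSketch`, `newBatchId` and `ensureBatchId` are hypothetical. It also draws the random part with `nextInt(Integer.MAX_VALUE)`, because `Math.abs(Integer.MIN_VALUE)` is still negative in Java; that substitution is a choice made here, not something taken from the Nutch source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class BatchIdSketch {
    // Property name from the article: GeneratorJob.BATCH_ID is generate.batch.id
    static final String BATCH_ID = "generate.batch.id";

    /** Builds an id of the form <seconds>-<non-negative random int>. */
    static String newBatchId(long curTimeMillis, Random random) {
        int randomSeed = random.nextInt(Integer.MAX_VALUE); // always >= 0
        return (curTimeMillis / 1000) + "-" + randomSeed;
    }

    /** Sets the batch id only if it is missing, and returns the value in effect. */
    static String ensureBatchId(Map<String, String> conf, long curTimeMillis, Random random) {
        return conf.computeIfAbsent(BATCH_ID, key -> newBatchId(curTimeMillis, random));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // stand-in for Hadoop's Configuration
        String id = ensureBatchId(conf, System.currentTimeMillis(), new Random());
        System.out.println(BATCH_ID + " = " + id);
    }
}
```

Setting the value only when it is absent keeps any batch id supplied by the caller, while still guaranteeing that the reducer never reads null.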