A bug triggered by ORC storage when using hive-0.12.0 with hadoop-2.2.0

Environment:
Hadoop version: hadoop-2.2.0 (downloaded from the official site and compiled as a 64-bit build)
Hive version: hive-0.12.0 (downloaded from the official site and unpacked)
The cluster was healthy, and both plain Hive queries and MapReduce jobs ran successfully.

To try out the new Hive release's ORC storage format, I went through the following steps:

create external table text_test (id string,text string)  row format delimited fields terminated by '\t' STORED AS textfile LOCATION '/user/hive/warehouse/text_test';

create external table orc_test (id string,text string) row format delimited fields terminated by '\t' STORED AS orc LOCATION '/user/hive/warehouse/orc_test';
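
(The original session does not show how text_test was populated; presumably a tab-separated file was loaded with something like the following, where /tmp/test.txt is a hypothetical local file:)

-- hypothetical load step; /tmp/test.txt holds tab-separated id/text rows
LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE text_test;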

hive> desc text_test;
OK
id                      string                  None                
text                    string                  None    

hive> desc orc_test;
OK
id                      string                  from deserializer   
text                    string                  from deserializer 
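
("from deserializer" in the comment column is normal for ORC tables. As an extra check, not part of the original session, desc formatted shows which SerDe and input/output formats the table is bound to; for orc_test these should be org.apache.hadoop.hive.ql.io.orc.OrcSerde and the matching OrcInputFormat/OrcOutputFormat classes:)

hive> desc formatted orc_test;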

hive> select * from text_test;
OK
1       myew
2       ccsd
3       33

hive> insert overwrite table orc_test select * from text_test;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1394433490694_0016, Tracking URL = http://zw-34-69:8088/proxy/application_1394433490694_0016/
Kill Command = /opt/hadoop/hadoop/bin/hadoop job  -kill job_1394433490694_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-13 17:00:49,899 Stage-1 map = 0%,  reduce = 0%
2014-03-13 17:01:10,097 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1394433490694_0016 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1394433490694_0016_m_000000 (and more) from job job_1394433490694_0016

Task with the most failures(4):
-----
Task ID:
  task_1394433490694_0016_m_000000

URL:
  http://zw-34-69:8088/taskdetails.jsp?jobid=job_1394433490694_0016&tipid=task_1394433490694_0016_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:240)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
        at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.getSerializedSize(OrcProto.java:3046)
        at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
        at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry.getSerializedSize(OrcProto.java:4129)
        at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
        at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndex.getSerializedSize(OrcProto.java:4641)
        at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:75)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:548)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1328)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1699)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1868)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:95)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:181)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:866)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
        ... 8 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


Then came a long trek through Google, Baidu, and Bing, which finally turned up a fix: http://web.archiveorange.com/archive/v/S2z2uV6yqpmtC3rgpsrs
Many thanks to the two earlier investigators for their diligent work.

To summarize the root cause:
hadoop-2.2.0 was compiled against protobuf-2.5.0, while hive-0.12.0 was compiled against protobuf-2.4.1, and the two conflict at runtime.
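
(A quick way to confirm the mismatch yourself, assuming a standard installation layout with HADOOP_HOME and HIVE_HOME set, is to look at the protobuf jars each project ships:)

ls $HADOOP_HOME/share/hadoop/common/lib/ | grep protobuf    # expect protobuf-java-2.5.0.jar
ls $HIVE_HOME/lib/ | grep protobuf                          # expect protobuf-java-2.4.1.jar
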
The fix:
Rebuild hive-0.12.0 against protobuf-2.5.0, as follows.

1. Install protobuf. Download protobuf-2.5.0 from https://code.google.com/p/protobuf/downloads/detail?name=protobuf-2.5.0.tar.gz, then build and install it:
       tar -xzvf protobuf-2.5.0.tar.gz
       cd protobuf-2.5.0
       ./configure
       make
       make check
       make install   (run as root)
2. Install ant. Download it from http://ant.apache.org/bindownload.cgi; I used version 1.9.2 (apache-ant-1.9.2-bin.tar.gz).
   (Version 1.9.3 fails to build: http://www.mailinglistarchive.com/html/dev@ant.apache.org/2014-01/msg00009.html)
       tar -xzvf apache-ant-1.9.2-bin.tar.gz
   Add ant to your PATH (vi ~/.bash_profile):
       export ANT_HOME=/opt/hadoop/apache-ant-1.9.2
       PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ANT_HOME/bin:$PATH
       export PATH
   Save and exit, then run  . ~/.bash_profile  so the changes take effect.
3. Change the protobuf version the ant build uses: in release-0.12.0/ivy/libraries.properties, change protobuf.version=2.4.1 to protobuf.version=2.5.0.
4. Regenerate the protobuf code inside the hive source tree:
       cd release-0.12.0
       ant protobuf
5. Build hive: in the release-0.12.0 directory, run  ant clean package  and settle in for a long wait (network access is required).
6. The finished build ends up in release-0.12.0/build/dist/.

The whole sequence is collected into a single script after this list.
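
(A minimal end-to-end sketch of the rebuild, assuming everything is unpacked under /opt/hadoop and that the ant PATH change above is not yet in effect; adjust paths to your machine:)

# 1. build and install protobuf 2.5.0 (make install needs root)
tar -xzvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure && make && make check
make install
protoc --version    # should print: libprotoc 2.5.0
cd ..

# 2. unpack ant 1.9.2 and put it on the PATH for this shell
tar -xzvf apache-ant-1.9.2-bin.tar.gz
export ANT_HOME=/opt/hadoop/apache-ant-1.9.2
export PATH=$ANT_HOME/bin:$PATH

# 3. switch the hive build to protobuf 2.5.0
cd release-0.12.0
sed -i 's/protobuf.version=2.4.1/protobuf.version=2.5.0/' ivy/libraries.properties

# 4. regenerate the protobuf sources, then rebuild hive (needs network)
ant protobuf
ant clean package

# the rebuilt distribution lands in build/dist/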

With the rebuilt hive deployed, running insert overwrite table orc_test select * from text_test; again now succeeds.

Hive 0.12 also ships a dump utility for inspecting ORC files:

hive --orcfiledump <hdfs-location-of-orc-file>

hive> select * from orc_test;
OK
1       myew
2       ccsd
3       33

hive --orcfiledump /user/hive/warehouse/orc_test/000000_0

Rows: 3
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:string,_col1:string>

Statistics:
  Column 0: count: 3
  Column 1: count: 3 min: 1 max: 3
  Column 2: count: 3 min: 33 max: myew

Stripes:
  Stripe: offset: 3 data: 31 rows: 3 tail: 50 index: 59
    Stream: column 0 section ROW_INDEX start: 3 length 9
    Stream: column 1 section ROW_INDEX start: 12 length 23
    Stream: column 2 section ROW_INDEX start: 35 length 27
    Stream: column 1 section DATA start: 62 length 6
    Stream: column 1 section LENGTH start: 68 length 5
    Stream: column 2 section DATA start: 73 length 13
    Stream: column 2 section LENGTH start: 86 length 7
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2
