Hive表的分区与分桶

打开APP

未登录

开通VIP，畅享免费电子书等14项超值服

开通VIP

首页

好书

留言交流

下载APP

联系客服

Hive表的分区与分桶

userphoto

苏氏IT馆 >《hadoop》

2017.11.08

关注

1.Hive分区表

Hive使用select语句进行查询的时候一般会扫描整个表内容，会消耗很多时间做没必要的工作。Hive可以在创建表的时候指定分区空间，这样在做查询的时候就可以很好的提高查询的效率。

创建分区表的语法：

[java] view plain copy

create table tablename(
name string
)partitioned by(key,type...);

示例

[java] view plain copy

drop table if exists employees;
create table if not exists employees(
name string,
salary float,
subordinate array<string>,
deductions map<string,float>,
address struct<street:string,city:string,num:int>
) partitioned by (date_time string,type string)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile
location '/hive/inner';

附：上述语句表示在建表时划分了date_time和type两个分区也叫双分区，一个分区的话就叫单分区，上述语句执行完以后我们查看表的结果会发现多了分区的两个字段。

[java] view plain copy

desc employees;

结果如下：

注：在文件系统中的表现为date_time为一个文件夹，type为date_time的子文件夹。

向分区表中插入数据(要指定分区)

[java] view plain copy

hive> load data local inpath '/usr/local/src/employee_data' into table employees partition(date_time='2015-01_24',type='userInfo');
Copying data from file:/usr/local/src/employee_data
Copying file: file:/usr/local/src/employee_data
Loading data to table default.employees partition (date_time=2015-01_24, type=userInfo)
OK
Time taken: 0.22 seconds
hive>

数据插入后在文件系统中显示为：

注：从上图中我们就可以发现type分区是作为子文件夹的形式存在的。

添加分区：

[java] view plain copy

alter table employees add if not exists partition(date_time='2088-08-18',type='liaozhongmin');

注：我们可以先添加分区，再向对应的分区中添加数据。

查看分区：

[java] view plain copy

show partitions employees;

附：employees在这里表示表名。

删除不想要的分区

[java] view plain copy

alter table employees drop if exists partition(date_time='2015-01_24',type='userInfo');

再次查看分区：

2.Hive桶表

对于每一个表或者是分区，Hive可以进一步组织成桶，也就是说桶是更为细粒度的数据范围划分。Hive是针对某一列进行分桶。Hive采用对列值哈希，然后除以桶的个数求余的方式决定该条记录存放在哪个桶中。分桶的好处是可以获得更高的查询处理效率。使取样更高效。

示例：

[java] view plain copy

create table bucketed_user(
id int,
name string
)
clustered by(id) sorted by(name) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile;

我们使用用户id来确定如何划分桶(Hive使用对值进行哈希并将结果除于桶的个数取余数的方式进行分桶)

另外一个要注意的问题是使用桶表的时候我们要开启桶表：

[java] view plain copy

set hive.enforce.bucketing = true;

现在我们将表employees中name和salary查询出来再插入到这张表中：

[java] view plain copy

insert overwrite table bucketed_user select salary,name from employees;

我们通过查询语句可以查看插进来的数据：

数据在文件中的表现形式如下，分成了四个桶：

当从桶表中进行查询时，hive会根据分桶的字段进行计算分析出数据存放的桶中，然后直接到对应的桶中去取数据，这样做就很好的提高了效率。

本站仅提供存储服务，所有内容均由用户发布，如发现有害或侵权内容，请点击举报。

打开APP，阅读全文并永久保存查看更多类似文章

猜你喜欢

类似文章

【热】打开小程序，算一算2024你的财运

Hive 基础之：分区、桶、Sort Merge Bucket Join

Hive之分区（Partitions）和桶（Buckets）

Hive DDL DML及SQL操作

Hive 之Table、External Table、Partition(五)

万字长文|一文了解基于Flink构建流批一体数仓的技术点

更多类似文章 >>

生活服务

热点新闻

分享收藏导长图关注下载文章

绑定账号成功
后续可登录账号畅享VIP特权！

如果VIP功能使用有故障，
可点击这里联系客服！

联系客服