Big Data Learning Path: Hive Data Types

Hive data types: 1. Basic data types

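Type        Description                               Example
TINYINT     1-byte signed integer
SMALLINT    2-byte signed integer
INT         4-byte signed integer
BIGINT      8-byte signed integer
FLOAT       4-byte single-precision floating point    1.0
DOUBLE      8-byte double-precision floating point    1.0
BOOLEAN     true/false                                TRUE
STRING      Character string                          'a', "a"
BINARY      Byte array
TIMESTAMP   Timestamp with nanosecond precision       132550245000, '2016-01-01 03:04:05.123456789'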

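A value of the newer TIMESTAMP type can be:

· 　　An integer: the number of seconds since the Unix epoch (midnight, January 1, 1970)

· 　　A floating-point number: the number of seconds since the Unix epoch, with nanosecond precision (9 digits after the decimal point)

· 　　A string: the time string format agreed on by JDBC, YYYY-MM-DD hh:mm:ss.fffffffff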
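To make the relationship between the string form and the epoch form concrete, here is a minimal Python sketch. It is not Hive code: the helper names `parse_timestamp` and `format_timestamp` are made up for illustration, and the sketch assumes the string is in UTC, whereas a real Hive session may apply its own time zone.

```python
from datetime import datetime, timezone


def parse_timestamp(text):
    """Split 'YYYY-MM-DD hh:mm:ss[.fffffffff]' into (epoch_seconds, nanoseconds).

    Assumption: the string is interpreted as UTC.
    """
    main, _, fraction = text.partition(".")
    moment = datetime.strptime(main, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad the fraction to 9 digits so "5" means 500000000 nanoseconds.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return int(moment.timestamp()), nanos


def format_timestamp(epoch_seconds, nanos=0):
    """Render whole seconds plus nanoseconds in the JDBC-style string format (UTC)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S") + f".{nanos:09d}"


seconds, nanos = parse_timestamp("2016-01-01 03:04:05.123456789")
print(seconds, nanos)        # 1451617445 123456789
print(format_timestamp(0))   # 1970-01-01 00:00:00.000000000
```

The nanoseconds are kept as a separate integer because Python's `datetime` only stores microseconds.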

The BINARY data type stores variable-length binary data.

2. Complex data types

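Type      Description                                                          Example
ARRAY     An ordered collection of fields; all fields must have the same      array(1, 2)
          type.
MAP       An unordered collection of key-value pairs. Keys must be of a       map('a', 1, 'b', 2)
          primitive type; values can be of any type. Within one map, all
          keys must share one type and all values must share one type.
STRUCT    A collection of named fields; the fields may have different         struct('a', 1, 1.0)
          types.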

3. Data type usage example

## Create an employee table using the default delimiters

CREATE TABLE employee(

name STRING,

salary FLOAT,

leader ARRAY<STRING>,

deductions MAP<STRING,FLOAT>,

address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>

)

;

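4. Column delimiters

Default delimiters in Hive text files

Delimiter      Description
\n             In a text file every line is one record, so the newline character separates records.
^A (Ctrl+A)    Separates fields (columns). Written as the octal code \001 in a CREATE TABLE statement.
^B (Ctrl+B)    Separates the elements of an ARRAY or STRUCT, or the key-value pairs of a MAP. Written as the octal code \002 in a CREATE TABLE statement.
^C (Ctrl+C)    Separates a key from its value in a MAP. Written as the octal code \003 in a CREATE TABLE statement.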

CREATE TABLE employee(

name STRING,

salary FLOAT,

subordinates ARRAY<STRING>,

deductions MAP<STRING,FLOAT>,

address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\001'

COLLECTION ITEMS TERMINATED BY '\002'

MAP KEYS TERMINATED BY '\003'

LINES TERMINATED BY '\n'

STORED AS TEXTFILE;

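· 　　The ROW FORMAT DELIMITED keywords introduce the delimiters that the table expects in its data files when data is loaded.

· 　　FIELDS TERMINATED BY '\001': \001 is the octal code for ^A. This clause tells Hive to use the ^A character as the column delimiter.

· 　　COLLECTION ITEMS TERMINATED BY '\002': \002 is the octal code for ^B. This clause tells Hive to use the ^B character as the delimiter between collection elements.

· 　　MAP KEYS TERMINATED BY '\003': \003 is the octal code for ^C. This clause tells Hive to use the ^C character as the delimiter between a map key and its value.

· 　　STORED AS TEXTFILE is a separate clause and does not depend on ROW FORMAT DELIMITED. LINES TERMINATED BY '\n', like the other delimiter clauses, is part of ROW FORMAT DELIMITED.

· 　　Hive currently supports only '\n' for LINES TERMINATED BY, so records can only be separated by newlines.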
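The way these three delimiters nest can be sketched outside Hive. The Python snippet below is only an illustration of how one text line of the employee table splits apart; the function `parse_employee` and the sample values are hypothetical, and empty collections and NULL markers are not handled.

```python
FIELD_SEP = "\x01"  # ^A, written '\001' in CREATE TABLE
ITEM_SEP = "\x02"   # ^B, written '\002'
KEY_SEP = "\x03"    # ^C, written '\003'


def parse_employee(line):
    """Split one text line into the five columns of the employee table."""
    name, salary, subordinates, deductions, address = line.rstrip("\n").split(FIELD_SEP)
    street, city, state, zip_code = address.split(ITEM_SEP)
    return {
        "name": name,
        "salary": float(salary),
        # ARRAY<STRING>: elements separated by ^B
        "subordinates": subordinates.split(ITEM_SEP),
        # MAP<STRING,FLOAT>: pairs separated by ^B, key and value by ^C
        "deductions": {
            key: float(value)
            for key, value in (pair.split(KEY_SEP) for pair in deductions.split(ITEM_SEP))
        },
        # STRUCT: fields separated by ^B, in declaration order
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


# Made-up sample record, built with the same delimiters.
sample_line = FIELD_SEP.join([
    "alice",
    "5000.0",
    ITEM_SEP.join(["bob", "carol"]),
    ITEM_SEP.join(["tax" + KEY_SEP + "0.2", "insurance" + KEY_SEP + "0.05"]),
    ITEM_SEP.join(["1 Main St", "Springfield", "IL", "62701"]),
]) + "\n"

employee = parse_employee(sample_line)
print(employee["subordinates"][0])    # bob
print(employee["deductions"]["tax"])  # 0.2
print(employee["address"]["city"])    # Springfield
```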

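Basic Hive commands

1. Creating a database:

Creating a database essentially creates a directory on HDFS. Use comment to add a description of the database; the description goes in quotes. Database properties come after the description and are added with with dbproperties: the properties go inside parentheses, each property name and value is quoted and joined by an equals sign, and multiple properties are separated by commas.

## Create a database named myhive with a description and properties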

create database myhive comment 'this is myhive db'

with dbproperties ('author'='me','date'='2018-4-21')

;

## View the database properties

describe database extended myhive;

## Add a new property to an existing database

alter database myhive set dbproperties ('id'='1');

## Switch to the database

use myhive;

## Drop the database

drop database myhive;

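2. Creating tables

Tables are created in the current database by default (default is Hive's default database). Creating a table also essentially creates a directory on HDFS.

================== Practice: using array, loading local data, and comparing Hive with MySQL ==================

## Create table t_array, mapped to the data file array.txt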

create table if not exists t_array(

id int comment 'this is id',

score array<tinyint>

)

comment 'this is my table'

row format delimited fields terminated by ','

collection items terminated by '|'

tblproperties ('id'='11','author'='me')

;

## Load the file array.txt from the local filesystem

load data local inpath '/testdata/array.txt' into table t_array;

## Query the data in the table

select * from t_array;

## Query the first score for id=1

select score[0] from t_array where id=1;

## Query how many scores id=2 has

select size(score) from t_array where id=2;

## Count the total number of rows

select count(*) from t_array;

## Append array1.txt from the local filesystem to this table

load data local inpath '/testdata/array1.txt' into table t_array;

## Append test.txt from the local filesystem to this table

load data local inpath '/testdata/test.txt' into table t_array;

## Load array.txt from the local filesystem into t_array, overwriting the existing data

load data local inpath '/testdata/array.txt' overwrite into table t_array;

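================== Practice: using map, viewing how a table was created, and specifying the data location at creation time ==================

## Create table t_map, mapped to the data file map.txt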

create table if not exists t_map(

id int,

score map<string,int>

)

row format delimited fields terminated by ','

collection items terminated by '|'

map keys terminated by ':'

stored as textfile

;

## Load data from HDFS; map.txt is moved away from its original HDFS location

load data inpath '/testdata/map.txt' into table t_map;

## Query the math score for id=1

select score['math'] from t_map where id=1;

## Query how many subjects each person took

select size(score) from t_map;
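The contents of map.txt are not shown, so the following Python sketch assumes a hypothetical line layout that matches the delimiters declared for t_map (',' between columns, '|' between map entries, ':' between key and value). It mirrors what the two queries above compute; `parse_t_map_row` is an illustrative name, not a Hive function.

```python
def parse_t_map_row(line):
    """Parse one line such as '1,math:90|english:85' into id and score map."""
    id_text, score_text = line.strip().split(",", 1)
    score = {}
    for pair in score_text.split("|"):
        subject, value = pair.split(":")
        score[subject] = int(value)
    return {"id": int(id_text), "score": score}


# Hypothetical sample lines; the real map.txt may differ.
rows = [parse_t_map_row(line) for line in ["1,math:90|english:85", "2,math:70"]]

# select score['math'] from t_map where id=1;
math_for_id_1 = [row["score"].get("math") for row in rows if row["id"] == 1]

# select size(score) from t_map;
subject_counts = [len(row["score"]) for row in rows]

print(math_for_id_1)   # [90]
print(subject_counts)  # [2, 1]
```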

## View the statement that created the table

show create table t_map;

CREATE TABLE `t_map`(

`id` int,

`score` map<string,int>)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

COLLECTION ITEMS TERMINATED BY '|'

MAP KEYS TERMINATED BY ':'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://linux5:8020/user/hive/warehouse/t_map'

;

## Create a table and specify the location of its data at the same time

create table if not exists t_map2(

id int,

score map<string,int>

)

row format delimited fields terminated by ','

collection items terminated by '|'

map keys terminated by ':'

stored as textfile

location '/test'

;

## Drop a table

drop table test2;

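================== Practice: using struct, creating external tables, and summarizing the differences between managed and external tables ==================

## Create table t_struct, mapped to the data file struct.txt (an external table, created with the external keyword and an explicit data location)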

create external table if not exists t_struct(

id int,

grade struct<score:int,desc:string,point:string>

)

row format delimited fields terminated by ','

collection items terminated by '|'

location '/external'

;

## Query the rows where score > 90

select * from t_struct where grade.score>90;

## Create the external table t_struct1

create external table if not exists t_struct1(

id int,

grade struct<score:int,desc:string,point:string>

)

row format delimited fields terminated by ','

collection items terminated by '|'

;

## Append data with insert into

insert into table t_struct1 select * from t_struct;

## Drop the table: only the metadata is deleted; the data files remain on HDFS

drop table t_struct;

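3. Loading data into Hive tables:

Loading copies the data file into the directory of the target table (if the source is a path on HDFS, the file is moved instead).

## Loading from the local filesystem with load copies the data into the table's HDFS directory

# Append

load data local inpath '<local data path>' into table tablename;

# Overwrite

load data local inpath '<local data path>' overwrite into table tablename;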

## Loading from HDFS with load moves the data into the table's HDFS directory

# Append

load data inpath '<HDFS data path>' into table tablename;

# Overwrite

load data inpath '<HDFS data path>' overwrite into table tablename;

## Insert data into a table from a query

# Append

insert into table table1 select * from table2;

# Overwrite

insert overwrite table table1 select * from table2;
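The copy-versus-move and append-versus-overwrite behaviour described in this section can be modelled on an ordinary filesystem. The Python sketch below is an analogy only: temporary directories stand in for the local disk, HDFS, and the table directory, the function `load` is hypothetical, and it does not model how Hive handles file name collisions.

```python
import shutil
import tempfile
from pathlib import Path


def load(src, table_dir, *, local, overwrite=False):
    """Model of LOAD DATA: copy a local file, or move a file already on HDFS."""
    table_dir.mkdir(parents=True, exist_ok=True)
    if overwrite:
        # OVERWRITE: remove the files currently in the table directory first.
        for old_file in table_dir.iterdir():
            old_file.unlink()
    destination = table_dir / src.name
    if local:
        shutil.copy(src, destination)            # source file stays in place
    else:
        shutil.move(str(src), str(destination))  # source file disappears
    return destination


root = Path(tempfile.mkdtemp())
local_file = root / "local" / "array.txt"
staged_file = root / "hdfs_staging" / "map.txt"
for path in (local_file, staged_file):
    path.parent.mkdir(parents=True)
    path.write_text("sample\n")
table_dir = root / "warehouse" / "t_demo"

load(local_file, table_dir, local=True)    # append from local disk
load(staged_file, table_dir, local=False)  # append from "HDFS"
files_after_append = sorted(p.name for p in table_dir.iterdir())
local_still_there = local_file.exists()
staged_still_there = staged_file.exists()

load(local_file, table_dir, local=True, overwrite=True)
files_after_overwrite = sorted(p.name for p in table_dir.iterdir())

print(files_after_append, local_still_there, staged_still_there)
print(files_after_overwrite)
```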

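4. Managed (internal) tables and external tables

Managed tables: when you create a table in Hive, Hive manages the data by default. That is, Hive moves the data into its warehouse directory.

External tables: the user controls the creation and deletion of the data. The location of the external data is specified when the table is created. With the EXTERNAL keyword, Hive knows that it does not manage the data, so it does not move the data into its warehouse directory. In fact, it does not even check whether the external location exists when the table is defined. This is an important feature, because it means you can defer creating the data until after the table has been created.

Difference: dropping a managed table deletes the table's metadata and its data together. Dropping an external table deletes only the metadata; Hive leaves the data files untouched.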
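The difference in drop behaviour can be summarized with a toy model. In the Python sketch below a dictionary stands in for the metastore and temporary directories stand in for HDFS; `create_table` and `drop_table` are hypothetical helpers, not Hive APIs.

```python
import shutil
import tempfile
from pathlib import Path

metastore = {}  # table name -> {"location": Path, "external": bool}


def create_table(name, location, external=False):
    """Register a table. Only managed tables get their directory created here."""
    if not external:
        location.mkdir(parents=True, exist_ok=True)
    metastore[name] = {"location": location, "external": external}


def drop_table(name):
    """Remove the metadata; remove the data too, but only for managed tables."""
    entry = metastore.pop(name)
    if not entry["external"]:
        shutil.rmtree(entry["location"])


root = Path(tempfile.mkdtemp())

managed_dir = root / "warehouse" / "managed_demo"
create_table("managed_demo", managed_dir)
(managed_dir / "data.txt").write_text("1\n")

# The external location does not exist yet when the table is defined;
# the data is created afterwards.
external_dir = root / "external"
create_table("external_demo", external_dir, external=True)
external_dir.mkdir()
(external_dir / "data.txt").write_text("1\n")

drop_table("managed_demo")
drop_table("external_demo")

print(managed_dir.exists())                  # False: data removed with the table
print((external_dir / "data.txt").exists())  # True: data left in place
```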

5. Modifying table properties

## Create the table log2

CREATE external TABLE log2(

id string COMMENT 'this is id column',

phonenumber bigint,

mac string,

ip string,

url string,

status1 string,

status2 string,

up int,

down int,

code int,

dt String

)

COMMENT 'this is log table' -- table description

ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '

LINES TERMINATED BY '\n'

stored as textfile;

## Load data

load data local inpath '/home/data.log.txt' into table log2;

Renaming a table: rename to

alter table <old_name> rename to <new_name>;

alter table log2 rename to log4;

Renaming a column: change column

alter table <table_name> change column <old_column> <new_column> <column_type> [comment '<description>'];

## Rename a column

alter table log4 change column ip myip String;

## Rename a column and add a column description at the same time

alter table log4 change column myip ip String comment 'this is myip' ;

## Use the after keyword to place the changed column after another column

alter table log4 change column myip ip String comment 'this is myip' after code;

## Use the first keyword to move the changed column to the first position

alter table log4 change column ip myip int comment 'this is myip' first;

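Adding columns: add columns

## To add columns, use add columns followed by parentheses that list the new columns and their optional descriptions; separate multiple columns with commas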

alter table log4 add columns(

x int comment 'this x',

y int

);

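Dropping columns:

## To drop columns, use replace columns followed by parentheses that list the columns to keep, separated by commas; any column not listed is removed from the table definition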

alter table log4 replace columns(x int,y int);

alter table log4 replace columns(

myip int,

id string,

phonenumber bigint,

mac string,

url string,

status1 string,

status2 string,

up int,

down int,

code int,

dt string

);
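Because replace columns states the new column list rather than the columns to remove, it can help to compute which columns disappear. The Python sketch below does that for a shortened, illustrative version of the log4 schema; `replace_columns` is a hypothetical helper that models only the table definition, not the data files.

```python
def replace_columns(old_schema, new_schema):
    """Return the schema after REPLACE COLUMNS and the names that were dropped."""
    kept_names = {name for name, _ in new_schema}
    dropped = [name for name, _ in old_schema if name not in kept_names]
    return list(new_schema), dropped


# Shortened, illustrative schema: a few log4 columns plus the added x and y.
old_schema = [("myip", "int"), ("id", "string"), ("phonenumber", "bigint"),
              ("x", "int"), ("y", "int")]
new_schema = [("myip", "int"), ("id", "string"), ("phonenumber", "bigint")]

schema, dropped = replace_columns(old_schema, new_schema)
print(dropped)  # ['x', 'y']
```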

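Converting a managed table to an external table (set 'EXTERNAL' to 'TRUE') and back to a managed table (set it to 'FALSE'):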

alter table log4 set tblproperties(

'EXTERNAL' = 'TRUE'

);

alter table log4 set tblproperties(

'EXTERNAL' = 'false'

);

alter table log4 set tblproperties(

'EXTERNAL' = 'FALSE'

);
