HBase: The Definitive Guide (Chinese edition), Chapter 3 translation

Chapter 3 Client API: The Basics (Part 2)

Get Operations

The get() methods retrieve data from HBase. They fall into two groups, depending on how much they fetch in one call: single gets and batch gets.

Single Gets

Use the following method to retrieve specific data from HBase:

Result get(Get get) throws IOException

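Just as put() takes a Put instance, the get() method of HTable takes a matching Get instance. Also like Put, a Get must be given a row key when it is created, using one of these two constructors: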

Get(byte[] row)

Get(byte[] row, RowLock rowLock)

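A get() call is bound to one specific row, but it can retrieve any number of columns of that row. Both constructors take the row key; the second one additionally accepts a RowLock, letting you use a row lock of your own. Similar to Put, the Get class provides methods to narrow down the data you are looking for, all the way down to a specific cell: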

Get addFamily(byte[] family)

Get addColumn(byte[] family, byte[] qualifier)

Get setTimeRange(long minStamp, long maxStamp) throws IOException

Get setTimeStamp(long timestamp)

Get setMaxVersions()

Get setMaxVersions(int maxVersions) throws IOException

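addFamily() restricts the request to the given column family; call it multiple times to add more than one family. The same applies to addColumn(), which narrows the request further to a specific column. You can add more constraints to a Get instance, such as an exact timestamp or a time range, or the number of versions to return.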
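To make the narrowing rules concrete, here is a toy model in plain Java rather than a call into HBase. GetScopeSketch and its methods are hypothetical names that only imitate addFamily() and addColumn(), with String names in place of byte[]: a Get with no families matches the whole row, an added family matches all of its columns, and added columns match only themselves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not the HBase Get class: it only mimics how addFamily() and
// addColumn() narrow a row. Strings stand in for HBase's byte[] names.
class GetScopeSketch {
    // family -> selected qualifiers; an empty set means "the whole family"
    private final Map<String, Set<String>> familyMap = new HashMap<>();

    GetScopeSketch addFamily(String family) {
        // Selecting the whole family drops any column selection made for it before
        familyMap.put(family, new TreeSet<String>());
        return this;
    }

    GetScopeSketch addColumn(String family, String qualifier) {
        // Each call adds one more column to the family's selection
        familyMap.computeIfAbsent(family, f -> new TreeSet<String>()).add(qualifier);
        return this;
    }

    // Would a cell with this family and qualifier be returned?
    boolean matches(String family, String qualifier) {
        if (familyMap.isEmpty()) {
            return true; // no restriction at all: the whole row is returned
        }
        Set<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            return false; // this family was never added
        }
        return qualifiers.isEmpty() || qualifiers.contains(qualifier);
    }

    public static void main(String[] args) {
        GetScopeSketch get = new GetScopeSketch()
                .addFamily("colfam1")
                .addColumn("colfam2", "qual1");
        System.out.println(get.matches("colfam1", "qual9")); // true: whole family
        System.out.println(get.matches("colfam2", "qual1")); // true: selected column
        System.out.println(get.matches("colfam2", "qual2")); // false: other column
    }
}
```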

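A get can return several versions of a cell. If you do not set the number of versions, only the most recent version is returned by default; when in doubt, check with getMaxVersions(). Calling setMaxVersions() without an argument sets the number of versions to Integer.MAX_VALUE, which retrieves every version available for the matching cells (in practice, as many as the column family retains). The Get class offers a few more methods, listed in Table 3-4.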
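The version rules can be sketched the same way. VersionSketch is a hypothetical helper, not HBase code; its familyMaxVersions parameter is an assumption standing in for the maximum number of versions configured on the column family, which caps what a get can return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model, not HBase code: which versions of a single cell does a get return?
class VersionSketch {
    // Without setMaxVersions(), a Get asks for one version only
    static final int DEFAULT_MAX_VERSIONS = 1;
    // setMaxVersions() with no argument asks for this many
    static final int ALL_VERSIONS = Integer.MAX_VALUE;

    // familyMaxVersions is a hypothetical stand-in for the column family's VERSIONS
    // setting, assumed here to cap the answer no matter what the Get requested
    static List<Long> versionsReturned(List<Long> storedTimestamps, int requested, int familyMaxVersions) {
        List<Long> newestFirst = new ArrayList<>(storedTimestamps);
        Collections.sort(newestFirst, Collections.reverseOrder());
        int limit = Math.min(newestFirst.size(), Math.min(requested, familyMaxVersions));
        return new ArrayList<>(newestFirst.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Long> stored = Arrays.asList(100L, 300L, 200L);
        System.out.println(versionsReturned(stored, DEFAULT_MAX_VERSIONS, 3)); // [300]
        System.out.println(versionsReturned(stored, ALL_VERSIONS, 3));         // [300, 200, 100]
        System.out.println(versionsReturned(stored, ALL_VERSIONS, 2));         // [300, 200]
    }
}
```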

Table 3-4. Additional methods of the Get class

| Method | Description |
|---|---|
| getRow() | Returns the row key as specified when creating the Get instance. |
| getRowLock() | Returns the RowLock instance for the current Get instance. |
| getLockId() | Returns the optional lock ID handed into the constructor using the rowLock parameter. Will be -1L if not set. |
| getTimeRange() | Retrieves the associated timestamp or time range of the Get instance. Note that there is no getTimeStamp(), since the API converts a value assigned with setTimeStamp() into a TimeRange instance internally, setting the minimum and maximum values to the given timestamp. |
| setFilter()/getFilter() | Special filter instances can be used to select certain columns or cells, based on a wide variety of conditions. You can get and set them with these methods. |
| setCacheBlocks()/getCacheBlocks() | Each HBase region server has a block cache that efficiently retains recently accessed data for subsequent reads of contiguous information. In some cases it is better not to engage the cache, to avoid too much churn when doing completely random gets. These methods give you control over this feature. |
| numFamilies() | Convenience method to retrieve the size of the family map, containing the families added using the addFamily() or addColumn() calls. |
| hasFamilies() | Another helper to check if a family, or column, has been added to the current instance of the Get class. |
| familySet()/getFamilyMap() | These methods give you access to the column families and specific columns, as added by the addFamily() and/or addColumn() calls. The family map is a map where the key is the family name and the value is a list of added column qualifiers for this particular family. familySet() returns the Set of all stored families, i.e., a set containing only the family names. |
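The getTimeRange() entry relies on TimeRange. The Javadoc of Get.setTimeRange() describes the range as [minStamp, maxStamp), so minStamp is included and maxStamp is not. TimeRangeSketch below is a hypothetical toy class modeling that interval; how exactly setTimeStamp() turns one timestamp into a range may differ between versions, and the [ts, ts + 1) choice here is an assumption of the sketch.

```java
// Toy model, not HBase's TimeRange class: a half-open interval [minStamp, maxStamp).
class TimeRangeSketch {
    final long minStamp; // inclusive lower bound
    final long maxStamp; // exclusive upper bound

    TimeRangeSketch(long minStamp, long maxStamp) {
        if (maxStamp < minStamp) {
            throw new IllegalArgumentException("maxStamp is smaller than minStamp");
        }
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
    }

    // Assumption for this sketch: one exact timestamp becomes [ts, ts + 1),
    // the smallest half-open range that contains ts and nothing else
    static TimeRangeSketch ofTimestamp(long ts) {
        return new TimeRangeSketch(ts, ts + 1);
    }

    boolean contains(long ts) {
        return ts >= minStamp && ts < maxStamp;
    }

    public static void main(String[] args) {
        TimeRangeSketch range = new TimeRangeSketch(100L, 200L);
        System.out.println(range.contains(100L)); // true: lower bound is inclusive
        System.out.println(range.contains(200L)); // false: upper bound is exclusive
        TimeRangeSketch exact = ofTimestamp(150L);
        System.out.println(exact.contains(150L) + " " + exact.contains(151L)); // true false
    }
}
```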

The getters in Table 3-4 only return what you have set on the Get instance beforehand, so they are rarely used.

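As mentioned earlier, HBase provides a helper class named Bytes with many static methods that convert Java types into byte arrays. It also converts in the opposite direction, parsing byte arrays back into Java types. Here are some of those methods: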

static String toString(byte[] b)

static boolean toBoolean(byte[] b)

static long toLong(byte[] bytes)

static float toFloat(byte[] bytes)

static int toInt(byte[] bytes)
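Bytes does not tag values with a type, so the reader has to use the conversion matching the one the writer used. The sketch below reproduces a few conversions with the JDK only; BytesSketch is a hypothetical class, and it assumes, as HBase's Bytes does, UTF-8 for strings and big-endian byte order for numbers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// JDK-only stand-in for a few Bytes conversions (not the HBase class itself).
// Assumes UTF-8 for strings and big-endian byte order for numbers.
final class BytesSketch {
    private BytesSketch() {}

    static byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    static String toString(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    // ByteBuffer writes in big-endian order by default
    static byte[] toBytes(int v) { return ByteBuffer.allocate(Integer.BYTES).putInt(v).array(); }
    static int toInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }

    static byte[] toBytes(long v) { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }
    static long toLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toBytes(1))); // [0, 0, 0, 1]
        System.out.println(toLong(toBytes(42L)));        // 42
        System.out.println(toString(toBytes("row1")));   // row1
    }
}
```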

Example 3-8 shows how to use them:

Example 3-8. Retrieving data from HBase

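// Imports needed to compile this example as a standalone class
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
  public static void main(String[] args) throws IOException {
    // Create the configuration and a handle to the table
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");

    // Address one row and narrow the request to a single column
    Get get = new Get(Bytes.toBytes("row1"));
    get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));

    // Fetch the row and read the value of the requested column
    Result result = table.get(get);
    byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
    System.out.println("Value: " + Bytes.toString(val));

    table.close();
  }
}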

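The code first creates a configuration object and an HTable instance, then a Get instance addressing row1 and narrowed to the single column colfam1:qual1. It then fetches the row from HBase, converts the returned value to a string, and prints it. If row1 already holds the value val1 in that column (as written by the earlier put examples), running the code prints: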

Value: val1

The Result Class

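When you retrieve data with get(), you receive a Result instance holding all matching cells. It gives you access to everything the server returned for the given row that matched the specified query (column family, column qualifier, timestamp, and so on).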

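Example 3-8 requested a single column, but you can ask for more; for example, you can have the server return all columns of a given column family and then use the getter methods on the client to pick out individual columns. The Result class provides the following methods: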

byte[] getValue(byte[] family, byte[] qualifier)

byte[] value()

byte[] getRow()

int size()

boolean isEmpty()

KeyValue[] raw()

List<KeyValue> list()

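getValue() returns the data of a specific cell. Since you cannot specify a timestamp or version here, you always get the newest one: the server sorts the versions of a cell from newest to oldest, so the first version it finds for a column is always the latest. value() goes one step further and returns the newest value of the first column in the Result.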

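getRow(), introduced earlier, returns the row key that was specified when creating the Get instance. size() returns the number of KeyValue instances returned by the server, and isEmpty() checks whether that number is zero.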

raw() returns the array of KeyValue instances backing the current Result, and list() returns the same instances as a List, which you can conveniently iterate over.

The array returned by raw() is already sorted: by column family first, then by column qualifier, then by timestamp (newest first), and finally by type.
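The sort order of raw() can be modeled with a comparator. CellSortSketch is a hypothetical toy class, not KeyValue; it drops the type field for brevity, and it compares Java strings, which orders ASCII names the same way HBase's byte-wise comparison does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy stand-in for the KeyValues of one row; the type field is left out for brevity.
final class CellSortSketch {
    final String family;
    final String qualifier;
    final long timestamp;
    final String value;

    CellSortSketch(String family, String qualifier, long timestamp, String value) {
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }

    // family ascending, then qualifier ascending, then timestamp descending (newest first)
    static final Comparator<CellSortSketch> ORDER =
            Comparator.comparing((CellSortSketch c) -> c.family)
                    .thenComparing(c -> c.qualifier)
                    .thenComparing(Comparator.comparingLong((CellSortSketch c) -> c.timestamp).reversed());

    static List<CellSortSketch> sorted(List<CellSortSketch> cells) {
        List<CellSortSketch> copy = new ArrayList<>(cells);
        copy.sort(ORDER);
        return copy;
    }

    @Override
    public String toString() {
        return family + ":" + qualifier + "@" + timestamp + "=" + value;
    }

    public static void main(String[] args) {
        List<CellSortSketch> cells = Arrays.asList(
                new CellSortSketch("colfam1", "qual2", 1L, "a"),
                new CellSortSketch("colfam1", "qual1", 1L, "b"),
                new CellSortSketch("colfam1", "qual1", 2L, "c"));
        // Prints [colfam1:qual1@2=c, colfam1:qual1@1=b, colfam1:qual2@1=a]
        System.out.println(sorted(cells));
    }
}
```

Because the newest version of each column comes first, a lookup that stops at the first match for a column returns its latest value.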

There is also a set of column-oriented access methods:

List<KeyValue> getColumn(byte[] family, byte[] qualifier)

KeyValue getColumnLatest(byte[] family, byte[] qualifier)

boolean containsColumn(byte[] family, byte[] qualifier)

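To get several KeyValue instances for one column from getColumn(), you must have called setMaxVersions() on the Get with a value greater than one; otherwise you get at most one KeyValue. getColumnLatest() returns the newest cell of the given column; in contrast to getValue(), it does not return the raw byte array of the value but the full KeyValue instance. containsColumn() is a convenient way to check whether the returned cells include the given column.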
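The column accessors can be sketched on top of that sort order. ColumnAccessSketch is a hypothetical toy, not the HBase Result class; it assumes its cells are already sorted newest first within each column, so the latest version of a column is simply the first match, and it treats a null qualifier as the empty one, as this section describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the column-oriented accessors of Result; not the HBase class.
final class ColumnAccessSketch {
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final List<Cell> sortedCells; // assumed sorted: family, qualifier, newest first

    ColumnAccessSketch(List<Cell> sortedCells) {
        this.sortedCells = sortedCells;
    }

    // In this sketch a null qualifier selects the column with the empty qualifier
    private static String normalize(String qualifier) {
        return qualifier == null ? "" : qualifier;
    }

    List<Cell> getColumn(String family, String qualifier) {
        String q = normalize(qualifier);
        List<Cell> out = new ArrayList<>();
        for (Cell c : sortedCells) {
            if (c.family.equals(family) && c.qualifier.equals(q)) {
                out.add(c);
            }
        }
        return out;
    }

    Cell getColumnLatest(String family, String qualifier) {
        List<Cell> column = getColumn(family, qualifier);
        // Thanks to the sort order, the first cell of a column is its newest version
        return column.isEmpty() ? null : column.get(0);
    }

    boolean containsColumn(String family, String qualifier) {
        return getColumnLatest(family, qualifier) != null;
    }

    public static void main(String[] args) {
        ColumnAccessSketch result = new ColumnAccessSketch(Arrays.asList(
                new Cell("colfam1", "", 1L, "no label"),
                new Cell("colfam1", "qual1", 2L, "new"),
                new Cell("colfam1", "qual1", 1L, "old")));
        System.out.println(result.getColumn("colfam1", "qual1").size());     // 2
        System.out.println(result.getColumnLatest("colfam1", "qual1").value); // new
        System.out.println(result.containsColumn("colfam1", null));           // true
    }
}
```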

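In all of these methods you can pass null as the qualifier, which matches the column with an empty qualifier. An empty qualifier means the column has no label, so when you look at the table's data, for example from the shell, you need to know what such a column contains. Empty qualifiers are rarely used; a typical case is a column family that only ever holds a single column, where the column family effectively acts as the column.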

Another set of methods gives map-oriented access to the returned data:

NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> getMap()

NavigableMap<byte[], NavigableMap<byte[], byte[]>> getNoVersionMap()

NavigableMap<byte[], byte[]> getFamilyMap(byte[] family)

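getMap() is the most general: it returns the entire result as nested Java maps (family, then qualifier, then timestamp, then value), which you can iterate over to reach every value. getNoVersionMap() does the same but includes only the latest version of each column. getFamilyMap() returns the columns of the given family only, mapping each qualifier to a single value; as its return type suggests, that is the latest version of each column.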
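The shapes of the three maps can be shown with nested TreeMaps. ResultMapSketch is a hypothetical toy, not the HBase Result class; it uses String keys where HBase uses byte[] (with a byte comparator), and it orders timestamps newest first so that the first entry of each version map is the latest value.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of Result's map views, with String keys instead of byte[].
final class ResultMapSketch {
    // family -> qualifier -> timestamp (newest first) -> value, the shape getMap() returns
    final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> map = new TreeMap<>();

    void add(String family, String qualifier, long timestamp, String value) {
        map.computeIfAbsent(family, f -> new TreeMap<String, NavigableMap<Long, String>>())
           .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
           .put(timestamp, value);
    }

    // qualifier -> newest value for one family, the shape of getFamilyMap()
    NavigableMap<String, String> getFamilyMap(String family) {
        NavigableMap<String, String> out = new TreeMap<>();
        NavigableMap<String, NavigableMap<Long, String>> columns = map.get(family);
        if (columns == null) {
            return out;
        }
        for (Map.Entry<String, NavigableMap<Long, String>> column : columns.entrySet()) {
            // The version map is sorted newest first, so firstEntry() holds the latest value
            out.put(column.getKey(), column.getValue().firstEntry().getValue());
        }
        return out;
    }

    // family -> qualifier -> newest value, the shape of getNoVersionMap()
    NavigableMap<String, NavigableMap<String, String>> getNoVersionMap() {
        NavigableMap<String, NavigableMap<String, String>> out = new TreeMap<>();
        for (String family : map.keySet()) {
            out.put(family, getFamilyMap(family));
        }
        return out;
    }

    public static void main(String[] args) {
        ResultMapSketch result = new ResultMapSketch();
        result.add("colfam1", "qual1", 1L, "val1");
        result.add("colfam1", "qual1", 2L, "val2");
        result.add("colfam1", "qual2", 1L, "val3");
        System.out.println(result.map);               // {colfam1={qual1={2=val2, 1=val1}, qual2={1=val3}}}
        System.out.println(result.getNoVersionMap()); // {colfam1={qual1=val2, qual2=val3}}
    }
}
```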

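Which set of methods you use to access a Result is a matter of taste: the data has already been transferred from the server to the client, so there is no difference in efficiency.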

Batch Gets

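Just as put() can insert a list of Put instances in one call, get() can take a list of Get instances and fetch them all at once. Batch gets are an efficient way to access HBase, but, as with the list-based put, there is no guarantee about the order in which the individual gets are processed.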

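As Figure 3-1 showed, the requests may be sent to more than one server, yet from the client's point of view it looks as if a single request was sent.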
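Conceptually, the client splits the list by server, sends one request per server, and writes each answer back into the slot of the Get it belongs to. BatchGetSketch is a hypothetical toy model of that idea, not HBase's implementation; serverFor() is an invented routing rule standing in for the region lookup, and a plain map stands in for the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a list-based get; not HBase's implementation. Gets are grouped
// per server, the groups may be answered in any order, and each answer is
// written back to the slot of the request it belongs to.
final class BatchGetSketch {
    // Hypothetical routing rule: rows sorting before "s" live on server-a, the rest on server-b
    static String serverFor(String row) {
        return row.compareTo("s") < 0 ? "server-a" : "server-b";
    }

    static String[] get(List<String> rows, Map<String, String> store) {
        // Remember the original index of every request, grouped by server
        Map<String, List<Integer>> byServer = new LinkedHashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            byServer.computeIfAbsent(serverFor(rows.get(i)), s -> new ArrayList<>()).add(i);
        }
        // Simulate servers answering in a different order than they were asked
        List<String> answerOrder = new ArrayList<>(byServer.keySet());
        Collections.reverse(answerOrder);

        String[] results = new String[rows.size()]; // one slot per request
        for (String server : answerOrder) {
            for (int index : byServer.get(server)) {
                results[index] = store.get(rows.get(index));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("row1", "val1");
        store.put("row2", "val2");
        store.put("zebra", "val9");
        String[] results = get(Arrays.asList("zebra", "row1", "row2"), store);
        System.out.println(Arrays.toString(results)); // [val9, val1, val2]
    }
}
```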

The batch get API looks like this:

Result[] get(List<Get> gets) throws IOException

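As with the list-based put, you first create a list holding the Get instances that describe what to retrieve; the call returns an array of the same size, holding the matching Result instances. Example 3-9 shows two different ways of accessing the returned data.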

Example 3-9. Retrieving data from HBase using a list of Gets

// Assumes an HTable instance named 'table', as created in Example 3-8, and imports
// for Bytes, Get, Result, KeyValue, java.util.List and java.util.ArrayList.
byte[] cf1 = Bytes.toBytes("colfam1");
byte[] qf1 = Bytes.toBytes("qual1");
byte[] qf2 = Bytes.toBytes("qual2");
byte[] row1 = Bytes.toBytes("row1");
byte[] row2 = Bytes.toBytes("row2");

// Prepare the list of Get instances, one per row/column combination
List<Get> gets = new ArrayList<Get>();

Get get1 = new Get(row1);
get1.addColumn(cf1, qf1);
gets.add(get1);

Get get2 = new Get(row2);
get2.addColumn(cf1, qf1);
gets.add(get2);

Get get3 = new Get(row2);
get3.addColumn(cf1, qf2);
gets.add(get3);

// Send all gets in one call; the array holds one Result per Get
Result[] results = table.get(gets);

System.out.println("First iteration...");
for (Result result : results) {
  String row = Bytes.toString(result.getRow());
  System.out.print("Row: " + row + " ");
  byte[] val = null;
  // Check for each known column and print its value if present
  if (result.containsColumn(cf1, qf1)) {
    val = result.getValue(cf1, qf1);
    System.out.println("Value: " + Bytes.toString(val));
  }
  if (result.containsColumn(cf1, qf2)) {
    val = result.getValue(cf1, qf2);
    System.out.println("Value: " + Bytes.toString(val));
  }
}

System.out.println("Second iteration...");
for (Result result : results) {
  // Walk the raw KeyValue instances without knowing the columns in advance
  for (KeyValue kv : result.raw()) {
    System.out.println("Row: " + Bytes.toString(kv.getRow()) +
        " Value: " + Bytes.toString(kv.getValue()));
  }
}

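The example first prepares byte arrays for the column family, the column qualifiers, and the row keys, then creates a list holding all Get instances, and finally calls get() on the HTable to read the data from the servers in one batch. The first loop prints only the values of the two known columns, colfam1:qual1 and colfam1:qual2, checking for each one with containsColumn(). The second loop prints every KeyValue that was returned, without needing to know the columns in advance.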

Assuming you ran Example 3-4 beforehand, running Example 3-9 prints the following output:

First iteration...
Row: row1 Value: val1
Row: row2 Value: val2
Row: row2 Value: val3
Second iteration...
Row: row1 Value: val1
Row: row2 Value: val2
Row: row2 Value: val3

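Both iterations print the same values, which shows how to access the results of a batch get. What you have not seen yet is how errors are reported. This differs from the list-based put: a list-based get either returns an array of exactly the same size as the list of gets, or throws an exception. Example 3-10 demonstrates this.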

Example 3-10. Trying to read an erroneous column family

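// Reuses table, cf1, qf1, qf2, row1 and row2 from Example 3-9
List<Get> gets = new ArrayList<Get>();

Get get1 = new Get(row1);
get1.addColumn(cf1, qf1);
gets.add(get1);

Get get2 = new Get(row2);
get2.addColumn(cf1, qf1);
gets.add(get2);

Get get3 = new Get(row2);
get3.addColumn(cf1, qf2);
gets.add(get3);

// This get asks for a column family that does not exist
Get get4 = new Get(row2);
get4.addColumn(Bytes.toBytes("BOGUS"), qf2);
gets.add(get4);

// Throws an exception instead of returning partial results
Result[] results = table.get(gets);

// Never reached because of the exception above
System.out.println("Result count: " + results.length);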

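The code adds the Get instances to a list, one of which asks for a bogus column family. As a result, an exception is thrown and the final print statement is never reached. Running the code produces an exception like this: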

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues: 10.0.0.57:51640,
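The all-or-nothing behavior can be modeled in a few lines. AllOrNothingGetSketch is a hypothetical toy that validates column families against a set; it throws a plain IllegalStateException where HBase throws the RetriesExhaustedWithDetailsException shown above, and the caller never sees the results of the gets that did succeed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the all-or-nothing contract of the list-based get. The exception
// type is a plain stand-in, not RetriesExhaustedWithDetailsException.
final class AllOrNothingGetSketch {
    static String[] get(List<String> requestedFamilies, Set<String> existingFamilies) {
        String[] results = new String[requestedFamilies.size()];
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < requestedFamilies.size(); i++) {
            String family = requestedFamilies.get(i);
            if (existingFamilies.contains(family)) {
                results[i] = "result for " + family;
            } else {
                failures.add("NoSuchColumnFamily: " + family);
            }
        }
        // In this toy the other gets succeed, but their results are dropped: the caller sees only the error
        if (!failures.isEmpty()) {
            throw new IllegalStateException("Failed " + failures.size() + " action(s): " + failures);
        }
        return results;
    }

    public static void main(String[] args) {
        Set<String> families = new HashSet<>(Arrays.asList("colfam1"));
        System.out.println(get(Arrays.asList("colfam1", "colfam1"), families).length); // 2
        try {
            get(Arrays.asList("colfam1", "BOGUS"), families);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Failed 1 action(s): [NoSuchColumnFamily: BOGUS]
        }
    }
}
```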

batch() is a more fine-grained API that can handle partial failures; it is introduced later in the section on batch operations.
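By contrast, an API that reports outcomes per action lets the caller keep what succeeded. PartialBatchSketch is a hypothetical toy that fills one slot per action with either a result or an exception; it illustrates the idea only, and the actual batch() signature and slot contents are the subject of the batch operations section.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of per-action outcomes: each slot holds either a result or the
// error for that action, so the caller can keep what succeeded.
final class PartialBatchSketch {
    static Object[] batch(List<String> requestedFamilies, Set<String> existingFamilies) {
        Object[] outcomes = new Object[requestedFamilies.size()];
        for (int i = 0; i < requestedFamilies.size(); i++) {
            String family = requestedFamilies.get(i);
            if (existingFamilies.contains(family)) {
                outcomes[i] = "result for " + family;
            } else {
                outcomes[i] = new IllegalArgumentException("No such column family: " + family);
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        Set<String> families = new HashSet<>(Arrays.asList("colfam1"));
        Object[] outcomes = batch(Arrays.asList("colfam1", "BOGUS"), families);
        for (Object outcome : outcomes) {
            if (outcome instanceof Throwable) {
                System.out.println("failed: " + ((Throwable) outcome).getMessage());
            } else {
                System.out.println("ok: " + outcome);
            }
        }
    }
}
```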