1. Prerequisites
1.1 Environment
Java 8
Linux, Mac OS X, or another Unix-like operating system (Windows is not supported)
8 GB of RAM
2 vCPUs
1.2 Download and extract Druid
Extract the package: tar -xzf apache-druid-0.13.0-incubating-bin.tar.gz
Enter the directory: cd apache-druid-0.13.0-incubating
1.3 Download and extract ZooKeeper
Druid relies on Apache ZooKeeper for distributed coordination. From the Druid root directory, download and extract ZooKeeper by running the following commands:
curl https://archive.apache.org/dist/zookeeper/zookeeper-3.4.11/zookeeper-3.4.11.tar.gz -o zookeeper-3.4.11.tar.gz
tar -xzf zookeeper-3.4.11.tar.gz
mv zookeeper-3.4.11 zk
1.4 Start and run Druid
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
This brings up instances of ZooKeeper and the Druid services, all running on the local machine. Example output:
- MacBook-Air-3:apache-druid-0.13.0-incubating g2$ bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
- [Tue Dec 25 16:11:35 2018] Running command[zk], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/zk.log]: bin/run-zk quickstart/tutorial/conf
- [Tue Dec 25 16:11:35 2018] Running command[coordinator], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/coordinator.log]: bin/run-druid coordinator quickstart/tutorial/conf
- [Tue Dec 25 16:11:35 2018] Running command[broker], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/broker.log]: bin/run-druid broker quickstart/tutorial/conf
- [Tue Dec 25 16:11:35 2018] Running command[historical], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/historical.log]: bin/run-druid historical quickstart/tutorial/conf
- [Tue Dec 25 16:11:35 2018] Running command[overlord], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/overlord.log]: bin/run-druid overlord quickstart/tutorial/conf
- [Tue Dec 25 16:11:35 2018] Running command[middleManager], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/middleManager.log]: bin/run-druid middleManager quickstart/tutorial/conf
All persistent state (such as the cluster metadata store and segments) is kept in the var directory under apache-druid-0.13.0-incubating. Service logs are located in var/sv.
2. Loading data
The sample data is located at quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz under the Druid package root. It contains Wikipedia page-edit events stored as JSON objects in a text file.
The sample data contains the following columns; a sample event looks like this:
- {
- "time": "2015-09-12T00:47:47.870Z",
- "channel": "#vi.wikipedia",
- "cityName": null,
- "comment": "clean up using [[Project:AWB|AWB]]",
- "countryIsoCode": null,
- "countryName": null,
- "isAnonymous": false,
- "isMinor": false,
- "isNew": false,
- "isRobot": true,
- "isUnpatrolled": false,
- "metroCode": null,
- "namespace": "Main",
- "page": "Atractus duboisi",
- "regionIsoCode": null,
- "regionName": null,
- "user": "ThitxongkhoiAWB",
- "delta": 18,
- "added": 18,
- "deleted": 0
- }
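Each line of the file is one JSON object: the time field is the timestamp column, delta/added/deleted are the metric columns, and everything else is a dimension. A minimal Python sketch of that split (the sample event is inlined here for illustration):

```python
import json

# One event from wikiticker-2015-09-12-sampled.json, inlined for illustration.
event_line = '''{"time": "2015-09-12T00:47:47.870Z", "channel": "#vi.wikipedia",
 "page": "Atractus duboisi", "user": "ThitxongkhoiAWB", "isRobot": true,
 "delta": 18, "added": 18, "deleted": 0}'''

METRICS = {"delta", "added", "deleted"}

event = json.loads(event_line)
timestamp = event.pop("time")  # the timestamp column
metrics = {k: v for k, v in event.items() if k in METRICS}
dimensions = {k: v for k, v in event.items() if k not in METRICS}

print(timestamp)           # 2015-09-12T00:47:47.870Z
print(dimensions["page"])  # Atractus duboisi
```

This same three-way split (timestamp, dimensions, metrics) is exactly what the ingestion specs below declare via timestampSpec, dimensionsSpec, and metricsSpec.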
2.1 Loading data from a Kafka stream
2.1.1 Kafka operations
(1) Download
- curl -O https://archive.apache.org/dist/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
- tar -xzf kafka_2.11-0.10.2.0.tgz
- cd kafka_2.11-0.10.2.0
(2) Start Kafka
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-server-start.sh config/server.properties
(3) List topics
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --list
(4) Create a topic
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wikipedia
(5) Delete a topic
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic wikipedia
2.1.2 Druid operations
Once data is in Kafka, you need to define a datasource spec file that tells Druid how to ingest it: the timestamp format, the dimension columns, the metric columns, the pre-aggregation (rollup) granularity, and so on.
(1) Start the Kafka ingestion supervisor in Druid
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ curl -XPOST -H'Content-Type: application/json' -d @quickstart/tutorial/wikipedia-kafka-supervisor.json http://localhost:8090/druid/indexer/v1/supervisor
Note: if the supervisor is created successfully, the response contains the supervisor ID; in our case we should see {"id":"wikipedia-kafka"}.
Supervisor spec (datasource definition): wikipedia-kafka-supervisor.json
- {
- "type": "kafka",
- "dataSchema": {
- "dataSource": "wikipedia",
- "parser": {
- "type": "string",
- "parseSpec": {
- "format": "json",
- "timestampSpec": {
- "column": "time",
- "format": "auto"
- },
- "dimensionsSpec": {
- "dimensions": [
- "channel",
- "cityName",
- "comment",
- "countryIsoCode",
- "countryName",
- "isAnonymous",
- "isMinor",
- "isNew",
- "isRobot",
- "isUnpatrolled",
- "metroCode",
- "namespace",
- "page",
- "regionIsoCode",
- "regionName",
- "user",
- { "name": "added", "type": "long" },
- { "name": "deleted", "type": "long" },
- { "name": "delta", "type": "long" }
- ]
- }
- }
- },
- "metricsSpec" : [],
- "granularitySpec": {
- "type": "uniform",
- "segmentGranularity": "DAY",
- "queryGranularity": "NONE",
- "rollup": false
- }
- },
- "tuningConfig": {
- "type": "kafka",
- "reportParseExceptions": false
- },
- "ioConfig": {
- "topic": "wikipedia",
- "replicas": 2,
- "taskDuration": "PT10M",
- "completionTimeout": "PT20M",
- "consumerProperties": {
- "bootstrap.servers": "localhost:9092"
- }
- }
- }
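Before posting, the spec can be sanity-checked programmatically, e.g. that the topic, bootstrap servers, and timestamp column line up with the Kafka setup above. A minimal sketch with a trimmed-down version of the spec inlined; the actual POST (commented out) would require a running overlord on port 8090:

```python
import json

# Trimmed-down version of wikipedia-kafka-supervisor.json, inlined for illustration.
spec = {
    "type": "kafka",
    "dataSchema": {
        "dataSource": "wikipedia",
        "parser": {"type": "string", "parseSpec": {
            "format": "json",
            "timestampSpec": {"column": "time", "format": "auto"},
        }},
    },
    "ioConfig": {
        "topic": "wikipedia",
        "consumerProperties": {"bootstrap.servers": "localhost:9092"},
    },
}

# Basic consistency checks before submitting to the overlord.
assert spec["type"] == "kafka"
assert spec["ioConfig"]["topic"] == "wikipedia"
assert spec["dataSchema"]["parser"]["parseSpec"]["timestampSpec"]["column"] == "time"

body = json.dumps(spec).encode("utf-8")

# Posting needs a live cluster; shown for completeness only:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8090/druid/indexer/v1/supervisor",
#     data=body, headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)  # response contains the supervisor ID
```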
(2) Extract wikiticker-2015-09-12-sampled.json.gz
- MacBook-Air-3:tutorial g2$ pwd
- /Users/g2/myresource/druid/apache-druid-0.13.0-incubating/quickstart/tutorial
- MacBook-Air-3:tutorial g2$ gunzip -k wikiticker-2015-09-12-sampled.json.gz
Sample record:
- {
- "time": "2015-09-12T00:48:02.596Z",
- "channel": "#es.wikipedia",
- "cityName": "Mexico City",
- "comment": "Cambio en la redacción del texto y correción en sintaxis",
- "countryIsoCode": "MX",
- "countryName": "Mexico",
- "isAnonymous": true,
- "isMinor": false,
- "isNew": false,
- "isRobot": false,
- "isUnpatrolled": false,
- "metroCode": null,
- "namespace": "Main",
- "page": "Mathis Bolly",
- "regionIsoCode": "DIF",
- "regionName": "Mexico City",
- "user": "189.217.75.123",
- "delta": -67,
- "added": 0,
- "deleted": 67
- }
(3) In the Kafka directory, run the following commands (to write the data into Kafka)
- MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ export KAFKA_OPTS="-Dfile.encoding=UTF-8"
- MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < /Users/g2/myresource/druid/apache-druid-0.13.0-incubating/quickstart/tutorial/wikiticker-2015-09-12-sampled.json
The data has now been written to Kafka, and we can go on to query it from Druid.
2.2 Loading a file (batch ingestion)
2.2.1 Prepare the data and the ingestion task spec: wikipedia-index.json
- {
- "type" : "index",
- "spec" : {
- "dataSchema" : {
- "dataSource" : "wikipedia",
- "parser" : {
- "type" : "string",
- "parseSpec" : {
- "format" : "json",
- "dimensionsSpec" : {
- "dimensions" : [
- "channel",
- "cityName",
- "comment",
- "countryIsoCode",
- "countryName",
- "isAnonymous",
- "isMinor",
- "isNew",
- "isRobot",
- "isUnpatrolled",
- "metroCode",
- "namespace",
- "page",
- "regionIsoCode",
- "regionName",
- "user",
- { "name": "added", "type": "long" },
- { "name": "deleted", "type": "long" },
- { "name": "delta", "type": "long" }
- ]
- },
- "timestampSpec": {
- "column": "time",
- "format": "iso"
- }
- }
- },
- "metricsSpec" : [],
- "granularitySpec" : {
- "type" : "uniform",
- "segmentGranularity" : "day",
- "queryGranularity" : "none",
- "intervals" : ["2015-09-12/2015-09-13"],
- "rollup" : false
- }
- },
- "ioConfig" : {
- "type" : "index",
- "firehose" : {
- "type" : "local",
- "baseDir" : "quickstart/tutorial/",
- "filter" : "wikiticker-2015-09-12-sampled.json.gz"
- },
- "appendToExisting" : false
- },
- "tuningConfig" : {
- "type" : "index",
- "targetPartitionSize" : 5000000,
- "maxRowsInMemory" : 25000,
- "forceExtendableShardSpecs" : true
- }
- }
- }
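The intervals field above restricts ingestion to events whose timestamps fall within 2015-09-12/2015-09-13 (a half-open interval: the start is inclusive, the end exclusive). That filtering step can be sketched locally with a few hypothetical timestamps:

```python
from datetime import datetime, timezone

# The spec's interval "2015-09-12/2015-09-13", half-open: [start, end)
start = datetime(2015, 9, 12, tzinfo=timezone.utc)
end = datetime(2015, 9, 13, tzinfo=timezone.utc)

# Hypothetical event timestamps, inlined for illustration.
timestamps = [
    "2015-09-11T23:59:59.000Z",  # before the interval -> dropped
    "2015-09-12T00:47:47.870Z",  # inside -> kept
    "2015-09-12T23:59:59.999Z",  # inside -> kept
    "2015-09-13T00:00:00.000Z",  # at the exclusive end -> dropped
]

def in_interval(ts: str) -> bool:
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return start <= parsed < end

kept = [ts for ts in timestamps if in_interval(ts)]
print(len(kept))  # 2
```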
2.2.2 Load batch data
- MacBook-Air-3:apache-druid-0.13.0-incubating g2$ bin/post-index-task --file quickstart/tutorial/wikipedia-index.json
- Beginning indexing data for wikipedia
- Task started: index_wikipedia_2018-12-25T10:00:45.744Z
- Task log: http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-12-25T10:00:45.744Z/log
- Task status: http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-12-25T10:00:45.744Z/status
- Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
- Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
- Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
- Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
- Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
- Task finished with status: SUCCESS
- Completed indexing data for wikipedia. Now loading indexed data onto the cluster...
- wikipedia loading complete! You may now query your data
- MacBook-Air-3:apache-druid-0.13.0-incubating g2$
The data has now been loaded into Druid, and we can query it.
3. Querying data
3.1 Native JSON queries
3.1.1 Example query request (wikipedia-top-pages.json)
- {
- "queryType" : "topN",
- "dataSource" : "wikipedia",
- "intervals" : ["2015-09-12/2015-09-13"],
- "granularity" : "all",
- "dimension" : "page",
- "metric" : "count",
- "threshold" : 15,
- "aggregations" : [
- {
- "type" : "count",
- "name" : "count"
- }
- ]
- }
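The topN query groups rows by the page dimension, counts the rows in each group, and returns the threshold highest-ranking pages by that count. Those semantics can be sketched over a few hypothetical rows:

```python
from collections import Counter

# Hypothetical page values, standing in for rows of the wikipedia datasource.
pages = ["Jeremy Corbyn", "Flavia Pennetta", "Jeremy Corbyn",
         "Anthony Martial", "Jeremy Corbyn", "Flavia Pennetta"]

threshold = 2  # the query above uses 15

# Equivalent of: dimension=page, aggregation=count, metric=count
top_n = Counter(pages).most_common(threshold)
print(top_n)  # [('Jeremy Corbyn', 3), ('Flavia Pennetta', 2)]
```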
3.1.2 Run the query:
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8082/druid/v2?pretty
3.1.3 Result
- [ {
- "timestamp" : "2015-09-12T00:46:58.771Z",
- "result" : [ {
- "count" : 33,
- "page" : "Wikipedia:Vandalismusmeldung"
- }, {
- "count" : 28,
- "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
- }, {
- "count" : 27,
- "page" : "Jeremy Corbyn"
- }, {
- "count" : 21,
- "page" : "Wikipedia:Administrators' noticeboard/Incidents"
- }, {
- "count" : 20,
- "page" : "Flavia Pennetta"
- }, {
- "count" : 18,
- "page" : "Total Drama Presents: The Ridonculous Race"
- }, {
- "count" : 18,
- "page" : "User talk:Dudeperson176123"
- }, {
- "count" : 18,
- "page" : "Wikipédia:Le Bistro/12 septembre 2015"
- }, {
- "count" : 17,
- "page" : "Wikipedia:In the news/Candidates"
- }, {
- "count" : 17,
- "page" : "Wikipedia:Requests for page protection"
- }, {
- "count" : 16,
- "page" : "Utente:Giulio Mainardi/Sandbox"
- }, {
- "count" : 16,
- "page" : "Wikipedia:Administrator intervention against vandalism"
- }, {
- "count" : 15,
- "page" : "Anthony Martial"
- }, {
- "count" : 13,
- "page" : "Template talk:Connected contributor"
- }, {
- "count" : 12,
- "page" : "Chronologie de la Lorraine"
- } ]
- } ]
3.2 Druid SQL queries
Druid also supports a SQL dialect for querying. Let's run a SQL query that is equivalent to the native JSON query shown above.
3.2.1 The SQL request in JSON format (wikipedia-top-pages-sql.json)
- {
- "query":"SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10"
- }
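The GROUP BY / ORDER BY / LIMIT shape of this query can be exercised against any SQL engine. Here is a sketch using Python's built-in sqlite3 with a few hypothetical rows (Druid's __time filter is omitted for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wikipedia (page TEXT)")
# Hypothetical edit rows, inlined for illustration.
conn.executemany("INSERT INTO wikipedia VALUES (?)",
                 [("Jeremy Corbyn",), ("Flavia Pennetta",),
                  ("Jeremy Corbyn",), ("Jeremy Corbyn",),
                  ("Flavia Pennetta",), ("Anthony Martial",)])

rows = conn.execute(
    "SELECT page, COUNT(*) AS Edits FROM wikipedia "
    "GROUP BY page ORDER BY Edits DESC LIMIT 2").fetchall()
print(rows)  # [('Jeremy Corbyn', 3), ('Flavia Pennetta', 2)]
```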
3.2.2 Run the query
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql
3.2.3 Result
- [
- {
- "page": "Wikipedia:Vandalismusmeldung",
- "Edits": 33
- },
- {
- "page": "User:Cyde/List of candidates for speedy deletion/Subpage",
- "Edits": 28
- },
- {
- "page": "Jeremy Corbyn",
- "Edits": 27
- },
- {
- "page": "Wikipedia:Administrators' noticeboard/Incidents",
- "Edits": 21
- },
- {
- "page": "Flavia Pennetta",
- "Edits": 20
- },
- {
- "page": "Total Drama Presents: The Ridonculous Race",
- "Edits": 18
- },
- {
- "page": "User talk:Dudeperson176123",
- "Edits": 18
- },
- {
- "page": "Wikipédia:Le Bistro/12 septembre 2015",
- "Edits": 18
- },
- {
- "page": "Wikipedia:In the news/Candidates",
- "Edits": 17
- },
- {
- "page": "Wikipedia:Requests for page protection",
- "Edits": 17
- }
- ]
3.3 SQL client
For convenience, the Druid package includes a SQL command-line client, located at bin/dsql under the Druid package root. Run bin/dsql; you should see the following prompt:
- MacBook-Air-3:apache-druid-0.13.0-incubating g2$ bin/dsql
- Welcome to dsql, the command-line client for Druid SQL.
- Type "\h" for help.
- dsql>
3.3.1 Example 1:
- dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
- ┌──────────────────────────────────────────────────────────┬───────┐
- │ page │ Edits │
- ├──────────────────────────────────────────────────────────┼───────┤
- │ Wikipedia:Vandalismusmeldung │ 33 │
- │ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
- │ Jeremy Corbyn │ 27 │
- │ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
- │ Flavia Pennetta │ 20 │
- │ Total Drama Presents: The Ridonculous Race │ 18 │
- │ User talk:Dudeperson176123 │ 18 │
- │ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
- │ Wikipedia:In the news/Candidates │ 17 │
- │ Wikipedia:Requests for page protection │ 17 │
- └──────────────────────────────────────────────────────────┴───────┘
- Retrieved 10 rows in 0.35s.
3.3.2 Example 2: Timeseries
- dsql> SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);
- ┌──────────────────────────┬──────────────┐
- │ HourTime │ LinesDeleted │
- ├──────────────────────────┼──────────────┤
- │ 2015-09-12T00:00:00.000Z │ 1761 │
- │ 2015-09-12T01:00:00.000Z │ 16208 │
- │ 2015-09-12T02:00:00.000Z │ 14543 │
- │ 2015-09-12T03:00:00.000Z │ 13101 │
- │ 2015-09-12T04:00:00.000Z │ 12040 │
- │ 2015-09-12T05:00:00.000Z │ 6399 │
- │ 2015-09-12T06:00:00.000Z │ 9036 │
- │ 2015-09-12T07:00:00.000Z │ 11409 │
- │ 2015-09-12T08:00:00.000Z │ 11616 │
- │ 2015-09-12T09:00:00.000Z │ 17509 │
- │ 2015-09-12T10:00:00.000Z │ 19406 │
- │ 2015-09-12T11:00:00.000Z │ 16284 │
- │ 2015-09-12T12:00:00.000Z │ 18672 │
- │ 2015-09-12T13:00:00.000Z │ 30520 │
- │ 2015-09-12T14:00:00.000Z │ 18025 │
- │ 2015-09-12T15:00:00.000Z │ 26399 │
- │ 2015-09-12T16:00:00.000Z │ 24759 │
- │ 2015-09-12T17:00:00.000Z │ 19634 │
- │ 2015-09-12T18:00:00.000Z │ 17345 │
- │ 2015-09-12T19:00:00.000Z │ 19305 │
- │ 2015-09-12T20:00:00.000Z │ 22265 │
- │ 2015-09-12T21:00:00.000Z │ 16394 │
- │ 2015-09-12T22:00:00.000Z │ 16379 │
- │ 2015-09-12T23:00:00.000Z │ 15289 │
- └──────────────────────────┴──────────────┘
- Retrieved 24 rows in 0.25s.
- dsql>
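FLOOR(__time TO HOUR) truncates each timestamp to the start of its hour before grouping. The truncate-and-sum step can be sketched locally with a few hypothetical (timestamp, deleted) pairs:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, deleted) pairs, inlined for illustration.
events = [
    ("2015-09-12T00:10:00.000Z", 100),
    ("2015-09-12T00:50:00.000Z", 200),
    ("2015-09-12T01:05:00.000Z", 300),
]

deleted_by_hour = defaultdict(int)
for ts, deleted in events:
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    hour = parsed.replace(minute=0, second=0, microsecond=0)  # FLOOR(__time TO HOUR)
    deleted_by_hour[hour.isoformat()] += deleted

print(dict(deleted_by_hour))
# {'2015-09-12T00:00:00+00:00': 300, '2015-09-12T01:00:00+00:00': 300}
```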
3.3.3 GroupBy
- dsql> SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;
- ┌───────────────┬─────────┐
- │ channel │ EXPR$1 │
- ├───────────────┼─────────┤
- │ #en.wikipedia │ 3045299 │
- │ #it.wikipedia │ 711011 │
- │ #fr.wikipedia │ 642555 │
- │ #ru.wikipedia │ 640698 │
- │ #es.wikipedia │ 634670 │
- └───────────────┴─────────┘
- Retrieved 5 rows in 0.13s.
3.3.4 EXPLAIN PLAN FOR
- dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
- ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
- │ PLAN │
- ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
- └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
- Retrieved 1 row in 0.09s.