Global Human Dynamics Database Opened to the Public
2014.05.31

This is one of the largest databases on planet Earth: a "daily observation diary of Earth's humans" containing records of more than 250 million global events from 1979 to the present, updated daily, each with 58 fields spanning 300 categories. Drawing on thousands of news sources around the world, it documents the day-to-day activity of human society, and this vast database is used for event analysis and prediction and for extracting patterns of behavior. It is backed by Google's cloud services and can be analyzed with Google BigQuery. The database is open to everyone on the planet, except that people in China cannot reach it, just like the other services Google provides, and of course the reason does not lie with Google. This effort to cut Chinese people off from knowledge of the world's leading technology does no harm to people in other countries; it only keeps China drifting further from, and falling behind, the global frontier.
More than 250 million global events are now in the cloud for anyone to analyze
Posted on May 29, 2014 | by Derrick Harris
SUMMARY: The Global Database of Events, Language, and Tone is a growing trove of information about meaningful events that have happened across the world in the past three decades. Now, it's available to the public to access and analyze using Google's cloud computing services.
Georgetown University researcher Kalev Leetaru has spent years building the Global Database of Events, Language, and Tone. It now contains data on more than 250 million events dating back to 1979 and updated daily, with 58 different fields apiece, across 300 categories. Leetaru uses it to produce a daily report analyzing global stability. He and others have used it to figure out whether the kidnapping of 200 Nigerian girls was a predictable event and to watch Crimea turn into a hotspot of activity leading up to ex-Ukrainian president Viktor Yanukovych's ouster and Russia's subsequent invasion.
“The idea of GDELT is how do we create a catalog, essentially, of everything that’s going on across the planet, each day,” Leetaru explained in a recent interview.
And now all of it is available in the cloud, for free, for anybody to analyze as they desire. Leetaru has partnered with Google, where he has been hosting GDELT for the past year, to make it available (here) as a public dataset that users can analyze directly with Google BigQuery. Previously, anyone interested in the data had to download the 100-gigabyte dataset and analyze it on their own machines. They still can, of course, and Leetaru recently built a catalog of recipes for various analyses and a BigQuery-based method for slicing off specific parts of the data.
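To give a concrete sense of what "analyze directly with Google BigQuery" means, here is a minimal sketch using the google-cloud-bigquery Python client. It is not code from the article; the table path gdelt-bq.full.events and the column names (Year, EventRootCode, NumArticles) are assumptions based on the GDELT 1.0 event schema and may differ from the public dataset's current layout.

```python
# Minimal sketch: count GDELT events per year and per root event type via BigQuery.
# Assumes default Google Cloud credentials and a project with BigQuery enabled.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
    SELECT Year, EventRootCode,
           COUNT(*) AS events,
           SUM(NumArticles) AS articles
    FROM `gdelt-bq.full.events`      -- assumed public table path
    WHERE Year >= 2010
    GROUP BY Year, EventRootCode
    ORDER BY Year, events DESC
"""

for row in client.query(QUERY).result():
    print(row.Year, row.EventRootCode, row.events, row.articles)
```

The point is less the specific query than the workflow: the full dataset stays in Google's cloud, and a scan that would once have required downloading and indexing 100 gigabytes locally becomes a single SQL statement run from a browser or a few lines of client code.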
A view from the Time Mapper tool in the GDELT Analysis Service.
But there’s big promise in removing barriers by letting data scientists, policy analysts and researchers dig into it right from a browser window. BigQuery is actually remarkably powerful in terms of the types of analysis it enables, Leetaru explained, and it’s fast. Tasks that used to take him hours now take him seconds. He (as I have before) calls that kind of computing power and capability, paired with such valuable data and data scientists who can make sense of it, “a perfect marriage.”
“You’ve got all this pent-up [analytic] expertise out there,” he said. “… Go run these big queries. Tell us what’s possible.”
(Leetaru, who used to work with supercomputers (he helped create the supercomputer-powered Twitter Heartbeat project), also lauded the automation and performance of Google Compute Engine — something I'll discuss with Google SVP Urs Hölzle at our Structure conference next month.)
Leetaru has big ideas for what he thinks is possible with GDELT. He wants us to be able to understand, in real-time, what’s going on in the human world just like the USGS can tell us about earthquakes — what happened, where and what to expect next. Right now, for example, Leetaru thinks he’ll be able to analyze 90 days of activity around the recent coup in Thailand and then find similar patterns around the world over the last 35 years. That could help shed light on what will happen next, to prove (or not) that history really does repeat itself.
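A hypothetical version of the Thailand analysis he describes might start like this: pull a daily time series of event counts and average tone for Thailand over the roughly 90 days leading up to the May 2014 coup, which could then be compared against similar windows elsewhere in the 35-year archive. The FIPS country code "TH" and the columns SQLDATE, ActionGeo_CountryCode and AvgTone are assumptions drawn from the GDELT 1.0 schema, not details given in the article.

```python
# Hypothetical sketch of the 90-day window described above: daily event counts
# and average tone for events located in Thailand before the May 22, 2014 coup.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
    SELECT SQLDATE,
           COUNT(*) AS daily_events,
           AVG(AvgTone) AS avg_tone
    FROM `gdelt-bq.full.events`                    -- assumed public table path
    WHERE ActionGeo_CountryCode = 'TH'             -- FIPS code for Thailand
      AND SQLDATE BETWEEN 20140221 AND 20140522    -- ~90 days up to the coup
    GROUP BY SQLDATE
    ORDER BY SQLDATE
"""

for row in client.query(QUERY).result():
    print(row.SQLDATE, row.daily_events, round(row.avg_tone, 2))
```

Comparable windows for other countries and years could be extracted the same way and lined up against this one, which is the pattern-matching step Leetaru has in mind.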
Another GDELT visualization, this one showing intensity of events by country, over time.
He’s also excited about the scale he has at his fingertips, in terms of both data sources and computing. Right now, GDELT is populated from numerous news sources around the world, their content automatically processed by text-analysis and geocoding algorithms Leetaru has built and then added to the database. With advances in natural-language processing and translation, however, he’s confident he’ll soon be able to grab even more content from non-English sources (75 percent of their content, he said, isn’t available in English anywhere on the planet).
“We get too trapped in this western narrative,” he explained, citing pushback from foreign policy experts (Leetaru is a frequent contributor to Foreign Policy magazine) about the threat to Crimea when he analyzed the situation in Ukraine. The more data we have from other parts of the world, written by people on the ground, the easier time we should have predicting what will happen there.
Leetaru holding a disk that now, presumably, is in the cloud. Source: Kalev Leetaru
Which is why Leetaru is also trying to get a grip on how social media operates around the world, so he can incorporate those feeds into GDELT. He’s working with the U.S. Army to translate the world’s academic literature, and he’s generally looking for ways to digitize as much content as possible going back as far in history as possible. “How do we bring all of this into one fold and, essentially, codify the world?” he asked.
This grand sort of goal wasn’t really feasible when Leetaru was doing everything from his desktop. “Now, on the cloud,” he added, “it can pretty much expand at its leisure.”
Although it’s unique because of its connection to BigQuery for analysis, GDELT isn’t the only large dataset available in the cloud. Google itself hosts many others, as does Amazon Web Services, including the 1000 Genomes Project, U.S. Census data, NASA NEX and Freebase datasets.