Paper recommendation for November 24 (with download link)

Title:

Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification

Authors:

Chuxu Zhang, Chao Huang, Lu Yu, Xiangliang Zhang, Nitesh V. Chawla

Why we recommend it:

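This paper addresses how to use historical data to find the likely authors of an anonymous paper. It rests on two main ideas. The first is metric learning. The authors apply word embeddings to each paper's abstract and encode the sequence with a GRU into a d-dimensional embedding. They then train the model by pulling each paper closer to its true authors and pushing it away from false authors. Given an abstract, the trained model can report which authors in the historical data are closest to the paper. The second idea is meta-path walks, which model the heterogeneous network formed by authors, institutions, papers, and publication venues. Walks follow a particular strategy on this network, and the type of each visited node is dictated by a meta-path. The resulting walks serve as supervision for a Skipgram model that augments the earlier training. This lets the model use the direct "paper-author" supervision as well as many kinds of indirect information, such as citations of the form "author-paper-paper-author".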
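The metric-learning half of the summary above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not Camel's implementation. The following choices are all hypothetical, picked for clarity:

- the `TinyGRU` class, which uses standard GRU gate equations without biases;
- mean pooling of the hidden states into the paper embedding;
- squared Euclidean distance;
- a margin-based hinge form for the push loss.

The paper's exact loss and pooling may differ, and nothing here is trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


class TinyGRU:
    """Hypothetical minimal GRU encoder (standard gate equations, no biases)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.Wz, self.Uz = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.Wr, self.Ur = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.Wh, self.Uh = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)

    def step(self, x, h):
        z = [sigmoid(v) for v in add(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(v) for v in add(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_cand = [math.tanh(v) for v in add(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]

    def encode(self, word_vectors):
        # Read the abstract's (non-empty) list of word vectors in order, then
        # mean-pool the hidden states into one d-dimensional paper embedding.
        # Mean pooling is an assumption, not taken from the paper.
        h = [0.0] * self.hidden_dim
        states = []
        for x in word_vectors:
            h = self.step(x, h)
            states.append(h)
        return [sum(col) / len(states) for col in zip(*states)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def push_loss(paper, true_author, fake_author, margin=1.0):
    # Hinge loss: zero once the true author is at least `margin` closer
    # (in squared distance) to the paper than the fake author is.
    return max(0.0, margin + sq_dist(paper, true_author) - sq_dist(paper, fake_author))


def rank_authors(paper, author_vecs):
    # Candidate author names sorted from closest to farthest from the paper embedding.
    return sorted(author_vecs, key=lambda name: sq_dist(paper, author_vecs[name]))
```

For example, `rank_authors([0, 0], {"a": [2, 0], "b": [1, 0], "c": [0, 3]})` returns `["b", "a", "c"]`. In real training, gradients of `push_loss` would update the GRU weights and the author vectors so that each paper moves toward its true authors. That optimization step is omitted here.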

Abstract

In this paper, we study the problem of author identification in big scholarly data, which is to effectively rank potential authors for each anonymous paper by using historical data. Most of the existing deanonymization approaches predict the relevance score of a paper-author pair via feature engineering, which is not only time- and storage-consuming but also introduces irrelevant and redundant features or misses important attributes. Representation learning can automate the feature generation process by learning node embeddings in the academic network to infer the correlation of each paper-author pair.

However, the learned embeddings are often general-purpose (independent of the specific task) or based on network structure only (without considering the node content). To address these issues and make further progress in solving the author identification problem, we propose Camel, a content-aware and meta-path augmented metric learning model. Specifically, first, the directly correlated paper-author pairs are modeled based on distance metric learning by introducing a push loss function. Next, the paper content embedding encoded by the gated recurrent neural network is integrated into the distance loss. Moreover, the historical bibliographic data of papers is utilized to construct an academic heterogeneous network, wherein a meta-path guided walk integrative learning module based on the task-dependent and content-aware Skipgram model is designed to formulate the correlations between each paper and its indirect author neighbors, and further augments the model. Extensive experiments demonstrate that Camel outperforms the state-of-the-art baselines. It achieves an average improvement of 6.3% over the best baseline method.
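The meta-path guided walk can be illustrated on a toy graph. This is a sketch under stated assumptions, not the paper's code. The following are all hypothetical:

- the `HetGraph` class;
- the type labels `"A"` (author) and `"P"` (paper);
- treating citation links as undirected edges;
- the fixed-window pairing rule in `paper_author_pairs`.

The sketch follows the "author-paper-paper-author" citation meta-path mentioned in the summary and only produces (paper, author) context pairs. The task-dependent, content-aware Skipgram objective that Camel trains on such pairs is not implemented.

```python
import random
from collections import defaultdict


class HetGraph:
    """Hypothetical typed, undirected graph (e.g. authors "A", papers "P")."""

    def __init__(self):
        self.node_type = {}
        self.adj = defaultdict(list)

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v):
        # Authorship and citation links are both stored as undirected edges here.
        self.adj[u].append(v)
        self.adj[v].append(u)


def metapath_walk(graph, start, metapath, walk_length, rng):
    """Random walk whose node types repeat a symmetric meta-path like A-P-P-A."""
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same node type")
    if graph.node_type[start] != metapath[0]:
        raise ValueError("start node must match the first meta-path type")
    period = len(metapath) - 1
    walk = [start]
    while len(walk) < walk_length:
        wanted = metapath[len(walk) % period]  # type required at this position
        candidates = [v for v in graph.adj[walk[-1]] if graph.node_type[v] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


def paper_author_pairs(graph, walk, window):
    """(paper, author) pairs that co-occur within `window` steps of a walk."""
    pairs = []
    for i, u in enumerate(walk):
        if graph.node_type[u] != "P":
            continue
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i and graph.node_type[walk[j]] == "A":
                pairs.append((u, walk[j]))
    return pairs
```

Consider a toy graph in which `a1` wrote `p1`, `a2` wrote `p2`, and `p1` cites `p2`. A walk of length 7 from `a1` visits `a1, p1, p2, a2, p2, p1, a1`. With `window=2`, `p1` is paired with `a2`, an indirect author neighbour reached only through the citation.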

 

Paper download link:

https://www3.nd.edu/~dial/publications/zhang2018camel.pdf

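Similar articles
A first look at CVPR 2019 papers: 20 papers on object detection, cross-modal learning, video processing, semantic segmentation, pose estimation, and more
Today's Papers | Few-shot image classification; adversarial AutoAugment; speech emotion recognition; multimodal machine translation; and more
Kaiming He officially joins MIT! His 460,000 citations top MIT, and ResNet, which he pioneered, has passed 170,000 citations
Model essays for junior high school English writing (Part 4)
Step-by-step tutorial: learning meta-analysis from scratch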