Overlapping community detection algorithm based on LDA in large-scale networks
  
Keywords: social networks; LDA; community detection; overlapping communities
Funding: Supported by the National Natural Science Foundation of China (61272422, 61672297)
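Authors and affiliations
ZHANG Wei, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
QI Dehao, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
CHEN Yunfang, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China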
Abstract:
      Traditional overlapping community detection relies on network structure, specifically on the links between nodes. Because it ignores node content, it has difficulty capturing the semantics of network communities. This paper proposes an attribute-based overlapping community detection algorithm for large-scale networks, OCD_LDA (Overlapping Community Detection algorithm based on LDA). OCD_LDA uses the LDA topic model to model the multidimensional attributes of node content: each network node is treated as a document, and the attribute values it carries are treated as the words of that document. Communities in the network therefore correspond to topics in the topic model, and a node's membership in multiple communities corresponds to a document's mixture of multiple topics. Node content is usually short, which makes the data sparse during topic modeling. To address this, the algorithm introduces a Spike and Slab prior into the LDA model to support variable selection and parameter estimation, handling both the sparsity and the smoothness of the community distribution over each node. Experiments on the DBLP bibliographic dataset show that OCD_LDA detects the distribution of overlapping communities in large-scale networks more effectively and reveals intrinsic properties of complex data.
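The abstract does not specify the inference procedure or the exact form of the Spike and Slab prior. Purely as an illustration of the node-as-document idea, the Python sketch below assumes a collapsed Gibbs sampler for LDA plus a per-node Bernoulli selector `b[d][k]` over topics: a selector of 0 (the "spike") forces node `d`'s weight on topic `k` to zero, while a selector of 1 (the "slab") lets a symmetric Dirichlet smooth the weight. This is not the authors' OCD_LDA implementation; the function names `sparse_lda` and `overlapping_communities`, the hyperparameters `alpha`, `beta`, `pi`, and the membership threshold are all hypothetical.

```python
import math
import random


def _prob_first(a, b):
    """Return exp(a) / (exp(a) + exp(b)) without overflow."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def sparse_lda(docs, num_topics, alpha=0.5, beta=0.1, pi=0.3,
               iters=200, seed=0):
    """Hypothetical sketch: collapsed Gibbs LDA with spike-and-slab selectors.

    docs: one token list per node (node = document, attribute value = word).
    b[d][k] = 1 ("slab") lets node d use topic k; b[d][k] = 0 ("spike")
    forces theta[d][k] = 0, keeping per-node community vectors sparse.
    Returns (theta, b), where theta[d][k] is node d's weight on community k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V, K, D = len(vocab), num_topics, len(docs)
    n_dk = [[0] * K for _ in range(D)]  # node-topic counts
    n_kw = [[0] * V for _ in range(K)]  # topic-word counts
    n_k = [0] * K                       # total tokens per topic
    b = [[1] * K for _ in range(D)]     # start with every topic switched on
    z = []
    for d, doc in enumerate(docs):      # random initial topic assignments
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][wid[w]] += 1
            n_k[k] += 1

    def log_marginal(d, sel):
        # log p(z_d | b_d): symmetric Dirichlet over the selected topics only
        s = sum(sel)
        if s == 0:
            return -math.inf  # a node must keep at least one topic
        out = math.lgamma(s * alpha) - math.lgamma(len(docs[d]) + s * alpha)
        for k in range(K):
            if sel[k]:
                out += math.lgamma(n_dk[d][k] + alpha) - math.lgamma(alpha)
        return out

    for _ in range(iters):
        for d, doc in enumerate(docs):
            # 1) resample each token's topic, restricted to selected topics
            for i, w in enumerate(doc):
                k, v = z[d][i], wid[w]
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [b[d][t] * (n_dk[d][t] + alpha)
                           * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
            # 2) resample selectors; a topic the node currently uses stays on
            for k in range(K):
                if n_dk[d][k] > 0:
                    b[d][k] = 1
                    continue
                on, off = b[d][:], b[d][:]
                on[k], off[k] = 1, 0
                lp_on = math.log(pi) + log_marginal(d, on)
                lp_off = math.log(1 - pi) + log_marginal(d, off)
                b[d][k] = 1 if rng.random() < _prob_first(lp_on, lp_off) else 0

    # posterior-mean community weights, zero wherever the selector is off
    theta = []
    for d, doc in enumerate(docs):
        s = sum(b[d])
        theta.append([b[d][k] * (n_dk[d][k] + alpha) / (len(doc) + s * alpha)
                      for k in range(K)])
    return theta, b


def overlapping_communities(theta, threshold=0.2):
    """Hypothetical read-out: node d joins every community k with weight >= threshold."""
    return [[k for k, p in enumerate(row) if p >= threshold] for row in theta]
```

Given one token list of attribute values per node (for example, hypothetical keyword lists for DBLP authors), `sparse_lda(nodes, num_topics=K)` returns each node's community weights, and `overlapping_communities(theta)` lists every community whose weight passes the threshold, so a node can belong to several communities at once. The two mechanisms are kept separate on purpose: switching a selector off removes a community from a node outright (sparsity), while `alpha` still smooths the weights of the communities that remain switched on (smoothness).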
Copyright: Editorial Office of the Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)