zh:notes:topic_extraction_mmsb

Differences

This page shows the differences between the selected revision and the current version.

zh:notes:topic_extraction_mmsb [2018/05/12 12:06]
pzczxs [Requirements]
zh:notes:topic_extraction_mmsb [2022/06/30 11:32] (current version)
pzczxs discussion status changed
Line 2: Line 2:
 data source: [[http://​www.topic-challenge.info|Topic Extraction Challenge]] data source: [[http://​www.topic-challenge.info|Topic Extraction Challenge]]
  
-astro: data directory+===== Citation Information ===== 
 +Shuo Xu, Junwan Liu, Dongsheng Zhai, Xin An, Zheng Wang, and Hongshen Pang, 2018. [[https://​doi.org/​10.1007/​s11192-018-2841-4|Overlapping Thematic Structures Extraction with Mixed-Membership Stochastic Blockmodel]]. //​Scientometrics//,​ Vol. 117, No. 1, pp. 61-84. {{:​resources:​datasets:​xlza_2018.zip|Results}} 
 + 
 +===== Notes ===== 
 + 
 +astro: data directory ​
  
 tools: tool directory tools: tool directory
Line 9: Line 14:
  
 ===== Requirements ===== ===== Requirements =====
-  *[[SVINET]] +  *[[:​zh:​notes:​install_svinet|SVINET]] 
-  *NetworkX +  *[[http://​networkx.github.io/​|NetworkX]] 
-  *Scikit-Learn +  *[[:​zh:​notes:​running_jate2|JATE 2.0]] 
-  *Gephi +  *[[http://​scikit-learn.org/​|scikit-learn]] 
-  *VOSViewer+  *[[https://​gephi.org/​|Gephi]] 
 +  *[[http://​www.vosviewer.com/​|VOSViewer]] 
 +  *Matlab 
 +  *SPSS
 =====Generate citation-link network as edge lists===== =====Generate citation-link network as edge lists=====
 <code bash> <code bash>
Line 90: Line 98:
 > index = GiniIndex('​../​astro/​direct.edgelist.giant.degree',​ '​../​astro/​coupling.edgelist.giant.degree',​ '​../​astro/​cocitation.edgelist.giant.degree'​) > index = GiniIndex('​../​astro/​direct.edgelist.giant.degree',​ '​../​astro/​coupling.edgelist.giant.degree',​ '​../​astro/​cocitation.edgelist.giant.degree'​)
 </​code>​ </​code>​
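For readers without the `GiniIndex` script at hand, the inequality measure it is named after can be sketched in Python. This is a minimal illustration of the standard Gini coefficient over a list of node degrees, not the author's implementation; the function name `gini` is hypothetical, and parsing of the `*.degree` files is left out because their layout is not shown on this page.

```python
def gini(values):
    """Gini coefficient of non-negative numbers (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending sample (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

A value near 0 means degrees are spread evenly across nodes; a value near 1 means a few nodes hold most of the links.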
-  *Statistics about uncovered communities:​ min, max, avg, and median+  *Statistics about uncovered communities:​ min, max, avg, median, and standard variance
 <code bash> <code bash>
 > python statistics.py ../​astro/​n101831-k2396-mmsb-linksampling/​communities.txt > python statistics.py ../​astro/​n101831-k2396-mmsb-linksampling/​communities.txt
Line 100: Line 108:
 > python statistics.py ../​astro/​n82895-k113-mmsb-linksampling/​communities.txt > python statistics.py ../​astro/​n82895-k113-mmsb-linksampling/​communities.txt
 </​code>​ </​code>​
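The kind of summary listed for this step can be sketched as follows. This is an illustration only, not the contents of `statistics.py`: it assumes `communities.txt` holds one community per line with node IDs separated by whitespace, and it reports the spread as a population standard deviation, which is an assumption about what "standard variance" refers to. The helper names are hypothetical.

```python
import statistics


def community_sizes(lines):
    """Community sizes, assuming one community per non-empty line."""
    return [len(line.split()) for line in lines if line.strip()]


def summarize(sizes):
    """Min, max, mean, median and population standard deviation of sizes."""
    return {
        "min": min(sizes),
        "max": max(sizes),
        "avg": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "std": statistics.pstdev(sizes),  # assumed spread measure
    }
```

With a file on disk, something like `summarize(community_sizes(open(path)))` would produce the summary for one run.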
 +  *Distribution of the cluster size
 +<code matlab>
 +> load cluster_size
 +> cluster_size_distribution(direct_2396(:,​ 2), 100, 100)
 +> cluster_size_distribution(coupling_992(:,​ 2), 100, 100)
 +> cluster_size_distribution(cocitation_634(:,​ 2), 100, 100)
 +> cluster_size_distribution(direct_113(:,​ 2), 50, 200)
 +> cluster_size_distribution(coupling_113(:,​ 2), 50, 200)
 +> cluster_size_distribution(cocitation_113(:,​ 2), 50, 200)
 +</​code>​
 +
 +Q-Q (Quantile-Quantile) Plot (SPSS): Analyze-->Descriptive Statistics-->Explore-->Plots, with //Normality plots with tests// selected.
   *Distribution of the number of memberships of nodes   *Distribution of the number of memberships of nodes
 <code bash> <code bash>
Line 111: Line 131:
 </​code>​ </​code>​
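Because the mixed-membership model lets a node belong to several communities, the membership distribution counts how many nodes have exactly 1, 2, 3, ... memberships. A minimal Python sketch of that count, assuming each community is given as a list of node IDs (the function name `membership_counts` is hypothetical and is not one of the tools used on this page):

```python
from collections import Counter


def membership_counts(communities):
    """Map 'number of memberships' -> 'number of nodes with that many'."""
    # Count, for each node, how many communities contain it.
    per_node = Counter(
        node for community in communities for node in set(community)
    )
    # Histogram over those per-node counts.
    return Counter(per_node.values())
```

Nodes that appear in no community are not counted, since they never show up in the input.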
   *Overlaps between thematic structures at a high level   *Overlaps between thematic structures at a high level
 +<code bash>
 +> java -jar AstroConverter.jar Overlap -threshold 150 -input ../​astro/​n101831-k113-mmsb-linksampling/​communities.txt -output ../​astro/​direct
 +> java -jar AstroConverter.jar Overlap -threshold 200 -input ../​astro/​n101053-k113-mmsb-linksampling/​communities.txt -output ../​astro/​coupling
 +> java -jar AstroConverter.jar Overlap -threshold 200 -input ../​astro/​n82895-k113-mmsb-linksampling/​communities.txt -output ../​astro/​cocitation
 +</​code>​
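One plausible reading of a high-level overlap computation is sketched below: count the nodes shared by every pair of communities and keep the pairs that share at least a given number. This is an assumption for illustration only; the page does not document what `AstroConverter.jar Overlap` or its `-threshold` option actually does, and the function name `community_overlaps` is hypothetical.

```python
from itertools import combinations


def community_overlaps(communities, threshold):
    """Shared-node counts for community pairs sharing >= threshold nodes."""
    sets = [set(c) for c in communities]
    overlaps = {}
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        shared = len(a & b)
        if shared >= threshold:  # hypothetical meaning of the threshold
            overlaps[(i, j)] = shared
    return overlaps
```

With k = 113 communities this is only a few thousand pairs, so the quadratic loop is cheap.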
  
-Screenshot -> Save -> EPS (Encapsulated PostScript) files (*.eps) ​-> PDF -> +Screenshot -> Save -> EPS (Encapsulated PostScript) files (*.eps)
   *Extract subgraph for visualization   *Extract subgraph for visualization
 Load the file "subgraph.gml" in Gephi to visualize it. Use each node's "group" member and then each edge's "color" member for "partitioning". Size each node by its bridgeness. (First, copy bridgeness to a new column with column type BigDecimal.) Load the file "subgraph.gml" in Gephi to visualize it. Use each node's "group" member and then each edge's "color" member for "partitioning". Size each node by its bridgeness. (First, copy bridgeness to a new column with column type BigDecimal.)
Line 149: Line 174:
 </​code>​ </​code>​
  
 +<!--
   -log-likelihood and Bayesian Information Criterion (BIC): likelihood.py astro.adjlist.giant n101831-k50-mmsb-linksampling   -log-likelihood and Bayesian Information Criterion (BIC): likelihood.py astro.adjlist.giant n101831-k50-mmsb-linksampling
   -comparing results in terms of adjusted mutual information and normalized mutual information   -comparing results in terms of adjusted mutual information and normalized mutual information
     -SolutionComparisonConverter.java     -SolutionComparisonConverter.java
     -mutual_info.py cluster_pair_file     -mutual_info.py cluster_pair_file
 +-->
 +
 +~~DISCUSSION:​closed~~
  
zh/notes/topic_extraction_mmsb.1526097983.txt.gz · Last modified: 2018/05/12 12:06 by pzczxs