This page shows the differences between the selected revision and the current version.
zh:notes:topic_extraction_mmsb [2018/05/12 12:15] pzczxs [Utilities] |
zh:notes:topic_extraction_mmsb [2022/06/30 11:32] (current version) pzczxs discussion status changed |
||
---|---|---|---|
Line 2: | Line 2: | ||
data source: [[http://www.topic-challenge.info|Topic Extraction Challenge]] | data source: [[http://www.topic-challenge.info|Topic Extraction Challenge]] | ||
- | astro: data directory | + | ===== Citation Information ===== |
+ | Shuo Xu, Junwan Liu, Dongsheng Zhai, Xin An, Zheng Wang, and Hongshen Pang, 2018. [[https://doi.org/10.1007/s11192-018-2841-4|Overlapping Thematic Structures Extraction with Mixed-Membership Stochastic Blockmodel]]. //Scientometrics//, Vol. 117, No. 1, pp. 61-84. {{:resources:datasets:xlza_2018.zip|Results}} | ||
+ | |||
+ | ===== Notes ===== | ||
+ | |||
+ | astro: data directory | ||
tools: tool directory | tools: tool directory | ||
Line 8: | Line 13: | ||
doc: documents | doc: documents | ||
- | ===== Tools ===== | + | ===== Requirements ===== |
*[[:zh:notes:install_svinet|SVINET]] | *[[:zh:notes:install_svinet|SVINET]] | ||
*[[http://networkx.github.io/|NetworkX]] | *[[http://networkx.github.io/|NetworkX]] | ||
*[[:zh:notes:running_jate2|JATE 2.0]] | *[[:zh:notes:running_jate2|JATE 2.0]] | ||
- | *[[http://scikit-learn.org/|scikit-Learn]] | + | *[[http://scikit-learn.org/|scikit-learn]] |
*[[https://gephi.org/|Gephi]] | *[[https://gephi.org/|Gephi]] | ||
*[[http://www.vosviewer.com/|VOSViewer]] | *[[http://www.vosviewer.com/|VOSViewer]] | ||
+ | *Matlab | ||
+ | *SPSS | ||
=====Generate citation-link network as edge lists===== | =====Generate citation-link network as edge lists===== | ||
<code bash> | <code bash> | ||
Line 91: | Line 98: | ||
> index = GiniIndex('../astro/direct.edgelist.giant.degree', '../astro/coupling.edgelist.giant.degree', '../astro/cocitation.edgelist.giant.degree') | > index = GiniIndex('../astro/direct.edgelist.giant.degree', '../astro/coupling.edgelist.giant.degree', '../astro/cocitation.edgelist.giant.degree') | ||
</code> | </code> | ||
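The `GiniIndex` call above summarises how unevenly degrees are spread in each network. Its Matlab source is not shown here, so the following is only a minimal sketch of the standard Gini coefficient on a list of degrees; the function name `gini` and the sample values are assumptions made for illustration, not the author's implementation.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means all values are equal; values close to 1 mean that a few
    entries hold almost all of the total.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n, ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical degree sequences, not taken from the astro data set.
equal_degrees = [3, 3, 3, 3]
skewed_degrees = [0, 0, 0, 12]
print(gini(equal_degrees), gini(skewed_degrees))
```

A more even degree distribution gives a value nearer 0, so the three networks can be compared on a single scale.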
- | *Statistics about uncovered communities: min, max, avg, and median | + | *Statistics about uncovered communities: min, max, avg, median, and standard variance |
<code bash> | <code bash> | ||
> python statistics.py ../astro/n101831-k2396-mmsb-linksampling/communities.txt | > python statistics.py ../astro/n101831-k2396-mmsb-linksampling/communities.txt | ||
Line 101: | Line 108: | ||
> python statistics.py ../astro/n82895-k113-mmsb-linksampling/communities.txt | > python statistics.py ../astro/n82895-k113-mmsb-linksampling/communities.txt | ||
</code> | </code> | ||
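The summary values listed above can be sketched as follows. This is not the `statistics.py` script itself; it assumes that `communities.txt` holds one community per line with node ids separated by whitespace, and it reads "standard variance" as the population standard deviation. Both points are assumptions.

```python
import statistics


def community_sizes(lines):
    """Return the size of each community.

    Assumed format: one community per line, node ids separated by whitespace.
    """
    return [len(line.split()) for line in lines if line.strip()]


def summarize(sizes):
    """Min, max, average, median and standard deviation of community sizes."""
    return {
        "min": min(sizes),
        "max": max(sizes),
        "avg": statistics.mean(sizes),
        "median": statistics.median(sizes),
        # Population standard deviation; use statistics.stdev for the sample version.
        "std": statistics.pstdev(sizes),
    }


# Hypothetical communities standing in for the contents of communities.txt.
sample = ["1 2 3", "2 3 4 5 6", "7 8"]
summary = summarize(community_sizes(sample))
print(summary)
```

To run it on a real file, pass an open file object to `community_sizes`, since iterating over a file yields its lines.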
+ | *Distribution of the cluster size | ||
+ | <code matlab> | ||
+ | > load cluster_size | ||
+ | > cluster_size_distribution(direct_2396(:, 2), 100, 100) | ||
+ | > cluster_size_distribution(coupling_992(:, 2), 100, 100) | ||
+ | > cluster_size_distribution(cocitation_634(:, 2), 100, 100) | ||
+ | > cluster_size_distribution(direct_113(:, 2), 50, 200) | ||
+ | > cluster_size_distribution(coupling_113(:, 2), 50, 200) | ||
+ | > cluster_size_distribution(cocitation_113(:, 2), 50, 200) | ||
+ | </code> | ||
+ | |||
+ | Q-Q (Quantile-Quantile) plot in SPSS: Analyze-->Descriptive Statistics-->Explore-->Plots, with //Normality plots with tests// selected. | ||
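For readers without SPSS, the points of a normal Q-Q plot can be computed directly. The sketch below uses only the Python standard library and the plotting positions (i - 0.5) / n, which is one common convention; SPSS may use a different one, so the coordinates need not match its output exactly. The function name `qq_points` and the sample sizes are assumptions for illustration.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot.

    The reference distribution is a normal fitted to the sample mean and
    sample standard deviation, so at least two observations are needed.
    """
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting position (i - 0.5) / n for the i-th smallest observation.
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


# Hypothetical cluster sizes, not taken from the astro data set.
points = qq_points([1, 2, 3, 4, 5])
for theoretical, observed in points:
    print(round(theoretical, 3), observed)
```

If the cluster sizes were normally distributed, the pairs would lie close to the line y = x; strong curvature suggests a skewed or heavy-tailed distribution.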
*Distribution of the number of memberships of nodes | *Distribution of the number of memberships of nodes | ||
<code bash> | <code bash> | ||
Line 112: | Line 131: | ||
</code> | </code> | ||
*Overlaps between thematic structures at a high level | *Overlaps between thematic structures at a high level | ||
+ | <code bash> | ||
+ | > java -jar AstroConverter.jar Overlap -threshold 150 -input ../astro/n101831-k113-mmsb-linksampling/communities.txt -output ../astro/direct | ||
+ | > java -jar AstroConverter.jar Overlap -threshold 200 -input ../astro/n101053-k113-mmsb-linksampling/communities.txt -output ../astro/coupling | ||
+ | > java -jar AstroConverter.jar Overlap -threshold 200 -input ../astro/n82895-k113-mmsb-linksampling/communities.txt -output ../astro/cocitation | ||
+ | </code> | ||
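What `AstroConverter.jar Overlap` computes internally is not shown on this page. One plausible reading, assumed here, is that `-threshold` is the minimum number of nodes two communities must share to count as overlapping. Under that assumption, and assuming each community is given as a list of node ids, the idea can be sketched as:

```python
from itertools import combinations


def overlaps(communities, threshold):
    """Return (i, j, shared) for community pairs sharing at least `threshold` nodes.

    `communities` is a list of iterables of node ids; i and j are list indices.
    """
    sets = [set(c) for c in communities]
    result = []
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        shared = len(a & b)
        if shared >= threshold:
            result.append((i, j, shared))
    return result


# Hypothetical communities; real ones would be read from communities.txt.
example = [[1, 2, 3], [2, 3, 4], [5]]
print(overlaps(example, 2))
```

The pairwise loop is quadratic in the number of communities, which is unproblematic for the 113 high-level communities used above.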
- | Screenshot -> Save -> EPS (Encapsulated PostScript) files (*.eps) -> PDF -> | + | Screenshot -> Save -> EPS (Encapsulated PostScript) files (*.eps) |
*Extract subgraph for visualization | *Extract subgraph for visualization | ||
Load the file "subgraph.gml" in Gephi to visualize it. Use each node's "group" member and then each edge's "color" member for "partitioning". Use the bridgeness to size each node. (First, copy bridgeness to a new column with column type BigDecimal.) | Load the file "subgraph.gml" in Gephi to visualize it. Use each node's "group" member and then each edge's "color" member for "partitioning". Use the bridgeness to size each node. (First, copy bridgeness to a new column with column type BigDecimal.) | ||
Line 150: | Line 174: | ||
</code> | </code> | ||
+ | <!-- | ||
-log-likelihood and Bayesian Information Criterion (BIC): likelihood.py astro.adjlist.giant n101831-k50-mmsb-linksampling | -log-likelihood and Bayesian Information Criterion (BIC): likelihood.py astro.adjlist.giant n101831-k50-mmsb-linksampling | ||
-comparing results in terms of adjusted mutual information and normalized mutual information | -comparing results in terms of adjusted mutual information and normalized mutual information | ||
-SolutionComparisonConverter.java | -SolutionComparisonConverter.java | ||
-mutual_info.py cluster_pair_file | -mutual_info.py cluster_pair_file | ||
+ | --> | ||
- | ~~DISCUSSION~~ | + | ~~DISCUSSION:closed~~ |