data source: Topic Extraction Challenge
Shuo Xu, Junwan Liu, Dongsheng Zhai, Xin An, Zheng Wang, and Hongshen Pang, 2018. Overlapping Thematic Structures Extraction with Mixed-Membership Stochastic Blockmodel. Scientometrics, Vol. 117, No. 1, pp. 61-84.
Results
astro: data directory
tools: tool directory
doc: documents
> java -jar AstroConverter.jar DirectCitation -input ../astro/direct_citations.txt -output ../astro/direct
> java -jar AstroConverter.jar BibliographicCoupling -threshold 4 -weight true -input ../astro/citation_links.txt -output ../astro/coupling
> java -jar AstroConverter.jar Cocitation -threshold 1 -weight true -input ../astro/citation_links.txt -output ../astro/cocitation
> python extract_giant.py -weight false ../astro/direct.edgelist
> python extract_giant.py -weight true ../astro/coupling.edgelist
> python extract_giant.py -weight true ../astro/cocitation.edgelist
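The script extract_giant.py itself is not shown here. As a rough illustration of what extracting a giant component involves, the following minimal sketch finds the largest connected component of an undirected edge list and keeps only the edges inside it. The function names giant_component and filter_edges are hypothetical, edge weights are ignored, and this is not necessarily how extract_giant.py works.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Return the node set of the largest connected component.

    `edges` is an iterable of (u, v) pairs, treated as undirected.
    Hypothetical helper, not taken from extract_giant.py.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        comp = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def filter_edges(edges, keep):
    """Keep only the edges whose two endpoints are both in `keep`."""
    return [(u, v) for u, v in edges if u in keep and v in keep]
```

For example, with edges [(1, 2), (2, 3), (4, 5)], the giant component is the node set {1, 2, 3} and the pair (4, 5) is dropped.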
> svinet -file astro/direct.edgelist.giant -n 101831 -k 101831 -eta-type fromdata -findk
> wc -l n101831-k101831-mmsb-findk/communities.txt
> svinet -file astro/direct.edgelist.giant -n 101831 -k 2396 -eta-type fromdata -link-sampling
> cd n101831-k2396-mmsb-linksampling
> svinet -file ../astro/direct.edgelist.giant -n 101831 -k 2396 -gml
> svinet -file astro/direct.edgelist.giant -n 101831 -k 113 -eta-type fromdata -link-sampling
> cd n101831-k113-mmsb-linksampling
> svinet -file ../astro/direct.edgelist.giant -n 101831 -k 113 -gml
> svinet -file astro/coupling.edgelist.giant -n 101053 -k 101053 -eta-type fromdata -findk
> wc -l n101053-k101053-mmsb-findk/communities.txt
> svinet -file astro/coupling.edgelist.giant -n 101053 -k 992 -eta-type fromdata -link-sampling
> cd n101053-k992-mmsb-linksampling
> svinet -file ../astro/coupling.edgelist.giant -n 101053 -k 992 -gml
> svinet -file astro/coupling.edgelist.giant -n 101053 -k 113 -eta-type fromdata -link-sampling
> cd n101053-k113-mmsb-linksampling
> svinet -file ../astro/coupling.edgelist.giant -n 101053 -k 113 -gml
> svinet -file astro/cocitation.edgelist.giant -n 82895 -k 82895 -eta-type fromdata -findk
> wc -l n82895-k82895-mmsb-findk/communities.txt
> svinet -file astro/cocitation.edgelist.giant -n 82895 -k 634 -eta-type fromdata -link-sampling
> cd n82895-k634-mmsb-linksampling
> svinet -file ../astro/cocitation.edgelist.giant -n 82895 -k 634 -gml
> svinet -file astro/cocitation.edgelist.giant -n 82895 -k 113 -eta-type fromdata -link-sampling
> cd n82895-k113-mmsb-linksampling
> svinet -file ../astro/cocitation.edgelist.giant -n 82895 -k 113 -gml
> java -jar AstroConverter.jar Term -input ../astro/astro-ALP-2003-2010.csv -output astro
> java -Xmx8g -XX:-UseGCOverheadLimit -cp jate-2.0-beta.1-jar-with-dependencies.jar uk.ac.shef.dcs.jate.app.AppCValue -corpusDir astro -c true -pf.mttf 3 -o cvalue-terms.json solr-testbed ACLRDTEC
> python extract_terms.py cvalue-terms.json
> python labeler.py hard cvalue-terms.json.terms ../astro/direct.docs.vocab ../astro/n101831-k113-mmsb-linksampling/communities.txt ../astro/direct.micro.txt
> python labeler.py hard cvalue-terms.json.terms ../astro/coupling.docs.vocab ../astro/n101053-k113-mmsb-linksampling/communities.txt ../astro/coupling.micro.txt
> python labeler.py hard cvalue-terms.json.terms ../astro/cocitation.docs.vocab ../astro/n82895-k113-mmsb-linksampling/communities.txt ../astro/cocitation.micro.txt
> python labeler.py soft cvalue-terms.json.terms ../astro/direct.docs.vocab ../astro/n101831-k113-mmsb-linksampling/communities.txt ../astro/n101831-k113-mmsb-linksampling/groups.txt
> python labeler.py soft cvalue-terms.json.terms ../astro/coupling.docs.vocab ../astro/n101053-k113-mmsb-linksampling/communities.txt ../astro/n101053-k113-mmsb-linksampling/groups.txt
> python labeler.py soft cvalue-terms.json.terms ../astro/cocitation.docs.vocab ../astro/n82895-k113-mmsb-linksampling/communities.txt ../astro/n82895-k113-mmsb-linksampling/groups.txt
> index = GiniIndex('../astro/direct.edgelist.giant.degree', '../astro/coupling.edgelist.giant.degree', '../astro/cocitation.edgelist.giant.degree')
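The GiniIndex function above is called on the three degree files, but its implementation is not included in this text. For readers unfamiliar with the measure, here is a minimal sketch of the standard Gini coefficient computed from a list of non-negative values such as node degrees. The function name gini_index and its input format are assumptions made for illustration and may differ from what GiniIndex actually does.

```python
def gini_index(values):
    """Gini coefficient of a list of non-negative numbers.

    Uses the sorted-rank formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_1 <= ... <= x_n and i runs from 1 to n.
    Hypothetical helper, not taken from the repository.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # No inequality can be measured on empty or all-zero input.
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

A perfectly even degree sequence such as [1, 1, 1, 1] gives 0, while a sequence where one node holds all the links, such as [0, 0, 0, 1], gives 0.75 for n = 4 (the maximum for this formula is (n - 1) / n).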
> python statistics.py ../astro/n101831-k2396-mmsb-linksampling/communities.txt
> python statistics.py ../astro/n101053-k992-mmsb-linksampling/communities.txt
> python statistics.py ../astro/n82895-k634-mmsb-linksampling/communities.txt
> python statistics.py ../astro/n101831-k113-mmsb-linksampling/communities.txt
> python statistics.py ../astro/n101053-k113-mmsb-linksampling/communities.txt
> python statistics.py ../astro/n82895-k113-mmsb-linksampling/communities.txt
> load cluster_size
> cluster_size_distribution(direct_2396(:, 2), 100, 100)
> cluster_size_distribution(coupling_992(:, 2), 100, 100)
> cluster_size_distribution(cocitation_634(:, 2), 100, 100)
> cluster_size_distribution(direct_113(:, 2), 50, 200)
> cluster_size_distribution(coupling_113(:, 2), 50, 200)
> cluster_size_distribution(cocitation_113(:, 2), 50, 200)
Q-Q (Quantile-Quantile) Plot: Analyze -> Descriptive Statistics -> Explore -> Plots, with "Normality plots with tests" selected.
> python membership.py ../astro/n101831-k113-mmsb-linksampling/communities.txt
> python membership.py ../astro/n101053-k113-mmsb-linksampling/communities.txt
> python membership.py ../astro/n82895-k113-mmsb-linksampling/communities.txt
> DistOfMemberships('../astro/n101831-k113-mmsb-linksampling/communities.dist')
> DistOfMemberships('../astro/n101053-k113-mmsb-linksampling/communities.dist')
> DistOfMemberships('../astro/n82895-k113-mmsb-linksampling/communities.dist')
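Neither membership.py nor the format of communities.dist is described here. The sketch below shows one plausible way to tabulate how many communities each node belongs to, assuming that every line of communities.txt lists the node identifiers of one community separated by whitespace. Both that file layout and the function name membership_distribution are assumptions for illustration only.

```python
from collections import Counter


def membership_distribution(lines):
    """Map "number of communities" -> "number of nodes with that count".

    Assumes each line holds the whitespace-separated node ids of one
    community (an assumption about communities.txt, not a documented fact).
    """
    per_node = Counter()
    for line in lines:
        # set() guards against a node repeated within one community line.
        for node in set(line.split()):
            per_node[node] += 1
    return Counter(per_node.values())
```

With the three communities "1 2 3", "2 3" and "3 4", nodes 1 and 4 each belong to one community, node 2 to two, and node 3 to three, so the distribution is {1: 2, 2: 1, 3: 1}. On a real file the function could be called as membership_distribution(open(path)).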
> java -jar AstroConverter.jar Overlap -threshold 150 -input ../astro/n101831-k113-mmsb-linksampling/communities.txt -output ../astro/direct
> java -jar AstroConverter.jar Overlap -threshold 200 -input ../astro/n101053-k113-mmsb-linksampling/communities.txt -output ../astro/coupling
> java -jar AstroConverter.jar Overlap -threshold 200 -input ../astro/n82895-k113-mmsb-linksampling/communities.txt -output ../astro/cocitation
Screenshot → Save → EPS (Encapsulated PostScript) files (*.eps)
Load the file “subgraph.gml” in Gephi to visualize it. Use each node's “group” member and each edge's “color” member for partitioning. Use the bridgeness to size each node. (First, copy bridgeness to a new column with column type BigDecimal.)
> java -jar AstroConverter.jar Network -node 556 -top 4 -input ../astro/n101831-k113-mmsb-linksampling/gml/network.gml -output ../astro/n101831-k113-mmsb-linksampling/gml/subnetwork.gml
> java -jar AstroConverter.jar Network -node 2881 -top 3 -input ../astro/n101053-k113-mmsb-linksampling/gml/network.gml -output ../astro/n101053-k113-mmsb-linksampling/gml/subnetwork.gml
> java -jar AstroConverter.jar Network -node 354 -top 3 -input ../astro/n82895-k113-mmsb-linksampling/gml/network.gml -output ../astro/n82895-k113-mmsb-linksampling/gml/subnetwork.gml
Note that the -node option is set to the first element in the first line of the corresponding *.edgelist.giant.degree file.
> python convert2csv.py ../astro/direct.docs.vocab ../astro/n101831-k113-mmsb-linksampling/communities.txt ../astro/n101831-k113-mmsb-linksampling/groups.txt
> python convert2csv.py ../astro/coupling.docs.vocab ../astro/n101053-k113-mmsb-linksampling/communities.txt ../astro/n101053-k113-mmsb-linksampling/groups.txt
> python convert2csv.py ../astro/cocitation.docs.vocab ../astro/n82895-k113-mmsb-linksampling/communities.txt ../astro/n82895-k113-mmsb-linksampling/groups.txt
> python coclustering.py coclustering ../astro/direct.csv ../astro/coupling.csv 5
> python coclustering.py coclustering ../astro/direct.csv ../astro/cocitation.csv 5
> python coclustering.py coclustering ../astro/cocitation.csv ../astro/coupling.csv 5
> python coclustering.py coclustering ../astro/hd.csv ../astro/direct.csv 5
> python coclustering.py coclustering ../astro/hd.csv ../astro/coupling.csv 5
> python coclustering.py coclustering ../astro/hd.csv ../astro/cocitation.csv 5
> python measure.py ../astro/direct.csv ../astro/coupling.csv 113
> python measure.py ../astro/direct.csv ../astro/cocitation.csv 113
> python measure.py ../astro/coupling.csv ../astro/cocitation.csv 113
> python measure.py ../astro/direct.csv ../astro/hd.csv 113
> python measure.py ../astro/coupling.csv ../astro/hd.csv 113
> python measure.py ../astro/cocitation.csv ../astro/hd.csv 113