zh:notes:math_softwares

Differences

This shows the differences between the revision you selected and the current version of the page.

zh:notes:math_softwares [2026/04/05 11:39]
pzczxs [Topic Extraction]
zh:notes:math_softwares [2026/04/28 12:00] (current version)
pzczxs [Count One- and Two- Grams for PMI Calculation]
Line 30: Line 30:
 </code>
 ===== Converters =====
-==== Gram Counting for PMI Calculation ====
+==== Export for Gaussian AT credit Model ====
+Project: <color red>MathSoftware</color>
+
+The citing articles can be exported for the Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>.
+==== Count One- and Two- Grams for PMI Calculation ====
 [[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] (version: 2026-03-01) should be downloaded in advance.
  
Line 57: Line 61:
 </code>
  
-Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>GramCounter</color>).
+Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>WikipediaTool</color>).
 <code bash>
-nohup ./run-gram-counter.sh > run-gram-counter.log 2>&1
+> java -jar .\WikipediaTool.jar -t 0 -i ..\Wikipedia\ -o ..\Wikipedia-token\ > tokenization.log
+> java -jar .\WikipediaTool.jar -t 1 -i ..\Wikipedia-token\ -o ..\Wikipedia-token-cleaned\ > cleaning.log
+> java -jar .\WikipediaTool.jar -t 2 -d ..\data\math_software.word.vocab -i ..\Wikipedia-token-cleaned\ -o ..\data\Wikipedia\ > extraction.log
+> java -Xmx25g -jar .\WikipediaTool.jar -t 3 -d ..\data\math_software.word.vocab -l XX -u XX -i ..\Wikipedia-token-cleaned\ -o ..\Wikipedia-statistics\ > counting.log
 </code>
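For orientation, the way one- and two-gram counts feed into PMI can be sketched in Python. This is a minimal illustration and not the code of WikipediaTool: the function names `count_grams` and `pmi` are hypothetical, the toy sentences are made up, and it assumes that two-grams are pairs of adjacent tokens and that probabilities are plain relative frequencies. The real tool may use a different window, smoothing, or frequency thresholds (the `-l` and `-u` options above are not interpreted here).

```python
import math
from collections import Counter


def count_grams(sentences):
    """Count one-grams and adjacent two-grams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # zip pairs each token with its right-hand neighbour
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """PMI(w1, w2) = log(p(w1, w2) / (p(w1) * p(w2))), estimated from raw counts."""
    joint = bigrams[(w1, w2)]
    if joint == 0:
        # The pair was never observed, so the estimate is minus infinity.
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_joint = joint / n_bi
    p_w1 = unigrams[w1] / n_uni
    p_w2 = unigrams[w2] / n_uni
    return math.log(p_joint / (p_w1 * p_w2))


# Toy corpus (hypothetical), already tokenized and cleaned.
sentences = [
    ["linear", "algebra", "package"],
    ["linear", "algebra", "library"],
    ["numerical", "library"],
]
unigrams, bigrams = count_grams(sentences)
score = pmi("linear", "algebra", unigrams, bigrams)
```

With this toy corpus, `("linear", "algebra")` occurs in 2 of 5 two-grams and each word accounts for 2 of 8 one-grams, so `score` equals `log(6.4)`.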
  
-==== Export for Gaussian AT credit Model ==== 
-Project: <color red>​MathSoftware</​color>​ 
  
-The citing articles can be exported for the Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>.
  
 ===== Extract Themes =====
Line 73: Line 77:
 > java -jar .\GaussianATModelWithCredit.jar -D 50 -n 2000 -s 7 -fb math_software/math_software -K 200
 </code>
-===== Diversity Indicator Calculation =====
+===== Calculate Diversity Indicators =====
+The following three diversity indicators are calculated: (1) Rao-Stirling ([[https://www.jstor.org/stable/25050293|Rao, 1982]]; [[https://doi.org/10.1098/rsif.2007.0213|Stirling, 2007]]), (2) DIV ([[https://doi.org/10.1016/j.joi.2019.03.016|Leydesdorff et al., 2019]]), and (3) Diversity ([[https://doi.org/10.1007/s11192-022-04336-3|Mutz, 2022]]).
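As a point of reference for the first indicator, a minimal Python sketch of the Rao-Stirling index follows. It is not taken from the project: the function name `rao_stirling`, the example proportions, and the distance matrix are hypothetical. It assumes the common form in which the index is the sum of p_i * p_j * d_ij over all ordered pairs with i != j; some write-ups sum over unordered pairs instead, which halves the value.

```python
def rao_stirling(proportions, distance):
    """Rao-Stirling diversity: sum over ordered pairs i != j of p_i * p_j * d_ij.

    proportions: shares of each category, expected to sum to 1.
    distance:    square matrix of pairwise dissimilarities d_ij in [0, 1].
    """
    n = len(proportions)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += proportions[i] * proportions[j] * distance[i][j]
    return total


# Hypothetical example: three categories with made-up shares and distances.
p = [0.5, 0.3, 0.2]
d = [
    [0.0, 0.2, 0.8],
    [0.2, 0.0, 0.6],
    [0.8, 0.6, 0.0],
]
value = rao_stirling(p, d)
```

Here the three unordered pairs contribute 0.03, 0.08 and 0.036, so the ordered-pair sum `value` is 0.292. A distribution concentrated in a single category gives 0.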
zh/notes/math_softwares.1775360350.txt.gz · Last modified: 2026/04/05 11:39 by pzczxs