zh:notes:math_softwares

Differences

This shows the differences between the selected revision and the current version of the page.

zh:notes:math_softwares [2026/04/07 13:27]
pzczxs [Count One- and Two- Grams for PMI Calculation]
zh:notes:math_softwares [2026/04/28 12:00] (current)
pzczxs [Count One- and Two- Grams for PMI Calculation]
Line 30: Line 30:
 </​code>​ </​code>​
 ===== Converters ===== ===== Converters =====
 +==== Export for Gaussian AT credit Model ====
 +Project: <color red>​MathSoftware</​color>​
 +
 +The citing articles can be exported for Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>.
 ==== Count One- and Two- Grams for PMI Calculation ==== ==== Count One- and Two- Grams for PMI Calculation ====
 [[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] (version: 2026-03-01) should be downloaded in advance. [[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] (version: 2026-03-01) should be downloaded in advance.
Line 57: Line 61:
 </​code>​ </​code>​
  
-Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>Wikipedia</color>). +Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>WikipediaTool</color>).
 <code bash> <code bash>
 > java -jar .\WikipediaTool.jar -t 0 -i ..\Wikipedia\ -o ..\Wikipedia-token\ > tokenization.log > java -jar .\WikipediaTool.jar -t 0 -i ..\Wikipedia\ -o ..\Wikipedia-token\ > tokenization.log
 > java -jar .\WikipediaTool.jar -t 1 -i ..\Wikipedia-token\ -o ..\Wikipedia-token-cleaned\ > cleaning.log > java -jar .\WikipediaTool.jar -t 1 -i ..\Wikipedia-token\ -o ..\Wikipedia-token-cleaned\ > cleaning.log
-+java -jar .\WikipediaTool.jar -t 2 -d ..\data\math_software.word.vocab -i ..\Wikipedia-token-cleaned\ -o ..\data\Wikipedia\ > extraction.log 
 +> java -Xmx25g -jar .\WikipediaTool.jar -t 3 -d ..\data\math_software.word.vocab -l XX -u XX -i ..\Wikipedia-token-cleaned\ -o ..\Wikipedia-statistics\ > counting.log
 </​code>​ </​code>​
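
As a rough illustration of how one- and two-gram counts like these can feed a PMI score, here is a minimal Python sketch. It is not taken from WikipediaTool: the function names ''count_grams'' and ''pmi'', the choice of adjacent token pairs as two-grams, and the toy sentences are all assumptions made for illustration. The tool's actual output format and counting window are not shown on this page.

```python
import math
from collections import Counter


def count_grams(sentences):
    """Count one-grams and adjacent two-grams over tokenized sentences.

    Assumption: a two-gram is a pair of directly adjacent tokens.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(unigrams, bigrams, w1, w2):
    """PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ).

    Probabilities are plain relative frequencies (no smoothing).
    Returns -inf when the pair or either word was never observed.
    """
    joint = bigrams[(w1, w2)]
    if joint == 0 or unigrams[w1] == 0 or unigrams[w2] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = joint / n_bi
    p_x = unigrams[w1] / n_uni
    p_y = unigrams[w2] / n_uni
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    # Hypothetical toy corpus, already tokenized and cleaned.
    corpus = [
        ["linear", "algebra"],
        ["linear", "algebra"],
        ["computer", "algebra"],
        ["linear", "solver"],
    ]
    uni, bi = count_grams(corpus)
    print(pmi(uni, bi, "linear", "algebra"))
```

A positive score means the pair co-occurs more often than its individual word frequencies would suggest. On a real corpus, very rare pairs tend to get inflated scores, so a minimum-count threshold is commonly applied before ranking.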
  
-==== Export for Gaussian AT credit Model ==== 
-Project: <color red>​MathSoftware</​color>​ 
  
-The citing articles can be exported for Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>.
  
 ===== Extract Themes ===== ===== Extract Themes =====
zh/notes/math_softwares.1775539647.txt.gz · Last modified: 2026/04/07 13:27 by pzczxs