zh:notes:math_softwares

Differences

This page shows the differences between the selected revision and the current version.

zh:notes:math_softwares [2026/04/05 12:06]
pzczxs [Gram Counting for PMI Calculation]
zh:notes:math_softwares [2026/04/28 12:00] (current version)
pzczxs [Count One- and Two- Grams for PMI Calculation]
Line 30: Line 30:
 </code>
 ===== Converters =====
-==== Count One- and Two- Grams ====
+==== Export for Gaussian AT credit Model ====
+Project: <color red>MathSoftware</color>
+
+The citing articles can be exported for the Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>.
+==== Count One- and Two- Grams for PMI Calculation ====
 [[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] (version: 2026-03-01) should be downloaded in advance.
  
Line 57: Line 61:
 </code>
 
-Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>GramCounter</color>).
+Then, one- and two-gram statistics can be obtained by running the following commands (cf. Project <color red>WikipediaTool</color>).
 <code bash>
-nohup ./run-gram-counter.sh > run-gram-counter.log 2>&1
+> java -jar .\WikipediaTool.jar -t 0 -i ..\Wikipedia\ -o ..\Wikipedia-token\ > tokenization.log
+> java -jar .\WikipediaTool.jar -t 1 -i ..\Wikipedia-token\ -o ..\Wikipedia-token-cleaned\ > cleaning.log
+> java -jar .\WikipediaTool.jar -t 2 -d ..\data\math_software.word.vocab -i ..\Wikipedia-token-cleaned\ -o ..\data\Wikipedia\ > extraction.log
+> java -Xmx25g -jar .\WikipediaTool.jar -t 3 -d ..\data\math_software.word.vocab -l XX -u XX -i ..\Wikipedia-token-cleaned\ -o ..\Wikipedia-statistics\ > counting.log
 </code>
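As an illustration of how one- and two-gram counts feed a PMI score, here is a minimal Python sketch. It is not the WikipediaTool implementation: the function names `count_grams` and `pmi`, the toy corpus, and the choice of natural logarithm with simple relative-frequency estimates are all assumptions made for this example.

```python
import math
from collections import Counter


def count_grams(sentences):
    """Count one-grams and adjacent two-grams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # zip pairs each token with its successor inside the same sentence only
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """PMI(w1, w2) = log(p(w1, w2) / (p(w1) * p(w2))), natural log."""
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    joint = bigrams[(w1, w2)]
    if joint == 0 or unigrams[w1] == 0 or unigrams[w2] == 0:
        return float("-inf")  # the pair (or one of the words) was never observed
    p_joint = joint / n_bi
    p1 = unigrams[w1] / n_uni
    p2 = unigrams[w2] / n_uni
    return math.log(p_joint / (p1 * p2))


# Toy corpus standing in for the cleaned Wikipedia tokens (hypothetical data).
corpus = [
    ["computer", "algebra", "system"],
    ["computer", "algebra", "software"],
    ["linear", "algebra", "software"],
]
unigrams, bigrams = count_grams(corpus)
score = pmi("computer", "algebra", unigrams, bigrams)
```

On this toy corpus, `score` equals log(4.5), roughly 1.50: the pair ("computer", "algebra") accounts for 2 of 6 two-grams, while the two words account for 2 and 3 of 9 one-grams. A pair that never occurs adjacently returns negative infinity, which a real pipeline would typically filter out or smooth.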
  
-==== Export for Gaussian AT credit Model ====
-Project: <color red>MathSoftware</color>
 
-The citing articles can be exported for the Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>.
 
 ===== Extract Themes =====
zh/notes/math_softwares.1775362007.txt.gz · Last modified: 2026/04/05 12:06 by pzczxs