The differences between the selected revision and the current version are shown below.
| zh:notes:math_softwares [2026/04/07 13:27] pzczxs [Count One- and Two- Grams for PMI Calculation] | zh:notes:math_softwares [2026/04/28 12:00] (current version) pzczxs [Count One- and Two- Grams for PMI Calculation] | ||
|---|---|---|---|
| Line 30: | Line 30: | ||
| </code> | </code> | ||
| ===== Converters ===== | ===== Converters ===== | ||
| + | ==== Export for Gaussian AT credit Model ==== | ||
| + | Project: <color red>MathSoftware</color> | ||
| + | |||
| + | The citing articles can be exported for the Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>. | ||
| ==== Count One- and Two- Grams for PMI Calculation ==== | ==== Count One- and Two- Grams for PMI Calculation ==== | ||
| [[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] (version: 2026-03-01) should be downloaded in advance. | [[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] (version: 2026-03-01) should be downloaded in advance. | ||
| Line 57: | Line 61: | ||
| </code> | </code> | ||
| - | Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>Wikipedia</color>). | + | Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>WikipediaTool</color>). |
| <code bash> | <code bash> | ||
| > java -jar .\WikipediaTool.jar -t 0 -i ..\Wikipedia\ -o ..\Wikipedia-token\ > tokenization.log | > java -jar .\WikipediaTool.jar -t 0 -i ..\Wikipedia\ -o ..\Wikipedia-token\ > tokenization.log | ||
| > java -jar .\WikipediaTool.jar -t 1 -i ..\Wikipedia-token\ -o ..\Wikipedia-token-cleaned\ > cleaning.log | > java -jar .\WikipediaTool.jar -t 1 -i ..\Wikipedia-token\ -o ..\Wikipedia-token-cleaned\ > cleaning.log | ||
| - | > | + | > java -jar .\WikipediaTool.jar -t 2 -d ..\data\math_software.word.vocab -i ..\Wikipedia-token-cleaned\ -o ..\data\Wikipedia\ > extraction.log |
| + | > java -Xmx25g -jar .\WikipediaTool.jar -t 3 -d ..\data\math_software.word.vocab -l XX -u XX -i ..\Wikipedia-token-cleaned\ -o ..\Wikipedia-statistics\ > counting.log | ||
| </code> | </code> | ||
| - | ==== Export for Gaussian AT credit Model ==== | ||
| - | Project: <color red>MathSoftware</color> | ||
| - | The citing articles can be exported for the Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>. | ||
| ===== Extract Themes ===== | ===== Extract Themes ===== | ||