This page shows the differences between the selected revision and the current version.
| zh:notes:math_softwares [2026/04/05 11:40] pzczxs [Diversity Indicator Calculation] | zh:notes:math_softwares [2026/04/28 12:00] (current version) pzczxs [Count One- and Two- Grams for PMI Calculation] | ||
|---|---|---|---|
| Line 30: | Line 30: | ||
| </code> | </code> | ||
| ===== Converters ===== | ===== Converters ===== | ||
| - | ==== Gram Counting for PMI Calculation ==== | + | ==== Export for Gaussian AT credit Model ==== |
| + | Project: <color red>MathSoftware</color> | ||
| + | |||
| + | The citing articles can be exported for Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>. | ||
| + | ==== Count One- and Two- Grams for PMI Calculation ==== | ||
| [[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] (version: 2026-03-01) should be downloaded in advance. | [[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] (version: 2026-03-01) should be downloaded in advance. | ||
| Line 57: | Line 61: | ||
| </code> | </code> | ||
| - | Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>GramCounter</color>). | + | Then, one- and two-gram statistics can be obtained by running the following code (cf. Project <color red>WikipediaTool</color>). |
| <code bash> | <code bash> | ||
| - | > nohup ./run-gram-counter.sh > run-gram-counter.log 2>&1 | + | > java -jar .\WikipediaTool.jar -t 0 -i ..\Wikipedia\ -o ..\Wikipedia-token\ > tokenization.log |
| + | > java -jar .\WikipediaTool.jar -t 1 -i ..\Wikipedia-token\ -o ..\Wikipedia-token-cleaned\ > cleaning.log | ||
| + | > java -jar .\WikipediaTool.jar -t 2 -d ..\data\math_software.word.vocab -i ..\Wikipedia-token-cleaned\ -o ..\data\Wikipedia\ > extraction.log | ||
| + | > java -Xmx25g -jar .\WikipediaTool.jar -t 3 -d ..\data\math_software.word.vocab -l XX -u XX -i ..\Wikipedia-token-cleaned\ -o ..\Wikipedia-statistics\ > counting.log | ||
| </code> | </code> | ||
| - | ==== Export for Gaussian AT credit Model ==== | ||
| - | Project: <color red>MathSoftware</color> | ||
| - | The citing articles can be exported for Gaussian AT<sup>credit</sup> model by running <color red>ToGaussianATCreditConverter.java</color> in the package <color red>cn.edu.bjut.converter</color>. | ||
| ===== Extract Themes ===== | ===== Extract Themes ===== | ||
| Line 74: | Line 78: | ||
| </code> | </code> | ||
| ===== Calculate Diversity Indicators ===== | ===== Calculate Diversity Indicators ===== | ||
| - | The following three diversity indicators are calculated: (1) Rao-Stirling (), DIV, and Diversity. | + | The following three diversity indicators are calculated: (1) Rao-Stirling ([[https://www.jstor.org/stable/25050293|Rao, 1982]]; [[https://doi.org/10.1098/rsif.2007.0213|Stirling, 2007]]), (2) DIV ([[https://doi.org/10.1016/j.joi.2019.03.016|Leydesdorff et al., 2019]]), and (3) Diversity ([[https://doi.org/10.1007/s11192-022-04336-3|Mutz, 2022]]). |
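The gram-counting step is titled as preparation for a PMI calculation, but the page does not show how the counts are turned into PMI. What follows is a minimal sketch, not the WikipediaTool implementation. It assumes plain maximum-likelihood estimates, PMI(x, y) = ln(P(x, y) / (P(x) P(y))), with unigram probabilities normalized by a total unigram count N1 and the bigram probability normalized by a total bigram count N2. The class `PmiSketch`, its method, and the two-totals normalization are all hypothetical choices made for illustration.

```java
/**
 * Hypothetical helper (not part of WikipediaTool): pointwise mutual information
 * estimated from raw one- and two-gram counts.
 */
public class PmiSketch {

    /**
     * PMI(x, y) = ln( P(x, y) / (P(x) * P(y)) ), with assumed maximum-likelihood estimates
     * P(x) = c(x) / N1, P(y) = c(y) / N1 and P(x, y) = c(x, y) / N2.
     * Returns negative infinity when the pair never co-occurs (c(x, y) == 0).
     */
    public static double pmi(long countX, long countY, long countXY,
                             long totalUnigrams, long totalBigrams) {
        if (countX <= 0 || countY <= 0 || totalUnigrams <= 0 || totalBigrams <= 0) {
            throw new IllegalArgumentException("unigram counts and totals must be positive");
        }
        if (countXY < 0) {
            throw new IllegalArgumentException("bigram count must be non-negative");
        }
        double pX = (double) countX / totalUnigrams;
        double pY = (double) countY / totalUnigrams;
        double pXY = (double) countXY / totalBigrams;
        // Math.log(0.0) is -Infinity, so an unseen pair yields negative infinity.
        return Math.log(pXY / (pX * pY));
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        System.out.println(pmi(10, 20, 2, 100, 100));  // P(x,y) = P(x)P(y): close to 0
        System.out.println(pmi(10, 10, 10, 100, 100)); // close to ln(10)
    }
}
```

The natural logarithm is used here. For PMI in bits, divide the result by `Math.log(2)`. Zero co-occurrence counts are often smoothed or clipped (for example, positive PMI) before further use, which this sketch deliberately leaves out.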
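Of the three indicators, Rao-Stirling diversity has the simplest closed form: Δ = Σ<sub>i≠j</sub> d<sub>ij</sub> p<sub>i</sub> p<sub>j</sub> (Stirling, 2007, with both exponents set to 1). Here p<sub>i</sub> is the proportion of category i, and d<sub>ij</sub> is the distance between categories i and j. The following is a minimal sketch, not the project's implementation. It assumes the proportions and a symmetric distance matrix with a zero diagonal are already computed. One common choice for that matrix is 1 minus cosine similarity, but that choice is an assumption here. The class name `RaoStirlingSketch` is hypothetical.

```java
/**
 * Hypothetical helper: Rao-Stirling diversity over ordered pairs i != j.
 */
public class RaoStirlingSketch {

    /**
     * @param p proportions of each category (assumed to sum to 1)
     * @param d symmetric distance matrix between categories, zero on the diagonal
     * @return sum over i != j of d[i][j] * p[i] * p[j]
     */
    public static double raoStirling(double[] p, double[][] d) {
        int n = p.length;
        if (d.length != n) {
            throw new IllegalArgumentException("distance matrix must be n x n");
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            if (d[i].length != n) {
                throw new IllegalArgumentException("distance matrix must be n x n");
            }
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    sum += d[i][j] * p[i] * p[j];
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up proportions and distances for three categories.
        double[] p = {0.5, 0.3, 0.2};
        double[][] d = {
            {0.0, 0.4, 0.9},
            {0.4, 0.0, 0.6},
            {0.9, 0.6, 0.0}
        };
        System.out.println(raoStirling(p, d));
    }
}
```

For the made-up input, the value is 2 × (0.4·0.5·0.3 + 0.9·0.5·0.2 + 0.6·0.3·0.2) = 0.372, up to floating-point rounding. Some formulations sum over i &lt; j only, which halves the result. Check which convention the cited papers use before comparing numbers.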