===== Citation Information =====
Shuo Xu, Ling Li, Xin An, Liyuan Hao, and Guancan Yang, 2021. [[https://doi.org/10.1007/s11192-021-04085-9|An Approach for Detecting the Commonality and Specialty between Scientific Publications and Patents]]. //Scientometrics//, Vol. 126, No. 9, pp. 7445-7475.
===== Requirements =====
===== Descriptive Statistics and Word Cloud =====
Run <color red>Statistics.py</color> and <color red>overlap.py</color> in the directory <color red>indicators/before</color> to count the number of (overlapped) tokens and (overlapped) unique words.
Then run <color red>Mean_Statistics.py</color>, <color red>Mean_overlap.py</color>, and <color red>Non_overlap.py</color> in the directory <color red>indicators/after</color> to count the number of (overlapped) tokens and (overlapped) unique words, and to save the overlapped words together with their word frequencies.
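The internals of these scripts are not shown on this page; the following is only a minimal sketch of what counting (overlapped) tokens and unique words over two tokenized corpora might look like. The function name <color red>corpus_stats</color> and the toy documents are illustrative, not taken from the repository.

```python
from collections import Counter

def corpus_stats(docs):
    """Return the total token count and per-word frequencies
    for a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    return sum(counts.values()), counts

# Toy tokenized corpora standing in for publication and patent texts.
papers = [["topic", "model", "patent"], ["topic", "common"]]
patents = [["patent", "claim", "topic"]]

paper_tokens, paper_counts = corpus_stats(papers)
patent_tokens, patent_counts = corpus_stats(patents)

# Overlapped vocabulary: words occurring in both corpora,
# with their combined frequencies across the two corpora.
overlap = set(paper_counts) & set(patent_counts)
overlap_tokens = sum(paper_counts[w] + patent_counts[w] for w in overlap)
```

The overlapped words and their frequencies (here `paper_counts[w]` and `patent_counts[w]` for each `w` in `overlap`) are what the `after`-stage scripts save to disk.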
===== Format Data for the CDTM Model =====
Run <color red>Trans_CDTM.py</color> in the directory <color red>CDTM-Test</color>. At this point, a dictionary will be generated in the files with the extensions <color red>.word.vocab</color> and <color red>ID.csv</color>. Then, with the help of Excel, generate two documents with the extensions <color red>.docs</color> and <color red>.corpus</color>.
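The exact file formats produced by <color red>Trans_CDTM.py</color> are not documented here; as an assumption, the dictionary step can be pictured as writing one word per line plus an ID-to-word table. The helper name <color red>build_vocab</color> and the output layout below are hypothetical.

```python
import csv
import os
import tempfile

def build_vocab(docs, vocab_path, id_path):
    """Collect a sorted vocabulary from tokenized documents, write one
    word per line to vocab_path, write (id, word) rows to id_path, and
    return the word-to-ID mapping."""
    vocab = sorted({tok for doc in docs for tok in doc})
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab))
    with open(id_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(enumerate(vocab))
    return {w: i for i, w in enumerate(vocab)}

# Toy example: two tokenized documents written into a temporary directory.
out_dir = tempfile.mkdtemp()
word2id = build_vocab(
    [["patent", "topic"], ["topic", "common"]],
    os.path.join(out_dir, "toy.word.vocab"),
    os.path.join(out_dir, "ID.csv"),
)
```

The returned mapping is what downstream steps would use to translate each document into word IDs for the <color red>.docs</color> and <color red>.corpus</color> files.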
===== Estimate a CDTM Model =====
Run <color red>CdtmParameterTuning.java</color> in the package <color red>cn.edu.bjut.ui</color>. The perplexity will be obtained for each candidate value combination of the number of common topics, the number of topics specific to scientific publications, and the number of topics specific to patents.
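The Java tuner's internals are not shown on this page; the search it describes, trying every candidate combination of the three topic numbers and keeping the one with the lowest perplexity, can be sketched as follows. The function <color red>grid_search</color> and the toy perplexity stand-in are illustrative only; the real perplexity comes from fitting the CDTM model.

```python
from itertools import product

def grid_search(common_cands, pub_cands, pat_cands, perplexity):
    """Evaluate every (common, publication-specific, patent-specific)
    topic-number combination and return the one with lowest perplexity."""
    best_combo, best_perp = None, float("inf")
    for kc, kp, kt in product(common_cands, pub_cands, pat_cands):
        p = perplexity(kc, kp, kt)
        if p < best_perp:
            best_combo, best_perp = (kc, kp, kt), p
    return best_combo, best_perp

# Toy stand-in for model fitting: perplexity is minimized at (10, 5, 5).
toy = lambda kc, kp, kt: abs(kc - 10) + abs(kp - 5) + abs(kt - 5)
best_combo, best_perp = grid_search([5, 10, 15], [5, 10], [5, 10], toy)
```

In practice each call to `perplexity` would train a CDTM model on the formatted <color red>.docs</color>/<color red>.corpus</color> data and evaluate held-out perplexity, so the grid is usually kept small.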