This page shows the differences between the revision you selected and the current version.
zh:notes:common_specialty [2021/03/04 19:53] pzczxs [Syntactic and Lexical Complexity before Filtering Stopwords] |
zh:notes:common_specialty [2022/11/08 07:33] (current version) pzczxs [Citation Information] |
||
---|---|---|---|
Line 3: | Line 3: | ||
===== Citation Information ===== | ===== Citation Information ===== | ||
- | Shuo Xu, Ling Li, Xin An, Liyuan Hao, and Guancan Yang, 2020. An Approach for Detecting the Commonality and Specialty between Scientific Publications and Patents. //Scientometrics//. | + | Shuo Xu, Ling Li, Xin An, Liyuan Hao, and Guancan Yang, 2021. [[https://doi.org/10.1007/s11192-021-04085-9|An Approach for Detecting the Commonality and Specialty between Scientific Publications and Patents]]. //Scientometrics//, Vol. 126, No. 9, pp. 7445-7475. |
===== Requirements ===== | ===== Requirements ===== | ||
Line 90: | Line 90: | ||
To run <color red>Abs_Sen_Complexity_Patent.py</color> and <color red>Abs_Sen_Complexity_Article.py</color> in the directory <color red>indicators/before</color> to save the parsed tree structures, and then to use the tool <color red>stanford-tregex</color> to calculate sentence complexity; | To run <color red>Abs_Sen_Complexity_Patent.py</color> and <color red>Abs_Sen_Complexity_Article.py</color> in the directory <color red>indicators/before</color> to save the parsed tree structures, and then to use the tool <color red>stanford-tregex</color> to calculate sentence complexity; | ||
- | To run <color red>lexical complexity_Patent_Title.py<color>, <color red>lexical complexity_Patent_Abs.py</color> and <color red>lexical complexity_Article_Title.py<color>, <color red>lexical complexity_Article_Abs.py</color> in the directory <color red>indicators/before</color> to calculate lexical complexity indicators (Lexical Diversity/ Sophistication /Density); | + | To run <color red>lexical complexity_Patent_Title.py</color>, <color red>lexical complexity_Patent_Abs.py</color> and <color red>lexical complexity_Article_Title.py</color>, <color red>lexical complexity_Article_Abs.py</color> in the directory <color red>indicators/before</color> to calculate lexical complexity indicators (Lexical Diversity/ Sophistication /Density); |
===== Syntactic and Lexical Complexity after Filtering Stopwords ===== | ===== Syntactic and Lexical Complexity after Filtering Stopwords ===== | ||
- | To run <color red>Mean_synatic_complexity_Patent.ipynb</color> and <color red>Mean_synatic_complexity_Article.ipynb</color> in the directory <color red>indicators/after</color> to calculate the meaningful syntactic complexity indicators (Title/Abstract/Abstract average sentence Length); | + | To run <color red>Mean_synatic_complexity_Patent.py</color> and <color red>Mean_synatic_complexity_Article.py</color> in the directory <color red>indicators/after</color> to calculate the meaningful syntactic complexity indicators (Title/Abstract/Abstract average sentence Length); |
- | To run <color red>Mean_lexical_complexity_Patent_Title.ipynb</color>, <color red>Mean_lexical complexity_Patent_Abs.ipynb</color>, <color red>Mean_lexical complexity_Article_Title.ipynb</color>, and <color red>Mean_lexical_complexity_Article_Abs.ipynb</color> in the directory <color red>indicators/after</color> to calculate the meaningful lexical complexity indicators (Lexical Diversity/ Sophistication /Density); | + | To run <color red>Mean_lexical_complexity_Patent_Title.py</color>, <color red>Mean_lexical complexity_Patent_Abs.py</color>, <color red>Mean_lexical complexity_Article_Title.py</color>, and <color red>Mean_lexical_complexity_Article_Abs.py</color> in the directory <color red>indicators/after</color> to calculate the meaningful lexical complexity indicators (Lexical Diversity/ Sophistication /Density); |
===== Descriptive Statistics and Word Cloud ===== | ===== Descriptive Statistics and Word Cloud ===== | ||
- | To run <color red>Statistics.ipynb</color> and <color red>overlap.ipynb</color> in the directory <color red>indicators/before</color> to count the number of (overlapped) tokens and (overlapped) unique words; | + | To run <color red>Statistics.py</color> and <color red>overlap.py</color> in the directory <color red>indicators/before</color> to count the number of (overlapped) tokens and (overlapped) unique words; |
- | To run <color red>Mean_Statistics.ipynb</color>, <color red>Mean_overlap.ipynb</color> and <color red>Non_overlap.ipynb</color> in the directory <color red>indicators/after</color> to count the number of (overlapped) tokens and (overlapped) unique words, and save overlapped words with their corresponding word frequencies. | + | To run <color red>Mean_Statistics.py</color>, <color red>Mean_overlap.py</color> and <color red>Non_overlap.py</color> in the directory <color red>indicators/after</color> to count the number of (overlapped) tokens and (overlapped) unique words, and save overlapped words with their corresponding word frequencies. |
===== Format Data for the CDTM Model ===== | ===== Format Data for the CDTM Model ===== | ||
- | To run <color red>Trans_CDTM.ipynb</color> in the directory <color red>CDTM-Test</color>. At this point, a dictionary will be generated in files ending in <color red>.word.vocab</color> and <color red>ID.csv</color>. Then, with the help of Excel, generate two documents with the extensions <color red>.docs</color> and <color red>.corpus</color>. | + | To run <color red>Trans_CDTM.py</color> in the directory <color red>CDTM-Test</color>. At this point, a dictionary will be generated in files ending in <color red>.word.vocab</color> and <color red>ID.csv</color>. Then, with the help of Excel, generate two documents with the extensions <color red>.docs</color> and <color red>.corpus</color>. |
===== Estimate a CDTM Model ===== | ===== Estimate a CDTM Model ===== | ||
To run <color red>CdtmParameterTuning.java</color> in the package <color red>cn.edu.bjut.ui</color>. The perplexity will be obtained for each candidate value combination of the number of common topics, the number of topics specific to scientific publications, and the number of topics specific to patents. | To run <color red>CdtmParameterTuning.java</color> in the package <color red>cn.edu.bjut.ui</color>. The perplexity will be obtained for each candidate value combination of the number of common topics, the number of topics specific to scientific publications, and the number of topics specific to patents. |
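
The tuning step above amounts to a grid search over three topic counts, keeping the combination with the lowest perplexity. A minimal Python sketch of that idea follows. It is not the code in `CdtmParameterTuning.java`: the names `perplexity`, `grid_search`, and the `score` callable are hypothetical, and `score` merely stands in for training a CDTM model and returning its perplexity. The perplexity helper uses the usual definition, the exponential of the negative log-likelihood per token, which may differ from what the Java code computes.

```python
import itertools
import math


def perplexity(log_likelihood, n_tokens):
    """Usual definition: exp of the negative log-likelihood per token."""
    return math.exp(-log_likelihood / n_tokens)


def grid_search(common_values, article_values, patent_values, score):
    """Score every (common, article-specific, patent-specific) combination.

    `score` is a hypothetical callable that returns the perplexity of a
    model trained with the given topic counts (assumption: lower is better).
    """
    results = {}
    for combo in itertools.product(common_values, article_values, patent_values):
        results[combo] = score(*combo)
    # Pick the combination with the smallest perplexity.
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Toy scoring function used only to exercise the loop; a real run
    # would train a model for each combination instead.
    def toy(c, a, p):
        return abs(c - 10) + abs(a - 5) + p

    best, results = grid_search([5, 10], [5, 10], [5], toy)
    print("best combination:", best)
```

Because every combination is scored independently, the loop is easy to parallelise or to restrict to a coarser grid when each model is expensive to train.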