  
===== Citation Information =====
Shuo Xu, Ling Li, Xin An, Liyuan Hao, and Guancan Yang, 2021. [[https://doi.org/10.1007/s11192-021-04085-9|An Approach for Detecting the Commonality and Specialty between Scientific Publications and Patents]]. //Scientometrics//, Vol. 126, No. 9, pp. 7445-7475.
  
===== Requirements =====
Update the country information with <color red>PatentOriginalCountryUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>.
  
<!--
Manually remove the irrelevant information from the abstract of the patent with id = 5875.
<code sql>
SELECT * FROM patent WHERE id = 5875;
</code>
-->
  
===== Export All Documents in the Format of genia =====
Run <color red>HMMLDA.java</color> in the package <color red>cn.edu.bjut.ui</color> from another project, <color red>HMM-LDA</color>. If needed, the parameters can be set through the configuration file <color red>HMMLDA.properties</color>, located in the directory <color red>conf</color>.
===== Syntactic and Lexical Complexity before Filtering Stopwords =====
Run <color red>synatic complexity_Patent.py</color> and <color red>synatic complexity_Article.py</color> in the directory <color red>indicators/before</color> to calculate the syntactic complexity indicators (Title/Abstract/Abstract average sentence length);
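These scripts are not reproduced here; as a rough sketch of the underlying indicator, the average sentence length of a title or abstract can be computed along the following lines (assuming NLTK with its punkt models; the example text is made up):

<code python>
# Minimal sketch of an average-sentence-length indicator (assumes NLTK is installed
# and its "punkt" sentence/word tokenizer models have been downloaded).
from nltk.tokenize import sent_tokenize, word_tokenize

def average_sentence_length(text: str) -> float:
    """Average number of tokens per sentence of the given text."""
    sentences = sent_tokenize(text)
    if not sentences:
        return 0.0
    n_tokens = sum(len(word_tokenize(s)) for s in sentences)
    return n_tokens / len(sentences)

# Hypothetical usage on one patent abstract:
abstract = "A method for producing carbon nanotubes is disclosed. The method comprises two steps."
print(average_sentence_length(abstract))   # 7.5 with the default punkt tokenization
</code>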
  
Run <color red>Abs_Sen_Complexity_Patent.py</color> and <color red>Abs_Sen_Complexity_Article.py</color> in the directory <color red>indicators/before</color> to save the parsed tree structures, and then use the tool <color red>stanford-tregex</color> to calculate sentence complexity;
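A hedged sketch of the parse-and-save step (not the actual scripts), assuming a Stanford CoreNLP server is running locally on port 9000 and that the saved bracketed trees are later fed to stanford-tregex:

<code python>
# Sketch: obtain constituency parse trees from a locally running Stanford CoreNLP
# server and save them in bracketed form, which stanford-tregex can read.
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url="http://localhost:9000")

sentences = [
    "A method for producing carbon nanotubes is disclosed.",
    "The method comprises two steps.",
]

with open("abs_trees.txt", "w", encoding="utf-8") as out:
    for sentence in sentences:
        tree = next(parser.raw_parse(sentence))   # first (best) parse
        out.write(tree.pformat() + "\n\n")        # one bracketed tree per block
</code>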
  
Run <color red>lexical complexity_Patent_Title.py</color>, <color red>lexical complexity_Patent_Abs.py</color>, <color red>lexical complexity_Article_Title.py</color>, and <color red>lexical complexity_Article_Abs.py</color> in the directory <color red>indicators/before</color> to calculate the lexical complexity indicators (Lexical Diversity/Sophistication/Density);
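The exact definitions are fixed in the scripts above; as a hedged sketch, these indicators are commonly operationalized as the type-token ratio (diversity), the share of content words (density), and the share of words outside a frequent-word list (sophistication). The frequency list below is only a placeholder:

<code python>
# Sketch of common operationalizations of the three lexical indicators; the actual
# scripts may define them differently.
from nltk import pos_tag, word_tokenize

CONTENT_TAGS = ("NN", "VB", "JJ", "RB")          # nouns, verbs, adjectives, adverbs
FREQUENT_WORDS = {"the", "a", "of", "and", "to"} # placeholder frequency list

def lexical_indicators(text: str) -> dict:
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    if not tokens:
        return {"diversity": 0.0, "density": 0.0, "sophistication": 0.0}
    tagged = pos_tag(tokens)
    content = [w for w, tag in tagged if tag.startswith(CONTENT_TAGS)]
    sophisticated = [w for w in tokens if w not in FREQUENT_WORDS]
    return {
        "diversity": len(set(tokens)) / len(tokens),        # type-token ratio
        "density": len(content) / len(tokens),              # content-word share
        "sophistication": len(sophisticated) / len(tokens), # non-frequent share
    }

print(lexical_indicators("The method comprises two steps."))
</code>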
  
===== Syntactic and Lexical Complexity after Filtering Stopwords =====
Run <color red>Mean_synatic_complexity_Patent.py</color> and <color red>Mean_synatic_complexity_Article.py</color> in the directory <color red>indicators/after</color> to calculate the meaningful syntactic complexity indicators (Title/Abstract/Abstract average sentence length);
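The "meaningful" variants differ mainly in that stopwords are filtered out before counting; a minimal sketch, assuming NLTK's English stopword list:

<code python>
# Sketch: average sentence length counted over non-stopword tokens only
# (assumes NLTK and its "stopwords"/"punkt" data are available).
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

STOPWORDS = set(stopwords.words("english"))

def meaningful_average_sentence_length(text: str) -> float:
    sentences = sent_tokenize(text)
    if not sentences:
        return 0.0
    n_tokens = sum(
        len([t for t in word_tokenize(s) if t.isalpha() and t.lower() not in STOPWORDS])
        for s in sentences
    )
    return n_tokens / len(sentences)
</code>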
  
Run <color red>Mean_lexical_complexity_Patent_Title.py</color>, <color red>Mean_lexical complexity_Patent_Abs.py</color>, <color red>Mean_lexical complexity_Article_Title.py</color>, and <color red>Mean_lexical_complexity_Article_Abs.py</color> in the directory <color red>indicators/after</color> to calculate the meaningful lexical complexity indicators (Lexical Diversity/Sophistication/Density);
  
===== Descriptive Statistics and Word Cloud =====
Run <color red>Statistics.py</color> and <color red>overlap.py</color> in the directory <color red>indicators/before</color> to count the number of (overlapped) tokens and (overlapped) unique words;
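As an illustration of the counting involved (not the actual scripts), a minimal sketch assuming the two corpora are available as lists of pre-tokenized documents; the example documents are placeholders:

<code python>
# Sketch: token counts, vocabulary sizes, and their overlap between the two corpora.
from collections import Counter

patent_docs = [["carbon", "nanotube", "method"], ["catalyst", "method"]]      # placeholder
article_docs = [["carbon", "nanotube", "synthesis"], ["catalyst", "growth"]]  # placeholder

patent_counts = Counter(t for doc in patent_docs for t in doc)
article_counts = Counter(t for doc in article_docs for t in doc)

overlap_vocab = set(patent_counts) & set(article_counts)

print("patent tokens:", sum(patent_counts.values()))
print("article tokens:", sum(article_counts.values()))
print("patent vocabulary:", len(patent_counts))
print("article vocabulary:", len(article_counts))
print("overlapped unique words:", len(overlap_vocab))
print("overlapped tokens:", sum(patent_counts[w] + article_counts[w] for w in overlap_vocab))
</code>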
  
Run <color red>Mean_Statistics.py</color>, <color red>Mean_overlap.py</color> and <color red>Non_overlap.py</color> in the directory <color red>indicators/after</color> to count the number of (overlapped) tokens and (overlapped) unique words, and to save the overlapped words with their corresponding word frequencies.
  
===== Format Data for the CDTM Model =====
Run <color red>Trans_CDTM.py</color> in the directory <color red>CDTM-Test</color>. At this step, a dictionary is generated together with the files <color red>.word.vocab</color> and <color red>ID.csv</color>. Then, with the help of Excel, generate two documents with the extensions <color red>.docs</color> and <color red>.corpus</color>.
===== Estimate a CDTM Model =====
Run <color red>CdtmParameterTuning.java</color> in the package <color red>cn.edu.bjut.ui</color>. The perplexity will be obtained for each candidate combination of the number of common topics, the number of topics specific to scientific publications, and the number of topics specific to patents.
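The tuning itself is performed by the Java class above; purely to illustrate the protocol, a hedged Python sketch of the grid search follows, where train_cdtm_and_perplexity is a hypothetical stand-in for fitting a CDTM and computing held-out perplexity:

<code python>
# Illustrative sketch only: grid search over the three topic numbers, keeping the
# combination with the lowest perplexity. train_cdtm_and_perplexity is hypothetical.
from itertools import product

def train_cdtm_and_perplexity(n_common: int, n_pub: int, n_pat: int) -> float:
    """Hypothetical stand-in: fit a CDTM with these topic numbers and return the
    held-out perplexity. Returns a dummy value here so the sketch runs."""
    return float(abs(n_common - 30) + abs(n_pub - 20) + abs(n_pat - 20))  # dummy, not real

best = None
for n_common, n_pub, n_pat in product(range(10, 60, 10), repeat=3):
    perplexity = train_cdtm_and_perplexity(n_common, n_pub, n_pat)
    if best is None or perplexity < best[0]:
        best = (perplexity, n_common, n_pub, n_pat)

print("best (perplexity, K_common, K_pub, K_pat):", best)
</code>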