====== Commonality and Specialty Detection ======

===== Citation Information =====
Shuo Xu, Ling Li, Xin An, Liyuan Hao, and Guancan Yang, 2021. [[https://doi.org/10.1007/s11192-021-04085-9|An Approach for Detecting the Commonality and Specialty between Scientific Publications and Patents]]. //Scientometrics//, Vol. 126, No. 9, pp. 7445-7475.

===== Requirements =====
To import the related information into the database with <color red>PatentURLImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> from the directory <color red>data/patents/url</color>.
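
As a minimal sketch of this kind of import (the actual importer is <color red>PatentURLImporter.java</color>), the following Python snippet shows the general idea; the database engine, table name, column names, and CSV file name are assumptions and must be adapted to the real schema.
<code python>
import csv
import sqlite3

# Hypothetical schema and file layout; adapt to the actual database.
conn = sqlite3.connect("patents.db")
conn.execute("CREATE TABLE IF NOT EXISTS patent_url (patent_id INTEGER, url TEXT)")

with open("data/patents/url/urls.csv", newline="", encoding="utf-8") as f:
    rows = [(int(r["patent_id"]), r["url"]) for r in csv.DictReader(f)]

conn.executemany("INSERT INTO patent_url (patent_id, url) VALUES (?, ?)", rows)
conn.commit()
conn.close()
</code>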
  
To update the country information with <color red>PatentOriginalCountryUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>.
  
<!--
To manually remove the irrelevant information from the abstract of the patent with id = 5875.
<code sql>
SELECT * FROM patent WHERE id = 5875;
</code>
-->
  
===== Export All Documents in the Format of genia =====
To run <color red>HMMLDA.java</color> in the package <color red>cn.edu.bjut.ui</color> from another project <color red>HMM-LDA</color>. If desired, the model parameters can be set through a configuration file <color red>HMMLDA.properties</color>, located in the directory <color red>conf</color>.
===== Syntactic and Lexical Complexity before Filtering Stopwords =====
To run <color red>synatic complexity_Patent.py</color> and <color red>synatic complexity_Article.py</color> in the directory <color red>indicators/before</color> to calculate the syntactic complexity indicators (Title length, Abstract length, and Abstract average sentence length);
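
As a rough illustration of these length-based indicators (not the scripts above), assuming NLTK with its punkt tokenizer installed:
<code python>
from nltk.tokenize import sent_tokenize, word_tokenize  # requires nltk.download("punkt")

def length_indicators(title, abstract):
    # Title length and abstract length in tokens.
    title_len = len(word_tokenize(title))
    abstract_len = len(word_tokenize(abstract))
    # Average sentence length of the abstract (tokens per sentence).
    sentences = sent_tokenize(abstract)
    avg_sent_len = abstract_len / len(sentences) if sentences else 0.0
    return title_len, abstract_len, avg_sent_len

print(length_indicators("A method for topic detection",
                        "We propose a model. It links publications and patents."))
</code>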
  
To run <color red>Abs_Sen_Complexity_Patent.py</color> and <color red>Abs_Sen_Complexity_Article.py</color> in the directory <color red>indicators/before</color> to save the parsed tree structures, and then use the tool <color red>stanford-tregex</color> to calculate sentence complexity;
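
As one possible way to produce such trees (not necessarily how the scripts above do it), NLTK's CoreNLP client can save bracketed constituency trees, assuming a Stanford CoreNLP server is running locally on port 9000; the resulting file can then be queried with <color red>stanford-tregex</color>.
<code python>
from nltk.parse.corenlp import CoreNLPParser

# Assumes a Stanford CoreNLP server is already running at this address.
parser = CoreNLPParser(url="http://localhost:9000")

sentences = ["We propose a model.", "It links publications and patents."]
with open("abstract_trees.txt", "w", encoding="utf-8") as out:
    for sent in sentences:
        tree = next(parser.raw_parse(sent))  # constituency parse tree
        out.write(tree.pformat() + "\n\n")   # bracketed format usable by Tregex
</code>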
  
To run <color red>lexical complexity_Patent_Title.py</color>, <color red>lexical complexity_Patent_Abs.py</color>, <color red>lexical complexity_Article_Title.py</color>, and <color red>lexical complexity_Article_Abs.py</color> in the directory <color red>indicators/before</color> to calculate the lexical complexity indicators (Lexical Diversity/Sophistication/Density);
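
The three lexical indicators can be illustrated with the simplified definitions below (a sketch only, not necessarily the exact formulas used in the scripts): diversity as the type-token ratio, density as the share of content words, and sophistication as the share of words outside a basic word list (here a made-up set).
<code python>
from nltk import pos_tag, word_tokenize  # requires punkt and averaged_perceptron_tagger

CONTENT_TAGS = ("NN", "VB", "JJ", "RB")  # nouns, verbs, adjectives, adverbs

def lexical_indicators(text, basic_words):
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    diversity = len(set(tokens)) / len(tokens)      # lexical diversity: type-token ratio
    content = [w for w, tag in pos_tag(tokens) if tag.startswith(CONTENT_TAGS)]
    density = len(content) / len(tokens)            # lexical density: content-word share
    sophistication = sum(1 for t in tokens if t not in basic_words) / len(tokens)
    return diversity, density, sophistication

basic = {"a", "the", "of", "and", "we", "for"}      # stand-in basic word list
print(lexical_indicators("We propose a novel topic model for patents.", basic))
</code>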
===== Syntactic and Lexical Complexity after Filtering Stopwords =====
To run <color red>Mean_synatic_complexity_Patent.py</color> and <color red>Mean_synatic_complexity_Article.py</color> in the directory <color red>indicators/after</color> to calculate the meaningful syntactic complexity indicators (Title length, Abstract length, and Abstract average sentence length);

To run <color red>Mean_lexical_complexity_Patent_Title.py</color>, <color red>Mean_lexical complexity_Patent_Abs.py</color>, <color red>Mean_lexical complexity_Article_Title.py</color>, and <color red>Mean_lexical_complexity_Article_Abs.py</color> in the directory <color red>indicators/after</color> to calculate the meaningful lexical complexity indicators (Lexical Diversity/Sophistication/Density);
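
The "after filtering" variants differ from the scripts above mainly in that stopwords are removed before the indicators are recomputed; a minimal sketch of that preprocessing step, assuming NLTK's English stopword list:
<code python>
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.tokenize import word_tokenize

STOPWORDS = set(stopwords.words("english"))

def remove_stopwords(text):
    # Keep only alphabetic, non-stopword tokens; the syntactic and lexical
    # indicators are then recomputed on this filtered token sequence.
    return [t for t in word_tokenize(text.lower())
            if t.isalpha() and t not in STOPWORDS]

print(remove_stopwords("We propose a novel topic model for patents."))
</code>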
===== Descriptive Statistics and Word Cloud =====
To run <color red>Statistics.py</color> and <color red>overlap.py</color> in the directory <color red>indicators/before</color> to count the number of (overlapped) tokens and (overlapped) unique words;

To run <color red>Mean_Statistics.py</color>, <color red>Mean_overlap.py</color>, and <color red>Non_overlap.py</color> in the directory <color red>indicators/after</color> to count the number of (overlapped) tokens and (overlapped) unique words, and save the overlapped words with their corresponding word frequencies.
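
A rough sketch of these overlap statistics (with made-up inputs): count the tokens and unique words on each side, intersect the two vocabularies, and write the overlapped words with their frequencies for the word cloud.
<code python>
import csv
from collections import Counter

def overlap_stats(patent_tokens, article_tokens, out_path="overlap.csv"):
    p_freq, a_freq = Counter(patent_tokens), Counter(article_tokens)
    overlap = set(p_freq) & set(a_freq)     # unique words shared by both corpora
    print("patent tokens:", len(patent_tokens), "unique:", len(p_freq))
    print("article tokens:", len(article_tokens), "unique:", len(a_freq))
    print("overlapped unique words:", len(overlap))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "patent_freq", "article_freq"])
        for w in sorted(overlap):
            writer.writerow([w, p_freq[w], a_freq[w]])

overlap_stats(["topic", "model", "patent"], ["topic", "model", "article"])
</code>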
  
===== Format Data for the CDTM Model =====
To run <color red>Trans_CDTM.py</color> in the directory <color red>CDTM-Test</color>. At this step, a dictionary will be generated in the files with the extensions <color red>.word.vocab</color> and <color red>ID.csv</color>. Then, with the help of Excel, generate two documents with the extension names <color red>.docs</color> and <color red>.corpus</color>.
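
The exact <color red>.docs</color> and <color red>.corpus</color> layouts are defined by the CDTM code; purely as an illustration of the general idea (mapping every token to its index in a vocabulary file), assuming one whitespace-tokenized document per line of a hypothetical <color red>corpus.txt</color>:
<code python>
# Illustrative only: the real file formats are defined by the CDTM implementation.
with open("corpus.txt", encoding="utf-8") as f:
    docs = [line.split() for line in f]

vocab = sorted({w for doc in docs for w in doc})
index = {w: i for i, w in enumerate(vocab)}

with open("corpus.word.vocab", "w", encoding="utf-8") as f:
    f.write("\n".join(vocab))

with open("corpus.docs", "w", encoding="utf-8") as f:
    for doc in docs:
        f.write(" ".join(str(index[w]) for w in doc) + "\n")
</code>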
===== Estimate a CDTM Model =====
To run <color red>CdtmParameterTuning.java</color> in the package <color red>cn.edu.bjut.ui</color>. The perplexity will be obtained for each candidate value combination of the number of common topics, the number of topics specific to scientific publications, and the number of topics specific to patents.
Then, the perplexity values are imported into MATLAB, and <color red>TuneParam.m</color> is run. A figure will be shown for the perplexity with different numbers of topics. By observing this figure, the optimal numbers of common and special topics can be determined. Finally, running <color red>Cdtm.java</color> in the package <color red>cn.edu.bjut.ui</color> produces the final results.
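
If MATLAB is not at hand, a similar perplexity-versus-topics figure can be drawn with matplotlib (a sketch with made-up perplexity values, not the authors' <color red>TuneParam.m</color>):
<code python>
import matplotlib.pyplot as plt

# Hypothetical tuning results: perplexity for several numbers of common topics.
common_topics = [10, 20, 30, 40, 50]
perplexity = [1520.4, 1431.9, 1388.2, 1395.7, 1410.3]  # made-up values

plt.plot(common_topics, perplexity, marker="o")
plt.xlabel("Number of common topics")
plt.ylabel("Perplexity")
plt.title("Perplexity vs. number of common topics")
plt.savefig("perplexity.png", dpi=300)
</code>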
===== Connections amongst Common and Special Topics =====
To run <color red>NetworkConverter.java</color> in the package <color red>cn.edu.bjut.ui</color>. One map file and one network file will be generated. Thus, one can import these two files into the software //VOSviewer//.
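
For orientation, VOSviewer's map file is a delimited table with at least <color red>id</color> and <color red>label</color> columns, and the network file lists pairs of item ids with a link weight. A minimal Python sketch that writes such files from hypothetical topic labels and topic-topic links (the real files are produced by <color red>NetworkConverter.java</color>):
<code python>
import csv

topics = ["common topic 1", "publication-specific topic 1", "patent-specific topic 1"]
links = [(1, 2, 0.35), (1, 3, 0.28)]  # (source id, target id, weight); hypothetical

with open("topic_map.txt", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["id", "label"])
    for i, label in enumerate(topics, start=1):
        writer.writerow([i, label])

with open("topic_network.txt", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    for source, target, weight in links:
        writer.writerow([source, target, weight])
</code>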
~~DISCUSSION:closed~~