zh:notes:electric_power

Differences

This page shows the differences between the revision you selected and the current version.

zh:notes:electric_power [2024/04/03 21:18]
pzczxs [Update Cited Patents with Target Ones]
zh:notes:electric_power [2024/05/19 22:08] (current version)
pzczxs [Attributes Extraction]
Line 1: Line 1:
 ====== Data Construction for Electric Power Domain ======
 ===== Requirements =====
-  *[[https://opennlp.apache.org/|OpenNLP 1.9.0]]
+  *[[https://opennlp.apache.org/|OpenNLP 1.9.2]]
   *[[http://bioc.sourceforge.net/|BioC]]
 ===== Data Sources =====
Line 68: Line 68:
  
 Funding records can be merged by grant number by running <color red>FundingMerger.java</color> in the package <color red>cn.edu.bjut.ui</color>.
 +
 +<code bash>
 +> nohup ./merge-funding.sh > merge-funding.log 2>&1
 +</code>
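
Merging by grant number can be sketched as follows. This is a minimal illustration, not the project's <color red>FundingMerger.java</color>: the class name ''FundingMergeSketch'', the ''Funding'' fields, and the normalization rule (trim, drop whitespace, upper-case) are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the real FundingMerger.java may work differently.
final class FundingMergeSketch {

    /** One funding acknowledgement: a grant number and the agency name as written. */
    static final class Funding {
        final String grantNumber;
        final String agency;

        Funding(String grantNumber, String agency) {
            this.grantNumber = grantNumber;
            this.agency = agency;
        }
    }

    /** Assumed normalization: trim, remove inner whitespace, upper-case. */
    static String normalize(String grantNumber) {
        return grantNumber.trim().replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }

    /** Collects the agency-name variants seen for each normalized grant number. */
    static Map<String, Set<String>> mergeByGrantNumber(List<Funding> fundings) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Funding f : fundings) {
            merged.computeIfAbsent(normalize(f.grantNumber), k -> new LinkedHashSet<>())
                  .add(f.agency);
        }
        return merged;
    }
}
```

Records whose grant numbers differ only in case or spacing end up under one key, so each grant is stored once with all of its agency-name variants.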
  
 ==== Update Keyword ====
 <color red>KeywordUpdater.java</color>
 +
 +<code bash>
 +> nohup ./import-wos-keyword.sh 0 ../dataset/WoS/papers > import-wos-keyword.log 2>&1
 +</code>
 ==== Merge Cited Articles ====
 <code sql>
Line 153: Line 161:
  
 <code bash>
-> nohup ./update-article-technology.sh -l 0 > update-article-technology.log 2>&1
+> nohup ./update-article-technology.sh 0 > update-article-technology.log 2>&1
 </code>
 ===== Derwent Innovation Index =====
Line 175: Line 183:
 <code bash>
 > nohup ./update-patent-cited-patent.sh > patent-cited-patent.log 2>&1
-> nohup ./run-update-by-log.sh > patent-cited-patent-log.log 2>&1
+> nohup ./update-patent-cited-patent-by-log.sh > update-patent-cited-patent-by-log.log 2>&1
 </code>
 ==== Download Cited Patents ====
Line 210: Line 218:
 <code bash>
 > nohup ./update-pct-flag.sh > update-pct-flag.log 2>&1
 +</code>
 +
 +==== Update Technologies ====
 +One can run <color red>PatentTechnologyUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color> to update the technologies.
 +
 +<code bash>
 +> nohup ./update-patent-technology.sh 2 ../dataset/Derwent/technologies > update-patent-technology.log 2>&1
 </code>
 ===== Converter =====
Line 221: Line 236:
  
 The target articles and patents can be exported by running <color red>ToTextConvertor.java</color> in the package <color red>cn.edu.bjut.converter</color>. Then, one can extract the abbreviations from the titles and abstracts with the approach of [[https://doi.org/10.1142/9789812776303_0042|Schwartz and Hearst (2003)]] by running <color red>AbbreviationExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color>.
 +
 +<code bash>
 +> nohup ./convert-wos-text.sh > convert-wos-text.log 2>&1
 +> nohup ./convert-derwent-text.sh > convert-derwent-text.log 2>&1
 +</code>
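
The abbreviation step relies on the long-form matching procedure described by Schwartz and Hearst (2003). The sketch below follows that published procedure: it scans the short form and the candidate text from right to left and requires the first character of the short form to start a word. The class name ''SchwartzHearstSketch'' is hypothetical, the candidate-selection and validity filters from the paper are omitted, and nothing here is taken from <color red>AbbreviationExtractor.java</color> itself.

```java
// Sketch of the long-form matching step from Schwartz and Hearst (2003).
// Hypothetical class name; not the project's AbbreviationExtractor.java.
final class SchwartzHearstSketch {

    /**
     * Returns the shortest suffix of longForm (cut at a word boundary) whose
     * characters cover shortForm in order, or null if no such suffix exists.
     */
    static String findBestLongForm(String shortForm, String longForm) {
        int lIndex = longForm.length() - 1;
        for (int sIndex = shortForm.length() - 1; sIndex >= 0; sIndex--) {
            char c = Character.toLowerCase(shortForm.charAt(sIndex));
            if (!Character.isLetterOrDigit(c)) {
                continue; // punctuation in the short form is ignored
            }
            // Move left until the character matches; the first short-form
            // character must additionally sit at the start of a word.
            while ((lIndex >= 0 && Character.toLowerCase(longForm.charAt(lIndex)) != c)
                    || (sIndex == 0 && lIndex > 0
                        && Character.isLetterOrDigit(longForm.charAt(lIndex - 1)))) {
                lIndex--;
            }
            if (lIndex < 0) {
                return null; // ran out of text before matching every character
            }
            lIndex--;
        }
        // Cut the long form at the beginning of the word that holds the first match.
        int start = longForm.lastIndexOf(' ', lIndex) + 1;
        return longForm.substring(start);
    }

    public static void main(String[] args) {
        // prints "high voltage direct current"
        System.out.println(findBestLongForm("HVDC", "use of high voltage direct current"));
    }
}
```

In practice the text passed as ''longForm'' is the window of words preceding the parenthesized short form, so the method only decides where inside that window the definition starts.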
  
 ==== Export for TNG model ====
Line 232: Line 252:
 The target articles/patents can be exported for [[https://doi.org/10.1109/ICDM.2007.86|TNG (Topic N-Gram) model]] by running <color red>ToTNGConvertor.java</color> in the package <color red>cn.edu.bjut.converter</color>.
  
-==== Export for Novelty Indicator ====
 +<code bash>
 +> nohup ./convert-wos-bioc.sh > convert-wos-bioc.log 2>&1
 +> nohup ./convert-derwent-bioc.sh > convert-derwent-bioc.log 2>&1
 +> nohup ./convert-tng.sh > convert-tng.log 2>&1
 +</code>
 +
 +==== Calculate Novelty Indicator ====
 The journal of each cited article and the IPC codes of each cited patent can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. The novelty indicator of each document can then be calculated by following [[https://doi.org/10.1126/science.1240474 | Uzzi et al. (2013)]].
  
 +<code bash>
 +> nohup ./convert-wos-novelty.sh ../dataset/WoS/novelty/20240421/paper-citations.txt > convert-wos-novelty.log 2>&1
 +> nohup ./convert-derwent-novelty.sh ../dataset/Derwent/novelty/20240421/patent-citations.txt > convert-derwent-novelty.log 2>&1
 +</code>
 +
 +<code sql>
 +> TRUNCATE article_novelty;
 +> TRUNCATE patent_novelty;
 +</code>
 +
 +<code bash>
 +> nohup ./import-article-novelty.sh ../dataset/WoS/novelty/20240421/paper_novelty.txt > import-article-novelty.log 2>&1
 +> nohup ./import-patent-novelty.sh ../dataset/Derwent/novelty/20240421/patent_novelty.txt > import-patent-novelty.log 2>&1
 +</code>
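
In Uzzi et al. (2013), each pair of co-cited journals receives a z-score that compares its observed frequency with the frequencies found in randomized citation networks, and a document is then summarized by the median and the 10th percentile of its pairs' z-scores. The sketch below illustrates only that arithmetic, under stated assumptions: the class name ''NoveltyZScoreSketch'' is hypothetical, the population standard deviation and the nearest-rank percentile are choices made for the example, and the randomization of the citation network is not shown. For patents, the same computation would apply to IPC code pairs instead of journal pairs.

```java
import java.util.Arrays;

// Hypothetical sketch of the z-score arithmetic; not the project's code.
final class NoveltyZScoreSketch {

    /**
     * z = (observed - mean) / std, where mean and std are taken over the pair's
     * counts in the randomized networks. Uses the population standard deviation
     * (an assumption). The result is NaN or infinite if all counts are equal.
     */
    static double zScore(double observed, double[] simulated) {
        double mean = Arrays.stream(simulated).average().orElse(Double.NaN);
        double variance = Arrays.stream(simulated)
                .map(x -> (x - mean) * (x - mean))
                .average()
                .orElse(Double.NaN);
        return (observed - mean) / Math.sqrt(variance);
    }

    /** Nearest-rank percentile (an assumption); values must be non-empty, p in 1..100. */
    static double percentile(double[] values, int p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

For one document, one would call ''zScore'' for every pair in its reference list and then take ''percentile(z, 50)'' and ''percentile(z, 10)'' as the two summary values.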
 ==== Extract Citation Network ====
 The citation network for articles can be extracted by running <color red>WoSCitationNetworkExtractor.java</color> in the package <color red>cn.edu.bjut.converter</color>.
Line 251: Line 291:
  
 One can run <color red>IndicatorCalculator.java</color> in the package <color red>cn.edu.bjut.ui</color> to calculate all indicators ([[https://doi.org/10.1016/j.techfore.2020.120366|Xu et al., 2021]]).
 +
 +===== Attributes Extraction =====
 +<code bash>
 +> .\opennlp TokenNameFinderTrainer.brat -nameTypes ATTRIBUTE,VALUE -lang en -model en-ElectronicPower-WoS-attributes.bin -annotationConfig annotation.conf -bratDataDir ElectronicPower-WoS-Train -ruleBasedTokenizer simple -sentenceDetectorModel en-sent.bin
 +</code>
  
 ~~DISCUSSION:closed~~
zh/notes/electric_power.1712150309.txt.gz · Last modified: 2024/04/03 21:18 by pzczxs