This page shows the differences between the selected revision and the current version.
zh:notes:electric_power [2024/04/04 18:56] pzczxs [Update Technologies] |
zh:notes:electric_power [2024/05/19 22:08] (current version) pzczxs [Attributes Extraction] |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Data Construction for Electric Power Domain ====== | ====== Data Construction for Electric Power Domain ====== | ||
===== Requirements ===== | ===== Requirements ===== | ||
- | *[[https://opennlp.apache.org/|OpenNLP 1.9.0]] | + | *[[https://opennlp.apache.org/|OpenNLP 1.9.2]] |
*[[http://bioc.sourceforge.net/|BioC]] | *[[http://bioc.sourceforge.net/|BioC]] | ||
===== Data Sources ===== | ===== Data Sources ===== | ||
Line 68: | Line 68: | ||
Funding records can be merged by grant number by running <color red>FundingMerger.java</color> in the package <color red>cn.edu.bjut.ui</color>. | Funding records can be merged by grant number by running <color red>FundingMerger.java</color> in the package <color red>cn.edu.bjut.ui</color>. | ||
+ | |||
+ | <code bash> | ||
+ | > nohup ./merge-funding.sh > merge-funding.log 2>&1 | ||
+ | </code> | ||
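To make "merged by grant number" concrete, here is a minimal Java sketch. It is not taken from <color red>FundingMerger.java</color>: the class `FundingMergeSketch`, the `Funding` record layout, and the trim-and-upper-case normalization are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from FundingMerger.java.
final class FundingMergeSketch {

    // Assumed record layout: one grant number plus the agency name variants seen with it.
    static final class Funding {
        final String grantNumber;
        final Set<String> agencies = new LinkedHashSet<>();

        Funding(String grantNumber, String... agencies) {
            this.grantNumber = grantNumber;
            for (String agency : agencies) {
                this.agencies.add(agency);
            }
        }
    }

    // Assumed normalization: surrounding whitespace and letter case are ignored.
    static String normalize(String grantNumber) {
        return grantNumber.trim().toUpperCase(Locale.ROOT);
    }

    // Groups records that share a normalized grant number and unions their agency names.
    // Output order follows the first occurrence of each grant number.
    static List<Funding> mergeByGrantNumber(List<Funding> records) {
        Map<String, Funding> merged = new LinkedHashMap<>();
        for (Funding record : records) {
            String key = normalize(record.grantNumber);
            Funding target = merged.computeIfAbsent(key, k -> new Funding(k));
            target.agencies.addAll(record.agencies);
        }
        return new ArrayList<>(merged.values());
    }
}
```

In this sketch, two records whose grant numbers differ only in whitespace or case collapse into one record that keeps every agency name variant. The actual merge rule used by the project may be stricter or looser.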
==== Update Keyword ==== | ==== Update Keyword ==== | ||
Line 214: | Line 218: | ||
<code bash> | <code bash> | ||
> nohup ./update-pct-flag.sh > update-pct-flag.log 2>&1 | > nohup ./update-pct-flag.sh > update-pct-flag.log 2>&1 | ||
+ | </code> | ||
+ | |||
+ | ==== Update Technologies ==== | ||
+ | One can run <color red>PatentTechnologyUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color> to update the technologies. | ||
+ | |||
+ | <code bash> | ||
+ | > nohup ./update-patent-technology.sh 2 ../dataset/Derwent/technologies > update-patent-technology.log 2>&1 | ||
</code> | </code> | ||
===== Converter ===== | ===== Converter ===== | ||
Line 247: | Line 258: | ||
</code> | </code> | ||
- | ==== Export for Novelty Indicator ==== | + | ==== Calculate Novelty Indicator ==== |
The journal of each cited article can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. Likewise, the IPC codes of each cited patent can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. The novelty indicator of each document can then be calculated by following [[https://doi.org/10.1126/science.1240474 | Uzzi et al. (2013)]]. | The journal of each cited article can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. Likewise, the IPC codes of each cited patent can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. The novelty indicator of each document can then be calculated by following [[https://doi.org/10.1126/science.1240474 | Uzzi et al. (2013)]]. | ||
+ | <code bash> | ||
+ | > nohup ./convert-wos-novelty.sh ../dataset/WoS/novelty/20240421/paper-citations.txt > convert-wos-novelty.log 2>&1 | ||
+ | > nohup ./convert-derwent-novelty.sh ../dataset/Derwent/novelty/20240421/patent-citations.txt > convert-derwent-novelty.log 2>&1 | ||
+ | </code> | ||
+ | |||
+ | <code sql> | ||
+ | > TRUNCATE article_novelty; | ||
+ | > TRUNCATE patent_novelty; | ||
+ | </code> | ||
+ | |||
+ | <code bash> | ||
+ | > nohup ./import-article-novelty.sh ../dataset/WoS/novelty/20240421/paper_novelty.txt > import-article-novelty.log 2>&1 | ||
+ | > nohup ./import-patent-novelty.sh ../dataset/Derwent/novelty/20240421/patent_novelty.txt > import-patent-novelty.log 2>&1 | ||
+ | </code> | ||
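For orientation, the per-document step of Uzzi et al. (2013) can be sketched as follows. Each pair of cited journals (or IPC codes, for patents) gets a z-score comparing its observed co-citation count with the mean and standard deviation obtained from randomized citation networks. A document is then summarized by the 10th percentile of its pair z-scores (novelty) and by their median (conventionality). The sketch assumes the pair z-scores are already available. The class `NoveltySketch`, its method names, and the nearest-rank percentile convention are illustrative assumptions, not the project's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the per-document summary; not the project's implementation.
final class NoveltySketch {

    // z-score of one pair: observed co-citation count against the null-model mean and std.
    static double zScore(double observed, double expectedMean, double expectedStd) {
        return (observed - expectedMean) / expectedStd;
    }

    // Order-independent key for an unordered pair of journals (or IPC codes).
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "\t" + b : b + "\t" + a;
    }

    // Nearest-rank percentile over an ascending list (assumed convention), p in (0, 100].
    static double percentile(List<Double> sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // Returns {10th percentile z, median z} over all pairs in one reference list,
    // or null when no pair has a known z-score.
    static double[] summarize(List<String> cited, Map<String, Double> pairZ) {
        List<Double> zs = new ArrayList<>();
        for (int i = 0; i < cited.size(); i++) {
            for (int j = i + 1; j < cited.size(); j++) {
                Double z = pairZ.get(pairKey(cited.get(i), cited.get(j)));
                if (z != null) {
                    zs.add(z);
                }
            }
        }
        if (zs.isEmpty()) {
            return null;
        }
        Collections.sort(zs);
        return new double[] { percentile(zs, 10), percentile(zs, 50) };
    }
}
```

A low (negative) 10th percentile indicates that the document combines sources that are rarely cited together, while a high median indicates that most of its combinations are conventional. How the project's scripts build the null model and handle pairs with zero variance is not shown here.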
==== Extract Citation Network ==== | ==== Extract Citation Network ==== | ||
The citation network for articles can be extracted by running <color red>WoSCitationNetworkExtractor.java</color> in the package <color red>cn.edu.bjut.converter</color>. | The citation network for articles can be extracted by running <color red>WoSCitationNetworkExtractor.java</color> in the package <color red>cn.edu.bjut.converter</color>. | ||
Line 266: | Line 291: | ||
One can run <color red>IndicatorCalculator.java</color> in the package <color red>cn.edu.bjut.ui</color> to calculate all indicators ([[https://doi.org/10.1016/j.techfore.2020.120366|Xu et al., 2021]]). | One can run <color red>IndicatorCalculator.java</color> in the package <color red>cn.edu.bjut.ui</color> to calculate all indicators ([[https://doi.org/10.1016/j.techfore.2020.120366|Xu et al., 2021]]). | ||
+ | |||
+ | ===== Attributes Extraction ===== | ||
+ | <code bash> | ||
+ | > .\opennlp TokenNameFinderTrainer.brat -nameTypes ATTRIBUTE,VALUE -lang en -model en-ElectronicPower-WoS-attributes.bin -annotationConfig annotation.conf -bratDataDir ElectronicPower-WoS-Train -ruleBasedTokenizer simple -sentenceDetectorModel en-sent.bin
+ | </code> | ||
~~DISCUSSION:closed~~ | ~~DISCUSSION:closed~~ |