This page shows the differences between the revision you selected and the current version.
zh:notes:electric_power [2024/04/21 19:06] pzczxs [Export for Novelty Indicator] |
zh:notes:electric_power [2024/05/19 22:08] (current version) pzczxs [Attributes Extraction] |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Data Construction for Electric Power Domain ====== | ====== Data Construction for Electric Power Domain ====== | ||
===== Requirements ===== | ===== Requirements ===== | ||
- | *[[https://opennlp.apache.org/|OpenNLP 1.9.0]] | + | *[[https://opennlp.apache.org/|OpenNLP 1.9.2]] |
*[[http://bioc.sourceforge.net/|BioC]] | *[[http://bioc.sourceforge.net/|BioC]] | ||
===== Data Sources ===== | ===== Data Sources ===== | ||
Line 258: | Line 258: | ||
</code> | </code> | ||
- | ==== Export for Novelty Indicator ==== | + | ==== Calculate Novelty Indicator ==== |
The journal of each cited article can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. Likewise, the IPC codes of each cited patent can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. The novelty indicator of each document can then be calculated following [[https://doi.org/10.1126/science.1240474 | Uzzi et al. (2013)]]. | The journal of each cited article can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. Likewise, the IPC codes of each cited patent can be exported by running <color red>NoveltyConvertor.java</color> in the package <color red>cn.edu.bjut.runner</color>. The novelty indicator of each document can then be calculated following [[https://doi.org/10.1126/science.1240474 | Uzzi et al. (2013)]]. | ||
<code bash> | <code bash> | ||
> nohup ./convert-wos-novelty.sh ../dataset/WoS/novelty/20240421/paper-citations.txt > convert-wos-novelty.log 2>&1 | > nohup ./convert-wos-novelty.sh ../dataset/WoS/novelty/20240421/paper-citations.txt > convert-wos-novelty.log 2>&1 | ||
+ | > nohup ./convert-derwent-novelty.sh ../dataset/Derwent/novelty/20240421/patent-citations.txt > convert-derwent-novelty.log 2>&1 | ||
+ | </code> | ||
+ | |||
+ | <code sql> | ||
+ | > TRUNCATE article_novelty; | ||
+ | > TRUNCATE patent_novelty; | ||
+ | </code> | ||
+ | |||
+ | <code bash> | ||
+ | > nohup ./import-article-novelty.sh ../dataset/WoS/novelty/20240421/paper_novelty.txt > import-article-novelty.log 2>&1 | ||
+ | > nohup ./import-patent-novelty.sh ../dataset/Derwent/novelty/20240421/patent_novelty.txt > import-patent-novelty.log 2>&1 | ||
</code> | </code> | ||
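As a rough illustration of the final step in Uzzi et al. (2013), the sketch below turns the exported journals of one document's cited articles into journal pairs and summarizes pair-level z-scores into a median (conventionality) and a 10th percentile (novelty). It is a minimal sketch, not the project's implementation: the class ''NoveltySketch'' and its method names are hypothetical, the observed, expected, and sigma values in ''main'' are invented for illustration, and the randomized citation networks that would supply the expected frequency and its standard deviation are not shown. It also assumes that only distinct journals are paired and that percentiles use linear interpolation, which may differ from the conventions used in practice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class NoveltySketch {
    /** Canonical key for an unordered journal pair (hypothetical format). */
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    /** All unordered pairs of distinct journals in one reference list. */
    static List<String> journalPairs(List<String> citedJournals) {
        // TreeSet removes duplicates and sorts, so pair keys come out in a stable order.
        List<String> journals = new ArrayList<>(new TreeSet<>(citedJournals));
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < journals.size(); i++) {
            for (int j = i + 1; j < journals.size(); j++) {
                pairs.add(pairKey(journals.get(i), journals.get(j)));
            }
        }
        return pairs;
    }

    /** z = (observed - expected) / sigma; expected and sigma come from a null model. */
    static double zScore(double observed, double expected, double sigma) {
        return (observed - expected) / sigma;
    }

    /** Percentile with linear interpolation between order statistics, p in [0, 1]. */
    static double percentile(List<Double> values, double p) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        double rank = p * (sorted.size() - 1);
        int lo = (int) Math.floor(rank);
        int hi = (int) Math.ceil(rank);
        return sorted.get(lo) + (rank - lo) * (sorted.get(hi) - sorted.get(lo));
    }

    public static void main(String[] args) {
        // Journals of the articles cited by one document (illustrative values).
        List<String> cited = Arrays.asList("Energy", "Applied Energy", "Energy", "Nature");
        List<String> pairs = journalPairs(cited);

        // Invented (observed, expected, sigma) triples, one per journal pair.
        double[][] stats = { {12, 6, 2}, {3, 5, 1}, {1, 4, 1.5} };
        List<Double> z = new ArrayList<>();
        for (double[] s : stats) {
            z.add(zScore(s[0], s[1], s[2]));
        }

        System.out.println(pairs);
        System.out.println("conventionality (median z) = " + percentile(z, 0.5));
        System.out.println("novelty (10th percentile z) = " + percentile(z, 0.1));
    }
}
```

The same summary would apply to patents by pairing the IPC codes of cited patents instead of journals.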
==== Extract Citation Network ==== | ==== Extract Citation Network ==== | ||
Line 280: | Line 291: | ||
One can run <color red>IndicatorCalculator.java</color> in the package <color red>cn.edu.bjut.ui</color> to calculate all indicators ([[https://doi.org/10.1016/j.techfore.2020.120366|Xu et al., 2021]]). | One can run <color red>IndicatorCalculator.java</color> in the package <color red>cn.edu.bjut.ui</color> to calculate all indicators ([[https://doi.org/10.1016/j.techfore.2020.120366|Xu et al., 2021]]). | ||
+ | |||
+ | ===== Attributes Extraction ===== | ||
+ | <code bash> | ||
+ | > .\opennlp TokenNameFinderTrainer.brat -nameTypes ATTRIBUTE,VALUE -lang en -model en-ElectronicPower-WoS-attributes.bin -annotationConfig annotation.conf -bratDataDir ElectronicPower-WoS-Train -ruleBasedTokenizer simple -sentenceDetectorModel en-sent.bin | ||
+ | </code> | ||
~~DISCUSSION:closed~~ | ~~DISCUSSION:closed~~ |