The differences between the selected revision and the current version are shown below.
zh:notes:electric_power [2024/05/19 21:49] pzczxs [Indicator Calculator] |
zh:notes:electric_power [2025/07/23 15:11] (current version) pzczxs [Merge Journals] |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Data Construction for Electric Power Domain ====== | ====== Data Construction for Electric Power Domain ====== | ||
===== Requirements ===== | ===== Requirements ===== | ||
- | *[[https://opennlp.apache.org/|OpenNLP 1.9.0]] | + | *[[https://opennlp.apache.org/|OpenNLP 1.9.2]] |
*[[http://bioc.sourceforge.net/|BioC]] | *[[http://bioc.sourceforge.net/|BioC]] | ||
===== Data Sources ===== | ===== Data Sources ===== | ||
Line 77: | Line 77: | ||
<code bash> | <code bash> | ||
- | > nohup ./import-wos-keyword.sh 0 ../dataset/WoS/papers > import-wos-keyword.log 2>&1 | + | > nohup ./import-wos-keyword.sh ../dataset/WoS/papers > import-wos-keyword.log 2>&1 |
</code> | </code> | ||
==== Merge Cited Articles ==== | ==== Merge Cited Articles ==== | ||
Line 125: | Line 125: | ||
The journals can be merged by running <color red>JournalMerger.java</color> in the package <color red>cn.edu.bjut.ui</color> according to ISSN, EISSN and ISBN. | The journals can be merged by running <color red>JournalMerger.java</color> in the package <color red>cn.edu.bjut.ui</color> according to ISSN, EISSN and ISBN. | ||
+ | <code bash> | ||
+ | > nohup ./merge-journal.sh > merge-journal.log 2>&1 | ||
+ | </code> | ||
==== Update Country ==== | ==== Update Country ==== | ||
<code sql> | <code sql> | ||
Line 294: | Line 297: | ||
===== Attributes Extraction ===== | ===== Attributes Extraction ===== | ||
<code bash> | <code bash> | ||
- | > .\opennlp TokenNameFinderTrainer.brat -nameTypes TECHNOLOGY,VALUE -lang en -model en-ElectronicPower-WoS-attributes.bin -annotationConfig annotation.conf -bratDataDir ElectronicPower-WoS-Train -ruleBasedTokenizer simple -sentenceDetectorModel en-sent.bin | + | > .\opennlp TokenNameFinderTrainer.brat -nameTypes ATTRIBUTE,VALUE -lang en -model en-ElectronicPower-WoS-attributes.bin -annotationConfig annotation.conf -bratDataDir ElectronicPower-WoS-Train -ruleBasedTokenizer simple -sentenceDetectorModel en-sent.bin |
</code> | </code> | ||
~~DISCUSSION:closed~~ | ~~DISCUSSION:closed~~ |
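
The Merge Journals step above groups journal records by ISSN, EISSN and ISBN. The source of JournalMerger.java is not shown on this page, so the following is only a minimal sketch of one way such a merge could work, not the actual implementation. The class name JournalMergeSketch, the Journal record and its fields, and the rule that an ISSN in one record may match an EISSN in another are all assumptions made for illustration. Records that share any non-empty identifier are placed in the same group using a small union-find structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is NOT cn.edu.bjut.ui.JournalMerger.
public class JournalMergeSketch {

    // Hypothetical record type; the field names are assumptions.
    public static final class Journal {
        final String title, issn, eissn, isbn;

        public Journal(String title, String issn, String eissn, String isbn) {
            this.title = title;
            this.issn = issn;
            this.eissn = eissn;
            this.isbn = isbn;
        }
    }

    // Union-find lookup with path halving.
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    // Normalizes an identifier: trims, upper-cases, drops hyphens.
    // Returns null for missing or empty values so they never match.
    private static String key(String type, String value) {
        if (value == null) {
            return null;
        }
        String v = value.trim().toUpperCase().replace("-", "");
        return v.isEmpty() ? null : type + ":" + v;
    }

    /** Groups journal titles whose records share an ISSN, EISSN or ISBN. */
    public static List<List<String>> merge(List<Journal> journals) {
        int n = journals.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }

        // Maps each normalized identifier to the first record that used it.
        Map<String, Integer> firstSeen = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Journal j = journals.get(i);
            // Assumption: ISSN and EISSN share one namespace.
            String[] keys = {key("ISSN", j.issn), key("ISSN", j.eissn), key("ISBN", j.isbn)};
            for (String k : keys) {
                if (k == null) {
                    continue;
                }
                Integer prev = firstSeen.putIfAbsent(k, i);
                if (prev != null) {
                    parent[find(parent, i)] = find(parent, prev);
                }
            }
        }

        // Collect titles per group, in order of first appearance.
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), r -> new ArrayList<>())
                  .add(journals.get(i).title);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<Journal> sample = Arrays.asList(
            new Journal("A", "1234-5678", null, null),
            new Journal("B", null, "12345678", null),
            new Journal("C", "0000-0001", null, null));
        System.out.println(merge(sample));
    }
}
```

Union-find keeps the grouping transitive: if record A shares an ISSN with B and B shares an ISBN with C, all three end up in one group even though A and C have no identifier in common.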