This page shows the differences between the selected revision and the current version.
zh:notes:important_citation [2021/02/08 21:04] pzczxs [Applying the ParsCit tool to parse plain-text data]
zh:notes:important_citation [2022/01/27 08:23] (current version) pzczxs discussion status changed
Line 16: | Line 16: | ||
*xlwt | *xlwt | ||
*[[http://www.xpdfreader.com/|XPDF]] | *[[http://www.xpdfreader.com/|XPDF]] | ||
- | *R | + | *[[https://rstudio.com/|R]] |
*[[:zh:notes:install_parscit|ParsCit]] | *[[:zh:notes:install_parscit|ParsCit]] | ||
+ | *[[https://www.mysql.com/|MySQL]] | ||
===== Dataset ===== | ===== Dataset ===== | ||
Line 73: | Line 74: | ||
===== Feature Engineering ===== | ===== Feature Engineering ===== | ||
==== CIM Features ==== | ==== CIM Features ==== | ||
- | Run <color red>CIM.java</color> in the package <color red>cn.edu.bjut.ui</color>; several files will be saved in the directories <color red>data/acl</color> and <color red>data/zhu</color>, respectively. | + | The database SQL files are <color red>acl_subset.sql</color> and <color red>zhu.sql</color>; both share the same table structure. | ||
- | Feed the titles, abstracts, citation relations, and other information of the citing papers and their references into the CIM model to generate the .psi and .symKL files. | + | Import the related information from the file <color red>data/pre_title.xls</color> into the MySQL database by running <color red>AclExcelImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>DataConverter</color>. | ||
+ | |||
+ | Import the related information from the file <color red>data/zhu_data_0924.xlsx</color> into the MySQL database by running <color red>ZhuExcelImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>DataConverter</color>. | ||
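Both importers load spreadsheet rows into MySQL. As a rough illustration of that step only, the Python sketch below inserts already-parsed rows into a table, using the standard-library sqlite3 module as a stand-in for MySQL. The table name paper and its columns paper_id and title are hypothetical; the real schema is the one in acl_subset.sql and zhu.sql. Reading the .xls/.xlsx files themselves (for example with a third-party reader such as xlrd or openpyxl) is left out.

```python
import sqlite3

# Hypothetical schema for illustration only; the real tables are defined in
# acl_subset.sql and zhu.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS paper (
    paper_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL
)
"""


def import_rows(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> int:
    """Insert (paper_id, title) rows, skipping ids that already exist.

    Returns the number of rows actually inserted.
    """
    conn.execute(SCHEMA)
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO paper (paper_id, title) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up rows standing in for data read from a spreadsheet.
    rows = [("P1", "First title"), ("P2", "Second title"), ("P1", "Duplicate id")]
    print(import_rows(conn, rows))  # prints 2: the repeated P1 row is skipped
```

INSERT OR IGNORE is SQLite syntax; against MySQL the closest equivalent is INSERT IGNORE, and the placeholder style depends on the driver.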
+ | |||
+ | Convert the data to the input format of the CIM model by running <color red>ToCIM.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>DataConverter</color>. Use the parameter "data/CIM/acl" for Valenzuela's dataset and "data/CIM/zhu" for Zhu's dataset. | ||
+ | |||
+ | Run <color red>CIM.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>CIM</color>; several files will be saved in the directory <color red>data/acl</color> or <color red>data/zhu</color>, depending on the dataset. Use the parameter "data/acl" for Valenzuela's dataset and "data/zhu" for Zhu's dataset. | ||
+ | |||
+ | Run <color red>FromCIM.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>DataConverter</color>; two files, <color red>.symKL</color> and <color red>.psi</color>, will be saved in the directory <color red>data</color>. Use the parameter "data/CIM/acl" for Valenzuela's dataset and "data/CIM/zhu" for Zhu's dataset. | ||
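The format of the .symKL file is not described here. Assuming, as the extension suggests, that it holds symmetric Kullback-Leibler (KL) divergences between topic distributions of citing and cited papers, the quantity could be computed as in the sketch below. Both this interpretation of the file and the choice of variant (the plain sum of the two KL directions rather than their average) are assumptions, not taken from the CIM project.

```python
import math


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same topics.

    Terms with p_i == 0 contribute nothing; eps keeps q_i == 0 from
    causing a division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p: list[float], q: list[float]) -> float:
    """Symmetric KL divergence (assumed variant): KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    citing = [0.7, 0.2, 0.1]  # made-up topic distribution of a citing paper
    cited = [0.6, 0.3, 0.1]   # made-up topic distribution of a cited paper
    print(round(symmetric_kl(citing, cited), 4))
```

Unlike plain KL, the symmetric form gives the same value whichever paper is passed first, which suits a pairwise citing/cited score.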
==== Structural Features ==== | ==== Structural Features ==== | ||
Line 98: | Line 107: | ||
==== Cue Word Features ==== | ==== Cue Word Features ==== | ||
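Run <color red>cue_words.py</color>. The cue word list is in <color red>./data/cue_words.xls</color>; the script uses regular-expression matching to count the important and unimportant cue words that appear in the citations and exports the counts to an Excel file. | Run <color red>cue_words.py</color>. The cue word list is in <color red>./data/cue_words.xls</color>; the script uses regular-expression matching to count the important and unimportant cue words that appear in the citations and exports the counts to an Excel file. | ||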
- | <code shell> | + | <code bash> |
> python ./cue_words.py | > python ./cue_words.py | ||
</code> | </code> | ||
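The regular-expression counting can be sketched as follows. This is not the code of cue_words.py: the cue words are made-up placeholders for the entries in ./data/cue_words.xls, and exporting the counts to Excel (for example with xlwt, which is listed as a dependency) is omitted.

```python
import re

# Made-up placeholder cue words; the real lists come from ./data/cue_words.xls.
IMPORTANT_CUES = ["extend", "based on", "follow"]
UNIMPORTANT_CUES = ["see also", "e.g.", "for example"]


def compile_cues(cues: list[str]) -> re.Pattern:
    """Build one case-insensitive regex that matches any cue as a whole unit.

    Lookarounds are used instead of word boundaries so that cues ending in
    punctuation (such as "e.g.") still match; longer cues are tried first.
    """
    alternatives = "|".join(re.escape(c) for c in sorted(cues, key=len, reverse=True))
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


def count_cues(citation: str, pattern: re.Pattern) -> int:
    """Count non-overlapping cue matches in one citation context."""
    return len(pattern.findall(citation))


if __name__ == "__main__":
    important = compile_cues(IMPORTANT_CUES)
    unimportant = compile_cues(UNIMPORTANT_CUES)
    text = "We extend the model of [1], based on [2]; see also [3], e.g. [4]."
    print(count_cues(text, important), count_cues(text, unimportant))  # 2 2
```

Compiling each list into a single alternation keeps the scan to one pass per citation, and the lookarounds stop "extend" from matching inside "extended".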
Line 137: | Line 146: | ||
> stats.ttest_rel(G1,G2) | > stats.ttest_rel(G1,G2) | ||
</code> | </code> | ||
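The statistic returned by stats.ttest_rel(G1, G2) can also be computed with the standard library alone, which makes explicit what the test compares: the mean of the paired differences relative to its standard error. The sketch assumes G1 and G2 are equal-length sequences of paired measurements and returns only the t statistic, not the p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(g1: list[float], g2: list[float]) -> float:
    """t statistic of a paired (related-samples) t-test on the differences g1 - g2.

    This is the statistic that scipy.stats.ttest_rel(g1, g2) reports; the
    p-value additionally needs the t distribution with n - 1 degrees of freedom.
    Raises ZeroDivisionError if all differences are identical.
    """
    if len(g1) != len(g2) or len(g1) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(g1, g2)]
    # Sample standard deviation (n - 1 denominator), as in ttest_rel.
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    G1 = [0.81, 0.79, 0.84, 0.80]  # made-up paired scores
    G2 = [0.78, 0.77, 0.80, 0.79]
    print(paired_t_statistic(G1, G2))
```

A paired test fits when both groups are measured on the same items, since it removes the per-item variation that an unpaired test would treat as noise.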
+ | |||
+ | ~~DISCUSSION:closed~~ |