zh:notes:important_citation

Differences

This page shows the differences between the revision you selected and the current version of the page.

zh:notes:important_citation [2021/02/08 21:03]
pzczxs [Parsing the text-format data with the ParsCit tool]
zh:notes:important_citation [2022/01/27 08:23] (current version)
pzczxs discussion status changed
Line 16 (old), line 16 (new):
     *xlwt
   *[[http://www.xpdfreader.com/|XPDF]]
-  *R
+  *[[https://rstudio.com/|R]]
   *[[:zh:notes:install_parscit|ParsCit]]
+  *[[https://www.mysql.com/|MySQL]]

 ===== Dataset =====
Line 50 (old), line 51 (new):
 <code bash>
 > cd ParsCit
+> ./run_parscit.sh valenzuela_txt
 > ./run_parscit.sh zhu_txt
 </code>
Line 72 (old), line 74 (new):
 ===== Feature Engineering =====
 ==== CIM Features ====
-Running <color red>CIM.java</color> in the package <color red>cn.edu.bjut.ui</color> saves several files in the directories <color red>data/acl</color> and <color red>data/zhu</color>, respectively
+The database SQL files are <color red>acl_subset.sql</color> and <color red>zhu.sql</color>. Both files share the same table structure.

-Feed the titles, abstracts, and citation relations of the citing papers and their references into the CIM model to generate the .psi and .symKL files
+To import the related information from <color red>data/pre_title.xls</color> into the MySQL database, run <color red>AclExcelImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>DataConverter</color>.
+
+To import the related information from <color red>data/zhu_data_0924.xlsx</color> into the MySQL database, run <color red>ZhuExcelImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>DataConverter</color>.
+
+To convert the data to the input format of the CIM model, run <color red>ToCIM.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>DataConverter</color>. Note that the parameter is "data/CIM/acl" for Valenzuela's dataset and "data/CIM/zhu" for Zhu's dataset.
+
+Running <color red>CIM.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>CIM</color> saves several files in the directories <color red>data/acl</color> and <color red>data/zhu</color>, respectively. Note that the parameter is "data/acl" for Valenzuela's dataset and "data/zhu" for Zhu's dataset.
+
+Running <color red>FromCIM.java</color> in the package <color red>cn.edu.bjut.ui</color> of the project <color red>DataConverter</color> saves two files, <color red>.symKL</color> and <color red>.psi</color>, in the directory <color red>data</color>. Note that the parameter is "data/CIM/acl" for Valenzuela's dataset and "data/CIM/zhu" for Zhu's dataset.

 ==== Structural Features ====
Line 97 (old), line 107 (new):
 ==== Cue Word Features ====
 Run <color red>cue_words.py</color>. The cue-word list is in <color red>./data/cue_words.xls</color>. The script uses regular-expression matching to count the important and unimportant cue words that occur in each citation, and exports the counts to an Excel file.
-<code shell>
+<code bash>
 > python ./cue_words.py
 </code>
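The cue-word step above counts regular-expression matches per citation. The following is a minimal sketch of that idea, not the actual contents of cue_words.py: the two cue lists, the helper names `compile_cues` and `count_cues`, and the sample sentence are all hypothetical (the real list is read from ./data/cue_words.xls), and the Excel export is left out.

```python
import re

# Hypothetical cue words for illustration only; the real lists are stored
# in ./data/cue_words.xls and are not reproduced here.
IMPORTANT_CUES = ["based on", "extend", "following"]
UNIMPORTANT_CUES = ["such as", "for example", "e.g."]


def compile_cues(cues):
    """Build one case-insensitive pattern that matches any cue as a whole unit."""
    # Longest cues first, so a multi-word cue wins over a shorter overlapping one.
    ordered = sorted(cues, key=len, reverse=True)
    alternation = "|".join(re.escape(cue) for cue in ordered)
    # Lookarounds are used instead of \b because a cue such as "e.g." ends
    # in a non-word character, where \b would not behave as intended.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


IMPORTANT_RE = compile_cues(IMPORTANT_CUES)
UNIMPORTANT_RE = compile_cues(UNIMPORTANT_CUES)


def count_cues(citation_sentence):
    """Return (important_count, unimportant_count) for one citation sentence."""
    important = len(IMPORTANT_RE.findall(citation_sentence))
    unimportant = len(UNIMPORTANT_RE.findall(citation_sentence))
    return important, unimportant


if __name__ == "__main__":
    sentence = "Our method is based on [3] and we extend it, following e.g. [5]."
    print(count_cues(sentence))
```

Matching whole units means that "extends" does not count as a hit for the cue "extend"; whether inflected forms should count depends on how the real cue list is written.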
Line 136 (old), line 146 (new):
 > stats.ttest_rel(G1,G2)
 </code>
+
+~~DISCUSSION:closed~~
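For readers who want to see what the paired test in the last hunk computes, here is a minimal standard-library sketch of the paired t statistic behind a call such as `stats.ttest_rel(G1, G2)`. The function name `paired_t_statistic` and the sample values are hypothetical. The p-value is omitted, because it requires the t-distribution CDF that SciPy provides.

```python
import math
import statistics


def paired_t_statistic(g1, g2):
    """Return (t, degrees_of_freedom) for two paired samples of equal length."""
    if len(g1) != len(g2):
        raise ValueError("paired samples must have the same length")
    if len(g1) < 2:
        raise ValueError("at least two pairs are required")
    # The paired test works on the per-pair differences.
    diffs = [a - b for a, b in zip(g1, g2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation (n - 1 in the denominator).
    # If all differences are identical this is 0 and the division below fails.
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical paired scores, e.g. one value per fold for two feature sets.
    G1 = [2.0, 4.0, 6.0]
    G2 = [1.0, 2.0, 3.0]
    t_stat, dof = paired_t_statistic(G1, G2)
    print(t_stat, dof)
```

A positive statistic means the first sample is larger on average; significance still has to be read from the t distribution with the returned degrees of freedom.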
zh/notes/important_citation.1612789425.txt.gz · Last modified: 2021/02/08 21:03 by pzczxs