zh:notes:multi-label-covid

Differences

This page shows the differences between the selected revision and the current version.

zh:notes:multi-label-covid [2022/10/19 09:26]
pzczxs [Requirements]
zh:notes:multi-label-covid [2024/09/12 20:51] (current)
pzczxs [Citation Information]
Line 1: Line 1:
 ====== Multi-Label Topic Classification for COVID-19 Literature ====== ====== Multi-Label Topic Classification for COVID-19 Literature ======
 ===== Citation Information ===== ===== Citation Information =====
 +  -Shuo Xu, Yuefu Zhang, Liang Chen, and Xin An, 2024. [[https://​doi.org/​10.1093/​database/​baae106|Is Metadata of Articles about COVID-19 enough for Multi-Label Topic Classification Task]]? //​Database//,​ Vol. 2024, pp. baae106. [[https://​github.com/​pzczxs/​Enriched-BC7-LitCovid|Dataset]]
   -Qingyu Chen, Alexis Allot, Robert Leaman, Rezarta Islamaj, Jingcheng Du, Li Fang, Kai Wang, Shuo Xu, Yuefu Zhang, Parsa Bagherzadeh,​ Sabine Bergler, Aakash Bhatnagar, Nidhir Bhavsar, Yung-Chun Chang, Sheng-Jie Lin, Wentai Tang, Hongtong Zhang, Ilija Tavchioski, Senja Pollak, Shubo Tian, Jinfeng Zhang, Yulia Otmakhova, Antonio Jimeno Yepes, Hang Dong, Honghan Wu, Richard Dufour, Yanis Labrak, Niladri Chatterjee, Kushagri Tandon, Fréjus A. A. Laleye, Loïc Rakotoson, Emmanuele Chersoni, Jinghang Gu, Annemarie Friedrich, Subhash Chandra Pujari, Mariia Chizhikova, Naveen Sivadasan, Saipradeep VG, and Zhiyong Lu, 2022. [[https://​doi.org/​10.1093/​database/​baac069|Multi-Label Classification for Biomedical Literature: An Overview of the BioCreative VII LitCovid Track for COVID-19 Literature Topic Annotations]]. //​Database//,​ Vol. 2022, pp. baac069.   -Qingyu Chen, Alexis Allot, Robert Leaman, Rezarta Islamaj, Jingcheng Du, Li Fang, Kai Wang, Shuo Xu, Yuefu Zhang, Parsa Bagherzadeh,​ Sabine Bergler, Aakash Bhatnagar, Nidhir Bhavsar, Yung-Chun Chang, Sheng-Jie Lin, Wentai Tang, Hongtong Zhang, Ilija Tavchioski, Senja Pollak, Shubo Tian, Jinfeng Zhang, Yulia Otmakhova, Antonio Jimeno Yepes, Hang Dong, Honghan Wu, Richard Dufour, Yanis Labrak, Niladri Chatterjee, Kushagri Tandon, Fréjus A. A. Laleye, Loïc Rakotoson, Emmanuele Chersoni, Jinghang Gu, Annemarie Friedrich, Subhash Chandra Pujari, Mariia Chizhikova, Naveen Sivadasan, Saipradeep VG, and Zhiyong Lu, 2022. [[https://​doi.org/​10.1093/​database/​baac069|Multi-Label Classification for Biomedical Literature: An Overview of the BioCreative VII LitCovid Track for COVID-19 Literature Topic Annotations]]. //​Database//,​ Vol. 2022, pp. baac069.
-  -Shuo Xu, Yuefu Zhang, and Xin An, 2021. [[https://​biocreative.bioinformatics.udel.edu/​media/​store/​files/​2021/​TRACK5_pos_3_BC7_submission_217.pdf|Team BJUT-BJFU at BioCreative VII LitCovid Track: A Deep Learning based Method for Multi-label Topic Classification in COVID-19 Literature]]. //​Proceedings of teh BioCreative VII Challenge Evaluation Workshop//, pp. 275-277. ​+  -Shuo Xu, Yuefu Zhang, and Xin An, 2021. [[https://​biocreative.bioinformatics.udel.edu/​media/​store/​files/​2021/​TRACK5_pos_3_BC7_submission_217.pdf|Team BJUT-BJFU at BioCreative VII LitCovid Track: A Deep Learning based Method for Multi-label Topic Classification in COVID-19 Literature]]. //​Proceedings of the BioCreative VII Challenge Evaluation Workshop//, pp. 275-277. ​
  
 ===== Requirements ===== ===== Requirements =====
Line 9: Line 10:
   *[[https://​mccormickml.com/​2019/​07/​22/​BERT-fine-tuning/​|Fine-tuning a pretrained model]]   *[[https://​mccormickml.com/​2019/​07/​22/​BERT-fine-tuning/​|Fine-tuning a pretrained model]]
   *[[http://​bioc.sourceforge.net/​|BioC]]   *[[http://​bioc.sourceforge.net/​|BioC]]
-  *[[https://​www.ncbi.nlm.nih.gov/​research/​pubtator/​index.html|PubTator]]+  *[[https://​www.ncbi.nlm.nih.gov/​research/​pubtator/​index.html|PubTator ​Central]]
   *[[https://​www.nlm.nih.gov/​databases/​download/​mesh.html|Medical Subject Headings (MeSH)]] ([[https://​meshb.nlm.nih.gov/​treeView|TreeView]])   *[[https://​www.nlm.nih.gov/​databases/​download/​mesh.html|Medical Subject Headings (MeSH)]] ([[https://​meshb.nlm.nih.gov/​treeView|TreeView]])
 ===== Dataset ===== ===== Dataset =====
   *[[https://​ftp.ncbi.nlm.nih.gov/​pub/​lu/​LitCovid/​biocreative/​|BC7-LitCovid]] from [[https://​biocreative.bioinformatics.udel.edu/​tasks/​biocreative-vii/​track-5/​|LitCovid track Multi-label topic classification for COVID-19 literature annotation]]   *[[https://​ftp.ncbi.nlm.nih.gov/​pub/​lu/​LitCovid/​biocreative/​|BC7-LitCovid]] from [[https://​biocreative.bioinformatics.udel.edu/​tasks/​biocreative-vii/​track-5/​|LitCovid track Multi-label topic classification for COVID-19 literature annotation]]
-  *[[https://​ftp.ncbi.nlm.nih.gov/​pub/​lu/​LitCovid/​|LitCovid]] (FTP), [[https://​www.ncbi.nlm.nih.gov/​research/​coronavirus/​ | LitCovid]] (Web Site)+  *[[https://​ftp.ncbi.nlm.nih.gov/​pub/​lu/​LitCovid/​|LitCovid]] (FTP), [[https://​www.ncbi.nlm.nih.gov/​research/​coronavirus/​|LitCovid]] (Web Site)
   *[[https://​ftp.ncbi.nlm.nih.gov/​pub/​wilbur/​BioC-PMC | BioC-PMC]] (FTP), [[https://​www.ncbi.nlm.nih.gov/​research/​bionlp/​APIs/​BioC-PMC/​|BioC-PMC]] (Web Site)   *[[https://​ftp.ncbi.nlm.nih.gov/​pub/​wilbur/​BioC-PMC | BioC-PMC]] (FTP), [[https://​www.ncbi.nlm.nih.gov/​research/​bionlp/​APIs/​BioC-PMC/​|BioC-PMC]] (Web Site)
  
Line 50: Line 51:
 ===== Import Entity Annotations ===== ===== Import Entity Annotations =====
 ==== From LitCovid ==== ==== From LitCovid ====
-The biological entity annotations in the [[https://​ftp.ncbi.nlm.nih.gov/​pub/​lu/​LitCovid/​litcovid2pubtator.json.gz|BC7-LitCovid]] dataset can be imported to the database by running <color red>​LitCovidAnnotationImporter.java</​color>​ in the package <color red>​cn.edu.bjut.ui</​color>​.+The biological entity annotations in the [[https://​ftp.ncbi.nlm.nih.gov/​pub/​lu/​LitCovid/​litcovid2pubtator.xml.gz|BC7-LitCovid]] dataset can be imported to the database by running <color red>​LitCovidAnnotationImporter.java</​color>​ in the package <color red>​cn.edu.bjut.ui</​color>​. 
 + 
 +Optionally, these fulltexts can be converted to BioC-XML format by running <color red>LitCovidToBioCConventor.java</color> in the package <color red>cn.edu.bjut.ui</color>. The resulting XML files are saved in the directory <color red>data/bioc-litcovid</color>.
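
The following is a minimal sketch of what a conversion to BioC-XML might look like. It is not the project's converter: the class name `BioCSketch`, the `source` value, and the offset convention (one separator character between passages) are assumptions made for illustration. It only emits the core BioC elements (`collection`, `source`, `date`, `key`, `document`, `id`, `passage`, `offset`, `text`).

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of the project described on this page.
class BioCSketch {

    // Append <name>text</name> to parent and return the new element.
    static Element child(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        if (text != null) {
            e.setTextContent(text);
        }
        parent.appendChild(e);
        return e;
    }

    // Serialize one article as a single-document BioC collection.
    static String toBioC(String id, List<String> passages) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element collection = doc.createElement("collection");
            doc.appendChild(collection);
            child(doc, collection, "source", "custom"); // assumed value
            child(doc, collection, "date", "");
            child(doc, collection, "key", "");
            Element document = child(doc, collection, "document", null);
            child(doc, document, "id", id);

            int offset = 0;
            for (String text : passages) {
                Element passage = child(doc, document, "passage", null);
                child(doc, passage, "offset", Integer.toString(offset));
                child(doc, passage, "text", text);
                // Assumption: one separator character between consecutive passages.
                offset += text.length() + 1;
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build BioC XML", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder identifier and passages.
        System.out.println(toBioC("12345", List.of("A title", "An abstract.")));
    }
}
```

Building the tree with DOM rather than string concatenation leaves the escaping of characters such as `&` and `<` in passage text to the serializer.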
  
 ==== From PubTator Annotations on BioC-PMC Fulltexts ==== ==== From PubTator Annotations on BioC-PMC Fulltexts ====
Line 69: Line 72:
  
 In our case, there are 7,028 articles in total. To fetch the resulting fulltext, each passage can be copied manually to an Excel file. Then, one can run <color red>FullTextWithExcelImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> to import these fulltexts to the database. In our case, there are 7,028 articles in total. To fetch the resulting fulltext, each passage can be copied manually to an Excel file. Then, one can run <color red>FullTextWithExcelImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> to import these fulltexts to the database.
 +
 +These fulltexts are converted to BioC-XML format by running <color red>CustomTextToBioCConventor.java</color> in the package <color red>cn.edu.bjut.ui</color>. The resulting XML files are saved in the directory <color red>data/bioc-custom</color>.
  
 The requests are submitted to the PubTator server by running <color red>CustomTextSubmitRequester.java</color> in the package <color red>cn.edu.bjut.pubtator</color>. This step yields a session number for each XML file. The requests are submitted to the PubTator server by running <color red>CustomTextSubmitRequester.java</color> in the package <color red>cn.edu.bjut.pubtator</color>. This step yields a session number for each XML file.
Line 86: Line 91:
 </​code>​ </​code>​
  
-===== Import ​MeSH Headings ===== +===== Update ​MeSH Headings ===== 
-To extract XML files with <color red>ArticleEFetchExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color> with the help of [[https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch|E-Fetch API]], and save them in the directory <color red>data/url</color>. +Download the [[https://www.nlm.nih.gov/databases/download/mesh.html|MeSH data]] in XML format, then import it into the database by running <color red>MeshHeadingImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>.
  
-Import MeSH Heading information to the database from the directory <color red>data/url</color> by running <color red>MeshHeadingImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>. Note that the resulting publication years and XML fragment for each author are also updated in this time. +Extract XML files with <color red>ArticleEFetchExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color> with the help of the [[https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch|E-Fetch API]], and save them in the directory <color red>data/mesh</color>.
 + 
 +Import MeSH Heading information into the database from the directory <color red>data/mesh</color> by running <color red>MeshHeadingUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>. Note that the publication years and the XML fragment for each author are also updated in this step.
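
As a rough illustration of the E-Fetch step, the sketch below builds a request URL for a batch of PMIDs and pulls `DescriptorName` values out of a PubMed-style XML string. The class `EFetchSketch` and its method names are hypothetical, and the sample fragment in `main` is hand-written, not real data. The project's own extractor and updater may work differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class EFetchSketch {

    // Public E-utilities endpoint for EFetch.
    static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // Build one request URL for a batch of PMIDs (comma-separated in the id parameter).
    static String buildUrl(List<String> pmids) {
        return BASE + "?db=pubmed&retmode=xml&id=" + String.join(",", pmids);
    }

    // Collect the text of every <DescriptorName> element in an XML string.
    static List<String> descriptorNames(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("DescriptorName");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse PubMed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(List.of("12345", "67890"))); // placeholder PMIDs
        // Hand-written fragment in the shape of a PubMed record, not real data.
        String sample = "<MeshHeadingList>"
                + "<MeshHeading><DescriptorName MajorTopicYN=\"N\">COVID-19</DescriptorName></MeshHeading>"
                + "<MeshHeading><DescriptorName MajorTopicYN=\"N\">Humans</DescriptorName></MeshHeading>"
                + "</MeshHeadingList>";
        System.out.println(descriptorNames(sample));
    }
}
```

Requesting several PMIDs per call keeps the number of requests low. The batch size and any rate limiting are left to the caller.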
  
 ===== Update Publication Year ===== ===== Update Publication Year =====
Line 105: Line 112:
 One can obtain the labelset statistics by running <color red>​LabelSetSummary.java</​color>​ in the package <color red>​cn.edu.bjut.ui</​color>​. One can obtain the labelset statistics by running <color red>​LabelSetSummary.java</​color>​ in the package <color red>​cn.edu.bjut.ui</​color>​.
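
For orientation, the sketch below computes a few statistics commonly reported for multi-label datasets: label cardinality (mean labels per article), label density (cardinality divided by the number of distinct labels), and the number of distinct label combinations. Which statistics <color red>LabelSetSummary.java</color> actually reports is not stated here, so the class `LabelStatsSketch` and its methods are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the statistics chosen here are an assumption.
class LabelStatsSketch {

    // Label cardinality: mean number of labels per article.
    static double cardinality(List<List<String>> labelSets) {
        if (labelSets.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        for (List<String> labels : labelSets) {
            total += labels.size();
        }
        return (double) total / labelSets.size();
    }

    // Number of distinct labels across all articles.
    static int distinctLabels(List<List<String>> labelSets) {
        Set<String> all = new HashSet<>();
        for (List<String> labels : labelSets) {
            all.addAll(labels);
        }
        return all.size();
    }

    // Label density: cardinality divided by the number of distinct labels.
    static double density(List<List<String>> labelSets) {
        int q = distinctLabels(labelSets);
        return q == 0 ? 0.0 : cardinality(labelSets) / q;
    }

    // Number of distinct label combinations, ignoring label order.
    static int distinctLabelSets(List<List<String>> labelSets) {
        Set<Set<String>> seen = new HashSet<>();
        for (List<String> labels : labelSets) {
            seen.add(new TreeSet<>(labels));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Toy input, not taken from the dataset.
        List<List<String>> toy = List.of(
                List.of("Treatment", "Diagnosis"),
                List.of("Treatment"),
                List.of("Prevention"));
        System.out.println(cardinality(toy));
        System.out.println(density(toy));
        System.out.println(distinctLabelSets(toy));
    }
}
```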
  
-===== Export ​with CSV format ​for Deep Learning ​ ===== +===== Export for Deep Learning ​ ===== 
-One can run <color red>​CSVExporter.java</​color>​ in the package <color red>​cn.edu.bjut.ui</​color>​ to export ​the resulting ​information for deep learning. ​+One can run <color red>​CSVExporter.java</​color>​ in the package <color red>​cn.edu.bjut.ui</​color>​ to export ​all related ​information for deep learning.  
 + 
 +In addition, the entities and MeSH headings can be exported separately by running <color red>EntityExporter.java</color> and <color red>MeshHeadingExporter.java</color>, respectively, in the package <color red>cn.edu.bjut.ui</color>.
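
A minimal sketch of writing one multi-label record as a CSV row follows. The column order (`pmid`, `title`, `abstract`, `label`) and the `;` separator between labels are assumptions, and `CsvSketch` is a hypothetical class rather than the project's exporter. Fields are quoted following the usual CSV convention: a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded quotes are doubled.

```java
import java.util.List;

// Hypothetical helper; column layout and label separator are assumptions.
class CsvSketch {

    // Quote a field only when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the fields of one article into a single CSV row.
    static String toRow(String pmid, String title, String abstractText, List<String> labels) {
        return String.join(",",
                escape(pmid),
                escape(title),
                escape(abstractText),
                escape(String.join(";", labels)));
    }

    public static void main(String[] args) {
        System.out.println("pmid,title,abstract,label");
        // Placeholder record, not taken from the dataset.
        System.out.println(toRow("12345", "A title, with a comma", "An abstract.",
                List.of("Treatment", "Diagnosis")));
    }
}
```

Keeping all labels of an article in one delimited column gives one row per article, so the row can be split back into a label vector at training time.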
  
 ===== Direct and Indirect Citations ===== ===== Direct and Indirect Citations =====
zh/notes/multi-label-covid.1666142768.txt.gz · Last modified: 2022/10/19 09:26 by pzczxs