====== Multi-Label Topic Classification for COVID-19 Literature ======
===== Citation Information =====
  - Shuo Xu, Yuefu Zhang, Liang Chen, and Xin An, 2024. [[https://doi.org/10.1093/database/baae106|Is Metadata of Articles about COVID-19 enough for Multi-Label Topic Classification Task]]? //Database//, Vol. 2024, pp. baae106. [[https://github.com/pzczxs/Enriched-BC7-LitCovid|Dataset]]
  - Qingyu Chen, Alexis Allot, Robert Leaman, Rezarta Islamaj, Jingcheng Du, Li Fang, Kai Wang, Shuo Xu, Yuefu Zhang, Parsa Bagherzadeh, Sabine Bergler, Aakash Bhatnagar, Nidhir Bhavsar, Yung-Chun Chang, Sheng-Jie Lin, Wentai Tang, Hongtong Zhang, Ilija Tavchioski, Senja Pollak, Shubo Tian, Jinfeng Zhang, Yulia Otmakhova, Antonio Jimeno Yepes, Hang Dong, Honghan Wu, Richard Dufour, Yanis Labrak, Niladri Chatterjee, Kushagri Tandon, Fréjus A. A. Laleye, Loïc Rakotoson, Emmanuele Chersoni, Jinghang Gu, Annemarie Friedrich, Subhash Chandra Pujari, Mariia Chizhikova, Naveen Sivadasan, Saipradeep VG, and Zhiyong Lu, 2022. [[https://doi.org/10.1093/database/baac069|Multi-Label Classification for Biomedical Literature: An Overview of the BioCreative VII LitCovid Track for COVID-19 Literature Topic Annotations]]. //Database//, Vol. 2022, pp. baac069.
  - Shuo Xu, Yuefu Zhang, and Xin An, 2021. [[https://biocreative.bioinformatics.udel.edu/media/store/files/2021/TRACK5_pos_3_BC7_submission_217.pdf|Team BJUT-BJFU at BioCreative VII LitCovid Track: A Deep Learning based Method for Multi-label Topic Classification in COVID-19 Literature]]. //Proceedings of the BioCreative VII Challenge Evaluation Workshop//, pp. 275-277.
===== Requirements =====
==== From LitCovid ====
The biological entity annotations in the [[https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/litcovid2pubtator.xml.gz|BC7-LitCovid]] dataset can be imported into the database by running <color red>LitCovidAnnotationImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>.
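If you want to inspect the dump before importing it, the sketch below streams its annotations with the JDK's StAX parser. It is a minimal sketch and not the implementation of LitCovidAnnotationImporter.java. The class ''BioCAnnotationReader'' and the record ''Annotation'' are hypothetical. The sketch assumes the standard BioC layout: each ''document'' has an ''id'' element, and each ''annotation'' holds an ''infon'' with ''key="type"'' and a ''text'' element. Database insertion is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: stream entity annotations out of a BioC-XML collection. */
final class BioCAnnotationReader {

    /** One annotation: owning document id, entity type (e.g. Gene) and covered text. */
    record Annotation(String docId, String type, String text) {}

    /** Reads a gzipped collection such as litcovid2pubtator.xml.gz. */
    static List<Annotation> readGzip(Path path) {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(path))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streams the XML with StAX, so the collection is never held in memory as a tree. */
    static List<Annotation> read(InputStream in) {
        List<Annotation> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // do not fetch BioC.dtd
            XMLStreamReader r = factory.createXMLStreamReader(in);
            String docId = null;
            String type = null;
            String text = null;
            boolean inAnnotation = false;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("document")) {
                        docId = null;
                    } else if (name.equals("id") && !inAnnotation) {
                        docId = r.getElementText().trim(); // document-level <id>
                    } else if (name.equals("annotation")) {
                        inAnnotation = true;
                        type = null;
                        text = null;
                    } else if (inAnnotation && name.equals("infon")
                            && "type".equals(r.getAttributeValue(null, "key"))) {
                        type = r.getElementText().trim(); // passage-level infons are skipped
                    } else if (inAnnotation && name.equals("text")) {
                        text = r.getElementText(); // passage <text> is skipped
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && r.getLocalName().equals("annotation")) {
                    out.add(new Annotation(docId, type, text));
                    inAnnotation = false;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed BioC XML", e);
        }
        return out;
    }
}
```

Streaming is used here so that memory use does not grow with the size of the collection. A DOM parser would be simpler but would hold the whole file at once.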
If needed, the fulltexts in this dataset can also be converted to BioC-XML format by running <color red>LitCovidToBioCConventor.java</color> in the package <color red>cn.edu.bjut.ui</color>. The resulting XML files are saved in the directory <color red>data/bioc-litcovid</color>.
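The sketch below shows roughly what such a conversion produces, using the JDK's DOM and ''Transformer'' APIs. It is a minimal sketch and not the code of LitCovidToBioCConventor.java. The class ''BioCWriterSketch'' and the record ''Passage'' are hypothetical. Element order follows the BioC DTD: ''collection'' holds ''source'', ''date'', ''key'' and then ''document''; each ''passage'' holds ''infon'', ''offset'' and ''text''. The offset rule is an assumption: each passage starts one character after the previous one ends, as in PubTator-style files.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: serialise one article as a BioC-XML collection. */
final class BioCWriterSketch {

    /** A passage such as the title, the abstract or a body paragraph. */
    record Passage(String type, String text) {}

    static String toBioC(String docId, List<Passage> passages) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element collection = doc.createElement("collection");
            doc.appendChild(collection);
            // BioC DTD order: source, date, key, then the document(s).
            child(doc, collection, "source", "custom");
            child(doc, collection, "date", "");
            child(doc, collection, "key", "");
            Element document = child(doc, collection, "document", null);
            child(doc, document, "id", docId);
            int offset = 0;
            for (Passage p : passages) {
                Element passage = child(doc, document, "passage", null);
                child(doc, passage, "infon", p.type()).setAttribute("key", "type");
                child(doc, passage, "offset", Integer.toString(offset));
                child(doc, passage, "text", p.text());
                // ASSUMPTION: next passage starts one character after this one (UTF-16 length).
                offset += p.text().length() + 1;
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(w)); // escapes & and < in text
            return w.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Appends <name>text</name> (an empty element when text is null) and returns it. */
    private static Element child(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        if (text != null) {
            e.setTextContent(text);
        }
        parent.appendChild(e);
        return e;
    }
}
```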
==== From PubTator Annotations on BioC-PMC Fulltexts ====
In our case, there are 7,028 articles in total. To obtain the resulting fulltexts, each passage can be copied manually into an Excel file. Then, one can run <color red>FullTextWithExcelImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> to import these fulltexts into the database.
These fulltexts are then converted to BioC-XML format by running <color red>CustomTextToBioCConventor.java</color> in the package <color red>cn.edu.bjut.ui</color>. The resulting XML files are saved in the directory <color red>data/bioc-custom</color>.
Next, the requests are submitted to the PubTator server by running <color red>CustomTextSubmitRequester.java</color> in the package <color red>cn.edu.bjut.pubtator</color>, which yields a session number for each XML file.
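The sketch below shows a single submission using ''java.net.http'' (Java 11+). It is not CustomTextSubmitRequester.java, and ''PubTatorSubmitSketch'' is a hypothetical name. The submit URL is passed in rather than hard-coded, because the PubTator services have changed over time. As far as we know, the PubTator Central API accepted a POST to ''.../annotations/annotate/submit/{bioconcept}'' and answered with a plain-text session number. Treat both the URL and the response format as assumptions to check against the current PubTator documentation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: submit one BioC-XML document for annotation and keep the session number. */
final class PubTatorSubmitSketch {

    /** Builds the POST request; submitBase is an ASSUMED endpoint supplied by the caller. */
    static HttpRequest buildSubmitRequest(String submitBase, String bioConcept, String biocXml) {
        return HttpRequest.newBuilder(URI.create(submitBase + bioConcept))
                .POST(HttpRequest.BodyPublishers.ofString(biocXml, StandardCharsets.UTF_8))
                .build();
    }

    /** ASSUMPTION: the service answers with the session number as plain text. */
    static String parseSessionNumber(String body) {
        String session = body.strip();
        if (session.isEmpty()) {
            throw new IllegalStateException("Empty response: no session number");
        }
        return session;
    }

    /** Sends one request and returns the session number needed to retrieve results later. */
    static String submit(HttpClient client, String submitBase, String bioConcept, String biocXml) {
        try {
            HttpResponse<String> response = client.send(
                    buildSubmitRequest(submitBase, bioConcept, biocXml),
                    HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException("HTTP " + response.statusCode());
            }
            return parseSessionNumber(response.body());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while submitting", e);
        }
    }
}
```

Keep the session number returned for each XML file. Results are retrieved later with a separate request keyed by that number, once the server has finished processing.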
Download the [[https://www.nlm.nih.gov/databases/download/mesh.html|MeSH data]] in XML format, and then import it into the database by running <color red>MeshHeadingImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>.
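The MeSH descriptor file (''desc<year>.xml'') also nests ''DescriptorUI'' and ''DescriptorName'' deeper in each record, for instance under ''PharmacologicalActionList''. A naive tag match would therefore pick up the wrong names. The hypothetical ''MeshDescriptorReader'' below is a minimal StAX sketch that keeps only the ''DescriptorUI'' and ''DescriptorName/String'' that are direct children of ''DescriptorRecord''. It is not MeshHeadingImporter.java, and it omits tree numbers and database writes.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: map each MeSH DescriptorUI to its preferred DescriptorName. */
final class MeshDescriptorReader {

    static Map<String, String> read(InputStream in) {
        Map<String, String> uiToName = new LinkedHashMap<>();
        List<String> stack = new ArrayList<>(); // names of the currently open elements
        String ui = null;
        String name = null;
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // skip the remote MeSH DTD
            XMLStreamReader r = factory.createXMLStreamReader(in);
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String local = r.getLocalName();
                    int n = stack.size();
                    String parent = n >= 1 ? stack.get(n - 1) : "";
                    String grandparent = n >= 2 ? stack.get(n - 2) : "";
                    if (local.equals("DescriptorRecord")) {
                        ui = null;
                        name = null;
                    }
                    if (local.equals("DescriptorUI") && parent.equals("DescriptorRecord")) {
                        ui = r.getElementText().trim(); // cursor now sits on </DescriptorUI>,
                        continue;                      // so this element is never pushed
                    }
                    if (local.equals("String") && parent.equals("DescriptorName")
                            && grandparent.equals("DescriptorRecord")) {
                        name = r.getElementText().trim();
                        continue;
                    }
                    stack.add(local);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String closed = stack.remove(stack.size() - 1);
                    if (closed.equals("DescriptorRecord") && ui != null && name != null) {
                        uiToName.put(ui, name);
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed MeSH XML", e);
        }
        return uiToName;
    }
}
```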
Fetch the XML record of each article through the [[https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch|E-Fetch API]] by running <color red>ArticleEFetchExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color>, and save the records in the directory <color red>data/mesh</color>.
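The E-utilities EFetch endpoint accepts a comma-separated list of PMIDs, so records can be fetched in batches. The sketch below is minimal: ''EFetchSketch'', the batch size and the target file names are assumptions, not the behaviour of ArticleEFetchExtractor.java. NCBI asks clients to stay under 3 requests per second without an API key (10 with one), hence the pause after each download.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: fetch PubMed XML for many PMIDs in batches via E-utilities EFetch. */
final class EFetchSketch {

    static final String EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    /** One EFetch URL per batch of numeric PMIDs; apiKey may be null and is assumed URL-safe. */
    static List<URI> buildUrls(List<String> pmids, int batchSize, String apiKey) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<URI> urls = new ArrayList<>();
        for (int i = 0; i < pmids.size(); i += batchSize) {
            List<String> batch = pmids.subList(i, Math.min(i + batchSize, pmids.size()));
            for (String pmid : batch) {
                if (!pmid.matches("\\d+")) {
                    throw new IllegalArgumentException("Not a PMID: " + pmid);
                }
            }
            String url = EFETCH + "?db=pubmed&retmode=xml&id=" + String.join(",", batch)
                    + (apiKey == null ? "" : "&api_key=" + apiKey);
            urls.add(URI.create(url));
        }
        return urls;
    }

    /** Saves one batch to target (e.g. data/mesh/batch-0.xml) and pauses to respect NCBI limits. */
    static void download(HttpClient client, URI url, Path target) {
        try {
            Files.createDirectories(target.toAbsolutePath().getParent());
            HttpResponse<Path> response = client.send(HttpRequest.newBuilder(url).GET().build(),
                    HttpResponse.BodyHandlers.ofFile(target));
            if (response.statusCode() != 200) {
                throw new IllegalStateException("HTTP " + response.statusCode() + " for " + url);
            }
            Thread.sleep(350); // stays under 3 requests per second without an API key
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during download", e);
        }
    }
}
```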
Import the MeSH Heading information from the directory <color red>data/mesh</color> into the database by running <color red>MeshHeadingUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>. Note that the publication year and the XML fragment of each author are also updated at this step.
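The sketch below pulls the MeSH headings and the publication year out of a downloaded PubMed XML file with DOM and XPath. It is a minimal sketch; ''PubMedMeshSketch'' and its records are hypothetical and do not reproduce MeshHeadingUpdater.java. The paths follow the PubMed DTD: ''MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName'' carries the ''UI'' and ''MajorTopicYN'' attributes. The year comes from ''PubDate/Year'', with ''MedlineDate'' as a fallback. Author fragments are not handled here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch: MeSH headings and publication year from an EFetch PubMed XML file. */
final class PubMedMeshSketch {

    record MeshHeading(String descriptorUi, String descriptorName, boolean majorTopic) {}

    record ArticleInfo(String pmid, String year, List<MeshHeading> headings) {}

    static List<ArticleInfo> parse(InputStream in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // PubMed XML declares a remote DTD; do not download it.
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = dbf.newDocumentBuilder().parse(in);
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList articles = (NodeList) xp.evaluate(
                    "/PubmedArticleSet/PubmedArticle", doc, XPathConstants.NODESET);
            List<ArticleInfo> out = new ArrayList<>();
            for (int i = 0; i < articles.getLength(); i++) {
                Element article = (Element) articles.item(i);
                String pmid = xp.evaluate("MedlineCitation/PMID", article).trim();
                String pubDate = "MedlineCitation/Article/Journal/JournalIssue/PubDate/";
                String year = xp.evaluate(pubDate + "Year", article).trim();
                if (year.isEmpty()) { // some records only carry e.g. <MedlineDate>2020 Mar-Apr</MedlineDate>
                    String medlineDate = xp.evaluate(pubDate + "MedlineDate", article).trim();
                    year = medlineDate.length() >= 4 ? medlineDate.substring(0, 4) : "";
                }
                NodeList names = (NodeList) xp.evaluate(
                        "MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName",
                        article, XPathConstants.NODESET);
                List<MeshHeading> headings = new ArrayList<>();
                for (int j = 0; j < names.getLength(); j++) {
                    Element d = (Element) names.item(j);
                    headings.add(new MeshHeading(d.getAttribute("UI"), d.getTextContent().trim(),
                            "Y".equals(d.getAttribute("MajorTopicYN"))));
                }
                out.add(new ArticleInfo(pmid, year, List.copyOf(headings)));
            }
            return out;
        } catch (ParserConfigurationException | SAXException | XPathExpressionException e) {
            throw new IllegalStateException("Cannot parse PubMed XML", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```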
===== Update Publication Year =====