zh:notes:drugbank [2022/09/27 21:32] pzczxs [Affiliation] |
zh:notes:drugbank [2025/01/17 22:12] (current version) pzczxs [Citation Information] |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== DrugBank Dataset Construction with Extended Information ====== | ====== DrugBank Dataset Construction with Extended Information ====== | ||
+ | |||
+ | ===== Citation Information ===== | ||
+ | -Shuo Xu, Zhen Liu, Xin An, Hong Wang, and Hongshen Pang, 2025. [[https://doi.org/10.1016/j.joi.2024.101617|Linkages among Science, Technology, and Industry on the basis of Main Path Analysis]]. //Journal of Informetrics//, Vol. 19, No. 1, pp. 101617. | ||
+ | -Shuo Xu, Xinyi Ma, Hong Wang, Xin An, and Ling Li, 2024. [[https://doi.org/10.1016/j.joi.2024.101557|A Recommendation Approach of Scientific Non-Patent Literature on the basis of Heterogeneous Information Network]]. //Journal of Informetrics//, Vol. 18, No. 4, pp. 101557. | ||
+ | -徐硕,孙童菲,罗贵缘,苑洲桐,连佳欣,刘畅,2024. [[https://doi.org/10.3969/j.issn.1672-6081.2024.04.001|分类体系双向映射视角下的科学-技术互动分析]]. //中国发明与专利//,Vol. 21,No. 4,pp. 4-15. | ||
+ | -Shuo Xu, Zhen Liu, and Xin An, 2023. [[https://ceur-ws.org/Vol-3451/paper3.pdf|Linkages among Science, Technology, and Industry]]. //The Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 3rd AI + Informetrics (EEKE-AII 2023)//, pp. 13-15. | ||
+ | -Shuo Xu, Ling Li, and Xin An, 2023. [[https://doi.org/10.1007/s11192-022-04587-0|Do Academic Inventors have Diverse Interests?]] //Scientometrics//, Vol. 128, No. 2, pp. 1023-1053. [[:zh:notes:drugbank|Note]] | ||
+ | -Shuo Xu, Ling Li, Xin An, Liyuan Hao, and Guancan Yang, 2021. [[https://doi.org/10.1007/s11192-021-04085-9|An Approach for Detecting the Commonality and Specialty between Scientific Publications and Patents]]. //Scientometrics//, Vol. 126, No. 9, pp. 7445-7475. | ||
===== Create Database ===== | ===== Create Database ===== | ||
Line 19: | Line 27: | ||
==== Update PMC id and doi ==== | ==== Update PMC id and doi ==== | ||
- | Download [[ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz|PMC-ids.csv.gz]] and save it in the directory <color red>resource</color>. | + | Download [[https://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz|PMC-ids.csv.gz]] and save it in the directory <color red>resource</color>. |
Then run <color red>ArticlePmcIdAndDoiUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>. | Then run <color red>ArticlePmcIdAndDoiUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>. | ||
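The lookup that this step depends on can be sketched as follows. This is a minimal illustration, not the code of <color red>ArticlePmcIdAndDoiUpdater.java</color>: the class name `PmcIdsSketch` is hypothetical, and the header names `PMID`, `PMCID`, and `DOI` are assumptions about the CSV header that should be checked against the downloaded file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper: builds a PMID -> {PMCID, DOI} lookup from CSV lines.
class PmcIdsSketch {
    /** Splits one CSV line, honouring double-quoted fields that may contain commas. */
    static List<String> parseCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    cur.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = false;
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /** Locates columns by header name (assumed names) and indexes rows by PMID. */
    static Map<String, String[]> indexByPmid(Stream<String> lines) {
        Iterator<String> it = lines.iterator();
        Map<String, String[]> index = new HashMap<>();
        if (!it.hasNext()) {
            return index;
        }
        List<String> header = parseCsvLine(it.next());
        int pmid = header.indexOf("PMID");
        int pmcid = header.indexOf("PMCID");
        int doi = header.indexOf("DOI");
        if (pmid < 0 || pmcid < 0 || doi < 0) {
            throw new IllegalArgumentException("Unexpected header: " + header);
        }
        int needed = Math.max(pmid, Math.max(pmcid, doi));
        while (it.hasNext()) {
            List<String> f = parseCsvLine(it.next());
            if (f.size() <= needed || f.get(pmid).isEmpty()) {
                continue; // skip short rows and rows without a PMID
            }
            index.put(f.get(pmid), new String[] {f.get(pmcid), f.get(doi)});
        }
        return index;
    }
}
```

For the real file, the lines could come from `new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream("resource/PMC-ids.csv.gz")), StandardCharsets.UTF_8)).lines()`. The sketch does not handle quoted fields that span several lines.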
Line 57: | Line 65: | ||
==== Update Mesh Heading ==== | ==== Update Mesh Heading ==== | ||
+ | Download the [[https://www.nlm.nih.gov/databases/download/mesh.html|MeSH data]] in XML format, and then import it into the database by running <color red>MeshHeadingImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>. | ||
+ | |||
Extract the XML files with <color red>ArticleURLExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color>, and save them in the directory <color red>data/articles/url4mesh</color>. | Extract the XML files with <color red>ArticleURLExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color>, and save them in the directory <color red>data/articles/url4mesh</color>. | ||
- | Import the related information into the database with <color red>MeshHeadingImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> from the directory <color red>data/articles/url4mesh</color>. | + | Import the related information into the database with <color red>MeshHeadingUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color> from the directory <color red>data/articles/url4mesh</color>. |
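To illustrate what importing the MeSH XML involves, the sketch below reads descriptor identifiers and names from a descriptor file. It is not the code of <color red>MeshHeadingImporter.java</color>: the class name `MeshDescriptorSketch` is hypothetical, and the element names (`DescriptorRecord`, `DescriptorUI`, `DescriptorName`, `String`) are assumed from the MeSH descriptor XML layout and should be verified against the downloaded file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: maps each descriptor UI to its preferred name.
class MeshDescriptorSketch {
    /** Returns the first direct child element with the given name, or null. */
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    static Map<String, String> readDescriptors(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList records = doc.getElementsByTagName("DescriptorRecord");
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                // Only direct children are used, because nested elements
                // (for example referenced descriptors) reuse the same tag names.
                Element ui = firstChild(record, "DescriptorUI");
                Element name = firstChild(record, "DescriptorName");
                Element text = (name == null) ? null : firstChild(name, "String");
                if (ui != null && text != null) {
                    result.put(ui.getTextContent().trim(), text.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse MeSH XML", e);
        }
        return result;
    }
}
```

DOM parsing keeps the sketch short, but the full descriptor file is large, so a streaming parser (for example StAX) is probably the better choice for a real import.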
===== Authors ===== | ===== Authors ===== | ||
One can correct and merge authors by running <color red>ArticleAuthorCorrector.java</color> in the package <color red>cn.edu.bjut.ui</color> and then <color red>ArticleChecker.java</color> in the package <color red>cn.edu.bjut.ui</color>. | One can correct and merge authors by running <color red>ArticleAuthorCorrector.java</color> in the package <color red>cn.edu.bjut.ui</color> and then <color red>ArticleChecker.java</color> in the package <color red>cn.edu.bjut.ui</color>. | ||
Line 68: | Line 78: | ||
One can extract the affiliation information from XML files by running <color red>ArticleAffiliationRawImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>. | One can extract the affiliation information from XML files by running <color red>ArticleAffiliationRawImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>. | ||
- | To recognize the institution and country, the related information can be exported by running the following SQL statements: | + | To manually recognize the institution and country of each affiliation, the related information can be exported by running the following SQL statements: |
<code sql> | <code sql> | ||
> SELECT article_id, REPLACE(title, "\r\n", " ") as title, doi, pubmed_id, pmc_id, entity_id, last_name, first_name, aes.seq_no AS seq_no, REPLACE(xml_fragment, "\r\n", "||||||||") as xml_fragment, replace(raw_affiliation, "\r\n", "||||||||") AS affiliation, aesa.seq_no AS seq_no FROM article_entity_science AS aes, article AS a, entity_science AS es, article_entity_science_affiliation AS aesa WHERE aes.entity_id = es.id AND aes.id = aesa.article_entity_science_id AND aes.article_id = a.id ORDER BY article_id ASC, aes.seq_no ASC, aesa.seq_no ASC; | > SELECT article_id, REPLACE(title, "\r\n", " ") as title, doi, pubmed_id, pmc_id, entity_id, last_name, first_name, aes.seq_no AS seq_no, REPLACE(xml_fragment, "\r\n", "||||||||") as xml_fragment, replace(raw_affiliation, "\r\n", "||||||||") AS affiliation, aesa.seq_no AS seq_no FROM article_entity_science AS aes, article AS a, entity_science AS es, article_entity_science_affiliation AS aesa WHERE aes.entity_id = es.id AND aes.id = aesa.article_entity_science_id AND aes.article_id = a.id ORDER BY article_id ASC, aes.seq_no ASC, aesa.seq_no ASC; | ||
Line 74: | Line 84: | ||
</code> | </code> | ||
+ | Then, one can run <color red>ArticleAffiliationImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> to import these affiliations and countries into the database. | ||
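The export query above replaces each line break in `xml_fragment` and `raw_affiliation` with the placeholder `||||||||`, so that every record stays on a single line. When the curated file is read back, the placeholder may need to be reversed. The following is a minimal sketch under that assumption; the class name `AffiliationFieldSketch` is hypothetical, and whether <color red>ArticleAffiliationImporter.java</color> performs this step is not stated in this note.

```java
// Hypothetical helper: reverses the line-break placeholder used by the export query.
class AffiliationFieldSketch {
    /** Placeholder written by REPLACE(..., "\r\n", "||||||||") in the export query. */
    static final String LINE_BREAK_PLACEHOLDER = "||||||||";

    /** Restores the original line breaks; null input is passed through unchanged. */
    static String restoreLineBreaks(String exported) {
        if (exported == null) {
            return null;
        }
        // String.replace works on literal text, so the pipes need no escaping.
        return exported.replace(LINE_BREAK_PLACEHOLDER, "\r\n");
    }
}
```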
===== Patents ===== | ===== Patents ===== | ||
Extract the XML files with <color red>PatentURLExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color>, and save them in the directory <color red>data/patents/url</color>. | Extract the XML files with <color red>PatentURLExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color>, and save them in the directory <color red>data/patents/url</color>. | ||
Line 89: | Line 99: | ||
One can correct and merge inventors by running <color red>PatentInventorCorrector.java</color> in the package <color red>cn.edu.bjut.ui</color>. | One can correct and merge inventors by running <color red>PatentInventorCorrector.java</color> in the package <color red>cn.edu.bjut.ui</color>. | ||
+ | ==== Update Country Code ==== | ||
+ | Several inventors have no country code at all. One can retrieve these inventors by running the following SQL statement: | ||
+ | <code sql> | ||
+ | > SELECT * FROM entity_technology WHERE country_code IS NULL AND last_name IS NOT NULL ; | ||
+ | </code> | ||
+ | |||
+ | In our case, there are 39 such inventors in total. Their country codes are supplemented manually one by one. Then, the supplemented country codes can be imported into the database by running <color red>EntityTechnologyCountryUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>. | ||
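Because the codes are typed in by hand, a quick sanity check before the import can catch typos. The sketch below assumes that `country_code` holds ISO 3166-1 alpha-2 codes, which this note does not confirm; the class name `CountryCodeSketch` is hypothetical and is not part of <color red>EntityTechnologyCountryUpdater.java</color>.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: validates manually entered country codes.
class CountryCodeSketch {
    // Two-letter country codes known to the Java runtime (ISO 3166-1 alpha-2).
    private static final Set<String> ISO_CODES =
            new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    /** Trims and upper-cases a manually entered code; null becomes the empty string. */
    static String normalize(String raw) {
        return raw == null ? "" : raw.trim().toUpperCase(Locale.ROOT);
    }

    /** True if the normalized code is a known two-letter country code. */
    static boolean isValid(String raw) {
        return ISO_CODES.contains(normalize(raw));
    }
}
```

If the database instead stores patent-office codes that are not country codes, the set of accepted values would have to be extended accordingly.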
===== Academic Inventors ===== | ===== Academic Inventors ===== | ||
To recognize academic inventors, the candidates can be exported by running the following SQL statement: | To recognize academic inventors, the candidates can be exported by running the following SQL statement: | ||
Line 106: | Line 123: | ||
===== Science Non-Patent References ===== | ===== Science Non-Patent References ===== | ||
==== Update with Excel ==== | ==== Update with Excel ==== | ||
- | The non-patent references (NPRs) include many types of documents, such as articles, books, reports, and so on. For the purpose of identifying science NPRs, the original NPRs can be exported by running the following SQL statement: | + | The non-patent references (NPRs) include many types of documents, such as articles, books, reports, and so on. For the purpose of identifying scientific NPRs, the original NPRs can be exported by running the following SQL statement: |
<code sql> | <code sql> | ||
> SELECT id, other_citation, title, abst, pubmed_id, pmc_id, doi FROM non_patent ORDER BY id ASC; | > SELECT id, other_citation, title, abst, pubmed_id, pmc_id, doi FROM non_patent ORDER BY id ASC; | ||
</code> | </code> | ||
- | One can manually check whether each record is a scholarly article. Then, one can update some information for science NPRs by running <color red>NonPatentUpdaterWithExcel.java</color> in the package <color red>cn.edu.bjut.ui</color>. | + | One can manually check whether each record is a scholarly article. Then, one can update some information for scientific NPRs by running <color red>NonPatentUpdaterWithExcel.java</color> in the package <color red>cn.edu.bjut.ui</color>. |
==== Update PMC id and PubMed id ==== | ==== Update PMC id and PubMed id ==== |