zh:notes:drugbank

Differences

This shows the differences between the selected revision and the current version.

zh:notes:drugbank [2022/09/27 21:32]
pzczxs [Affiliation]
zh:notes:drugbank [2025/01/17 22:12] (current version)
pzczxs [Citation Information]
Line 1: Line 1:
 ====== DrugBank Dataset Construction with Extended Information ======
  
 +
 +===== Citation Information =====
 +  -Shuo Xu, Zhen Liu, Xin An, Hong Wang, and Hongshen Pang, 2025. [[https://doi.org/10.1016/j.joi.2024.101617|Linkages among Science, Technology, and Industry on the basis of Main Path Analysis]]. //Journal of Informetrics//, Vol. 19, No. 1, pp. 101617.
 +  -Shuo Xu, Xinyi Ma, Hong Wang, Xin An, and Ling Li, 2024. [[https://doi.org/10.1016/j.joi.2024.101557|A Recommendation Approach of Scientific Non-Patent Literature on the basis of Heterogeneous Information Network]]. //Journal of Informetrics//, Vol. 18, No. 4, pp. 101557.
 +  -徐硕,孙童菲,罗贵缘,苑洲桐,连佳欣,刘畅,2024. [[https://doi.org/10.3969/j.issn.1672-6081.2024.04.001|分类体系双向映射视角下的科学-技术互动分析]]. //中国发明与专利//,Vol. 21,No. 4,pp. 4-15.
 +  -Shuo Xu, Zhen Liu, and Xin An, 2023. [[https://ceur-ws.org/Vol-3451/paper3.pdf|Linkages among Science, Technology, and Industry]]. //The Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 3rd AI + Informetrics (EEKE-AII 2023)//, pp. 13-15.
 +  -Shuo Xu, Ling Li, and Xin An, 2023. [[https://doi.org/10.1007/s11192-022-04587-0|Do Academic Inventors have Diverse Interests?]] //Scientometrics//, Vol. 128, No. 2, pp. 1023-1053. [[:zh:notes:drugbank|Note]]
 +  -Shuo Xu, Ling Li, Xin An, Liyuan Hao, and Guancan Yang, 2021. [[https://doi.org/10.1007/s11192-021-04085-9|An Approach for Detecting the Commonality and Specialty between Scientific Publications and Patents]]. //Scientometrics//, Vol. 126, No. 9, pp. 7445-7475.
  
 ===== Create Database =====
Line 19: Line 27:
  
 ==== Update PMC id and doi ====
-To download [[ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz|PMC-ids.csv.gz]], and save it in the directory <color red>resource</color>.
+To download [[https://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz|PMC-ids.csv.gz]], and save it in the directory <color red>resource</color>.
  
 To run <color red>ArticlePmcIdAndDoiUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>.
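The step above relies on matching articles against the PMC ID mapping file. The source of <color red>ArticlePmcIdAndDoiUpdater.java</color> is not shown here, so the following is only a minimal sketch of how such a lookup could be built. The class name `PmcIdsIndex` is hypothetical, and the header names `DOI`, `PMCID`, and `PMID` are an assumption about the layout of PMC-ids.csv that should be checked against the downloaded file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not a class from cn.edu.bjut.ui.
class PmcIdsIndex {
    /** Splits one CSV line, honoring double-quoted fields and "" escapes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Maps each non-empty PMID to {PMCID, DOI}; columns are located via the header row. */
    static Map<String, String[]> index(List<String> lines) {
        Map<String, String[]> byPmid = new HashMap<>();
        if (lines.isEmpty()) {
            return byPmid;
        }
        List<String> header = splitCsvLine(lines.get(0));
        // Assumed header names; verify them against the real file.
        int doiCol = header.indexOf("DOI");
        int pmcCol = header.indexOf("PMCID");
        int pmidCol = header.indexOf("PMID");
        if (doiCol < 0 || pmcCol < 0 || pmidCol < 0) {
            throw new IllegalArgumentException("Unexpected header: " + header);
        }
        int needed = Math.max(doiCol, Math.max(pmcCol, pmidCol));
        for (String line : lines.subList(1, lines.size())) {
            List<String> f = splitCsvLine(line);
            if (f.size() <= needed || f.get(pmidCol).isEmpty()) {
                continue; // skip short rows and rows without a PMID
            }
            byPmid.put(f.get(pmidCol), new String[] {f.get(pmcCol), f.get(doiCol)});
        }
        return byPmid;
    }
}
```

The lines could be read from the compressed file in the <color red>resource</color> directory by wrapping a `FileInputStream` in `java.util.zip.GZIPInputStream`. Locating columns by header name rather than by fixed position keeps the sketch robust if the column order differs from what is assumed.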
Line 57: Line 65:
  
 ==== Update Mesh Heading ====
 +To download the [[https://www.nlm.nih.gov/databases/download/mesh.html|MeSH data]] in XML format, and then to import it into the database by running <color red>MeshHeadingImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>.
 +
 To extract XML files with <color red>ArticleURLExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color>, and save them in the directory <color red>data/articles/url4mesh</color>.
  
-To import the related information into the database with <color red>MeshHeadingImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> from the directory <color red>data/articles/url4mesh</color>.
+To import the related information into the database with <color red>MeshHeadingUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color> from the directory <color red>data/articles/url4mesh</color>.
 ===== Authors =====
 One can correct and merge authors by running <color red>ArticleAuthorCorrector.java</color> in the package <color red>cn.edu.bjut.ui</color> and then <color red>ArticleChecker.java</color> in the package <color red>cn.edu.bjut.ui</color>.
Line 68: Line 78:
 One can extract the affiliation information from XML files by running <color red>ArticleAffiliationRawImporter.java</color> in the package <color red>cn.edu.bjut.ui</color>.
  
-To recognize institution and country, the related information can be exported by running the following SQL statements:
+To recognize manually the resulting institution and country, the related information can be exported by running the following SQL statements:
 <code sql>
 > SELECT article_id, REPLACE(title, "\r\n", " ") AS title, doi, pubmed_id, pmc_id, entity_id, last_name, first_name, aes.seq_no AS seq_no, REPLACE(xml_fragment, "\r\n", "||||||||") AS xml_fragment, REPLACE(raw_affiliation, "\r\n", "||||||||") AS affiliation, aesa.seq_no AS seq_no FROM article_entity_science AS aes, article AS a, entity_science AS es, article_entity_science_affiliation AS aesa WHERE aes.entity_id = es.id AND aes.id = aesa.article_entity_science_id AND aes.article_id = a.id ORDER BY article_id ASC, aes.seq_no ASC, aesa.seq_no ASC;
Line 74: Line 84:
 </code>
  
 +Then, one can run <color red>ArticleAffiliationImporter.java</color> in the package <color red>cn.edu.bjut.ui</color> to import these affiliations and countries into the database.
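The export query in this section replaces every line break inside the XML fragment and the raw affiliation with the marker `||||||||`, so that each record fits on one line for manual review. A minimal sketch of how a reviewed cell could be turned back into separate lines is given below. The class `ExportedCell` is hypothetical and is not taken from the <color red>cn.edu.bjut.ui</color> package; how <color red>ArticleAffiliationImporter.java</color> actually handles the marker is not stated in this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a class from cn.edu.bjut.ui.
class ExportedCell {
    // Marker written by the REPLACE(...) calls in the export query.
    static final String MARKER = "||||||||";

    /** Puts the original CRLF line breaks back into an exported cell. */
    static String restore(String cell) {
        return cell.replace(MARKER, "\r\n");
    }

    /** Splits an exported cell into trimmed, non-empty lines. */
    static List<String> lines(String cell) {
        List<String> result = new ArrayList<>();
        // Pattern.quote is needed because '|' is a regex metacharacter.
        for (String part : cell.split(Pattern.quote(MARKER))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

Using a long, unlikely marker instead of a single character lowers the chance of colliding with text that legitimately appears in an affiliation string.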
 ===== Patents =====
 To extract XML files with <color red>PatentURLExtractor.java</color> in the package <color red>cn.edu.bjut.ui</color>, and save them in the directory <color red>data/patents/url</color>.
Line 89: Line 99:
 One can correct and merge authors by running <color red>PatentInventorCorrector.java</color> in the package <color red>cn.edu.bjut.ui</color>.
  
 +==== Update Country Code ====
 +Several inventors have no country code attached at all. One can retrieve these inventors by running the following SQL statement.
 +<code sql>
 +> SELECT * FROM entity_technology WHERE country_code IS NULL AND last_name IS NOT NULL;
 +</code>
 +
 +In our case, there are 39 such inventors in total. Their country codes are supplemented manually one by one. Then, by running <color red>EntityTechnologyCountryUpdater.java</color> in the package <color red>cn.edu.bjut.ui</color>, the supplemented country codes can be imported into the database.
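Because the country codes are typed in by hand, a quick validation pass before the import can catch typos. The sketch below is only an illustration under stated assumptions: the class `CountryCodeSupplement` is hypothetical, the input is assumed to be tab-separated rows of `id` and `country_code`, and the codes are assumed to be two-letter ISO 3166 codes. None of these details are confirmed by this page, and the input format expected by <color red>EntityTechnologyCountryUpdater.java</color> may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not a class from cn.edu.bjut.ui.
class CountryCodeSupplement {
    // Two-letter ISO 3166 country codes known to the Java runtime.
    private static final Set<String> ISO_CODES =
            new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    /** Parses "id<TAB>country_code" rows (assumed format) and rejects unknown codes. */
    static Map<Integer, String> parse(List<String> rows) {
        Map<Integer, String> codes = new LinkedHashMap<>();
        for (String row : rows) {
            if (row.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = row.split("\t");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed row: " + row);
            }
            int id = Integer.parseInt(parts[0].trim());
            String code = parts[1].trim().toUpperCase(Locale.ROOT);
            if (!ISO_CODES.contains(code)) {
                throw new IllegalArgumentException("Unknown country code: " + code);
            }
            codes.put(id, code);
        }
        return codes;
    }
}
```

Failing fast on the first bad row suits a small, hand-curated list such as the 39 inventors mentioned above; for larger inputs, collecting all errors before reporting may be more convenient.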
 ===== Academic Inventors =====
 To recognize academic inventors, the candidates can be exported by running the following SQL statement:
Line 106: Line 123:
 ===== Science Non-Patent References =====
 ==== Update with Excel ====
-The non-patent references (NPRs) include many types of documents, such as articles, books, reports, and so on. For the purpose of identifying science NPRs, the original NPRs can be exported by running the following SQL statement:
+The non-patent references (NPRs) include many types of documents, such as articles, books, reports, and so on. For the purpose of identifying scientific NPRs, the original NPRs can be exported by running the following SQL statement:
 <code sql>
 > SELECT id, other_citation, title, abst, pubmed_id, pmc_id, doi FROM non_patent ORDER BY id ASC;
 </code>
  
-One can check manually whether or not each record belongs to the type of scholarly articles. Then, one can update some information for science NPRs by running <color red>NonPatentUpdaterWithExcel.java</color> in the package <color red>cn.edu.bjut.ui</color>.
+One can check manually whether or not each record belongs to the type of scholarly articles. Then, one can update some information for scientific NPRs by running <color red>NonPatentUpdaterWithExcel.java</color> in the package <color red>cn.edu.bjut.ui</color>.
  
 ==== Update PMC id and PubMed id ====
zh/notes/drugbank.1664285534.txt.gz · Last modified: 2022/09/27 21:32 by pzczxs