zh:notes:math_softwares

Differences

This shows the differences between the selected revision and the current version of the page.

zh:notes:math_softwares [2026/03/28 19:06]
pzczxs
zh:notes:math_softwares [2026/03/28 21:47] (current version)
pzczxs [Create Database]
Line 4:
  
 ===== Create Database =====
-The database SQL file: <color red>math_softwares.sql</color>.
+The database SQL file: <color red>math_software.sql</color>.
 +===== Download Data ===== 
 +[[https://zbmath.org/software/|zbmath]]
 + 
 +===== Word Embedding ===== 
 +  * [[https://nlp.stanford.edu/projects/glove/|GloVe]]
 +  * 
 + 
 +===== PMI Calculation ===== 
 +[[https://dumps.wikimedia.org/backup-index.html|Wikipedia dumps]] should be downloaded in advance.
 + 
 +==== Plain Text Extraction ==== 
 +Please refer to [[https://radimrehurek.com/gensim/scripts/segment_wiki.html|gensim.scripts.segment_wiki]] for more details.
 + 
 +<code bash>
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p10p1147431.xml.bz2 -o enwiki-2026-03-01-p10p1147431.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p1147434p3987701.xml.bz2 -o enwiki-2026-03-01-p1147434p3987701.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p3987703p8213792.xml.bz2 -o enwiki-2026-03-01-p3987703p8213792.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p8213793p13295371.xml.bz2 -o enwiki-2026-03-01-p8213793p13295371.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p13295373p18816201.xml.bz2 -o enwiki-2026-03-01-p13295373p18816201.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p18816202p24038461.xml.bz2 -o enwiki-2026-03-01-p18816202p24038461.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p24038462p29075629.xml.bz2 -o enwiki-2026-03-01-p24038462p29075629.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p29075630p34204620.xml.bz2 -o enwiki-2026-03-01-p29075630p34204620.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p34204621p39293698.xml.bz2 -o enwiki-2026-03-01-p34204621p39293698.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p39293699p43920660.xml.bz2 -o enwiki-2026-03-01-p39293699p43920660.json.gz
 +python -m gensim.scripts.segment_wiki -i -f enwiki-2026-03-01-p43920661p48725620.xml.bz2 -o enwiki-2026-03-01-p43920661p48725620.json.gz
 +
 +</code>
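Each output file written by the commands above is a gzip-compressed JSON Lines file, with one article per line. The following is a minimal sketch of how the plain text might be read back, assuming the ''title'' and ''section_texts'' fields described in the gensim documentation. The helper name ''iter_article_texts'' is hypothetical and not part of gensim or of these notes.

```python
import gzip
import json


def iter_article_texts(path):
    """Yield (title, plain_text) pairs from one segment_wiki output file.

    Hypothetical helper: assumes each line is a JSON object with the
    "title" and "section_texts" keys documented for segment_wiki.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines defensively
            article = json.loads(line)
            # Join all section bodies into one plain-text string per article.
            text = "\n".join(article["section_texts"])
            yield article["title"], text
```

Iterating lazily keeps memory use flat, which matters because each dump part holds a large number of articles. With the ''-i'' flag used above, each record should also carry an ''interlinks'' field, which this sketch ignores.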
 + 
 +==== One- and Two-Gram Counting ==== 
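The notes do not yet say how the counts are produced or which PMI variant is intended, so the following is only a sketch under stated assumptions: tokens are lowercase alphanumeric runs, a two-gram is an ordered pair of adjacent tokens within one text, and PMI is the standard log2(p(x,y) / (p(x) p(y))) estimated from those counts. The names ''count_ngrams'' and ''pmi'' are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a token is a run of lowercase letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def count_ngrams(texts):
    """Count one-grams and adjacent two-grams over an iterable of strings."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        tokens = TOKEN_RE.findall(text.lower())
        unigrams.update(tokens)
        # Pair each token with its right-hand neighbour.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information (base 2) of the ordered pair (w1, w2)."""
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return float("-inf")  # the pair was never observed
    p_xy = joint / sum(bigrams.values())
    p_x = unigrams[w1] / sum(unigrams.values())
    p_y = unigrams[w2] / sum(unigrams.values())
    return math.log2(p_xy / (p_x * p_y))
```

A call such as ''count_ngrams(text for _, text in articles)'' over the extracted plain text would produce both counters in a single pass. For the full dump the counters may not fit in memory, in which case counting per dump part and merging the results is one possible workaround.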
  
zh/notes/math_softwares.1774696008.txt.gz · Last modified: 2026/03/28 19:06 by pzczxs