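Calling TaxonKit from R to Display Phylogenetic Information

TaxonKit is a command-line tool for working with taxonomic data. I have integrated TaxonKit into the R package pctax and developed some companion methods for phylogenetic analysis and visualization.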

Introduction

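TaxonKit is a command-line tool for working with taxonomic data. Its main job is handling NCBI Taxonomy data: looking up taxa (species, genus, family, and so on), querying parent and child relationships between taxa, and standardizing taxon names.

To make it easier for R users (myself included) to use TaxonKit and fit it into R workflows, I integrated it into the R package pctax and developed some companion methods for phylogenetic analysis and visualization.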

Calling TaxonKit from R

Preparation

  1. Install pctax. The stable version of pctax is available on CRAN:

install.packages("pctax")

Alternatively, install the development version of pctax from GitHub:

# install.packages("devtools")
devtools::install_github("Asa12138/pctax")

  2. Install TaxonKit:

library(pctax)
pctax::install_taxonkit(make_sure = TRUE)

# On success, taxonkit is installed in the directory below
tools::R_user_dir("pctax")

  3. Download the NCBI Taxonomy data files:

pctax::download_taxonkit_dataset(make_sure = TRUE)

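# On success, the Taxonomy data files are in the directory below
file.path(Sys.getenv("HOME"), ".taxonkit")

This function downloads the latest version of the Taxonomy database from the official site. If you need a specific version, download it yourself from https://ftp.ncbi.nih.gov/pub/taxonomy/ and pass its location:
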
pctax::download_taxonkit_dataset(make_sure = TRUE,taxdump_tar_gz = "~/Downloads/taxdump.tar.gz")

Usage

# If the following command runs without error, TaxonKit is ready to use
check_taxonkit(print = FALSE)

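The main functions mirror those of TaxonKit:

Function               Purpose
taxonkit_list          List the TaxIDs of all subunits under the given TaxIDs
taxonkit_lineage       Get the full lineage for a TaxID
taxonkit_reformat      Reformat a full lineage into a custom "kingdom, phylum, class, order, family, genus, species, strain" format
taxonkit_name2taxid    Convert taxon names to TaxIDs
taxonkit_filter        Filter TaxIDs by a range of taxonomic ranks
taxonkit_lca           Compute the lowest common ancestor (LCA)

Run help() on any taxonkit_* function, for example help(taxonkit_list), for detailed usage.

# List all subunits under [genus] Homo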
taxonkit_list(ids = c(9605), indent = "-", show_name = TRUE, show_rank = TRUE)
##  [1] "9605 [genus] Homo"                                    
##  [2] "-9606 [species] Homo sapiens"                         
##  [3] "--63221 [subspecies] Homo sapiens neanderthalensis"   
##  [4] "--741158 [subspecies] Homo sapiens subsp. 'Denisova'" 
##  [5] "-1425170 [species] Homo heidelbergensis"              
##  [6] "-2665952 [no rank] environmental samples"             
##  [7] "--2665953 [species] Homo sapiens environmental sample"
##  [8] "-2813598 [no rank] unclassified Homo"                 
##  [9] "--2813599 [species] Homo sp."                         
## [10] ""
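The indented strings are easy to read but awkward to filter. As a minimal base-R sketch, assuming the output format shown above (indent character "-", TaxID, rank in square brackets, then the name), they could be turned into a data frame. parse_taxonkit_list is a hypothetical helper written for this illustration, not a pctax function:

```r
# Hypothetical helper (not part of pctax): parse taxonkit_list() output
# produced with indent = "-", show_name = TRUE, show_rank = TRUE.
parse_taxonkit_list <- function(lines) {
  lines <- lines[nzchar(lines)]                 # drop the trailing empty string
  pattern <- "^(-*)([0-9]+) \\[([^]]+)\\] (.*)$"
  parts <- regmatches(lines, regexec(pattern, lines))
  data.frame(
    depth = vapply(parts, function(p) nchar(p[2]), integer(1)),  # number of "-"
    taxid = vapply(parts, function(p) p[3], character(1)),
    rank  = vapply(parts, function(p) p[4], character(1)),
    name  = vapply(parts, function(p) p[5], character(1)),
    stringsAsFactors = FALSE
  )
}

# A few lines copied from the output above
homo <- c(
  "9605 [genus] Homo",
  "-9606 [species] Homo sapiens",
  "--63221 [subspecies] Homo sapiens neanderthalensis",
  ""
)
homo_df <- parse_taxonkit_list(homo)

# Keep only species-level entries
homo_df[homo_df$rank == "species", ]
```

The depth column counts the indent characters, so direct children of the queried TaxID have depth 1.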

taxonkit_lineage, taxonkit_reformat, taxonkit_name2taxid, taxonkit_filter, and taxonkit_lca read their input from a file by default; set text = TRUE to pass the input as a string instead:

# Look up the full lineages of 9606 and 63221
taxonkit_lineage("9606\n63221", show_name = TRUE, show_rank = TRUE, text = TRUE)%>%
    pcutils::strsplit2(split = "\t",colnames = c("taxid","lineage","name","level"))
##   taxid
## 1  9606
## 2 63221
##                                                                                                                                                                                                                                                                                                                                                                                          lineage
## 1                               cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
## 2 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens;Homo sapiens neanderthalensis
##                            name      level
## 1                  Homo sapiens    species
## 2 Homo sapiens neanderthalensis subspecies
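The lineage column is a single semicolon-separated string, so base R can split it into individual ancestors. The sketch below finds the deepest name shared by two lineages. It only illustrates the lowest-common-ancestor idea on lineage strings and makes no claim about how taxonkit_lca works internally; shared_ancestor is a hypothetical helper:

```r
# Hypothetical helper: deepest name shared by two semicolon-separated lineages.
# Illustration of the LCA idea only, not the taxonkit_lca implementation.
shared_ancestor <- function(lineage_a, lineage_b) {
  a <- strsplit(lineage_a, ";", fixed = TRUE)[[1]]
  b <- strsplit(lineage_b, ";", fixed = TRUE)[[1]]
  n <- min(length(a), length(b))
  same <- a[seq_len(n)] == b[seq_len(n)]
  # Length of the common prefix: stop at the first mismatch
  k <- if (all(same)) n else which(!same)[1] - 1
  if (k == 0) NA_character_ else a[k]
}

# Shortened versions of the two lineages shown above
human       <- "Hominidae;Homininae;Homo;Homo sapiens"
neanderthal <- "Hominidae;Homininae;Homo;Homo sapiens;Homo sapiens neanderthalensis"
shared_ancestor(human, neanderthal)
```

Comparing names rather than TaxIDs is a simplification; names are not guaranteed to be unique across the whole taxonomy.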

Reading input from a file:

names <- system.file("extdata/name.txt", package = "pctax")
taxonkit_name2taxid(names, name_field = 1, sci_name = FALSE, show_rank = FALSE)%>%
    pcutils::strsplit2(split = "\t",colnames = c("name","taxid"))
##                                              name   taxid
## 1                                    Homo sapiens    9606
## 2            Akkermansia muciniphila ATCC BAA-835  349741
## 3                         Akkermansia muciniphila  239935
## 4                 Mouse Intracisternal A-particle   11932
## 5                                        Wei Shen        
## 6 uncultured murine large bowel bacterium BAC 54B  314101
## 7                       Croceibacter phage P2559Y 1327037
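Names that cannot be matched come back with an empty taxid, like "Wei Shen" in the output above. A small sketch for separating matched from unmatched names before any downstream step, assuming a data frame with name and taxid columns like the one produced above (the res object here is a hand-made stand-in, not real query output):

```r
# A small stand-in for the name2taxid result shown above
res <- data.frame(
  name  = c("Homo sapiens", "Wei Shen", "Akkermansia muciniphila"),
  taxid = c("9606", "", "239935"),
  stringsAsFactors = FALSE
)

# Treat empty strings and NA as "no TaxID found"
unmatched <- is.na(res$taxid) | !nzchar(trimws(res$taxid))

matched_df    <- res[!unmatched, ]    # rows safe to pass on
missing_names <- res$name[unmatched]  # names to check by hand
missing_names
```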

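Phylogenetic tree

If you do 16S sequencing, the analysis already yields a phylogenetic tree with distances. In metagenomic analysis, assembling MAGs and classifying them against the database with GTDB-Tk also gives a phylogenetic tree with distances.

Sometimes, though, we want to get tidy lineage information from species names or TaxIDs and use it to build a phylogenetic tree (a hierarchical tree with no real distances that only shows containment relationships). This is a common need, and many papers include such a tree to present their data.

Several tools can do this. pctax handles it quickly as well, which is very convenient if your whole analysis workflow lives in R:
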
names <- system.file("extdata/name.txt", package = "pctax")%>%readLines()

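# First, get a tidy taxonomic classification with `name_or_id2df`:
tax_df <- name_or_id2df(names, mode = "name")

# Drop rows containing NA. These can arise from non-standard scientific names,
# or from entries removed in newer releases, since the taxonomy database keeps changing
tax_df <- na.omit(tax_df)

# Convert the table of taxonomic ranks into a tree object with `df2tree`
tax_tree <- pctax::df2tree(tax_df[, 3:9])

# tax_tree is a phylo object, so ape can plot it directly
ape::plot.phylo(tax_tree)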
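To make the "hierarchical tree without distances" idea concrete, here is a base-R sketch that turns a table of taxonomic ranks into a parent-child edge list. It is an illustration only and is not the implementation of pctax::df2tree; the tax table and the taxonomy_edges helper are made up for the example:

```r
# Illustration only: NOT the implementation of pctax::df2tree.
# Each row is one taxon; columns run from the highest rank to the lowest.
tax <- data.frame(
  Kingdom = c("Bacteria", "Bacteria", "Bacteria"),
  Phylum  = c("Proteobacteria", "Proteobacteria", "Firmicutes"),
  Class   = c("Betaproteobacteria", "Gammaproteobacteria", "Bacilli"),
  stringsAsFactors = FALSE
)

# Pair every column with the next one to get parent -> child links
taxonomy_edges <- function(df) {
  pairs <- lapply(seq_len(ncol(df) - 1), function(i) {
    data.frame(parent = df[[i]], child = df[[i + 1]], stringsAsFactors = FALSE)
  })
  edges <- unique(do.call(rbind, pairs))  # shared ancestors appear only once
  rownames(edges) <- NULL
  edges
}

edges <- taxonomy_edges(tax)
edges
```

Every edge has the same implicit length, which is why such a tree shows containment but carries no evolutionary distance.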

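Visualization

pctax also provides several ways to display phylogenetic information:

  1. Phylogenetic tree:

data(otutab, package = "pcutils")
# otutab is the abundance table; taxonomy is the table of taxonomic ranks (obtainable with name_or_id2df)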
ann_tree(taxonomy, otutab) -> tree

easy_tree(tree, add_abundance = TRUE) -> p
p

Add strips for the main phyla:

easy_tree(tree, add_abundance = TRUE,add_tiplab = FALSE) -> p
some_tax <- table(taxonomy$Phylum) %>%
  sort(decreasing = TRUE) %>%
  head(5) %>%
  names()
add_strip(p, some_tax)

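For more ways to draw phylogenetic trees, see my earlier posts "Drawing beautiful phylogenetic trees in R (basics)" and "Drawing beautiful phylogenetic trees in R (advanced)", or use the iPhylo website for interactive plotting: "iPhylo: generating and drawing beautiful taxonomic trees".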

  2. Sankey diagram:

sangji_plot(tree)
[Interactive Sankey diagram: abundance flows between taxa across the Kingdom, Phylum, Class, Order, Family, Genus, and Species levels, for example k__Bacteria → p__Proteobacteria (11.1k)]

  3. Sunburst chart:

sunburst(tree)
[Interactive sunburst chart: the same taxonomy drawn as concentric rings, from r__root outward through kingdom, phylum, class, order, family, genus, and species]

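Using TaxonKit

TaxonKit is a command-line tool written in Go. It ships statically compiled executables for Linux, Windows, and macOS on different architectures (x86-64/arm64). The release archive is under 3 MB; besides GitHub hosting, a mirror in China is available for download, and installation via conda and homebrew is also supported.

Just download and unpack it; it works out of the box with no configuration. The only extra step is to download the NCBI Taxonomy data files and extract them into the expected directory.

Download the latest release for your system from https://github.com/shenwei356/taxonkit/releases, unpack it, and add it to your PATH. Alternatively, install it with conda: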

conda install taxonkit -c bioconda -y

For processing tabular data, csvtk is recommended as a more efficient option:

conda install csvtk -c bioconda -y

For test data, download the project archive from https://github.com/shenwei356/taxonkit, or clone the repository with git; the example directory contains the test data:

git clone https://github.com/shenwei356/taxonkit

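TaxonKit is a command-line tool organized into subcommands, one per function. Most subcommands support standard input and output, so they chain easily in shell pipelines and fit into existing analysis workflows.

  • Output:
    • Every command's output includes the input data, with new columns appended.
    • All commands write to standard output (stdout) by default; redirect with > to write to a file.
    • Alternatively, set the output file with the global flag -o/--out-file; a .gz suffix is detected automatically and the output is written in gzip format.
  • Input:
    • Apart from list and taxid-changelog, the subcommands lineage, reformat, name2taxid, filter, and lca can read input from standard input (stdin) or from positional arguments, that is, arguments after the command that are not attached to any flag, e.g. taxonkit lineage taxids.txt
    • Input is a single column or tab-delimited text; use -i/--taxid-field to specify the column that holds the input data.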
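These conventions can also be exercised from R through system2. The sketch below writes a single-column TaxID file, then builds a lineage call using the -i and -o flags described above. It assumes the taxonkit executable is on the PATH and the Taxonomy data files are in place, and it only prints the command otherwise; the temporary file names are arbitrary:

```r
# Write a single-column input file of TaxIDs
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".tsv")
writeLines(c("9606", "63221"), infile)

# Positional argument = input file; -i picks the TaxID column, -o the output file
args <- c("lineage", "-i", "1", "-o", shQuote(outfile), shQuote(infile))

# Only run when the taxonkit executable is actually on the PATH
if (nzchar(Sys.which("taxonkit"))) {
  status <- system2("taxonkit", args)
  if (status == 0 && file.exists(outfile)) {
    print(readLines(outfile))
  }
} else {
  message("taxonkit not found on PATH; command would be: taxonkit ",
          paste(args, collapse = " "))
}
```

Inside an R workflow the pctax wrappers shown earlier are usually more convenient; this form is mainly useful for checking how a wrapper call maps onto the command line.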

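TaxonKit parses the NCBI Taxonomy data files directly (taking about 2 seconds), which makes setup simpler and data updates easy. Memory usage is roughly 500 MB to 1.5 GB. To download the data:

# Downloads sometimes fail; retry a few times, or download this link in a browser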
wget -c https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz 
tar -zxvf taxdump.tar.gz

# Copy the extracted files to .taxonkit/ in the home directory, TaxonKit's default data directory
mkdir -p $HOME/.taxonkit
cp names.dmp nodes.dmp delnodes.dmp merged.dmp $HOME/.taxonkit
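After copying, it is worth confirming that all four files landed in the data directory. A minimal base-R sketch, assuming the default location used above; check_taxdump is a hypothetical helper written for this illustration:

```r
# Hypothetical helper: report which required dump files are present
check_taxdump <- function(dir = file.path(Sys.getenv("HOME"), ".taxonkit")) {
  required <- c("names.dmp", "nodes.dmp", "delnodes.dmp", "merged.dmp")
  present <- file.exists(file.path(dir, required))
  names(present) <- required
  present
}

dump_status <- check_taxdump()
if (!all(dump_status)) {
  message("Missing files: ",
          paste(names(dump_status)[!dump_status], collapse = ", "))
}
```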

The author of TaxonKit has thoughtfully provided very detailed documentation in Chinese, which is a good reference: https://bioinf.shenwei.me/taxonkit/chinese/

Licensed under CC BY-NC-SA 4.0
Email: pengchen2001@zju.edu.cn