The release is based on RefSeq release 216.
The metadata consists of the following files:
The fasta file is compressed with gzip, and the metadata file is a zip archive. To uncompress them, Linux and macOS users can use the gzip and unzip programs, which are usually preinstalled. Windows users can use 7-Zip, a free and open-source (de)compression program.
You can find all releases in the RiboGrove release archive.
Starting with RiboGrove 10.216 we use Rfam 14.9 for sequence filtering. In Rfam 14.9, the model RF00177 (Bacterial small subunit ribosomal RNA) is updated. Before RiboGrove 10.216 we used Rfam 14.6.
You can find notes to all RiboGrove releases on the release notes page.
Statistic | Bacteria | Archaea | Total |
---|---|---|---|
Number of gene sequences | 163,262 | 835 | 164,097 |
Number of unique gene sequences | 44,021 | 606 | 44,627 |
Number of species | 7,942 | 391 | 8,333 |
Number of genomes | 30,880 | 491 | 31,371 |
Number of genomes of category 1 | 20,501 | 183 | 20,684 |
Number of genomes of category 2 | 10,248 | 308 | 10,556 |
Number of genomes of category 3 | 131 | 0 | 131 |
Metric | Bacteria | Archaea |
---|---|---|
Minimum (bp) | 1,448.00 | 1,439.00 |
25th percentile (bp) * | 1,517.00 | 1,471.50 |
Median (bp) * | 1,531.00 | 1,474.00 |
75th percentile (bp) * | 1,542.00 | 1,486.00 |
Average (bp) * | 1,527.85 | 1,495.76 |
Mode (bp) * | 1,537.00 | 1,472.00 |
Maximum (bp) | 2,438.00 | 3,604.00 |
Standard deviation (bp) * | 25.63 | 135.98 |
* Metrics marked with an asterisk were calculated after normalization: each species is represented by its median within-species gene length, and the summary is computed over these per-species medians.
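The normalization described in the footnote can be sketched as follows. This is a minimal Python illustration, assuming gene lengths have already been grouped by species; the function name `normalized_summary` and the toy data are hypothetical and not part of RiboGrove.

```python
from statistics import mean, median

def normalized_summary(lengths_by_species):
    """Summarize gene lengths so that each species counts once.

    lengths_by_species maps a species name to the lengths (bp) of all
    its 16S rRNA genes. Each species is first reduced to its median
    gene length; the summary is then computed over these medians.
    """
    species_medians = [median(v) for v in lengths_by_species.values()]
    return {
        "median": median(species_medians),
        "mean": mean(species_medians),
    }

# Toy data (hypothetical): species A is heavily sampled, species B is not.
toy_lengths = {
    "species A": [1530, 1531, 1531, 1532, 1531, 1531],
    "species B": [1470],
}
length_summary = normalized_summary(toy_lengths)
print(length_summary)
```

Without this step, heavily sequenced species would dominate the summary simply because they contribute more genes.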
Copy number * | Bacteria: number of species | Bacteria: per cent of species (%) | Archaea: number of species | Archaea: per cent of species (%) |
---|---|---|---|---|
1 | 1,007 | 12.68 | 211 | 53.96 |
2 | 1,459 | 18.37 | 109 | 27.88 |
3 | 1,192 | 15.01 | 56 | 14.32 |
4 | 1,038 | 13.07 | 10 | 2.56 |
5 | 679 | 8.55 | 5 | 1.28 |
6 | 847 | 10.66 | 0 | 0.00 |
7 | 670 | 8.44 | 0 | 0.00 |
8 | 391 | 4.92 | 0 | 0.00 |
9 | 203 | 2.56 | 0 | 0.00 |
10 | 174 | 2.19 | 0 | 0.00 |
11 | 92 | 1.16 | 0 | 0.00 |
12 | 71 | 0.89 | 0 | 0.00 |
13 | 34 | 0.43 | 0 | 0.00 |
14 | 49 | 0.62 | 0 | 0.00 |
15 | 14 | 0.18 | 0 | 0.00 |
16 | 4 | 0.05 | 0 | 0.00 |
17 | 7 | 0.09 | 0 | 0.00 |
18 | 5 | 0.06 | 0 | 0.00 |
20 | 4 | 0.05 | 0 | 0.00 |
27 | 1 | 0.01 | 0 | 0.00 |
37 | 1 | 0.01 | 0 | 0.00 |
* These are median within-species copy numbers.
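A table like the one above could be derived roughly as shown in this Python sketch. It assumes per-genome copy numbers are already grouped by species; the function name `copy_number_distribution` and the toy data are hypothetical. A median over an even number of genomes can be fractional, and the source does not say how such cases are handled, so the sketch leaves them unchanged.

```python
from collections import Counter
from statistics import median

def copy_number_distribution(copies_by_species):
    """Tabulate median within-species 16S rRNA gene copy numbers.

    copies_by_species maps a species name to a list of per-genome
    copy numbers. Returns {copy_number: (n_species, per_cent)}.
    """
    medians = [median(v) for v in copies_by_species.values()]
    counts = Counter(medians)
    total = len(medians)
    return {
        copy_number: (n, round(100 * n / total, 2))
        for copy_number, n in sorted(counts.items())
    }

# Toy data (hypothetical).
toy_copies = {
    "species A": [7, 7, 6],  # median 7
    "species B": [1],        # median 1
    "species C": [1, 1, 2],  # median 1
    "species D": [2, 2],     # median 2
}
distribution = copy_number_distribution(toy_copies)
print(distribution)
```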
Organism | Gene length (bp) | RiboGrove Sequence ID(s) | Assembly ID |
---|---|---|---|
Bacteria | |||
Thermus thermophilus strain AA2-2 | 2,438 | G_10898951:NZ_AP024929.1:249100-251537:minus | 10898951 |
Ca. Annandia pinicola strain Ad13-065 | 1,887 | G_11277031:NZ_CP045876.1:290071-291957:minus | 11277031 |
Nitrosophilus labii strain HRV44 | 1,806 | G_8028891:NZ_AP022826.1:1258017-1259822:minus G_8028891:NZ_AP022826.1:1532588-1534393:minus G_8028891:NZ_AP022826.1:1939914-1941719:minus | 8028891 |
Gelria sp. Kuro-4 | 1,788 | G_10731991:NZ_AP024619.1:2016182-2017969:minus | 10731991 |
Thermoanaerobacter brockii strain Ako-1 | 1,781 | G_282748:NC_014964.1:2252888-2254668:minus | 282748 |
Thermoanaerobacter pseudethanolicus strain ATCC 33223 | 1,781 | G_40148:NC_010321.1:2265744-2267524:minus | 40148 |
Thermoanaerobacter sp. RKWS2 | 1,754 | G_14447161:NZ_CP110888.1:94012-95765:plus | 14447161 |
Campylobacter sputorum strain RM3237 | 1,744 | G_1153941:NZ_CP019682.1:607981-609724:plus G_1153941:NZ_CP019682.1:929565-931308:minus G_1153941:NZ_CP019682.1:1501945-1503688:minus | 1153941 |
Campylobacter sputorum strain LMG 7795 | 1,744 | G_4499991:NZ_CP043427.1:609141-610884:plus G_4499991:NZ_CP043427.1:930699-932442:minus G_4499991:NZ_CP043427.1:1503078-1504821:minus | 4499991 |
Campylobacter sputorum strain CCUG 20703 | 1,743 | G_1153911:NZ_CP019683.1:606847-608589:plus G_1153911:NZ_CP019683.1:935163-936905:minus G_1153911:NZ_CP019683.1:1558189-1559931:minus | 1153911 |
Archaea | |||
Pyrobaculum ferrireducens strain 1860 | 3,604 | G_351728:NC_016645.1:127214-130817:plus | 351728 |
Pyrobaculum aerophilum strain IM2 | 2,213 | G_28808:NC_003364.1:1089640-1091852:plus | 28808 |
Pyrobaculum arsenaticum strain DSM 13514 | 2,212 | G_37488:NC_009376.1:623323-625534:minus | 37488 |
Aeropyrum pernix strain K1 | 2,202 | G_32288:NC_000854.2:1218712-1220913:minus | 32288 |
Pyrobaculum neutrophilum strain V24Sta | 2,197 | G_40848:NC_010525.1:690419-692615:plus | 40848 |
Ca. Mancarchaeum acidiphilum strain Mia14 | 2,008 | G_1145431:NZ_CP019964.1:751297-753304:minus | 1145431 |
Ca. Micrarchaeum sp. A_DKE | 2,003 | G_9220081:NZ_CP060530.1:203642-205644:minus | 9220081 |
Caldivirga maquilingensis strain IC-167 | 1,679 | G_39388:NC_009954.1:129150-130828:minus | 39388 |
Aeropyrum camini strain SY1 | 1,650 | G_127981:NC_022521.1:1165168-1166817:minus | 127981 |
Pyrolobus fumarii strain 1A | 1,576 | G_304318:NC_015931.1:84671-86246:minus | 304318 |
Organism | Gene length (bp) | RiboGrove Sequence ID(s) | Assembly ID |
---|---|---|---|
Bacteria | |||
Hirschia baltica strain ATCC 49814 | 1,448 | G_44428:NC_012982.1:2336679-2338126:minus | 44428 |
Sagittula sp. P11 | 1,449 | G_1460951:NZ_CP021913.1:2386837-2388285:plus G_1460951:NZ_CP021913.1:3597920-3599368:plus | 1460951 |
Hyphomonas sp. Mor2 | 1,451 | G_860061:NZ_CP017718.1:2304269-2305719:minus | 860061 |
Antarctobacter heliothermus strain SMS3 | 1,453 | G_1163161:NZ_CP022540.1:1369380-1370832:plus G_1163161:NZ_CP022540.1:2482480-2483932:plus | 1163161 |
Mameliella alba strain KU6B | 1,454 | G_6279751:NZ_AP022337.1:267139-268592:plus G_6279751:NZ_AP022337.1:1420942-1422395:plus G_6279751:NZ_AP022337.1:3191208-3192661:minus | 6279751 |
Hyphomonas sp. KY3 | 1,455 | G_9503471:NZ_CP022271.1:2407999-2409453:minus | 9503471 |
Hyphomonas neptunium strain ATCC 15444 | 1,455 | G_34128:NC_008358.1:2818466-2819920:minus | 34128 |
Cognatishimia activa strain SOCE 004 | 1,456 | G_14327851:NZ_CP096147.1:529008-530463:plus | 14327851 |
Pseudooceanicola algae strain Lw-13e | 1,458 | G_8694041:NZ_CP060436.1:2482207-2483664:minus | 8694041 |
Ruegeria sp. SCSIO 43209 | 1,458 | G_10854641:NZ_CP065359.1:3157837-3159294:minus | 10854641 |
Archaea | |||
Ignicoccus hospitalis strain KIN4/I | 1,439 | G_39048:NC_009776.1:728362-729800:plus | 39048 |
Methanocaldococcus sp. SG7 | 1,457 | G_10131521:NZ_LR792632.1:542755-544211:plus | 10131521 |
Halorubrum sp. BOL3-1 | 1,463 | G_2220501:NZ_CP034692.1:397753-399215:minus | 2220501 |
Natronomonas gomsonensis strain KCTC 4088 | 1,466 | G_13300951:NZ_CP101323.1:2500564-2502029:plus | 13300951 |
Methanospirillum sp. J.3.6.1-F.2.7.3 | 1,466 | G_10123301:NZ_CP075546.1:133354-134819:plus G_10123301:NZ_CP075546.1:825954-827419:plus G_10123301:NZ_CP075546.1:872641-874106:plus G_10123301:NZ_CP075546.1:1727419-1728884:plus | 10123301 |
Methanospirillum hungatei strain GP1 | 1,466 | G_10519241:NZ_CP077107.1:4649-6114:plus G_10519241:NZ_CP077107.1:1359562-1361027:minus G_10519241:NZ_CP077107.1:1365502-1366967:minus G_10519241:NZ_CP077107.1:1986020-1987485:minus | 10519241 |
Salinirubellus salinus strain ZS-35-S2 | 1,466 | G_13813051:NZ_CP104003.1:3070232-3071697:plus | 13813051 |
Methanospirillum hungatei strain JF-1 | 1,466 | G_34548:NC_007796.1:39814-41279:plus G_34548:NC_007796.1:1301079-1302544:minus G_34548:NC_007796.1:3501525-3502990:minus G_34548:NC_007796.1:3507609-3509074:minus | 34548 |
Ca. Methanomethylophilus alvus strain Mx-05 | 1,466 | G_2068141:NZ_CP017686.1:283608-285073:plus | 2068141 |
Natronomonas sp. ZY43 | 1,466 | G_13300761:NZ_CP101154.1:18680-20145:plus | 13300761 |
Ca. Methanomethylophilus alvus strain MGYG-HGUT-02456 | 1,466 | G_4352521:NZ_LR699000.1:283607-285072:plus | 4352521 |
Natronomonas halophila strain C90 | 1,466 | G_7330651:NZ_CP058334.1:1530622-1532087:minus | 7330651 |
Ca. Methanomethylophilus alvus strain Mx1201 | 1,466 | G_599268:NC_020913.1:283607-285072:plus | 599268 |
Organism | Copy number | Assembly ID | |
---|---|---|---|
Bacteria | |||
Tumebacillus avium strain AR23208 | 37 | 1115491 | |
Tumebacillus algifaecis strain THMBR28 | 27 | 1166771 | |
Peribacillus asahii strain KF4 | 21 | 13022701 | |
Priestia megaterium strain S2 | 21 | 6720751 | |
Photobacterium damselae strain 04Ya311 | 20 | 14314271 | |
Moritella sp. 36 | 20 | 9972241 | |
Neobacillus drentensis strain JC05 | 20 | 11802511 | |
Moritella sp. 5 | 20 | 9972261 | |
Moritella sp. 28 | 20 | 9972251 | |
Metabacillus litoralis strain Bac94 | 19 | 2023811 | |
Photobacterium damselae strain AS-15-3942-7 | 19 | 11907491 | |
Archaea | |||
Methanococcoides orientis strain LMO-1 | 5 | 11622961 | |
Natronorubrum aibiense strain 7-3 | 5 | 5073821 | |
Natrinema sp. SYSU A 869 | 5 | 10842511 | |
Natronorubrum bangense strain JCM 10635 | 5 | 2580821 | |
Methanoplanus endosymbiosus strain DSM 3599 | 5 | 13492921 | |
Methanogenium organophilum strain DSM 3596 | 4 | 14706461 | |
Methanospirillum sp. J.3.6.1-F.2.7.3 | 4 | 10123301 | |
Halosiccatus urmianus strain IBRC-M: 10911 | 4 | 11057071 | |
Methanococcus vannielii strain SB | 4 | 38268 | |
Natronococcus occultus strain SP4 | 4 | 521038 | |
Halomicrobium salinisoli strain TH30 | 4 | 11151391 | |
Methanospirillum hungatei strain JF-1 | 4 | 34548 | |
Halomicrobium salinisoli strain LT50 | 4 | 11151361 | |
Methanospirillum hungatei strain GP1 | 4 | 10519241 | |
Methanosphaera stadtmanae strain DSM 3091 | 4 | 33648 | |
Haloterrigena salifodinae strain BOL5-1 | 4 | 9298621 | |
Haloarcula sinaiiensis strain ATCC 33800 | 4 | 9962651 | |
Methanosphaera stadtmanae strain MGYG-HGUT-02164 | 4 | 4349641 |
Organism | Sum of entropy * (bits) | Mean entropy * (bits) | Number of variable positions | Gene copy number | Assembly ID |
---|---|---|---|---|---|
Bacteria | |||||
Synechococcus sp. NB0720_010 | 243.35 | 0.16 | 265 | 3 | 12576831 |
Xanthomonas oryzae strain YNCX | 227.74 | 0.15 | 248 | 3 | 13407211 |
Sporomusa termitida strain DSM 4440 | 226.25 | 0.13 | 247 | 12 | 4155511 |
Campylobacter hyointestinalis strain CHY5 | 217.64 | 0.12 | 237 | 3 | 7294871 |
Campylobacter sp. RM6137 | 211.21 | 0.12 | 230 | 3 | 1101781 |
Acetivibrio thermocellus strain M3 | 211.00 | 0.14 | 211 | 2 | 13802461 |
Sinorhizobium meliloti strain AK76 | 184.58 | 0.12 | 201 | 3 | 9010851 |
Cylindrospermopsis raciborskii strain KLL07 | 168.97 | 0.11 | 184 | 3 | 11851031 |
Klebsiella pneumoniae strain GZ-1 | 167.21 | 0.10 | 216 | 5 | 8227731 |
Olleya sp. Bg11-27 | 145.25 | 0.10 | 156 | 3 | 1469691 |
Archaea | |||||
Halomicrobium sp. ZPS1 ** | 137.00 | 0.09 | 137 | 2 | 4982121 |
Halosiccatus urmianus strain IBRC-M: 10911 | 131.55 | 0.09 | 146 | 4 | 11057071 |
Halapricum desulfuricans strain HSR12-2 | 128.00 | 0.09 | 128 | 2 | 9390741 |
Halomicrobium salinisoli strain TH30 | 127.74 | 0.09 | 145 | 4 | 11151391 |
Halapricum desulfuricans strain HSR-Bgl | 127.00 | 0.09 | 127 | 2 | 9390521 |
Halomicrobium mukohataei strain JP60 | 125.81 | 0.09 | 137 | 3 | 2582391 |
Halomicrobium salinisoli strain LT50 | 123.31 | 0.08 | 140 | 4 | 11151361 |
Halapricum desulfuricans strain HSR-Est | 111.00 | 0.08 | 111 | 2 | 9390681 |
Halapricum desulfuricans strain HSR12-1 | 109.00 | 0.07 | 109 | 2 | 9390731 |
Halorussus sp. XZYJT49 | 105.10 | 0.07 | 113 | 3 | 12653301 |
* Entropy is Shannon entropy calculated for each column of the multiple sequence alignment (MSA) of all full-length 16S rRNA genes of a genome. Entropy is then summed up (column “Sum of entropy”) and averaged (column “Mean entropy”).
** Halomicrobium sp. ZPS1 is a remarkable case. This genome harbours two 16S rRNA genes, so every mismatching alignment column contributes exactly 1 bit, and the sum of entropy equals the number of mismatching nucleotides between the two gene sequences (137). Accordingly, the per cent of identity between these two gene sequences is only 90.70%. This is remarkable because the usual (albeit arbitrary) genus demarcation threshold of per cent of identity is 95%.
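The entropy columns can be illustrated with a short Python sketch. It assumes the gene copies of one genome are already aligned to equal length and treats gap characters, if any, as ordinary symbols; that choice and the function names are assumptions, not necessarily what RiboGrove does.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column."""
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

def entropy_summary(aligned_seqs):
    """Return (sum of entropy, mean entropy, number of variable positions)."""
    columns = list(zip(*aligned_seqs))
    entropies = [column_entropy(col) for col in columns]
    n_variable = sum(1 for e in entropies if e > 0)
    total = sum(entropies)
    return total, total / len(columns), n_variable

# Toy alignment (hypothetical): two gene copies differing at 2 of 10 columns.
total_entropy, mean_entropy, n_variable = entropy_summary(
    ["ACGTACGTAC", "ACGAACGTTC"]
)
print(total_entropy, mean_entropy, n_variable)
```

With exactly two sequences, a mismatching column holds two symbols at frequency 0.5 each, which gives 1 bit, so the sum of entropy equals the mismatch count.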
Phylum | Number of genomes | Full gene, 27F–1492R (%) | V1–V2, 27F–338R (%) | V1–V3, 27F–534R (%) | V3–V4, 341F–785R (%) | V3–V5, 341F–944R (%) | V4–V5, 515F–944R (%) | V4–V6, 515F–1100R (%) | V5–V6, 784F–1100R (%) | V5–V7, 784F–1193R (%) | V6–V7, 939F–1193R (%) | V6–V8, 939F–1378R (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Proteobacteria | 18,075 | 99.73 | 99.51 | 99.72 | 99.95 | 82.51 | 82.55 | 90.17 | 89.87 | 93.64 | 92.63 | 96.68 |
Bacillota | 6,885 | 99.97 | 99.85 | 99.94 | 99.96 | 95.89 | 95.79 | 99.51 | 97.88 | 97.21 | 98.56 | 99.27 |
Actinomycetota | 2,946 | 99.80 | 98.85 | 99.59 | 94.09 | 63.88 | 63.68 | 96.13 | 99.63 | 99.73 | 99.76 | 97.15 |
Bacteroidota | 1,185 | 95.19 | 94.68 | 95.11 | 99.92 | 61.60 | 61.18 | 38.73 | 38.99 | 94.77 | 92.41 | 94.51 |
Tenericutes | 489 | 97.34 | 94.68 | 74.44 | 98.36 | 90.59 | 90.80 | 71.37 | 41.10 | 42.33 | 78.73 | 0.41 |
Spirochaetes | 351 | 49.29 | 49.29 | 49.29 | 95.44 | 100.00 | 100.00 | 100.00 | 78.35 | 78.35 | 91.17 | 38.18 |
Cyanobacteria | 224 | 100.00 | 100.00 | 100.00 | 100.00 | 4.91 | 4.91 | 100.00 | 0.89 | 0.89 | 100.00 | 99.55 |
Chlamydiae | 187 | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 93.58 |
Verrucomicrobia | 114 | 99.12 | 0.00 | 99.12 | 100.00 | 8.77 | 8.77 | 100.00 | 0.88 | 0.88 | 100.00 | 100.00 |
Fusobacteria | 80 | 100.00 | 96.25 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 98.75 | 98.75 | 100.00 | 0.00 |
Deinococcus-Thermus | 76 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 50.00 | 100.00 |
Planctomycetota | 57 | 100.00 | 19.30 | 100.00 | 100.00 | 64.91 | 64.91 | 0.00 | 0.00 | 0.00 | 3.51 | 0.00 |
Thermotogae | 42 | 100.00 | 97.62 | 100.00 | 100.00 | 9.52 | 9.52 | 100.00 | 0.00 | 0.00 | 59.52 | 97.62 |
Chloroflexi | 41 | 100.00 | 90.24 | 100.00 | 39.02 | 0.00 | 0.00 | 87.80 | 4.88 | 4.88 | 92.68 | 26.83 |
Acidobacteria | 31 | 96.77 | 96.77 | 96.77 | 100.00 | 100.00 | 100.00 | 100.00 | 61.29 | 45.16 | 83.87 | 100.00 |
Chlorobi | 15 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 100.00 | 93.33 | 86.67 | 6.67 |
Aquificae | 14 | 100.00 | 21.43 | 100.00 | 100.00 | 21.43 | 21.43 | 100.00 | 0.00 | 0.00 | 7.14 | 21.43 |
Nitrospirae | 10 | 100.00 | 100.00 | 100.00 | 100.00 | 60.00 | 60.00 | 100.00 | 100.00 | 60.00 | 60.00 | 100.00 |
Thermodesulfobacteria | 7 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 |
Ca. Saccharibacteria | 6 | 100.00 | 100.00 | 100.00 | 100.00 | 16.67 | 16.67 | 16.67 | 0.00 | 0.00 | 100.00 | 100.00 |
Synergistetes | 6 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 |
Deferribacteres | 6 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Elusimicrobia | 4 | 100.00 | 50.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 75.00 | 75.00 | 100.00 | 100.00 |
Gemmatimonadetes | 4 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Dictyoglomi | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 |
Fibrobacteres | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Ignavibacteriae | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Kiritimatiellaeota | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 |
Chrysiogenetes | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Calditrichaeota | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Ca. Absconditabacteria | 1 | 100.00 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 |
Ca. Bipolaricaulota | 1 | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Caldiserica | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 |
Balneolota | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Atribacterota | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 |
Ca. Cloacimonetes | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Armatimonadetes | 1 | 100.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Ca. Omnitrophica | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Coprothermobacterota | 1 | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 |
* Coverage of a primer pair is the per cent of genomes having at least one 16S rRNA gene which can be amplified by PCR using this primer pair. For details, see our paper about RiboGrove.
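The coverage definition above can be sketched in Python. As a deliberate simplification, the sketch calls a gene "amplifiable" when both primer sites match exactly (allowing IUPAC degenerate bases) and the forward site precedes the reverse site. This criterion and the function names are assumptions for illustration; the method described in the RiboGrove paper may differ.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def is_amplifiable(gene, forward, reverse):
    """True if the forward site is followed by the reverse primer site."""
    fwd = re.search(primer_regex(forward), gene)
    if fwd is None:
        return False
    rev_site = primer_regex(reverse_complement(reverse))
    return re.search(rev_site, gene[fwd.end():]) is not None

def coverage(genes_by_genome, forward, reverse):
    """Per cent of genomes with at least one amplifiable gene."""
    hits = sum(
        any(is_amplifiable(gene, forward, reverse) for gene in genes)
        for genes in genes_by_genome.values()
    )
    return 100 * hits / len(genes_by_genome)

# Primers 341F and 785R; the gene fragments are toy data (hypothetical).
FWD_341F = "CCTACGGGNGGCWGCAG"
REV_785R = "GACTACHVGGGTATCTAATCC"
gene_ok = "TT" + "CCTACGGGAGGCAGCAG" + "ACGTACGT" + "GGATTAGATACCCTGGTAGTC" + "AA"
gene_bad = "TTTTTTTT" + "GGATTAGATACCCTGGTAGTC"  # lacks the 341F site
toy_genomes = {"genome 1": [gene_bad, gene_ok], "genome 2": [gene_bad]}
toy_coverage = coverage(toy_genomes, FWD_341F, REV_785R)
print(toy_coverage)
```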
Primer name | Sequence | Reference |
---|---|---|
27F | AGAGTTTGATYMTGGCTCAG | Frank et al., 2008 |
338R | GCTGCCTCCCGTAGGAGT | Suzuki et al., 1996 |
341F * | CCTACGGGNGGCWGCAG | Klindworth et al., 2013 |
515F | GTGCCAGCMGCCGCGGTAA | Turner et al., 1999 |
534R | ATTACCGCGGCTGCTGG | Walker et al., 2015 |
784F | AGGATTAGATACCCTGGTA | Andersson et al., 2008 |
785R * | GACTACHVGGGTATCTAATCC | Klindworth et al., 2013 |
939F | GAATTGACGGGGGCCCGCACAAG | Lebuhn et al., 2014 |
944R | GAATTAAACCACATGCTC | Fuks et al., 2018 |
1100R | AGGGTTGCGCTCGTTG | Turner et al., 1999 |
1193R | ACGTCATCCCCACCTTCC | Bodenhausen et al., 2013 |
1378R | CGGTGTGTACAAGGCCCGGGAACG | Lebuhn et al., 2014 |
1492R | TACCTTGTTACGACTT | Frank et al., 2008 |
* Primers 341F and 785R are used in the protocol for library preparation for sequencing of V3–V4 region of 16S rRNA genes on Illumina MiSeq.
RiboGrove is a minimalistic database: it is a collection of plain fasta files with metadata, so it offers no advanced search tools. We acknowledge this limitation and provide the suggestions below to help you explore RiboGrove data and select subsets of it.
RiboGrove fasta data has the following format of header:
>G_324861:NZ_CP009686.1:8908-10459:plus ;d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__cereus; category:1
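Major blocks of a header are separated by spaces. A header consists of three such blocks: the sequence ID (SeqID, here G_324861:NZ_CP009686.1:8908-10459:plus), the taxonomy string (delimited by semicolons), and the genome category (here category:1).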
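The header layout can be unpacked programmatically, as in the following Python sketch. The field names in the returned dictionary are hypothetical. The breakdown of the SeqID into Assembly ID, Accession.Version, coordinates and strand is inferred from the example header and the tables on this page, and it assumes coordinates are 1-based and inclusive.

```python
def parse_header(header):
    """Split a RiboGrove fasta header into its fields."""
    header = header.lstrip(">")
    # SeqID is everything before the first space; category follows the last.
    seq_id, rest = header.split(" ", 1)
    taxonomy, category = rest.rsplit(" ", 1)
    assembly, accession, coords, strand = seq_id.split(":")
    start, end = (int(x) for x in coords.split("-"))
    # Taxonomy looks like ';d__Bacteria;p__Firmicutes;...;s__cereus;'
    ranks = dict(
        item.split("__", 1) for item in taxonomy.strip(";").split(";")
    )
    return {
        "seq_id": seq_id,
        "assembly_id": assembly[len("G_"):],
        "accession": accession,
        "start": start,
        "end": end,
        "length": end - start + 1,
        "strand": strand,
        "taxonomy": ranks,
        "category": int(category.split(":")[1]),
    }

example_header = (
    ">G_324861:NZ_CP009686.1:8908-10459:plus "
    ";d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
    "f__Bacillaceae;g__Bacillus;s__cereus; category:1"
)
parsed = parse_header(example_header)
print(parsed["assembly_id"], parsed["taxonomy"]["p"], parsed["length"])
```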
You can select specific sequences from fasta files using the SeqKit program (GitHub repo, documentation). It is free, cross-platform, multifunctional and fast, and it can process both gzipped and uncompressed fasta files. The subcommands seqkit grep and seqkit seq are useful for sequence selection.
Given the downloaded fasta file ribogrove_10.216_sequences.fasta.gz, consider the following examples of sequence selection using seqkit grep:
Example 1. Select a single sequence by SeqID.
seqkit grep -p "G_324861:NZ_CP009686.1:8908-10459:plus" ribogrove_10.216_sequences.fasta.gz
The -p option sets the pattern to search for. By default, seqkit grep matches the pattern against sequence IDs only, not against whole fasta headers.
Example 2. Select all gene sequences of a single RefSeq genomic sequence by accession number NZ_CP009686.1.
seqkit grep -nrp ":NZ_CP009686.1:" ribogrove_10.216_sequences.fasta.gz
Here, two more options are required: -n and -r. The former tells the program to match whole headers instead of IDs only. The latter makes the program treat the pattern as a regular expression, so partial matches are accepted: if the pattern matches a substring of a header, the sequence is written to the output.
To ensure search specificity, surround the Accession.Version with colons (:). Note that an unescaped dot in a regular expression matches any character; write \. if you need a strictly literal dot.
Example 3. Select all gene sequences of a single genome (Assembly ID 10577151).
seqkit grep -nrp "G_10577151:" ribogrove_10.216_sequences.fasta.gz
To ensure search specificity, Assembly ID should be preceded by prefix G_ and followed by a colon (:).
Example 4. Select all actinobacterial sequences.
seqkit grep -nrp ";p__Actinobacteria;" ribogrove_10.216_sequences.fasta.gz
To ensure search specificity, surround the taxonomy name with semicolons (;).
Example 5. Select all sequences originating from category 1 genomes.
seqkit grep -nrp "category:1" ribogrove_10.216_sequences.fasta.gz
Example 6. Select all sequences except for those belonging to Firmicutes.
seqkit grep -nvrp ";p__Firmicutes;" ribogrove_10.216_sequences.fasta.gz
Note the -v option within the option sequence -nvrp. This option inverts the match, i.e. the output comprises only sequences whose headers do not contain the substring “;p__Firmicutes;”.
You can use the seqkit seq program to select sequences by length.
Example 1. Select all sequences longer than 1600 bp.
seqkit seq -m 1601 ribogrove_10.216_sequences.fasta.gz
The -m option sets the minimum length of a sequence to be printed to output.
Example 2. Select all sequences shorter than 1500 bp.
seqkit seq -M 1499 ribogrove_10.216_sequences.fasta.gz
The -M option sets the maximum length of a sequence to be printed to output.
Example 3. Select all sequences having length in range [1500, 1600] bp.
seqkit seq -m 1500 -M 1600 ribogrove_10.216_sequences.fasta.gz
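If SeqKit is not available, the same length filter can be approximated in plain Python. The sketch below is an illustration only, not a replacement for seqkit seq; the function names `read_fasta` and `filter_by_length` are hypothetical.

```python
import gzip

def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzipped fasta file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_by_length(records, min_len=0, max_len=None):
    """Keep records whose sequence length lies in [min_len, max_len]."""
    for header, seq in records:
        if len(seq) >= min_len and (max_len is None or len(seq) <= max_len):
            yield header, seq

def select_range(path, min_len=1500, max_len=1600):
    """Roughly mirrors: seqkit seq -m 1500 -M 1600 <file>."""
    return list(filter_by_length(read_fasta(path), min_len, max_len))
```

For instance, `select_range("ribogrove_10.216_sequences.fasta.gz")` would return the records with lengths in the range [1500, 1600] bp.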
It is sometimes useful to retrieve only header information from a fasta file. You can use the seqkit seq program for it.
Example 1. Select all headers.
seqkit seq -n ribogrove_10.216_sequences.fasta.gz
The -n option tells the program to output only headers.
Example 2. Select all SeqIDs (header parts before the first space).
seqkit seq -ni ribogrove_10.216_sequences.fasta.gz
The -i option tells the program to output only sequence IDs.
Example 3. Select all “Accession.Version”s.
seqkit seq -ni ribogrove_10.216_sequences.fasta.gz | cut -f2 -d':' | sort | uniq
This works only if the cut, sort and uniq utilities are installed (Linux and macOS systems normally include them).
Example 4. Select all Assembly IDs.
seqkit seq -ni ribogrove_10.216_sequences.fasta.gz | cut -f1 -d':' | sed 's/G_//' | sort | uniq
This works only if the cut, sed, sort and uniq utilities are installed (Linux and macOS systems normally include them).
Example 5. Select all phylum names.
seqkit seq -n ribogrove_10.216_sequences.fasta.gz | grep -Eo ';p__[^;]+' | sed -E 's/;|p__//g' | sort | uniq
This works only if the grep, sed, sort and uniq utilities are installed (Linux and macOS systems normally include them).
RiboGrove, 2023-05-19