RiboGrove release 9.215 is based on RefSeq release 215.
The metadata consists of the following files:
The fasta file is compressed with gzip, and the metadata file is a zip archive. To uncompress them, Linux and Mac OS users can use the gzip and unzip programs, which are usually preinstalled. For Windows users, the free and open-source (de)compression program 7-Zip is available.
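As a cross-platform alternative, both archives can also be unpacked with the Python standard library. This is a minimal sketch; the helper names `gunzip` and `unzip` are illustrative and not part of RiboGrove.

```python
import gzip
import shutil
import zipfile


def gunzip(src, dst):
    """Decompress the gzip file `src` and write the result to `dst`."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def unzip(src, out_dir):
    """Extract all members of the zip archive `src` into `out_dir`.

    Returns the list of member names found in the archive.
    """
    with zipfile.ZipFile(src) as archive:
        archive.extractall(out_dir)
        return archive.namelist()
```

For example, `gunzip("ribogrove_9.215_sequences.fasta.gz", "ribogrove_9.215_sequences.fasta")` would unpack the fasta file. For the metadata, pass the path of the zip archive you downloaded to `unzip`.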
You can find all releases in the RiboGrove release archive.
Starting with release 9.215, we publish the section “Coverage of primer pairs for different V-regions of bacterial 16S rRNA genes”.
You can find notes to all RiboGrove releases on the release notes page.
Metric | Bacteria | Archaea | Total |
---|---|---|---|
Number of gene sequences | 158,769 | 813 | 159,582 |
Number of unique gene sequences | 43,247 | 592 | 43,839 |
Number of species | 7,717 | 381 | 8,098 |
Number of genomes | 29,922 | 478 | 30,400 |
Number of genomes of category 1 | 19,915 | 174 | 20,089 |
Number of genomes of category 2 | 9,876 | 304 | 10,180 |
Number of genomes of category 3 | 131 | 0 | 131 |
Metric | Bacteria | Archaea |
---|---|---|
Minimum (bp) | 1,448.00 | 1,439.00 |
25th percentile (bp) * | 1,517.00 | 1,472.00 |
Median (bp) * | 1,531.00 | 1,474.00 |
75th percentile (bp) * | 1,542.00 | 1,486.50 |
Average (bp) * | 1,528.01 | 1,496.32 |
Mode (bp) * | 1,537.00 | 1,472.00 |
Maximum (bp) | 2,438.00 | 3,604.00 |
Standard deviation (bp) * | 25.63 | 137.70 |
* Metrics marked with this sign were calculated with preliminary normalization, i.e. median within-species gene length was used for the summary.
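The normalization described in this footnote can be sketched as follows: gene lengths are first collapsed to one median value per species, and the summary statistics are then computed over those per-species medians. The species names and lengths below are hypothetical toy data, and `normalized_lengths` is an illustrative helper, not the code used to build RiboGrove.

```python
from collections import defaultdict
from statistics import mean, median


def normalized_lengths(records):
    """Collapse (species, gene_length) pairs to one median length per species."""
    by_species = defaultdict(list)
    for species, length in records:
        by_species[species].append(length)
    return {species: median(lengths) for species, lengths in by_species.items()}


# Hypothetical toy data: (species name, gene length in bp)
records = [
    ("Species A", 1530), ("Species A", 1532), ("Species A", 1540),
    ("Species B", 1500), ("Species B", 1510),
    ("Species C", 1472),
]

per_species = normalized_lengths(records)
values = list(per_species.values())

print(per_species)      # {'Species A': 1532, 'Species B': 1505.0, 'Species C': 1472}
print(median(values))   # 1505.0
print(mean(values))     # 1503.0
```

With this approach, a species with many sequenced genomes contributes a single value to the summary. A species with an even number of genes can also yield a half-integer median, which is one way fractional values can arise in such a summary.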
Copy number * | Bacteria: number of species | Bacteria: per cent of species (%) | Archaea: number of species | Archaea: per cent of species (%) |
---|---|---|---|---|
1 | 949 | 12.30 | 205 | 53.81 |
2 | 1,431 | 18.54 | 106 | 27.82 |
3 | 1,164 | 15.08 | 56 | 14.70 |
4 | 1,003 | 13.00 | 9 | 2.36 |
5 | 657 | 8.51 | 5 | 1.31 |
6 | 825 | 10.69 | 0 | 0.00 |
7 | 650 | 8.42 | 0 | 0.00 |
8 | 389 | 5.04 | 0 | 0.00 |
9 | 202 | 2.62 | 0 | 0.00 |
10 | 172 | 2.23 | 0 | 0.00 |
11 | 92 | 1.19 | 0 | 0.00 |
12 | 67 | 0.87 | 0 | 0.00 |
13 | 34 | 0.44 | 0 | 0.00 |
14 | 49 | 0.63 | 0 | 0.00 |
15 | 12 | 0.16 | 0 | 0.00 |
16 | 5 | 0.06 | 0 | 0.00 |
17 | 5 | 0.06 | 0 | 0.00 |
18 | 5 | 0.06 | 0 | 0.00 |
20 | 4 | 0.05 | 0 | 0.00 |
27 | 1 | 0.01 | 0 | 0.00 |
37 | 1 | 0.01 | 0 | 0.00 |
* These are median within-species copy numbers.
Organism | Gene length (bp) | RiboGrove Sequence ID(s) | Assembly ID |
---|---|---|---|
Bacteria | |||
Thermus thermophilus strain AA2-2 | 2,438 | G_10898951:NZ_AP024929.1:249100-251537:minus | 10898951 |
Ca. Annandia pinicola strain Ad13-065 | 1,887 | G_11277031:NZ_CP045876.1:290071-291957:minus | 11277031 |
Nitrosophilus labii strain HRV44 | 1,806 | G_8028891:NZ_AP022826.1:1258017-1259822:minus G_8028891:NZ_AP022826.1:1532588-1534393:minus G_8028891:NZ_AP022826.1:1939914-1941719:minus | 8028891 |
Gelria sp. Kuro-4 | 1,788 | G_10731991:NZ_AP024619.1:2016182-2017969:minus | 10731991 |
Thermoanaerobacter pseudethanolicus strain ATCC 33223 | 1,781 | G_40148:NC_010321.1:2265744-2267524:minus | 40148 |
Thermoanaerobacter brockii strain Ako-1 | 1,781 | G_282748:NC_014964.1:2252888-2254668:minus | 282748 |
Campylobacter sputorum strain RM3237 | 1,744 | G_1153941:NZ_CP019682.1:607981-609724:plus G_1153941:NZ_CP019682.1:929565-931308:minus G_1153941:NZ_CP019682.1:1501945-1503688:minus | 1153941 |
Campylobacter sputorum strain LMG 7795 | 1,744 | G_4499991:NZ_CP043427.1:609141-610884:plus G_4499991:NZ_CP043427.1:930699-932442:minus G_4499991:NZ_CP043427.1:1503078-1504821:minus | 4499991 |
Campylobacter sputorum strain CCUG 20703 | 1,743 | G_1153911:NZ_CP019683.1:606847-608589:plus G_1153911:NZ_CP019683.1:935163-936905:minus G_1153911:NZ_CP019683.1:1558189-1559931:minus | 1153911 |
Campylobacter hyointestinalis strain CHY5 | 1,742 | G_7294871:NZ_CP053828.1:357136-358877:plus G_7294871:NZ_CP053828.1:1667816-1669557:minus | 7294871 |
Campylobacter sp. RM6137 | 1,742 | G_1101781:NZ_CP018789.1:273370-275111:plus G_1101781:NZ_CP018789.1:1545743-1547484:minus | 1101781 |
Campylobacter sputorum strain RM8705 | 1,742 | G_1153931:NZ_CP019685.1:577810-579551:plus G_1153931:NZ_CP019685.1:891862-893603:minus G_1153931:NZ_CP019685.1:1479764-1481505:minus | 1153931 |
Archaea | |||
Pyrobaculum ferrireducens strain 1860 | 3,604 | G_351728:NC_016645.1:127214-130817:plus | 351728 |
Pyrobaculum aerophilum strain IM2 | 2,213 | G_28808:NC_003364.1:1089640-1091852:plus | 28808 |
Pyrobaculum arsenaticum strain DSM 13514 | 2,212 | G_37488:NC_009376.1:623323-625534:minus | 37488 |
Aeropyrum pernix strain K1 | 2,202 | G_32288:NC_000854.2:1218712-1220913:minus | 32288 |
Pyrobaculum neutrophilum strain V24Sta | 2,197 | G_40848:NC_010525.1:690419-692615:plus | 40848 |
Ca. Mancarchaeum acidiphilum strain Mia14 | 2,008 | G_1145431:NZ_CP019964.1:751297-753304:minus | 1145431 |
Ca. Micrarchaeum sp. A_DKE | 2,003 | G_9220081:NZ_CP060530.1:203642-205644:minus | 9220081 |
Caldivirga maquilingensis strain IC-167 | 1,679 | G_39388:NC_009954.1:129150-130828:minus | 39388 |
Aeropyrum camini strain SY1 | 1,650 | G_127981:NC_022521.1:1165168-1166817:minus | 127981 |
Pyrolobus fumarii strain 1A | 1,576 | G_304318:NC_015931.1:84671-86246:minus | 304318 |
Organism | Gene length (bp) | RiboGrove Sequence ID(s) | Assembly ID |
---|---|---|---|
Bacteria | |||
Hirschia baltica strain ATCC 49814 | 1,448 | G_44428:NC_012982.1:2336679-2338126:minus | 44428 |
Sagittula sp. P11 | 1,449 | G_1460951:NZ_CP021913.1:2386837-2388285:plus G_1460951:NZ_CP021913.1:3597920-3599368:plus | 1460951 |
Hyphomonas sp. Mor2 | 1,451 | G_860061:NZ_CP017718.1:2304269-2305719:minus | 860061 |
Antarctobacter heliothermus strain SMS3 | 1,453 | G_1163161:NZ_CP022540.1:1369380-1370832:plus G_1163161:NZ_CP022540.1:2482480-2483932:plus | 1163161 |
Mameliella alba strain KU6B | 1,454 | G_6279751:NZ_AP022337.1:267139-268592:plus G_6279751:NZ_AP022337.1:1420942-1422395:plus G_6279751:NZ_AP022337.1:3191208-3192661:minus | 6279751 |
Hyphomonas neptunium strain ATCC 15444 | 1,455 | G_34128:NC_008358.1:2818466-2819920:minus | 34128 |
Hyphomonas sp. KY3 | 1,455 | G_9503471:NZ_CP022271.1:2407999-2409453:minus | 9503471 |
Pseudooceanicola algae strain Lw-13e | 1,458 | G_8694041:NZ_CP060436.1:2482207-2483664:minus | 8694041 |
Ruegeria sp. SCSIO 43209 | 1,458 | G_10854641:NZ_CP065359.1:3157837-3159294:minus | 10854641 |
Paracoccus contaminans strain LMG 29738T | 1,459 | G_1078381:NZ_CP020612.1:582021-583479:minus G_1078381:NZ_CP020612.1:1166317-1167775:minus | 1078381 |
Sulfitobacter mediterraneus strain SC1-11 | 1,459 | G_9217271:NZ_CP069004.1:3093411-3094869:plus | 9217271 |
Pelagovum pacificum strain SM1903 | 1,459 | G_8872011:NZ_CP065915.1:2729819-2731277:minus G_8872011:NZ_CP065915.1:3593071-3594529:minus | 8872011 |
Sulfitobacter pontiacus strain W028 | 1,459 | G_13748391:NZ_CP081118.1:282264-284:minus | 13748391 |
Sulfitobacter sp. B30-2 | 1,459 | G_8738751:NZ_CP065429.1:477373-478831:plus | 8738751 |
Archaea | |||
Ignicoccus hospitalis strain KIN4/I | 1,439 | G_39048:NC_009776.1:728362-729800:plus | 39048 |
Methanocaldococcus sp. SG7 | 1,457 | G_10131521:NZ_LR792632.1:542755-544211:plus | 10131521 |
Halorubrum sp. BOL3-1 | 1,463 | G_2220501:NZ_CP034692.1:397753-399215:minus | 2220501 |
Natronomonas halophila strain C90 | 1,466 | G_7330651:NZ_CP058334.1:1530622-1532087:minus | 7330651 |
Ca. Methanomethylophilus alvus strain MGYG-HGUT-02456 | 1,466 | G_4352521:NZ_LR699000.1:283607-285072:plus | 4352521 |
Natronomonas sp. ZY43 | 1,466 | G_13300761:NZ_CP101154.1:18680-20145:plus | 13300761 |
Methanospirillum hungatei strain JF-1 | 1,466 | G_34548:NC_007796.1:39814-41279:plus G_34548:NC_007796.1:1301079-1302544:minus G_34548:NC_007796.1:3501525-3502990:minus G_34548:NC_007796.1:3507609-3509074:minus | 34548 |
Natronomonas gomsonensis strain KCTC 4088 | 1,466 | G_13300951:NZ_CP101323.1:2500564-2502029:plus | 13300951 |
Ca. Methanomethylophilus alvus strain Mx-05 | 1,466 | G_2068141:NZ_CP017686.1:283608-285073:plus | 2068141 |
Methanospirillum hungatei strain GP1 | 1,466 | G_10519241:NZ_CP077107.1:4649-6114:plus G_10519241:NZ_CP077107.1:1359562-1361027:minus G_10519241:NZ_CP077107.1:1365502-1366967:minus G_10519241:NZ_CP077107.1:1986020-1987485:minus | 10519241 |
Methanospirillum sp. J.3.6.1-F.2.7.3 | 1,466 | G_10123301:NZ_CP075546.1:133354-134819:plus G_10123301:NZ_CP075546.1:825954-827419:plus G_10123301:NZ_CP075546.1:872641-874106:plus G_10123301:NZ_CP075546.1:1727419-1728884:plus | 10123301 |
Ca. Methanomethylophilus alvus strain Mx1201 | 1,466 | G_599268:NC_020913.1:283607-285072:plus | 599268 |
Salinirubellus salinus strain ZS-35-S2 | 1,466 | G_13813051:NZ_CP104003.1:3070232-3071697:plus | 13813051 |
Organism | Copy number | Assembly ID | |
---|---|---|---|
Bacteria | |||
Tumebacillus avium strain AR23208 | 37 | 1115491 | |
Tumebacillus algifaecis strain THMBR28 | 27 | 1166771 | |
Peribacillus asahii strain KF4 | 21 | 13022701 | |
Priestia megaterium strain S2 | 21 | 6720751 | |
Neobacillus drentensis strain JC05 | 20 | 11802511 | |
Moritella sp. 28 | 20 | 9972251 | |
Moritella sp. 5 | 20 | 9972261 | |
Moritella sp. 36 | 20 | 9972241 | |
Photobacterium damselae strain AS-15-3942-7 | 19 | 11907491 | |
Metabacillus litoralis strain Bac94 | 19 | 2023811 | |
Archaea | |||
Methanococcoides orientis strain LMO-1 | 5 | 11622961 | |
Natrinema sp. SYSU A 869 | 5 | 10842511 | |
Natronorubrum bangense strain JCM 10635 | 5 | 2580821 | |
Natronorubrum aibiense strain 7-3 | 5 | 5073821 | |
Methanoplanus endosymbiosus strain DSM 3599 | 5 | 13492921 | |
Methanosphaera stadtmanae strain DSM 3091 | 4 | 33648 | |
Natronococcus occultus strain SP4 | 4 | 521038 | |
Methanospirillum sp. J.3.6.1-F.2.7.3 | 4 | 10123301 | |
Methanospirillum hungatei strain GP1 | 4 | 10519241 | |
Halosiccatus urmianus strain IBRC-M: 10911 | 4 | 11057071 | |
Halomicrobium salinisoli strain LT50 | 4 | 11151361 | |
Haloterrigena salifodinae strain BOL5-1 | 4 | 9298621 | |
Halomicrobium salinisoli strain TH30 | 4 | 11151391 | |
Methanococcus vannielii strain SB | 4 | 38268 | |
Methanospirillum hungatei strain JF-1 | 4 | 34548 | |
Haloarcula sinaiiensis strain ATCC 33800 | 4 | 9962651 | |
Methanosphaera stadtmanae strain MGYG-HGUT-02164 | 4 | 4349641 | |
Organism | Sum of entropy * (bits) | Mean entropy * (bits) | Number of variable positions | Gene copy number | Assembly ID |
---|---|---|---|---|---|
Bacteria | |||||
Synechococcus sp. NB0720_010 | 243.35 | 0.16 | 265 | 3 | 12576831 |
Xanthomonas oryzae strain YNCX | 227.74 | 0.15 | 248 | 3 | 13407211 |
Sporomusa termitida strain DSM 4440 | 226.25 | 0.13 | 247 | 12 | 4155511 |
Campylobacter hyointestinalis strain CHY5 | 217.64 | 0.12 | 237 | 3 | 7294871 |
Campylobacter sp. RM6137 | 211.21 | 0.12 | 230 | 3 | 1101781 |
Acetivibrio thermocellus strain M3 | 211.00 | 0.14 | 211 | 2 | 13802461 |
Sinorhizobium meliloti strain AK76 | 184.58 | 0.12 | 201 | 3 | 9010851 |
Cylindrospermopsis raciborskii strain KLL07 | 168.97 | 0.11 | 184 | 3 | 11851031 |
Klebsiella pneumoniae strain GZ-1 | 167.21 | 0.10 | 216 | 5 | 8227731 |
Olleya sp. Bg11-27 | 145.25 | 0.10 | 156 | 3 | 1469691 |
Archaea | |||||
Halomicrobium sp. ZPS1 ** | 137.00 | 0.09 | 137 | 2 | 4982121 |
Halosiccatus urmianus strain IBRC-M: 10911 | 131.55 | 0.09 | 146 | 4 | 11057071 |
Halapricum desulfuricans strain HSR12-2 | 128.00 | 0.09 | 128 | 2 | 9390741 |
Halomicrobium salinisoli strain TH30 | 127.74 | 0.09 | 145 | 4 | 11151391 |
Halapricum desulfuricans strain HSR-Bgl | 127.00 | 0.09 | 127 | 2 | 9390521 |
Halomicrobium mukohataei strain JP60 | 125.81 | 0.09 | 137 | 3 | 2582391 |
Halomicrobium salinisoli strain LT50 | 123.31 | 0.08 | 140 | 4 | 11151361 |
Halapricum desulfuricans strain HSR-Est | 111.00 | 0.08 | 111 | 2 | 9390681 |
Halapricum desulfuricans strain HSR12-1 | 109.00 | 0.07 | 109 | 2 | 9390731 |
Halorussus sp. XZYJT49 | 105.10 | 0.07 | 113 | 3 | 12653301 |
* Entropy is Shannon entropy calculated for each column of the multiple sequence alignment (MSA) of all full-length 16S rRNA genes of a genome. Entropy is then summed up (column “Sum of entropy”) and averaged (column “Mean entropy”).
** Halomicrobium sp. ZPS1 is a remarkable case. This genome harbours two 16S rRNA genes, so each mismatching alignment column contributes exactly 1 bit, and the sum of entropy equals the number of mismatching nucleotides between the two gene sequences. Accordingly, the per cent of identity between these two gene sequences is 90.70%! This is notable because the usual (albeit arbitrary) genus demarcation threshold of per cent of identity is 95%.
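The entropy metrics described in these footnotes can be illustrated with a short Python sketch. It assumes the gene sequences are already aligned (equal length) and treats every character, including a gap, as an ordinary symbol. Gap handling is an assumption made for this sketch, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def alignment_entropy(aligned_seqs):
    """Return (sum of entropy, mean entropy, number of variable positions)."""
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    entropies = [column_entropy(column) for column in zip(*aligned_seqs)]
    variable = sum(1 for h in entropies if h > 0)
    return sum(entropies), sum(entropies) / len(entropies), variable


# Toy alignment of two gene copies that differ at two positions.
total, average, variable = alignment_entropy(["ACGTACGT", "ACGAACGA"])
print(total, average, variable)  # 2.0 0.25 2
```

With only two sequences, a mismatching column holds two symbols at frequency 0.5 each, which gives exactly 1 bit. The sum of entropy therefore equals the number of mismatches, as in the two-gene genomes listed in the table.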
Phylum | Number of genomes | Full gene: 27F–1492R (%) | V1–V2: 27F–338R (%) | V1–V3: 27F–534R (%) | V3–V4: 341F–785R (%) | V3–V5: 341F–944R (%) | V4–V5: 515F–944R (%) | V4–V6: 515F–1100R (%) | V5–V6: 784F–1100R (%) | V5–V7: 784F–1193R (%) | V6–V7: 939F–1193R (%) | V6–V8: 939F–1378R (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Proteobacteria | 17,538 | 99.74 | 99.53 | 99.73 | 99.95 | 82.30 | 82.34 | 90.47 | 90.16 | 93.59 | 92.59 | 96.94 |
Firmicutes | 6,721 | 99.97 | 99.85 | 99.96 | 99.96 | 95.73 | 95.63 | 99.51 | 97.87 | 97.17 | 98.54 | 99.27 |
Actinobacteria | 2,840 | 99.79 | 98.80 | 99.58 | 94.05 | 63.35 | 63.13 | 96.13 | 99.61 | 99.72 | 99.75 | 97.01 |
Bacteroidota | 1,161 | 95.09 | 94.57 | 95.00 | 99.91 | 61.15 | 60.72 | 38.67 | 38.93 | 94.83 | 92.42 | 94.49 |
Tenericutes | 468 | 97.22 | 94.44 | 73.29 | 98.29 | 90.38 | 90.60 | 73.50 | 41.88 | 42.95 | 77.99 | 0.43 |
Spirochaetes | 261 | 65.13 | 65.13 | 65.13 | 93.87 | 100.00 | 100.00 | 100.00 | 72.03 | 72.03 | 88.12 | 50.19 |
Cyanobacteria | 213 | 100.00 | 100.00 | 100.00 | 100.00 | 5.16 | 5.16 | 100.00 | 0.94 | 0.94 | 100.00 | 99.53 |
Chlamydiae | 186 | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 93.55 |
Verrucomicrobia | 113 | 99.12 | 0.00 | 99.12 | 100.00 | 8.85 | 8.85 | 100.00 | 0.88 | 0.88 | 100.00 | 100.00 |
Fusobacteria | 80 | 100.00 | 96.25 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 98.75 | 98.75 | 100.00 | 0.00 |
Deinococcus-Thermus | 74 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 50.00 | 100.00 |
Planctomycetota | 57 | 100.00 | 19.30 | 100.00 | 100.00 | 64.91 | 64.91 | 0.00 | 0.00 | 0.00 | 3.51 | 0.00 |
Thermotogae | 42 | 100.00 | 97.62 | 100.00 | 100.00 | 9.52 | 9.52 | 100.00 | 0.00 | 0.00 | 59.52 | 97.62 |
Chloroflexi | 41 | 100.00 | 90.24 | 100.00 | 39.02 | 0.00 | 0.00 | 87.80 | 4.88 | 4.88 | 92.68 | 26.83 |
Acidobacteria | 31 | 96.77 | 96.77 | 96.77 | 100.00 | 100.00 | 100.00 | 100.00 | 61.29 | 45.16 | 83.87 | 100.00 |
Aquificae | 14 | 100.00 | 21.43 | 100.00 | 100.00 | 21.43 | 21.43 | 100.00 | 0.00 | 0.00 | 7.14 | 21.43 |
Chlorobi | 14 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 100.00 | 92.86 | 85.71 | 7.14 |
Nitrospirae | 10 | 100.00 | 100.00 | 100.00 | 100.00 | 60.00 | 60.00 | 100.00 | 100.00 | 60.00 | 60.00 | 100.00 |
Thermodesulfobacteria | 7 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 |
Deferribacteres | 6 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Synergistetes | 6 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 |
Ca. Saccharibacteria | 6 | 100.00 | 100.00 | 100.00 | 100.00 | 16.67 | 16.67 | 16.67 | 0.00 | 0.00 | 100.00 | 100.00 |
Elusimicrobia | 4 | 100.00 | 50.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 75.00 | 75.00 | 100.00 | 100.00 |
Gemmatimonadetes | 4 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Kiritimatiellaeota | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 |
Ignavibacteriae | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Fibrobacteres | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Dictyoglomi | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 |
Chrysiogenetes | 2 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Balneolaeota | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Ca. Bipolaricaulota | 1 | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Caldiserica | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 |
Coprothermobacterota | 1 | 0.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 |
Atribacterota | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 |
Armatimonadetes | 1 | 100.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Calditrichaeota | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Ca. Omnitrophica | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Ca. Cloacimonetes | 1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Ca. Absconditabacteria | 1 | 100.00 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 |
* Coverage of a primer pair is the per cent of genomes having at least one 16S rRNA gene which can be amplified by PCR using this primer pair. For details, see our paper about RiboGrove.
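The coverage definition above amounts to a simple ratio, sketched here in Python. How a gene is judged amplifiable by a primer pair is described in the RiboGrove paper and is not reproduced here, so the sketch takes that decision as given input. The function name and the toy genome IDs are hypothetical.

```python
def primer_pair_coverage(amplifiable_by_genome):
    """Per cent of genomes having at least one amplifiable 16S rRNA gene.

    `amplifiable_by_genome` maps a genome ID to a list of booleans,
    one per 16S rRNA gene of that genome (True = the gene can be amplified).
    """
    if not amplifiable_by_genome:
        raise ValueError("at least one genome is required")
    covered = sum(1 for flags in amplifiable_by_genome.values() if any(flags))
    return 100.0 * covered / len(amplifiable_by_genome)


# Hypothetical toy input: four genomes of one phylum.
toy = {
    "genome_1": [True, True, True],
    "genome_2": [False, True],      # one amplifiable gene is enough
    "genome_3": [False, False],
    "genome_4": [True],
}
print(primer_pair_coverage(toy))  # 75.0
```

Because a genome counts as covered as soon as one of its gene copies can be amplified, coverage is computed per genome, not per gene sequence.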
Primer name | Sequence | Reference |
---|---|---|
27F | AGAGTTTGATYMTGGCTCAG | Frank et al., 2008 |
338R | GCTGCCTCCCGTAGGAGT | Suzuki et al., 1996 |
341F * | CCTACGGGNGGCWGCAG | Klindworth et al., 2013 |
515F | GTGCCAGCMGCCGCGGTAA | Turner et al., 1999 |
534R | ATTACCGCGGCTGCTGG | Walker et al., 2015 |
784F | AGGATTAGATACCCTGGTA | Andersson et al., 2008 |
785R * | GACTACHVGGGTATCTAATCC | Klindworth et al., 2013 |
939F | GAATTGACGGGGGCCCGCACAAG | Lebuhn et al., 2014 |
944R | GAATTAAACCACATGCTC | Fuks et al., 2018 |
1100R | AGGGTTGCGCTCGTTG | Turner et al., 1999 |
1193R | ACGTCATCCCCACCTTCC | Bodenhausen et al., 2013 |
1378R | CGGTGTGTACAAGGCCCGGGAACG | Lebuhn et al., 2014 |
1492R | TACCTTGTTACGACTT | Frank et al., 2008 |
* Primers 341F and 785R are used in the protocol for library preparation for sequencing of the V3–V4 region of 16S rRNA genes on Illumina MiSeq.
RiboGrove is a very minimalistic database: it comprises a collection of plain fasta files with metadata. Thus, extended search instruments are not available for it. We acknowledge this limitation and provide a list of suggestions below to help you explore and select RiboGrove data.
RiboGrove fasta data has the following format of header:
>G_324861:NZ_CP009686.1:8908-10459:plus ;d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__cereus; category:1
Major blocks of a header are separated by spaces. A header consists of three such blocks: the sequence ID (SeqID), the taxonomy string and the genome category.
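A minimal Python sketch of splitting such a header is given here. It assumes, judging from the example header above, that the first block is a sequence ID of the form `G_<Assembly ID>:<Accession.Version>:<start>-<end>:<strand>`, the second block is a semicolon-delimited taxonomy string, and the third block is the genome category. The names `RiboGroveHeader` and `parse_header` are illustrative and not part of RiboGrove.

```python
from typing import Dict, NamedTuple


class RiboGroveHeader(NamedTuple):
    seq_id: str
    assembly_id: str
    accession: str
    start: int
    end: int
    strand: str
    taxonomy: Dict[str, str]
    category: int


def parse_header(header: str) -> RiboGroveHeader:
    """Split one RiboGrove fasta header into its parts."""
    text = header.strip()
    if text.startswith(">"):
        text = text[1:]
    # The first block ends at the first space and the last block starts at the
    # last space, so taxon names containing spaces stay inside the middle block.
    seq_id, rest = text.split(" ", 1)
    taxonomy_str, category_str = rest.rsplit(" ", 1)

    assembly, accession, coords, strand = seq_id.split(":")
    start, end = coords.split("-")

    taxonomy = {}
    for item in taxonomy_str.strip().strip(";").split(";"):
        rank, name = item.split("__", 1)   # e.g. "p__Firmicutes" -> ("p", "Firmicutes")
        taxonomy[rank] = name

    return RiboGroveHeader(
        seq_id=seq_id,
        assembly_id=assembly[2:] if assembly.startswith("G_") else assembly,
        accession=accession,
        start=int(start),
        end=int(end),
        strand=strand,
        taxonomy=taxonomy,
        category=int(category_str.split(":", 1)[1]),
    )


example = (">G_324861:NZ_CP009686.1:8908-10459:plus "
           ";d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
           "f__Bacillaceae;g__Bacillus;s__cereus; category:1")
parsed = parse_header(example)
print(parsed.assembly_id, parsed.accession, parsed.taxonomy["p"], parsed.category)
print(parsed.end - parsed.start + 1)  # gene length implied by the coordinates: 1552
```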
You can select specific sequences from fasta files using the Seqkit program (GitHub repo, documentation). It is free, cross-platform, multifunctional and fast, and it can process both gzipped and uncompressed fasta files. The seqkit grep and seqkit seq subcommands are useful for sequence selection.
Given the downloaded fasta file ribogrove_9.215_sequences.fasta.gz, consider the following examples of sequence selection using seqkit grep:
Example 1. Select a single sequence by SeqID.
seqkit grep -p "G_324861:NZ_CP009686.1:8908-10459:plus" ribogrove_9.215_sequences.fasta.gz
The -p option sets the pattern to search for in fasta headers (by default, only sequence IDs are searched).
Example 2. Select all gene sequences of a single RefSeq genomic sequence by accession number NZ_CP009686.1.
seqkit grep -nrp ":NZ_CP009686.1:" ribogrove_9.215_sequences.fasta.gz
Here, two more options are required: -n and -r. The former tells the program to match whole headers instead of IDs only. The latter tells the program to treat the pattern as a regular expression, so partial matches are kept: if the pattern matches a substring of a header, the sequence will be printed to output.
To ensure search specificity, surround the Accession.Version with colons (:).
Example 3. Select all gene sequences of a single genome (Assembly ID 10577151).
seqkit grep -nrp "G_10577151:" ribogrove_9.215_sequences.fasta.gz
To ensure search specificity, Assembly ID should be preceded by prefix G_ and followed by a colon (:).
Example 4. Select all actinobacterial sequences.
seqkit grep -nrp ";p__Actinobacteria;" ribogrove_9.215_sequences.fasta.gz
To ensure search specificity, surround the taxonomy name with semicolons (;).
Example 5. Select all sequences originating from category 1 genomes.
seqkit grep -nrp "category:1" ribogrove_9.215_sequences.fasta.gz
Example 6. Select all sequences except for those belonging to Firmicutes.
seqkit grep -nvrp ";p__Firmicutes;" ribogrove_9.215_sequences.fasta.gz
Note the -v option within the option sequence -nvrp. This option inverts the match, i.e. the output will comprise sequences whose headers do not contain the substring “;p__Firmicutes;”.
You can use the seqkit seq program to select sequences by length.
Example 1. Select all sequences longer than 1600 bp.
seqkit seq -m 1601 ribogrove_9.215_sequences.fasta.gz
The -m option sets the minimum length of a sequence to be printed to output.
Example 2. Select all sequences shorter than 1500 bp.
seqkit seq -M 1499 ribogrove_9.215_sequences.fasta.gz
The -M option sets the maximum length of a sequence to be printed to output.
Example 3. Select all sequences having length in range [1500, 1600] bp.
seqkit seq -m 1500 -M 1600 ribogrove_9.215_sequences.fasta.gz
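If Seqkit is not at hand, the same length filter can be approximated with the Python standard library. This is a minimal sketch, not a replacement for Seqkit; the function names `read_fasta` and `select_by_length` are illustrative.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzipped fasta file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_length(records, min_len=None, max_len=None):
    """Keep records whose sequence length lies in [min_len, max_len] (inclusive)."""
    for header, seq in records:
        if min_len is not None and len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        yield header, seq
```

For example, `select_by_length(read_fasta("ribogrove_9.215_sequences.fasta.gz"), 1500, 1600)` mirrors the `-m 1500 -M 1600` command above, with both bounds inclusive.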
It is sometimes useful to retrieve only header information from a fasta file. You can use the seqkit seq program for it.
Example 1. Select all headers.
seqkit seq -n ribogrove_9.215_sequences.fasta.gz
The -n option tells the program to output only headers.
Example 2. Select all SeqIDs (header parts before the first space).
seqkit seq -ni ribogrove_9.215_sequences.fasta.gz
The -i option tells the program to output only sequence IDs.
Example 3. Select all “Accession.Version”s.
seqkit seq -ni ribogrove_9.215_sequences.fasta.gz | cut -f2 -d':' | sort | uniq
This works only if you have the cut, sort and uniq utilities installed (Linux and Mac OS systems should have them built in).
Example 4. Select all Assembly IDs.
seqkit seq -ni ribogrove_9.215_sequences.fasta.gz | cut -f1 -d':' | sed 's/G_//' | sort | uniq
This works only if you have the cut, sed, sort and uniq utilities installed (Linux and Mac OS systems should have them built in).
Example 5. Select all phylum names.
seqkit seq -n ribogrove_9.215_sequences.fasta.gz | grep -Eo ';p__[^;]+' | sed -E 's/;|p__//g' | sort | uniq
This works only if you have the grep, sed, sort and uniq utilities installed (Linux and Mac OS systems should have them built in).
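On systems without these utilities, the same header post-processing can be sketched in Python. The functions below take header lines such as those printed by `seqkit seq -n`. Their names are illustrative, and the header layout is assumed to match the example header shown earlier.

```python
import re

PHYLUM_PATTERN = re.compile(r";p__([^;]+)")


def unique_phyla(headers):
    """Return the sorted set of phylum names found in the given headers."""
    found = set()
    for header in headers:
        match = PHYLUM_PATTERN.search(header)
        if match:
            found.add(match.group(1))
    return sorted(found)


def unique_assembly_ids(headers):
    """Return the sorted set of Assembly IDs (SeqID part before the first colon)."""
    found = set()
    for header in headers:
        first = header.lstrip(">").split(":", 1)[0]
        found.add(first[2:] if first.startswith("G_") else first)
    return sorted(found)


headers = [
    "G_324861:NZ_CP009686.1:8908-10459:plus ;d__Bacteria;p__Firmicutes;c__Bacilli;"
    "o__Bacillales;f__Bacillaceae;g__Bacillus;s__cereus; category:1",
]
print(unique_phyla(headers))          # ['Firmicutes']
print(unique_assembly_ids(headers))   # ['324861']
```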
RiboGrove, 2023-05-19