RiboGrove release 7.213
The release is based on RefSeq release 213.
The metadata consists of the following files:
The fasta files are compressed with gzip, and the metadata file is a zip archive. To decompress them, Linux and Mac OS users can use the gzip and unzip programs, which are usually built in. Windows users can use 7-Zip, a free and open-source (de)compression program.
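For scripted workflows, Python's standard library can read the gzipped fasta files directly, without decompressing them to disk first, and can also extract the zip archive. The sketch below is only an illustration: the function names `parse_fasta`, `read_fasta` and `extract_zip` are hypothetical helpers, not part of any RiboGrove tooling.

```python
import gzip
import zipfile
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the leading '>' is removed from headers."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)  # sequence lines may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Parse a fasta file, decompressing it on the fly if the name ends with .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fasta(handle)


def extract_zip(path: str, dest_dir: str) -> None:
    """Extract every member of a zip archive (such as the metadata file) into dest_dir."""
    with zipfile.ZipFile(path) as archive:
        archive.extractall(dest_dir)
```

For example, `sum(1 for _ in read_fasta("ribogrove_7.213_sequences.fasta.gz"))` would count the sequences in the file without writing an uncompressed copy.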
You can find all releases in the RiboGrove release archive.
Metric | Bacteria | Archaea | Total |
---|---|---|---|
Number of gene sequences | 146,527 | 759 | 147,286 |
Number of unique gene sequences | 40,497 | 553 | 41,050 |
Number of species | 7,242 | 349 | 7,591 |
Number of genomes | 27,737 | 442 | 28,179 |
Number of genomes of category 1 | 18,357 | 144 | 18,501 |
Number of genomes of category 2 | 9,250 | 298 | 9,548 |
Number of genomes of category 3 | 130 | 0 | 130 |
Statistic | Bacteria | Archaea |
---|---|---|
Minimum (bp) | 1,448.00 | 1,439.00 |
25th percentile (bp) * | 1,517.50 | 1,472.00 |
Median (bp) * | 1,532.00 | 1,474.00 |
75th percentile (bp) * | 1,543.00 | 1,488.00 |
Average (bp) * | 1,528.35 | 1,498.54 |
Mode (bp) * | 1,537.00 | 1,472.00 |
Maximum (bp) | 2,438.00 | 3,604.00 |
Standard deviation (bp) * | 25.71 | 143.68 |
* Metrics marked with this sign were calculated after per-species normalization: the median gene length within each species was computed first, and these per-species medians were then summarized.
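The normalization can be illustrated with a small sketch. Its assumptions are not confirmed by this page: each input record is a (species, gene length) pair for one gene copy, the within-species median is taken over all gene copies of all genomes of that species, and the minimum and maximum (the unstarred rows) are taken over all genes without normalization. The name `length_summary` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def length_summary(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Summarize gene lengths; the 'starred' metrics use per-species medians (assumed scheme)."""
    by_species = defaultdict(list)
    for species, length in records:
        by_species[species].append(length)

    all_lengths = [n for lengths in by_species.values() for n in lengths]
    # Preliminary normalization: collapse each species to its median gene length.
    per_species = [median(lengths) for lengths in by_species.values()]

    return {
        "min": min(all_lengths),        # not normalized
        "max": max(all_lengths),        # not normalized
        "median": median(per_species),  # normalized (*)
        "mean": mean(per_species),      # normalized (*)
    }
```

Under this scheme, a species represented by hundreds of genomes contributes a single value to the starred metrics, exactly like a species with one genome.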
Copy number * | Number of species | Per cent of species (%) |
---|---|---|
1 | 1,059 | 13.95 |
2 | 1,462 | 19.26 |
3 | 1,161 | 15.29 |
4 | 961 | 12.66 |
5 | 610 | 8.04 |
6 | 756 | 9.96 |
7 | 604 | 7.96 |
8 | 373 | 4.91 |
9 | 188 | 2.48 |
10 | 164 | 2.16 |
11 | 85 | 1.12 |
12 | 62 | 0.82 |
13 | 32 | 0.42 |
14 | 45 | 0.59 |
15 | 10 | 0.13 |
16 | 5 | 0.07 |
17 | 4 | 0.05 |
18 | 4 | 0.05 |
20 | 4 | 0.05 |
27 | 1 | 0.01 |
37 | 1 | 0.01 |
* These are median within-species copy numbers.
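The copy-number distribution can be sketched in the same way: take the 16S rRNA gene copy number of every genome, collapse each species to its median copy number, then count species per value. The input format and the name `copy_number_distribution` are assumptions for illustration; how RiboGrove treats non-integer medians (possible for species with an even number of genomes) is not stated here, so the sketch simply keeps them.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def copy_number_distribution(
    genomes: Iterable[Tuple[str, int]],
) -> Dict[float, Tuple[int, float]]:
    """genomes: (species, copy number) pairs, one per genome (assumed input format).

    Returns {median copy number: (number of species, per cent of species)}.
    """
    by_species = defaultdict(list)
    for species, copies in genomes:
        by_species[species].append(copies)

    # One median copy number per species, then a tally over species.
    counts = Counter(median(copies) for copies in by_species.values())
    total = sum(counts.values())
    return {
        cn: (n, round(100 * n / total, 2))
        for cn, n in sorted(counts.items())
    }
```

The percentages are plain proportions of the species total; for the first table row, 100 × 1,059 / 7,591 ≈ 13.95%.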
Organism | Gene length (bp) | RiboGrove Sequence ID(s) | Assembly ID |
---|---|---|---|
Bacteria | |||
Thermus thermophilus strain AA2-2 | 2,438 | NZ_AP024929.1:249100-251537_minus | 10898951 |
Ca. Annandia pinicola strain Ad13-065 | 1,887 | NZ_CP045876.1:290071-291957_minus | 11277031 |
Nitrosophilus labii strain HRV44 | 1,806 | NZ_AP022826.1:1258017-1259822_minus NZ_AP022826.1:1532588-1534393_minus NZ_AP022826.1:1939914-1941719_minus | 8028891 |
Gelria sp. Kuro-4 | 1,788 | NZ_AP024619.1:2016182-2017969_minus | 10731991 |
Thermoanaerobacter pseudethanolicus strain ATCC 33223 | 1,781 | NC_010321.1:2265744-2267524_minus | 40148 |
Thermoanaerobacter brockii strain Ako-1 | 1,781 | NC_014964.1:2252888-2254668_minus | 282748 |
Campylobacter sputorum strain RM3237 | 1,744 | NZ_CP019682.1:607981-609724_plus NZ_CP019682.1:929565-931308_minus NZ_CP019682.1:1501945-1503688_minus | 1153941 |
Campylobacter sputorum strain LMG 7795 | 1,744 | NZ_CP043427.1:609141-610884_plus NZ_CP043427.1:930699-932442_minus NZ_CP043427.1:1503078-1504821_minus | 4499991 |
Campylobacter sputorum strain CCUG 20703 | 1,743 | NZ_CP019683.1:606847-608589_plus NZ_CP019683.1:935163-936905_minus NZ_CP019683.1:1558189-1559931_minus | 1153911 |
Campylobacter sp. RM6137 | 1,742 | NZ_CP018789.1:273370-275111_plus NZ_CP018789.1:1545743-1547484_minus | 1101781 |
Campylobacter hyointestinalis strain CHY5 | 1,742 | NZ_CP053828.1:357136-358877_plus NZ_CP053828.1:1667816-1669557_minus | 7294871 |
Campylobacter sputorum strain RM8705 | 1,742 | NZ_CP019685.1:577810-579551_plus NZ_CP019685.1:891862-893603_minus NZ_CP019685.1:1479764-1481505_minus | 1153931 |
Archaea | |||
Pyrobaculum ferrireducens strain 1860 | 3,604 | NC_016645.1:127214-130817_plus | 351728 |
Pyrobaculum aerophilum strain IM2 | 2,213 | NC_003364.1:1089640-1091852_plus | 28808 |
Pyrobaculum arsenaticum strain DSM 13514 | 2,212 | NC_009376.1:623323-625534_minus | 37488 |
Aeropyrum pernix strain K1 | 2,202 | NC_000854.2:1218712-1220913_minus | 32288 |
Pyrobaculum neutrophilum strain V24Sta | 2,197 | NC_010525.1:690419-692615_plus | 40848 |
Ca. Mancarchaeum acidiphilum strain Mia14 | 2,008 | NZ_CP019964.1:751297-753304_minus | 1145431 |
Ca. Micrarchaeum sp. A_DKE | 2,003 | NZ_CP060530.1:203642-205644_minus | 9220081 |
Caldivirga maquilingensis strain IC-167 | 1,679 | NC_009954.1:129150-130828_minus | 39388 |
Aeropyrum camini strain SY1 | 1,650 | NC_022521.1:1165168-1166817_minus | 127981 |
Pyrolobus fumarii strain 1A | 1,576 | NC_015931.1:84671-86246_minus | 304318 |
Organism | Gene length (bp) | RiboGrove Sequence ID(s) | Assembly ID |
---|---|---|---|
Bacteria | |||
Hirschia baltica strain ATCC 49814 | 1,448 | NC_012982.1:2336679-2338126_minus | 44428 |
Sagittula sp. P11 | 1,449 | NZ_CP021913.1:2386837-2388285_plus NZ_CP021913.1:3597920-3599368_plus | 1460951 |
Hyphomonas sp. Mor2 | 1,451 | NZ_CP017718.1:2304269-2305719_minus | 860061 |
Antarctobacter heliothermus strain SMS3 | 1,453 | NZ_CP022540.1:1369380-1370832_plus NZ_CP022540.1:2482480-2483932_plus | 1163161 |
Mameliella alba strain KU6B | 1,454 | NZ_AP022337.1:267139-268592_plus NZ_AP022337.1:1420942-1422395_plus NZ_AP022337.1:3191208-3192661_minus | 6279751 |
Hyphomonas sp. KY3 | 1,455 | NZ_CP022271.1:2407999-2409453_minus | 9503471 |
Hyphomonas neptunium strain ATCC 15444 | 1,455 | NC_008358.1:2818466-2819920_minus | 34128 |
Ruegeria sp. SCSIO 43209 | 1,458 | NZ_CP065359.1:3157837-3159294_minus | 10854641 |
Pseudooceanicola algae strain Lw-13e | 1,458 | NZ_CP060436.1:2482207-2483664_minus | 8694041 |
Paracoccus contaminans strain LMG 29738T | 1,459 | NZ_CP020612.1:582021-583479_minus NZ_CP020612.1:1166317-1167775_minus | 1078381 |
Sulfitobacter mediterraneus strain SC1-11 | 1,459 | NZ_CP069004.1:3093411-3094869_plus | 9217271 |
Sulfitobacter sp. B30-2 | 1,459 | NZ_CP065429.1:477373-478831_plus | 8738751 |
Pelagovum pacificum strain SM1903 | 1,459 | NZ_CP065915.1:2729819-2731277_minus NZ_CP065915.1:3593071-3594529_minus | 8872011 |
Archaea | |||
Ignicoccus hospitalis strain KIN4/I | 1,439 | NC_009776.1:728362-729800_plus | 39048 |
Methanocaldococcus sp. SG7 | 1,457 | NZ_LR792632.1:542755-544211_plus | 10131521 |
Halorubrum sp. BOL3-1 | 1,463 | NZ_CP034692.1:397753-399215_minus | 2220501 |
Ca. Methanomethylophilus alvus strain Mx-05 | 1,466 | NZ_CP017686.1:283608-285073_plus | 2068141 |
Natronomonas halophila strain C90 | 1,466 | NZ_CP058334.1:1530622-1532087_minus | 7330651 |
Methanospirillum sp. J.3.6.1-F.2.7.3 | 1,466 | NZ_CP075546.1:133354-134819_plus NZ_CP075546.1:825954-827419_plus NZ_CP075546.1:872641-874106_plus NZ_CP075546.1:1727419-1728884_plus | 10123301 |
Methanospirillum hungatei strain GP1 | 1,466 | NZ_CP077107.1:4649-6114_plus NZ_CP077107.1:1359562-1361027_minus NZ_CP077107.1:1365502-1366967_minus NZ_CP077107.1:1986020-1987485_minus | 10519241 |
Methanospirillum hungatei strain JF-1 | 1,466 | NC_007796.1:39814-41279_plus NC_007796.1:1301079-1302544_minus NC_007796.1:3501525-3502990_minus NC_007796.1:3507609-3509074_minus | 34548 |
Ca. Methanomethylophilus alvus strain Mx1201 | 1,466 | NC_020913.1:283607-285072_plus | 599268 |
Ca. Methanomethylophilus alvus strain MGYG-HGUT-02456 | 1,466 | NZ_LR699000.1:283607-285072_plus | 4352521 |
Organism | Copy number | Assembly ID |
---|---|---|
Bacteria | | |
Tumebacillus avium strain AR23208 | 37 | 1115491 |
Tumebacillus algifaecis strain THMBR28 | 27 | 1166771 |
Priestia megaterium strain S2 | 21 | 6720751 |
Peribacillus asahii strain KF4 | 21 | 13022701 |
Neobacillus drentensis strain JC05 | 20 | 11802511 |
Moritella sp. 5 | 20 | 9972261 |
Moritella sp. 28 | 20 | 9972251 |
Moritella sp. 36 | 20 | 9972241 |
Metabacillus litoralis strain Bac94 | 19 | 2023811 |
Photobacterium damselae strain AS-15-3942-7 | 19 | 11907491 |
Archaea | | |
Natronorubrum aibiense strain 7-3 | 5 | 5073821 |
Methanococcoides orentis strain LMO-1 | 5 | 11622961 |
Natrinema sp. SYSU A 869 | 5 | 10842511 |
Natronorubrum bangense strain JCM 10635 | 5 | 2580821 |
Halomicrobium salinisoli strain TH30 | 4 | 11151391 |
Methanosphaera stadtmanae strain DSM 3091 | 4 | 33648 |
Methanosphaera stadtmanae strain MGYG-HGUT-02164 | 4 | 4349641 |
Halomicrobium salinisoli strain LT50 | 4 | 11151361 |
Halosiccatus urmianus strain IBRC-M: 10911 | 4 | 11057071 |
Natronococcus occultus strain SP4 | 4 | 521038 |
Methanospirillum hungatei strain GP1 | 4 | 10519241 |
Methanococcus vannielii strain SB | 4 | 38268 |
Haloterrigena salifodinae strain BOL5-1 | 4 | 9298621 |
Haloarcula sinaiiensis strain ATCC 33800 | 4 | 9962651 |
Methanospirillum hungatei strain JF-1 | 4 | 34548 |
Methanospirillum sp. J.3.6.1-F.2.7.3 | 4 | 10123301 |
Organism | Sum of entropy * (bits) | Mean entropy * (bits) | Number of variable positions | Gene copy number | Assembly ID |
---|---|---|---|---|---|
Bacteria | |||||
Synechococcus sp. NB0720_10 | 243.35 | 0.16 | 265 | 3 | 12576831 |
Sporomusa termitida strain DSM 4440 | 226.25 | 0.13 | 247 | 12 | 4155511 |
Campylobacter hyointestinalis strain CHY5 | 217.64 | 0.12 | 237 | 3 | 7294871 |
Campylobacter sp. RM6137 | 211.21 | 0.12 | 230 | 3 | 1101781 |
Sinorhizobium meliloti strain AK76 | 184.58 | 0.12 | 201 | 3 | 9010851 |
Cylindrospermopsis raciborskii strain KLL07 | 168.97 | 0.11 | 184 | 3 | 11851031 |
Klebsiella pneumoniae strain GZ-1 | 167.21 | 0.10 | 216 | 5 | 8227731 |
Olleya sp. Bg11-27 | 145.25 | 0.10 | 156 | 3 | 1469691 |
Microbulbifer sp. YPW1 | 136.25 | 0.09 | 145 | 4 | 7292581 |
Selenomonas sp. 136 F0591 | 135.84 | 0.08 | 138 | 4 | 638441 |
Archaea | |||||
Halomicrobium sp. ZPS1 ** | 137.00 | 0.09 | 137 | 2 | 4982121 |
Halosiccatus urmianus strain IBRC-M: 10911 | 131.55 | 0.09 | 146 | 4 | 11057071 |
Halapricum desulfuricans strain HSR12-2 | 128.00 | 0.09 | 128 | 2 | 9390741 |
Halomicrobium salinisoli strain TH30 | 127.74 | 0.09 | 145 | 4 | 11151391 |
Halapricum desulfuricans strain HSR-Bgl | 127.00 | 0.09 | 127 | 2 | 9390521 |
Halomicrobium mukohataei strain JP60 | 125.81 | 0.09 | 137 | 3 | 2582391 |
Halomicrobium salinisoli strain LT50 | 123.31 | 0.08 | 140 | 4 | 11151361 |
Halapricum desulfuricans strain HSR-Est | 111.00 | 0.08 | 111 | 2 | 9390681 |
Halapricum desulfuricans strain HSR12-1 | 109.00 | 0.07 | 109 | 2 | 9390731 |
Halorussus sp. XZYJT49 | 105.10 | 0.07 | 113 | 3 | 12653301 |
* Entropy is Shannon entropy calculated for each column of the multiple sequence alignment (MSA) of all full-length 16S rRNA genes of a genome. Entropy is then summed up (column “Sum of entropy”) and averaged (column “Mean entropy”).
** Halomicrobium sp. ZPS1 is a remarkable case. Its genome harbours only two 16S rRNA genes, so every variable column of their alignment has an entropy of exactly 1 bit, and the sum of entropy equals the number of mismatching nucleotides between the two gene sequences. Accordingly, the per cent identity between these two sequences is only 90.70%. This is striking because the commonly used (albeit arbitrary) genus demarcation threshold is 95% identity.
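The entropy metrics and the two-gene special case can be reproduced with a short sketch. It assumes the multiple sequence alignment has already been computed (equal-length, gapped strings) and treats the gap character as an ordinary symbol; how RiboGrove handles gaps is not stated on this page. The function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import List, Sequence, Tuple


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy, in bits, of one alignment column (gaps counted as symbols)."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def msa_entropy(aligned: List[str]) -> Tuple[float, float, int]:
    """Return (sum of entropy, mean entropy, number of variable positions)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    columns = list(zip(*aligned))  # one tuple of characters per alignment column
    entropies = [column_entropy(col) for col in columns]
    variable = sum(1 for col in columns if len(set(col)) > 1)
    total = sum(entropies)
    return total, total / length, variable


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Per cent identity of two aligned sequences over all alignment columns (assumed denominator)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)
```

For two aligned sequences with 2 mismatches over 10 columns, `msa_entropy` returns a sum of 2.0 bits and `percent_identity` returns 80.0. Identity is computed here over all alignment columns; other denominators (for example, excluding gap columns) are also in use, and which one RiboGrove applies is not stated.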
RiboGrove is a minimalistic database: it is a collection of plain fasta files with metadata, so advanced search tools are not available for it. We acknowledge this limitation, and below we offer some suggestions that may help you explore and select RiboGrove data.
Headers in RiboGrove fasta files have the following format:
>NZ_CP079719.1:86193-87742_plus Bacillus_velezensis ;Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus; category:2
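Major blocks of a header are separated by spaces. A header consists of four such blocks: the sequence ID (SeqID, here NZ_CP079719.1:86193-87742_plus), the organism name, the taxonomic lineage and the genome category.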
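Because the blocks are space-separated, a header is easy to split programmatically. A minimal sketch, assuming (as in the example header) that the organism name uses underscores instead of spaces, that SeqIDs follow the accession:start-end_strand pattern, and that coordinates are 1-based and inclusive; the last assumption agrees with the gene lengths in the tables above (251,537 - 249,100 + 1 = 2,438). The names `Header` and `parse_header` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# SeqID pattern inferred from the example header: accession:start-end_strand
SEQID_RE = re.compile(
    r"^(?P<acc>[^:]+):(?P<start>\d+)-(?P<end>\d+)_(?P<strand>plus|minus)$"
)


@dataclass
class Header:
    seq_id: str
    accession: str
    start: int
    end: int
    strand: str
    organism: str
    lineage: List[str]
    category: int

    @property
    def length(self) -> int:
        # Assumption: coordinates are 1-based and inclusive.
        return self.end - self.start + 1


def parse_header(header: str) -> Header:
    """Split a RiboGrove fasta header into its four space-separated blocks."""
    blocks = header.lstrip(">").strip().split(" ")
    if len(blocks) < 4:
        raise ValueError(f"expected four blocks, got {len(blocks)}")
    seq_id, organism, category_block = blocks[0], blocks[1], blocks[-1]
    lineage_block = " ".join(blocks[2:-1])  # normally a single block

    match = SEQID_RE.match(seq_id)
    if match is None:
        raise ValueError(f"unexpected SeqID format: {seq_id}")

    return Header(
        seq_id=seq_id,
        accession=match["acc"],
        start=int(match["start"]),
        end=int(match["end"]),
        strand=match["strand"],
        organism=organism.replace("_", " "),
        lineage=[taxon for taxon in lineage_block.split(";") if taxon],
        category=int(category_block.split(":", 1)[1]),
    )
```

For the example header, `parse_header(...)` yields accession NZ_CP079719.1, a length of 1,550 bp, the lineage from Bacteria down to Bacillus, and category 2.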
You can select specific sequences from fasta files using SeqKit. It is free, cross-platform, multifunctional and fast, and it can process both gzipped and uncompressed fasta files. Its subcommands seqkit grep and seqkit seq are useful for sequence selection.
Given the downloaded fasta file ribogrove_7.213_sequences.fasta.gz, consider the following examples of sequence selection using seqkit grep:
Example 1. Select a single sequence by SeqID.
seqkit grep -p "NZ_CP079719.1:86193-87742_plus" ribogrove_7.213_sequences.fasta.gz
The -p option sets the pattern to search for. By default, seqkit grep compares it with sequence IDs only, i.e. the header part before the first space.
Example 2. Select all gene sequences of a single RefSeq genomic sequence by accession number.
seqkit grep -nrp "NZ_CP079719.1" ribogrove_7.213_sequences.fasta.gz
Here, two more options are required: -n and -r. The former tells the program to match against whole headers instead of sequence IDs only. The latter tells the program to treat the pattern as a regular expression, so partial matches are reported as well: if the pattern matches a substring of a header, the sequence is printed to output.
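The difference between exact ID matching and regular-expression matching can be illustrated with Python's `re` module, used here only as a stand-in for SeqKit's own matching (SeqKit is written in Go and uses Go regular expressions, which behave the same way on simple patterns like these). One consequence is worth knowing: in a regular expression the dot matches any character, so the pattern NZ_CP079719.1 would also match a hypothetical NZ_CP079719X1; escaping the dot (NZ_CP079719\.1) makes it literal. The helper names are hypothetical.

```python
import re


def id_matches(pattern: str, header: str) -> bool:
    """Sketch of default matching: the SeqID must equal the pattern exactly."""
    return header.split(" ", 1)[0] == pattern


def header_regex_matches(pattern: str, header: str) -> bool:
    """Sketch of -n -r matching: search the whole header for a regex match."""
    return re.search(pattern, header) is not None


header = ("NZ_CP079719.1:86193-87742_plus Bacillus_velezensis "
          ";Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus; category:2")
other = "NZ_CP079719X1:1-9_plus Hypothetical_organism"  # hypothetical header

print(id_matches("NZ_CP079719.1", header))                      # False
print(header_regex_matches("NZ_CP079719.1", header))            # True
print(header_regex_matches("NZ_CP079719.1", other))             # True: '.' matches 'X'
print(header_regex_matches(re.escape("NZ_CP079719.1"), other))  # False
```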
Example 3. Select all gene sequences of a single genome (Assembly ID 10577151), which has two replicons: NZ_CP079110.1 and NZ_CP079111.1.
seqkit grep -nr -p "NZ_CP079110.1" -p "NZ_CP079111.1" ribogrove_7.213_sequences.fasta.gz
Example 4. Select all actinobacterial sequences.
seqkit grep -nrp ";Actinobacteria;" ribogrove_7.213_sequences.fasta.gz
Surround the taxon name with semicolons (;) so that it is matched as a whole rank name rather than as part of a longer name.
Example 5. Select all sequences originating from category 1 genomes.
seqkit grep -nrp "category:1" ribogrove_7.213_sequences.fasta.gz
Example 6. Select all sequences except for those belonging to Firmicutes.
seqkit grep -nvrp ";Firmicutes;" ribogrove_7.213_sequences.fasta.gz
Note the -v option within the option group -nvrp. It inverts the match: without it, the command would output only sequences belonging to Firmicutes.
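For comparison, the same selections can be written as plain Python filters over (header, sequence) pairs, which may be handy inside a larger script. This is a sketch, not a SeqKit replacement: it uses simple substring tests rather than regular expressions, and the helper names `select`, `by_seq_id`, `header_contains` and `any_of` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, str]           # (header without '>', sequence)
Predicate = Callable[[str], bool]


def select(records: Iterable[Record], predicate: Predicate,
           invert: bool = False) -> Iterator[Record]:
    """Yield records whose header satisfies predicate; invert=True mimics -v."""
    for header, seq in records:
        if predicate(header) != invert:
            yield header, seq


def by_seq_id(seq_id: str) -> Predicate:
    """Exact SeqID match, similar to seqkit grep -p."""
    return lambda header: header.split(" ", 1)[0] == seq_id


def header_contains(text: str) -> Predicate:
    """Plain substring match anywhere in the header, similar to seqkit grep -nrp."""
    return lambda header: text in header


def any_of(*predicates: Predicate) -> Predicate:
    """Match if any predicate matches, like passing -p several times."""
    return lambda header: any(p(header) for p in predicates)
```

For example, `select(records, header_contains(";Firmicutes;"), invert=True)` mirrors Example 6, and `select(records, any_of(header_contains("NZ_CP079110.1"), header_contains("NZ_CP079111.1")))` mirrors Example 3.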
You can use the seqkit seq program to select sequences by length.
Example 1. Select all sequences longer than 1600 bp.
seqkit seq -m 1601 ribogrove_7.213_sequences.fasta.gz
The -m option sets the minimum length of a sequence to be printed to output.
Example 2. Select all sequences shorter than 1500 bp.
seqkit seq -M 1499 ribogrove_7.213_sequences.fasta.gz
The -M option sets the maximum length of a sequence to be printed to output.
Example 3. Select all sequences having length in range [1500, 1600] bp.
seqkit seq -m 1500 -M 1600 ribogrove_7.213_sequences.fasta.gz
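Both length bounds are inclusive: -m keeps sequences at least that long and -M keeps sequences at most that long, which is why Example 1 uses 1601 for "longer than 1600 bp". A minimal Python equivalent over (header, sequence) pairs, with the hypothetical name `select_by_length`:

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str]  # (header, sequence)


def select_by_length(records: Iterable[Record],
                     min_len: Optional[int] = None,
                     max_len: Optional[int] = None) -> Iterator[Record]:
    """Yield records with min_len <= len(sequence) <= max_len (inclusive, like -m/-M)."""
    for header, seq in records:
        n = len(seq)
        if min_len is not None and n < min_len:
            continue
        if max_len is not None and n > max_len:
            continue
        yield header, seq
```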
It is sometimes useful to retrieve only header information from a fasta file. You can use the seqkit seq program for it.
Example 1. Select all headers.
seqkit seq -n ribogrove_7.213_sequences.fasta.gz
The -n option tells the program to output only headers.
Example 2. Select all SeqIDs (header parts before the first space).
seqkit seq -ni ribogrove_7.213_sequences.fasta.gz
The -i option tells the program to output only sequence IDs.
Example 3. Select all accession numbers.
seqkit seq -ni ribogrove_7.213_sequences.fasta.gz | cut -f1 -d':' | sort | uniq
This works only if the cut, sort and uniq utilities are installed (Linux and Mac OS systems have them built in).
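Where cut, sort and uniq are unavailable (for example, on Windows without a Unix-like shell), the same list can be built in Python from the headers. A minimal sketch; `unique_accessions` is a hypothetical name, and Python's `sorted()` orders strings by code point, which may differ slightly from the locale-dependent order produced by sort.

```python
from typing import Iterable, List


def unique_accessions(headers: Iterable[str]) -> List[str]:
    """Return sorted unique accession numbers (the SeqID part before ':')."""
    accessions = set()
    for header in headers:
        seq_id = header.lstrip(">").split(" ", 1)[0]  # like seqkit seq -ni
        accessions.add(seq_id.split(":", 1)[0])       # like cut -f1 -d':'
    return sorted(accessions)                          # like sort | uniq
```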
RiboGrove, 2023-05-19