Main Database Files

File Description
KSGP_v1.0.fasta Version 1 of the KSGP database. Contains cleaned GTDB SSU sequences and eukaryote sequences from PR2 with their original annotations, combined with reannotated SSU rRNA sequences from two sources: Karst et al. and archaeal 16S sequences from SILVA. Please note that the taxonomic hierarchy used by PR2 is not compatible with that used by SILVA or NCBI.
KSGP_v1.0.tax LotuS2 tax file for version 1.0 of the KSGP database
KSGP_v1.0.tar.gz Complete KSGP v1.0 database
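
Each FASTA file is paired with a LotuS2 tax file, so a quick sanity check before using a database is to confirm that every sequence ID has a taxonomy entry and vice versa. The sketch below rests on two assumptions that should be checked against the real files: the FASTA ID is the first whitespace-delimited token of each header, and the tax file holds one record per line with the sequence ID in the first tab-separated column.

```python
import sys


def fasta_ids(lines):
    """Yield the ID (first whitespace-delimited token) of each FASTA header."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def tax_ids(lines):
    """Yield the sequence ID from each non-blank line of a tax file.

    Assumption: one record per line, sequence ID in the first
    tab-separated column.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            yield line.split("\t", 1)[0]


def check_pair(fasta_lines, tax_lines):
    """Return (IDs with no tax entry, tax IDs with no sequence)."""
    in_fasta = set(fasta_ids(fasta_lines))
    in_tax = set(tax_ids(tax_lines))
    return in_fasta - in_tax, in_tax - in_fasta


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_pair.py KSGP_v1.0.fasta KSGP_v1.0.tax
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as tax:
        no_tax, no_seq = check_pair(fasta, tax)
    print(f"{len(no_tax)} sequences without a taxonomy entry")
    print(f"{len(no_seq)} taxonomy entries without a sequence")
```

The functions take any iterable of lines, so an open file handle works as well as a list. If the tax file turns out to have a header row, it will show up as one entry without a sequence.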

Auxiliary Files

File Description
GTDB_plus_v1.0.fasta Version 1 of the GTDB+ database (cleaned GTDB plus eukaryotes from PR2). Please note that the taxonomic hierarchy used by PR2 is not compatible with that used by SILVA or NCBI.
GTDB_plus_v1.0.tax LotuS2 tax file for version 1.0 of the GTDB+ database
GTDB_cleaned_v1.0.fasta Cleaned and deduplicated GTDB FASTA file with domain-level misassignments removed. It should be combined with a database of eukaryotic 18S sequences, such as PR2.
GTB_cleaned_v1.0.tax Corresponding LotuS2 tax file for GTDB_cleaned_v1.0.fasta
GTDB_214_SSU_sequences_removed_as_wrong_domain.csv GTDB SSU sequences that RDP Classifier assigned to a different domain, and which were therefore removed. The file includes both the original GTDB classification and the RDP Classifier annotation.
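
GTDB_cleaned_v1.0.fasta is meant to be combined with a eukaryote 18S database such as PR2. A minimal sketch of that step, assuming the eukaryote sequences are already available as a FASTA file plus a LotuS2-style tax file (the PR2 file names and any conversion to that format are hypothetical), concatenates both pairs and stops if a sequence ID occurs in more than one input. It does nothing to reconcile rank structures, and the PR2 hierarchy is not compatible with that used by SILVA or NCBI.

```python
import sys


def fasta_ids(fasta_lines):
    """Return the set of IDs (first token after '>') in FASTA lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def _with_newline(line):
    """Ensure a line ends with a newline so files concatenate cleanly."""
    return line if line.endswith("\n") else line + "\n"


def merge_databases(parts):
    """Merge (fasta_lines, tax_lines) pairs into two lists of lines.

    All inputs are checked for sequence IDs shared between parts first;
    a shared ID raises ValueError instead of producing an ambiguous database.
    Assumption: tax files hold one record per line; blank lines are dropped.
    """
    parts = [(list(f), list(t)) for f, t in parts]
    seen = set()
    for fasta_lines, _ in parts:
        ids = fasta_ids(fasta_lines)
        clash = seen & ids
        if clash:
            raise ValueError(f"IDs in more than one input: {sorted(clash)[:5]}")
        seen |= ids
    merged_fasta, merged_tax = [], []
    for fasta_lines, tax_lines in parts:
        merged_fasta += [_with_newline(l) for l in fasta_lines]
        merged_tax += [_with_newline(l) for l in tax_lines if l.strip()]
    return merged_fasta, merged_tax


if __name__ == "__main__" and len(sys.argv) == 7:
    # Hypothetical usage:
    #   python merge_db.py GTDB_cleaned_v1.0.fasta GTB_cleaned_v1.0.tax \
    #       pr2.fasta pr2.tax merged.fasta merged.tax
    paths = sys.argv[1:]
    with open(paths[0]) as gf, open(paths[1]) as gt, \
            open(paths[2]) as ef, open(paths[3]) as et:
        fasta, tax = merge_databases([(gf, gt), (ef, et)])
    with open(paths[4], "w") as out:
        out.writelines(fasta)
    with open(paths[5], "w") as out:
        out.writelines(tax)
```

Checking every input before writing anything means a clash never leaves a half-written merged database behind.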

Data sources

GTDB+ contains sequences and taxonomic assignments from version 214.0 of GTDB and version 5.0 of PR2. KSGP also contains sequences from SILVA SSURef NR99 version 138.1 and from ENA accession GCA_900214305.