SCATA - Sequence Clustering and Analysis of Tagged Amplicons

SCATA provids an analysis framework for the analysis of sequenced tagged amplicons, typically derived from high throughput sequencing of microbial communities. It is optimised for target sequences which cannot readily be aligned across wide phylogenies, e.g. the ITS region. For multiple alignable target sequences, such as 16S rRNA, we recommend the use of pipelines optimised for such data.

Please note that the Scata service is offered freely to the non-commercial scientific community, and as such is run on otherwise unused computer time. This implies that at times, analyses will take longer (up to several days) to finish depending on other requirments of other projects for computational resources.

News

Major update of scata capabilites! Today we launched a major revision of the scata system with support for recent sequencing advances. This includes support for reading files from both IonTorrent and MiSeq systems. More specifically the following things have been added/updated:
- New file format support - Scata now supports uploading of sanger fastq files, in addition to the previously supported formats (fasta, fasta+qual, roche/454 sff). The fastq files can be gzipped, which decreases the upload time.
- Merging of paired MiSeq reads - Scata can now merge paired MiSeq reads by aligning the reads and checking for overlap.
- Amplicon quality filtering - New filter, which checks the quality of the amplicon once the primers have been identified. Useful for sequencing methods where the quality drops towards the end, and where low quality bases can occur within the sequence.
- Separate upload and dataset verification - Datasets should now be uploaded under the File tab, and can then be checked and imported as dataset. This makes it easier to try different quality check parameters without having to send the whole file again.
- Dataset check speedup - Checking of datasets is now 5-10 times faster than before!
- Full support for ambiguity bases in primer sequences - Scata now has full support for the IUPAC ambiguity codes in the primer sequence. Thus, you should probably leave the primer score at 0.9 or similar even in cases where there are ambiguity bases in the primer sequence when importing new datasets.
- Fixed 3´ tag support - There was a bug in the 3´ primer matching code which is now fixed.
- Bug fixes - There are several other minor bugfixes and cleanups to the code which improves stability, speed and ease of use.
(2014-10-28)
Wiki with documentation
Scata documentation is growing within the Scata Wiki. Not all documentation is up to date, but we are working on it. If you find Scata a useful service, and use it in a way that can be useful for other people, please feel free to contribute your usage case to the wiki. (2011-06-20)

Disclaimer

We provide SCATA as a free service to the scientific community. Please make sure to download your results when done, as we cannot take any long-term responsibility for data storage (datasets are large and use hard disk real estate!). We have tested the analysis pipeline throroughly and use it regularly for our own projects. However, we cannot guarantee that it is error-free; the final responsibility for ensuring that your results are correct rest with you.

We make no warranty (expressed, implied, or statutory) regarding any data stored whithin this service or any results obtained through using it, including without limitation implied warranties of merchantability, fitness for use, or fitness for a particular purpose.

Citing SCATA

A paper describing SCATA is now submitted. The current citation if you use this service is:

Mikael Brandström Durling, Karina E Clemmensen, Jan Stenlid and Björn
Lindahl (2011): SCATA - An efficient bioinformatic pipeline for species identification and quantification after high-throughput sequencing of tagged amplicons (submitted).

This service is sponsored by the Department of Forest Mycology and Plant Pathology at the Swedish University of Agricultural Sciences. Please direct any questions regarding the system to Mikael Brandström Durling using the email address mikael::durling@slu:;se (replacing the colons with dots).