SCATA - Sequence Clustering and Analysis of Tagged Amplicons
SCATA provides a framework for the analysis of sequenced tagged
amplicons, typically derived from high-throughput sequencing of
microbial communities. It is optimised for target sequences which cannot
readily be aligned across wide phylogenies, e.g. the ITS region. For target
sequences that can be multiply aligned, such as 16S rRNA, we recommend the use of purpose-built
systems, e.g. tools provided by the Ribosomal Database Project.
Please note that the Scata service is offered freely to the non-commercial
scientific community, and as such is run on otherwise unused computer
time. This means that analyses will at times take longer (up to several days)
to finish, depending on the computational requirements of other projects.
News
-
Service disruption contd.
We are still having some problems with the systems after the server move. Most notably,
mail notifications are not working as expected. Please check job status on the
web pages. (2011-06-30)
-
Service disruption
The servers running Scata have been moved to a new and better server room. Unfortunately, we
have had some problems in connection with the move. This may result in longer
running times for analyses over the coming days. We are sorry for the inconvenience. (2011-06-29)
-
Wiki with documentation
Scata documentation is growing within the Scata Wiki.
Not all documentation is up to date, but we are working on it.
If you find Scata a useful service, and use it in a way that could be useful to other people, please feel
free to contribute your use case to the wiki. (2011-06-20)
-
SCATA updated!
We have updated several aspects of SCATA during the last few
months. These changes affect data handling under the hood
as well as a number of things that affect you as users. Please see below for some
details of what has changed. The old version of SCATA is no longer available. (2011-03-23)
-
Improved dataset upload
We have moved the quality screening and filtering of datasets to the import screen.
This means that you only have to wait for this process once per dataset, instead of
every time you run the clustering. As a consequence, all datasets that have previously
been uploaded have to be uploaded again. Another new feature of the filtering
is the ability to extract long high-quality regions from the reads, instead of
basing the filtering on the complete read. This can be useful for datasets which have not been trimmed
by the Roche software. SCATA is now also able to detect primer sequences
at either end of amplicons in order to reverse complement sequences as necessary. We have
also added support for tags at both ends. (2011-03-23)
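The primer-based orientation described above can be illustrated with a minimal Python sketch. This is not SCATA's own code: the function names orient_read and reverse_complement are hypothetical, and the sketch assumes exact primer matches, whereas a real pipeline would presumably tolerate mismatches.

```python
# Illustrative sketch only; not taken from SCATA.
# Assumption: primers are matched exactly at the ends of the read.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_read(read, forward_primer, reverse_primer):
    """Return the read in forward orientation, or None if no primer is found.

    A read is taken to be in forward orientation if it starts with the
    forward primer or ends with the reverse complement of the reverse
    primer. If the opposite holds, the read is reverse complemented.
    """
    if read.startswith(forward_primer) or read.endswith(
        reverse_complement(reverse_primer)
    ):
        return read
    if read.startswith(reverse_primer) or read.endswith(
        reverse_complement(forward_primer)
    ):
        return reverse_complement(read)
    return None  # neither primer detected at either end


if __name__ == "__main__":
    # Made-up primers and amplicon, for demonstration only.
    fwd, rev = "ACGT", "TTGA"
    amplicon = fwd + "CCCAAA" + reverse_complement(rev)
    flipped = reverse_complement(amplicon)
    print(orient_read(flipped, fwd, rev) == amplicon)
```

Checking both ends means a read can still be oriented when only one of the two primers is present, for example when the read does not span the whole amplicon.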
-
Change of homopolymer handling
Scata now gives the option to collapse
homopolymers over a given length before clustering. This only affects the sequences
in the search and clustering process. Sequences in the report files are not
homopolymer collapsed and are thus fully comparable to references in other databases. (2011-03-23)
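As an illustration of what homopolymer collapsing means, here is a minimal Python sketch. It is not SCATA's implementation: the function name collapse_homopolymers and the max_len parameter are hypothetical, and the sketch assumes that runs longer than the given length are truncated to that length.

```python
# Illustrative sketch only; not taken from SCATA.
# Assumption: a run of one base longer than max_len is cut down to max_len.
import re


def collapse_homopolymers(seq, max_len=3):
    """Truncate every run of a single base longer than max_len to max_len."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    # One character followed by max_len or more copies of itself,
    # i.e. a run of at least max_len + 1 identical bases.
    pattern = re.compile(r"(.)\1{%d,}" % max_len)
    return pattern.sub(lambda m: m.group(1) * max_len, seq)


if __name__ == "__main__":
    original = "ACGTTTTTTAGGGGC"
    collapsed = collapse_homopolymers(original, max_len=3)
    # The collapsed copy would be used for searching and clustering,
    # while the original sequence is kept unchanged for reporting.
    print(original, collapsed)
```

Keeping the original sequence alongside the collapsed copy mirrors the behaviour described above, where only the search and clustering steps see the collapsed form.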
-
Change of default parameters
Many parameters in parameter sets have been given
new default values. Most notably, with the new homopolymer handling, we recommend the
use of gap open and extension penalties to avoid over-clustering. (2011-03-23)
Disclaimer
We provide SCATA as a free service to the scientific community. Please
make sure to download your results when done, as we cannot take any long-term
responsibility for data storage (datasets are large and use hard disk real
estate!). We have tested the analysis pipeline thoroughly and use it
regularly for our own projects. However, we cannot guarantee that it is error-free;
the final responsibility for ensuring that your results are correct
rests with you.
We make no warranty (expressed, implied, or statutory) regarding any
data stored within this service or any results obtained through using it,
including without limitation implied warranties of merchantability,
fitness for use, or fitness for a particular purpose.
Citing SCATA
A paper describing SCATA has been submitted. If you use this service,
the current citation is:
Mikael Brandström Durling, Karina E Clemmensen, Jan Stenlid and Björn
Lindahl (2011): SCATA - An efficient bioinformatic pipeline for species identification and quantification after high-throughput sequencing of tagged amplicons (submitted).
This service is sponsored by the
Department of Forest Mycology and Plant Pathology at the Swedish University of Agricultural Sciences.
Please direct any questions regarding the system to Mikael Brandström Durling
using the email address mikael::durling@slu:;se (replacing the colons
with dots).