Sequence similarity searches

Enter an identifier or paste a protein or nucleotide sequence into the sequence similarity search form. The form can be accessed from any page via the Blast tab.

In addition to FASTA sequences, the following kinds of identifiers are supported:

P00750 UniProtKB entry
P00750:2 Specific version of a UniProtKB entry
A4_HUMAN UniProtKB entry name
P00750-2 UniProtKB isoform
UPI0000000001 UniParc entry
UniRef100_P00750 UniRef entry
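
The identifier kinds listed above can be told apart by their shape. The following is a minimal sketch, not the code used by the search form: the function name `classify_query` and the pattern table are hypothetical, the accession pattern follows the published UniProtKB accession number format, and the remaining patterns are assumptions modelled on the examples in the list.

```python
import re

# UniProtKB accession number format (6 or 10 characters).
ACC = r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"

# Illustrative patterns only; they are assumptions based on the examples
# above, not the form's actual validation rules. Order matters: the most
# specific shapes are tried first.
PATTERNS = [
    ("UniRef entry", re.compile(r"UniRef(?:100|90|50)_\S+")),
    ("UniParc entry", re.compile(r"UPI[0-9A-F]{10}")),
    ("Specific version of a UniProtKB entry", re.compile(ACC + r":\d+")),
    ("UniProtKB isoform", re.compile(ACC + r"-\d+")),
    ("UniProtKB entry name", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
    ("UniProtKB entry", re.compile(ACC)),
]


def classify_query(text):
    """Return a label describing what kind of query the text looks like."""
    query = text.strip()
    if query.startswith(">"):
        return "FASTA sequence"
    for label, pattern in PATTERNS:
        if pattern.fullmatch(query):
            return label
    return "raw sequence or unknown"


for example in ["P00750", "P00750:2", "A4_HUMAN", "P00750-2",
                "UPI0000000001", "UniRef100_P00750"]:
    print(example, "->", classify_query(example))
```

Anything that matches none of the patterns, such as a bare amino acid string, falls through to the last label and would be treated as a sequence.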

If you open the sequence similarity search form from a UniProtKB, UniRef or UniParc entry page, the form is prefilled with the sequence of that entry.

Options

Database Database against which the search is performed: UniProtKB or clusters of sequences with 100%, 90% or 50% identity.
Threshold The expectation value (E) threshold is a statistical measure of the number of matches expected by chance in a random database. The lower the E-value, the more likely the match is to be significant. E-values between 0.1 and 10 are generally dubious, and E-values over 10 are unlikely to have biological significance. In all cases, matches need to be verified manually. You may need to increase the E threshold if you have a very short query sequence, if you want to detect very weak similarities or similarities in a short region, or if your sequence has a low-complexity region and you use the "filter" option.
Matrix The matrix assigns a score to each pair of aligned residues in an alignment. The BLOSUM matrices base this score on the frequency with which the substitution is known to occur among consensus blocks within related proteins. BLOSUM62 is among the best of the available matrices for detecting weak protein similarities. The PAM set of matrices is also available. If "Auto" is set, the matrix is selected according to the length of the query sequence.
Filtering Low-complexity regions (e.g. stretches of cysteine in Q03751, or hydrophobic regions in membrane proteins) tend to produce spurious, insignificant matches with database sequences that have the same kind of low-complexity regions but are biologically unrelated. If "Filter low complexity regions" is selected, the query sequence is run through the program SEG, and all amino acids in low-complexity regions are replaced by X's.
Gapped Allows gaps to be introduced into the sequences when they are compared.
Hits Limits the number of returned alignments.
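
The Threshold and Filtering options described above can be illustrated with a short sketch. It is not the SEG program: `mask_low_complexity` is a hypothetical stand-in that uses a plain sliding-window Shannon entropy, and its `window` and `min_entropy` defaults are arbitrary assumptions. `describe_e_value` is likewise a hypothetical helper that only restates the rule-of-thumb ranges given for the Threshold option.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(sequence, window=12, min_entropy=2.2):
    """Replace residues in low-entropy windows by X.

    Simplified stand-in for SEG; window size and entropy cutoff are
    assumed values chosen for illustration only.
    """
    masked = list(sequence)
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


def describe_e_value(e_value):
    """Map an E-value to the rule-of-thumb ranges given in the text."""
    if e_value > 10:
        return "unlikely to have biological significance"
    if e_value >= 0.1:
        return "dubious"
    return "more likely significant, still to be verified manually"


query = "ACDEFGHIKLMNPQRSTVWY" + "C" * 15  # made-up sequence with a cysteine run
print(mask_low_complexity(query))
print(describe_e_value(1e-50))
print(describe_e_value(5))
```

Masking keeps the sequence length unchanged, so alignment coordinates in the results still refer to positions in the original query.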

Related Services

BLAST (PIR)

Sequence similarity search

BLAST (ExPASy)

Sequence similarity search

BLAST (EBI)

Sequence similarity search

FASTA (PIR)

Sequence similarity search

FASTA (EBI)

Sequence similarity search

MPsrch (EBI)

Sequence similarity search using the Smith-Waterman algorithm

ScanPS (EBI)

Sequence similarity search using the Smith-Waterman algorithm

iProClass (PIR)

Query a sequence against the iProClass database, which contains both sequences and protein families

InterProScan (EBI)

Query a sequence against the InterPro database, which contains protein families, domains, and motifs

ScanProSite (ExPASy)

Scan a protein sequence for patterns, or use a pattern to scan protein sequences

PatMatch (PIR)

Scan a protein sequence for patterns, use a pattern to scan protein sequences, or find exact matches to a peptide

PIRHMM (PIR)

Query a sequence against motif or domain HMMs, or build your own HMM online and query a sequence database