Database Searching
It is possible to run FASTA on remote machines. The obvious choice for such a remote machine is one that has access to the latest sequence information. Both EMBL and DDBJ permit this type of access and have implemented FASTA-type searches on their machines (NCBI prefers to use BLAST - see below).
There are several flavours to FASTA: fasta scans a protein or DNA sequence library for sequences similar to a query sequence. tfasta compares a protein query sequence to the DNA sequence library, translating the DNA sequence on the fly. lfasta compares two query sequences for local similarity between them and shows the local sequence alignments. plfasta compares two sequences for local similarity and plots the local sequence alignments.
I will illustrate what a FASTA type of search is and what the results look like with an example. Basically the idea is to search through the complete database for any similar sequence.
The remaining options are - LIST n, n top scores listed in the output
[50]. ALIGN n, align the top n to the query sequence [10]. ONE,
compare only the given strand to the database, the default is to use
the complementary strand as well. PROT will force your query sequence
to be a protein (small protein sequences may be otherwise
misinterpreted as DNA). PATH string mails the results back
to string rather than the originator of the message.
After creating this file, mail the file by electronic mail to
LIB SWALL
WORD 1
LIST 50
TITLE HALHA
SEQ
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
DYLQNRVI
The first line contains the data library files to be searched (in
this case all Swiss-Prot and NBRF/PIR entries). It may be EMALL
(all EMBL entries plus those in the latest release), or GENEMBL
(GenBank plus EMBL), or EPRI (EMBL primate entries), etc. The
second line gives the word size or k-tuple value (more on this
below). The third line says to LIST on the output the top 50
scores. The TITLE line is used for the subject of the mail
message. Finally SEQ implies that everything below this line to
the end of the message is part of the sequence. In this case the
sequence is the protein sequence of the ferredoxin gene of
Halobacterium halobium. The other options available for LIB
are...
and the results will be sent back to you by electronic mail.
Alternatively point your web browser to FASTA.
PLEASE NOTE - as a courtesy to others using the system, please send only one job at a time. Many other people from all over the world are using this server, and the FASTA program is quite computationally intensive despite its speed.
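The query file described above is simple enough to generate from a script. The following is a minimal sketch in Python, assuming only the keywords shown in the example (LIB, WORD, LIST, TITLE, SEQ); the function name build_fasta_query and the 60-residue line width are illustrative choices, not part of the mail server's specification.

```python
def build_fasta_query(sequence, lib="SWALL", word=1, list_n=50,
                      title="HALHA", width=60):
    """Return the text of a mail-server query in the layout of the example.

    The keyword names come from the example file; the helper itself and the
    line width are assumptions made for illustration.
    """
    # Strip whitespace so the sequence can be pasted in any layout.
    seq = "".join(sequence.split()).upper()
    lines = [f"LIB {lib}", f"WORD {word}", f"LIST {list_n}",
             f"TITLE {title}", "SEQ"]
    # Everything after SEQ is treated as sequence, so it must come last.
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


ferredoxin = (
    "PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP"
    "FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL"
    "DYLQNRVI"
)
print(build_fasta_query(ferredoxin), end="")
```

The printed text can be pasted into the body of the mail message; the address and any further options still have to be supplied by hand.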
An example of the output is shown below. The input file specifies the Halobacterium halobium ferredoxin amino acid sequence as the query for a search of the SWISS-PROT database.
The output from the FASTA search begins with some
informational messages. These include the total number of amino acids and
sequences searched (note how large these numbers are).
Next comes a histogram (lying on its side) of the number of sequences
found with various scores (init1 and initn scores). Each symbol in
this graph is an indicator of two sequences (the = symbol indicates
both init1 and initn scores, the - symbol indicates just init1 and the
+ symbol indicates just initn scores). This histogram gives you an
indication of how similar the query sequence is to some of the database
sequences. When a query sequence has a significant match, that match
should score well out in the tail of the init1 and initn distributions.
In the above example there are many sequences with init1/initn scores
much higher than the scores obtained from the rest of the sequences and
you will find these are all related ferredoxin sequences from other
species. Their presence at this extreme end of the distribution
indicates that there is much greater and significant similarity between
these sequences and the query sequence than between general sequences
in the database and the query sequence.
Next comes a section that lists the sequences (along with their
locus names) that have the best scores. Finally there is a section
that lists the alignments that have been found by the program.
In comparing a query sequence to the database two scores are
calculated for each and every entry in the database. These scores are
initn and init1. A third score, opt, will be calculated for some of
the top scoring entries.
The first step is to build a lookup table of the words (k-tuples) that
occur in the query sequence. The length of these
words (e.g. ATCG or MKR) is set by the WORD or k-tuple value. By
default it is 6 for nucleic acids and 2 for amino acid searches. A
lower k-tuple will give a more sensitive search but will take much
longer. Although a range of 3 to 6 is permitted for nucleic acids a
lower value is generally unnecessary. The program then finds every
position at which a k-tuple in the query agrees perfectly with a k-tuple
in the database sequence, and locates the regions with the highest
density of these identities.
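The word-matching step can be sketched in a few lines of Python. This is an illustration of the idea only, not the FASTA source code: the function name ktuple_diagonal_hits is hypothetical, and the "density" of identities is approximated here simply by counting exact k-tuple matches on each diagonal (database position minus query position).

```python
from collections import defaultdict


def ktuple_diagonal_hits(query, target, ktup=2):
    """Count exact k-tuple matches on each diagonal (target pos - query pos)."""
    # Index every k-tuple of the query by the positions where it starts.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)

    # Scan the target once; each shared word adds one hit to its diagonal.
    diagonals = defaultdict(int)
    for j in range(len(target) - ktup + 1):
        for i in index.get(target[j:j + ktup], ()):
            diagonals[j - i] += 1
    return dict(diagonals)


# Toy sequences: a 12-residue query and a target holding a close relative.
hits = ktuple_diagonal_hits("CIGCTQCVRACP", "MAHCIGCTHCVRACPT", ktup=2)
best = max(hits, key=hits.get)
print("best diagonal:", best, "hits:", hits[best])
```

A diagonal with many hits corresponds to a run of the two sequences that lines up without gaps, which is the kind of region that is then rescored to give init1.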
An init1 score is assigned to each of these regions of high
similarity after the regions are extended at the ends to include
regions shorter than the length of a k-tuple and after using a
PAM250 matrix (alternative distance matrices are available - more
on these later) to score mismatches.
The program then attempts to join groups of these regions together, and an
initn score is generated from the result. This is done by setting initn
equal to the sum of the init1 scores of the regions being joined (the final
init1 score of a sequence is the maximum init1 score over all of its
regions). A constant of 20 is then subtracted as a joining penalty.
If the resulting initn score is less than one of the init1 scores it is
discarded, the regions are not joined and the initn score is set equal
to the maximum init1 score (hence initn is always greater than or equal to
init1).
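The joining rule can be written out as a small worked example. This sketch handles only two regions and uses the penalty of 20 quoted above; the function name initn_from_pair is hypothetical, and the real program considers more than two regions and their relative positions.

```python
JOIN_PENALTY = 20  # joining constant quoted in the text


def initn_from_pair(score_a, score_b, join_penalty=JOIN_PENALTY):
    """Return (init1, initn) for two regions with the given init1 scores."""
    init1 = max(score_a, score_b)          # best single region
    joined = score_a + score_b - join_penalty
    # Keep the join only if it is at least as good as the best single region.
    initn = joined if joined >= init1 else init1
    return init1, initn


print(initn_from_pair(60, 45))  # join is worthwhile
print(initn_from_pair(60, 15))  # join is discarded
```

In the second call the penalty outweighs the gain from the weaker region, so initn falls back to init1.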
Sequences that have an initn score larger than a cutoff value (usually
50 but this can be altered with a "LIST n" command in the query file)
are then used for a Smith-Waterman alignment (see the section on
alignments) and an OPT score is generated from these alignments. Only
the region considered significant by the program is displayed. In
these alignments, the name of the sequence will be presented, the
scores, and the percent similarity over the region aligned. In general
the length of the region aligned is a better indicator of homology than
is the percent similarity. This is because large percentages can be
found in short regions just by chance. A "|" is used to indicate a
complete match, a ":" to indicate a conservative amino acid
replacement, and a "-" to indicate a deletion/insertion.
Note that the opt score can be lower than the initn score. This will
happen when one sequence has two (or more) regions of high similarity
separated by regions that have little/no homology. The two regions are
joined with high init1 scores and the initn score is high because the
gap penalty/join penalty is not sufficiently large. In contrast
sequences with a large number of poorly similar regions will have low
init1 scores but high initn scores and then low opt scores. In general,
unless a very short sequence is used, the init1 score should be much
improved by the opt score for truly significant sequences.
Remember to remove repetitive sequences from your query otherwise you
will get a lot of false hits. The FASTA program itself can be obtained
via anonymous FTP if desired.
With version 2.0 of the FASTA program distribution, FASTA, TFASTA,
and SSEARCH now provide estimates of statistical significance for
library searches. Work by Altschul, Arratia, Karlin, Mott, Waterman,
and others (see
Altschul et al. (1994) Nature Genetics 6:119 for an excellent review)
suggests that local sequence similarity scores follow the extreme value
distribution, so that P(s > x) = 1 - exp(-exp(-lambda(x-u))), where u =
ln(Kmn)/lambda and m,n are the lengths of the query and library
sequence. This formula can be rewritten as: 1 - exp(-Kmn exp(-lambda
x)), which shows that the average score for an unrelated library
sequence increases with the logarithm of the length of the library
sequence. FASTA and SSEARCH use simple linear
regression against the log of the library sequence length to
calculate a normalized "z-score" with mean 50, regardless of library
sequence length, and variance 10. These z-scores can then be used with
the extreme value distribution and the Poisson distribution (to account
for the fact that each library sequence comparison is an independent
test) to calculate the number of library sequences expected to obtain a score
greater than or equal to the score obtained in the search. The original
idea and routines to do the linear regression on library sequence
length were provided by Phil Green, U. Washington. This version of FASTA
and SSEARCH uses a slightly different strategy for fitting the data
than those originally provided by Dr. Green.
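The two forms of the extreme value formula can be checked numerically. The sketch below evaluates both and confirms that they agree; the values of lambda, K, m and n are arbitrary illustrative numbers, not parameters taken from FASTA, and the function names are hypothetical.

```python
import math


def evd_p_value(x, lam, K, m, n):
    """P(s > x) = 1 - exp(-exp(-lambda (x - u))), with u = ln(K m n) / lambda."""
    u = math.log(K * m * n) / lam
    return 1.0 - math.exp(-math.exp(-lam * (x - u)))


def evd_p_value_rewritten(x, lam, K, m, n):
    """Equivalent form: 1 - exp(-K m n exp(-lambda x))."""
    return 1.0 - math.exp(-K * m * n * math.exp(-lam * x))


# Illustrative (assumed) parameters: a 128-residue query, a 300-residue
# library sequence, and made-up values for lambda and K.
lam, K, m, n = 0.3, 0.1, 128, 300
for x in (20, 40, 60):
    p1 = evd_p_value(x, lam, K, m, n)
    p2 = evd_p_value_rewritten(x, lam, K, m, n)
    print(f"x={x:3d}  P(s > x)={p1:.3e}  rewritten={p2:.3e}")
```

Because u grows with ln(mn), the same raw score becomes less surprising as the library sequence gets longer, which is the length dependence that the regression step corrects for.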
The expected number of sequences is plotted in the histogram
using an "*". Since the parameters for the extreme value
distribution are not calculated directly from the distribution of
similarity scores, the pattern of "*'s" in the histogram gives a
qualitative view of how well the statistical theory fits the
similarity scores calculated by FASTA and SSEARCH.
For FASTA, if
optimized scores are calculated for each sequence in the database
(-o option), the agreement between the actual distribution of
"z-scores" and the expected distribution based on the length
dependence of the score and the extreme value distribution is
usually very good. Likewise, the distribution of SSEARCH
Smith-Waterman scores typically agrees closely with the actual
distribution of "z-scores." The agreement with unoptimized
scores, ktup=2, is often not very good, with too many high
scoring sequences and too few low scoring sequences compared with
the predicted relationship between sequence length and similarity
score. In those cases, the expectation values may be
overestimates.
The statistical routines assume that the library contains a
large sample of unrelated sequences. If this is not the case,
then the expectation values are meaningless. Likewise, if there
are fewer than 20 sequences in the library, the statistical
calculations are not done.
A complete manual
for FASTA can be consulted for further questions.
While FASTA is a sensitive and rapid algorithm to search for
similar sequences in the database it is not without problems (one of
these being the snail-like pace of the ethernet connection to the FASTA
server at EMBL). Because its initial step looks for perfect matches it
will completely ignore more distantly related sequences that have
functional homology but no longer retain complete identity. If an
amino acid sequence has had many conserved replacements but no longer
has identities then the FASTA algorithm will completely miss this
region. Fortunately, alignments where there are extensive regions of
low but not exact similarity are rare enough that a small WORD or
k-tuple size will pick up most regions.
A different algorithm which improves upon FASTA in speed, if not in
sensitivity, is termed BLAST (Basic Local
Alignment Search Tool). This began with a statistical paper by
Karlin and Altschul (PNAS 87:2264-2268, 1990) who developed a
rigorous method to obtain the probabilities of matches with a query
sequence given that no gaps are permitted. This allows larger WORD or
k-tuple sizes to be used, with the concomitant increase in speed, while
still permitting inexact matches between WORDs. The statistical developments
permit this to be done without loss of sensitivity and permit rigorous
statistical statements to be made about the matches found.
As a result of these developments Altschul, Gish, Miller,
Myers and Lipman (J.Mol.Biol. 215:403-410, 1990) created the BLAST
group of programs. These algorithms find ungapped, locally optimal
sequence alignments. There are several versions of the BLAST
programs. These are ...
To carry out this type of search on the NCBI server set up a
file with the following
Only the input options shown above are mandatory, all others are
optional. The input options are PROGRAM blastn, blastp, blastx,
tblastx, blast3, or tblastn. DATALIB has several options like the LIB
option of a FASTA search - the recommended option is "nr" which stands
for non-redundant (it includes sequences from PDB, GenBank, GenBank
updates, EMBL and EMBL updates or sequences from PDB, SWISS-PROT, PIR,
GenPept and GenPept updates) but there are many others.
HISTOGRAM [yes]/no will turn on/off the printing of a histogram of the
scores. DESCRIPTIONS n, number of described matching sequences [100].
ALIGNMENTS n, number of high scoring pairs [50]. EXPECT n, the score
such that n sequences should be found by chance alone [10] (a
fractional value of one or less will give only output which is
statistically unusual, larger values give more output). CUTOFF n,
cutoff score for segment pairs - high scoring pairs are reported in the
output only if one of them scores at least as high as the cutoff
[calculated from and dependent on the value of EXPECT]. MATRIX
determines the scoring matrix for protein comparisons [BLOSUM62].
STRAND restricts a search to the top or bottom strand (top, bottom,
[both]). FILTER will mask parts of your query so that things like
repetitive elements are ignored (FILTER seq - will exclude regions of
low compositional complexity, FILTER xnu - will exclude regions with
short-periodicity internal repeats, [exclude nothing]). The other
options available are QOFFSET, GCODE, PATH, SPLIT, and ACKNOWLEDGE.
More information about the programs and their output can be obtained
from their man(ual) pages. More information about the algorithm
is also available. BLASTN and BLASTP run on faster queues than
do BLASTX, TBLASTX, BLAST3 and TBLASTN. The program actually scans entries
twice: once to find the highest scoring pairs and
then again with a lower CUTOFF to find potential combinations of high
scores that together might do better than a single high score.
The BLAST programs themselves can be obtained if desired by anonymous
FTP to ncbi.nlm.nih.gov.
After some informational material comes ...
Again the BLAST output begins with a histogram though in this case it
is rather superfluous. Following FASTA format the program then reports
a short listing of the highest scores and then a more detailed listing
of the matches. The latter includes the sequence name (along with an
accession number and the source database). Then the raw score (along
with an estimate of the amount of information in the query sequence),
the expected number of equal or better matches, a "Sum" probability,
the number of identical matches and the number of positively scoring
matches. The "Sum" probability is calculated when there are two or
more blocks of high scoring identity per query sequence (remember BLAST
is not permitting gaps in its results). The "Sum" P(n)-value is the
probability that at least n such blocks would be found by
chance within the query sequence and that each block would have a score
at least as good as the poorest score. (This is an approximate
statistic since incompatible blocks may be counted as independent.)
The BLAST algorithm is capable of speeding through the entire amino
acid database within 13 seconds for this query. Quite an improvement
over FASTA but then it is not doing as much. BLAST is probably not as
sensitive for non-coding nucleotide sequences due to the high
probability of small insertions / deletions (indels) that will occur in
such data.
The BLITZ server uses the Smith-Waterman local similarity algorithm
(see the section on alignments) to compare the query sequence versus
the Swiss-Prot database (they plan to make more databases available in
the near future). The advantage of this algorithm, termed MPsrch, is
mainly that it is running on a MasPar MP-1 computer (a "massively"
parallel computer) with 4096 processors. Because of the use of a
parallel computer, "MPsrch is the fastest implementation of the SW
algorithm currently available on any machine". The implementation is
due to S.S.Sturrock and J.F.Collins (1993, MPsrch version 1.3,
Biocomputing Research Unit, University of Edinburgh, UK; remember to
cite the authors of any search algorithm you use).
The input format for a BLITZ search is simply
and only SEQ is mandatory, even the TITLE is optional. The other
options are PAM n (sets the PAM n scoring matrix for comparison of
proteins [120]), INDEL n (sets the penalty for indels and gaps
[default is dependent on the PAM matrix chosen, 13 for PAM120]),
ALIGN n (number of best alignments presented [30]), NAMES n (number
of scores to report). Mail the file to
Blitz@ebi.ac.uk
and the results are mailed back to you.
The output generated by this mail message is ...
This particular search took only 34 seconds, again a great
improvement over the FASTA approach. While not as fast as BLAST, it
should (so they say) give a more sensitive search for distant
homologies. The mean and variance of the distribution of scores from
the entire database are calculated. These are used to construct
empirical statistics of the predicted number of random matches in the
database equal to or better than that found. The algorithm then lists
the best scores (50 of them here, the default for NAMES) and then lists
more detailed reports for a subclass of these (30 here, the default for
ALIGN). For each it calculates the raw score, the percent matches, the
predicted number expected, the number of matches, the number of
mismatches, the number of partial matches (residue pairs with a
positive score in the PAM matrix), the number of indels and the number
of gaps. This program considers these two differently in that a single
gap can be composed of any number of adjacent indels.
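The distinction between indels and gaps is easy to see on a small aligned pair. In this sketch each "-" character counts as one indel and each unbroken run of "-" counts as one gap; the use of "-" as the gap character and the function name count_indels_and_gaps are assumptions for illustration, not a description of the MPsrch output format.

```python
import re


def count_indels_and_gaps(aligned_a, aligned_b):
    """Return (indels, gaps) for two aligned strings of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Every gap character is one indel.
    indels = aligned_a.count("-") + aligned_b.count("-")
    # Every maximal run of gap characters is one gap.
    gaps = sum(len(re.findall(r"-+", s)) for s in (aligned_a, aligned_b))
    return indels, gaps


print(count_indels_and_gaps("ACD---EFG-H",
                            "ACDKLMEFGWH"))
```

Here the first sequence has four indels grouped into two gaps, one of length three and one of length one.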
In this case all 50 hits have very small expected numbers
indicating that they each have statistically significant homology
to the ferredoxin query sequence (not too surprising since they are
all different ferredoxins). These statistics are essentially
extreme value statistics. Also note that the Smith-Waterman
alignment algorithm does a best local alignment (more on this
later) so the entire query sequence may not be presented in the
output. In this case since it does permit the presence of gaps in
the sequence only one match per sequence is recorded.
BLAZE came along before BLAST and, to my knowledge, predated
BLITZ. It is another implementation for a parallel computer
system. This one is operated by Intelligenetics also on a MasPar
MP-1. Although Intelligenetics is no longer in the general
ethernet "cyberspace" it is still operating this program for a real
money cost. Intelligenetics operate a computer upon which time can
be bought. I include it for interest's sake but also to note that
nucleotide searches as well as protein searches were permitted on
this machine (as of July 92). It was claimed that the algorithm
could work through Swiss-Prot with a query of 100 amino acids in 15
seconds.
FLASH (Fast-Lookup Algorithm for Sequence Homology) uses a
different approach and concept. This is an IBM (Thomas J. Watson
Research Center at Yorktown Heights) project lead by A.Califano and
I.Rigoutsos (reference the Proceedings of the First Intl. Conf. on
Intelligent Systems for Mol. Biol., July 1993, Bethesda, MD). The
algorithm makes use of an object-recognition technique borrowed
from computer vision technology. Because it is using a "lookup"
algorithm from a preset table of indexed patterns its speed should
not be greatly degraded by increases in the size of the database.
The researchers at IBM claim that this is part of a general
class of algorithms that can be used to search very large databases
for diverse information including finding molecules of similar
shape or structure, text searches and visual object recognition.
The major difference in the algorithm is in its "hash" table. Like
many other search algorithms it uses a "hash" table for speed, but
a table of contiguous k-tuples alone lacks the sensitivity required.
Instead, this algorithm constructs
a series of non-contiguous k-tuples. By constructing these in a
precisely defined manner the amount of work creating the table is
increased but the sensitivity is also increased, since there are many
more k-tuples per sequence and they are not sensitive to the odd
mismatch. The searches are implemented on seven NON-dedicated
IBM/RS6000 machines (what else were you expecting).
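The idea of non-contiguous k-tuples can be illustrated with a toy scheme. The sketch below is an assumption made for illustration and is not the published FLASH construction: each tuple takes the residue at a start position plus k-1 further residues chosen from the following window, so that some tuples skip over any single position.

```python
from itertools import combinations


def noncontiguous_ktuples(seq, k=3, window=5):
    """Yield (start, offsets, residues) for gapped k-tuples inside a window.

    Illustrative scheme only: offset 0 is always used, and the other k-1
    offsets are every combination drawn from 1 .. window-1.
    """
    for start in range(len(seq) - window + 1):
        for rest in combinations(range(1, window), k - 1):
            offsets = (0,) + rest
            yield start, offsets, "".join(seq[start + o] for o in offsets)


a = set(noncontiguous_ktuples("CIGCTQ"))
b = set(noncontiguous_ktuples("CIGATQ"))  # one mismatch at position 3
print(len(a), "tuples per sequence;", len(a & b), "survive the mismatch")
```

With contiguous 3-tuples these two sequences share only the word CIG, whereas several of the gapped tuples step over the mismatched position and still match. The price is a larger table, as the text notes.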
There are several features of this server that are unusual in
comparison to the others. Requests sent to the server must
contain a "Subject" line with the word dFLASH or it will be ignored.
Most other servers will ignore a "Subject" line or want it to be
blank. The server input message must have a BEGIN line,
must include a "title" line (beginning with ">") and must
have a terminating "1".
The typical input message would be
This should be mailed to
dflash@watson.ibm.com.
The above mail message yielded output as follows two years ago (a test
of the system in the last week indicated that the server is still
present but it is obviously not functioning in real time - so you have to
put up with two year old data).
As can be seen the FLASH output can be quite large. It is best to
include the ALIGNMENTS parameter (after the score matrix) to
limit output (to something less than the 10000 - the old default, which
would blow your computer anyway).
While the output produces a great deal of information on the
scores and matches it does not really provide any statistical
evaluation of the results. This is a major disadvantage of this server
in comparison to the others. It also appears to have difficulty with
the sequence numbering beyond the end of the sequence.
In terms of speed the 18 seconds reported at the top of the mail
message compares favourably with the BLAST and BLITZ servers but
doesn't quite live up to the advertising hype. The authors claim that
it should best BLAST by two (count 'em - two) orders of magnitude! As a
result ... FLASH has not really caught on (perhaps why I have not been
able to update the output) and is dominated by BLAST.
Comparing the four different methods is interesting. A few differences
are simply due to the fact that different databases are included with
BLAST but many of the differences are real preferences. The FASTA
algorithm suggests that the top four ferredoxins related to Halobacterium halobium are FER_HALSP, FER_SYNP4, FER_GALSU, and
FER_ANAVA. But via BLAST FER_ANAVA is replaced by FER_PORUM and
FER_ANAVA only enters 18th in the list (though some of these
may be duplicates in other databases). Via FASTA FER_PORUM is 33rd in
the list. Similarly via BLITZ, FER_GALSU is not in the top four but
15th in the list and FER_PORUM is even further down at 22nd. This
illustrates that one should not use these algorithms to determine how
closely related two sequences are. (In general you should use methods
that do global comparisons for such a question.) These are, however,
measures of how strongly they are "hit" or "indicated" to be related.
The different methods obviously provide quite different answers. FLASH's
results are not really comparable since the output is two years old
and the databases have changed a great deal in the meantime.
The BLOCKS server is
somewhat related to the other servers mentioned above (and hence
included here) but is designed to answer a different question. Instead
of looking for similar sequences in the databases, it scans the PROSITE database
to search the query sequence (must be protein or optionally, it will
translate your nucleotide sequence to a protein) for similar protein
motifs. Blocks are defined as short ungapped (but potentially with
variable length) segments of highly conserved regions of proteins.
Currently (Jan. 1996) the BLOCKS database searches on 3179 block
patterns. This search is particularly useful for analysing distantly
related proteins.
The server searches for any of these blocks throughout the query
sequence and reports the results via e-mail. For each block there are
known frequencies for each amino acid at each site. Every site in the
query is matched against these blocks and scored. The highest scoring
blocks are reported.
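Scoring a query against a block can be sketched as follows. This is a simplified illustration, not the BLOCKS server's method: it builds per-column amino acid frequencies from four segments of the 4Fe-4S ferredoxin block, adds a flat pseudocount, and scores each window of the query as a sum of log-odds against a uniform background. The query sequence, the pseudocount, the background and the function names are all assumptions; the real server uses weighted, pre-computed position-specific scoring matrices.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(segments, pseudocount=1.0):
    """Per-column amino acid frequencies for equal-length block segments."""
    freqs = []
    for col in range(len(segments[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seg in segments:
            counts[seg[col]] += 1
        total = sum(counts.values())
        freqs.append({aa: c / total for aa, c in counts.items()})
    return freqs


def best_block_hit(query, freqs):
    """Slide the block along the query; return (start, score) of the best window."""
    width = len(freqs)
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    best = None
    for start in range(len(query) - width + 1):
        score = sum(math.log(freqs[i][query[start + i]] / background)
                    for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Four segments taken from the BL00198 block listing.
segments = ["CIGCTQCVRACP", "CIDCGACEPVCP", "CIACEVCVRVCP", "CINCGLCYAACP"]
freqs = column_frequencies(segments)
query = "MSEKAVIDQDKCIGCGACVDACPYDAIE"  # made-up query for illustration
start, score = best_block_hit(query, freqs)
print(start, query[start:start + len(freqs)], round(score, 2))
```

The conserved cysteines and the final proline dominate the score, so the best window is the one that lines them all up.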
A typical input message is
and there are no options needed for this search (the ">" indicates
a sequence title). A nucleotide sequence will be translated in all
frames but a nucleotide sequence with IUPAC ambiguity codes will
be interpreted as a protein and will remain untranslated.
This message should be mailed to blocks@howard.fhcrc.org with
a blank subject line. Alternatively you can do the search through
their web page. (References should cite S.Henikoff & J.Henikoff, 1991 Nucl.Acids.Res.
19:6565-6572).
The BLOCKS output is somewhat complicated. It begins with a
lengthy informational message that I have deleted and then
continues with the guts of the message.
In this case, for ferredoxin, the results are not too interesting -
it just says that ferredoxin is probably a ferredoxin (the other blocks
reported do not have high percentiles in the distribution). This also
does not illustrate some of the features that you should know about.
Since the example used in the blocks help file is so fascinating let's
look at its output. Its query sequence is based on an ORF from the
yeast third chromosome ...
The top 400 block scores are retained. Different blocks are
compared by dividing the block score by an empirically determined
99.5% calibration score and multiplying by 1000. Hence a score
above 1000 is expected for 0.5% of the blocks. Thus a typical
protein should yield 16 (0.5% * 3179 = 16) hits just randomly. The
top ten blocks are retained and if other blocks are associated with
these top ten then these will be reported as well.
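The normalization and the estimate of chance hits are simple arithmetic, written out here as a sketch. The calibration value 723 is the "99.5%" figure from the BL00198 block header; the raw score of 900 is a made-up example, and the function names are hypothetical.

```python
def normalized_block_score(raw_score, calibration_995):
    """Scale a raw block score so that the 99.5% calibration score maps to 1000."""
    return 1000.0 * raw_score / calibration_995


def expected_random_hits(n_blocks, tail_fraction=0.005):
    """Number of blocks expected to score above 1000 by chance alone."""
    return n_blocks * tail_fraction


# 723 is the 99.5% calibration score listed for BL00198; 900 is illustrative.
print(normalized_block_score(900, 723))
print(expected_random_hits(3179))
```

The second value rounds to the 16 chance hits quoted in the text, which is why a normalized score just above 1000 is not convincing on its own.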
The best hits for the ORF from yeast are for BL00059 the
zinc-containing alcohol dehydrogenases. This pattern consists of four
blocks of conserved amino acids that have characteristic distances
between the blocks. The best block for a family is chosen as the
anchor block. Empirical tests of the scores expected from randomized
proteins have been carried out by the authors and the score for
BL00059A is very high in comparison to these random choices (indicating
its significance, 98.5th percentile). Next the output lists a
probability estimate of finding blocks B and D given that you have
found A. This probability is based on several things including the
observed distances between the blocks and the number of blocks in a
family. Next, comes a diagrammatic map of where the blocks are located
in the sequence. Note that A, B and D are all in the proper locations
but the rather poor hit for block C is too close to A and B. Next
comes a listing of where the blocks should be located. Block A should
be 1 to 35 residues from the amino terminus, it is 1 away - block B
should be 10 to 14 residues from block A, it is 9 - block D should be
78 to 122 residues from block B, it is 96. Finally, the output
searches through representatives of Block A and aligns this with the
suggested block in the query sequence (in this case it finds ADHX_HORSE
for Block A). The second match listed is typical of a random hit.
This simple but very elegant analysis tells you many things
about the protein. In this case it has identified an unknown ORF
as a "distant member of a large family, apparently one not easily
detected using other approaches. The query [sequence was] not
reported to be a member of any family either in the original study
or in a subsequent more intensive analysis of ORFs from this
chromosome".
In addition to this the BLOCKS server will allow you access to a copy
of the PROSITE database to obtain a listing of the actual block found.
You can get the entry either via their web
page or via e-mail with a message "get BL00198" to blocks@howard.fhcrc.org and with
a blank "subject" line. This will retrieve the entry for the
ferredoxin example. The following output will be mailed back to you.
These PROSITE entries are from the version of PROSITE used to build
the BLOCKS database. When PROSITE is updated there might be a
discrepancy for about a week's time until the new BLOCKS database is
built and these corresponding entries are updated.
The SWISS-PROT entries that are linked in reside at the ExPASy World Wide
Web (WWW) Molecular Biology Server of the Geneva University
Hospital and the University of Geneva.
Blocks Database Version 9.0, December 1995
Copyright 1991 by Fred Hutchinson Cancer Research Center
1124 Columbia Street, A1-162, Seattle, WA 98104
Please cite: S Henikoff & JG Henikoff (1991) Automated assembly of
protein blocks for database searching, Nucleic Acids Res. 19:6565-6572.
Based on PROSITE 13.0 and SWISS-PROT 32. ID is from PROSITE, AC is
derived from the prosite.dat PS#, DE is abstracted from the prosite.dat
DE, BL is PROTOMAT information. For each segment, the SWISS-PROT ID is
followed by the position of the first residue in the segment. Segments
are clustered if >=80% of aligned residues match between any pair of
segments. Sequence weights are shown to the right of each segment. The
higher the weight (maximum 100) the more dissimilar the segment is from
other segments in the block. These weights were obtained using the
position-based method of S Henikoff & JG Henikoff (1994), JMB 243:574-578.
Pre-computed position-specific scoring matrices were made using pseudo
counts from a data-dependent method using Blosum 62 and column totals
of five times the number of different amino acids.
========================================================================
ID 4FE4S_FERREDOXIN; BLOCK
AC BL00198; distance from previous block=(7,440)
DE 4Fe-4S ferredoxins, iron-sulfur binding region proteins.
BL ICP motif; width=12; seqs=112; 99.5%=723; strength=1239
ASRA_SALTY ( 230) CISCGRCTTGCP 18
DCMA_METSO ( 441) CVGCQRCEQTCP 27
DHSB_BACSU ( 154) CMTCGVCLEACP 24
FDHB_METFO ( 296) CLKCYGCREACP 27
FER3_DESAF ( 41) CLGCESCVEVCE 20
FER_BACST ( 11) CIACGACGAAAP 22
FER_METTL ( 11) GPECAECVNACP 100
FRDB_WOLSU ( 151) CIECGCCIAACG 39
FRHG_METTH ( 209) CIKCGICYVQCP 24
HMC6_DESVH ( 121) CTCCNRCGQYCP 41
HYCB_ECOLI ( 82) CVSCKLCGIACP 17
NAPF_ECOLI ( 69) CSFCYACAQACP 30
NAPF_HAEIN ( 80) CTFCGKCVDACK 49
NAPH_ECOLI ( 226) CNRCMDCFHVCP 37
NAPH_HAEIN ( 226) CDNCMDCYNVCP 29
PHF1_CLOPA ( 190) CLLCGQCIIACP 15
PHFL_DESVH ( 66) CINCGQCLTHCP 25
YAAT_ECOLI ( 63) CLECGTCRILGL 98
YJES_ECOLI ( 193) CGKCVACMTICP 38
YJJW_ECOLI ( 47) CNDCGECVPQCP 18
ASRC_SALTY ( 212) CIGCGECVLACP 16
COOF_RHORU ( 96) CIGCKLCVMVCP 11
DMSB_ECOLI ( 98) CIGCRYCHMACP 16
DMSB_HAEIN ( 99) CIGCRYCHMACP 16
FDHB_WOLSU ( 91) CIGCGYCLYACP 18
FDNH_ECOLI ( 133) CIGCGYCIAGCP 11
FDOH_ECOLI ( 133) CIGCGYCIAGCP 11
FDXH_HAEIN ( 139) CIGCGYCIAGCP 11
FDXN_RHILT ( 10) CTQCGACEFECP 17
FER1_AZOVI ( 39) CIDCALCEPECP 10
FER1_CHLLI ( 8) CTYCGACEPECP 10
FER1_RHOCA ( 9) CTSCGDCEPVCP 12
FER2_CHLLI ( 8) CTYCAACEPECP 11
FER2_RHOCA ( 39) CIDCGVCEPECP 9
FER3_ANAVA ( 75) CIGCQACARACP 10
FER3_PLEBO ( 75) CIGCEACSRVCP 8
FER3_RHOCA ( 79) CIGCGACARVCP 7
FERN_AZOVI ( 9) CVNCWACVDVCP 19
FERN_BRAJA ( 10) CTSCSACEPLCP 22
FERN_RHIME ( 10) CTQCGACEFECP 17
FERV_AZOVI ( 10) CTVCGDCEPVCP 20
FERX_ANASP ( 9) CISCKLCSSVCP 13
FER_ALIAC ( 39) CIDCAACEPVCP 8
FER_BUTME ( 8) CIACGSCADQCP 15
FER_CHLLT ( 8) CTYCGACEPECP 10
FER_CHRVI ( 8) CINCNVCQPECP 15
FER_CLOTH ( 10) CIACGTCIDLCP 12
FER_ENTHI ( 41) CIGCGACVDACP 7
FER_MEGEL ( 36) CIDCGACEAVCP 7
FER_MYCSM ( 39) CVDCGACEPVCP 8
FER_PEPAS ( 35) CIDCGSCASVCP 11
FER_PSEPU ( 39) CIDCALCEPECP 10
FER_SACER ( 39) CVDCGACEPVCP 8
FER_STRGR ( 39) CVDCGACEPVCP 8
FER_SULAC ( 83) CIFCMACVNVCP 11
FER_THEAC ( 123) CIFCMACESVCP 12
FIXG_RHIME ( 280) CVDCNACVAVCP 11
FRDB_ECOLI ( 148) CINCGLCYAACP 10
FRDB_HAEIN ( 160) CINCGLCYAACP 10
FRDB_PROVU ( 149) CINCGLCYAACP 10
GLPC_ECOLI ( 9) CIKCTVCTTACP 13
GLPC_HAEIN ( 32) CIKCTACTAVCP 12
HMC2_DESVH ( 142) CVGCRYCMVACP 14
HYCF_ECOLI ( 40) CIGCAACVNACP 8
HYDN_ECOLI ( 89) CIGCKTCVVACP 10
MBHT_ECOLI ( 145) CTGCRYCMVACP 14
NAPG_ECOLI ( 61) CVRCGQCVQACP 12
NAPG_HAEIN ( 72) CIRCGQCVQACP 11
NQO9_PARDE ( 103) CIYCGFCQEACP 15
NRFC_ECOLI ( 125) CVGCQYCIAACP 11
NRFC_HAEIN ( 127) CIGCQYCIAVCP 10
NUIC_MAIZE ( 64) CIACEVCVRVCP 8
NUIC_MARPO ( 64) CIACEVCVRVCP 8
NUIC_ORYSA ( 62) CIACEVCVRVCP 8
NUIC_PLEBO ( 64) CIACEVCVRVCP 8
NUIC_SYNY3 ( 65) CIACEVCVRVCP 8
NUIC_TOBAC ( 64) CIACEVCVRVCP 8
NUIC_WHEAT ( 66) CIACEVCVGVCP 16
NUIM_BOVIN ( 152) CIYCGFCQEACP 15
NUIM_RHOCA ( 103) CIYCGYCQEACP 11
NUOI_ECOLI ( 98) CIFCGLCEEACP 9
PHSB_SALTY ( 96) CIGCDYCVAACP 16
PSAC_ANASP ( 10) CIGCTQCVRACP 7
PSAC_ANTSP ( 10) CIGCTQCVRACP 7
PSAC_CHLRE ( 10) CIGCTQCVRACP 7
PSAC_CYAPA ( 10) CIGCTQCVRACP 7
PSAC_EUGGR ( 10) CIGCTQCVRACP 7
PSAC_FREDI ( 10) CIGCTQCVRACP 7
PSAC_MAIZE ( 10) CIGCTHCVRACP 15
PSAC_MARPO ( 10) CIGCTQCVRACP 7
PSAC_PEA ( 10) CIGCTQCVRACP 7
PSAC_PINTH ( 10) CIGCTQCVRACP 7
PSAC_SPIOL ( 10) CIGCTQCVRACP 7
PSAC_SYNEN ( 10) CIGCTQCVRACP 7
PSAC_TOBAC ( 10) CIGCTQCVRACP 7
PSAC_WHEAT ( 10) CIGCTQCVRACP 7
PSAX_SYNY3 ( 10) CIGCTQCVRACP 7
PSRB_WOLSU ( 93) CVGCLYCIAACP 18
RDXA_RHOSH ( 251) CIDCMACVNVCP 10
YA43_HAEIN ( 53) CNGCGECASACP 16
YFFE_ECOLI ( 82) CIGCKLCAVVCP 11
DHSB_DROME ( 195) CILCACCSTSCP 14
DHSB_ECOLI ( 149) CILCACCSTSCP 14
DHSB_HUMAN ( 186) CILCACCSTSCP 14
DHSB_USTMA ( 195) CILCACCSTSCP 14
DHSB_YEAST ( 179) CILCACCSTSCP 14
FER1_DESAF ( 11) CIACESCVEIAP 24
FER2_DESVM ( 11) CMACESCVELCP 15
FER_DESGI ( 8) CMACEACVEICP 14
FIXX_AZOCA ( 65) CVECGTCRVIAE 34
FIXX_BRAJA ( 66) CIECGTCRVIAE 33
FIXX_RHILP ( 67) CMECGTCRVLCE 24
//
1 blocks processed
CC *************************************************************************
CC
CC *************************
CC *** PROSITE data file ***
CC *************************
CC
CC Release 13.0 of November 1995
CC
CC *************************************************************************
CC
CC The patterns section of PROSITE is developed by:
CC
CC Amos Bairoch
CC Medical Biochemistry Department
CC CMU
CC University of Geneva
CC 1, Rue Michel Servet, 1211 Geneva 4
CC Switzerland
CC
CC Email : bairoch@cmu.unige.ch
CC Telephone: (+41 22) 784 40 82
CC
CC
CC The profiles/matrices section of PROSITE is developed by:
CC
CC Philipp Bucher and Kay Oliver Hofmann
CC Biocomputing ISREC
CC Institut Suisse de Recherches Experimentales sur le Cancer
CC 155 ch. des Boveresses, 1066 Epalinges s/Lausanne
CC Switzerland
CC
CC Email : pbucher@isrec-sun1.unil.ch
CC khofmann@isrec-sun1.unil.ch
CC Telephone: (+41 21) 624 99 43
CC
CC *************************************************************************
CC
CC This file may be copied and redistributed freely, without advance
CC permission. You are allowed to reformat it for use with a software
CC package, but you should not modify its content without permission
CC from the author).
CC
CC *************************************************************************
//
ID 4FE4S_FERREDOXIN; PATTERN.
AC PS00198;
DT APR-1990 (CREATED); APR-1990 (DATA UPDATE); NOV-1995 (INFO UPDATE).
DE 4Fe-4S ferredoxins, iron-sulfur binding region signature.
PA C-x(2)-C-x(2)-C-x(3)-C-[PEG].
NR /RELEASE=32,49340;
NR /TOTAL=231(158); /POSITIVE=200(140); /UNKNOWN=4(2); /FALSE_POS=27(16);
NR /FALSE_NEG=7; /PARTIAL=0;
CC /TAXO-RANGE=A?EP?; /MAX-REPEAT=6;
CC /SITE=1,iron_sulfur; /SITE=3,iron_sulfur; /SITE=5,iron_sulfur;
CC /SITE=7,iron_sulfur;
DR P00214, FER1_AZOVI, T; P18082, FER2_RHOCA, T; P80448, FER2_RHORU, T;
DR P00215, FER_MYCSM , T; P24496, FER_SACER , T; P13279, FER_STRGR , T;
DR P00213, FER_PSEPU , T; P08811, FER_PSEST , T; P03942, FER_THETH , T;
DR P00198, FER_CLOAC , T; P00196, FER_CLOBU , T; P00195, FER_CLOPA , T;
DR P22846, FER_CLOPE , T; P00197, FER_CLOSP , T; P80168, FER_CLOST , T;
DR P07508, FER_CLOTM , T; P00200, FER_CLOTS , T; P00201, FER_MEGEL , T;
DR P00193, FER_PEPAS , T; P00194, FER1_RHORU, T; P14073, FER_BUTME , T;
DR P00205, FER_CHLLT , T; P00204, FER1_CHLLI, T; P00206, FER2_CHLLI, T;
DR P00208, FER_CHRVI , T; P00202, FER_METBA , T; P21305, FER_METTL , T;
DR P00218, FER_THEAC , T; P00211, FER2_DESDN, T; P08812, FER3_DESAF, T;
DR P08813, FER1_DESVM, T; P11425, FER_ENTHI , T; P12415, FERX_ANASP, T;
DR P06123, FERN_AZOCH, T; P14939, FERV_AZOVI, T; P42711, FDXN_RHILT, T;
DR P12712, FERN_RHIME, T; P27394, FERN_BRAJA, T; P16021, FER1_RHOCA, T;
DR P03941, FER_ALIAC , T; P00219, FER_SULAC , T; P00207, FER1_RHOPA, T;
DR P11054, FERN_AZOVI, T; P46050, FER3_ANAVA, T; P46036, FER3_PLEBO, T;
DR P20624, FER3_RHOCA, T; P00203, FER_CLOTH , T; P00209, FER_DESGI , T;
DR P07485, FER1_DESDN, T; P10624, FER2_DESVM, T; P29604, FER_THELI , T;
DR P46797, FER_THEMA , T; Q05561, FIXX_RHILP, T; P09822, FIXX_RHIME, T;
DR P08710, FIXX_RHILT, T; Q06439, PSAC_ANTSP, T; Q00914, PSAC_CHLRE, T;
DR P42046, PSAC_CUCSA, T; P31556, PSAC_EUGGR, T; P11601, PSAC_MAIZE, T;
DR P06251, PSAC_MARPO, T; P10793, PSAC_PEA , T; P41649, PSAC_PINTH, T;
DR P10098, PSAC_SPIOL, T; P07136, PSAC_TOBAC, T; P10794, PSAC_WHEAT, T;
DR P31173, PSAC_CYAPA, T; P23392, PSAC_ANASP, T; P31086, PSAC_ANAVA, T;
DR P23810, PSAC_FREDI, T; P18083, PSAC_SYNEN, T; P31087, PSAC_SYNP2, T;
DR P31085, PSAC_SYNP6, T; P25252, PSAC_SYNY3, T; P32422, PSAX_SYNY3, T;
DR P08066, DHSB_BACSU, T; P07014, DHSB_ECOLI, T; P00364, FRDB_ECOLI, T;
DR P44893, FRDB_HAEIN, T; P20921, FRDB_PROVU, T; P20925, YFRA_PROVU, T;
DR P17596, FRDB_WOLSU, T; P06130, FDHB_METFO, T; P19498, FRHG_METTH, T;
DR P18396, FIXG_RHIME, T; Q01854, RDXA_RHOSH, T; P07598, PHFL_DESVH, T;
DR P13629, PHFL_DESVO, T; P31894, COOF_RHORU, T; P18776, DMSB_ECOLI, T;
DR P45003, DMSB_HAEIN, T; P23481, YFFE_ECOLI, T; P24184, FDNH_ECOLI, T;
DR P32175, FDOH_ECOLI, T; P44450, FDXH_HAEIN, T; P27273, FDHB_WOLSU, T;
DR P33389, HMC2_DESVH, T; P33393, HMC6_DESVH, T; P26474, ASRA_SALTY, T;
DR P13034, GLPC_ECOLI, T; P43801, GLPC_HAEIN, T; P16428, HYCB_ECOLI, T;
DR P16432, HYCF_ECOLI, T; P30132, HYDN_ECOLI, T; P37601, PHSB_SALTY, T;
DR P31076, PSRB_WOLSU, T; P32708, NRFC_ECOLI, T; P45015, NRFC_HAEIN, T;
DR P33939, NAPF_ECOLI, T; P44650, NAPF_HAEIN, T; P33936, NAPG_ECOLI, T;
DR P44652, NAPG_HAEIN, T; P33934, NAPH_ECOLI, T; P44653, NAPH_HAEIN, T;
DR P32815, YGL5_BACST, T; P39288, YJES_ECOLI, T; P44101, YA43_HAEIN, T;
DR P32420, DHSB_USTMA, T; P21801, DHSB_YEAST, T; P21911, DHSB_SCHPO, T;
DR P21912, DHSB_HUMAN, T; P21913, DHSB_RAT , T; P21914, DHSB_DROME, T;
DR P21915, DHSB_ARATH, T; P37179, MBHT_ECOLI, T; P29166, PHF1_CLOPA, T;
DR P26476, ASRC_SALTY, T; P46722, NUIC_MAIZE, T; P06253, NUIC_MARPO, T;
DR P12099, NUIC_ORYSA, T; P06252, NUIC_TOBAC, T; P05312, NUIC_WHEAT, T;
DR Q00236, NUIC_PLEBO, T; P26525, NUIC_SYNY3, T; P42028, NUIM_BOVIN, T;
DR P42031, NUIM_RHOCA, T; P29921, NQO9_PARDE, T; P33604, NUOI_ECOLI, T;
DR P26692, DCMA_METSO, T; P39409, YJJW_ECOLI, T;
DR Q06879, NIFJ_ANASP, ?; P03833, NIFJ_KLEPN, ?;
DR P00212, FER_BACST , N; P10245, FER_BACTH , N; P00210, FER1_DESAF, N;
DR P26485, FIXX_AZOCA, N; P10326, FIXX_BRAJA, N; P09823, FIXX_RHILE, N;
DR P31576, YAAT_ECOLI, N;
DR P05687, CHH2_BOMMO, F; P20730, CHHC_BOMMO, F; P30826, ISP1_TRYBB, F;
DR P05107, ITB2_HUMAN, F; P26010, ITB7_HUMAN, F; P26011, ITB7_MOUSE, F;
DR P26372, KRUC_SHEEP, F; Q01642, M84A_DROME, F; Q01643, M84B_DROME, F;
DR Q01644, M84C_DROME, F; Q01645, M84D_DROME, F; P08175, M87F_DROME, F;
DR P23327, SRCH_HUMAN, F; P16230, SRCH_RABIT, F; P37127, YFFG_ECOLI, F;
DR P45866, YWJF_BACSU, F;
3D 5FD1; 1FD2; 2FD2; 1FDA; 1FDB; 1FDC; 1FDD; 1FDX; 1FER; 2FXB; 1FXD;
DO PDOC00176;
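As a rough illustration of how the NR lines of the entry above can be read, here is a minimal Python sketch that pulls out the hit counts and derives two summary figures. The function name `parse_nr` and the choice to leave the UNKNOWN hits out of both ratios are assumptions made for this example; they are not part of PROSITE itself.

```python
import re

# NR lines copied from the PS00198 entry shown above.
NR_TEXT = (
    "/TOTAL=231(158); /POSITIVE=200(140); /UNKNOWN=4(2); /FALSE_POS=27(16); "
    "/FALSE_NEG=7; /PARTIAL=0;"
)


def parse_nr(text):
    """Return {qualifier: (hits, sequences)}; sequences is None when absent."""
    counts = {}
    for key, hits, seqs in re.findall(r"/(\w+)=(\d+)(?:\((\d+)\))?;", text):
        counts[key] = (int(hits), int(seqs) if seqs else None)
    return counts


counts = parse_nr(NR_TEXT)
true_pos = counts["POSITIVE"][0]
false_pos = counts["FALSE_POS"][0]
false_neg = counts["FALSE_NEG"][0]

# Fraction of reported hits that are real (UNKNOWN hits are ignored here).
precision = true_pos / (true_pos + false_pos)
# Fraction of known family members found, mixing hit and sequence counts
# exactly as the entry reports them.
recall = true_pos / (true_pos + false_neg)

print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

Note that the three hit classes add up to the total: 200 + 4 + 27 = 231.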
The following lines are also links to the PROSITE entries at the ExPASy
World Wide Web (WWW) molecular biology server of the Geneva University
Hospital and the University of Geneva.
This is probably more information about ferredoxin than you would ever
want. The above logo link even gives a graphical view of the
nature of the protein block. But if you do want more, there
are references, and some entries also list a contact person who
is an expert on the subject, often together with this person's
e-mail address.
The actual output consists of two parts: one from the Hutchinson
center and the other a copy of the PROSITE entry. The output begins
with a note about blocks in general, then gives the Block entry for
#BL00198 along with all the database entries that this block occurs
in, together with a listing of the other known ferredoxin conserved
blocks. In this case you can see that the block starts anywhere from 5
to 438 residues into the protein (i.e. anywhere). Then comes a general
statement regarding PROSITE, followed by the PROSITE entry itself,
which holds a great deal of information. The pattern for these
iron-sulfur binding region signatures is CXXCXXCXXXC[PEG] (written
C-x(2)-C-x(2)-C-x(3)-C-[PEG] on the PA line), where [PEG] stands for
either proline, glutamic acid or glycine. The other notation that you
are likely to run across is {AG}, which means any residue except
alanine or glycine. Besides all of the other information, the listing
also tells you how well this pattern detects iron-sulfur binding
proteins (the NR lines). In this case the pattern matched a total of
231 times in 158 sequences. Of these hits, 200 (in 140 sequences) are
true positives, 27 (in 16 sequences) are false positives and 4 (in 2
sequences) are of uncertain affinity; 7 known sequences are missed
altogether (false negatives). Last comes a general description of the
ferredoxin pattern in proteins and a list of references.
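To make the pattern notation concrete, the following is a minimal Python sketch that translates a PROSITE-style pattern into a regular expression and tries it on two segments taken from the block listing above. The function name `prosite_to_regex` is made up for this example, and only the elements discussed here are handled (single residues, x, [..] sets, {..} exclusions written without commas, and repeat counts); it is not a complete PROSITE parser.

```python
import re

# One pattern element: a residue, x, [set] or {exclusion}, optionally
# followed by a repeat count such as (2) or (2,4).
ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate e.g. 'C-x(2)-C-[PEG].' into an equivalent regex string."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                      # x means any residue
            core = "."
        elif core.startswith("{"):           # {AG} means anything but A or G
            core = "[^" + core[1:-1] + "]"
        if low:                              # repeat count
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


ferredoxin = prosite_to_regex("C-x(2)-C-x(2)-C-x(3)-C-[PEG].")
print(ferredoxin)  # C.{2}C.{2}C.{3}C[PEG]

# PSAC_ANASP segment from the block: a true positive in the DR lines.
hit = re.search(ferredoxin, "CIGCTQCVRACP")
# FER_BACST segment from the block: listed as a false negative (N).
miss = re.search(ferredoxin, "CIACGACGAAAP")
```

The FER_BACST segment fails because the fourth cysteine of the signature is replaced by alanine, which is why that entry carries the N flag.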
A really great resource - this was part of Amos Bairoch's
Ph.D. thesis.
Often you are interested in simply finding out whether what you
have sequenced is already known (a more and more common occurrence)
rather than whether anything is homologous to it. This question can
be answered much more quickly, and EMBL has implemented the
QUICKSEARCH server to do so. The program will detect hits
even if there are a small number of mismatches.
The input message should be of the form ...
The sequence must be a nucleotide sequence. Only SEQ is required;
all other options take their default values. The MATCH n option says
that only entries with more than n% identity should be reported [90].
The BEST option says that a Smith-Waterman alignment should be done
rather than a Needleman-Wunsch alignment.
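The MATCH threshold can be pictured with a small Python sketch. How QUICKSEARCH itself computes identity is not described here, so the function below simply assumes two already aligned, equal-length sequences with no gaps; the names `percent_identity` and `passes_match` and the toy sequences are invented for the illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * same / len(seq_a)


def passes_match(seq_a, seq_b, match=90):
    """Mimic 'more than n% identity': the threshold itself is excluded."""
    return percent_identity(seq_a, seq_b) > match


query = "ACGTACGTACGTACGTACGT"
one_mismatch = "ACGTACGTACGTACGTACGA"   # 19 of 20 positions identical
two_mismatches = "ACGAACGTACGTACGTACGA"  # 18 of 20 positions identical

print(percent_identity(query, one_mismatch), passes_match(query, one_mismatch))
print(percent_identity(query, two_mismatches), passes_match(query, two_mismatches))
```

With the default of 90, the second toy sequence sits exactly on the threshold and so would not be reported under a strict "more than" reading.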
The QUICKSEARCH method is very similar to the FASTA approach
but uses a very large WORD size for the "hash" table, hence the
difference in speed. Again, it is better suited to asking whether
very similar sequences are already in the database; it cannot
tell you whether distantly related sequences exist. The original
program was part of J. Devereux's Ph.D. thesis.
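The effect of the word size can be sketched in a few lines of Python. This is only a toy word index, not the QUICKSEARCH or FASTA code; the function names and the three invented sequences are assumptions for the example. A long word only registers hits against a nearly identical sequence, while a short word also picks up the diverged one.

```python
from collections import defaultdict


def build_word_index(library, word_size):
    """Map every word (k-tuple) to the names of the sequences containing it."""
    index = defaultdict(set)
    for name, seq in library.items():
        for i in range(len(seq) - word_size + 1):
            index[seq[i:i + word_size]].add(name)
    return index


def count_word_hits(query, library, word_size):
    """Count, per library sequence, how many query words occur in it."""
    index = build_word_index(library, word_size)
    hits = defaultdict(int)
    for i in range(len(query) - word_size + 1):
        for name in index.get(query[i:i + word_size], ()):
            hits[name] += 1
    return dict(hits)


# Invented data: the query differs from 'near_identical' at one position,
# while 'diverged' differs from it at every fourth position.
library = {
    "near_identical": "ATGGCGTACGTTAGCCTAGGATCCGATTACG",
    "diverged": "ATGACGTTCGTAAGCTTAGCATCAGATAACG",
}
query = "ATGGCGTACGTTAGCCTAGGTTCCGATTACG"

large_word_hits = count_word_hits(query, library, word_size=8)
small_word_hits = count_word_hits(query, library, word_size=2)
print(sorted(large_word_hits))  # only the near-identical entry
print(sorted(small_word_hits))  # both entries
```

The large word size means far fewer table lookups lead anywhere, which is where the speed comes from, at the price of missing anything that is not almost the same sequence.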
The input file should be mailed to
quick@ebi.ac.uk.
At the opposite extreme is SSEARCH. This does a rigorous sequence
comparison using the Smith-Waterman algorithm (T. F. Smith and M. S.
Waterman (1981) J.Mol.Biol. 147:195-197). This program uses code
developed by Huang and Miller (X. Huang, R. C. Hardison, W. Miller
(1990) CABIOS 6:373-381) for calculating the local similarity score and
code from the ALIGN program (see below) for calculating the local
alignment. SSEARCH is about 50 times slower than FASTA with ktup=2
(for proteins).
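For readers who have not met the Smith-Waterman algorithm, here is a minimal Python sketch of the local similarity score. It is not the Huang and Miller code used by SSEARCH: it assumes a flat match/mismatch score and a single per-residue gap penalty instead of a scoring matrix, and it returns only the score, not the alignment. The function name and parameter values are chosen for the illustration.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between seq_a and seq_b."""
    previous = [0] * (len(seq_b) + 1)
    best = 0
    for i in range(1, len(seq_a) + 1):
        current = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            current[j] = max(
                0,                        # start a new local alignment here
                previous[j - 1] + pair,   # align the two residues
                previous[j] + gap,        # gap in seq_b
                current[j - 1] + gap,     # gap in seq_a
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("ACACACTA", "AGCACACA"))  # 12
```

Every cell of the table is filled in for every library sequence, with no word lookup to skip unpromising regions, which is why this kind of search is so much slower than FASTA.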
The following is an example of why you should routinely do a
search (FASTA, BLAST or whatever) for any new sequence that you are
working on. This is a copy of a letter to the editor of NATURE
vol. 358, p.271.
_______________________________
(Peptide) FASTA of: 260117af.Seq from: 1 to: 128 February 3, 1996 19:43
TO: SWALL:* Sequences: 51,998 Symbols: 18,448,967 Word Size: 1
Score Init1 Initn
< 2 0 0:
4 0 0:
6 9 9:=====
8 7 7:====
10 83 83:==========================================
12 160 160:==================================================
14 191 191:==================================================
16 362 362:==================================================
18 768 768:==================================================
20 1279 1279:==================================================
22 2370 2370:==================================================
24 3774 3774:==================================================
26 5417 5417:==================================================
28 6770 6770:==================================================
30 6878 6878:==================================================
32 6299 6244:==================================================
34 5114 4762:==================================================
36 3941 3487:==================================================
38 2686 2350:==================================================
40 2087 1747:==================================================
42 1319 1105:==================================================
44 897 798:==================================================
46 567 722:==================================================
48 312 556:==================================================
50 248 486:==================================================
52 121 375:==================================================
54 93 329:===============================================+++
56 60 230:==============================++++++++++++++++++++
58 37 179:===================+++++++++++++++++++++++++++++++
60 35 153:==================++++++++++++++++++++++++++++++++
62 12 93:======+++++++++++++++++++++++++++++++++++++++++
64 10 58:=====++++++++++++++++++++++++
66 4 56:==++++++++++++++++++++++++++
68 3 32:==++++++++++++++
70 2 18:=++++++++
72 5 23:===+++++++++
74 0 15:++++++++
76 0 13:+++++++
78 1 8:=+++
80 0 7:++++
> 80 77 84:=======================================+++
mean initn score: 23.6 (3.43)
mean init1 score: 23.6 (3.43)
The best scores are: init1 initn opt..
Sw:Fer_Halha P00216 halobacterium halobium. ferredoxin. ... 635 635 635
Sw:Fer_Halsp P00217 halobacterium sp. ferredoxin. 11/88 571 571 571
Sw:Fer_Synp4 P15788 synechococcus sp. (strain pcc 7418) ... 182 182 210
Sw:Fer_Galsu P00241 galdieria sulphuraria (cyanidium cal... 163 182 188
Sw:Fer1_Anava P00254 anabaena variabilis, and anabaena s... 180 180 203
Sw:Fer2_Nosmu P00249 nostoc muscorum. ferredoxin ii. 11/88 179 179 200
Sw:Fer1_Synp7 P06517 synechococcus sp. (strain pcc 7942)... 162 179 185
Sw:Fer1_Anasp P06543 anabaena sp. (strain pcc 7120). fer... 176 176 205
Sw:Fer_Synsp P00256 synechococcus sp. ferredoxin. 11/88 175 175 199
Sw:Fer_Synli P00255 synechococcus lividus. ferredoxin. 1... 175 175 199
Sw:Fer2_Spiol P00224 spinacia oleracea (spinach). ferred... 157 175 180
Sw:Fer_Marpo P09735 marchantia polymorpha (liverwort). f... 158 174 186
Sw:Fer_Nosmu P00253 nostoc muscorum. ferredoxin. 11/88 174 174 203
Sw:Fer_Gleja P00233 gleichenia japonica (urajiro) (fern)... 158 174 180
Sw:Fer1_Cyapa P17007 cyanophora paradoxa. ferredoxin i. ... 159 174 180
Sw:Fer_Rhopl P07484 rhodymenia palmata (dulse). ferredox... 157 172 180
Sw:Fer_Chlfr P00247 chlorogloeopsis fritschii. ferredoxi... 168 168 189
Sw:Fer_Scequ P00238 scenedesmus quadricauda. ferredoxin.... 153 168 174
Sw:Fer1_Nosmu P00252 nostoc muscorum. ferredoxin i. 11/88 166 166 191
Sw:Fer1_Orysa P11051 oryza sativa (rice). ferredoxin i. ... 152 165 174
Sw:Fer_Eugvi P22341 euglena viridis. ferredoxin. 8/91 164 164 190
Sw:Fer5_Maize P27789 zea mays (maize). ferredoxin v prec... 142 163 173
Sw:Fer_Masla P00248 mastigocladus laminosus (fischerella... 162 162 186
Sw:Fer_Chlre P07839 chlamydomonas reinhardtii. ferredoxi... 147 162 168
Sw:Fer1_Equte P00234 equisetum telmateia (giant horsetai... 160 160 188
Sw:Fer1_Maize P27787 zea mays (maize). ferredoxin i prec... 145 160 170
Sw:Fer1_Equar P00235 equisetum arvense (field horsetail)... 159 159 187
Sw:Fer_Bryma P07838 bryopsis maxima. ferredoxin. 2/94 159 159 181
Sw:Fer_Wheat P00228 triticum aestivum (wheat). ferredoxi... 156 156 177
Sw:Fer2_Dunsa P00240 dunaliella salina. ferredoxin ii. 2/94 154 154 175
Sw:Fer1_Spiol P00221 spinacia oleracea (spinach). ferred... 154 154 171
Sw:Fer1_Dunsa P00239 dunaliella salina. ferredoxin i. 2/94 151 151 175
Sw:Fer_Porum P00242 porphyra umbilicalis (laver). ferred... 151 151 179
Sw:Fer_Perbi P10770 peridinium bipes (dinoflagellate). f... 137 151 159
Sw:Fer_Spipl P00246 spirulina platensis. ferredoxin. 3/92 149 149 180
Sw:Fer1_Phyes P00230 phytolacca esculenta (food pokeberr... 148 148 178
Sw:Fer_Brana P00227 brassica napus (rape). ferredoxin. 2/94 136 148 161
Sw:Fer_Aphsa P00250 aphanothece sacrum. ferredoxin i. 5/92 132 148 167
Sw:Fer_Leugl P00225 leucaena glauca (white popinac) (leu... 147 147 170
Sw:Fer1_Phyam P00229 phytolacca americana (common pokebe... 147 147 177
Sw:Fer_Bumfi P13106 bumilleriopsis filiformis. ferredoxi... 132 147 167
Sw:Fer_Spima P00245 spirulina maxima. ferredoxin. 11/88 147 147 178
Sw:Fer_Syny4 P00243 synechocystis sp. (strain pcc 6714).... 146 146 178
Sw:Fer2_Rapsa P14937 raphanus sativus (radish). ferredox... 146 146 179
Sw:Ferh_Anava P46046 anabaena variabilis. ferredoxin, he... 146 146 161
Sw:Fer_Syny3 P27320 synechocystis sp. (strain pcc 6803).... 146 146 179
Sw:Fer1_Rapsa P14936 raphanus sativus (radish). ferredox... 145 145 181
Sw:Fer2_Plebo P46035 plectonema boryanum. ferredoxin ii ... 144 144 161
Sw:Ferh_Anasp P11053 anabaena sp. (strain pcc 7120). fer... 144 144 159
Swnew:Fer2_Plebo P46035 FERREDOXIN II (FDII). 2/96 144 144 161
Sw:Fer_Coles P00222 colocasia esculenta (elephant's ear)... 144 144 168
Sw:Fer_Arath P16972 arabidopsis thaliana (mouse-ear cres... 144 144 166
Sw:Fer1_Synp2 P31965 synechococcus sp. (strain pcc 7002)... 144 144 173
Sw:Fer1_Aphfl P00244 aphanizomenon flos-aquae. ferredoxi... 143 143 177
Sw:Fer3_Rapsa P14938 raphanus sativus (radish). ferredox... 143 143 166
Sw:Fer_Silpr P04669 silene pratensis (white campion) (ly... 143 143 166
Sw:Ferh_Fredi P28610 fremyella diplosiphon (calothrix pc... 142 142 158
Sw:Fer_Samni P00226 sambucus nigra (european elder). fer... 140 140 163
Sw:Fer_Medsa P00220 medicago sativa (alfalfa). ferredoxi... 139 139 159
Sw:Fer2_Phyam P00231 phytolacca americana (common pokebe... 139 139 164
260117af.Seq
Sw:Fer_Halha
ID FER_HALHA STANDARD; PRT; 128 AA.
AC P00216;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-JUN-1994 (REL. 29, LAST ANNOTATION UPDATE)
DE FERREDOXIN. . . .
SCORES Init1: 635 Initn: 635 Opt: 635
100.0% identity in 128 aa overlap
10 20 30 40 50 60
260117 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fer_Ha PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
10 20 30 40 50 60
70 80 90 100 110 120
260117 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fer_Ha FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
70 80 90 100 110 120
260117 DYLQNRVI
||||||||
Fer_Ha DYLQNRVI
260117af.Seq
Sw:Fer_Halsp
ID FER_HALSP STANDARD; PRT; 128 AA.
AC P00217;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE)
DE FERREDOXIN. . . .
SCORES Init1: 571 Initn: 571 Opt: 571
84.4% identity in 128 aa overlap
10 20 30 40 50 60
260117 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
|||||||||::||:|||| |||:|::|:| :||:||||::||:|||||||||||||||||
Fer_Ha PTVEYLNYEVVDDNGWDMYDDDVFGEASDMDLDDEDYGSLEVNEGEYILEAAEAQGYDWP
10 20 30 40 50 60
70 80 90 100 110 120
260117 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
||||||||||||:|| ||:|||||||||||||||:|:|||||||||:|||||||||||||
Fer_Ha FSCRAGACANCAAIVLEGDIDMDMQQILSDEEVEDKNVRLTCIGSPDADEVKIVYNAKHL
70 80 90 100 110 120
260117 DYLQNRVI
||||||||
Fer_Ha DYLQNRVI
260117af.Seq
Sw:Fer_Synp4
ID FER_SYNP4 STANDARD; PRT; 98 AA.
AC P15788;
DT 01-APR-1990 (REL. 14, CREATED)
DT 01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE)
DT 01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE)
DE FERREDOXIN. . . .
SCORES Init1: 182 Initn: 182 Opt: 210
48.8% identity in 82 aa overlap
10 20 30 40 50 60
260117 ETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGAC
|:||:::||||::||::| | |:|||||||
Fer_Sy ASYKVTLINEEMGLNETIEVPDDEYILDVAEEEGIDLPYSCRAGAC
10 20 30 40
70 80 90 100 110 120
260117 ANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRVI
::||: :|||||| : | :|:|:::|: | |||:: ||:| : |::::::|
Fer_Sy STCAGKIKEGEIDQSDQSFLDDDQIEAGYV-LTCVAYPASDCTIITHQEEELY
50 60 70 80 90
260117af.Seq
Sw:Fer_Galsu
ID FER_GALSU STANDARD; PRT; 98 AA.
AC P00241;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-OCT-1994 (REL. 30, LAST ANNOTATION UPDATE)
DE FERREDOXIN. . . .
SCORES Init1: 163 Initn: 182 Opt: 188
41.5% identity in 82 aa overlap
10 20 30 40 50 60
260117 ETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGAC
|:| ::::|||:|||:|| | |:|||||||
Fer_Ga ASYKIHLVNKDQGIDETIECPDDQYILDAAEEQGLDLPYSCRAGAC
10 20 30 40
70 80 90 100 110 120
260117 ANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRVI
::||: : |||:| : | :|:|::|:: : |||:: |::::: :::::: |
Fer_Ga STCAGKLLEGEVDQSDQSFLDDDQVKA-GFVLTCVAYPTSNATILTHQEESLY
50 60 70 80 90
260117af.Seq
Sw:Fer1_Anava
ID FER1_ANAVA STANDARD; PRT; 98 AA.
AC P00254;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 01-NOV-1988 (REL. 09, LAST SEQUENCE UPDATE)
DT 01-FEB-1994 (REL. 28, LAST ANNOTATION UPDATE)
DE FERREDOXIN I. . . .
SCORES Init1: 180 Initn: 180 Opt: 203
45.1% identity in 82 aa overlap
10 20 30 40 50 60
260117 ETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGAC
|::|:::||||:|||:|||| |||||||||
Fer1_A ATFKVTLINEAEGTSNTIDVPDDEYILDAAEEQGYDLPFSCRAGAC
10 20 30 40
70 80 90 100 110 120
260117 ANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRVI
::||: : :|::| : | :|:|:::|: | |||:: |::| : ::::::|
Fer1_A STCAGKLVSGTVDQSDQSFLDDDQIEAGYV-LTCVAYPTSDVTIQTHKEEDLY
50 60 70 80 90
260117af.Seq
Sw:Fer2_Nosmu
ID FER2_NOSMU STANDARD; PRT; 98 AA.
AC P00249;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE)
DE FERREDOXIN II. . . .
SCORES Init1: 179 Initn: 179 Opt: 200
45.1% identity in 71 aa overlap
10 20 30 40 50 60
260117 ETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGAC
|:||:::||||:|||::| | |||||:|:|
Fer2_N ATYKVRLFNAAEGLDETIEVPDDEYILDAAEEAGLDLPFSCRSGSC
10 20 30 40
70 80 90 100 110 120
260117 ANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRVI
::|::|:|:|::| : |::|:|::::: :| |||:: |:::
Fer2_N SSCNGILKKGTVDQSDQNFLDDDQIAAGNV-LTCVAYPTSNCEIETHREDAIA
50 60 70 80 90
260117af.Seq
Sw:Fer1_Synp7
ID FER1_SYNP7 STANDARD; PRT; 98 AA.
AC P06517;
DT 01-JAN-1988 (REL. 06, CREATED)
DT 01-JAN-1988 (REL. 06, LAST SEQUENCE UPDATE)
DT 01-AUG-1990 (REL. 15, LAST ANNOTATION UPDATE)
DE FERREDOXIN I. . . .
SCORES Init1: 162 Initn: 179 Opt: 185
41.5% identity in 82 aa overlap
10 20 30 40 50 60
260117 ETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGAC
|::||:::|||:|||:|| | |:|||||||
Fer1_S ATYKVTLVNAAEGLNTTIDVADDTYILDAAEEQGIDLPYSCRAGAC
10 20 30 40
70 80 90 100 110 120
260117 ANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRVI
::||: | :|::| : | :|:|::::: : |||:: |::| : ::::::|
Fer1_S STCAGKVVSGTVDQSDQSFLDDDQIAA-GFVLTCVAYPTSDVTIETHKEEDLY
50 60 70 80 90
......................................................
..................Material deleted....................
......................................................
260117af.Seq
Sw:Fer_Synsp
ID FER_SYNSP STANDARD; PRT; 97 AA.
AC P00256;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE)
DE FERREDOXIN. . . .
SCORES Init1: 175 Initn: 175 Opt: 199
48.8% identity in 84 aa overlap
10 20 30 40 50 60
260117 ETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGAC
|::|:|:||||::||:|| | |||||||||
Fer_Sy ATYKVTLVRPDGSETTIDVPEDEYILDVAEEQGLDLPFSCRAGAC
10 20 30 40
70 80 90 100 110 120
260117 ANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRVI
::||: : |||:| : | :|:|::: ||: |||:: | :| ||: |::: |
Fer_Sy STCAGKLLEGEVDQSDQSFLDDDQI-EKGFVLTCVAYPRSD-CKILTNQEEELY
50 60 70 80 90
260117af.Seq
Sw:Fer_Synli
ID FER_SYNLI STANDARD; PRT; 96 AA.
AC P00255;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE)
DE FERREDOXIN. . . .
SCORES Init1: 175 Initn: 175 Opt: 199
46.3% identity in 82 aa overlap
10 20 30 40 50 60
260117 ETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGAC
|::|:|:||||::||:|| | |||||||||
Fer_Sy ATYKVTLVRPDGETTIDVPEDEYILDVAEEQGLDLPFSCRAGAC
10 20 30 40
70 80 90 100 110 120
260117 ANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRVI
::||: : |||:| : | :|:|::: ||: |||:: | :| :::::::|
Fer_Sy STCAGKLLEGEVDQSDQSFLDDDQI-EKGFVLTCVAYPRSDCKILTHQEEELY
50 60 70 80 90
CPU time: 0:01:02
Output File: Local$Scratch:260117af.Res;
FASTA format
Statistical Significance
BLAST
PROGRAM blastp
DATALIB nr
BEGIN
> Hal ha.
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
DYLQNRVI
and mail the file to
blast@ncbi.nlm.nih.gov.
You should receive the results by mail within a few minutes.
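The request file shown above is simple enough to generate by program. The Python sketch below only builds the text of the message from the keywords that appear in the example (PROGRAM, DATALIB, BEGIN and the ">" title line); the function name `build_blast_request` and the 60-residue line width are assumptions, and actually sending the mail is left out.

```python
def build_blast_request(title, sequence, program="blastp", datalib="nr", width=60):
    """Return the body of a mail-server request in the layout shown above."""
    lines = [
        f"PROGRAM {program}",
        f"DATALIB {datalib}",
        "BEGIN",
        f"> {title}",
    ]
    # Break the sequence into fixed-width lines, as in the example file.
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


halha = (
    "PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP"
    "FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL"
    "DYLQNRVI"
)

request = build_blast_request("Hal ha.", halha)
print(request)
```

Keeping the builder separate from any mailing step makes it easy to check the text by eye before anything is sent.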
BLAST output
BLASTP 1.4.8MP [20-June-1995] [Build 13:58:02 Oct 17 1995]
Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol.
215:403-10.
Query= Hal ha.
(128 letters)
Database: Non-redundant PDB+SwissProt+SPupdate+PIR+GenPept+GPupdate
173,745 sequences; 51,502,515 total letters.
Searching..................................................done
Observed Numbers of Database Sequences Satisfying
Various EXPECTation Thresholds (E parameter values)
Histogram units: = 28 Sequences : less than 28 sequences
EXPECTation Threshold
(E parameter)
|
V Observed Counts-->
10000 6826 1707 |============================================================
6310 5119 1608 |=========================================================
3980 3511 999 |===================================
2510 2512 699 |========================
1580 1813 648 |=======================
1000 1165 328 |===========
631 837 237 |========
398 600 157 |=====
251 443 87 |===
158 356 71 |==
100 285 38 |=
63.1 247 50 |=
39.8 197 26 |:
25.1 171 22 |:
15.8 149 7 |:
>>>>>>>>>>>>>>>>>>>>> Expect = 10.0, Observed = 142 <<<<<<<<<<<<<<<<<
10.0 142 8 |:
6.31 134 5 |:
3.98 129 5 |:
2.51 124 3 |:
1.58 121 1 |:
1.00 120 1 |:
0.63 119 1 |:
0.40 118 1 |:
0.25 117 0 |
0.16 117 1 |:
0.10 116 2 |:
0.063 114 2 |:
0.040 112 2 |:
0.025 110 0 |
0.016 110 1 |:
0.010 109 2 |:
0.0063 107 0 |
0.0040 107 0 |
0.0025 107 0 |
0.0016 107 1 |:
Smallest
Sum
High Probability
Sequences producing High-scoring Segment Pairs: Score P(N) N
pir|S35235|S35235 ferredoxin [2Fe-2S] - Halobacterium ... 681 3.9e-89 1
sp|P00216|FER_HALHA FERREDOXIN. >pir|A00220|FEHS ferredo... 681 3.9e-89 1
sp|P00217|FER_HALSP FERREDOXIN. >pir|A00221|FEHSX ferred... 583 1.1e-75 1
sp|P15788|FER_SYNP4 FERREDOXIN. >pir|A28858|A28858 ferre... 176 1.7e-25 2
sp|P00241|FER_GALSU FERREDOXIN. >pir|A00245|FEKK ferredo... 162 5.7e-22 2
sp|P00242|FER_PORUM FERREDOXIN. >pir|A00246|FEPRU ferred... 150 2.7e-21 2
sp|P00234|FER1_EQUTE FERREDOXIN I. >pir|A00240|FEEQ1 ferr... 153 2.8e-21 2
pir|S08122|S08122 ferredoxin [2Fe-2S] I - Synechococcu... 156 3.6e-21 2
pir|S11048|FEKT1 ferredoxin [2Fe-2S] I - Cyanophora p... 155 3.6e-21 2
sp|P06517|FER1_SYNP7 FERREDOXIN I. >pir|A30022|A30022 fer... 156 3.7e-21 2
sp|P17007|FER1_CYAPA FERREDOXIN I. 155 3.7e-21 2
pir|S28198|S28198 ferredoxin [2Fe-2S] A - giant taro 146 3.7e-21 2
pdb|1FRR|A Ferredoxin I >pdb|1FRR|B Ferredoxin ... 151 5.2e-21 2
sp|P00224|FER2_SPIOL FERREDOXIN II. >pir|A00231|FESP2 fer... 154 6.9e-21 2
sp|P09735|FER_MARPO FERREDOXIN. >pir|A24126|FELV ferredo... 149 7.1e-21 2
sp|P15789|FER_CYACA FERREDOXIN. 141 1.9e-19 2
sp|P00256|FER_SYNSP FERREDOXIN. >pir|A00259|FEYCT ferred... 173 2.6e-19 1
sp|P00255|FER_SYNLI FERREDOXIN. >pir|A00258|FEYCAL ferre... 173 2.7e-19 1
pir|A25761|FEAI ferredoxin [2Fe-2S] - Anabaena varia... 170 6.7e-19 1
sp|P00254|FER1_ANAVA FERREDOXIN I. 170 6.7e-19 1
sp|P00240|FER2_DUNSA FERREDOXIN II. >pir|A00244|FEDH2 fer... 148 9.7e-19 2
pir|S25233|S25233 ferredoxin [2Fe-2S] I - Anabaena sp.... 168 1.3e-18 1
pdb|1FXA|A [2Fe-2S] Ferredoxin >pdb|1FXA|B [2Fe... 168 1.3e-18 1
sp|P00253|FER_NOSMU FERREDOXIN. >pir|A00257|FENM ferredo... 165 3.3e-18 1
sp|P14936|FER1_RAPSA FERREDOXIN, ROOT R-B1. >pir|JX0084|J... 139 3.4e-18 2
sp|P00238|FER_SCEQU FERREDOXIN. >pir|A00242|FESC ferredo... 144 3.5e-18 2
sp|P00239|FER1_DUNSA FERREDOXIN I. >pir|A00243|FEDH1 ferr... 144 6.2e-18 2
sp|P22341|FER_EUGVI FERREDOXIN. >pir|S15425|S15425 ferre... 163 6.3e-18 1
sp|P14937|FER2_RAPSA FERREDOXIN, ROOT R-B2. >pir|JX0083|J... 138 1.0e-17 2
gp|U33848|PBU33848_1 PetF1 [Plectonema boryanum] 161 1.1e-17 1
pir|JA0098|JA0098 ferredoxin [2Fe-2S] - Synechococcus sp. 160 2.8e-17 1
sp|P00249|FER2_NOSMU FERREDOXIN II. >pir|A00253|FENM2M fe... 159 5.6e-17 1
pir|S28199|S28199 ferredoxin [2Fe-2S] B - giant taro 136 7.4e-17 2
sp|P00247|FER_CHLFR FERREDOXIN. >pir|A00251|FEEF ferredo... 158 1.0e-16 1
pir|S00361|FEKM ferredoxin [2Fe-2S] - Chlamydomonas ... 136 1.9e-16 2
sp|P00252|FER1_NOSMU FERREDOXIN I. >pir|A00256|FENM1M fer... 156 2.8e-16 1
pir|S49989|S49989 2Fe-2S-ferredoxin - Anabaena variabi... 133 2.8e-16 2
sp|P00232|FER2_PHYES FERREDOXIN II. >pir|A00238|FEFW2E fe... 126 2.8e-16 2
sp|P46046|FERH_ANAVA FERREDOXIN, HETEROCYST. 133 2.8e-16 2
sp|P00222|FER_COLES FERREDOXIN. >pir|A00229|FETA ferredo... 133 6.7e-16 2
pir|S04543|S04543 ferredoxin [2Fe-2S] - Anabaena sp. (... 130 9.9e-16 2
sp|P00231|FER2_PHYAM FERREDOXIN II. >pir|A00237|FEFW2 fer... 127 1.0e-15 2
pdb|1FRD| Heterocyst [2fe-2s] Ferredoxin (Oxid... 130 1.0e-15 2
sp|P00248|FER_MASLA FERREDOXIN. >pir|A00252|FEMW ferredo... 152 1.6e-15 1
sp|P07839|FER_CHLRE FERREDOXIN PRECURSOR. >gp|L10349|CRE... 136 9.1e-15 2
pdb|3FXC| Ferredoxin >sp|P00246|FER_SPIPL FERR... 147 1.2e-14 1
sp|P00245|FER_SPIMA FERREDOXIN. >pir|A00249|FESG ferredo... 146 1.7e-14 1
sp|P07484|FER_RHOPL FERREDOXIN. >pir|A93760|FEPRR ferred... 146 1.7e-14 1
gp|D64000|SYCSLRB_86 ferredoxin [Synechocystis sp.] 146 1.7e-14 1
sp|P27320|FER_SYNY3 FERREDOXIN. 146 1.7e-14 1
sp|P00243|FER_SYNY4 FERREDOXIN. >pir|A00247|FEYB6 ferred... 146 1.7e-14 1
sp|P27789|FER5_MAIZE FERREDOXIN V PRECURSOR. >gp|M73828|M... 135 2.1e-14 2
gp|D30794|RICFERRA_1 ferredoxin [Oryza sativa] 135 2.2e-14 2
sp|P09911|FER1_PEA FERREDOXIN I PRECURSOR. >pir|S11495|... 130 2.3e-14 2
sp|P11051|FER1_ORYSA FERREDOXIN I. 144 3.7e-14 1
sp|P00230|FER1_PHYES FERREDOXIN I. 144 3.7e-14 1
pir|S03730|FERZ ferredoxin [2Fe-2S] I - rice 144 3.7e-14 1
sp|P00244|FER1_APHFL FERREDOXIN I. >pir|A00248|FEFZ1 ferr... 143 5.2e-14 1
sp|P10770|FER_PERBI FERREDOXIN. >pir|A30036|FEDQ ferredo... 124 5.3e-14 2
sp|P07838|FER_BRYMA FERREDOXIN. >pir|S07452|FEYO ferredo... 142 7.5e-14 1
sp|P00233|FER_GLEJA FERREDOXIN. >pir|A00239|FEFNG ferred... 142 7.7e-14 1
sp|P00225|FER_LEUGL FERREDOXIN. >pir|A92055|FELG ferredo... 141 1.1e-13 1
sp|P00229|FER1_PHYAM FERREDOXIN I. >pir|A00236|FEFW1 ferr... 140 1.6e-13 1
pir|B00238|FEFWF ferredoxin [2Fe-2S] I - food pokeweed 140 1.6e-13 1
sp|P27788|FER3_MAIZE FERREDOXIN III PRECURSOR. >gp|M73831... 129 1.9e-13 2
sp|P00226|FER_SAMNI FERREDOXIN. >pir|A00233|FEED ferredo... 139 2.2e-13 1
sp|P00223|FER_ARCLA FERREDOXIN. >pir|A00230|FEBQ ferredo... 115 2.3e-13 2
pir|C47673|C47673 ferredoxin [2Fe-2S] - Synechococcus ... 138 3.2e-13 1
sp|P31965|FER1_SYNP2 FERREDOXIN I. 138 3.2e-13 1
sp|P14938|FER3_RAPSA FERREDOXIN, LEAF L-A. 138 3.2e-13 1
pir|JX0082|JX0082 ferredoxin [2Fe-2S] A, leaf - radish 138 3.2e-13 1
sp|P00228|FER_WHEAT FERREDOXIN PRECURSOR. >pir|S37226|FE... 146 3.2e-13 1
sp|P00221|FER1_SPIOL FERREDOXIN I PRECURSOR. >pir|S00437|... 146 3.6e-13 1
gp|D30763|RICFERR_1 ferredoxin [Oryza sativa] 144 6.4e-13 1
pir|S40169|S40169 FdxH protein - Plectonema boryanum >... 135 9.1e-13 1
sp|P46035|FER2_PLEBO FERREDOXIN II (FDII). 135 9.2e-13 1
sp|P13106|FER_BUMFI FERREDOXIN. >pir|A28857|FEBF2 ferred... 135 9.2e-13 1
pir|A61291|A61291 ferredoxin [2Fe-2S] - parsley 133 1.9e-12 1
sp|P00227|FER_BRANA FERREDOXIN. >pir|A00234|FERP ferredo... 132 2.7e-12 1
sp|P16972|FER_ARATH FERREDOXIN PRECURSOR. >pir|S09979|S0... 140 3.8e-12 1
pir|S20934|S20934 ferredoxin [2Fe-2S] - Calothrix sp. ... 129 7.4e-12 1
pir|S49996|S49996 2Fe-2S-ferredoxin - Anabaena variabi... 129 7.4e-12 1
sp|P28610|FERH_FREDI FERREDOXIN, HETEROCYST. 129 7.5e-12 1
sp|P46047|FERV_ANAVA FERREDOXIN, VEGETATIVE. 129 7.5e-12 1
pir|JA0099|JA0099 ferredoxin [2Fe-2S] - Ochromonas danica 129 7.5e-12 1
pdb|1FXI|A Ferredoxin I >pdb|1FXI|B Ferredoxin ... 129 7.6e-12 1
sp|P04669|FER_SILPR FERREDOXIN PRECURSOR. >pir|A23011|FE... 138 7.9e-12 1
sp|P00220|FER_MEDSA FERREDOXIN. >pir|A00227|FEAA ferredo... 126 2.1e-11 1
sp|P27787|FER1_MAIZE FERREDOXIN I PRECURSOR. >gp|M73829|M... 131 1.1e-10 1
sp|P00251|FER2_APHSA FERREDOXIN II. >pir|A00255|FEAH2 fer... 121 4.1e-10 1
sp|P19734|DMPP_PSEPU PHENOL HYDROXYLASE P5 PROTEIN (EC 1.... 104 2.6e-06 1
pir|F37831|F37831 phenol 2-monooxygenase (EC 1.14.13.7... 104 2.6e-06 1
gp|D28864|PSEPHEAA_6 one component of phenol hydroxylase ... 104 2.6e-06 1
pir|S47419|S47419 phenolhydroxylase chain - Pseudomona... 102 5.2e-06 1
pir|S44308|S44308 phenol hydroxylase - Pseudomonas put... 100 1.0e-05 1
sp|P00237|FER2_EQUAR FERREDOXIN II. >pir|B04609|FEEQ2F fe... 92 4.1e-05 1
sp|P00236|FER2_EQUTE FERREDOXIN II. >pir|A00241|FEEQ2 fer... 92 4.1e-05 1
sp|P08451|FER2_SYNP6 FERREDOXIN II. 92 5.6e-05 1
pir|S10833|FEYC2 ferredoxin [2Fe-2S] II - Synechococc... 92 5.7e-05 1
gp|D31732|PEENIRA_1 nitrite reductase [Plectonema boryanum] 93 0.00012 1
WARNING: Descriptions of 42 database sequences were not reported due to the
limiting value of parameter V = 100.
>pir|S35235|S35235 ferredoxin [2Fe-2S] - Halobacterium salinarium
>gp|X68103|HSFDXG_1 ferredoxin [Halobacterium salinarium]
Length = 129
Score = 681 (310.4 bits), Expect = 3.9e-89, P = 3.9e-89
Identities = 128/128 (100%), Positives = 128/128 (100%)
Query: 1 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP 60
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
Sbjct: 2 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP 61
Query: 61 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
Sbjct: 62 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 121
Query: 121 DYLQNRVI 128
DYLQNRVI
Sbjct: 122 DYLQNRVI 129
>sp|P00216|FER_HALHA FERREDOXIN. >pir|A00220|FEHS ferredoxin [2Fe-2S] -
Halobacterium halobium
Length = 128
Score = 681 (310.4 bits), Expect = 3.9e-89, P = 3.9e-89
Identities = 128/128 (100%), Positives = 128/128 (100%)
Query: 1 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP 60
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
Sbjct: 1 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP 60
Query: 61 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
Sbjct: 61 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
Query: 121 DYLQNRVI 128
DYLQNRVI
Sbjct: 121 DYLQNRVI 128
>sp|P00217|FER_HALSP FERREDOXIN. >pir|A00221|FEHSX ferredoxin [2Fe-2S] -
Halobacterium sp.
Length = 128
Score = 583 (265.7 bits), Expect = 1.1e-75, P = 1.1e-75
Identities = 108/128 (84%), Positives = 118/128 (92%)
Query: 1 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP 60
PTVEYLNYE +DD GWDM DDD+F +A+D LD EDYG++EV EGEYILEAAEAQGYDWP
Sbjct: 1 PTVEYLNYEVVDDNGWDMYDDDVFGEASDMDLDDEDYGSLEVNEGEYILEAAEAQGYDWP 60
Query: 61 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
FSCRAGACANCA+IV EG+IDMDMQQILSDEEVE+K+VRLTCIGSP ADEVKIVYNAKHL
Sbjct: 61 FSCRAGACANCAAIVLEGDIDMDMQQILSDEEVEDKNVRLTCIGSPDADEVKIVYNAKHL 120
Query: 121 DYLQNRVI 128
DYLQNRVI
Sbjct: 121 DYLQNRVI 128
>sp|P15788|FER_SYNP4 FERREDOXIN. >pir|A28858|A28858 ferredoxin [2Fe-2S] -
Synechococcus sp.
Length = 98
Score = 176 (80.2 bits), Expect = 1.7e-25, Sum P(2) = 1.7e-25
Identities = 31/56 (55%), Positives = 41/56 (73%)
Query: 39 TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVE 94
T+EV + EYIL+ AE +G D P+SCRAGAC+ CA +KEGEID Q L D+++E
Sbjct: 17 TIEVPDDEYILDVAEEEGIDLPYSCRAGACSTCAGKIKEGEIDQSDQSFLDDDQIE 72
Score = 45 (20.5 bits), Expect = 1.7e-25, Sum P(2) = 1.7e-25
Identities = 11/35 (31%), Positives = 17/35 (48%)
Query: 86 QILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
Q D++ E LTC+ PA+D I + + L
Sbjct: 63 QSFLDDDQIEAGYVLTCVAYPASDCTIITHQEEEL 97
>sp|P00241|FER_GALSU FERREDOXIN. >pir|A00245|FEKK ferredoxin [2Fe-2S] - red
alga (Cyanidium caldarium)
Length = 98
Score = 162 (73.8 bits), Expect = 5.7e-22, Sum P(2) = 5.7e-22
Identities = 29/56 (51%), Positives = 40/56 (71%)
Query: 39 TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVE 94
T+E + +YIL+AAE QG D P+SCRAGAC+ CA + EGE+D Q L D++V+
Sbjct: 17 TIECPDDQYILDAAEEQGLDLPYSCRAGACSTCAGKLLEGEVDQSDQSFLDDDQVK 72
Score = 33 (15.0 bits), Expect = 5.7e-22, Sum P(2) = 5.7e-22
Identities = 7/35 (20%), Positives = 16/35 (45%)
Query: 86 QILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
Q D++ + LTC+ P ++ + + + L
Sbjct: 63 QSFLDDDQVKAGFVLTCVAYPTSNATILTHQEESL 97
.........................................................
.................Material Deleted........................
.........................................................
>gp|D64000|SYCSLRB_86 ferredoxin [Synechocystis sp.]
Length = 97
Score = 146 (66.6 bits), Expect = 1.7e-14, P = 1.7e-14
Identities = 25/56 (44%), Positives = 37/56 (66%)
Query: 39 TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVE 94
++E ++ YIL+AAE G D P+SCRAGAC+ CA + G +D Q L D+++E
Sbjct: 16 SIECSDDTYILDAAEEAGLDLPYSCRAGACSTCAGKITAGSVDQSDQSFLDDDQIE 71
>sp|P27320|FER_SYNY3 FERREDOXIN.
Length = 96
Score = 146 (66.6 bits), Expect = 1.7e-14, P = 1.7e-14
Identities = 25/56 (44%), Positives = 37/56 (66%)
Query: 39 TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVE 94
++E ++ YIL+AAE G D P+SCRAGAC+ CA + G +D Q L D+++E
Sbjct: 15 SIECSDDTYILDAAEEAGLDLPYSCRAGACSTCAGKITAGSVDQSDQSFLDDDQIE 70
WARNING: HSPs involving 92 database sequences were not reported due to the
limiting value of parameter B = 50.
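The "Identities" line of each alignment can be checked by hand from the Query and Sbjct lines. The short sketch below does this for the first segment of the FER_SYNP4 hit shown above; the function name count_identities is only an illustrative choice, and the "Positives" figure is not reproduced because that would require the BLOSUM62 table.

```python
def count_identities(query, sbjct):
    """Return (identical positions, aligned length) for two aligned segments.

    Both strings must have the same length; gap characters ('-') are never
    counted as identities.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned segments must have the same length")
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, len(query)


# Query 39-94 and Sbjct 17-72 of the FER_SYNP4 alignment, copied from the output.
query = "TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVE"
sbjct = "TIEVPDDEYILDVAEEEGIDLPYSCRAGACSTCAGKIKEGEIDQSDQSFLDDDQIE"

identical, length = count_identities(query, sbjct)
print(f"Identities = {identical}/{length} ({100 * identical // length}%)")
```

The count agrees with the "Identities = 31/56 (55%)" line reported for that segment.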
Parameters:
V=100
B=50
H=1
-qtype
-ctxfactor=1.00
E=10
Query ----- As Used ----- ----- Computed ----
Frame MatID Matrix name Lambda K H Lambda K H
+0 0 BLOSUM62 0.316 0.136 0.401 same same same
Query
Frame MatID Length Eff.Length E S W T X E2 S2
+0 0 128 128 10. 59 3 11 22 0.22 31
Statistics:
Query Expected Observed HSPs HSPs
Frame MatID High Score High Score Reportable Reported
+0 0 63 (28.7 bits) 681 (310.4 bits) 205 84
Query Neighborhd Word Excluded Failed Successful Overlaps
Frame MatID Words Hits Hits Extensions Extensions Excluded
+0 0 3526 13602898 2826310 10729133 47445 547
Database: Non-redundant PDB+SwissProt+SPupdate+PIR+GenPept+GPupdate
Release date: 6:03 AM EST Feb 3, 1996
Posted date: 6:04 AM EST Feb 3, 1996
# of letters in database: 51,502,515
# of sequences in database: 173,745
# of database sequences satisfying E: 142
No. of states in DFA: 531 (52 KB)
Total size of DFA: 91 KB (128 KB)
Time to generate neighborhood: 0.02u 0.01s 0.03t Real: 00:00:00
No. of processors used: 8
Time to search database: 62.74u 1.36s 64.10t Real: 00:00:13
Total cpu time: 62.83u 1.47s 64.30t Real: 00:00:13
WARNINGS ISSUED: 2
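The bit scores printed in parentheses next to each raw score in this output appear to be consistent with the raw score multiplied by Lambda and divided by ln 2, using the Lambda value (0.316) from the statistics table. The sketch below checks that relationship against a few values copied from the output. It is an observation about this particular output, not a statement of the program's documented formula (later BLAST versions also subtract ln K before dividing), and the small differences come from Lambda being printed to only three decimals.

```python
import math

LAMBDA = 0.316  # "Lambda" column of the statistics table, as printed


def raw_to_bits(raw_score, lam=LAMBDA):
    """Convert a raw score to bits, assuming bits = lambda * S / ln 2."""
    return lam * raw_score / math.log(2)


# (raw score, bit score) pairs copied from the output above.
reported = [(681, 310.4), (583, 265.7), (176, 80.2), (146, 66.6), (63, 28.7)]

for raw, bits in reported:
    print(f"raw {raw:4d}  computed {raw_to_bits(raw):6.1f}  reported {bits:6.1f}")
```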
BLAST format
BLITZ
TITLE HALHA FER
SEQ
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
DYLQNRVI
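A request like the one above is easy to generate from a script. The following is a minimal sketch that assumes only what the example shows: a TITLE line, a SEQ line, and the sequence wrapped at 60 residues per line. The function name build_request is hypothetical, and any further option lines a particular server accepts would have to be added by hand.

```python
def build_request(title, sequence, width=60):
    """Format a mail-server request: TITLE line, SEQ line, wrapped sequence."""
    # Remove whitespace and line breaks so the sequence can be re-wrapped.
    cleaned = "".join(sequence.split()).upper()
    lines = [f"TITLE {title}", "SEQ"]
    lines.extend(cleaned[i:i + width] for i in range(0, len(cleaned), width))
    return "\n".join(lines) + "\n"


halha = (
    "PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP"
    "FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL"
    "DYLQNRVI"
)
request = build_request("HALHA FER", halha)
print(request)
```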
BLITZ output
Search started: Sat Feb 3 21:12:49 1996
MPsrch: Version 1.5 - Shane S. Sturrock & John F. Collins 1993.
Biocomputing Research Unit, University of Edinburgh, UK.
Execution: MasPar time 34.07 Seconds at EMBL, Heidelberg, Germany
65.325 Million cell updates/sec
Title: HALHA
Description: FER
Sequence: 1 PTVEYLNYETLDDQGWDMDD..........DEVKIVYNAKHLDYLQNRVI 128
Parameters: swissprot (49340 seqs, 17385503 residues)
PAM 120; Penalty 13; Perfect Score 1009; Align 30
Predicted No. is the number of results expected by chance to have a score
greater than or equal to the score of the result being printed, and is
derived by analysis of the total score distribution which gave:
Statistics: Mean 41.196; Variance 64.575; scale 0.638
No. Score %Match Length ID Description Pred. No.
--------------------------------------------------------------------------------
1 1009 100.0 128 FER_HALHA FERREDOXIN. 0.00e+00
2 877 86.9 128 FER_HALSP FERREDOXIN. 0.00e+00
3 291 28.8 98 FER_SYNP4 FERREDOXIN. 0.00e+00
4 281 27.8 98 FER1_ANAVA FERREDOXIN I. 0.00e+00
5 279 27.7 98 FER1_ANASP FERREDOXIN I. 0.00e+00
6 277 27.5 97 FER_SYNSP FERREDOXIN. 0.00e+00
7 275 27.3 98 FER_NOSMU FERREDOXIN. 6.91e-41
8 274 27.2 96 FER_SYNLI FERREDOXIN. 6.91e-41
9 271 26.9 98 FER2_NOSMU FERREDOXIN II. 4.84e-40
10 261 25.9 98 FER1_NOSMU FERREDOXIN I. 1.06e-37
11 257 25.5 95 FER_MARPO FERREDOXIN. 9.22e-37
12 256 25.4 96 FER_EUGVI FERREDOXIN. 1.58e-36
13 256 25.4 98 FER_CHLFR FERREDOXIN. 1.58e-36
14 252 25.0 98 FER1_SYNP7 FERREDOXIN I. 1.37e-35
15 251 24.9 98 FER_GALSU FERREDOXIN. 2.34e-35
16 249 24.7 98 FER_MASLA FERREDOXIN. 6.86e-35
17 248 24.6 98 FER1_CYAPA FERREDOXIN I. 1.17e-34
18 243 24.1 97 FER_RHOPL FERREDOXIN. 1.71e-33
19 242 24.0 98 FER_SPIPL FERREDOXIN. 2.92e-33
20 242 24.0 97 FER2_SPIOL FERREDOXIN II. 2.92e-33
21 242 24.0 95 FER1_EQUTE FERREDOXIN I. 2.92e-33
22 240 23.8 98 FER_PORUM FERREDOXIN. 8.49e-33
23 239 23.7 96 FER_SYNY3 FERREDOXIN. 1.45e-32
24 239 23.7 98 FER_SPIMA FERREDOXIN. 1.45e-32
25 239 23.7 95 FER1_EQUAR FERREDOXIN I. 1.45e-32
26 238 23.6 96 FER_SYNY4 FERREDOXIN. 2.47e-32
27 236 23.4 95 FER2_DUNSA FERREDOXIN II. 7.15e-32
28 234 23.2 97 FER1_APHFL FERREDOXIN I. 2.07e-31
29 233 23.1 96 FER1_PHYES FERREDOXIN I. 3.52e-31
30 232 23.0 96 FER1_SYNP2 FERREDOXIN I. 5.97e-31
31 232 23.0 95 FER_GLEJA FERREDOXIN. 5.97e-31
32 232 23.0 98 FER_BRYMA FERREDOXIN. 5.97e-31
33 230 22.8 96 FER1_PHYAM FERREDOXIN I. 1.72e-30
34 229 22.7 147 FER1_SPIOL FERREDOXIN I PRECURSOR. 2.92e-30
35 228 22.6 96 FER_SCEQU FERREDOXIN. 4.95e-30
36 227 22.5 97 FER_CYACA FERREDOXIN. 8.40e-30
37 226 22.4 98 FER2_RAPSA FERREDOXIN, ROOT R-B2. 1.42e-29
38 225 22.3 96 FER1_ORYSA FERREDOXIN I. 2.41e-29
39 225 22.3 98 FER1_RAPSA FERREDOXIN, ROOT R-B1. 2.41e-29
40 224 22.2 143 FER_WHEAT FERREDOXIN PRECURSOR. 4.08e-29
41 223 22.1 98 FER_BUMFI FERREDOXIN. 6.90e-29
42 223 22.1 135 FER5_MAIZE FERREDOXIN V PRECURSOR. 6.90e-29
43 221 21.9 95 FER1_DUNSA FERREDOXIN I. 1.97e-28
44 218 21.6 126 FER_CHLRE FERREDOXIN PRECURSOR. 9.49e-28
45 218 21.6 148 FER_ARATH FERREDOXIN PRECURSOR. 9.49e-28
46 218 21.6 146 FER_SILPR FERREDOXIN PRECURSOR. 9.49e-28
47 217 21.5 152 FER3_MAIZE FERREDOXIN III PRECURSOR. 1.60e-27
48 216 21.4 96 FER_LEUGL FERREDOXIN. 2.70e-27
49 215 21.3 96 FER_APHSA FERREDOXIN I. 4.55e-27
50 215 21.3 96 FER3_RAPSA FERREDOXIN, LEAF L-A. 4.55e-27
RESULT 1 Score 1009; Match 0.0%; Predicted No. 0.00e+00;
ID FER_HALHA STANDARD; PRT; 128 AA.
DE FERREDOXIN.
Matches 128; Mismatches 0; Partials 0; Indels 0; Gaps 0;
************************************************************
Db 1 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP 60
Qy 1 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP 60
************************************************************
Db 61 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
Qy 61 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
********
Db 121 DYLQNRVI 128
Qy 121 DYLQNRVI 128
RESULT 2 Score 877; Match 0.0%; Predicted No. 0.00e+00;
ID FER_HALSP STANDARD; PRT; 128 AA.
DE FERREDOXIN.
Matches 108; Mismatches 11; Partials 9; Indels 0; Gaps 0;
********* .** **** ***.* *.* ** ****..** *****************
Db 1 PTVEYLNYEVVDDNGWDMYDDDVFGEASDMDLDDEDYGSLEVNEGEYILEAAEAQGYDWP 60
Qy 1 PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP 60
************.** **.***************.*.********* *************
Db 61 FSCRAGACANCAAIVLEGDIDMDMQQILSDEEVEDKNVRLTCIGSPDADEVKIVYNAKHL 120
Qy 61 FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL 120
********
Db 121 DYLQNRVI 128
Qy 121 DYLQNRVI 128
RESULT 3 Score 291; Match 0.0%; Predicted No. 0.00e+00;
ID FER_SYNP4 STANDARD; PRT; 98 AA.
DE FERREDOXIN.
Matches 38; Mismatches 17; Partials 15; Indels 1; Gaps 1;
*.**.. ****. ** .* * *.*******. **. .****** * * *...* *
Db 17 TIEVPDDEYILDVAEEEGIDLPYSCRAGACSTCAGKIKEGEIDQSDQSFLDDDQIEAGYV 76
Qy 39 TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDV 98
***.. **.*
Db 77 -LTCVAYPASD 86
Qy 99 RLTCIGSPAAD 109
RESULT 4 Score 281; Match 0.0%; Predicted No. 0.00e+00;
ID FER1_ANAVA STANDARD; PRT; 98 AA.
DE FERREDOXIN I.
Matches 38; Mismatches 19; Partials 16; Indels 2; Gaps 2;
*..*.. ****.*** **** *********. **. . * .* * * *...* *
Db 17 TIDVPDDEYILDAAEEQGYDLPFSCRAGACSTCAGKLVSGTVDQSDQSFLDDDQIEAGYV 76
Qy 39 TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDV 98
***.. *..* * *
Db 77 -LTCVAYPTSD-VTI 89
Qy 99 RLTCIGSPAADEVKI 113
RESULT 5 Score 279; Match 0.0%; Predicted No. 0.00e+00;
ID FER1_ANASP STANDARD; PRT; 98 AA.
DE FERREDOXIN I.
Matches 37; Mismatches 19; Partials 15; Indels 1; Gaps 1;
.**.. ****.*** **** *********. **. . * .* * * *...* *
Db 18 IEVPDDEYILDAAEEQGYDLPFSCRAGACSTCAGKLVSGTVDQSDQSFLDDDQIEAGYV- 76
Qy 40 MEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVR 99
***.. *..* *
Db 77 LTCVAYPTSDVV 88
Qy 100 LTCIGSPAADEV 111
RESULT 6 Score 277; Match 0.0%; Predicted No. 0.00e+00;
ID FER_SYNSP STANDARD; PRT; 97 AA.
DE FERREDOXIN.
Matches 39; Mismatches 21; Partials 15; Indels 2; Gaps 2;
** . *..*.* ****. ** ** * *********. **. . ***.* * * *..
Db 11 DGSE-TTIDVPEDEYILDVAEEQGLDLPFSCRAGACSTCAGKLLEGEVDQSDQSFLDDDQ 69
Qy 33 DGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEE 92
. ** ***.. * .*
Db 70 I-EKGFVLTCVAYPRSD 85
Qy 93 VEEKDVRLTCIGSPAAD 109
RESULT 7 Score 275; Match 0.0%; Predicted No. 6.91e-41;
ID FER_NOSMU STANDARD; PRT; 98 AA.
DE FERREDOXIN.
Matches 36; Mismatches 19; Partials 16; Indels 1; Gaps 1;
.**.. ****.*** .*** *********. **. . * .* * * *...* *
Db 18 IEVPDDEYILDAAEEEGYDLPFSCRAGACSTCAGKLVSGTVDQSDQSFLDDDQIEAGYV- 76
Qy 40 MEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVR 99
***.. *..* *
Db 77 LTCVAYPTSDVV 88
Qy 100 LTCIGSPAADEV 111
..................................................
...............Material Deleted ..................
..................................................
RESULT 29 Score 233; Match 0.0%; Predicted No. 3.52e-31;
ID FER1_PHYES STANDARD; PRT; 96 AA.
DE FERREDOXIN I.
Matches 32; Mismatches 22; Partials 19; Indels 2; Gaps 2;
*.. .. *.*.*** * * *.*****.*..**. * * .* . * * *...* *
Db 15 TIDCPDDTYVLDAAEEAGLDLPYSCRAGSCSSCAGKVTAGTVDQEDQSFLDDDQIEAGFV 74
Qy 39 TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDV 98
***.. * .* * *
Db 75 -LTCVAYPKGD-VTI 87
Qy 99 RLTCIGSPAADEVKI 113
RESULT 30 Score 232; Match 0.0%; Predicted No. 5.97e-31;
ID FER1_SYNP2 STANDARD; PRT; 96 AA.
DE FERREDOXIN I.
Matches 33; Mismatches 24; Partials 14; Indels 2; Gaps 2;
. .. ****..* *** * *******. **. . * .* * * *...* * *
Db 17 DAPDDEYILDSAGDAGYDLPASCRAGACSTCAGKIVSGTVDQSEQSFLDDDQIEAGYV-L 75
Qy 41 EVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRL 100
***. * .* * *
Db 76 TCIAYPQSD-VTI 87
Qy 101 TCIGSPAADEVKI 113
Search completed: Sat Feb 3 21:13:27 1996
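The %Match column of the BLITZ summary table appears to be simply the score expressed as a percentage of the "Perfect Score" (1009 for this query, i.e. the score of the query aligned against itself). The sketch below checks that reading against several rows of the table; it is an inference from the numbers shown here, not taken from the MPsrch documentation.

```python
PERFECT_SCORE = 1009  # "Perfect Score" from the Parameters line of the output


def percent_match(score, perfect=PERFECT_SCORE):
    """Score as a percentage of the perfect (self-match) score, one decimal."""
    return round(100.0 * score / perfect, 1)


# (score, %Match) pairs copied from the summary table above.
rows = [(1009, 100.0), (877, 86.9), (291, 28.8), (275, 27.3), (252, 25.0), (215, 21.3)]

for score, reported in rows:
    print(f"score {score:5d}  computed {percent_match(score):5.1f}  reported {reported:5.1f}")
```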
BLITZ format
BLAZE
FLASH
BLOSUM 62
BEGIN
>HALHA FER # mandatory title
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWP
FSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHL
DYLQNRVI
1
A PAM or BLOSUM matrix can be chosen. Other options
include ALIGNMENTS n (the number of alignments to
report, up to 10000), THRESHOLD n (only scores greater than n
are reported [30]), PENALTY, VERBOSE, SEQUENCES,
KEY XMATCH, SOURCE PROTEIN and TARGET SP.
The query sequence must be shorter than 1500 characters.
FLASH output
ELAPSED WALL TIME: 18.0 secs
TOTAL CPU TIME OVER ALL SERVERS: 13.4 secs
Alignments for sequence: HALHA-FER--#-MANDATORY-TITLE
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGACAN
CASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRVI
Score Matrix: BLOSUM62
Max Reported Alignments: 10000
Score Threshold At: 30
Id Label NRes Score NrmSc Match% Peak NrmPeak
----------------------------------------------------------------------------
0. FER_HALHA 128 677 127 100% 125 210
1. FER_HALSP 128 579 106 84% 125 209
2. FER_SYNSP 87 205 39 48% 90 158
3. FER_SYNP4 71 199 37 52% 90 157
4. FER_SYNLI 71 196 36 52% 90 158
5. FER1_ANAVA 71 189 34 49% 98 179
6. FER1_ANASP 72 189 35 50% 98 179
7. FER_NOSMU 72 186 34 48% 95 173
8. FER1_NOSMU 71 180 33 46% 86 149
9. FER_CHLFR 78 178 34 44% 81 151
10. FER1_SYNP7 71 178 34 47% 87 163
11. FER1_CYAPA 71 178 33 49% 87 163
12. FER1_CYACA 71 176 32 46% 87 163
13. FER2_NOSMU 77 174 32 42% 80 148
14. FER1_RAPSA 94 173 32 41% 81 151
15. FER_EUGVI 71 172 32 46% 90 157
16. FER_MASLA 86 171 31 44% 81 151
17. FER2_SPIOL 70 171 31 48% 84 155
18. FER_MARPO 70 168 31 45% 79 141
19. FER_SYNY4 77 167 32 42% 81 151
20. FER1_EQUTE 72 167 30 48% 93 167
21. FER_SPIPL 71 166 30 43% 81 151
22. FER2_RAPSA 81 166 31 41% 81 151
23. FER_SYNY3 71 165 30 42% 81 151
24. FER_SPIMA 71 165 30 43% 81 151
25. FER_RHOPL 70 165 30 44% 87 154
26. FER1_EQUAR 72 165 30 48% 93 167
27. FER_PORUM 69 163 30 46% 80 150
28. FER_GLEJA 71 163 30 42% 78 141
29. FER_SCEQU 71 162 29 43% 82 151
30. FER1_APHFL 71 162 29 45% 81 151
................................................................................
...............................Material deleted.................................
................................................................................
112. VG56_HSVI1 26 31 7 34% 26 60
113. NUPL_XENLA 23 31 6 26% 30 64
114. YJAC_ECOLI 32 30 7 31% 26 60
-----------------------------------------------------------------------------
1. FER_HALHA
Abs. Alignment: 2934561
N. Residues: 128
Sequence Score: 677
Normalized Score: 127
Score/Residues: 5.28906
Exact Matches: 128
Exact Match%: 100
Conservative Matches: 0
Conservative Matches%: 0
Total Matches%: 100
Mismatches: 0
Peak Score: 125
Normalized Peak Score: 21
1 21 41 61
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGACAN
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGACAN
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGACAN
1 21 41 61
71
CASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRV
CASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRV
CASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRV
71
-----------------------------------------------------------------------------
2. FER_HALSP
Abs. Alignment: 2934700
N. Residues: 128
Sequence Score: 579
Normalized Score: 106.883
Score/Residues: 4.52344
Exact Matches: 108
Exact Match%: 84
Conservative Matches: 10
Conservative Matches%: 7
Total Matches%: 92
Mismatches: 10
Peak Score: 125
Normalized Peak Score: 21
1 21 41 61
PTVEYLNYETLDDQGWDMDDDDLFEKAADAGLDGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGACAN
PTVEYLNYE +DD GWDM DDD+F +A+D LD EDYG++EV EGEYILEAAEAQGYDWPFSCRAGACAN
PTVEYLNYEVVDDNGWDMYDDDVFGEASDMDLDDEDYGSLEVNEGEYILEAAEAQGYDWPFSCRAGACAN
1 21 41 61
71
CASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAADEVKIVYNAKHLDYLQNRV
CA+IV EG+IDMDMQQILSDEEVE+K+VRLTCIGSP ADEVKIVYNAKHLDYLQNRV
CAAIVLEGDIDMDMQQILSDEEVEDKNVRLTCIGSPDADEVKIVYNAKHLDYLQNRV
71
-----------------------------------------------------------------------------
3. FER_SYNSP
Abs. Alignment: 2937680
N. Residues: 87
Sequence Score: 205
Normalized Score: 39.7063
Score/Residues: 2.35632
Exact Matches: 42
Exact Match%: 48
Conservative Matches: 14
Conservative Matches%: 16
Total Matches%: 64
Mismatches: 31
Peak Score: 90
Normalized Peak Score: 15.8253
33 53 73 93
DGEDYGTMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTC
DG + T++V E EYIL+ AE QG D PFSCRAGAC+ CA + EGE+D Q L D+++E K LTC
DGSE-TTIDVPEDEYILDVAEEQGLDLPFSCRAGACSTCAGKLLEGEVDQSDQSFLDDDQIE-KGFVLTC
11 30 50 70
103
IGSPAADEVKIVYNAKH
+ P +D KI+ N +
VAYPRSD-CKILTNQEE
79
-----------------------------------------------------------------------------
4. FER_SYNP4
Abs. Alignment: 2937572
N. Residues: 71
Sequence Score: 199
Normalized Score: 37.642
Score/Residues: 2.80282
Exact Matches: 37
Exact Match%: 52
Conservative Matches: 12
Conservative Matches%: 16
Total Matches%: 69
Mismatches: 22
Peak Score: 90
Normalized Peak Score: 15.75
39 59 79 99
TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAA
T+EV + EYIL+ AE +G D P+SCRAGAC+ CA +KEGEID Q L D+++E LTC+ PA+
TIEVPDDEYILDVAEEEGIDLPYSCRAGACSTCAGKIKEGEIDQSDQSFLDDDQIE-AGYVLTCVAYPAS
17 37 57 76
109
D
D
D
86
-----------------------------------------------------------------------------
5. FER_SYNLI
Abs. Alignment: 2937463
N. Residues: 71
Sequence Score: 196
Normalized Score: 36.542
Score/Residues: 2.76056
Exact Matches: 37
Exact Match%: 52
Conservative Matches: 11
Conservative Matches%: 15
Total Matches%: 67
Mismatches: 23
Peak Score: 90
Normalized Peak Score: 15.8253
39 59 79 99
TMEVAEGEYILEAAEAQGYDWPFSCRAGACANCASIVKEGEIDMDMQQILSDEEVEEKDVRLTCIGSPAA
T++V E EYIL+ AE QG D PFSCRAGAC+ CA + EGE+D Q L D+++E K LTC+ P +
TIDVPEDEYILDVAEEQGLDLPFSCRAGACSTCAGKLLEGEVDQSDQSFLDDDQIE-KGFVLTCVAYPRS
15 35 55 74
109
D
D
D
84
.................................................................
................Lots of Material deleted.........................
.................................................................
-----------------------------------------------------------------------------
114. NUPL_XENLA
Abs. Alignment: 5912326
N. Residues: 23
Sequence Score: 31
Normalized Score: 6.73889
Score/Residues: 1.34783
Exact Matches: 6
Exact Match%: 26
Conservative Matches: 8
Conservative Matches%: 34
Total Matches%: 60
Mismatches: 9
Peak Score: 30
Normalized Peak Score: 6.40556
81 101 32
DMDMQQILSDEEVEEKDVRLTCI
+ ++ +I++ EE EK V + +
EFNIVEIVTQEEGAEKSVPIATL
59 79 2
-----------------------------------------------------------------------------
115. YJAC_ECOLI
Abs. Alignment: 10405998
N. Residues: 32
Sequence Score: 30
Normalized Score: 7.21032
Score/Residues: 0.9375
Exact Matches: 10
Exact Match%: 31
Conservative Matches: 5
Conservative Matches%: 15
Total Matches%: 46
Mismatches: 17
Peak Score: 26
Normalized Peak Score: 6.09365
86 106 -2
QILSDEEVEEKDVRLTCIGSPAADEVKIVYNA
+ LS + V + + AD +KIV NA
KFLSAKNRTSSHVLYHVMANGDADMLKIVLNA
362 382 277
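Several of the fields in each FLASH record can be derived from three counts: the number of aligned residues, the exact matches and the conservative matches. The sketch below reproduces the derived fields for three of the records above. The percentages in this output look truncated rather than rounded (for example 12/71 is reported as 16%), so integer division is used; that is an inference from the numbers, and the class and method names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class FlashHit:
    label: str
    n_residues: int
    score: int
    exact: int
    conservative: int

    def score_per_residue(self):
        return self.score / self.n_residues

    def exact_pct(self):
        return 100 * self.exact // self.n_residues

    def conservative_pct(self):
        return 100 * self.conservative // self.n_residues

    def total_pct(self):
        # Computed from the counts, not by adding the two truncated percentages.
        return 100 * (self.exact + self.conservative) // self.n_residues

    def mismatches(self):
        return self.n_residues - self.exact - self.conservative


hits = [
    FlashHit("FER_HALSP", 128, 579, 108, 10),
    FlashHit("FER_SYNP4", 71, 199, 37, 12),
    FlashHit("NUPL_XENLA", 23, 31, 6, 8),
]

for h in hits:
    print(h.label, f"{h.score_per_residue():.5f}", h.exact_pct(),
          h.conservative_pct(), h.total_pct(), h.mismatches())
```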
FLASH format
BLOCKS
> Ferredoxin
GIDPNYRTHKPVVGDSSGHKIYGPVESPKVLGVHGTIVGVDFDLCIADGSCITACPVNVF
QWYETPGHPASEKKADPVNQQACIFCMACVNVCPVAAIDVKPP
BLOCKS output
Query=Ferredoxin ,
Size=103 Amino Acids
Database=mats.dat, Blocks Searched=3179
1.------------------------------------------------------------------------
Block Rank Frame Score Strength Location (aa) Description
BL00198 1 0 1271 1239 83- 94 4Fe-4S ferredoxins, iron
BL00198 2 0 1121 1239 45- 56 4Fe-4S ferredoxins, iron
1271=99.91th percentile of anchor block scores for shuffled queries
P not calculated for single block BL00198
Maximum number of repeats (from Prosite MAX-REPEAT) = 4
1 non-overlapping repeats in support of BL00198
BL00198 <-> (7,440):82
FER_SULAC 83 CIFCMACVNVCP
||||||||||||
Ferredoxin 83 CIFCMACVNVCP
45 ciadgscitacp
2.------------------------------------------------------------------------
Block Rank Frame Score Strength Location (aa) Description
BL00596A 3 0 1020 1367 70- 89 High potential iron-sulf
BL00596A 60 0 943 1367 73- 92 High potential iron-sulf
1020=13.70th percentile of anchor block scores for shuffled queries
P not calculated for single block BL00596A
|--- 22 amino acids---|
BL00596 AAAAAAAAAAAAAAAAAAAAAAA::::::::::::::::::::........BBBBBBBBB
Ferredoxin AAAAAAAAAAAAAAAAAAAAAAA
Ferredoxin < AAAAAAAAAAAAAAAAAAAAAAA
BL00596A <->A (8,29):69
HPI1_ECTVA 21 ASVDHPSHAAGQKCINCLLY
|| | || |
Ferredoxin 70 ASeKkAdPVNqQaCIfCMac
3.------------------------------------------------------------------------
Block Rank Frame Score Strength Location (aa) Description
BL00590C 4 0 1006 1825 51- 100 LIF / OSM family protein
1006=2.56th percentile of anchor block scores for shuffled queries
P not calculated for single block BL00590C
|--- 93 amino acids---|
BL00590 AAAAAAAAAAAAAAA...........BBBBBBBBBBBBB.......CCCCCCCCCCCCC
Ferredoxin :::::::::::::CCCCCCCCCCCCC
BL00590C <->C (119,193):50
LIF_HUMAN 153 CRLCSKYHVGHVDVTYGPDTSGKDVFQKKKLGCQLLGKYKQIIAVLAQAF
| | | | | | | | |
Ferredoxin 51 CiTAcPvnVfqwyeTPGhPaSeKkAdpvnqqaCiFcmacvnVcpVaAidv
4.------------------------------------------------------------------------
Block Rank Frame Score Strength Location (aa) Description
BL00987B 5 0 1005 2124 1- 45 6-pyruvoyl tetrahydropte
BL00987D 320 0 904 2010 26- 75 6-pyruvoyl tetrahydropte
1005=2.10th percentile of anchor block scores for shuffled queries
P not calculated for single block BL00987B
|--- 60 amino acids---|
BL00987 AAAA:::BBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCDDDDDDDDDDDDDDDDDDDDD
Ferredoxin BBBBBBBBBBBBBBBBBBB
Ferredoxin DDDDDDDDDDDDDDDDDDDDD
BL00987B <->B (17,18):0
PTPS_RAT 18 SFSASHRLHSPSLSAEENLKVFGKCNNPNGHGHNYKVVVTIHGEI
| | | | | | | |
Ferredoxin 1 gidpnyRtHkPvvgDssghKiyGpvesPkvlGvhgtiVgvdfdlc
5.------------------------------------------------------------------------
Block Rank Frame Score Strength Location (aa) Description
BL00144A 6 0 1001 1327 30- 39 Asparaginase / glutamina
1001=0.44th percentile of anchor block scores for shuffled queries
P not calculated for single block BL00144A
|--- 103 amino acids---|
BL00144 AA:::::::::::::::..BBBBB:........................CCCCCCCCCCC
Ferredoxin AA
BL00144A <->A (5,57):29
ASG1_YEAST 58 ILGTGGTIAS
|| |||
Ferredoxin 30 VLGvhGTIvG
6.------------------------------------------------------------------------
Block Rank Frame Score Strength Location (aa) Description
BL00261A 7 0 1001 1521 22- 53 Glycoprotein hormones be
BL00261A 114 0 927 1521 26- 57 Glycoprotein hormones be
BL00261B 213 0 912 1614 41- 83 Glycoprotein hormones be
BL00261B 381 0 900 1614 51- 93 Glycoprotein hormones be
BL00261B 397 0 899 1614 54- 96 Glycoprotein hormones be
1001=0.44th percentile of anchor block scores for shuffled queries
P< 0.053 for BL00261B in support of BL00261A
|--- 39 amino acids---|
BL00261 AAAAAAAAAAAAAAAAAAAA::::::......BBBBBBBBBBBBBBBBBBBBBBBBBBB
Ferredoxin AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBB
Ferredoxin < AAAAAAAAAAAAAAAAAAAA
Ferredoxin < BBBBBBBBBBBBBBBBBBBBBBBBBBB
Ferredoxin < BBBBBBBBBBBBBBBBBBBBBBBBBBB
BL00261A <->A (2,55):21
GTH2_ANGAN 30 CEPINETISVEKDGCPKCLVFQTSICSGHCIT
| | | | | |||
Ferredoxin 22 YGPVEspkvLgVhGtiVgVdFdlcIadGsCIT
BL00261B A<->B (9,19):0
TSHB_ANGAN 73 TYQAVEYRTAELPGCPPHVDPRFSYPVALHCTCRACDPARDEC
| || | | ||
Ferredoxin 54 acpVnVFqWyEtPGHPaSekkadpVnqqacifCmACvnvcpva
7.------------------------------------------------------------------------
Block Rank Frame Score Strength Location (aa) Description
BL01039A 8 0 1001 1283 35- 50 Bacterial extracellular
1001=0.44th percentile of anchor block scores for shuffled queries
P not calculated for single block BL01039A
|--- 94 amino acids---|
BL01039 AAAA:::.BBBBBBBBBB::.......................................C
Ferredoxin AAAA
BL01039A <->A (18,65):34
ARTI_ECOLI 41 NQIVGFDVDLAQALCK
||| | || |
Ferredoxin 35 GtIVGvDfDLCiAdgs
7 possible hits reported
Query=>YCZ2_YEAST HYPOTHETICAL 40.1 KD PROTEIN IN HMR 3'REGION.,
Size=368 Amino Acids
Database=/data/blocks_6.0/blocks.dat, Blocks Searched=2302
1.----------------------------------------------------------------------------
Block Rank Frame Score Strength Location Description
BL00059A 1 1 1310 2439 2- 42 Zinc-containing alcohol dehyd
BL00059A 371 1 825 2439 0- 40 Zinc-containing alcohol dehyd
BL00059B 15 1 984 1967 52- 77 Zinc-containing alcohol dehyd
BL00059C 105 1 891 2795 77- 134 Zinc-containing alcohol dehyd
BL00059D 2 1 1232 2388 174- 229 Zinc-containing alcohol dehyd
1310=98.5th percentile of anchor block scores for shuffled queries
P<1.4e-06 for BL00059D BL00059B in support of BL00059A
|----- 108 residues----|
BL00059 AAAAAAAAA::.BBBBBB::........CCCCCCCCCCCCC:::...DDDDDDDDDDDDD
>YCZ2_YEAS AAAAAAAAA::BBBBBB::::::::::::::::::::::DDDDDDDDDDDDD
>YCZ2_YEAS
BLOCKS format
Getting the Block
BLOCK BL00198: 4FE4S_FERREDOXIN
4Fe-4S ferredoxins, iron-sulfur binding region signature.
Prosite documentation
{PDOC00176}
{PS00198; 4FE4S_FERREDOXIN}
{BEGIN}
************************************************************
* 4Fe-4S ferredoxins, iron-sulfur binding region signature *
************************************************************
Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron
transfer in a wide variety of metabolic reactions. Ferredoxins can be
divided into several subgroups depending upon the physiological nature of the
iron-sulfur cluster(s). One of these subgroups comprises the 4Fe-4S ferredoxins,
which are found in bacteria and which are thus often referred to as
'bacterial-type' ferredoxins. The structure of these proteins [2] consists of
the duplication of a domain of twenty six amino acid residues; each of these
domains contains four cysteine residues that bind to a 4Fe-4S center.
A number of proteins have been found [3] that include one or more 4Fe-4S
binding domains similar to those of bacterial-type ferredoxins. These proteins
are listed below (references are only provided for recently determined
sequences).
- The iron-sulfur proteins of the succinate dehydrogenase and the fumarate
reductase complexes (EC 1.3.99.1). These enzyme complexes, which are
components of the tricarboxylic acid cycle, each contain three subunits: a
flavoprotein, an iron-sulfur protein, and a b-type cytochrome. The iron-
sulfur proteins contain three different iron-sulfur centers: a 2Fe-2S, a
3Fe-3S and a 4Fe-4S.
- Escherichia coli anaerobic glycerol-3-phosphate dehydrogenase (EC 1.1.99.5).
This enzyme is composed of three subunits: A, B, and C. The C subunit seems
to be an iron-sulfur protein with two ferredoxin-like domains in the N-
terminal part of the protein.
- Escherichia coli anaerobic dimethyl sulfoxide reductase. The B subunit of
this enzyme (gene dmsB) is an iron-sulfur protein with four 4Fe-4S
ferredoxin-like domains.
- Escherichia coli formate hydrogenlyase. Two of the subunits of this
oligomeric complex (genes hycB and hycF) seem to be iron-sulfur proteins
that each contain two 4Fe-4S ferredoxin-like domains.
- Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). This enzyme
is used by the archaebacteria to grow on formate. The beta chain of this
dimeric enzyme probably binds two 4Fe-4S centers.
- Escherichia coli formate dehydrogenases N and O (EC 1.2.1.2). The beta
chains of these two enzymes (genes fdnH and fdoH) are iron-sulfur proteins
with four 4Fe-4S ferredoxin-like domains.
- Desulfovibrio periplasmic [Fe] hydrogenase (EC 1.18.99.1). The large chain
of this dimeric enzyme binds three 4Fe-4S centers, two of which are located
in the ferredoxin-like N-terminal region of the protein.
- Methanobacterium thermoautotrophicum methyl viologen-reducing hydrogenase
subunit mvhB, which contains six tandemly repeated ferredoxin-like domains
and which probably binds twelve 4Fe-4S centers.
- Salmonella typhimurium anaerobic sulfite reductase (EC 1.8.1.-) [4]. Two of
the subunits of this enzyme (genes asrA and asrC) seem to both bind two
4Fe-4S centers.
- A ferredoxin-like protein (gene fixX) from the nitrogen-fixation genes
locus of various Rhizobium species, and one from the Nif-region of
Azotobacter species.
- The 9 Kd polypeptide of chloroplast photosystem I [5] (gene psaC). This
protein contains two low potential 4Fe-4S centers, referred to as the A and B
centers.
- The chloroplast frxB protein which is predicted to carry two 4Fe-4S centers.
- A ferredoxin from a primitive eukaryote, the enteric amoeba Entamoeba
histolytica.
- Escherichia coli hypothetical protein yjjW, a protein with an N-terminal
region belonging to the radical activating enzymes family (see <PDOC00834>)
and two potential 4Fe-4S centers.
The pattern of cysteine residues in the iron-sulfur region is sufficient to
detect this class of 4Fe-4S binding proteins.
-Consensus pattern: C-x(2)-C-x(2)-C-x(3)-C-[PEG]
[The four C's are 4Fe-4S ligands]
-Sequences known to belong to this class detected by the pattern: the majority
of known 4Fe-4S sequences, with at least 5 exceptions.
-Other sequence(s) detected in SWISS-PROT: 14.
-Note: in some bacterial ferredoxins, one of the two duplicated domains has
lost one or more of the four conserved cysteines. The consequence of such
variations is that these domains have either lost their iron-sulfur binding
property or bind to a 3Fe-3S center instead of a 4Fe-4S center.
-Note: the last residue of this pattern in most proteins belonging to this
group, is a Pro; the only exceptions are the Rhizobium ferredoxin-like
proteins which have Gly, and two Desulfovibrio ferredoxins which have Glu. It
must also be noted that the three non 4Fe-4S-binding proteins which are
picked-up by the pattern have Gly in this position of the pattern.
-Last update: November 1995 / Text revised.
[ 1] Meyer J.
Trends Ecol. Evol. 3:222-226(1988).
[ 2] Otaka E., Ooi T.
J. Mol. Evol. 26:257-267(1987).
[ 3] Beinert H.
FASEB J. 4:2483-2492(1990).
[ 4] Huang C.J., Barrett E.L.
J. Bacteriol. 173:1544-1553(1991).
[ 5] Knaff D.B.
Trends Biochem. Sci. 13:460-461(1988).
//
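The consensus pattern in the entry above, C-x(2)-C-x(2)-C-x(3)-C-[PEG], can be tried directly on the ferredoxin query used for the BLOCKS search. The sketch below converts the pattern into a regular expression and scans the sequence. The converter is deliberately minimal and is an assumption of this example: it handles only literal residues, x, x(n) and [..] classes, not the full PROSITE syntax (no {..} exclusions, x(n,m) ranges or anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)\))?", element)
        if wildcard:
            count = wildcard.group(1)
            parts.append("." if count is None else ".{%s}" % count)
        else:
            parts.append(element)  # literal residue or [..] residue class
    return "".join(parts)


def scan(pattern, sequence):
    """Return (start, end, matched text) for every match, 1-based, overlaps allowed."""
    regex = re.compile("(?=(%s))" % prosite_to_regex(pattern))
    return [(m.start() + 1, m.start() + len(m.group(1)), m.group(1))
            for m in regex.finditer(sequence)]


ferredoxin = (
    "GIDPNYRTHKPVVGDSSGHKIYGPVESPKVLGVHGTIVGVDFDLCIADGSCITACPVNVF"
    "QWYETPGHPASEKKADPVNQQACIFCMACVNVCPVAAIDVKPP"
)
print(prosite_to_regex("C-x(2)-C-x(2)-C-x(3)-C-[PEG]"))
print(scan("C-x(2)-C-x(2)-C-x(3)-C-[PEG]", ferredoxin))
```

On this sequence the strict pattern matches only the segment at residues 83-94 (CIFCMACVNVCP), which is the top-ranked BL00198 hit. The second-ranked segment at 45-56 (CIADGSCITACP) has Asp rather than Cys at the second cysteine position and so is not matched by the pattern, although the block search still scores it.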
Quick Search
MATCH 90
BEST
TITLE This is an example
SEQ
AAACCATATAGGCCCTTTT
SSearch
Why you should routinely check your sequence
With such good jokers in the world as these gentlemen are, you don't want to get caught by them.
Fact and fiction in alignment.