Release 12.4
Published October 23, 2007
Headlines
More controlled vocabulary in the 'Subcellular location' subsection
Over 160,000 UniProtKB/Swiss-Prot entries (56%) contain a subcellular location description in the General Annotation section (CC lines in the flat file). We have standardized the content of these comments and, at the same time, created a controlled vocabulary and a new, parsable flat-file format.
The subcellular location controlled vocabularies are stored in a new document (subcell.txt) which provides, for each individual UniProtKB location, topology or orientation term, the corresponding definition, as well as other relevant information, such as synonyms, hierarchies or mapped GO terms.
The format of the 'Subcellular location' subtopic has changed from free text to a more structured format. When required for the accurate description of a complex biological situation, free text is still used in the 'Note' (see for example O43918). In addition, since release 11.0, this subsection can occur more than once per entry, allowing specific annotation for each isoform, chain or peptide in separate subsections.
UniProtKB News
New document listing the controlled vocabularies used in the 'Subcellular location' subsection
The document subcell.txt, available by FTP and on the Web site, lists the controlled vocabularies used in the 'Subcellular location' subsection (CC SUBCELLULAR LOCATION lines in the flat file), their definitions and further information such as synonyms or relevant GO terms in the following format:
---------  --------------------------------  -------------------------------
Line code  Content                           Occurrence in an entry
---------  --------------------------------  -------------------------------
ID         Identifier (location)             Once; starts an entry
IT         Identifier (topology)             Once; starts a 'topology' entry
IO         Identifier (orientation)          Once; starts an 'orientation' entry
AC         Accession (SL-xxxx)               Once
DE         Definition                        Once or more
SY         Synonyms                          Optional; Once or more
SL         Content of subc. loc. lines       Once
HI         Hierarchy ('is-a')                Optional; Once or more
HP         Hierarchy ('part-of')             Optional; Once or more
KW         Associated keyword (accession)    Optional; Once or more
GO         Gene ontology (GO) mapping        Optional; Once or more
WW         Interesting links or references   Optional; Once or more
//         Terminator                        Once; ends an entry
Example:
ID Cyanelle.
AC SL-0082
DE A cyanelle is a photosynthetic organelle of glaucocystophyte algae.
DE Cyanelles are surrounded by a double membrane and, in between, a
DE peptidoglycan wall. Thylakoid membrane architecture and the presence
DE of carboxysomes are cyanobacteria-like. Historically, the term
DE cyanelle is derived from a classification as endosymbiotic
DE cyanobacteria, and thus is not fully correct.
SY Muroplast; Cyanoplast.
SL Plastid, cyanelle.
HI Plastid.
KW KW-0194
GO GO:0009842; cyanelle
//
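The line-code layout above lends itself to a simple line-oriented reader. The following is a minimal sketch, not an official parser: the function names parse_subcell and entry_kind are hypothetical, and the code assumes that the input contains only entries (any descriptive header in the real file would have to be skipped first) and that each line starts with its line code followed by whitespace.

```python
from collections import defaultdict

# Line codes that start an entry, as listed in the table above.
IDENTIFIER_CODES = {"ID": "location", "IT": "topology", "IO": "orientation"}


def parse_subcell(text):
    """Yield one dict per entry, mapping line code -> list of line contents.

    Assumption: an entry that is not closed by '//' is silently dropped.
    """
    entry = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("//"):
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
            continue
        code, _, content = line.partition(" ")
        entry[code].append(content.strip())


def entry_kind(entry):
    """Return 'location', 'topology' or 'orientation' for a parsed entry."""
    for code, kind in IDENTIFIER_CODES.items():
        if code in entry:
            return kind
    return None


SAMPLE = """\
ID Cyanelle.
AC SL-0082
DE A cyanelle is a photosynthetic organelle of glaucocystophyte algae.
DE Cyanelles are surrounded by a double membrane and, in between, a
DE peptidoglycan wall. Thylakoid membrane architecture and the presence
DE of carboxysomes are cyanobacteria-like. Historically, the term
DE cyanelle is derived from a classification as endosymbiotic
DE cyanobacteria, and thus is not fully correct.
SY Muroplast; Cyanoplast.
SL Plastid, cyanelle.
HI Plastid.
KW KW-0194
GO GO:0009842; cyanelle
//
"""

entries = list(parse_subcell(SAMPLE))
first = entries[0]
definition = " ".join(first["DE"])  # DE may span several lines
synonyms = [s.strip() for s in first["SY"][0].rstrip(".").split(";")]
print(entry_kind(first), first["AC"][0], synonyms)
```

Keeping every line code as a list reflects the 'Once or more' occurrences in the table; codes that occur once simply yield a one-element list.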
Syntax modification of the 'Subcellular location' subtopic
We have structured the 'Subcellular location' subtopic (CC SUBCELLULAR LOCATION lines in the flat file) in order to improve the consistency of annotation and to make its content parsable.
The new format of SUBCELLULAR LOCATION in the flat file is:
CC -!- SUBCELLULAR LOCATION:(( Molecule:)?( Location\.)+)?( Note=Free_text( Flag)?\.)?
Where:
- Molecule: Isoform, chain or peptide name
- Location = Subcellular_location( Flag)?(; Topology( Flag)?)?(; Orientation( Flag)?)?
  - Subcellular_location: SL-line of subcell.txt ID-record
  - Topology: SL-line of subcell.txt IT-record
  - Orientation: SL-line of subcell.txt IO-record
- Flag = \((By similarity|Probable|Potential)\)
Note: Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?) or may occur 1 or more times (+). Alternative values are separated by a pipe symbol (|).
Examples:
P32755:
CC -!- SUBCELLULAR LOCATION: Cytoplasm. Endoplasmic reticulum membrane;
CC Peripheral membrane protein. Golgi apparatus membrane; Peripheral
CC membrane protein.
Q96QV1:
CC -!- SUBCELLULAR LOCATION: Cell membrane; Peripheral membrane protein
CC (By similarity). Secreted (By similarity). Note=The last 22 C-
CC terminal amino acids may participate in cell membrane attachment.
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm (Probable).
P35670:
CC -!- SUBCELLULAR LOCATION: Golgi apparatus, trans-Golgi network
CC membrane; Multi-pass membrane protein (By similarity).
CC Note=Predominantly found in the trans-Golgi network (TGN). Not
CC redistributed to the plasma membrane in response to elevated
CC copper levels.
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm.
CC -!- SUBCELLULAR LOCATION: WND/140 kDa: Mitochondrion.
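The format above can be read with a few regular expressions. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the comment has already been unwrapped into a single string (with the 'CC' prefixes and the 'SUBCELLULAR LOCATION:' label removed), that molecule names contain no colon, and that location terms contain no period or semicolon. Because topology and orientation terms cannot be told apart by syntax alone, each location is returned as a list of (term, flag) pairs; assigning a role to each term would require a lookup in subcell.txt.

```python
import re

FLAG_RE = re.compile(r"\s*\((By similarity|Probable|Potential)\)$")
NOTE_RE = re.compile(r"(?:^|\s)Note=(?P<note>.*)$")
MOLECULE_RE = re.compile(r"^(?P<molecule>[^:]+):\s*(?P<rest>.*)$")


def split_flag(text):
    """Split 'Term (Flag)' into (term, flag); flag is None when absent."""
    text = text.strip()
    match = FLAG_RE.search(text)
    if match:
        return text[:match.start()], match.group(1)
    return text, None


def parse_subcellular_location(comment):
    """Parse the unwrapped text that follows 'SUBCELLULAR LOCATION:'."""
    text = " ".join(comment.split())  # normalize whitespace

    # The note, if present, runs to the end of the comment.
    note, note_flag = None, None
    note_match = NOTE_RE.search(text)
    if note_match:
        # The closing period is dropped before looking for a flag.
        note, note_flag = split_flag(note_match.group("note").rstrip("."))
        text = text[:note_match.start()]

    # Optional 'Molecule:' prefix (assumed to be the only colon left).
    molecule = None
    molecule_match = MOLECULE_RE.match(text)
    if molecule_match:
        molecule = molecule_match.group("molecule")
        text = molecule_match.group("rest")

    # Each location ends with a period; its terms are separated by ';'.
    locations = []
    for chunk in text.split("."):
        if chunk.strip():
            locations.append([split_flag(part) for part in chunk.split(";")])

    return {"molecule": molecule, "locations": locations,
            "note": note, "note_flag": note_flag}


# First comment of Q96QV1, unwrapped by hand from the example above.
Q96QV1_COMMENT = (
    "Cell membrane; Peripheral membrane protein (By similarity). "
    "Secreted (By similarity). Note=The last 22 C-terminal amino acids "
    "may participate in cell membrane attachment."
)

result = parse_subcellular_location(Q96QV1_COMMENT)
isoform = parse_subcellular_location("Isoform 2: Cytoplasm (Probable).")
print(result["locations"])
print(isoform["molecule"], isoform["locations"])
```

The note is split off first because free text may itself contain periods, semicolons or colons that would otherwise be mistaken for separators.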
Modification of the EC (Enzyme Commission) number format
EC numbers are used to describe enzyme reactions and are based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). The EC numbers and the reactions they describe are stored in the ENZYME and IntEnz databases.
In the UniProt Knowledgebase, some enzymes are assigned so-called partial EC numbers, in which one or more of the numbers are replaced by dashes (e.g. EC 3.4.24.-). This happens in the following situations:
- The catalytic activity of the protein is not known exactly.
- The protein catalyzes a reaction that is known, but not yet included in the IUBMB EC list.
To distinguish these two cases, we have started to use the letter 'n' followed by a preliminary number instead of a dash ('-') in the latter case. Existing partial EC numbers of UniProtKB proteins that catalyze a known reaction not yet included in the IUBMB EC list will be updated progressively.
Examples:
The catalytic activity of the protein is not known exactly:
Q9VAC5:
DE ADAM 17-like protease precursor (EC 3.4.24.-).
The protein catalyzes a reaction that is known, but not yet included in the IUBMB EC list:
Q01468:
DE 4-oxalocrotonate tautomerase (EC 5.3.2.n1) (4-OT).
Q8IV42:
DE L-seryl-tRNA(Sec) kinase (EC 2.7.1.n3) (O-phosphoseryl-tRNA(Sec)
DE kinase) (PSTK).
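Code that consumes EC numbers may need to tell the three forms apart. The following is a minimal sketch under stated assumptions: the function names and the category labels are hypothetical, the 'n' number is assumed to appear only in the last position (as in the examples above), and the pattern does not check that dashes occupy only trailing positions.

```python
import re

# Four dot-separated fields; the last may be a number, 'n' plus a
# preliminary number, or a dash.
EC_RE = re.compile(r"^(?:EC\s+)?(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|n\d+|-)$")

# EC numbers as they appear in parentheses in DE lines.
DE_EC_RE = re.compile(r"\(EC ([^)]+)\)")


def classify_ec(ec):
    """Return 'complete', 'partial', 'preliminary' or 'invalid'."""
    match = EC_RE.match(ec.strip())
    if not match:
        return "invalid"
    fields = match.groups()
    if fields[-1].startswith("n"):
        return "preliminary"  # known reaction, not yet in the IUBMB EC list
    if "-" in fields:
        return "partial"      # catalytic activity not known exactly
    return "complete"


def ec_numbers_in_de(de_text):
    """Extract the EC numbers quoted in a DE line."""
    return DE_EC_RE.findall(de_text)


for de in (
    "ADAM 17-like protease precursor (EC 3.4.24.-).",
    "4-oxalocrotonate tautomerase (EC 5.3.2.n1) (4-OT).",
):
    for ec in ec_numbers_in_de(de):
        print(ec, classify_ec(ec))
```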



