-
08/29/2006 - Genbank ENV addition
By request, we have added the Genbank ENV (Environmental Sample Sequences)
database.
-
07/01/2006 - Genbank EST removal
We have removed the Genbank EST division from our databases -
both the flat files and the Blastable databases. The EST division uses a
large amount of our resources, but does not get a lot of usage. Furthermore,
because of the limitations of the Biology Workbench, we cannot provide the
ability to do a lot of meaningful analysis with the EST database (maybe a
reason for its limited usage). Workbench users that want to do work with
the EST sequences can search or Blast against the EST databases at NCBI or
EBI, and then import the Fasta format EST sequences to the Biology Workbench.
-
05/12/2006 - Genbank Refseq changes
The complete Refseq database has grown too large for our system, and had to
be removed. We have replaced it with the individual divisions (for
example, mammalian organisms, plasmids, etc.), much like the regular Genbank
Releases. Also, like in Genbank, the Refseq updates for all divisions since
the last full release are all stored in one database.
-
12/05/2005 - PIR protein database removal
We have removed the PIR protein database from our system. PIR is
now part of the UniProt consortium, and it is no longer being maintained
as a separate database. The SwissProt and TrEMBL databases should be used
as a replacement for PIR.
-
09/23/2005 - Genbank GSS and HTG removal
We have removed the GenbankGSS (Genomic Survey Sequences) and
GenbankHTG (High Throughput Genomic) databases. Very few people used
these databases and sequences, and we are running out of available disk
space for our local databases.
03/24/2005 - FUNDING UPDATE:
The Biology Workbench Team learned this week that our proposal for new
development of the Biology Workbench has been approved, and funding will
begin around 4/1/2005. The team wishes to thank all of you who use the
Workbench and who wrote letters of support for it; your help and support
were instrumental in our success. We encourage you to flood us with input
and suggestions as we develop the Next Generation Biology Workbench. Watch
this space for news and updates surrounding our local development and beta
releases of the new site.
02/01/05 - New Genomic Databases were added:
honeybee (Apis mellifera), fly (Drosophila melanogaster)
02/01/05 - Most Genbank Bacterial Genomes databases were removed
We have removed most of the Genbank Bacterial Genomes databases, due to the
large number of them (we lack the disk space and the manpower to keep up).
If you wish to work with a certain bacterial genome database that is no
longer on the database list, please contact us: bwbhelp@sdsc.edu
01/10/05 - Restore Session tool
We have added a tool which restores session files from the last backup
of a user's sessions. Users that have their data accidentally deleted
can use this tool to get a copy of their session files as of the previous
evening. Users that have had their sessions deleted because of account
inactivity should be able to restore their work as of their last
Workbench visit.
07/10/03 - HMMPFAM update
The HMMER package (of which HMMPFAM is part) has been updated from version 2.2 to version 2.3.1. The new
version of HMMPFAM is significantly faster than the old one.
05/28/03 MitoProteome Protein List database
We have added a simplified version of the MitoProteome (mitochondrial protein
database, http://www.mitoproteome.org/): the Protein List. It contains almost
1000 human protein sequences, found experimentally and via public database
searches. Our version contains only the name, function, public database IDs
and protein sequence information. This database allows protein sequence
comparisons with Blast, FastA and other tools. The records are linked to the
MitoProteome website (http://www.mitoproteome.org) for more information on
the protein (disease, gene information, domains, interactions).
03/24/03 - EXTCOEF - extinction coefficient at 280 nm calculator
This tool calculates the extinction coefficient and absorption of a protein at 280 nm, based on the protein's
composition (number of Trp, Tyr and Cys residues).
The calculations are done based on the original formula by Gill and von Hippel (Anal. Biochem. 182, 319-326; 1989):
e = 5690 x (#W) + 1280 x (#Y) + 60 x (#C).
This formula assumes that ALL Cys residues appear as half cystines (i.e. involved in S-S bridges). Cysteine
residues do not absorb appreciably at wavelengths >260 nm, while cystine does.
A second formula is also used, in which there is no contribution from the Cys residues, equivalent to
assuming that none of the Cys residues appear as half cystines.
(Conditions: 6.0 M guanidinium hydrochloride; 0.02 M phosphate buffer; pH 6.5)
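As an illustration of the two formulas above, here is a minimal Python sketch. It is not the EXTCOEF source code; the function name extinction_coefficients is hypothetical, and the result is assumed to be in the usual units of M^-1 cm^-1. It only counts W, Y and C in a one-letter protein sequence and applies the coefficients quoted above.

```python
def extinction_coefficients(sequence):
    """Return (all_half_cystine, no_cystine) extinction coefficients at 280 nm.

    Hypothetical helper, not the Workbench implementation. It applies
    e = 5690 x (#W) + 1280 x (#Y) + 60 x (#C) and the variant without
    the Cys term.
    """
    seq = sequence.upper()
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    n_cys = seq.count("C")

    # Second formula: no Cys residue is part of a disulfide bridge.
    no_cystine = 5690 * n_trp + 1280 * n_tyr
    # First formula: every Cys residue is counted as a half cystine.
    all_half_cystine = no_cystine + 60 * n_cys
    return all_half_cystine, no_cystine


if __name__ == "__main__":
    with_ss, without_ss = extinction_coefficients("MWYCC")
    print("all Cys as half cystines:", with_ss)
    print("no Cys as half cystines: ", without_ss)
```

The two values bracket the real coefficient when only some of the Cys residues form S-S bridges.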
01/10/03 - Alliance for Cellular Signalling (AfCS) protein database
We have added a simplified version of the AfCS Molecule Page database (http://www.signaling-gateway.org/).
Our version contains only the AfCS ID, synonyms, category and protein sequence information for all the AfCS
proteins that have a protein sequence defined (a few do not). The proteins are mouse, human if mouse isn't
defined, rat if neither mouse nor human are defined, and in a few cases something else (e.g. cow, Drosophila)
when the molecule couldn't be defined with mouse, human, or rat.
There is a Blastable component to this database, for proteins -- so it can be accessed with BlastP or BlastX.
The records are linked to the AfCS-Nature Signaling Gateway Molecule Page website for more information on the
protein. One needs to have an account in order to access the Molecule Pages on that website.
12/19/02 - Swissprot/TrEMBL/TrEMBLnew blastable sub-databases for human and mouse
We have added as blastable databases subsets of the Swissprot/TrEMBL/TrEMBLnew databases that contain only human or
only mouse protein sequences. The combination of Swissprot, TrEMBL and TrEMBLnew human/mouse only subsets would
contain the vast majority of human/mouse protein sequences publicly available. This is useful when one wants to
blast against human protein sequences only, for example. The Swissprot and TrEMBL subsets have very little overlap
in sequences.
12/18/02 - PRIMERCHECK and PRIMERTM - new primer tools
PRIMERCHECK calculates the melting point, length and GC content of
a given short nucleic sequence, in particular, a primer.
GC content for any nucleic sequence can also be calculated with the NASTATS tool.
PRIMERTM designs primers of minimum length that start at the ends
of the two strands and that have a melting point above a minimum desired temperature Tm.
In both tools the sequence has to be selected from the user's data and should not contain characters other
than A, C, T, G. Salt and oligo/DNA concentrations WILL affect the calculations.
In both, the calculations are done essentially as described by Breslauer et al., P.N.A.S., 1986
and by Rychlik et al., N.A.R., 1990. For a more complex primer program see PRIMER3.
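The simpler quantities reported by PRIMERCHECK (length and GC content) can be sketched in a few lines of Python. This is an illustration only, not the tool's code: the function name primer_stats is hypothetical, and the melting point is deliberately left out because it depends on the thermodynamic parameters and the salt and oligo concentrations discussed above.

```python
def primer_stats(sequence):
    """Return (length, GC percent) for a primer made only of A, C, T, G.

    Hypothetical helper for illustration; it does not compute Tm.
    """
    seq = sequence.upper()
    invalid = set(seq) - set("ACGT")
    if not seq or invalid:
        # Mirrors the restriction above: only A, C, T, G are accepted.
        raise ValueError("sequence must be non-empty and contain only A, C, T, G")
    gc_count = seq.count("G") + seq.count("C")
    return len(seq), 100.0 * gc_count / len(seq)


if __name__ == "__main__":
    length, gc_percent = primer_stats("ATGCGCTA")
    print(length, gc_percent)
```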
12/16/02 - BLAST updated
The Blast package from NCBI has been updated from version 2.2.2 to version 2.2.5.
10/14/02 - Hetero-sequences removed from PDBFINDER blastable databases
We have removed the sequences that contained both nucleic and amino-acids from
the blastable databases PDBFINDER and PDBSEQRES, such as some ribosome and tRNA sequences,
or DNA complexed with an amino acid or enzyme complexed with a nucleotide.
We have also removed the sequences that contained odd characters, like "?".
The "mixed" records and sequences are still available for an NDJINN search, however.
Attention should be paid when importing complex/hetero- sequences that contain both Amino-acids
and Nucleic-acids (such as 1TTT_D,E,F; 1RGA; 2EDA; 2ARG; 1FFZ_B;
1OLD; 1M90_5;1B23_R; 2FMT_C,D; 1C95_A,B; 1FG0_B; 1FIR_A; 1KQS_4 etc.).
Note that many protein sequences from PDBFINDER (and PDBSEQRES) contain the character "X",
and that many nucleic characters have meaning as protein characters and vice-versa.
10/09/02 - Bugs fixed in PATTERNMATCH tools
We have fixed a bug in the PATTERNMATCH and PATTERNMATCHDB tools, which now
allow the use of the negation character "[^X]". We have added to the Help page
the note that the Perl regular expression "A+" does not work (the "+" character
does not work) - "A{1,}" should be used instead.
We have also limited the length of the regular expression
in order to avoid searches for full length CDS or protein sequences. Those
searches can be done with other tools, like Blast, and they are not the purpose
of PATTERNMATCHDB.
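To show what the two constructs mentioned above mean, here is a small sketch using Python's standard re module. It is not PATTERNMATCH itself, and Python's regular expression dialect is only assumed to behave like the tool's for these two constructs; the helper name find_matches and the example pattern are hypothetical.

```python
import re


def find_matches(pattern, sequence):
    """Return (1-based start position, matched text) for each match."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]


if __name__ == "__main__":
    # "[^P]" matches any single character except P (negation).
    print(find_matches("N[^P][ST]", "ANGSNPT"))
    # "A{1,}" means "one or more A", the replacement suggested for "A+".
    print(find_matches("A{1,}", "GAAATA"))
```

In Python, "A+" and "A{1,}" give the same matches; in PATTERNMATCH only the second form should be used.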
09/15/02 - PRODOM consensus sequences are no longer in the Non-Redundant database
The PRODOM consensus sequences are no longer included in the SDSC Non-Redundant Database since
they are not "real" protein sequences. We continue to offer PRODOM as a blastable protein database
though. The only other blastable protein databases not included in the Non-Redundant Database are
the Dictyostelium ORFs sequences (from Genomes databases).
07/19/02 - Easier sequence download added to "View" Tool
We have now made it easier to download sequences from within the
View tool (previously, one could do that only from Netscape, or
had to use the Download function). This will allow the user to download
any sequence(s) they want, in all the formats the Biology Workbench
can interpret. This is quite useful for porting sequences to other
applications, or for backing up data. The link to get all the sequences
in text format is located at the top and bottom of the page. To get a
text file of the sequences you are viewing, right-click on the link, or
save the page that opens up when you click on the link.
A separator line (a line full of the equal sign: "=") is used to separate
multiple alignments that are downloaded. This is necessary, because
otherwise there is no easy way to designate separate alignments. Note:
the Biology Workbench cannot read in more than one alignment at a time;
trying to do so will lead to an error. If you are saving alignments
to be reloaded to the Biology Workbench later on, you will want to save
each one to an individual file.
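If you have already downloaded several alignments into one file, a short script can split it back into individual alignments. The Python sketch below assumes that a separator is any line consisting only of "=" characters (the exact length of the line is not assumed), and the function name split_alignments is hypothetical.

```python
def split_alignments(text):
    """Split downloaded text into alignments at lines made only of '='."""
    blocks = []
    current = []

    def flush():
        # Keep a block only if it has at least one non-blank line.
        if any(line.strip() for line in current):
            blocks.append("\n".join(current).strip("\n"))
        current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"="}:
            flush()
        else:
            current.append(line)
    flush()
    return blocks


if __name__ == "__main__":
    sample = "aln1 line1\naln1 line2\n=====\naln2 line1\n"
    for number, block in enumerate(split_alignments(sample), start=1):
        print("alignment", number)
        print(block)
```

Each returned block can then be saved to its own file and loaded into the Biology Workbench one at a time.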
6/28/02 - Genbank Mus musculus Genomic database
The Genbank Mus musculus Genome database has been added. It has
two components -- one for the nucleic contigs, and one for the mRNA sequences
(and their translated proteins).
04/10/02 PI - isoelectric point calculation tool
A tool to calculate the isoelectric point of proteins has been added. It was developed at the
EMBL WWW Isoelectric Point Service. Molecular weight can be calculated using the AASTATS tool.
04/08/02 Genomic databases updates
Fission yeast Schizosaccharomyces pombe and parasite Encephalitozoon cuniculi genomes
and CDS/proteins from NCBI genomes have been added. As usual, we keep adding new bacterial genomes as
they become available from NCBI-Genbank.
04/02/02 FASTA scoring matrices change
Because of format problems we now offer only the scoring matrices that were distributed with
the original FASTA package (versions 2.0 and 3) (including SSEARCH, ALIGN, LALIGN, LFASTA).
02/22/02 HMMPFAM update
Pfam now has two different types of models, the glocal models ("ls" mode, in the
Pfam_ls HMM database) and Smith/Waterman models ("fs" mode, in the Pfam_fs HMM database). In
glocal mode, only full-length complete domains are found. In Smith/Waterman mode, fragmentary
domains can also be found, because fully local alignments are allowed. "ls" mode is much more
sensitive than "fs" mode, but only if a complete domain is actually present; if a partially
deleted fragment is present, "fs" mode will be needed. The two modes have different curated
cutoffs. The old flatfiles called Pfam (standard) and PfamFrag have now been deprecated.
The HMMER version used has been updated to 2.2.
12/06/01 MVIEW - added option to choose the reference sequence
We have added an option to choose the sequence to be used as reference
when computing identities - by the place in the alignment, counting from
top.
12/05/01 Genomic databases updates
Arabidopsis thaliana genome and proteins have been added. Most bacterial genomes are now
mirrored as the NCBI RefSeq complete genomes. More information can be found at the current ftp site
ftp://ncbi.nlm.nih.gov/genomes in the README files. We continue to add new bacterial
genomes as they are released by Genbank.
12/04/01 SIXFRAME - added alignment option
We have added the option of displaying the nucleic sequence which is being
translated aligned with its corresponding translation. We have highlighted
the methionine (M) residues as well as the "stars" which correspond to stop codons.
When displaying the corresponding nucleic sequences, the first nucleotide is
omitted in the alignment for Frames 2 and 5, and the first two nucleotides
are omitted for Frames 3 and 6.
This is still experimental so feel free to send us your suggestions.
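For readers who want to see how the six reading frames relate to the nucleic sequence, here is a minimal Python sketch of a six-frame translation. It is not the SIXFRAME code. It assumes the standard genetic code, assumes that Frames 4-6 are read from the reverse complement, and marks stop codons with "*"; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return {frame number: protein}; frames 4-6 use the reverse complement."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        # An offset of 1 or 2 skips the first one or two nucleotides.
        frames[offset + 1] = translate(dna[offset:])
        frames[offset + 4] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(frame, protein)
```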
11/30/01 FASTA - changes in display of the results
We have changed the display of results of all fasta programs, from a
scroll-down list to a table with checkboxes. This should
make it easier to select the sequences.
We have also provided links between the sequence in the table and the
corresponding alignment, making it easier to get to the desired alignment
down the page and back.
11/12/01 PDBSEQRES - a database derived from PDB and PDB-FINDER databases
We have created a database based on the PDBFINDER database records, in which
the sequences are derived from PDB. The database contains only those records from
PDBFINDER for which there is a sequence with the same ID in PDB. The sequences for
those records are then taken from the updated listing of all PDB sequences in FASTA
format (therefore there may be slight differences between sequences with the same ID
in PDBSEQRES and in PDBFINDER).
10/22/01 - Ndjinn - changes in display of the results
We have changed the display of Ndjinn giving the user the choice of using
either a full display mode with checkboxes or a compact display mode with a
scroll-down list. The default is now full display mode with checkboxes and
none of the matches are selected. This should make it easier to view/identify
and select the sequences since in a scroll-down list the names of the sequences
are sometimes truncated. The compact display could be more useful when one needs
to select a lot of sequences or for a quick view of all the hits.
10/12/01 - BLAST and CLUSTALW - changes in display of the results
We have changed the display of results of all blast programs, including
RPSBLAST, from a scroll-down list to a table with checkboxes. This should
make it easier to select the sequences.
We have also provided links between the sequence in the table and the
corresponding alignment, making it easier to get to the desired alignment
down the page and back. In the case of PSI-BLAST, the links are provided
only for the last iteration; we have also added links between the hit summary
table and the different iterations' results. For RPSBLAST we have provided links
between the domains in the results table and the corresponding NCBI Conserved Domain
Database entries.
For CLUSTALW we have changed only the display of the selected sequences,
providing both FASTA and Workbench labels. This should make it easier to
identify the sequences in the alignment.
9/07/01 - RPS-BLAST addition
The RPSBLAST tool has been added to the Protein Tools.
RPS-BLAST (Reverse PSI-BLAST) searches a query protein sequence against a
database of profiles. Select either All or a single search database from the
Conserved Domain Databases, which currently contain domains derived from Smart and
Pfam, and alignments from the LOAD-database (Library Of Ancient Domains).
8/25/01 - ProDom addition
The ProDom protein domain database has been added as a searchable and
Blastable database. Select "ProDom" from the list of protein databases
in any tool that compares a sequence to protein databases (for example,
BlastP) to do a sequence search. The ProDom Blastable database is made
up of the consensus sequences from each ProDom family.
8/15/01 - ClustalW optimization
ClustalW has been optimized, and you may notice faster performance, especially
for those alignments that take a longer time to complete.
8/13/01 - Genomic databases
We have converted our genomic databases into a more usable format.
These genomic databases have two components: "genome" and "CDS and proteins".
The genome component is the entire Genbank record
pertaining to a large sequence fragment - usually a chromosome. The CDS and
proteins database contains all the protein sequences identified in the genome,
and all the CDS regions that code for those proteins. The CDS and proteins
database is of more use to most people using the Biology Workbench.
Importing the large nucleic sequences from the "genome" database is
quite dangerous, and often can lead to a crashed session - users should
avoid doing this (see the FAQ for more
details).
7/31/01 - TMAP
TMAP has been updated to edition 55, and a single-sequence version of
TMAP edition 52 has been added. Now, one can use TMAP with individual
protein sequences, though the author still recommends using alignments
when possible to get more accurate results.
7/17/01 - Blast update
The Blast programs have been updated to version 2.2.1.
6/22/01 - Account time limit implemented
Due to extreme growth, it has become necessary to implement an account
lifetime. If an account has not been accessed in 6 months, all of its data
may be deleted. The username/password combination will still be reserved,
though.
6/22/01 - Genpept split into 2 databases
Genpept has been split into 2 databases: the full release and updates.
The full release contains the gene products in the last full Genbank
release, and "updates" contains the gene products in the Genbank
updates (Genbank New).
6/14/01 - Non-redundant database changes
The Biology Workbench non-redundant protein database has had a few changes.
The Blastable file now contains text information, which in most cases should
give an idea of what the sequence represents. Also, we no longer display it
as a choice separate from the database selection menu. Some people found
that confusing, so we now treat it as a separate database.
Selecting the non-redundant database *and* any other database would be
pointless, as the non-redundant database contains information from all
the Blastable databases.
4/1/01 - Genbank Indexing
All the Genbank and Genpept databases have been reindexed by GI number.
Though this doesn't affect any data on a cosmetic level, if you try to
use "View Database Records of Imported Sequences" on any sequences from
those databases that were imported before this change, the function
will not work.
3/17/01 - Genbank Homo Sapiens Genome
The Genbank Homo Sapiens Genome database has been reformatted. It now has
two components -- one for the nucleic contigs, and one for the mRNA sequences
(and their translated proteins).
MVIEW
We have added Mview, a program for producing color-coded HTML views
of sequence alignments.
TMHMM
We have added TMHMM, a program for finding transmembrane regions in single
protein sequences. Previously, one could only do this with TMAP -- a
program that worked only with alignments.
BLAST
The number of Blastable databases in Genbank has been greatly reduced,
by combining the smaller files into larger ones. This should make it
considerably more convenient for the users to Blast against Genbank
(for example, the 89 EST sections are now represented by only 4
Blastable databases).
All of the Blast programs have been updated to version 2.1.2. The problem
with PSIBLAST crashing when multiple databases are selected seems to have
gone away.
Clustal W
We updated ClustalW to version 1.81, and now use Phylip to display the
guide trees as images (if desired).
Fasta
The major Fasta tools have been updated to version 3.307b.
This includes FASTA, TFASTA, TFASTX, TFASTY, FASTX, FASTY, and SSEARCH.
Internet Explorer bug fixed
Some of our HTML pages were causing Microsoft Internet Explorer version 5.5
to crash. We were able to find the errors in our HTML that led to the crash,
and they have been fixed. Internet Explorer shouldn't be so sensitive that
bad HTML makes it crash, but it was.
HMMPFAM and BLIMPS have been added
Two utilities for comparing protein sequences to motif databases have been
added to the Biology Workbench. HMMPFAM compares a sequence to the Pfam
motif databases, and BLIMPS compares a sequence to the BLOCKS motif databases.
PSIBLAST
PSIBLAST has been updated with a few new output features. Users can
choose to view 1-line descriptions and alignments from every iteration or
just the last iteration. By default, 1-line descriptions are shown from
every iteration, and alignments are shown for the last iteration.
Before, the users could only see the results from the last iteration.
SENSEI
Sensei has been removed from the Biology Workbench, due to extensive
memory use by the program. We will give the location of the source
code to those interested in running it themselves.
PROSITE
PPSEARCH has been added as another regex-based (i.e. text comparison)
search tool for the Prosite motif database.
The "PROSITE" script for searching the PROSITE database for a particular
sequence has been removed, as it was not giving accurate results. The
PROSEARCH and PPSEARCH programs accomplish the same task, and give correct
results.
TeXshade
We have added a new program for coloring alignments called "TEXSHADE".
This script uses Eric Beitz's TeXshade programs (specifically, LaTeX
style sheet) to colorize alignments. It offers many additional options
to those offered by BOXSHADE, and since the postscript output is produced
by LaTeX and dvips, it is likely to be more compatible across all systems
and postscript viewers and interpreters. We suggest those that have used
BOXSHADE give this program a try.
Note that the colored alignment might not look nearly as "crisp" on the
screen as it does on paper, as the conversion of postscript files to gif
images can blur the fonts. We do think you will notice the difference
on paper, though.
Clustal W
Profile alignments are now available on the Biology Workbench, via the
CLUSTALWPROF tool. In the Alignment tools, this allows one to align two
alignments of the same type. In the Protein tools, this allows one to
align one or more protein sequences to an existing protein alignment within
the current session, and in the Nucleic tools align one or more nucleic
sequences to an existing nucleic alignment within the current session.
This will be very useful for people that want to align a few sequences to
an alignment they had already made, or to construct "super" alignments out
of smaller alignments.
Also, we now have implemented a limit on Clustal W, so that sequences or
alignments with lengths over 5,000 will not be allowed. This was added
because we have had a large number of people try to use Clustal W to align
sequence fragments to entire genomes -- an application for which Clustal W
was not intended. These jobs used a lot of our resources, and led to
alignments which for all intents and purposes were meaningless.
FingerPRINTScan
We have added FINGERPRINTSCAN, a program which compares a protein sequence
to Fingerprints within the PRINTS motif database.
Databases
Many databases have been added to the Biology Workbench. The Ndjinn
database list shows the recent additions. The DBCAT database is a database
of databases, and can give more detailed information on a particular database.
Non-redundant protein database
We have added a non-redundant protein database that includes
all of the protein databases within the Biology Workbench. This database
uses a simple string comparison to catch redundancies, so it may not be
appropriate for certain statistical calculations, but it does greatly
enhance the ability to search.
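The idea of catching redundancies with a simple string comparison can be sketched as follows. This Python example is an illustration only, not the script used to build the database: the function name non_redundant is hypothetical, and ignoring upper/lower case is an assumption. Only exactly identical sequences are merged, so near-identical sequences and fragments are all kept, which is why the result may not suit certain statistical calculations.

```python
def non_redundant(records):
    """Keep the first record for each distinct sequence string.

    records is an iterable of (name, sequence) pairs. Hypothetical helper;
    case-insensitive comparison is an assumption of this sketch.
    """
    seen = set()
    unique = []
    for name, sequence in records:
        key = sequence.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, sequence))
    return unique


if __name__ == "__main__":
    example = [("a", "MKV"), ("b", "MKV"), ("c", "MKVA")]
    for name, sequence in non_redundant(example):
        print(name, sequence)
```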