The result of connecting contigs by linking information from paired-end reads from plasmids, paired-end reads from BACs, known messenger RNAs or other sources. The contigs in a scaffold are ordered and oriented with respect to one another.

Scoring matrix

See Position-specific scoring matrix and Substitution Matrix.

SEG (program for filtering low-complexity regions in protein sequences)

A program for filtering low-complexity regions in amino acid sequences. Residues that have been masked are represented as "X" in an alignment. SEG filtering is performed by default in the blastp program of BLAST 2.0 (Wootton and Federhen).
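The idea of masking low-complexity stretches can be sketched with a simple Shannon-entropy filter. This is not the SEG algorithm itself, whose procedure is more elaborate; it only illustrates the principle. The window length of 12 and the 2.2-bit threshold are illustrative assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Compositional complexity of a window, in bits."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace every residue covered by a low-entropy window with 'X'.

    Simplified sketch: any window whose entropy is below `threshold`
    is masked in full, so residues flanking a repeat can also be masked.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if shannon_entropy(segment) < threshold:
            for k in range(start, start + window):
                masked[k] = "X"
    return "".join(masked)


# A poly-Q stretch flanked by compositionally diverse residues.
seq = "ACDEFGHIKLMN" + "Q" * 12 + "PRSTVWYACDEF"
print(mask_low_complexity(seq))
```

The poly-Q run and a few flanking residues on each side are replaced by X, while the diverse ends of the sequence are left unchanged.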

Selectivity (in database similarity searches)

The ability of a search method to locate members of a protein family without falsely classifying members of other families as members of that family.

Sensitivity (in database similarity searches)

The ability of a search method to locate as many members of a protein family as possible, including distant members with limited sequence similarity.
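Sensitivity and selectivity can be made concrete by comparing a search's hit list with a known family. The sketch below assumes one common way to quantify them: sensitivity as the fraction of family members that are found, and selectivity as the fraction of hits that really are family members (a precision-like measure). The identifiers are hypothetical.

```python
def sensitivity(family, hits):
    """Fraction of true family members that the search found."""
    family = set(family)
    return len(family & set(hits)) / len(family)


def selectivity(family, hits):
    """Fraction of reported hits that are true family members (assumed measure)."""
    hits = set(hits)
    return len(set(family) & hits) / len(hits)


family = {"P1", "P2", "P3", "P4"}  # hypothetical known members of one family
hits = ["P1", "P2", "X1"]          # hypothetical search result; X1 is from another family
print(sensitivity(family, hits), selectivity(family, hits))
```

Here half of the family is found, and two of the three hits are genuine members.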

Sequence Tagged Site

Short DNA sequences (typically a few hundred base pairs) that occur only once in the genome, can be detected by PCR, and whose physical map location is known. STSs provide unique landmarks, or identifiers, throughout the genome and are useful as a framework for further mapping and sequencing.

Significance

A significant result is one that is unlikely to have occurred by chance alone and is therefore probably real. A significance level expresses, as a probability, how likely it is that a result at least as extreme would arise by chance. In sequence analysis, the significance of an alignment score may be calculated as the probability that such a score would be found between random or unrelated sequences. See Expect value.
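For ungapped local alignment scores, the Karlin-Altschul statistics relate a score S to the number of chance hits expected, E = K·m·n·e^(-λS), and the probability of at least one such hit, p = 1 - e^(-E). The sketch below simply evaluates these formulas. The K and λ values are illustrative assumptions (they depend on the scoring matrix and gap penalties), and m and n are treated as plain sequence and database lengths, without the length corrections that real search programs apply.

```python
import math


def expect_value(score, m, n, K, lam):
    """Expected number of chance alignments scoring >= score (Karlin-Altschul)."""
    return K * m * n * math.exp(-lam * score)


def p_value_from_e(e):
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return -math.expm1(-e)  # equals 1 - exp(-e), computed accurately for small e


# Illustrative parameters: a 300-residue query against a 10^6-residue database.
K, lam = 0.041, 0.267
for s in (50, 60, 70):
    e = expect_value(s, 300, 1e6, K, lam)
    print(s, e, p_value_from_e(e))
```

Higher scores give smaller E-values, and when E is small, p and E are nearly equal.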

Similarity score (sequence alignment)

Similarity is the extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST, similarity refers to a positive score in the substitution matrix. Percent similarity is the sum of the number of identical matches and conservative (high-scoring) substitutions in a sequence alignment, divided by the total number of aligned sequence characters. Gaps are usually ignored.
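Percent identity and percent similarity can be computed directly from two aligned sequences. In the sketch below, the set of conservative pairs is a small hand-picked assumption; a real tool would treat every pair with a positive substitution-matrix score as conservative. Gap columns are excluded from the denominator.

```python
# Assumed, illustrative subset of conservative amino acid pairs.
CONSERVATIVE = {frozenset(p) for p in [("D", "E"), ("K", "R"), ("I", "V"),
                                       ("I", "L"), ("L", "V"), ("F", "Y"), ("S", "T")]}


def identity_and_similarity(aln_a, aln_b):
    """Return (percent identity, percent similarity) for two aligned strings."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identical = conservative = aligned = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            continue  # gap columns are ignored
        aligned += 1
        if x == y:
            identical += 1
        elif frozenset((x, y)) in CONSERVATIVE:
            conservative += 1
    return 100 * identical / aligned, 100 * (identical + conservative) / aligned


print(identity_and_similarity("ACDEFGHIK", "ACDDFG-IR"))
```

Of the 8 gap-free columns, 6 are identical and 2 (E/D, K/R) are conservative, giving 75% identity and 100% similarity.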

Simulated annealing

A search algorithm that attempts to find the global extremum (for example, the global minimum) of a function rather than becoming trapped in a local one. The algorithm was inspired by the annealing of metals and the freezing of liquids, in which atoms slow down as the temperature falls and line up to form a crystal. The algorithm moves between the energy levels of a function, always accepting a move to a lower energy level but sometimes accepting a move to a higher one, with a probability given by the Boltzmann factor exp(-ΔE/T), where ΔE is the increase in energy and the temperature T is gradually lowered.
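A minimal sketch of the acceptance rule, assuming a one-dimensional energy function, uniform random moves, and a geometric cooling schedule (all parameter values are illustrative). The test function x² + 10·sin(x) has a local minimum near x ≈ 3.8 and its global minimum near x ≈ -1.3; the search deliberately starts in the local basin.

```python
import math
import random


def simulated_annealing(f, x0, steps=20000, t0=10.0, t_min=1e-3, step_size=0.5, seed=0):
    """Minimise f starting from x0; returns the best (x, f(x)) seen."""
    rng = random.Random(seed)
    alpha = (t_min / t0) ** (1.0 / steps)  # geometric cooling factor per step
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        fc = f(candidate)
        delta = fc - fx
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= alpha  # lower the temperature
    return best_x, best_f


def energy(x):
    return x * x + 10 * math.sin(x)


best_x, best_f = simulated_annealing(energy, x0=4.0)
print(best_x, best_f)
```

At high temperature the walk easily climbs out of the local basin. As the temperature falls, uphill moves become rare and the search settles near the global minimum.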

Single-linkage cluster analysis

An analysis of a group of related objects, e.g., similar proteins in different genomes, that identifies both close and more distant relationships and represents them on a tree or dendrogram. The method first joins the most closely related pair of objects as outer branches of the tree, then repeatedly merges the two closest clusters, where the distance between two clusters is the smallest distance between any member of one and any member of the other. More distant objects are thus progressively added to lower branches of the tree. The method is also used to predict phylogenetic relationships by distance methods. See also Hierarchical clustering, Neighbor-joining method.
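A naive sketch of single-linkage agglomerative clustering, working from a symmetric distance matrix. The labels and distances are hypothetical. The function records each merge, which corresponds to one internal node of the dendrogram.

```python
def single_linkage(labels, dist):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [frozenset([i]) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance = closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(labels[i] for i in clusters[a]),
                       sorted(labels[i] for i in clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


labels = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 5, 9],
        [6, 5, 0, 4],
        [10, 9, 4, 0]]
for step in single_linkage(labels, dist):
    print(step)
```

A and B join first (distance 2), then C and D (4). The two clusters then join at 5, the distance between their closest members B and C.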

Smith-Waterman algorithm

Uses dynamic programming to find local alignments between sequences. The key feature is that all negative scores calculated in the dynamic programming matrix are changed to zero, which avoids extending poorly scoring alignments and allows local alignments to start and stop anywhere within the matrix.
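A compact sketch of the Smith-Waterman recurrence with a linear gap penalty. The scoring scheme (match +2, mismatch -1, gap -2) is an illustrative assumption; practical tools use substitution matrices and affine gap penalties. Traceback starts from the highest-scoring cell and stops at the first zero, which is what makes the alignment local.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            H[i][j] = max(0, diag, up, left)  # negative scores are reset to zero
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

The unrelated flanks score negatively and are reset to zero, so only the shared ACGT core is reported, with a score of 8.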

SNP (single nucleotide polymorphism)

Single nucleotide polymorphism: a single nucleotide position in the genome sequence at which two or more alternative alleles are present at appreciable frequency (traditionally, at least 1%) in the human population.
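The traditional 1% criterion refers to the frequency of the less common allele. A small sketch, assuming allele counts tallied over sampled chromosomes at one position (the counts and function names are hypothetical):

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the second most common allele at one position."""
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) < 2:
        return 0.0
    freqs = sorted((count / total for count in allele_counts.values()), reverse=True)
    return freqs[1]


def meets_snp_threshold(allele_counts, threshold=0.01):
    """True if the minor allele reaches the traditional 1% frequency."""
    return minor_allele_frequency(allele_counts) >= threshold


print(meets_snp_threshold({"A": 1960, "G": 40}))  # minor allele at 2%
print(meets_snp_threshold({"C": 1995, "T": 5}))   # minor allele at 0.25%
```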

Space or time complexity

An algorithm's complexity is the maximum amount of computer memory (space complexity) or time (time complexity, often measured as the number of algorithmic steps) required to solve a problem, expressed as a function of the size of the input.
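As a worked illustration, take the dynamic programming matrix of the Smith-Waterman algorithm as the example. It assumes constant work per cell for sequences of lengths m and n.

```latex
% Filling an (m+1) x (n+1) dynamic programming matrix, constant work c per cell
\text{time}(m,n) = c\,(m+1)(n+1) = \Theta(mn)
\text{space}(m,n) = (m+1)(n+1)\ \text{cells} = \Theta(mn)
% m = n = 1000:  1001^2 = 1\,002\,001 cells
% m = n = 2000:  2001^2 = 4\,004\,001 cells  (about 4 times as many)
```

Doubling both sequence lengths roughly quadruples both time and memory. If only the optimal score is needed, two matrix rows suffice, which reduces the space to Θ(min(m, n)) while the time remains Θ(mn).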

Specificity (in database similarity searches)

The ability of a search method to locate members of one protein family, including distantly related members, while excluding sequences that do not belong to that family.
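Specificity is often quantified from the sequences that should not be hit: the fraction of non-members that the search correctly leaves out, TN / (TN + FP). The sketch below assumes that definition. The database and family identifiers are hypothetical.

```python
def specificity(database, family, hits):
    """Fraction of non-family sequences correctly excluded from the hits."""
    non_members = set(database) - set(family)
    true_negatives = non_members - set(hits)
    return len(true_negatives) / len(non_members)


database = [f"S{i}" for i in range(10)]   # hypothetical 10-sequence database
family = {"S0", "S1", "S2", "S3"}         # hypothetical family members
hits = {"S0", "S1", "S2", "S7"}           # S7 is a false positive
print(specificity(database, family, hits))
```

Of the six non-members, five are correctly excluded, so the specificity is 5/6.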

SSR (simple sequence repeat)

Simple sequence repeat, a sequence consisting largely of a tandem repeat of a specific k-mer (such as (CA)15). Many SSRs are polymorphic and have been widely used in genetic mapping.
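Tandem repeats such as (CA)15 can be located with a regular expression that captures a short unit and requires it to recur immediately. This is a sketch under assumed limits (units of 1 to 6 nucleotides, at least 5 copies). The lazy quantifier makes the regex prefer the shortest repeating unit.

```python
import re


def find_ssrs(seq, max_k=6, min_copies=5):
    """Return (start, unit, copies) for each tandem repeat found."""
    pattern = re.compile(r"(([ACGT]{1,%d}?)\2{%d,})" % (max_k, min_copies - 1))
    results = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(2)
        results.append((m.start(), unit, len(m.group(1)) // len(unit)))
    return results


print(find_ssrs("GGT" + "CA" * 15 + "TTG"))
```

The (CA)15 repeat is reported starting at position 3, with unit CA and 15 copies.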

Stochastic context-free grammar

A formal grammar whose production rules are applied regardless of the surrounding symbols (context-free) and each carry a probability (stochastic). It can represent correlated groups of symbols in different parts of a sequence, for example complementary regions in RNA that pair to form secondary structures. The rule probabilities introduce variability into such regions.
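A toy illustration of how rule probabilities turn a grammar into a probability model. The grammar below is hypothetical: S → x S y emits a Watson-Crick base pair around an inner structure, S → L starts an unpaired loop, and L → x L | x emits loop bases. The inside algorithm sums the probabilities of all derivations of a sequence.

```python
from functools import lru_cache

# Hypothetical rule probabilities (each nonterminal's rules sum to 1).
PAIR_PROB = {("A", "U"): 0.125, ("U", "A"): 0.125,
             ("G", "C"): 0.125, ("C", "G"): 0.125}  # S -> x S y
P_LOOP = 0.5                                        # S -> L
P_EXTEND, P_END = 0.8, 0.2                          # L -> x L | x
P_BASE = 0.25                                       # uniform base emission


def inside_probability(seq):
    """Total probability of seq under the toy grammar, over all derivations."""
    @lru_cache(maxsize=None)
    def loop(i, j):
        # L spanning seq[i..j]: emit seq[i], then continue or stop.
        if i == j:
            return P_END * P_BASE
        return P_EXTEND * P_BASE * loop(i + 1, j)

    @lru_cache(maxsize=None)
    def s(i, j):
        total = P_LOOP * loop(i, j)  # whole span is an unpaired loop
        if j - i >= 2:               # outer pair around a non-empty inner part
            total += PAIR_PROB.get((seq[i], seq[j]), 0.0) * s(i + 1, j - 1)
        return total

    return s(0, len(seq) - 1) if seq else 0.0


print(inside_probability("GAC"))
print(inside_probability("GGGAAACCC"))
```

For "GAC" the total combines the pairing derivation (G-C around an A loop) and the all-loop derivation, giving 0.003125 + 0.001 = 0.004125.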

Stringency

In dot-matrix (dot plot) comparisons, the minimum number of matches required within a sliding window for a match to be recorded. See also Filtering.
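A sketch of how window size and stringency interact in a dot plot. A dot is recorded at (i, j) when at least `stringency` of the `window` aligned positions starting there match. Placing the dot at the window start, rather than at its centre, is an assumption of this sketch.

```python
def dot_plot(a, b, window=3, stringency=2):
    """Return the set of (i, j) window starts that pass the stringency."""
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


print(sorted(dot_plot("ACGT", "ACGA", window=3, stringency=3)))
print(sorted(dot_plot("ACGT", "ACGA", window=3, stringency=2)))
```

Raising the stringency from 2 to 3 removes the (1, 1) dot, whose window matches at only two of three positions. Higher stringency filters out background noise at the cost of missing weaker similarities.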

STS

See Sequence Tagged Site.

Substitution

The presence of a non-identical amino acid at a given position in an alignment. If the aligned residues have similar physicochemical properties, the substitution is said to be "conservative".

Substitution Matrix

A matrix containing, for every pair of amino acids i and j, a value that reflects the probability that amino acid i mutates into amino acid j. Such matrices are constructed by assembling a large and diverse sample of verified pairwise alignments of amino acid sequences. If the sample is large enough to give statistically reliable estimates, the resulting matrix should reflect the true probabilities of mutations occurring over a period of evolution. In widely used matrices such as PAM and BLOSUM, each value is a log-odds score comparing this probability with the chance of the two amino acids being aligned at random.
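A minimal sketch of turning aligned residue pairs into log-odds scores. It loosely follows a BLOSUM-style construction: observed pair frequencies q, background frequencies p, expected frequencies p_i·p_j (2·p_i·p_j for i ≠ j), and scores of 2·log2(q/e) rounded to integers. The two-letter "alphabet" and the pair counts are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict


def log_odds_matrix(pairs, scale=2.0):
    """Map each observed unordered residue pair to a rounded log-odds score."""
    counts = Counter(tuple(sorted(p)) for p in pairs)
    total = sum(counts.values())
    q = {pair: c / total for pair, c in counts.items()}  # observed pair frequencies
    # Background frequency of each residue: p_i = q_ii + sum over j != i of q_ij / 2.
    p = defaultdict(float)
    for (x, y), f in q.items():
        if x == y:
            p[x] += f
        else:
            p[x] += f / 2
            p[y] += f / 2
    scores = {}
    for (x, y), f in q.items():  # pairs never observed are skipped (log of 0)
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(scale * math.log2(f / expected))
    return scores


# Hypothetical aligned pairs from trusted alignments.
pairs = [("A", "A")] * 6 + [("B", "B")] * 2 + [("A", "B"), ("B", "A")]
print(log_odds_matrix(pairs))
```

Pairs seen more often than chance predicts get positive scores and rarer ones get negative scores. The rare residue B therefore scores higher against itself than the common residue A does.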

ZHANGroup : BioinformaticsGlossaryS

Bioinformatics Glossary

S

Scaffold (assembled by joining contigs)