Published online 14 July 2008
Nucleic Acids Research, 2008, Vol. 36, No. 14 4641–4652
doi:10.1093/nar/gkn433
Revealing unique properties of the ribosome using
a network based analysis
Hilda David-Eden and Yael Mandel-Gutfreund*
Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
Received April 22, 2008; Revised June 4, 2008; Accepted June 23, 2008
ABSTRACT
INTRODUCTION
The ribosome is a large complex of proteins and ribosomal
RNA (rRNA) that is responsible for protein biosynthesis
in all organisms. The ribosome is made up of two subunits: a small subunit (30S) and a large subunit (50S). In
Escherichia coli, the small subunit consists of the 16S
rRNA (1542 nt) and 21 proteins, whereas the large subunit contains the 23S rRNA (2904 nt), the 5S rRNA
(120 nt) and 33 proteins. The small and the large ribosomal subunits associate into an active 70S complex, which
catalyzes protein synthesis. The small subunit contains the
decoding center (A-site), known to mediate the correct
*To whom correspondence should be addressed. Tel: +972 4 8293958; Fax: +972 4 8225153; Email: yaelmg@tx.technion.ac.il
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/
by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from http://nar.oxfordjournals.org/ by guest on December 24, 2014
The ribosome is a complex molecular machine that
offers many potential sites for functional interference, therefore representing a major target for
antibacterial drugs. The growing number of high-resolution structures of ribosomes from different
organisms, in free form and in complex with various
ligands, provides unique data for structural and
comparative analyses of RNA structures. We
model the ribosome structure as a network, where
nucleotides are represented as nodes and intermolecular interactions as edges. As shown previously
for proteins, we found that the major functional sites
of the ribosome exhibit significantly high centrality
measures. Specifically, we demonstrate that mutations that strongly affect ribosome function and
assembly can be distinguished from mild mutations
based on their network properties. Furthermore, we
observed that closeness centrality of the rRNA
nucleotides is highly conserved in the bacteria, suggesting the network representation as a comparative tool for the ribosome analysis. Finally, we
suggest a global topology perspective to characterize functional sites and to reveal the unique properties of the ribosome.
interaction between the tRNA anticodon and the
mRNA, which is being translated. The large subunit contains the peptidyl transferase center (PTC), which catalyzes the peptide bond formation in the growing
polypeptide (1). It is well established that the major functional sites in the ribosome that are involved in peptide
bond formation are composed mainly of rRNA, which lies
at the heart of the catalytic process (2). Although the ribosome is commonly considered a ribozyme (1), the involvement of a protein in the catalytic site has been
demonstrated in E. coli (3).
High resolution structures of 30S and 50S ribosomal
subunits have been solved by X-ray crystallography.
These include the 30S subunit from the eubacterium
Thermus thermophilus (4,5), the 50S subunit from the
archaeon Haloarcula marismortui (6) and the eubacterium
Deinococcus radiodurans (7). Several high-resolution structures of the 70S ribosome are also available, including the
E. coli ribosome at 3.5 Å (8), and T. thermophilus at 2.8 Å
(9) and 3.7 Å (10). The 70S ribosome structures provide
unique information on the interface between the subunits,
as well as the conformation of the active site in the context
of the entire ribosome complex. Overall, the growing
number of high-resolution structures of ribosomes from
different organisms, in the free form as well as in complex
with various ligands and antibiotics, provides a unique
data set for structural and comparative analysis of the
ribosome, specifically of the rRNA (4,6,8,9).
Different computational methods based on
force fields have been used to study macromolecular structures, specifically proteins (11). However, the computational cost of these methods for studying long- and short-range
interactions in a large-scale system such as the ribosome is extremely high. Therefore, coarse-grained methods
have been developed (12,13). Furthermore, graph theory
has been found to be a useful tool to investigate different
properties of macromolecules such as folding, stability,
function and dynamics (14). These studies have predominately concentrated on protein structures. In order to
model structure as a graph (network), several representations have been developed, ranging from coarse representation, in which each node represents a secondary
MATERIALS AND METHODS
Network analysis
The ribosome structure was represented as an undirected
graph (network), in which nodes represent nucleotides or
amino acids, and edges represent contacts. In order to
generate the network, all atomic contacts were calculated
using the CSU program (33). Nucleotides or amino acids
were considered to be in contact if at least one of the
corresponding atoms was in surface complementarity, as
defined in ref. (33).
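The construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input format (pairs of `(residue, atom)` tuples), the function name `build_residue_network` and the residue labels are all hypothetical, and the output of the CSU program would first have to be parsed into such pairs.

```python
from collections import defaultdict

def build_residue_network(atomic_contacts):
    """Collapse atom-level contacts into an undirected residue-level graph.

    atomic_contacts: iterable of ((residue_a, atom_a), (residue_b, atom_b))
    pairs (a hypothetical format, assumed for illustration only).
    Returns a dict mapping each residue to the set of residues it contacts.
    """
    graph = defaultdict(set)
    for (res_a, _atom_a), (res_b, _atom_b) in atomic_contacts:
        if res_a == res_b:
            continue  # contacts within one residue do not create an edge
        # One atomic contact is enough; repeated contacts add nothing new.
        graph[res_a].add(res_b)
        graph[res_b].add(res_a)
    return dict(graph)

# Illustrative labels only, not taken from the structure files.
contacts = [
    (("G530", "N1"), ("C518", "O2")),
    (("G530", "N2"), ("C518", "N3")),     # second contact, same residue pair
    (("G530", "O6"), ("S12:K43", "NZ")),  # nucleotide to amino acid contact
    (("C518", "N4"), ("C518", "C5")),     # intra-residue contact, ignored
]
network = build_residue_network(contacts)
```

Storing neighbors in sets makes the graph undirected and unweighted by construction, which matches the one-contact-is-sufficient rule stated above.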
Two parameters were calculated to characterize the network: the average shortest path length, and the average
clustering coefficient. The average shortest path length of a
network with N nodes is defined by Equation (1):
$$L = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} L_{ij} \qquad (1)$$
where $L_{ij}$ is the shortest path length between nodes i and j.
The average clustering coefficient of a network with N
nodes is defined by Equation (2):
$$C = \frac{1}{N} \sum_{i=1}^{N} C_i \qquad (2)$$
where $C_i$ is the clustering coefficient of node i, defined
as the fraction of contacts that exist among its nearest
neighbors relative to the maximum contacts among all
neighbors.
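As a hedged illustration of the two network parameters, the sketch below computes them for a small toy graph in plain Python. It assumes an unweighted, connected graph stored as a dict of neighbor sets, and it averages path lengths over unordered node pairs; the function names and the toy graph are assumptions made for this example and are not part of the original analysis.

```python
from collections import deque
from itertools import combinations

def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_shortest_path_length(graph):
    """Mean shortest path over all unordered node pairs (connected graph assumed)."""
    nodes = list(graph)
    n = len(nodes)
    total = 0
    for idx, i in enumerate(nodes):
        dist = shortest_path_lengths(graph, i)
        total += sum(dist[j] for j in nodes[idx + 1:])
    return 2 * total / (n * (n - 1))

def clustering_coefficient(graph, node):
    """Fraction of existing links among the neighbors of node."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention used here for nodes with fewer than two neighbors
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))

def average_clustering(graph):
    return sum(clustering_coefficient(graph, v) for v in graph) / len(graph)

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
L = average_shortest_path_length(toy)   # 16/12, about 1.333
C = average_clustering(toy)             # (1 + 1 + 1/3 + 0) / 4, about 0.583
```

For structures the size of the ribosome, library routines such as those in igraph or NetworkX would normally be preferred over this quadratic-memory-free but slow reference version.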
In order to test the network properties, we calculated
the average shortest path length and the average clustering coefficient for random and regular networks with
the same number of nodes, and the same average
number of edges (degree) (24). In a random network,
$L_{rand} \approx \ln N / \ln K$ and $C_{rand} \approx K/N$, while for a regular
network, $L_{reg} = N(N + K - 2)/[2K(N - 1)]$ and $C_{reg} =
3(K - 2)/[4(K - 1)]$, where N denotes the number of nodes and K the average
number of edges per node (degree).
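The reference values for random and regular networks can be evaluated directly from N and K. The sketch below is a worked example under stated assumptions: the function names are hypothetical, and N = 1000 and K = 10 are arbitrary illustrative values, not the values of the ribosome network.

```python
import math

def random_network_reference(n, k):
    """Approximate L and C for a random network with n nodes and average degree k."""
    return math.log(n) / math.log(k), k / n

def regular_network_reference(n, k):
    """L and C for a regular network with n nodes and degree k."""
    l_reg = n * (n + k - 2) / (2 * k * (n - 1))
    c_reg = 3 * (k - 2) / (4 * (k - 1))
    return l_reg, c_reg

# Illustrative values only.
l_rand, c_rand = random_network_reference(1000, 10)  # about 3.0 and 0.01
l_reg, c_reg = regular_network_reference(1000, 10)   # about 50.45 and 0.667
```

A small-world network is expected to combine a path length close to the random reference with a clustering coefficient much closer to the regular reference.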
For each node in the network, we calculated three centrality measures, i.e. degree, closeness and betweenness.
The degree of a node i is defined as the number of edges
connected to i. The closeness of a node i is defined as the
inverse average length of the shortest paths to all other
nodes in the graph (34). The closeness of node i is given in
Equation (3):
$$\frac{|N - 1|}{\sum_{j \neq i} d_{ij}} \qquad (3)$$
where N is the total number of nodes in the network, and
$d_{ij}$ is the shortest path length between nodes i and j.
The betweenness of a node i is defined by the number of
the shortest paths that cross node i. The betweenness of
node i is given in Equation (4):
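$$\sum_{j \neq k,\; i \neq j,\; k \neq i} \frac{g_{jik}}{g_{jk}} \qquad (4)$$
where $g_{jik}$ is the number of shortest paths from j to k
that pass through i, and $g_{jk}$ is the total number of shortest paths from j to k.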
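The three centrality measures can be illustrated on a toy graph with the plain-Python sketch below. It is a reference implementation under assumptions, not the code used in the study: the graph is taken to be unweighted and connected, each unordered pair (j, k) is counted once in the betweenness sum (counting ordered pairs would double every value), and all names are hypothetical.

```python
from collections import deque

def bfs_counts(graph, source):
    """Return (distances, shortest-path counts) from source to all reachable nodes."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]  # every shortest path to v extends to w
    return dist, sigma

def centralities(graph):
    """Degree, closeness and (unnormalized) betweenness for every node."""
    nodes = list(graph)
    n = len(nodes)
    info = {v: bfs_counts(graph, v) for v in nodes}
    degree = {v: len(graph[v]) for v in nodes}
    closeness = {
        v: (n - 1) / sum(info[v][0][u] for u in nodes if u != v) for v in nodes
    }
    betweenness = {v: 0.0 for v in nodes}
    for idx, j in enumerate(nodes):
        dist_j, sigma_j = info[j]
        for k in nodes[idx + 1:]:  # each unordered pair (j, k) once
            for i in nodes:
                if i == j or i == k:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest j-k path if the distances add up exactly
                if dist_j[i] + dist_i[k] == dist_j[k]:
                    betweenness[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return degree, closeness, betweenness

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
degree, closeness, betweenness = centralities(toy)
```

In this toy graph node 2 has the highest value of all three measures, since every path to the pendant node 3 must pass through it.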
The shortest path length calculations are based on a
modified form of Dijkstra’s algorithm (35). Network parameters were calculated with the igraph package version
0.1.2 using GNU R statistical software (http://cneurocvs.rmki.kfki.hu/igraph), and the NetworkX Python package (https://networkx.lanl.gov).
Statistical analysis
Enrichment of mutations within high centrality nucleotides was evaluated based on the hypergeometric
distribution using Fisher’s exact test. In addition,
we tested the enrichment in 100 random sets of nucleotides
structure element (15), to finer modeling methods, in
which each node represents an atom (16).
A common methodology to represent a macromolecular
structure as a network considers amino acids/bases as
nodes and inter-residue interactions as edges (17–20). In
protein structures, it has been shown that the network of
amino acid interactions is highly clustered and has properties of a small-world network. Such a network is characterized by the presence of a small number of central nodes
(21). Interestingly, the central nodes defined by the small-world network description were found to be associated
with key residues in protein folding and dynamics
(18,22–25), functional sites such as enzyme catalytic
sites, ligand-binding sites (17,26–29) and hot spots in protein–protein interactions (20,30).
Previously, a small-world network approach has been
applied to study the conformational space of tRNA secondary structure (31). In addition, a network approach
has been utilized for RNA structure characterization, in
which different types of interactions (i.e. Watson–Crick,
Hoogsteen and Sugar-edge) were applied to present the
complexity of the structure (32). Here, we represented
the rRNA 3D structure as a network with nucleotides as
nodes and inter-nucleotide interactions as edges. We
revealed that the rRNA structure-derived network fits
the model of geometric random graphs with characteristics of a small-world network. Though the network parameters were directly derived from the structure of the
ribosome complex, they were not found to simply correlate with classical structural parameters (e.g. solvent surface accessibility). The lack of strong correlation between
structural properties and network parameters suggests
that the latter can provide extra insights on the ribosome
structure–function relationship. Specifically, we found
that nucleotides with significantly high centrality values
in the network correspond to the major functional regions
in the small and large subunits of the ribosome. Additionally, we observed that rRNA mutations that have a
strong deleterious effect on ribosome function exhibit
high centrality. In summary, applying a network-based
analysis, we show that critical sites in the bacterial ribosome exhibit high values of local and global structural
parameters. Moreover, these functional sites can be identified based solely on the network parameters without considering evolutionary information.
of the same size as each mutation dataset and expanded
the tested set to include the contacting nucleotides, as
applied for the original datasets. The proportion of
nucleotides that were randomly sampled from the large
and small ribosomal subunit was as in the mutation
datasets.
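The enrichment test and the random-set control can be sketched as follows. This is a simplified illustration under assumptions: the one-sided Fisher's exact test is written as an upper hypergeometric tail, the function names and all numbers are hypothetical, and the sketch omits the expansion to contacting nucleotides and the matching of subunit proportions described above.

```python
import random
from math import comb

def hypergeom_enrichment_pvalue(population, central, sample, hits):
    """P(X >= hits) when `sample` nodes are drawn without replacement from
    `population` nodes, of which `central` have high centrality."""
    upper = min(central, sample)
    tail = sum(
        comb(central, x) * comb(population - central, sample - x)
        for x in range(hits, upper + 1)
    )
    return tail / comb(population, sample)

def random_set_pvalues(all_nodes, central_nodes, set_size, n_sets=100, seed=0):
    """Enrichment P-values for randomly drawn node sets of a fixed size."""
    rng = random.Random(seed)
    central_nodes = set(central_nodes)
    pvalues = []
    for _ in range(n_sets):
        sample = rng.sample(all_nodes, set_size)
        hits = sum(1 for v in sample if v in central_nodes)
        pvalues.append(
            hypergeom_enrichment_pvalue(
                len(all_nodes), len(central_nodes), set_size, hits
            )
        )
    return pvalues

# Toy numbers: 20 nodes, 5 of them central, 4 mutated, 3 of the mutated central.
p = hypergeom_enrichment_pvalue(20, 5, 4, 3)  # 155/4845, about 0.032
control = random_set_pvalues(list(range(20)), range(5), set_size=4)
```

Comparing the P-value of the real dataset with the distribution obtained from the random sets indicates whether the observed enrichment could arise from set size alone.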
Ribosome structure analysis
Evolutionary rate
Aligned sequences and a guide tree for both the 16S/18S
and the 23S/28S were downloaded from the ARB-SILVA
database (release 92) (http://www.arb-silva.de) (38). The
16S/18S alignment comprises high-quality sequences with
a minimum length of 1200 bases for Bacteria and Eukarya
and 900 bases for Archaea. The 23S/28S alignment comprises high quality rRNA sequences with a minimum
length of 1900 bases. Positional variability in bacteria
was calculated with the ARB package using parsimony
function (38), including 168131 and 4115 sequences for
the 16S and 23S rRNA, respectively. The evolutionary
Mutation data
Datasets 1, 2. The two datasets are based on mutation
data, collected from various studies. The ribosomal
RNA mutation database in E. coli (16SMDB and
23SMDB) was downloaded (http://server1.fandm.edu/
departments/Biology/Databases/RNA.html) (39). We
further extended the database by including data from
recent studies (40–43). Dataset 1 includes mutations with a
strong effect on ribosome function, which were selected
using a keyword search (strong, lethal, severe and deleterious), excluding mutations with a mild or moderate effect.
Further, the mutations were manually filtered to include
only single point mutations. In addition, those mutations
whose phenotypic effect was tested in the presence
of antibiotics were excluded. To avoid redundancy, we
included only mutations with a minimal spacing of 3 nt in the
primary sequence. Additionally, we excluded mutations
that appeared in Datasets 3 and 4. In total, Dataset 1
included 44 nt: 25 nt in the 16S rRNA and 19 in the 23S
rRNA. The 16S rRNA mutated nucleotides included positions: 13, 18, 517, 529, 571, 627, 643, 702, 770, 787, 792,
865, 914, 922, 967, 981, 1200, 1207, 1401, 1409, 1414, 1418,
1483, 1491 and 1498. The 23S rRNA mutated nucleotides
included positions: 1832, 1836, 1849, 1896, 1916, 1926,
1932, 1940, 1946, 1955, 1960, 1972, 1979, 1984, 2252,
2504, 2507, 2580 and 2584 (Supplementary Table S5).
Dataset 2 includes mutations with a mild effect on bacterial function. In total, Dataset 2 included 30 nt: 18 nt in
the 16S rRNA and 12 in the 23S rRNA. The 16S rRNA
mutated nucleotides included positions: 531, 534, 618, 624,
631, 634, 641, 645, 651, 912, 966, 1203, 1341, 1351, 1388,
1397, 1404 and 1518. The 23S rRNA mutated nucleotides
included positions: 1067, 1098, 1914, 1921, 1940, 1951,
1979, 2249, 2254, 2477, 2561 and 2661.
Datasets 3, 4. These two sets were derived from two recent
studies, which applied a random mutagenesis procedure to the
E. coli 16S and 23S rRNA genes (44,45). In these studies,
53 and 77 mutations in the 16S and 23S rRNA, respectively, were classified according to their phenotypic severity. The 16S and 23S rRNA included 50 and 69 base
substitutions, and 3 and 8 deletions, respectively. Among
the 16S base substitutions, 13 mutations were classified as
strong, 17 as mild and 20 as moderate. In the 23S base
substitutions, 12 mutations were classified as strong, 34 as
mild and 23 as moderate. Overall, the datasets included 25
positions (13 + 12) that were classified as strong and 51
positions (17 + 34) that were classified as mild.
The 16S rRNA mutated nucleotides with strong deleterious phenotype (Dataset 3) included the following
mutations: Ψ516G, C518U, C519U, A520G, G521A,
G973A, G1058A, G1068A, A1111U, C1208G, C1395U,
U1406C and U1495C. The 16S rRNA mutated nucleotides with mild effect included: U49C, A51G, G57A,
A161G, G299A, A373G, A389G, C536U, G568C,
C614A, A622G, U684C, A1014G, C1054U, A1055G,
U1073C and U1085C. The 23S rRNA mutated
In order to define the ribosomal subunits interface, the
solvent accessible surface area of each nucleotide in the
16S and the 23S rRNA was calculated using the POPS
server (36,37). A nucleotide was considered to be in the
interface if it had at least one atom that lost solvent accessibility upon complex formation (PDB codes 2AVY and
2AW4).
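The interface criterion above can be expressed as a short sketch. The input format (dictionaries keyed by `(nucleotide, atom)` with accessible surface areas in Å²), the `tolerance` parameter, the function name and the example values are all assumptions made for illustration; output from the POPS server would need to be parsed into this form.

```python
def interface_nucleotides(sasa_free, sasa_complex, tolerance=1e-6):
    """Return nucleotides with at least one atom that loses solvent
    accessibility on going from the free subunit to the complex.

    sasa_free, sasa_complex: dicts mapping (nucleotide_id, atom_name) to the
    solvent accessible surface area (hypothetical input format).
    """
    interface = set()
    for (nucleotide, atom), free_area in sasa_free.items():
        # Atoms missing from the complex table are treated as unchanged.
        bound_area = sasa_complex.get((nucleotide, atom), free_area)
        if free_area - bound_area > tolerance:
            interface.add(nucleotide)
    return interface

# Made-up values for illustration only.
sasa_free = {("A1492", "N1"): 12.0, ("A1492", "C2"): 5.0, ("G530", "O6"): 8.0}
sasa_complex = {("A1492", "N1"): 3.5, ("A1492", "C2"): 5.0, ("G530", "O6"): 8.0}
interface = interface_nucleotides(sasa_free, sasa_complex)
```

The small tolerance guards against spurious hits from floating-point noise in the surface calculation; whether a threshold of this kind was used in the original analysis is not stated.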
The nucleotides of the tunnel were defined using the 3V
Channel Extractor (http://geometry.molmovdb.org/3v)
with a large probe radius of 9.0 Å and a small probe
radius of 3.4 Å. Using the 3V Channel Extractor, an
output file with the tunnel surface represented as water
atoms was generated. We considered a nucleotide to be
located in the tunnel if at least one of its atoms was located
at a distance