August 11, 2020

Gene Expression: Transcription Factor Evolution Amongst Life Domains

Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product. These products are often proteins, but in non-protein-coding genes such as transfer RNA (tRNA) or small nuclear RNA (snRNA) genes, the product is a functional RNA. Gene expression is summarized in the Central Dogma first formulated by Francis Crick in 1958, further developed in his 1970 article, and expanded by the subsequent discoveries of reverse transcription and RNA replication.

Gene expression is used by all known life, including eukaryotes (among them multicellular organisms) and prokaryotes (bacteria and archaea), and is exploited by viruses, to generate the macromolecular machinery for life.

In genetics, gene expression is the most fundamental level at which the genotype gives rise to the phenotype, i.e., the observable trait. The genetic information stored in DNA represents the genotype, whereas the phenotype results from the "interpretation" of that information. Such phenotypes are often expressed through the synthesis of proteins that control the organism's structure and development, or that act as enzymes catalyzing specific metabolic pathways.

Transcription

The production of an RNA copy from a DNA strand is called transcription. It is performed by RNA polymerases, which add one ribonucleotide at a time to a growing RNA strand according to the base-pairing (complementarity) rules. The resulting RNA is complementary to the template DNA strand, which is read 3' → 5'. It therefore matches the opposite (coding) strand, with the exception that thymines (T) are replaced with uracils (U) in the RNA.
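The base-pairing rule described above can be sketched in a few lines of Python. This is only a toy illustration: the function names `transcribe` and `coding_strand` and the nine-base example template are assumptions made for this sketch, and real transcription involves promoters, initiation and termination that are ignored here.

```python
# Base-pairing rules used when an RNA is built on a DNA template:
# each template base pairs with its complement, and A pairs with U (not T) in RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (written 5'->3') for a template strand written 3'->5'."""
    template = template_3_to_5.upper()
    unknown = set(template) - set(DNA_TO_RNA)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # One ribonucleotide is added per template base, so reading the template
    # left to right (3'->5') yields the RNA left to right (5'->3').
    return "".join(DNA_TO_RNA[base] for base in template)


def coding_strand(template_3_to_5):
    """Return the DNA strand (5'->3') complementary to the template."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5.upper())


template = "TACGGCATT"           # hypothetical template, written 3'->5'
rna = transcribe(template)
print(rna)                        # AUGCCGUAA
# The RNA equals the coding strand with every T replaced by U.
print(rna == coding_strand(template).replace("T", "U"))  # True
```

The second check makes the point in the text explicit: the transcript is complementary to the template and identical to the coding strand apart from the T to U substitution.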

In prokaryotes, transcription is carried out by a single type of RNA polymerase, which binds a promoter sequence called the Pribnow box with the help of the sigma factor protein (σ factor) to start transcription. In eukaryotes, transcription is performed in the nucleus by three types of RNA polymerase, each of which needs a specific DNA sequence called the promoter and a set of DNA-binding proteins, the transcription factors, to initiate the process.

RNA polymerase I is responsible for transcription of ribosomal RNA (rRNA) genes. RNA polymerase II (Pol II) transcribes all protein-coding genes as well as some non-coding RNAs (e.g., snRNAs, snoRNAs or long non-coding RNAs). RNA polymerase III transcribes 5S rRNA, transfer RNA (tRNA) genes, and some small non-coding RNAs (e.g., 7SK). Transcription ends when the polymerase encounters a sequence called the terminator.

Transcription factor evolution amongst life domains

Transcription factor (TF) function involves two basic features: i) the ability to recognize and bind short, specific sequences of DNA within regulatory regions; and ii) the ability to recruit or bind proteins that participate in transcriptional regulation. Consequently, the evolution of TFs mainly depends on alterations in binding sites, binding partners and expression patterns. Moreover, as an integral part of gene expression, TFs are closely related to the evolution of epigenetic mechanisms. The current literature on TF evolution provides a broad range of information. First, gene duplication and gene loss, which are crucial drivers of evolution in general, are consequently important drivers of TF evolution. They occur in all domains of life, regardless of organism complexity.

Duplication and deletion can influence transcriptional regulatory networks by increasing or reducing the number of TFs with specific binding preferences. Immediately after the duplication of a TF gene, the two resulting gene copies are likely to be identical. Since they share the same sequence, including the sequence of the DNA-binding domain (DBD), they bind to the same target genes. Subsequent mutations in the DBD sequence can lead one of the TF copies to switch to regulating different target genes.
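The duplication and divergence scenario can be illustrated with a deliberately simplified toy model. Everything in it is an assumption made for illustration: a TF is reduced to the exact DNA motif its DNA-binding domain recognizes, the promoter sequences and gene names are invented, and a single point change in the motif stands in for a mutation in the binding domain.

```python
def targets(tf_motif, promoters):
    """Genes whose (hypothetical) promoter contains the motif the TF recognizes."""
    return {gene for gene, seq in promoters.items() if tf_motif in seq}


# Invented promoter sequences for three hypothetical genes.
promoters = {
    "geneA": "TTGACGTCAAT",
    "geneB": "CCTGACGTCAG",
    "geneC": "AATGACGTAAT",
}

# Right after duplication both copies recognize the same motif,
# so they regulate the same set of target genes.
ancestral_motif = "TGACGTCA"
copy1_motif = ancestral_motif
copy2_motif = ancestral_motif
print(sorted(targets(copy1_motif, promoters)))  # ['geneA', 'geneB']
print(sorted(targets(copy2_motif, promoters)))  # ['geneA', 'geneB']

# A mutation in one copy's binding domain is modeled as a changed motif
# (C -> A at position 7); that copy now regulates a different target.
copy2_motif = "TGACGTAA"
print(sorted(targets(copy1_motif, promoters)))  # ['geneA', 'geneB']
print(sorted(targets(copy2_motif, promoters)))  # ['geneC']
```

Real TFs tolerate mismatches and bind with graded affinity, so exact substring matching overstates how abruptly targets change. The sketch only shows the logic of two identical copies diverging into distinct regulators.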

On a more lineage-specific level, TFs display several differences. Although the basal transcription machinery has long been considered universally conserved, it is now accepted that it too diversifies during evolution. The number of subunits in the basal transcription machinery increases substantially across the domains of life: roughly 6 subunits in bacteria, up to 15 in archaea, and a larger number in eukaryotes, which have at least 3 different RNA polymerases. Significant differences are also apparent between prokaryotes and eukaryotes. First, some DBDs are specific to evolutionary lineages; e.g., the ribbon-helix-helix domain is specific to bacteria and archaea, while C2H2 zinc finger (C2H2-ZNF), homeobox, and T-box domains are specific to eukaryotes.

Moreover, eukaryotic TFs are on average longer than eukaryotic proteins with other functions, while this relationship is reversed in prokaryotes. This may be because eukaryotic TFs contain a number of long intrinsically disordered segments that are needed to support the formation of multi-protein transcription complexes. Another characteristic specific to eukaryotes is the repetition of the same DBD family within one polypeptide chain. This may reflect a mechanism by which eukaryotes increase the length and diversity of DNA recognition sequences while using a limited number of DBD families.