Nicola Stanisławska

Exploring Retroviruses & Transposable Elements: Integration and Retrotransposition within Our Genome

Did you know that our genome contains DNA sequences that can move around and even copy themselves? Some of these genetic elements originate from retroviruses, a type of virus that can integrate a DNA copy of its genome into the host genome.

Unlike most other viruses, which replicate without becoming part of the host's DNA, retroviruses deliver their genetic material into the nucleus of our cells, where it becomes part of our genome. But their unique abilities don't stop there: once inactivated, these retrovirus-derived sequences can persist in our genome as endogenous retroviruses (ERVs). ERVs are a type of retrotransposon, a mobile genetic element that can move and replicate within our genome and even contribute to its increase in size.

But how do retroviruses enter our cells and insert their genetic material into our own DNA?

The structure and mechanism of action of retroviruses

The unique ability of retroviruses to integrate into the host's genome is reflected in the structure of their virions, the infectious particles that allow them to invade our cells.

Their outermost layer, which is derived from the host cell’s membrane, is called the viral envelope. It is made up of lipids and proteins, including the envelope glycoproteins (composed of surface and transmembrane subunits, Figure 1) that allow the virion to bind to the cell membrane.

Figure 1:

Underneath the envelope, the capsid protein shell surrounds the virus's genetic material. It is built from repeating subunits called capsomeres, which come together to form a closed shell whose shape differs between retroviruses (in HIV, for example, it is cone-shaped). Inside the capsid core, the virion carries its genetic information together with the machinery needed to integrate it into the host cell genome: two copies of the RNA genome, reverse transcriptase (an enzyme that converts the virus's RNA genome into DNA) and integrase (an enzyme that inserts the viral DNA into the host cell genome, where it becomes a permanent part of the host cell's genetic material).

Figure 2:

Each RNA copy of a retroviral genome contains the genes needed to synthesise the structural, accessory and regulatory proteins required for replication, assembly of new virions and integration into the host genome. For example, the genome of the human immunodeficiency virus (HIV) contains the Gag, Pol and Env genes as well as the accessory genes vif, vpr and vpu, and in its integrated DNA form it is flanked by long terminal repeats (LTRs) (Figure 3). Of these, the Gag gene codes for vital structural components of the virion, including the viral matrix and capsid; the Env gene codes for the surface and transmembrane glycoproteins; and the Pol gene codes for the viral enzymes, including reverse transcriptase and integrase.

Figure 3:

To enter the cell, the viral surface glycoprotein (SU) in the envelope binds to a specific receptor on the host cell surface. Receptor binding triggers a conformational change that exposes the fusion machinery of the transmembrane glycoprotein (TM). The TM then mediates fusion between the viral envelope and the host cell membrane, allowing the viral core to enter the cytoplasm. Once in the cytoplasm, the virion undergoes uncoating, a process in which the viral capsid is disassembled, releasing its genetic material. Next, the viral enzyme reverse transcriptase copies the viral RNA genome, which is packed within the viral core, into double-stranded DNA (cDNA).
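To make the RNA-to-DNA step more concrete, here is a deliberately simplified Python sketch that treats sequences as plain strings. It assumes base pairing is all that matters: the first DNA strand is built as the reverse complement of the RNA, and the second strand as the reverse complement of the first. Real reverse transcription also involves a tRNA primer, strand transfers and RNase H activity, all of which are ignored here, and the function names are made up for illustration.

```python
# Toy model of reverse transcription: RNA -> single-stranded cDNA -> double-stranded DNA.
# Assumption: only base pairing is modelled; primers, strand transfers and RNase H are ignored.

RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand(rna: str) -> str:
    """Minus-strand cDNA written 5'->3', i.e. the reverse complement of the RNA."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def second_strand(minus_strand: str) -> str:
    """Plus-strand DNA written 5'->3', i.e. the reverse complement of the minus strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(minus_strand))


def reverse_transcribe(rna: str) -> tuple[str, str]:
    """Return the (plus, minus) strands of the double-stranded DNA copy."""
    minus = first_strand(rna)
    plus = second_strand(minus)
    return plus, minus


plus, minus = reverse_transcribe("AUGGCUUAA")
print(plus)   # ATGGCTTAA  (same as the RNA, with T in place of U)
print(minus)  # TTAAGCCAT
```

The plus strand ends up with the same sequence as the viral RNA (T instead of U), which is why the resulting DNA can later stand in for the viral genome inside the host chromosome.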

By this stage, the viral RNA genome has been converted into double-stranded DNA, so it now has the same chemical form as the host genome. This double-stranded DNA is transported into the nucleus, where the viral enzyme integrase inserts it into the cell's genome. The integrated viral DNA is known as a provirus. For most retroviruses, HIV being a notable exception, the cDNA can only reach the host chromosomes during mitosis, when the nuclear envelope is broken down.

Integration into the host genome provides several advantages to the retrovirus. Firstly, it allows the viral genome to be replicated and passed on to daughter cells during cell division. Moreover, it helps the virus evade the host immune system, as the viral genome is effectively hidden within the host cell genome. Finally, integration allows the virus to use the host cell's transcriptional and translational machinery to express the viral genes, which is necessary for the production of new viral particles.

However, depending on the specific retrovirus and the state of the host cell, integration of viral DNA can also have other effects. These range from latent infection, in which the provirus remains silent and is neither transcribed nor translated, to oncogenesis, in which activated oncogenes disrupt normal cell division and can lead to cancer, and insertional mutagenesis, in which the inserted provirus disrupts neighbouring genes.

HIV: the mystery of how the Human Immunodeficiency Virus enters the nucleus (“A Hard Way to the Nucleus”, Michael Bukrinsky, 2004)

One of the best-known members of the retrovirus family, the human immunodeficiency virus (HIV), the causative agent of AIDS, replicates by integrating its genome into the host cell’s nuclear DNA. Unlike most retroviruses, which depend on the breakdown of the nuclear envelope during mitosis to access the host cell's genome, HIV can deliver its cDNA through the nuclear pore while the nucleus is still intact during interphase. This ability to infect non-dividing cells, such as macrophages, is a critical factor in the pathogenesis of HIV-1.

The mechanisms behind this unusual feature of HIV have intrigued researchers since the early 1990s, and although many have been proposed, none has been conclusively established. The main candidates are the matrix protein (MA), integrase (IN), viral protein R (Vpr) and the central DNA flap.

HIV is thought to enter the nucleus by exploiting nuclear protein import (Figure 4), a selective process mediated by the nuclear pore complex (NPC), a large protein complex that spans the nuclear envelope and acts as a gateway controlling the transport of molecules between the cytoplasm and the nucleus.

The basic mechanism of nuclear protein import involves recognition of a nuclear localization signal (NLS) on the cargo protein by importin α, followed by binding of importin β to the complex. The resulting trimeric complex (cargo, importin α and importin β) then interacts with the NPC, which facilitates its translocation into the nucleus.
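Importin α recognises an NLS by its amino acid sequence. A commonly cited consensus for the classical single-stretch ("monopartite") NLS is lysine, then lysine or arginine, then any residue, then lysine or arginine: K-(K/R)-X-(K/R). The Python sketch below simply scans a protein sequence for that motif. It only illustrates what "recognising a signal" means and is not a predictor: real recognition also depends on flanking residues, protein folding and two-part (bipartite) signals, all of which this pattern ignores. The function name is hypothetical.

```python
import re

# Classical monopartite NLS consensus K-(K/R)-X-(K/R), where X is any amino acid.
# The lookahead (?=...) lets overlapping matches be reported.
CLASSICAL_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_classical_nls(protein: str) -> list[tuple[int, str]]:
    """Return (0-based position, motif) for every consensus match in a one-letter protein sequence."""
    return [(m.start(), m.group(1)) for m in CLASSICAL_NLS.finditer(protein.upper())]


# PKKKRKV is the well-known NLS of the SV40 large T antigen; MAGAGAG is an arbitrary non-NLS string.
print(find_classical_nls("PKKKRKV"))  # [(1, 'KKKR'), (2, 'KKRK')]
print(find_classical_nls("MAGAGAG"))  # []
```

A "weak" NLS, like those proposed for MA, would match such a pattern poorly or only partially, which is one reason its contribution is hard to pin down.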

Figure 4:

There are different theories on how HIV enters the nucleus. One theory suggests that the MA protein encoded by the HIV genome carries two weak but functional nuclear localization signals that help it enter the nucleus. However, viruses lacking most of MA can still replicate efficiently, indicating that MA may not be crucial for nuclear import (Reil et al., 1998).

Another theory, originally put forward by Gallay and coworkers (Gallay et al., 1997), centres on IN, which is highly karyophilic (readily transported into the nucleus) and can interact with several importins, including importin α/β (Gallay et al., 1997) and importin 7 (Fassati et al., 2003). Other studies, however, suggest that IN may not be as important for nuclear import as previously thought.

Finally, a feature of the viral cDNA intermediate, a 99-nucleotide triple-stranded region called the "central DNA flap", may also play a role in HIV nuclear import by acting as a signal that directs the viral genome into the nucleus (Zennou et al., 2000). However, the importance of the central DNA flap may vary depending on the specific strain of HIV and the type of cell it is infecting (Dvorin et al., 2002; Limon et al., 2002).

Overall, the mechanisms that allow HIV to enter the nucleus have been debated for many years. Although several have been proposed, studies suggest that none of these viral components is indispensable on its own; rather, they may work together to support HIV pathogenesis. Answering this question matters, because a step so central to HIV pathogenesis could become the target of new, more effective treatments.

Retrotransposition - how proviruses move and replicate within our genome

When a provirus becomes established in germline cells, it can be inherited and, over generations, accumulate mutations that render it inactive as a virus, meaning it no longer produces particles that leave the host cell. Some of these elements nevertheless keep the ability to copy themselves within the genome. Such inherited proviruses are called endogenous retroviruses (ERVs), a type of retrotransposon: mobile genetic elements that can move, replicate and integrate themselves into the host genome. Endogenous retroviruses are just one of several kinds of transposable element, however; others include non-viral retrotransposons and DNA transposons. While retrotransposons work by a “copy and paste” mechanism, always leaving a copy of themselves at the original site and integrating a new copy elsewhere, DNA transposons work in a “cut and paste” manner and may or may not leave a copy behind at the original site.


Retrotransposons, also known as Class I transposable elements or transposons via RNA intermediates, are mobile genetic elements that copy and paste themselves into different genomic locations. They do this through a process called the RNA transposition intermediate mechanism.

In the RNA transposition intermediate mechanism, the transposable element is first transcribed from DNA into an RNA intermediate. This RNA is then reverse transcribed back into DNA by a reverse transcriptase, an enzyme encoded by the retrotransposon itself or, for non-autonomous elements, by another element. The newly synthesised DNA then integrates at a new site within the genome, so the original copy remains in place while a new copy appears at the new site.
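The copy-and-paste bookkeeping can be sketched in a few lines of Python. In this toy model (an illustrative assumption, not a description of any real element) the genome is a string, "transcription" swaps T for U and "reverse transcription" swaps it back. What the sketch really shows is the outcome: the original element stays where it was, an extra copy is inserted elsewhere, and the genome gets longer. The function names and the example genome are hypothetical.

```python
def transcribe(dna: str) -> str:
    """Toy transcription: the RNA copy uses uracil (U) instead of thymine (T)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Toy reverse transcription: turn the RNA intermediate back into DNA."""
    return rna.replace("U", "T")


def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy genome[start:end] into position `target`; the original element is left in place."""
    if not 0 <= target <= len(genome):
        raise ValueError("target must lie within the genome")
    element = genome[start:end]
    new_copy = reverse_transcribe(transcribe(element))  # DNA -> RNA -> DNA
    return genome[:target] + new_copy + genome[target:]


host = "GGGGATTACAGGGGGG"  # hypothetical genome with an element ATTACA at positions 4-9
after = retrotranspose(host, start=4, end=10, target=12)
print(after)                  # GGGGATTACAGGATTACAGGGG
print(len(host), len(after))  # 16 22
```

Each such event adds the full length of the element to the genome, which is the mechanism behind the genome growth discussed later in this article.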

Figure 5:

A significant proportion of the human genome, approximately 8%, consists of virus-derived retrotransposons known as endogenous retroviruses (ERVs): DNA sequences originally derived from retroviruses that infected our ancestors.

Retrotransposons as a whole can be further classified into LTR and non-LTR retrotransposons, depending on whether they carry long terminal repeats (LTRs): sequences repeated in the same orientation at each end of the element that contain the signals for its expression and integration.
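Because an LTR element carries the same sequence at both of its ends, a very rough way to spot one is to look for a stretch repeated, in the same orientation, at the start and the end of a candidate sequence. The Python sketch below does exactly that. The 4-base minimum, the function names and the example sequences are invented for illustration. Real LTRs are much longer and, in old elements, no longer perfectly identical, so genuine LTR-finding tools must tolerate mismatches.

```python
def terminal_repeat_length(element: str, min_length: int = 4) -> int:
    """Length of the longest sequence found, in the same orientation, at both ends of `element`.

    Returns 0 when no shared end of at least `min_length` bases exists.
    Only exact matches are considered, and each repeat may cover at most half the element.
    """
    best = 0
    for k in range(1, len(element) // 2 + 1):
        if element[:k] == element[-k:]:
            best = k
    return best if best >= min_length else 0


def is_ltr_like(element: str, min_length: int = 4) -> bool:
    """True if the sequence starts and ends with the same repeat (a toy LTR test)."""
    return terminal_repeat_length(element, min_length) > 0


ltr = "TGAAGCCA"  # made-up 8-base "LTR"
candidate = ltr + "CCCGGGTTT" + ltr
print(terminal_repeat_length(candidate))  # 8
print(is_ltr_like("ATGCCCGGGTTTAAACCC"))  # False
```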

One important function of the LTR is to promote transcription. The LTR contains promoter and enhancer sequences that are recognized by cellular transcription factors, which help to initiate transcription of the retrotransposon RNA. The LTRs also carry the signals needed for integration. After the retrotransposon RNA is reverse transcribed into DNA, integrase recognizes the ends of the LTRs and joins them to the host DNA, producing an integrated copy (in the case of retroviruses, the provirus). The LTR ends define which DNA is inserted, while the target site in the host genome is not fixed, although some elements favour particular regions.

Non-LTR retrotransposons:

Like other retrotransposons, non-LTR retrotransposons are mobile genetic elements that can move (transpose) within the genome of an organism. They are called "non-LTR" because they lack long terminal repeat (LTR) sequences. They are divided into two main groups, LINEs (Long INterspersed Elements) and SINEs (Short INterspersed Elements), which are found in almost all eukaryotes and together account for at least 34% of the human genome. LINEs are autonomous transposable elements: they are transcribed into an mRNA that encodes the proteins they need to replicate, including a reverse transcriptase. SINEs, on the other hand, are non-autonomous, meaning they cannot mediate their own transposition. They are transcribed into RNA but do not code for a reverse transcriptase, so they rely on the enzyme produced by other elements, such as LINEs.

DNA transposons:

DNA transposons, also known as "jumping genes", are DNA sequences that can move and integrate into different locations within the genome. Unlike endogenous retroviruses, they are not derived from viruses. DNA transposons are classified as class II transposable elements, and they move by a cut-and-paste mechanism via a DNA intermediate: the transposon is excised from its original location in the genome and then re-inserted at a new location. Unlike retrotransposons, which always leave a copy of themselves at the original location, a DNA transposon only sometimes leaves a copy behind.

In the DNA intermediate mechanism, the transposable element is excised from its original site within the genome, usually by a transposase enzyme, which then inserts it into a new site. This produces a "cut-and-paste" type of movement, in which the element is physically cut out of its original location and pasted into a new one.
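The cut-and-paste movement can be sketched in Python by treating the genome as a string, an illustrative assumption rather than a model of any particular transposase. Details such as the short target-site duplications created at insertion sites and the "footprints" sometimes left after excision are ignored. The point the sketch shows is that the element changes position while the genome stays the same length. The function name and example genome are hypothetical.

```python
def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Excise genome[start:end] and re-insert it at `target`.

    `target` is a position in the genome *after* the element has been cut out.
    """
    element = genome[start:end]                # the transposase cuts the element out...
    remaining = genome[:start] + genome[end:]  # ...leaving the donor site without it
    if not 0 <= target <= len(remaining):
        raise ValueError("target must lie within the remaining genome")
    return remaining[:target] + element + remaining[target:]  # ...and pastes it at the new site


host = "GGGGATTACAGGGGGG"  # hypothetical genome with an element ATTACA at positions 4-9
after = cut_and_paste(host, start=4, end=10, target=8)
print(after)                  # GGGGGGGGATTACAGG
print(len(host), len(after))  # 16 16
```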

Figure 6:

The effects of transposable elements (retrotransposons and DNA transposons):

With mobile genetic sequences making up such a large share of the genome of many organisms, one may wonder how these structures affect their host.

On the one hand, transposable elements can benefit the host genome, for example by contributing to the evolution of new genes and the formation of new regulatory elements, which can increase the genetic diversity of a population and provide adaptive advantages under changing environmental conditions. On the other hand, the insertion of a transposable element into a gene or regulatory element can disrupt the normal function of that sequence, leading to genetic disorders or diseases.

Moreover, retrotransposons in particular pose a unique challenge, as they not only move within the genome but also copy themselves, adding to it. This can steadily increase the size of the genome, which can have several negative consequences.

One drawback is the extra energy and resources needed for replication: the larger the genome, the more energy and resources are required to copy it and pass on vital genetic information to new cells.

Retrotransposons can induce DNA damage, such as double-strand breaks or DNA replication errors, which can lead to mutations and chromosomal rearrangements. This can have serious consequences for the host organism, including the development of cancer or other genetic diseases.

Moreover, errors are common during DNA synthesis, occurring roughly once in every 100,000 nucleotides copied. While most of these errors are corrected by proofreading and repair mechanisms, the larger the genome, the more errors occur in each round of replication, which can lead to detrimental effects on the host organism.
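The link between genome size and the number of replication errors is simple multiplication. Using the error rate quoted above (about one error per 100,000 nucleotides, before repair), the Python sketch below estimates how many errors occur each time a genome is copied. The genome sizes are hypothetical round numbers: 3 billion base pairs is roughly the size of one copy of the human genome, and the second value imagines the same genome made 10% larger by new insertions.

```python
NUCLEOTIDES_PER_ERROR = 100_000  # error rate quoted in the text, before proofreading and repair


def expected_errors(genome_length_bp: int) -> float:
    """Expected number of errors made while copying a genome of this length once."""
    return genome_length_bp / NUCLEOTIDES_PER_ERROR


for size in (3_000_000_000, 3_300_000_000):  # hypothetical genome sizes in base pairs
    print(f"{size:,} bp -> about {expected_errors(size):,.0f} errors per replication")
# 3,000,000,000 bp -> about 30,000 errors per replication
# 3,300,000,000 bp -> about 33,000 errors per replication
```

In this estimate errors scale linearly with length, so a genome 10% larger makes about 10% more errors per replication, before repair mechanisms remove most of them.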

Using retrotransposons for ancestry analysis - evidence for evolution in our own DNA

For a long time, scientists have proposed that humans and other apes are closely related. While the resemblance was initially noted without necessarily implying common ancestry, the study of evolution, fossils and comparative anatomy has since provided evidence of a shared ancestor. However, some remained sceptical of these methods, citing subjectivity and imprecision in the study of fossils.

Fortunately, DNA provides a precise and quantifiable means of testing such hypotheses. By examining DNA, researchers can reveal family relationships and common ancestry between species. For example, endogenous retroviruses can reveal viral infections experienced by our ancestors. Once integrated into the germline, these sequences are relatively stable and are passed down across generations. If humans and chimpanzees share a common ancestor, and some of the viral infections recorded in our genome occurred before the two species diverged, the same viral sequences should be found at the same locations in both genomes. Because the human genome is so large, the probability that the same retrovirus would independently integrate at exactly the same spot in two separate lineages is extremely low, so such shared insertions are best explained by inheritance from a common ancestor. This provides strong, independent evidence for the shared ancestry of humans and apes, supporting the evidence from fossils and comparative anatomy.
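The argument about shared insertion sites can be put into numbers under a strong simplifying assumption: that a retrovirus is equally likely to integrate at any of N positions in the genome. Real integration is not uniform (many retroviruses favour certain regions), so this Python sketch is only an order-of-magnitude illustration, not the calculation used in the literature. Under that assumption, two independent insertions land on the same position with probability 1/N, and k independent coincidences have probability (1/N)^k. The value of N and the function names are hypothetical.

```python
from fractions import Fraction


def chance_same_site(genome_positions: int) -> Fraction:
    """P(two independent insertions hit the same position), assuming uniform random integration."""
    return Fraction(1, genome_positions)


def chance_all_coincide(genome_positions: int, shared_insertions: int) -> Fraction:
    """P(`shared_insertions` independent pairs of insertions all coincide by chance)."""
    return chance_same_site(genome_positions) ** shared_insertions


N = 3_000_000_000  # hypothetical round number of possible integration sites
print(float(chance_same_site(N)))        # about 3.3e-10
print(float(chance_all_coincide(N, 2)))  # about 1.1e-19
```

Even if real site preferences make a single coincidence far more likely than 1/N, the probability still shrinks multiplicatively with every additional shared insertion, which is why many shared ERVs are hard to explain by chance.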

The future

The intricate mechanisms and adaptations that allow retroviruses not only to integrate into our genome but also to move and replicate within it reflect the incredible complexity and adaptability of the natural world. Although retroviruses and endogenous retroviruses can affect their hosts in various ways, from providing favourable adaptations to causing diseases such as cancer, their potential applications are significant. While we cannot yet control transposable elements precisely, advances in gene editing technologies such as CRISPR/Cas9 may make it possible to selectively target and eliminate retrotransposons from genomes, or to harness their activity for beneficial purposes. Their future is likely to be shaped by many genetic and environmental factors, and we can only wait to see what advances in science and technology will reveal about their potential uses and their inner workings.


  • Bukrinsky M. A hard way to the nucleus. Mol Med. 2004 Jan-Jun;10(1-6):1-5. PMID: 15502876; PMCID: PMC1431348.

  • Dvorin JD, Bell P, Maul GG, Yamashita M, Emerman M, Malim MH. Reassessment of the roles of integrase and the central DNA flap in human immunodeficiency virus type 1 nuclear import. J. Virol. 2002;76:12087–96.

  • Limon A, Nakajima N, Lu R, Ghory HZ, Engelman A. Wild-type levels of nuclear localization and human immunodeficiency virus type 1 replication in the absence of the central DNA flap. J. Virol. 2002;76:12078–86.

  • Reil H, Bukovsky AA, Gelderblom HR, Gottlinger HG. Efficient HIV-1 replication can occur in the absence of the viral matrix protein. EMBO J. 1998;17:2699–708.

  • Gallay P, Hope T, Chin D, Trono D. HIV-1 infection of nondividing cells through the recognition of integrase by the importin/karyopherin pathway. Proc. Natl. Acad. Sci. U.S.A. 1997;94:9825–30.

  • Zennou V, Petit C, Guetard D, Nerhbass U, Montagnier L, Charneau P. HIV-1 genome nuclear import is mediated by a central DNA flap. Cell. 2000;101:173–85.

  • Fassati A, Gorlich D, Harrison I, Zaytseva L, Mingot JM. Nuclear import of HIV-1 intracellular reverse transcription complexes is mediated by importin 7. EMBO J. 2003;22:3675–85.

  • van Heuvel, Y., Schatz, S., Rosengarten, J. F., & Stitz, J. (2022, February 14). Infectious RNA: Human immunodeficiency virus (HIV) biology, therapeutic intervention, and the quest for a vaccine. MDPI. Retrieved from

  • Wikimedia Foundation. (2023, April 13). Retrotransposon. Wikipedia. Retrieved from

  • Stated Clearly: “DNA Evidence That Humans & Chimps Share A Common Ancestor: Endogenous Retroviruses”,
