Coronavirus Genome: Structure, Size, Comparison

by Sana Masroor, M.S., Biochemistry. Last updated on April 28, 2020.

Coronaviruses Identified So Far: How Are They Classified?

Coronaviruses usually cause upper respiratory tract illness and are genetically classified into four major genera:

  • Alphacoronavirus
  • Betacoronavirus
  • Gammacoronavirus, and
  • Deltacoronavirus.

The first two genera primarily infect mammals, while the last two usually infect birds.

There are six types of human coronaviruses that have been previously identified. These include:

HCoV-229E and HCoV-NL63, which belong to the Alphacoronavirus genus; and HCoV-HKU1, HCoV-OC43, severe acute respiratory syndrome coronavirus (SARS-CoV), and Middle East respiratory syndrome coronavirus (MERS-CoV), which belong to the Betacoronavirus genus.

Another genus, the deltacoronaviruses, has not yet been found in human cases.

Coronaviruses did not get worldwide attention until the 2003 SARS pandemic, followed by the 2012 MERS and, most recently, the 2019-nCoV outbreaks.

SARS-CoV and MERS-CoV are highly pathogenic. 

Coronavirus Genome and Its Size

The coronavirus genome is the genetic material of the virus. It consists of single-stranded RNA (ribonucleic acid), which is packaged with nucleocapsid proteins into a helically symmetrical nucleocapsid.

The coronavirus genome ranges in size from approximately 26,000 to 32,000 bases and contains a variable number (from 6 to 11) of open reading frames (ORFs).
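To make the idea of an open reading frame concrete, here is a minimal Python sketch that scans a nucleotide string for stretches running from a start codon (ATG) to the first in-frame stop codon. The function name `find_orfs`, the `min_codons` threshold, and the toy sequence are illustrative assumptions and not real viral data. The sequence is written in DNA letters (T instead of U), as sequence databases commonly do. Real coronavirus annotation is considerably more involved than this forward-strand scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    start is 0-based; end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq):  # a stop codon was found
                    n_codons = (j - i) // 3  # codons from ATG up to, not including, the stop
                    if n_codons >= min_codons:
                        orfs.append((i, j + 3, frame))
                    i = j + 3  # resume scanning after this ORF
                    continue
            i += 3
    return orfs

# Toy sequence (hypothetical, for illustration only).
toy = "CCATGAAACCCGGGTAGTTATGCATTTTGACC"
for start, end, frame in find_orfs(toy):
    print(f"frame {frame}: positions {start}-{end}: {toy[start:end]}")
```

An ATG with no downstream in-frame stop codon is skipped, which is why the second ATG in the toy sequence does not produce an ORF.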


The first ORF represents approximately 67% of the entire genome and encodes 16 non-structural proteins (nsps), while the remaining ORFs code for structural proteins (proteins involved in structural maintenance) and accessory proteins (proteins that help stabilize the primary proteins).
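As a rough worked example of these proportions, the sketch below applies the approximately 67% figure to the genome size range given earlier (26,000 to 32,000 bases). The function name `split_genome` and the rounding are illustrative assumptions; actual ORF boundaries differ from virus to virus.

```python
def split_genome(genome_length, first_orf_fraction=0.67):
    """Estimate how many bases fall in the first ORF versus the rest of the genome."""
    first_orf = round(genome_length * first_orf_fraction)
    return first_orf, genome_length - first_orf

for length in (26_000, 30_000, 32_000):
    first_orf, rest = split_genome(length)
    print(f"{length:>6} bases: ~{first_orf} in the first ORF, ~{rest} in the rest")
```

For a 30,000-base genome this gives roughly 20,100 bases for the first ORF and about 9,900 bases for everything else.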

The four major structural proteins are:

Spike Surface Glycoprotein (S)

Its size is approximately 150 kilodaltons. It binds to receptors on the host cell and determines host tropism.

Small Envelope Protein (E)

It is found in very small quantities in the virion (the entire virus particle). Its size is approximately 8-12 kilodaltons.

E proteins from different coronaviruses are highly divergent in sequence but share a common architecture. The E protein helps in the assembly and release of the virus.

Matrix Protein (M)

The M protein is the most abundant structural protein in the virion. It is small, at approximately 25-30 kilodaltons.

Nucleocapsid Protein (N)

The N protein is the only protein found in the nucleocapsid. It helps tether the viral genome to the replicase-transcriptase complex (RTC) and subsequently helps package the genome into viral particles.


Systematic Comparison of 2019-nCoV and Several Other SARS and SARS-Like Viruses

The spike proteins of SARS-CoV and MERS-CoV bind to distinct host receptors via different receptor-binding domains (RBDs).

SARS coronavirus generally uses angiotensin-converting enzyme 2 (ACE2) as one of the main receptors with CD209L as an alternative receptor, whereas MERS-CoV uses dipeptidyl peptidase 4 (DPP4, also known as CD26) as the predominant receptor.

A recent analysis reported that the novel coronavirus of 2019 (COVID-19) has a close evolutionary association with the SARS-like bat coronaviruses.

For this study, scientists carried out in-depth genome annotations on the first three determined genomes of 2019-nCoV (HB01, HB04, and HB05) and compared them to related coronaviruses, including 1,008 human SARS-CoV, 338 bat SARS-like CoV, and 3,131 human MERS-CoV genomes.

The researchers found that the amino acid sequences of 2019-nCoV were quite similar to those of SARS-CoV. They also identified some notable differences, such as:

  • The 8a protein was present in SARS-CoV but absent in COVID-19.
  • The 8b protein was 84 amino acids long in SARS-CoV, but 121 amino acids long in COVID-19.
  • The 3b protein was 154 amino acids long in SARS-CoV, but only 22 amino acids long in COVID-19.
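The length differences in the list above can be tabulated with a few lines of Python. The lengths come straight from the list; the dictionary layout, the `length_change` helper, and the use of the label "2019-nCoV" for the virus the list calls COVID-19 are illustrative assumptions. The 8a protein is left out because the source does not give its length in SARS-CoV.

```python
# Lengths in amino acids, taken from the list above.
accessory_proteins = {
    "8b": {"SARS-CoV": 84, "2019-nCoV": 121},
    "3b": {"SARS-CoV": 154, "2019-nCoV": 22},
}

def length_change(lengths):
    """Return the 2019-nCoV length minus the SARS-CoV length."""
    return lengths["2019-nCoV"] - lengths["SARS-CoV"]

for name, lengths in accessory_proteins.items():
    print(f"{name}: {length_change(lengths):+d} amino acids relative to SARS-CoV")
```

The 8b protein is 37 amino acids longer in the new virus, while 3b is 132 amino acids shorter.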

On the basis of phylogenetic analysis of the whole genomes of the various viruses, the researchers found that COVID-19 was in the same Betacoronavirus clade as MERS-CoV, SARS-like bat CoV, and SARS-CoV.

They also found that the genome of the 2019 coronavirus has the highest similarity with SARS-like bat coronaviruses and is less closely related to the MERS-CoVs.
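Statements about which virus is "most similar" usually rest on a measure such as percent identity between aligned sequences. The sketch below shows one simple way to compute it. The function name `percent_identity`, the gap handling, and the short toy sequences are made-up assumptions for illustration; they are not real viral sequences, and this is not necessarily the method used in the study.

```python
def percent_identity(a, b):
    """Percent of aligned positions with the same residue.

    Positions where either sequence has a gap ("-") are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments, for illustration only.
query = "MKTLLVAGS-"
candidates = {
    "relative_A": "MKTLLVSGSQ",
    "relative_B": "MRSLIVAG-Q",
}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: percent_identity(query, candidates[name]),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {percent_identity(query, candidates[name]):.1f}% identity")
```

Percent identity is only a starting point; a phylogenetic analysis of the kind described above builds a tree from many such comparisons across whole genomes.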

Differences in COVID-19 (Novel Coronavirus in 2019) Genome and SARS Coronaviruses Genome

Given the close relationship between COVID-19 and SARS coronaviruses or SARS-like bat CoVs, the amino acid substitutions found in different proteins could shed light on how the COVID-19 genome differs structurally and functionally from those of SARS-CoVs.

In total, there were 380 amino acid substitutions between the amino acid sequences of the novel coronavirus of 2019 (COVID-19) and the corresponding consensus sequences (sequences showing the most common amino acid at each position) of SARS and SARS-like viruses.
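To illustrate what counting substitutions against a consensus involves, the sketch below builds a consensus from a few aligned sequences by taking the most common residue in each column, then lists the positions where a query sequence differs. The function names and the toy sequences are illustrative assumptions, not the study's pipeline or real viral proteins.

```python
from collections import Counter

def consensus(aligned):
    """Most common residue in each column of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

def substitutions(query, reference):
    """List (position, reference_residue, query_residue), 1-based, ignoring gaps."""
    return [
        (pos, ref, qry)
        for pos, (ref, qry) in enumerate(zip(reference, query), start=1)
        if ref != qry and "-" not in (ref, qry)
    ]

# Hypothetical aligned protein fragments, for illustration only.
reference_set = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAK"]
cons = consensus(reference_set)
subs = substitutions("MKSAYIAR", cons)

print("consensus:", cons)
for pos, ref, qry in subs:
    print(f"{ref}{pos}{qry}")  # e.g. T3S means T at position 3 replaced by S
print("total substitutions:", len(subs))
```

Summing such counts over every protein in the genome is what yields a single total like the one reported above.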

Frequently Asked Questions

Coronaviruses can survive on surfaces for a few hours or up to several days under different conditions (e.g. type of surface, temperature or humidity of the environment).

If you think that a surface may be contaminated, clean it with a simple disinfectant to kill the virus and protect yourself and others. Clean your hands with an alcohol-based sanitizer or wash them with soap and water. Avoid touching your eyes, nose, or mouth.

The coronavirus has a non-segmented, single-stranded RNA genome of approximately 30,000 bases.

Coronaviruses are spherical, with diameters of approximately 125 nm as reported in studies using cryo-electron microscopy and cryo-electron tomography. Their most distinctive feature is the club-shaped spike projections on the surface of the virus. Inside the envelope lies the nucleocapsid. Coronaviruses usually have helically symmetrical nucleocapsids, which are uncommon among positive-sense RNA viruses but far more common among negative-sense RNA viruses.

The receptor-binding domain (RBD) of a coronavirus spike protein consists of a core subdomain, which usually serves as the structural scaffold, and a receptor-binding motif (RBM), which binds the receptor and contains neutralizing epitopes (the parts of an antigen that antibodies recognize). This makes RBDs leading candidates for subunit vaccine design.

Sana Masroor

Sana Masroor is a biochemist who earned her Master’s degree in biochemistry from Jamia Millia Islamia University, New Delhi. She has worked as a Research Trainee in the Special Center for Molecular Medicine, Jawaharlal Nehru University, New Delhi.
