HMM Logo


Domain logo

doc_logo_letter

An HMM logo displays a series of columns, each containing a stack of letters that represents the amino acids observed at that position. The height of the stack shows how conserved the position is, while the height of each letter represents the frequency of that amino acid at the position.
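To make the relationship between stack height and letter height concrete, here is a minimal Python sketch. It assumes the classic sequence-logo convention (information content against a uniform background over 20 amino acids). Skylign offers several letter-height methods, so this is an illustration and not necessarily the exact computation behind the logos shown here. The function name and the example column are hypothetical.

```python
import math
from collections import Counter

def column_letter_heights(column, alphabet_size=20):
    """Return (stack_height, {residue: letter_height}) in bits for one alignment column.

    Assumption: uniform background over `alphabet_size` residues.
    Gap characters ('-' and '.') are ignored.
    """
    residues = [c for c in column.upper() if c not in "-."]
    total = len(residues)
    if total == 0:
        return 0.0, {}
    freqs = {aa: n / total for aa, n in Counter(residues).items()}
    # Shannon entropy of the observed residue distribution (bits)
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Conservation: how far the column is from a uniform distribution
    stack_height = math.log2(alphabet_size) - entropy
    # Each letter takes a share of the stack proportional to its frequency
    letter_heights = {aa: p * stack_height for aa, p in freqs.items()}
    return stack_height, letter_heights

# Hypothetical column: six leucines, one isoleucine, one valine
stack, letters = column_letter_heights("LLLLLLIV")
print(round(stack, 3), {aa: round(h, 3) for aa, h in letters.items()})
```

A fully conserved column reaches the maximum stack height (log2 of the alphabet size), and the letter heights always sum to the stack height.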

The three lines beneath the letter stacks display, from top to bottom: the probability of observing an amino acid, the insertion probability, and the insertion length at this logo position.

doc_lines_probs

In simple terms, the last two lines represent the probability of observing an insertion right after this column and the expected length of that insertion.
In our case, the first line represents how many eukaryotic sequences in the Pfam domain alignment have an amino acid at this location. The last two lines show low insertion probabilities and short expected lengths because we chose to represent every alignment position, which makes it easier to map human proteins.

If you need a consensus HMM logo for a protein domain, you can go to the Pfam website. Only the more conserved positions are represented in that logo, which makes it harder to map human protein positions onto it. The Pfam logo is therefore more helpful for identifying the most important amino acid patterns in the protein domain.


Column information

doc_col_info

Clicking on a logo column displays a table with more details on amino acid frequencies. Amino acids are ordered by decreasing frequency. The first two lines of the table are explained in the previous paragraph.
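The ordering used in this table can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the column is given as a string of one-letter residues, gap characters are skipped, and ties are broken alphabetically. The function name and the example column are hypothetical.

```python
from collections import Counter

def column_frequency_table(column):
    """Return [(residue, frequency), ...] sorted by decreasing frequency."""
    # Keep letters only, so gap characters such as '-' and '.' are skipped
    residues = [c for c in column.upper() if c.isalpha()]
    total = len(residues)
    rows = [(aa, n / total) for aa, n in Counter(residues).items()]
    # Most frequent first; ties are broken alphabetically (an assumption)
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# Hypothetical column containing two gap characters
for residue, freq in column_frequency_table("LLLI-V.L"):
    print(residue, f"{freq:.2f}")
```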


Mutation table

doc_table

The mutation table shows the Dolphin prediction for the given protein missense mutation. The Dolphin "WT" and "∆" scores are displayed along with the prediction probability (see the Dolphin paper for more information). The red button downloads the mutation information.
The gnomAD allele frequency and the Dolphin frequency (AF in the domain) are also available.

To understand where the Dolphin frequency comes from, you can click on the "Show Details" button.

doc_details_mut_table

The details table displays the same missense variants found at the same logo position. Mutations are ordered by decreasing frequency, and the first one gives the Dolphin frequency. For each mutant, you can also see the reported gnomAD frequency.
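As a rough illustration of how the details table relates to the Dolphin frequency, the following Python sketch sorts variants observed at one logo position by decreasing gnomAD frequency and takes the top value. The record layout (`protein`, `position`, `gnomad_af`), the protein names, and the frequencies are all hypothetical, and treating a missing gnomAD frequency as 0 is an assumption of this sketch.

```python
def rank_variants(variants):
    """Sort variants by decreasing gnomAD frequency and return (ranked, top_frequency)."""
    # A missing frequency (None) is treated as 0.0 so that it sorts last
    ranked = sorted(variants, key=lambda v: v.get("gnomad_af") or 0.0, reverse=True)
    top_frequency = (ranked[0].get("gnomad_af") or 0.0) if ranked else None
    return ranked, top_frequency

# Hypothetical records for the same substitution at the same logo position
example = [
    {"protein": "PROT_A", "position": 120, "gnomad_af": 1.2e-5},
    {"protein": "PROT_B", "position": 87, "gnomad_af": 3.4e-4},
    {"protein": "PROT_C", "position": 301, "gnomad_af": None},
]
ranked, domain_af = rank_variants(example)
print(ranked[0]["protein"], domain_af)
```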


Create your HMM logo

All the protein domain HMM logos were created with the Skylign tool. To create your own logo, you can go to the Skylign website and read the related documentation. Several options are offered to generate a logo that corresponds to your alignment data.

Skylign web site:

skylign.org

Citation in publications:

Wheeler, T.J., Clements, J. & Finn, R.D. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics 15, 7 (2014). https://doi.org/10.1186/1471-2105-15-7

Alignment Viewer


Logo position number

doc_align

The alignment viewer displays all the human proteins belonging to this Pfam domain alignment.
The first line represents the HMM logo positions (columns). In the example above, the provided variant is at the center of the highlighted blue cross. The vertical highlight corresponds to the position in the logo, while the horizontal one corresponds to the specific domain of the human protein.
Hovering over the entry name shows the gene name.
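The link between a logo position and a residue in a human protein can be sketched as follows. This Python example assumes that each logo column corresponds to exactly one alignment column, that gaps are written as "-" or ".", and that the start position of the domain in the protein is known. The function name, the aligned sequence, and the start position are hypothetical.

```python
def column_to_residue(aligned_seq, domain_start, column):
    """Map a 1-based alignment column to a 1-based protein position.

    Returns None when the protein has a gap in that column.
    """
    if column < 1 or column > len(aligned_seq):
        raise IndexError("column outside the alignment")
    if aligned_seq[column - 1] in "-.":
        return None
    # Count the residues of this protein that precede the column
    preceding = sum(1 for c in aligned_seq[:column - 1] if c not in "-.")
    return domain_start + preceding

# Hypothetical aligned domain starting at protein position 10
aligned = "MK--LV"
print([column_to_residue(aligned, 10, col) for col in range(1, 7)])
```

Columns where the protein has a gap return None, which mirrors the empty cells in the viewer.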


Colors

doc_col_info

This drop-down menu offers several color schemes:

  • Skylign
  • Rasmol
  • Rasmol Shapely
  • ClustalX

Get protein domains alignments

Protein domain sequence alignments were obtained from the Pfam FTP site. Only Pfam-A entries (HMM-based, hand-curated Pfam entries that passed a manually set threshold value for each HMM) are used, and only the eukaryotic sequences are kept. In the Dolphin alignment viewer, you can see the sequences from human proteins.

Pfam web site:

pfam.xfam.org

Citation in publications:

Jaina Mistry, Sara Chuguransky, Lowri Williams, Matloob Qureshi, Gustavo A Salazar, Erik L L Sonnhammer, Silvio C E Tosatto, Lisanna Paladin, Shriya Raj, Lorna J Richardson, Robert D Finn, Alex Bateman, Pfam: The protein families database in 2021, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D412–D419, https://doi.org/10.1093/nar/gkaa913

Frequency viewer


Domain feature

doc_feature

The first line of the frequency viewer displays the amino acid sequence of the protein associated with the Pfam entry related to the provided mutation. Letter colors depend on the alignment viewer color selector (see the paragraph above).
The second line represents the different occurrences of the same protein domain along the protein. At the bottom, the dark line shows the protein scale (protein position).


Frequencies features

doc_zoom_feature

With your mouse, you can scroll or click and drag to zoom in on the frequency viewer. To zoom out, you can scroll or right-click in the viewer.
Both stair-step lines display the number of amino acid substitutions associated with a gnomAD (yellow, top) or Dolphin (blue, bottom) frequency.


Variants frequency

All substitution frequencies were extracted from gnomAD version 2, which contains data from 125,748 exomes. For each missense mutation, we selected the most frequent mutational event leading to the amino acid substitution in any human population.
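The selection described above can be sketched in Python as follows. This is one reading of the rule, not the actual extraction pipeline: several nucleotide changes can lead to the same amino acid substitution, each with its own frequency per population, and the highest frequency is kept. The record layout, the substitutions, and the frequencies are hypothetical.

```python
def most_frequent_events(events):
    """Keep, for each amino acid substitution, the event with the highest allele frequency."""
    best = {}
    for event in events:
        key = event["substitution"]
        if key not in best or event["af"] > best[key]["af"]:
            best[key] = event
    return best

# Hypothetical events; two different nucleotide changes both give R45S
events = [
    {"substitution": "R45W", "dna_change": "c.133A>T", "population": "nfe", "af": 2.0e-5},
    {"substitution": "R45W", "dna_change": "c.133A>T", "population": "afr", "af": 1.5e-4},
    {"substitution": "R45S", "dna_change": "c.135G>C", "population": "eas", "af": 4.0e-6},
    {"substitution": "R45S", "dna_change": "c.135G>T", "population": "eas", "af": 9.0e-6},
]
for substitution, event in most_frequent_events(events).items():
    print(substitution, event["dna_change"], event["population"], event["af"])
```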

GnomAD web site:

gnomad.broadinstitute.org

Citation in publications:

Karczewski, K.J., Francioli, L.C., Tiao, G. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). https://doi.org/10.1038/s41586-020-2308-7