Complex regions and calculation of homology

The human genome

In recent years, clinical genetics has advanced considerably in the field of precision medicine thanks to the incorporation of massively parallel sequencing into clinical routine, which has significantly increased diagnostic yield.

One of the most widely used sequencing strategies is second-generation sequencing. It works by sequencing millions of DNA fragments at the same time, producing highly accurate reads that are an almost faithful image of the genomic sequence being read. However, despite the advantages of this methodology, it also has limitations associated with the length of the reads it generates, which usually ranges between 75 and 300 base pairs.

At Igenomix we take advantage of the speed and reliability of the data generated by Illumina, Inc. for use in molecular diagnosis within a clinical environment.

The main purpose of this page is to provide our customers with useful information about the limitations of the technology used in our massively parallel sequencing studies, especially with respect to homology within the exome used.

Drawbacks of short-read sequencing

Although short-read sequencing has brought remarkable advances to molecular diagnosis through accurate evaluation of genomes, the drawbacks associated with the technology should not be forgotten, as they may have an impact on the molecular diagnosis of a patient.

The main drawbacks, or limitations, of second-generation sequencing are:

  • Limitation in homopolymer regions: a nucleotide repeated more than 5-6 times in a row makes that position almost impossible to evaluate, due to polymerase synchronization problems during amplification and sequencing.

  • Secondary structures: secondary structures formed during library preparation and sequencing can result in distinct types of bias in the final results.

  • Homologous regions: regions in the genome with high sequence similarity to other genomic locations that can lead to bioinformatic mapping issues and may cause variant calling errors.  

  • Repetitive regions: regions difficult to map due to the repetitiveness of their sequence, such as centromeric and telomeric regions. 
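
As an illustration of the homopolymer limitation listed above, the following minimal sketch flags runs of a single nucleotide in a sequence. The function name `find_homopolymers`, the toy sequence, and the cut-off of 6 bases are assumptions made for this example (the text only states "more than 5-6 times"); it is not the method used in the assessment described on this page.

```python
import re

def find_homopolymers(sequence, min_length=6):
    """Return (start, end, base) for each run of one base at least min_length long.

    Coordinates are 0-based with an exclusive end. min_length=6 is an
    illustrative threshold, not a value taken from the source.
    """
    pattern = re.compile(r"A+|C+|G+|T+")
    runs = []
    for match in pattern.finditer(sequence.upper()):
        if match.end() - match.start() >= min_length:
            runs.append((match.start(), match.end(), match.group()[0]))
    return runs

# Toy sequence: a run of 7 T and a run of 6 A embedded in other bases.
example = "ACG" + "T" * 7 + "GC" + "A" * 6 + "CGT"
print(find_homopolymers(example))
```

Positions reported this way could be compared against the exons of interest to see where base calls may deserve extra caution.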

Bioinformatic calculation of homologous regions

In order to obtain accurate and confident results, homologous regions must be identified before the analysis.  
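
To see why homology matters for short reads, consider this toy sketch: a read that lies entirely inside a duplicated segment matches the reference at more than one position, while a longer read that reaches into unique flanking sequence matches only once. All sequences and the helper `exact_match_positions` are made up for illustration; real aligners use far more sophisticated, mismatch-tolerant algorithms.

```python
def exact_match_positions(reference, read):
    """Return every 0-based position where read occurs exactly in reference."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions

# Toy reference: one segment ("shared") is present twice, as in a homologous region.
unique_left = "GATTACAGGC"
shared = "TTCAGGACCTGA"
spacer = "CCATGGTACA"
unique_right = "AGTCCGATAG"
reference = unique_left + shared + spacer + shared + unique_right

short_read = shared[2:10]                    # lies entirely inside the duplicated segment
long_read = unique_left[4:] + shared[:6]     # spans unique sequence and the duplicated segment

print(len(exact_match_positions(reference, short_read)))  # number of candidate locations
print(len(exact_match_positions(reference, long_read)))
```

When a read has several equally good candidate locations, the mapper cannot tell which copy it came from, and variants may be assigned to the wrong copy or missed.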

To obtain a list of transcripts and exons that may present issues in the mapping and variant calling steps, our assessment used the following information:

  • Tables genomicSuperDups and getRmNgsProblemHigh from the UCSC database (2022-07-11). These tables have been created through the generation of in silico data to determine the mapping quality of each region of the genome. 
  • RefSeq regions (v2021-03-24). 
  • Transcripts indicated by the MANE database (v1.0).
  • OMIM database (2022-08-31). 
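
The page does not describe how these sources were combined, so the following is only a hedged sketch of one plausible approach: intersect exon coordinates with duplicated regions and keep the exons that overlap a region whose sequence identity exceeds a threshold. The `Interval` type, the coordinates, and the identity values are hypothetical; the identity value is assumed to be comparable to the `fracMatch` field of genomicSuperDups.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def overlaps(a, b):
    """True when two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def exons_in_homologous_regions(exons, duplications, min_identity):
    """Return exon numbers overlapping a duplication with identity > min_identity."""
    flagged = []
    for number, exon in exons.items():
        for region, identity in duplications:
            if identity > min_identity and overlaps(exon, region):
                flagged.append(number)
                break  # one qualifying overlap is enough to flag the exon
    return flagged

# Hypothetical exons of one transcript and hypothetical duplicated regions.
exons = {
    1: Interval("chrT", 100, 200),
    2: Interval("chrT", 500, 650),
    3: Interval("chrT", 900, 1000),
}
duplications = [
    (Interval("chrT", 450, 700), 0.92),
    (Interval("chrT", 850, 1200), 0.97),
]

print(exons_in_homologous_regions(exons, duplications, 0.90))  # [2, 3]
print(exons_in_homologous_regions(exons, duplications, 0.95))  # [3]
```

Running the same function with two thresholds mirrors the ">90%" and ">95%" columns of the table, where the stricter threshold yields a subset of the exons.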

Using the following information, health professionals can evaluate whether the genes targeted by a study could be affected in the analysis process, which might cause relevant information to be missed in the patient's diagnosis.
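
A check of this kind against the table that follows can be sketched as a simple lookup. The example below uses only three rows (ABCC6, CA5A, CHEK2) whose exon lists are identical in all three columns; the dictionary name `HOMOLOGY_95` and the helper functions are hypothetical, and a real check would load the complete table.

```python
def expand_exons(spec):
    """Expand an exon list such as "1, 3-10" into a set of exon numbers."""
    exons = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            exons.update(range(int(first), int(last) + 1))
        else:
            exons.add(int(part))
    return exons

# Excerpt of the ">95% homology" column for three genes from the table.
HOMOLOGY_95 = {
    "ABCC6": "1-9",
    "CA5A": "1-7",
    "CHEK2": "11-15",
}

def affected_exons(gene, exons_of_interest):
    """Return the exons of interest that the excerpt lists as highly homologous."""
    flagged = expand_exons(HOMOLOGY_95.get(gene, ""))
    return sorted(flagged & set(exons_of_interest))

print(affected_exons("CHEK2", [10, 11, 12]))  # [11, 12]
print(affected_exons("ABCC6", [10, 12]))      # []
```

An empty result here only means that none of the requested exons appear in the excerpt for that gene; genes absent from the dictionary would need to be looked up in the full table.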

Table describing the OMIM genes and the exons containing highly homologous regions
Gene | Transcript | Exons with low average mappability | Exons with >90% homology | Exons with >95% homology
ABCA3NM_00108916, 31
ABCA7NM_01911218
ABCC6NM_0011711-91-91-9
ABCD1NM_0000337-107-107-10
ACANNM_00136926812
ACRNM_0010974-5
ACTBNM_0011012-6
ACTG1NM_0016142-6
ADAMTSL2NM_01469410-1910-1910-19
ADGRE2NM_0134473-10
ADH1BNM_0006685-7
ADH1CNM_0006695-7
ADH5NM_0006716
AFG3L2NM_00679614
AGKNM_01823816
AIMP2NM_0063031
ALG1NM_0191096-136-136-9
ALG10BNM_0010136201-3
ALMS1NM_00137845417-18, 2017-21
ANAPC1NM_0226622-482-482-48
ANKRD11NM_0132759, 131313
ANO10NM_0180753
ANOS1NM_00021610-12, 1410-1410-14
AP1B1NM_0011272-32-3, 6
AP2S1NM_0040694-5
AP4S1NM_00112812666
APOL1NM_0036615-6
APOL2NM_0308825
AQP7NM_0011702-8
ARHGEF1NM_0047068
ARL6IP1NM_0151616
ARSLNM_0000479, 119-11
ASNSNM_0016734, 93-13
ASS1NM_05401215
ATAD1NM_00132196710
ATAD3ANM_0011705351-11, 13, 15-161-16
ATMNM_00005128
B3GAT3NM_01220033-53-5
BANF1NM_0038603
BCAP31NM_0012564476-75-85-8
BCRNM_00432717-20, 2217-2317-23
BDP1NM_0184293939
BMPR1ANM_00432910, 1312-1312-13
BMS1NM_0147532-7, 9, 14-21, 23
BPTFNM_1826411, 3, 6-7, 25-261, 3, 6-7, 15-17, 25-26, 28
BRAFNM_00137425819
BRAFNM_00433318
BRCA1NM_0072942
C4ANM_0072931-41
C4BNM_0010020291-411-411-41
CA5ANM_0017391-71-71-7
CACNA1CNM_00071944-4543-45
CACNA1CNM_00116762344-4543-45
CALM1NM_0068884-6
CCL3NM_0029832-3
CD209NM_0211554
CD46NM_17235122-5
CDC40NM_01589115
CDC42NM_00179164-64-6
CDH15NM_00493311
CDK8NM_00126013
CELNM_0018071, 8-111-111-11
CELA2ANM_0334401, 5
CEP290NM_02511454
CES1NM_0010251953-4, 6, 12-14
CFC1NM_0325451-6
CFHNM_00018620-228-10, 20-2220-22
CFHR1NM_0021131-61-61-6
CFHR3NM_0210234-61-64-6
CFTRNM_00049210
CHCHD2NM_0161392
CHEK2NM_00719411-1511-1511-15
CHRNA2NM_0007426
CHRNA4NM_0007445
CIBAR1NM_1452698
CIDECNM_0013211426-7
CISD2NM_001008388333
CLCN3NM_0018293-9
CLCN7NM_0012879
CLCNKANM_0040702, 6-8, 12, 14-16, 202-20
CLCNKBNM_0000852, 6-8, 12, 14-16, 202-20
CORO1ANM_00707410-1110-1110-11
COX10NM_001303666
COX6A1NM_0043733
CPNM_00009691919
CPAMD8NM_0156921
CR1NM_0006511-3, 5-23, 25-29, 33, 41, 45, 47
CRYBB2NM_0004964-64-64-6
CRYGDNM_0068912
CSF2RANM_1722453-133-13
CSH1NM_0013171-5
CTNND1NM_0010854582121
CTU2NM_0010127597
CUBNNM_00108142-43, 49-5041-50, 61-67
CUX1NM_0019133
CUX1NM_1815523
CXCR1NM_0006342
CXCR2NM_0015573
CYCSNM_0189472-32-3
CYP11B1NM_0004973-5, 7-81-9
CYP11B2NM_0004983-5, 7-81-9
CYP21A2NM_0005001, 3-101-101-10
CYP2A6NM_0007621-2, 4-91-91-9
CYP2B6NM_0007675, 8-9
CYP2C19NM_0007691-3, 8
CYP2C9NM_0007711-3, 8
CYP2D6NM_0001061-2, 4-9
CYP3A4NM_0174607, 10-11
DCLRE1CNM_0010338554, 6, 84-94-9
DDX11NM_0306532-27
DHFRNM_00079166
DICER1NM_17743827
DIS3L2NM_15238316, 18-1915-21
DKC1NM_00136311
DLATNM_0019312
DLSTNM_00193315
DNAH11NM_00127711576, 8276
DNAJB6NM_0582468
DNM1NM_00440822
DPF2NM_00626811
DPP6NM_130797111
DPY19L2NM_1738122-22
DRD5NM_000798111
DSENM_01335266
DUOX2NM_0013637116-8, 19
DYNC1I2NM_00137818
EFL1NM_0245802, 4-5, 7, 9, 12-152-152-15
EIF2B3NM_0203659
EIF2S3NM_0014151212
EIF3FNM_0037541
ELMO2NM_1331719
ERCC6NM_0012770586
ERFNM_0064942
ESPNNM_0314752-102-122-12
ETFBNM_0019853
EYSNM_00114280012
FAM111BNM_19894733
FANCD2NM_00101811513-15, 17, 21-25, 27-2812-17, 19-2812-14, 17
FAR1NM_03222812
FCGR1ANM_0005661-6
FCGR2ANM_0011362193-4, 6-74-74-7
FCGR3ANM_0005691-51-51-5
FHL1NM_0011596996
FHL1NM_0011597028
FLGNM_00201633
FLG2NM_0010143423
FLNCNM_00145846-4844-4844-48
FMN2NM_0200665
FOXC1NM_0014531
FOXC2NM_0052511
FSIP2NM_17365113
FTLNM_0001464
FUT2NM_0005112
FUT6NM_0001503
FXNNM_0001445
FXR1NM_005087171717
GATCNM_1768184
GCNANM_0529573-6, 8
GGT1NM_0012888335-165-165-16
GH1NM_0005151, 51-5
GJA1NM_000165222
GKNM_001205019212121
GLDCNM_0001701
GLUD1NM_0052711-22-4, 1313
GLUD2NM_012084111
GNAQNM_0020721, 777
GOSR2NM_0042873-4
GPRASP2NM_00100405155
GPX1NM_0005811-2
GRAPNM_0066131-41-51-5
GRK1NM_0029293-4
GUSBNM_0001811111
GYPANM_0020991-5
GYPBNM_0021001-5
GYPCNM_0021012-32-3
H3-3ANM_0021074
H4C11NM_0219681
HBA1NM_0005581-21-3
HBA2NM_0005171-31-3
HBG1NM_0005591-3