Sunday, August 4, 2019

Predicting HLA Types for Array and High-Throughput Sequencing Data

The page I previously linked for my HLA assignments from varying technologies has its most important table in the middle of the page.  So, I am mostly reproducing that table here to make the information easier to view.


| Gene | SNP2HLA | HIBAG | bwakit | HLAminer |
|---|---|---|---|---|
| HLA-A | A*01, A*02 (23andMe); A*01, A*02 (Genes for Good); A*01, A*02 (AncestryDNA) | A*01, A*02 (23andMe); A*01, A*02 (AncestryDNA) | A*01, A*02 (Genos Exome BWA-MEM) | A*01, A*02 (Genos Exome BWA-MEM); A*01, A*68 (Genos Exome BWA) |
| HLA-B | B*08, B*40 (23andMe); B*08, B*40 (Genes for Good); B*08, B*40 (AncestryDNA) | B*08, B*40 (23andMe); B*08, B*40 (AncestryDNA) | B*08, B*40 (Genos Exome BWA-MEM) | B*08, B*40 (Genos Exome BWA-MEM); B*08, B*41 (Genos Exome BWA) |
| HLA-C | C*03, C*07 (23andMe); C*03, C*07 (Genes for Good); C*03, C*07 (AncestryDNA) | C*03, C*07 (23andMe); C*03, C*07 (AncestryDNA) | C*03, C*07 (Genos Exome BWA-MEM) | C*03, C*07 (Genos Exome BWA-MEM); C*03, C*07 (Genos Exome BWA) |
| HLA-DRB1 | DRB1*01, DRB1*03 (23andMe); DRB1*01, DRB1*03 (Genes for Good); DRB1*01, DRB1*03 (AncestryDNA) | DRB1*03, DRB1*11 (23andMe); DRB1*03, DRB1*15 (AncestryDNA) | DRB1*04, DRB1*04 (Genos Exome BWA-MEM) | DRB1*01, DRB1*15 (Genos Exome BWA-MEM); DRB1*01, DRB1*15 (Genos Exome BWA) |
| HLA-DQA1 | DQA1*05, DQA1*05 (23andMe); DQA1*01, DQA1*05 (Genes for Good); DQA1*01, DQA1*05 (AncestryDNA) | DQA1*05, DQA1*05 (23andMe); DQA1*01, DQA1*05 (AncestryDNA) | DQA1*03, DQA1*03 (Genos Exome BWA-MEM) | DQA1*02, DQA1*03 (Genos Exome BWA-MEM); DQA1*02, DQA1*03 (Genos Exome BWA) |
| HLA-DQB1 | DQB1*02, DQB1*05 (23andMe); DQB1*02, DQB1*02 (Genes for Good); DQB1*02, DQB1*05 (AncestryDNA) | DQB1*02, DQB1*03 (23andMe); DQB1*03, DQB1*06 (AncestryDNA) | DQB1*03, DQB1*03 (Genos Exome BWA-MEM) | DQB1*02, DQB1*03 (Genos Exome BWA-MEM); DQB1*02, DQB1*03 (Genos Exome BWA) |

In other words, my HLA-A / HLA-B / HLA-C types could be identified more robustly than my HLA-D genotypes (whose true values I don't know, since I haven't gotten a regular blood test).  However, my understanding is that those types have a greater priority in defining organ transplant matches (although I'm currently encountering some difficulty finding the reference for that).
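One way to put a number on "more robustly" is to count how many distinct genotype calls each gene received across the 8 tool / dataset combinations in the table above. The following is only a minimal sketch: the calls are transcribed from the table, while the names `SOURCES`, `CALLS`, and `summarize` are made up for illustration. Agreement between methods is not the same thing as accuracy, and the 8 entries are not independent (several share a tool or a dataset).

```python
from collections import Counter

# Column order for every list in CALLS (tool / dataset), as in the table above.
SOURCES = [
    "SNP2HLA / 23andMe", "SNP2HLA / Genes for Good", "SNP2HLA / AncestryDNA",
    "HIBAG / 23andMe", "HIBAG / AncestryDNA",
    "bwakit / Genos Exome BWA-MEM",
    "HLAminer / Genos Exome BWA-MEM", "HLAminer / Genos Exome BWA",
]

# Two-digit genotype calls transcribed from the table, written "allele1/allele2".
CALLS = {
    "HLA-A":    ["01/02", "01/02", "01/02", "01/02", "01/02", "01/02", "01/02", "01/68"],
    "HLA-B":    ["08/40", "08/40", "08/40", "08/40", "08/40", "08/40", "08/40", "08/41"],
    "HLA-C":    ["03/07", "03/07", "03/07", "03/07", "03/07", "03/07", "03/07", "03/07"],
    "HLA-DRB1": ["01/03", "01/03", "01/03", "03/11", "03/15", "04/04", "01/15", "01/15"],
    "HLA-DQA1": ["05/05", "01/05", "01/05", "05/05", "01/05", "03/03", "02/03", "02/03"],
    "HLA-DQB1": ["02/05", "02/02", "02/05", "02/03", "03/06", "03/03", "02/03", "02/03"],
}


def summarize(calls):
    """Per gene: (number of distinct genotypes, most common genotype, its fraction)."""
    summary = {}
    for gene, genotypes in calls.items():
        counts = Counter(genotypes)
        top_genotype, top_n = counts.most_common(1)[0]
        summary[gene] = (len(counts), top_genotype, top_n / len(genotypes))
    return summary


if __name__ == "__main__":
    for gene, (n_distinct, top, fraction) in summarize(CALLS).items():
        print(f"{gene}: {n_distinct} distinct call(s); most common {top} ({fraction:.0%})")
```

With these inputs, HLA-A and HLA-B each have one dissenting call (both from HLAminer on the BWA alignment), HLA-C is unanimous, and no single genotype reaches a majority for any of the three HLA-D genes.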

The GitHub link also goes a little deeper into how 23andMe is using 2 SNPs to represent 2 haplotypes (across genes) for celiac disease (which I found surprising, but that is done for other diagnostics as well).  I am mostly leaving that out of this section, but I did think it was interesting that HLA was used in 23andMe's "Meet Your Genes" when the SNPs are actually intronic / intergenic (with respect to the RefSeq annotations).

My 23andMe report indicated that I was DQ8-positive but DQ2-negative for my celiac disease risk.  To mark the 2 genes used to define my DQ8-positive status, I colored the matching assignments above in magenta (HLA-DQA1*03 and HLA-DQB1*0302).

Here is a screenshot for the variants tested by 23andMe (where I have the "C" variant for rs7454108, for the marker described as "HLA-DQ8"):



Again, as described here, a positive HLA-DQ8 status is defined by having HLA-DQA1*03 and HLA-DQB1*0302.
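The rule above (HLA-DQA1*03 together with HLA-DQB1*0302) can be written as a small check against a set of allele calls. This is a hedged sketch rather than a validated classifier: the function names and the three status labels are made up for illustration, and it assumes that names written without colons (like DQB1*0302) use 2 digits per field. The main point is that a two-digit call such as DQB1*03 cannot confirm or exclude DQB1*03:02, so it is reported as "unresolved".

```python
def parse_allele(name):
    """Split an allele name into (gene, fields), e.g. 'HLA-DQB1*03:02:01' -> ('DQB1', ['03', '02', '01'])."""
    name = name.strip()
    if name.startswith("HLA-"):
        name = name[4:]
    gene, _, fields = name.partition("*")
    if ":" in fields:
        parts = fields.split(":")
    else:
        # Assumption: colon-free names (e.g. 'DQB1*0302') use 2 digits per field.
        parts = [fields[i:i + 2] for i in range(0, len(fields), 2)]
    return gene, parts


def dq8_status(alleles):
    """Return 'supported', 'unresolved', or 'not supported' for the DQ8 definition
    used in this post (DQA1*03 plus DQB1*03:02)."""
    parsed = [parse_allele(a) for a in alleles]
    has_dqa1_03 = any(gene == "DQA1" and parts[:1] == ["03"] for gene, parts in parsed)
    dqb1_03 = [parts for gene, parts in parsed if gene == "DQB1" and parts[:1] == ["03"]]
    if not has_dqa1_03 or not dqb1_03:
        return "not supported"
    if any(parts[:2] == ["03", "02"] for parts in dqb1_03):
        return "supported"
    if any(len(parts) == 1 for parts in dqb1_03):
        # Only the first field is known, so *03:02 cannot be told apart from other *03 alleles.
        return "unresolved"
    return "not supported"


if __name__ == "__main__":
    # bwakit row of the table (two-digit calls only)
    print(dq8_status(["DQA1*03", "DQA1*03", "DQB1*03", "DQB1*03"]))
    # SNP2HLA / 23andMe row of the table
    print(dq8_status(["DQA1*05", "DQA1*05", "DQB1*02", "DQB1*05"]))
```

Applied to the table, the two-digit bwakit and HLAminer calls come out as "unresolved", while the SNP2HLA calls come out as "not supported" because they contain no DQA1*03 allele.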

More recently, I collected Illumina Whole Genome Sequencing data where unaligned reads were provided (from Sequencing.com), along with some amount of PacBio HiFi data from Dante Labs.  There are some parts of the results that are not especially clear to me and I am interested to learn about additional options for analysis.  However, I believe those results are consistent with me having at least one DQB1*03 allele.

In terms of what appears to be consistent between the PacBio data and the Illumina Whole Genome Sequencing data, the T1K results from the Sequencing.com Illumina reads indicate that the related HLA-DQB1 allele should be HLA-DQB1*03:02:01.

I ordered additional GlutenID testing from Targeted Genomics, with an uploaded subfolder on GitHub.  However, I believe the potential problem with using this for validation is that the DQ8-DQ8 result is based upon the same 1 SNP as my 23andMe result (rs7454108).  So, I will continue to look into additional validation options.
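Because both results rest on rs7454108, the genotype at that SNP can be looked up directly in an array raw-data export. The sketch below assumes the tab-separated 23andMe raw-data layout (comment lines starting with "#", then rsid, chromosome, position, genotype); the function names are hypothetical, and the example lines use a placeholder position and a made-up genotype rather than my actual data.

```python
def find_genotype(lines, rsid="rs7454108"):
    """Return the genotype string for `rsid`, or None if the SNP is not present."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[0] == rsid:
            return fields[3]
    return None


def carries_allele(genotype, allele="C"):
    """True if the genotype string contains the given allele at least once."""
    return genotype is not None and allele in genotype


if __name__ == "__main__":
    # Made-up example lines; the position (0) and genotype (CT) are placeholders.
    example = [
        "# rsid\tchromosome\tposition\tgenotype\n",
        "rs7454108\t6\t0\tCT\n",
    ]
    genotype = find_genotype(example)
    print(genotype, carries_allele(genotype))
```

For a real export, the same function can be called on an open file handle, since it only iterates over lines. A lookup like this only re-reads the same single SNP, so it does not count as independent validation of the HLA types.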

I am interested to learn more about the broader trends in whether certain HLA types are harder to assign and/or impute than other HLA types.  I have some notes mentioned in this Disqus comment.  Comments containing relevant feedback are also welcome on this blog post.

Update Log:

8/4/2019 - public post date
8/6/2019 - minor changes
8/15/2019 - add coloring for HLA-DQ8
2/11/2024 - add information / links to Whole Genome Sequencing data (Illumina from Sequencing.com and PacBio HiFi from Dante Labs)
2/18/2024 - add screenshot from 23andMe + dbSNP link; add additional HLA-DQ8 sentence; add small paragraph for Illumina WGS T1K result; add link to Disqus comment; fix minor typos + add tags
2/27/2024 - minor change in column header
3/19/2024 - minor change in column header; add link to GlutenID results

Creative Commons License
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.