Saturday, May 18, 2019

Emphasizing "Hypothesis Generation" in Genomics

With terms like "precision medicine," I believe there is an expectation that genomics will remove ambiguity in medicine.  I certainly agree that it can help.  However, I think it is also important to realize that "hypothesis generation" and "genomic profiling" are also common terms in genomics, which imply some uncertainty and a need for further investigation.

I think there is more work needed to provide a fair and clear description of both the importance and the limitations of genomic analysis, but I believe concepts like over-fitting and variability are also important.

Here is a plot that I believe can help explain what I mean by over-fitting:



The plot above is from Warden et al. 2013.  Notice that very high accuracy in one dataset (from which a signature was defined) actually resulted in lower accuracy in other datasets (which is what I mean by "over-fitting").  The yellow horizontal line is meant to be like the diagonal line in AUC plots.  50% accuracy is not actually the baseline for all comparisons (for example, if an event or problem is rare, then predicting that something works 0% of the time or 100% of the time may actually be a better baseline).  Nevertheless, I think this picture can be useful.
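To make the over-fitting idea more concrete, here is a minimal, hypothetical Python sketch (not from the paper above): a "signature" defined on pure noise looks perfect when evaluated on the cohort it came from, but falls back to roughly the 50% baseline on an independent cohort.  The simulated cohorts, the 1-nearest-neighbor classifier, and all the sizes below are assumptions purely for illustration.

```python
import random

random.seed(42)

def simulate_cohort(n_samples=60, n_genes=100):
    """Simulate expression profiles with random labels
    (i.e., the genes carry no real predictive signal)."""
    profiles = [[random.gauss(0, 1) for _ in range(n_genes)]
                for _ in range(n_samples)]
    labels = [random.choice([0, 1]) for _ in range(n_samples)]
    return profiles, labels

def nearest_neighbor_accuracy(train, train_labels, test, test_labels):
    """Classify each test sample by its nearest training sample
    (squared Euclidean distance), and return the accuracy."""
    correct = 0
    for profile, label in zip(test, test_labels):
        distances = [
            (sum((a - b) ** 2 for a, b in zip(profile, other)), other_label)
            for other, other_label in zip(train, train_labels)
        ]
        predicted = min(distances)[1]
        if predicted == label:
            correct += 1
    return correct / len(test)

train, train_labels = simulate_cohort()
test, test_labels = simulate_cohort()

# Evaluated on the cohort the "signature" was defined in: each sample's
# nearest neighbor is itself (distance 0), so accuracy is 100%.
train_acc = nearest_neighbor_accuracy(train, train_labels, train, train_labels)

# Evaluated on an independent cohort: the labels are random, so accuracy
# falls back to roughly the 50% baseline.
test_acc = nearest_neighbor_accuracy(train, train_labels, test, test_labels)

print(f"same-cohort accuracy: {train_acc:.0%}")
print(f"new-cohort accuracy:  {test_acc:.0%}")
```

This exaggerates the effect for clarity, but the pattern is the same as in the plot: near-perfect accuracy in the defining dataset, close-to-random accuracy everywhere else.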

When I briefly mentioned "variability," I think that is what is more commonly thought of for personalized / precision medicine (solutions that work in one situation may not work as well in others).  However, I also hope to eventually have an RNA-Seq paper showing that testing multiple open-source programs can help towards reaching an acceptable solution (even if you can't determine the exact methods that should be used for a given project ahead of time).  I think this is a slightly different point, in that it indicates limits to precision / accuracy for certain genomics methods while still showing overall effectiveness in helping answer biological questions (even though you may need to plan to take more time to critically assess your results).  Also, in the interest of avoiding needless alarm, I would often recommend that people/scientists visualize their alignments in IGV (a free genome browser), along with visualizing an independently calculated expression value (such as log2(FPKM+0.1), calculated using R-base functions).  So, if you think a gene is likely to be involved, that can sometimes be a way to help gauge whether the statistical analysis produced either a false positive or a false negative for that gene (and then possibly provide ideas of how to refine the analysis for your specific project).
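For readers unfamiliar with the log2(FPKM+0.1) value mentioned above: the paragraph refers to R-base functions, but the same calculation is easy to sketch in Python.  The gene length, read count, and library size below are hypothetical numbers chosen purely for illustration.

```python
import math

def log2_fpkm(read_count, gene_length_bp, total_mapped_reads, pseudocount=0.1):
    """Compute log2(FPKM + pseudocount) for one gene in one sample.

    FPKM = (reads * 10^9) / (gene length in bp * total mapped reads)
    The small pseudocount (0.1 here) keeps genes with zero reads
    finite on the log2 scale.
    """
    fpkm = (read_count * 1e9) / (gene_length_bp * total_mapped_reads)
    return math.log2(fpkm + pseudocount)

# Hypothetical example: 500 reads assigned to a 2,000 bp gene,
# in a library with 30 million total mapped reads.
value = log2_fpkm(500, 2000, 30_000_000)
print(round(value, 2))
```

Because the pseudocount dominates only near zero, expressed genes are barely affected, while unexpressed genes map to log2(0.1) instead of negative infinity.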

This is also a little different than my earlier blog post about predictive models, in that I am saying that over-fit models may be reported to be more accurate than they really are (whereas the predictive power of the associations described in that post clearly indicates that population-level differences have limited predictive power for individuals).  However, I think that level of predictive power for the SNP association is in some ways comparable to the "COH" gene expression model shown above (where roughly 80% accuracy is actually more robust, and therefore arguably more helpful, than a signature with >90% accuracy in one cohort but essentially random predictions in most other cohorts).

I think that also matches this Wu et al. 2021 commentary, where performance noticeably drops in Table 1 when an AI model is trained at one site and tested at another (with the highest performance coming from the same site as training).  However, this is a little different than what I showed above: in the BD-Func plot, lower performance on the same dataset may correspond to relatively better performance in different datasets, whereas in the AI commentary table the maximal performance is more correlated, with a loss in performance at new sites across all 3 rows.  If there was at least 1 additional row with non-AI models that had more similar performance on the same site and on different sites (but lower performance than the AI model on the same site), then that would be more similar to the BD-Func example.

Also, I think it should be emphasized that precision medicine doesn't necessarily have to involve high-throughput sequencing, and I think using RNA-Seq for discovery and lower-throughput assays in the clinic is often a good idea.  For example, the goal of the paper was a little different, but the feature being predicted in the plot above is Progesterone Receptor immunostaining (I believe protein expression for ER, PR, and HER2 is often checked together for breast cancer patients).  So, just looking at the PGR mRNA might have produced more robust predictions in validation sets than the BD-Func "COH" score (which was a per-sample t-test between up- and down-regulated gene expression).

There are positive outcomes from genomics research, and there are some things that can be known/performed with relatively greater confidence than others (such as well-established single-gene disorders). However, I think having realistic expectations is also important, and that is why I believe there should be emphasis on both "precision medicine" and "hypothesis generation" when discussing genomics.  Or, I actually prefer the term "personalized medicine" over "precision medicine," which I think can capture both of those concepts.

Change Log:

5/11/2019 - this tweet had some influence on re-arranging my draft (prior to public posting), in terms of the expectation that personalized medicine / genetics can explain / improve therapies that originally did not seem very effective.
5/18/2019 - public blog post
5/20/2019 - update link for "personalized medicine," add sentence in 1st paragraph, and remove "medicine" from title (and one sentence in concluding paragraph).
5/22/2019 - I don't remember why I had this in the draft for the genomics post.  While I don't think it fits in with the flow of the main content, I wanted to add this as a side note relevant to general limitations in precision (even when a program is incredibly useful): as mentioned in this tweet, BLAST is of huge benefit to the bioinformatics / genomics community, even without choosing a "typical" 0.05 E-value cutoff (to be more like a p-value).
5/26/2019 - add Mendelian Disease as "success story" for genomics
4/8/2021 - add link to article about issue with retrospective studies and decrease in performance between sites

Saturday, May 4, 2019

precisionFDA and Custom Scripts for Variant Comparisons

After posting this reply to a tweet, I thought it might be a good idea to separate some of the points that I was making about comparing genotypes for the same individual (from this DeepVariant issue thread).

For those who might not know, precisionFDA provides a way to compare and re-analyze your data for free.  You need to create an account, but I could do so with a Gmail address (and an indication that you have data to upload).  I mostly show results for comparing .vcf files (either directly provided from different companies, or created via command line outside of precisionFDA).

I needed to do some minor formatting of the input files, but I provided this script to help others do the same.  I also have another script that I was using to compare .vcf files.

For the blog post, I'll start by describing the .vcf files provided from the different companies.  If readers are interested, I also have some messy notes in this repository (and subfolders), and I have raw data and reports saved on my Personal Genome Project page.

For example, these are the results for the SNPs from my script (comparing recovery of variants from my Genos Exome data within my Veritas WGS data):

39494 / 41450 (95.3%) full SNP recovery
39678 / 41450 (95.7%) partial SNP recovery

My script also compares indels (as you'll see below), but I left that out this time (because Veritas used freebayes, and I didn't convert between the two indel formats).

I defined "full" recovery as having the same genotype (such as "0/1" and "0/1", for a variant called as heterozygous by both variant callers).  I defined "partial" recovery as having the same variant, but with a different zygosity (so, a variant at the same position that was called as "0/1" in one .vcf but "1/1" in the other would be a "partial" recovery but not a "full" recovery).
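My actual comparison script is linked above; the following is only a simplified, hypothetical sketch of the full/partial recovery logic, with genotype calls represented as dictionaries keyed by (chrom, pos, ref, alt).  The example variants are made up for illustration.

```python
def compare_genotypes(vcf_a, vcf_b):
    """Compare two dicts mapping (chrom, pos, ref, alt) -> genotype string.

    Counts how many variants in vcf_a are recovered in vcf_b:
      - "full":    same variant AND same genotype (e.g. "0/1" vs "0/1")
      - "partial": same variant, regardless of zygosity (e.g. "0/1" vs "1/1")
    """
    full = partial = 0
    for variant, genotype in vcf_a.items():
        if variant in vcf_b:
            partial += 1
            if vcf_b[variant] == genotype:
                full += 1
    return full, partial, len(vcf_a)

# Hypothetical genotype calls:
exome = {
    ("chr1", 1000, "A", "G"): "0/1",
    ("chr1", 2000, "C", "T"): "1/1",
    ("chr2", 3000, "G", "A"): "0/1",
}
wgs = {
    ("chr1", 1000, "A", "G"): "0/1",  # full recovery (same genotype)
    ("chr1", 2000, "C", "T"): "0/1",  # partial recovery (zygosity differs)
    # the chr2 variant is missing entirely, so it is not recovered
}

full, partial, total = compare_genotypes(exome, wgs)
print(f"{full} / {total} ({full / total:.1%}) full SNP recovery")
print(f"{partial} / {total} ({partial / total:.1%}) partial SNP recovery")
```

Note that every "full" recovery also counts as a "partial" recovery, which is why the partial counts in the tables below are always greater than or equal to the full counts.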

You can also see that same comparison in precisionFDA here (using the RefSeq CDS regions for the target regions), with a screenshot shown below:



So, I think these two strategies complement each other in terms of giving you slightly different views about your dataset.

If I re-align my reads with BWA-MEM and call variants with GATK (using some non-default parameters, like removing soft-clipped bases; similar to what is shown here for the Exome file, but not the WGS file, and I used GATK version 3.x instead of 4.x) and filter for high-quality reads (within target regions), these are what the results look like (admittedly, using an unfiltered set of GATK calls to test recovery in my WGS data):

Custom Script:

20765 / 21141 (98.2%) full SNP recovery
20872 / 21141 (98.7%) partial SNP recovery
243 / 258 (94.2%) full insertion recovery
249 / 258 (96.5%) partial insertion recovery
208 / 228 (91.2%) full deletion recovery
213 / 228 (93.4%) partial deletion recovery

precisionFDA:



Since I was originally describing DeepVariant, I'll also show those results as another comparison using re-processed data (with variants called from a BWA-MEM re-alignment):

Custom Script:

51417 / 54229 (94.8%) full SNP recovery
53116 / 54229 (97.9%) partial SNP recovery
1964 / 2391 (82.1%) full insertion recovery
2242 / 2391 (93.8%) partial insertion recovery
2058 / 2537 (81.1%) full deletion recovery
2349 / 2537 (92.6%) partial deletion recovery

precisionFDA:



So, one thing that I think is worth pointing out is that you can get better concordance if you re-process the data (although the relative benefits are a little different for the two strategies provided above).

Also, in terms of DeepVariant, I was a little worried about over-fitting, but that was not a huge issue (I think it was more like an unfiltered set of GATK calls, but requiring more computational resources).  Perhaps that doesn't sound so great, but I think it is quite useful to the community to have a variety of freely available programs; for example, if DeepVariant happened to be a little better at finding the mutations for your disease, that could be quite important for your individual sample.  Plus, I got a $300 Google Cloud credit, so it was effectively free for me to use on the cloud.

As a possible point of confusion, I am encouraging people to use precisionFDA to compare (and possibly re-analyze) new data.  However, there was also a precisionFDA competition.  While I should credit DeepVariant for causing me to test out the precisionFDA interface, my opinion is that the ability to make continual comparisons may actually be more important than that competition from a little while ago.  For example, I think different strategies with high values should be comparable (not really one being a lot better than the others, as might be implied by having a "winner"), and it should be noted that that competition focused on regions where "they were confident they could call variants accurately."  Perhaps that explains part of why those metrics are higher than my data (within RefSeq CDS regions)?  Plus, I would encourage you to "explore results" for that competition to see statistics for subsets of variants, where I think the func_cds group may be more comparable to what I performed (or at least gives you an idea of how rankings can shuffle with a subset of variants that I would guess are more likely to be clinically actionable).

Saturday, March 9, 2019

Updated Thoughts on PatientsLikeMe

I have a previous post about PatientsLikeMe, but, importantly, I did not test creating an account until relatively recently.  I have been continually improving my habits in terms of taking more time to critically assess results and question prior assumptions (in addition to realizing, in retrospect, that I may not have had the best title for that previous blog post), so I thought there would be value in providing an updated perspective on this free website.

I have genomics / medical data publicly available to download on my Personal Genome Project page (for hu832966) and I have what I would consider a partial electronic medical record on my PatientsLikeMe page (which I think is an excellent resource for sharing and learning about patient experiences, with the requirement that everybody who participates be completely open; however, you have to sign in with a free account to view my profile).

For those that currently don't have PatientsLikeMe accounts, I thought I should describe a few of my experiences (from the perspective of a patient):

I have taken Citalopram at doses of 20 mg and 40 mg (and 0 mg, during intervals to test the continued benefit of the medication, when my overall stress levels were lowered and/or I learned better cognitive strategies to manage stress).  While it makes quick analysis more difficult, I think being able to see the details of people's experiences can be important.  For example, I thought it was interesting that my body's reaction to the medication seemed to change over time (each time I went back on the medication, I think the side effects were more subtle, even though I think the severity of my initial symptoms also gradually improved over time).  If this is in fact true, that would indicate some resistance / reaction, such as somatic variants or epigenetic modifications, that could not be completely captured by studying germline variants (if you are focusing on using DNA genotyping/sequencing for medication guidance).

I also like that PatientsLikeMe provides scores for both effectiveness and side-effects (and I admittedly created a PatientsLikeMe account because some plots in the "Health Communities" in 23andMe reminded me of what I had seen for PatientsLikeMe, even without previously creating a PatientsLikeMe account).

On the positive side, I have seen multiple neurologists, and I had previously not really found any of the previous migraine medication that I took to be helpful.  However, my most recent neurologist prescribed me indomethacin, and I found that to be very helpful.  I wrote a positive evaluation for that migraine treatment, and I was surprised to see that this was a relatively rare treatment for migraines.  So, if people found commonly prescribed treatments to not be helpful, I think this might be helpful in brainstorming alternatives.

I also reported 3 negative evaluations for drugs where I experienced moderate-to-severe side effects.  I noticed that severe side effects were self-reported for these drugs among 9-14% of members in the Community Reports (9% was comparable to other drugs that I checked, but the drug for which I had the most severe side effects in 2018 had the highest "severe" percentage of 14%, and qualitatively the most frequent reports that seemed similar to my own experience).  That said, the most commonly prescribed migraine medication (which I never tried) had a reported severe side effect rate of ~20% (so, it seems to me that a self-reported "severe" side effect rate of 5-10% is normal, but 15% or 20% with hundreds or thousands of patients may be kind of high).  However, I want to be very careful about being too negative about something that is not my area of expertise (even though the idea of something being helpful for some people and harmful for others seems relevant for genomics research).

Going back to the topic of my anti-depressant (for anxiety or depression, depending upon the time-frame of my treatment that you are talking about), the current maximum recommended dosage of Citalopram is 40 mg (with 60 mg now being considered unsafe), and that would match my own expectation (although for slightly different reasons - I had to drink coffee instead of tea due to extra drowsiness at 40 mg, and I am currently on 20 mg instead of 40 mg).  I can also see a 2016 indication from the FDA that 20 mg is the maximum recommended dose for individuals greater than 60 years of age (so, the maximum recommended dose is currently lower for older individuals).  You can also see more information about this drug in the 1998 drug approval package from the FDA.  To be clear, I am very grateful for the availability of Citalopram, which has made a huge difference in my life, but I think this is something that may be worth discussing more (and I would probably also benefit from understanding it better).

There has even been a Washington Post article describing a partnership between PatientsLikeMe and the FDA to help with drug reporting (I saw this in a recent e-mail from them, but the article is actually from 2015; still, it is good to know other people probably have at least somewhat similar thoughts).  While they didn't mention PatientsLikeMe, I think this was also related to the topic of a more recent announcement regarding patients reporting "real-world evidence."  You can also report adverse events to the FDA through MedWatch.

Update Log:
3/9/2019: original blog post
3/26/2019: changed link in 1st paragraph (and added another link in that sentence).
6/28/2019: add MedWatch link
 
Creative Commons License
Charles Warden's Science Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.