Charles Warden's Science Blog

Saturday, September 7, 2019

Updated Thoughts on The Language of Life

I have an earlier review on "The Language of Life" by Francis Collins (who was previously the lead for the Human Genome Project, and is the current director of the NIH). However, given that I am presenting this book at the Monrovia Library book club in October, I thought it would be good to have a newer post with some additional points (and I will also create a post with discussion topics for the book club, in October).

First, I should critique my own previous post. For example, I currently feel more confident about preventing rare diseases than guiding drug treatments. However, I think there was a sense of the limits to prediction that I was trying to convey before (in terms of "designer babies" where a large number of traits could be predicted and selected), and I still don't support that (or necessarily believe that can / should be accomplished). I think this is also emphasized in the HFEA guidelines (described on page 55, in my edition) as well as some more updated scientist opinions (such as not using Polygenic Risk Scores for embryo screening).

I still really like that Francis Collins provided a balanced view of genomics (with both potential and limits), and I was glad to see that I could also notice that ~10 years ago.

My earlier post also reminded me of the statistic that "[adverse] drug reactions are the fifth leading cause of death in the United States" (page 233, in my edition), although I admittedly also forgot that in between the time that I first marked that page with a book dart and when I started writing the draft for this post.

Going back to the genomics and drug treatments, Francis Collins also mentions "the biggest reason for potentially deadly drug reactions is simple human error [but this isn't the only reason]" (page 233, in my edition). So, even though I implied that I had less confidence in pharmacogenomics (or at least I think we have to be more careful about the assessments than I used to), I really do think biomedical informatics can help patient care. In other words, making sure relatively simple actions are consistently understood and carried out appropriately is not trivial (which I also touch on when describing my cystic fibrosis carrier status, even though I believe that is more complicated than some people may expect), and that is something important that we can improve (without even using exceptionally complicated models / techniques). Nevertheless, to be clear, this area was fairly represented in the book, with an entire section of the 9th chapter called "Obstacles to the Pharmacogenomics Revolution" (pages 247-249, in my edition).

Now, in terms of my updated thoughts:

1) In the introduction, "Dr. James" (who was really Francis Collins) describes interacting with somebody with a BRCA1 mutation in their mother's DNA, saying "[the patient] faced a 50 percent risk of having inherited that misspelling, in which case her lifetime risk of breast cancer would be approximately 80 percent, and that of ovarian cancer about 50 percent" (page XII, in my edition). However, I think I have more recently gained better appreciation for the value in the range of risk estimates (at the gene or variant level). For example, there is a Stanford BRCA Decision Tool that provides the variance of risk with a few options (although I'm not sure about the intervals of screening, I don't know what is the relative effectiveness of hormonal therapies, and these estimates are at the gene level when I would expect some specific variants are higher risk than others). Likewise, there was a recommendation for BRCA screening with a "B" grade in individuals with family histories, but a recommendation against BRCA screening with a "D" grade in individuals without such a history (which I think is shown most clearly on the US Preventive Services website). In other words, I believe the significance of the result (and the preventive option chosen) varies depending upon whether the individual has a family history of early-onset breast cancer (ideally, I believe, with a variant that specifically validates between cases and controls in their own family).

In the interests of space, I have saved a collection of informal notes in another blog post. This includes some things like strategies to define risk from family history from the CDC.

2) On page 194 (in my edition), Francis Collins describes "a company called Psynomics is marketing a DNA test for susceptibility to bipolar disorder, arguing that this information could be useful in establishing the diagnosis in an uncertain case. The test being offered, however, is based upon a variation in a gene called GRK3, and this has not been validated in a large-scale study. This result could turn out to be utterly useless. Even worse, this kind of unvalidated test, utilized by individuals or their physicians to make a serious diagnosis in an uncertain situation, might do more harm than good." As with pretty much all of the posts, I will probably update my review of "Blueprint", but I agree with concerns about the over-estimation in the accuracy of tests that can possibly negatively impact the rest of someone's life (as well as my opinion that the reaction to genetic results may be particularly important for mental health).

Similarly, on page 204, a company offering testing of V1aR variants for $99 to test for increased susceptibility to infidelity is also presented with an appropriately critical view that "the actual influence on the behavior of an individual male is quite modest, and should certainly not be used in mate selection or as an excuse for cheating on one's partner."

3) Perhaps it is a bit of a tangent; however, in terms of the rare diseases, the first episode of Diagnosis on Netflix involves a patient story that was resolved with Whole Genome Sequencing (WGS), which determined that her ailments for the past 10 years were due to CPT2 (meaning her symptoms could improve by increasing sugar and decreasing fatty acids in her diet). I was surprised that she went to Italy for the diagnosis (where she was treated for free after she arrived, but I would have expected treatment costs to usually be above a few thousand dollars to justify the trip; I could get Veritas WGS data for $1000, but I did need to re-analyze it).

I was also surprised that her US doctors were trying to sue her for hundreds of dollars of medical bills (when she was already in debt, and the treatments weren't helping her in the long-run since they didn't reveal the underlying problem). However, that unfortunately seems like it may not be an isolated incident: for example, I recently heard about this happening to a large number of individuals in the UVA health system.

However, getting back to this book, on page 92 (in my edition), Francis Collins warns that some nutrigenomics companies are running "consumer scams," while there are legitimate rare diseases whose symptoms can be improved with diet (such as PKU). I was also skeptical about some of my nutrigenomics results, but it sounds like the Netflix show also provides a genuine example where genetics can inform diet (and vastly improve your quality of life).

Also, similar to my 1st post, here are some assorted minor points:

a) Francis Collins (as Dr. James) indicated that someone from Navigenics implied that "most of the remaining genetic risk factors for common disease will have been discovered in the next two or three years; as a scientist working in this field, that seems unlikely to me" (page XXII, in my edition, emphasis added). I also don't think Navigenics exists anymore - at least the Wikipedia company link does not go to a genetics company website (even though they also mention it was acquired by Thermo Fisher in 2014).

b) As noted in the first post, Francis Collins has blue eyes when 23andMe predicted them to be brown (page XXVIII, in my edition).

c) I am a Bioinformatics Specialist (doing genomics research). However, I don't think that term was in widespread use when the book was written. For example, I believe his term "DNA cryptography" (page 13, in my edition) is meant to be synonymous with "Bioinformatics."

d) Reading this book also influenced another blog post, in terms of the discussion of the ACLU Supreme Court case invalidating Myriad's patents on the BRCA1/2 genes and contrasting his own actions for the CFTR gene for cystic fibrosis.

e) On page 187 (in my edition), Francis Collins describes "[one] remarkable gene in the brain is estimated to be able to make 38,000 different proteins." However, I kind of wish there was a reference to the citation in the primary literature. For example, I thought most cells tended to have one predominant version of a gene transcript, and I am worried about false positives (or at least rare alternative splicing events) when describing very large numbers of isoforms for genes.

f) On page 317 (in my edition), 23andMe is listed as testing for the Δ508 cystic fibrosis variant. While I got my first 23andMe test in 2011 (a little after this book was published), I am a carrier for a different cystic fibrosis variant. So, 23andMe currently covers more than just that one cystic fibrosis variant.

Finally, I specify "in my edition" whenever I reference something from the book. However, I think the relatively newly purchased paperback was still the 1st edition. So, I'm not sure how necessary this is. However, in terms of trying to minimize errors in peer-reviewed publications (and making sure people acknowledge and correct errors), I think the concept that books have editions may be kind of important.

Update (10/19/2019): After I finished re-reading the book (again - to prepare for leading the book club discussion), I thought I should write a little more to make sure that I am not down-playing the pharmacogenomics part too much. While I do think the introduction is a fair match to my interests / opinions, I do want to make clear that I am sure there are important genomic applications with decent predictive power for guiding drug dosage, drug effectiveness, and/or serious adverse side effects.

So, similar to the separate blog posts containing notes on BRCA1/2 pathogenic risk, high-to-moderate inherited cancer risk frequencies for pathogenic variants, and APOE variant frequencies and Alzheimer's Disease risk, I will try to add a few links about the influence of VKORC1 on Warfarin / Coumadin dosage. However, I have spent considerably less time looking into that, so this will just be bullet points below (instead of a separate blog post):

Rieder et al 2005 - paper cited in the figure caption in the book (I think "Haplotype" concept is made a little more clear in the original figure, as well as explaining the use of "A" and "B" in the book)

There is a more recent 2009 paper, but the genotypes are described in a different way (and what I wanted to do was try to judge if there should have been larger error bars in the earlier paper).
For example, Figure 2 in the 2005 paper matches the sort of error bars that I was expecting with VKORC1 gene expression (which is noticeably larger)

Limdi et al. 2010 - paper showing variant predictive power across racial groups; mentions both SNP and haplotype, but I mostly notice that the variant is at 91.03% in the Asian population, 10.05% in the African population, and 37.81% in the Caucasian population (so, variability is not equal for individuals with different ancestry)
23andMe blog posts with "Warfarin" tag - there is a blog post mentioning providing a 23andMe report for the variants listed on the label, but perhaps that doesn't have approval to be added back?
FDA Coumadin labels - describes GG, AG, AA genotypes for 1639G>A variant (which actually would be a SNP, even though SNPs sometimes do represent larger haplotypes)

ClinVar link above actually only has 2 stars, but "Warfarin Response" in MedGen has more information, including the dbSNP ID (rs9923231), and that variant also has clinical annotations in PharmGKB
The variant is intergenic, but you can see your genotype (I am CC, which should be GG in format above), even without a formal report --> from PharmGKB, I can tell I have the variant associated with requiring a higher dose

On the other hand, I really do have some interest in understanding (and critically assessing) the use of genomics for depression treatment. I have tried to collect some notes on that within my review of "Blueprint", as well as expressing concerns I have about what people might perceive about the predictive power of genomic data and anxiety / depression (based upon my own personal experience as well as some general genomics research experience).

Change Log:

9/7/2019 - public post date
9/8/2019 - revise post from sister's feedback; minor changes
9/9/2019 - add UVA example + NCCN guidelines + additional Twitter / blog link
9/10/2019 - minor changes
9/13/2019 - add links from the CDC
9/14/2019 - move longer set of BRCA1/2 notes to separate post
10/19/2019 - add update with pharmacogenomic notes

Monrovia Library Book Club Discussion Topics for "The Language of Life"

You can see my thoughts in two earlier blog posts (my first ever blog post in 2010, as well as a more recent blog post in 2019).

However, the book club (at 6:30 PM on Tuesday October 22nd) is really about other people's thoughts (although I hope this non-fiction book helped with understanding about genetics/genomics).

So, here are some discussion topics, which I think could be of interest (even if you didn't already have a passion for genomics):

1) In general, what did you find to be the most interesting part of the book?

2) Did you think this was a good introduction to genetics / genomics? If not, I also recommend reading "The Cartoon Guide to Genetics" (which was required for my AP Bio class in High School, along with a more formal textbook). However, please be aware that the cartoons within the book are in black-and-white. Also, as with just about anything else, the book isn't absolutely perfect: for example, there is a reference to 200,000 genes in the human genome on page 80 (which was believed at one point, but I would now say we feel much more comfortable with 20,000 genes that can be relatively consistently transcribed).

3) There is a section of the 7th chapter about the influence of genetics on Criminality. For example, there are a few paragraphs about the X-linked MAOA gene. While I mostly have to trust that the study was fairly presented (and that it was reproducible in subsequent studies), a study showed that decreased expression of MAOA was associated with increased risk of violent behavior and criminal convictions, but only if the individual was abused as a child (page 202). So, I think this is a good example of a gene-environment interaction, but I don't know how strong / predictive the risk association was.

Likewise, to put things in perspective, Francis Collins also pointed out "approximately half of the US population carries a genetic risk factor that places people at a sixteenfold higher likelihood of imprisonment than the other half. That happens to be the Y chromosome" (also on page 202).

Would your opinions of someone change if you knew they had a negative genetic predisposition (and you thoroughly understood exactly what has been observed and how much of an effect that has)? For example, what do you think about giving somebody a lesser or more severe sentence because of their genetics?

4) Also in the introduction, Francis Collins discusses Alzheimer's disease risk, and questions the value of returning results when there is nothing that can be done medically (page xx, as well as illustrated on page 222).

I (Charles Warden) carry one copy of the APOE E4 risk variant (and I know which parent also has that risk variant).

4a) What do you think about a risk assessment for a disease that cannot be prevented or treated?

4b) Does that opinion change if I emphasize the need for you (and your genetic counselor, physician, etc.) to have access to the data to calculate the risk assessments, as well as making sure that you have access to your raw data for re-analysis / evaluation?

If interested, you can see my longer list of informal notes in another blog post. However, the main message I think I should explain is that it takes some time to get confidence in a risk assessment (and I think there should ideally be some sort of access to the primary data used to come to those conclusions).

While I won't focus on what (from what I understood) were the less representative results here, my impression is that the more robust conclusion was similar to what was reported in my 23andMe report, Genin et al. 2011, and Myers et al. 1996 (which I am using to report the following statistics):

~55% of E4/E4 individuals developed Alzheimer's Disease (with an age of onset ~80 years)
~27% of E4/E3 individuals developed Alzheimer's Disease (with an age of onset ~85 years)
~9% of E3/E3 individuals developed Alzheimer's Disease (with an age of onset ~85 years)

Likewise, my 23andMe Report says "Approximately 40-65% of Alzheimer's patients have one or two copies of the APOE ε4 variant. However, many people with the APOE ε4 variant will not develop late-onset Alzheimer's disease" (citing Alzheimer's Association 2016).

5) Do you have any direct experiences with genomics results (from 23andMe, AncestryDNA, uBiome, Genes for Good, American Gut, etc)? For example, I have recorded some of my relatively recent experiences in this set of blog posts.

Having 5 questions to guide the discussion may already fill an hour (with a group of 20-30 people). However, I hope the blog post can help with discussions before the book club (to help me better prepare) as well as after the book club (if anybody doesn't have a chance to express their opinion).

Change Log:

9/7/2019 - public post date
9/8/2019 - revise post from sister's feedback; minor changes
9/9/2019 - trim content
9/10/2019 - fix typo
9/11/2019 - add extra APOE E3/E4 citations (from 23andMe, ClinVar, and accepted middle-author paper; although I think the last of which was also in the pre-print)
9/13/2019 - add CDC links
9/14/2019 - separate blog post for detailed APOE notes
10/1/2019 - minor changes

Book Review for "Blueprint"

Update (10/1/2019): If you would like to read a shorter review that shares my main overall opinions, please check out this.

Otherwise, there are some additional points being made in this blog post, which I think still has some value.

First, I think this book was very helpful in terms of critically assessing plots for Polygenic Risk Scores (PRS):

I had a bit of difficulty finding matching public images that I can post here, and I don't think there is a public interface (kind of like data.color.com; or some of the PGC data and/or the UK Biobank, even though I had an issue with this link more recently) to query the TEDS data (or CAPS data, WTCCC data, etc.). However, I would be very happy if somebody could show how a non-scientist can reproduce the results from the book.

Nevertheless, Figure 2 in Chapter 2 (page 25 in the paperback edition) shows scatter plots for weight where monozygotic (MZ) twins have a correlation of 0.84 and dizygotic (DZ) twins have a correlation of 0.55. I think having concrete examples to show the spread of these correlations (which directly relate to heritability values in this book, defined as 2 times the difference between MZ and DZ correlations on page 27 in the paperback edition) is very important.
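As a small worked example of that definition (often called Falconer's formula), the sketch below plugs in the two weight correlations quoted above. The function name is just an illustrative choice, and this is only the simple twin-based estimate described in the book, not a full model of heritability.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Twin-based heritability estimate: 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Twin correlations for weight quoted from Figure 2 in Chapter 2 of the book.
r_mz_weight = 0.84
r_dz_weight = 0.55

h2_weight = falconer_heritability(r_mz_weight, r_dz_weight)
print(f"Estimated heritability for weight: {h2_weight:.2f}")
```

With these two correlations, the estimate works out to 2 * 0.29 = 0.58, so small changes in either correlation are doubled in the heritability estimate.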

Scatter plots can also be important for interpretation. In an earlier version of this blog post, I had a plot of ranks from here, but I noticed more recently that the linked image was not appearing in the post. So, I kept searching, and I found some data from this post. Using code that I uploaded here, I created the following plots for BMI correlations for MZ versus DZ twins:

You can clearly see that the correlation is higher for twins that are 100% identical (MZ) versus 50% identical (DZ). However, importantly, the ability to predict a BMI for an individual twin also has limitations. The first row shows all points, and the second row uses the same data but provides a density distribution to see where a lot of points are close together.

You might also notice that the correlation coefficients are both lower than mentioned in Blueprint: as much as possible, I hope independent validation in multiple large cohorts is helpful, and transparency in data and sample selection/filtering is also important. However, at the current time, I am not sure what explains the difference in correlation coefficient, even if both datasets come to the same conclusion that the genetic impact is higher for monozygotic twins than dizygotic twins.

In the context of a Polygenic Risk Score, creating a scatterplot between the true value and the predicted value may also be helpful in interpreting and critically assessing the results.

In terms of an introduction, I think Chapter 12 (The DNA fortune teller) is quite good in terms of explaining how PRS are calculated and presented (although I kind of wish it was called something different, like "Introduction to Polygenic Risk Scores," since the main thing that was clear to me was the limitations and that seems strange for something called a "fortune teller"). For example, that chapter says "[the] most predictive polygenic risk score so far is height, which explains 17 per cent of the variance in adult height" (emphasis added, page 139 in paperback edition) as well as showing a scatterplot for actual height versus PRS for height (Figure 5, page 142, paperback edition), and Robert Plomin specifically has an actual height at the 99th percentile but a PRS for height in the 90th percentile (as a sort of "best case scenario" for a PRS).
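To give a rough sense of what 17 per cent of variance looks like, the sketch below simulates a standardized score and a standardized trait with that level of variance explained, which corresponds to a correlation of about 0.41 (the square root of 0.17). Everything here is simulated under a simple linear model with normally distributed noise; the sample size, seed, and variable names are assumptions for illustration, and none of this is data from the book.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)          # arbitrary seed, only for repeatability
target_r2 = 0.17        # variance explained quoted for the height PRS
n = 20000               # hypothetical sample size

# Simulated standardized PRS, and a trait that is part PRS and part noise.
prs = [random.gauss(0, 1) for _ in range(n)]
trait = [
    math.sqrt(target_r2) * p + math.sqrt(1 - target_r2) * random.gauss(0, 1)
    for p in prs
]

r = pearson_r(prs, trait)
r_squared = r ** 2
print(f"Implied correlation: {math.sqrt(target_r2):.2f}")
print(f"Simulated correlation: {r:.2f}, variance explained: {r_squared:.2f}")
```

Plotting `prs` against `trait` from this kind of simulation gives a cloud of points with a visible trend but a lot of spread for any individual.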

Similarly, Plomin's PRS for BMI was at the 94th percentile, while his actual BMI was at the 70th percentile (page 146, paperback edition). He explains this in terms of being at the 99th percentile for height and possibly having to take extra effort to keep off weight (which I think sounds like a plausible combination of factors, in addition to general limits to predictive power). If most people consider this as a motivating factor (similar to the author), perhaps that is good.

I also really like that Plomin gives examples of PRS percentiles for himself (Figure 11 on page 160, paperback edition): 22% for bipolar disorder, 35% for major depressive disorder, 39% for Alzheimer's disease, 85% for schizophrenia, and 94% for educational attainment. That said, while I think some of his "self-understanding" discussions about his schizophrenia PRS may be acceptable in a research setting (page 151 and 177 in the paperback edition), it sounds like 85% is not a high enough score to be relevant in a clinical setting (if something like the PRS could increase the chance of somebody being institutionalized with less direct evidence). This kind of makes sense for a disease with less than 1% prevalence, although I think that does bring into question the value of using common variants (instead of rare variants) in the PRS calculation.

Another useful plot is the density plot for extreme values (with the range and overlap of PRS values for each of those populations). Again, I am trying to show you something in the book without copying it, but I can make some representative examples in R:

For example, I would say the simulated example on the left looks good, but the utility of the example on the right could be questionable (although I think this is encountered more often, especially if you took a randomly selected trait and your own generated PRS).
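As a separate Python sketch of the same idea (not the original R code for the plots), the overlap between two PRS distributions can be summarized as a single number. The two scenarios below are hypothetical: each group is assumed to be normally distributed with a standard deviation of 1, and the mean differences (2.0 and 0.3) are arbitrary values chosen to contrast a well-separated pair with a poorly separated pair.

```python
from statistics import NormalDist

# Hypothetical PRS distributions (in standard deviation units).
low_group = NormalDist(mu=0.0, sigma=1.0)

# Scenario 1: groups separated by 2 standard deviations (assumed value).
high_group_separated = NormalDist(mu=2.0, sigma=1.0)

# Scenario 2: groups separated by 0.3 standard deviations (assumed value).
high_group_close = NormalDist(mu=0.3, sigma=1.0)

# NormalDist.overlap returns the shared area under the two density curves,
# from 0.0 (no overlap) to 1.0 (identical distributions).
overlap_good = low_group.overlap(high_group_separated)
overlap_weak = low_group.overlap(high_group_close)

print(f"Overlap with well-separated groups: {overlap_good:.2f}")
print(f"Overlap with poorly separated groups: {overlap_weak:.2f}")
```

The closer the overlap is to 1, the less a score from one individual says about which group that individual belongs to.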

On the other hand, what I thought could be misleading was the decile plot shown in Figure 6 on page 144 in the paperback edition. Yet again, I'll use an example on-line (from Figure 3 of Calafato et al. 2018, instead of in the book). However, one of my top Google searches happened to be for psychiatric traits (rather than height as a positive example):

To be fair, this does make the schizophrenia PRS look like it may have some value with a percentile >90% (matching the expectation that not too much should be read into the 85% PRS for the author), and it looks like the bipolar PRS is probably of limited utility. Nevertheless, you might have something that is not very predictive (with a lot of variability in the scatter plot) in a decile plot that looks like the 1st 9 deciles for schizophrenia (in that paper).

While I think the example with height was meant to give some sense of an inflection point above the 80th percentile, I think this does not do a good job of capturing the variability that you would see in a scatter plot (so, I think showing scatter plots and density plots should be required for any PRS). In particular, I believe the crucial point is made by Plomin on pages 143-144 (emphasis added): "The line running through each data point [in the decile plot] is called the standard error...Note that the standard error refers to the average of each group, not the error of estimating an individual's score...It does not mean that the actual height of 95 per cent of individuals in the top decile of polygenic scores will be in this range."
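The distinction in that quote can be sketched numerically. The example below simulates heights for one hypothetical decile group and compares the approximate 95% range for the group average (based on the standard error) with the approximate 95% range for individuals (based on the standard deviation). The group size, mean, and spread are made-up values for illustration only, not numbers from the book.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, only for repeatability

# Hypothetical decile group: 1000 people, mean 175 cm, SD 7 cm (assumed values).
n = 1000
heights_cm = [random.gauss(175, 7) for _ in range(n)]

mean_h = statistics.fmean(heights_cm)
sd_h = statistics.stdev(heights_cm)   # spread of individuals
se_mean = sd_h / math.sqrt(n)         # uncertainty of the group average

# Approximate 95% ranges using the normal multiplier 1.96.
ci_mean = (mean_h - 1.96 * se_mean, mean_h + 1.96 * se_mean)
range_individuals = (mean_h - 1.96 * sd_h, mean_h + 1.96 * sd_h)

print(f"95% range for the group average: {ci_mean[0]:.1f} to {ci_mean[1]:.1f} cm")
print(f"95% range for individuals: {range_individuals[0]:.1f} to {range_individuals[1]:.1f} cm")
```

Because the standard error shrinks with the square root of the group size, a decile plot from a large cohort can show very tight error bars even when individuals within each decile vary widely.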

Second, I admittedly started the book with a bit of a negative impression - the prologue mentions predicting depression (and schizophrenia / school achievement) "from the moment of your birth, it is completely reliable and unbiased - and it costs only £100" (page vii, paperback edition). As somebody who has had to manage anxiety and depression, I know that symptoms are context-dependent and change over time. So, even without taking limitations to the genomics predictions into consideration, I would say that something like "probability of having at least 1 depressive episode" likely has a genetic component, but whether or not you have depression / anxiety at any particular interval (and whether or not that requires medication) will require additional factors. In other words, I think there are a lot of exceptions to the assumption "[psychologists] study hundreds of traits, which is their collective label for differences between us that are consistent across time and across situations" (page 3, paperback edition).

I also don't believe I completely agree with the claim "[for] the first time, genetics offers a causal basis for predicting disorders rather than waiting until symptoms appear and trying to use these symptoms, rather than causes, to diagnose disorders" (page 66 in the paperback edition). For example, I believe I even had a psychiatrist who explicitly said that getting caught up on the names for the diagnosis can sometimes cause problems beyond trying to treat symptoms. However, I do agree that "[whether] you become anxious or you become depressed is caused by environmental factors" and I believe there is some useful insight in terms of clustering "internalizing problems" and "externalizing problems" (both on page 67 in the paperback edition).

Additionally, I think there is an important point being made about continuous traits and PRS values on page 164 in the paperback edition: "A second way in which polygenic risk scores will transform clinical psychiatry is by moving away from diagnoses and towards dimensions. One of the big findings in this book is that the abnormal is normal, meaning that, from a genetic perspective, there are no qualitative disorders, only quantitative dimensions". So, practically speaking, on the trait side, there are certain thresholds for needing to take action (such as not being able to function at work, at least without treatment / adaptations). Likewise, I believe you still need concrete examples of less severe behavior to watch out for, in order to possibly identify problems early and prevent progression.

I certainly hope that there can be ways to better identify problems at an early stage in order to have the sort of prevention described on page x (of the paperback edition), but I think it is also important to be realistic about predictive power. In other words, if the true predictive power is lower than you expect, then providing a diagnosis based upon DNA sequence alone (at least using one strategy of interpretation) might contribute to unnecessary stigma for a patient. Given the prevalence of depression, I think there are a number of things that probably should be done to improve perception of mental illness (so, a false positive would be less of a big deal). However, I would be more concerned if limits in predictive power were not properly understood in situations where a false positive could negatively impact the rest of a person's life (if we assume a person will have a severe mental health problem, without any evidence from their actual behavior). So, it is the part about "This means that we can foretell our futures from birth. For example, in the case of mental illness, we no longer need to wait until people show brain or behavioral signs of the illness and then rely on asking them about their symptoms" (emphasis added, page x of my paperback edition) that I think is either not being communicated precisely (especially in the present tense) or causes me concern.

To be fair, I have general experience with genomics research (and personal experience with mental health problems), but I didn't have any previous psychiatry research experience before reading this book. So, what may very well be true is that the genetic predictors are better than other risk factors. For example, Plomin says "[there] are very few large effect sizes in psychology. One example is that general intelligence accounts for about 25% of variance in educational achievement." (page 31, paperback edition). However, I think it is also important to keep in mind how the predictive power for these traits / illnesses compares to other associations, and there is still a need to fairly judge each situation independently.

In other words, before reading this book, I thought I mostly remembered schizophrenia having the least significant association. For example, in Selzam et al. 2019 (with Robert Plomin as the last author), the Polygenic Risk Scores in Figure 1 had significantly higher beta coefficients for height and BMI (or the other cognitive traits) than schizophrenia (SCZ), which was opaque in the lower-right because its own p-value was greater than 0.01 (and the difference is indicated as not being significant). Likewise, this review concluded "[these] limitations mean that [Polygenic Risk Scores] are not yet clinically useful in psychiatry." The book itself also describes limited success for psychological disorders in a 2007 study, although studies with even greater sample sizes were emphasized after that (such as from the Psychiatric Genomics Consortium).

However, to be fair, the Figure 1 in the PRS paper linked above does show better success in predicting educational attainment (General Certificate of Secondary Education, GCSE, in that paper). While I don't discuss it much in this post, this is a topic of discussion in multiple chapters of the book. As yet another way to compare PRS, the Epilogue of the book (on page 187 of the paperback edition) summarizes: "polygenic scores...can predict 17 per cent of variance in height, 6 percent of variance in weight, 11 per cent of the variance in school achievement, 7 per cent of the variance in intelligence, and 7 per cent of the variance in liability to schizophrenia." However, I was less impressed with Figure 10 density plots on page 158 of the paperback version of the book (showing density distributions for the top / bottom 10% of educational attainment PRS, as a function of GCSE score percentile), so perhaps the lower values than Height or BMI for the beta coefficients in the paper should also be emphasized (and, while I think the paper seems to be a better match to my expectations, I don't believe this is entirely consistent with the relative percent variance explained in the Epilogue of the book).

Outside of the book, I did stumble across a gene that was specifically named because of its association with schizophrenia (DISC1, although that might have been from the NCBI entry for the mouse gene, Disc1). However, it is probably also helpful to have numbers like "if one sibling is diagnosed as schizophrenic, their siblings have a 9 per cent risk of being schizophrenic, much greater than the rate of 1 per cent across the general population" (emphasis added, page 71, paperback edition). In that situation, there is considerably increased risk, but predictive power is still low. Likewise, I have concerns that the reader may over-estimate the predictive power from sentences like "[for] schizophrenia, DNA differences packaged as polygenic risk scores are now the best predictor we have for who will become schizophrenic" (page 126 in the paperback edition), even though that may in fact be true (and the predictive power for other traits / risk factors is just worse).
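The gap between "considerably increased risk" and "low predictive power" can be shown with the two numbers quoted from page 71. This is only a small arithmetic sketch in Python; the variable names are mine, and it ignores everything except those two quoted percentages.

```python
# Figures quoted from the book (page 71, paperback edition)
sibling_risk = 0.09      # risk if a sibling has been diagnosed
population_risk = 0.01   # risk across the general population

# How much higher the risk is for siblings of a diagnosed person
relative_risk = sibling_risk / population_risk

# If every sibling of a diagnosed person were predicted to develop
# schizophrenia, this fraction of those predictions would be wrong.
fraction_unaffected = 1.0 - sibling_risk

print(f"Relative risk: {relative_risk:.0f}x")
print(f"Siblings who are not diagnosed: {fraction_unaffected:.0%}")
```

So a large relative risk can still leave most of the higher-risk group unaffected.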

There was also at least one paper that indicated "twin studies [can overestimate] heritability," and my comment on that paper references a pre-print where it looks like varying definitions of heritability can be used (between the twin 2*|cor_MZ - cor_DZ|). I also noticed that the MaTCH entry for cystic fibrosis (under "ICF/ICD10 Subchapter") wasn't especially high; I'm not sure if that is an issue with sample size, but that makes me think this measure may not be absolutely perfect in terms of representing how well we understand the biology of a given disease (or the severity of the rare disease). I also thought it was strange that the dizygotic twin correlations were higher than the monozygotic twin correlations for cystic fibrosis. Perhaps I should look more into the associated Polderman et al. 2015 paper.
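To make the twin-based estimate concrete, here is a minimal Python sketch of the classical Falconer formula, h2 = 2 * (r_MZ - r_DZ). The correlation values are made up for illustration (they are not taken from MaTCH or from any paper), and `falconer_h2` is a hypothetical helper name. It also shows why a trait where the DZ correlation exceeds the MZ correlation gives a negative estimate unless the absolute value is taken.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Classical Falconer estimate of heritability from twin correlations."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations (illustration only, not real data)
typical = falconer_h2(r_mz=0.80, r_dz=0.45)
reversed_case = falconer_h2(r_mz=0.30, r_dz=0.40)

print(f"MZ > DZ: h2 = {typical:.2f}")
print(f"DZ > MZ: h2 = {reversed_case:.2f} (absolute value: {abs(reversed_case):.2f})")
```

A negative value is not interpretable as a proportion of variance, which is one reason a result like that may say more about sample size or the measure than about the disease.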

I also noticed this other blog post about the genomics in psychiatry (HT @elo81). The context here is a little different. However, the article where I first heard about Myriad's GeneSight had a subtitle that implied some confusion in the ability of this pharmacogenetic test to be used for diagnosis (which was not what the content of the article showed: you may be able to use genetics to guide testing different medications, but I didn't see any evidence that this particular test could diagnose whether you actually have depression). Additionally, this article says "United HealthCare in August announced that it will cover panels of genetic test for guiding the use of drugs for major depression and other depressive disorders, although the American Psychiatric Association’s research council last year concluded that the evidence for testing in those indications is not conclusive", which I believe is specifically in reference to GeneSight? Either way, the end of this article has two citations (Zeier et al. 2018 and Zubenko et al. 2018) that discuss the field (the latter of which describes clinical trials for GeneSight).

In general, you can also find some information on ClinicalTrials.gov (for GeneSight). While some results are clearer than others, I thought this was interesting. For example, I think one was "Completed" but actually canceled (under "Results Submitted")? One is recruiting in Canada (this is for the US National Library of Medicine). Some can be complete yet have "No Results Posted" (as opposed to not having results because the study is still active). In fact, there was only one with results (NCT01610063, out of the 10 from my search), and it has a link in the original table of search results (to make it stand out more).

There were twice as many lost to follow-up for the Guided (using GeneSight) versus Unguided treatment, but I don't know how often that happens. The difference for "primary outcome" (and multiple secondary outcomes) was greater for red category results than green/yellow combined category (which matches my expectation that what would give me the most confidence was for me to have a red result and test taking the medication - even though I don't recommend that for most people, particularly if you have no good reason to take such a risk). I certainly don't want to downplay being able to identify adverse side effects better in 15-20% of patients (if I understand that correctly, that does matter, particularly if you consistently see that between independent cohorts), but I also don't want people to think that the method was precise enough to predict exactly what they should take on the first try.

Third, I am sure I would be guilty of this if I tried to write an entire book (and this is part of why I have "change logs" on my blog posts), but there were some situations where I think there was some room for improvement in the wording. For example, I think there are some valid points within sections like "Parents Matter, But They Don't Make A Difference" (page 82 in the paperback edition) or "Schools Matter, But They Don't Make A Difference" (page 86 in the paperback edition), but there were understandably some complaints described in the afterword (page 191 in the paperback edition).

For example, on the positive side, the explanation of what is and what can be (at a population measure, as being reported in these studies) that is described several times in the book (including, but not limited to, page 192 in the paperback edition) is useful advice that I have used at least one time when making a point in casual conversation. However, using statistics as an incentive for change is different than presenting fate as highly deterministic from genetics (and therefore predictive regardless of future action), and I think this is a caveat that may require less emphasis on the predictive power (and therefore hopefully avoid the need for this additional explanation).

In terms of sections like "Life Experiences Matter, But They Don't Make A Difference" (page 89 in the paperback edition), I think Plomin is right to discourage people from being overly worried about small mistakes. However, figuring out what life experiences are traumatic enough to need extra effort to avoid is important. It is also my opinion that convergence of traits like personality and mental health over a long enough period of time can perhaps be thought of like needing to go through developmental stages (even as an adult), where certain concepts are easier to understand after you have gone through certain first-hand experiences. In other words, I believe having some sort of anchor for clear understanding can be important for preventing certain problems; while I agree with tackling these issues as early as possible, I would therefore disagree that certain traits / diseases can be prevented from birth (if experience / communicating / logic are required to understand the underlying problem and modify behavior).

I also think it is extremely important that Plomin acknowledges "Severe genetic problems such as single-gene or chromosomal problems or severe environmental problems such as neglect or abuse can have devastating effects on children's cognitive and emotional development. But these devastating genetic and environmental events are, fortunately, rare and do not account for much variance in the population" (page 85 in the paperback edition). However, I think this may also get back to some limitation in the heritability measure for cystic fibrosis (a single-gene disorder that I believe is one of the better examples of something that can realistically be prevented with methods like IVF+PGT).

Likewise, I thought an insightful example was provided in the Afterword on page 197 in the paperback edition: "No prediction is perfect, especially in behavioral sciences. We often make big decisions on the basis of much weaker correlations. For example, the correlation between blood alcohol levels and automobile accidents is weak, but that doesn't, and shouldn't deter us from making strict laws about drunk-driving." I do wonder if perhaps choosing a title other than "blueprint: how DNA makes us who we are" would have helped with the "No prediction is perfect" part, but that is also discussed in the Afterword (on pages 190-191 in the paperback edition).

Fourth, on page 180 in the paperback edition, Plomin mentions "dating websites might extend their data to include polygenic scores...Unlike the hype of dating websites, polygenic-score information could be verifiable through password-protected links to a direct-to-consumer company". While the limits to predictive power in "percent match" on dating websites might be a good analogy to the limits to "hypothesis generation" for some genomics applications, I certainly wouldn't encourage something like this. Plus, I'm not quite as certain about the genomics verification being foolproof (which also means that I would at least somewhat disagree with the sentence on page 181 that "You can't fake or train your DNA"). For example, I had a strange experience when I uploaded my 23andMe data into FamilyTreeDNA, and Francis Collins was able to submit his DNA sample under another name to multiple companies (as described in the prologue to The Language of Life, which is a book that I admittedly prefer, and I have a blog post with an updated summary of thoughts as well as a set of discussion questions for a book club).

Similarly, I have some concerns about the suggestion of using polygenic risk scores for job interviews (mentioned on page 181 of the paperback version), even though I certainly acknowledge limitations in fairly assessing somebody during a brief screening / interview process.

To be clear, I don't expect any system to be completely foolproof (kind of like it is possible to pay somebody to take your SATs, but that is rare and we have only heard about wealthy people doing that). However, I still strongly disagree with using PRS for dating apps or job applications, which I believe is in line with the view that PRS should usually not be used for embryo selection.

Finally, on a closing note, the Afterword has a section called "Public Reaction" (page 199 in the paperback edition). Plomin describes a highly positive response after describing critiques from scientists and the media: "Far from the nightmare predicted before publication, the public reaction has been positive beyond my wildest dreams." However, my concern is that individuals with less background in an area may initially have a more positive reaction, but those individuals may have a more negative reaction in the long run (if there were limitations that were not made clear to them, especially if they repeated incorrect or imprecise information to others). While I realize this can make success hard to define, I think this may be important for many genomics researchers and companies.

Change Log:

9/7/2019 - public post date
9/8/2019 - fix typos; minor changes
9/16/2019 - add early link to book; minor changes
9/20/2019 - add pre-print citation
10/1/2019 - add other review link + minor subsequent changes
10/2/2019 - add links related to GeneSight
10/20/2019 - add ClinicalTrials.gov links
11/14/2019 - add link to swirl lesson with Galton's height data.
7/6/2020 - change tense for "polygenic risk score" label

7/9/2022 - provide alternative BMI scatterplot, and modify content accordingly; minor changes

Also, I moved the following paragraph out of the main text, given that I found it somewhat confusing on re-reading (even if the view of the variation might in fact be of some interest):

Similarly, I thought it was interesting to see Sir Francis Galton's parent-child height data in the "1: Introduction" lesson in the swirl course for "Regression Models" (the scatter for what I believe is one of the most heritable traits is still noticeable and the data is being brought up in terms of describing regression towards a mean). Also, if you work interactively with the data, I thought it was useful to deviate a bit from the instructions and create a plot using smoothScatter(galton$child ~ galton$parent). Also, to be clear, you should expect more variability for a parent-child plot than a MZ/DZ twin plot.
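For anyone without the swirl course installed, the idea of regression towards the mean can be sketched with simulated data in Python. To be clear, this does not use Galton's actual measurements: the mean height, the slope of 0.65, and the noise levels are all hypothetical values that I picked for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

MEAN_HEIGHT = 68.0  # inches; hypothetical population mean
TRUE_SLOPE = 0.65   # hypothetical parent-child slope (below 1)

# Simulated parent heights, and child heights pulled towards the mean
parents = [rng.gauss(MEAN_HEIGHT, 1.8) for _ in range(5000)]
children = [
    MEAN_HEIGHT + TRUE_SLOPE * (p - MEAN_HEIGHT) + rng.gauss(0.0, 2.2)
    for p in parents
]

# Least-squares slope of child height on parent height
mean_p = statistics.fmean(parents)
mean_c = statistics.fmean(children)
slope = sum((p - mean_p) * (c - mean_c) for p, c in zip(parents, children)) / sum(
    (p - mean_p) ** 2 for p in parents
)

# Children of tall parents are, on average, closer to the mean
tall = [(p, c) for p, c in zip(parents, children) if p > MEAN_HEIGHT + 2.0]
avg_tall_parent = statistics.fmean(p for p, _ in tall)
avg_tall_child = statistics.fmean(c for _, c in tall)

print(f"Estimated slope: {slope:.2f}")
print(f"Tall parents: {avg_tall_parent:.1f} in; their children: {avg_tall_child:.1f} in")
```

The noise term is what produces the visible scatter, even when the underlying relationship is fairly strong.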

Also, for reasons of brevity, I thought it might help to move out the following content (and re-number the later sections):

Third, I thought it was a bit odd that the author referenced an error in the afterword without actually correcting it. Namely, on page 113 in the paperback edition, Plomin says "[if] a SNP is associated with a psychological trait, that means the SNP was expressed." If the variant changes the protein coding sequence, then expression of that gene is important. However, I remember Plomin also mentioning that variants for psychological traits are often located in non-coding regions: "most DNA associations with psychological traits involve SNPs in non-coding regions of DNA rather than in classical genes" (page 116 of the paperback edition). Without getting into whether that is actually causing more false positives for those associations, intergenic or promoter variants do not need to be expressed themselves (in order to affect expression of a causal gene). This is also mentioned in the Afterword (page 198 in the paperback edition) in the context of epigenetic regulation, but I am a little confused about the counter-argument to epigenetic regulation (except to say some variability, like drug resistance, can be caused through epigenetic mechanisms that won't be captured by Polygenic Risk Scores) and I don't see anything explicitly saying "the SNP doesn't have to be expressed" particularly if a lot of SNPs are in non-coding regions.

Also, it is probably a minor point, but I have some issues with the sentence "[the] rest of this book focuses on SNPs, because they have played a central role in the DNA revolution" (page 113 of the paperback edition) because i) certain classes of SNPs can be called with higher accuracy than indels (insertions and deletions) but I think indels may on average have a greater effect on function in coding regions (kind of like saying "you only searched for your keys under the street light because that is where you could see best") and ii) something about "DNA revolution" strikes me as something generally associated with hype (even if there are contexts where that truly is a fair representation of the advancements in technology and medicine). This second point is kind of like calling the discovery of the double-helix by Watson and Crick "the most important ever produced in biology" (page 110 in the paperback edition).

I also noticed that the predictive power of the Fabbri et al. 2019 PRS for resistance to depression treatment was not very impressive, but that may relate to epigenetic changes (getting back to the Afterword comment).
