Wednesday, May 22, 2019

Speculative Opinion: Possible Advantages to Directly Providing Generics via Non-Profits


I believe there is a lot more I should learn about this topic, and I have never been directly involved in a clinical trial.

Nevertheless, these are my current thoughts about what might be interesting about having generics directly enter the clinic/market through non-profit organizations (admittedly largely influenced by my experiences in genomics, which may be less relevant for some other applications):

Possible Advantages to Patients / Physicians:

  • [data sharing / diagnostic transparency] Maximize the amount of public / accessible information in order to help specialists make "best guesses" about how to proceed with the information at hand
    • I don't believe that sale of access to raw genetic data should be allowed
    • Specialists / physicians should have access to maximal information to help guide decision making process
    • I think it would be nice if some information was completely public, such as population-level data from Color Genomics (even though re-processing data can probably change some variant calls, and this company isn't a non-profit).
    • In general, I think it is important not to place too much emphasis on any one study.  As an example of how that could skew a true estimate of risk, I think there is a useful barplot in this paper.  While over-fitting is not always the explanation, it can be a factor, and I have a figure in this blog post that I hope helps explain that concern.
      • I am most familiar with this in the context of genomics (which would be for research or diagnostic purposes).  However, I have submitted several FDA MedWatch reports (again, mostly for diagnostics), and there is still a need for surveillance of therapeutics after they have entered the market.
  • [data availability for patient autonomy] Making sure patients have access to all data generated from their samples
    • Having access to your raw data should also help you be capable of getting specialized interpretation as a second opinion.
    • I also think self-reporting (with the ability of the patient to provide raw data) may help with regulation (or at least setting realistic expectations about efficacy / side-effects).
  • I think it may help if there was more judicious use of advertising.
    • Namely, I worry that some advertisements can give a false sense of confidence in the interpretation of results.  For example, I posted this draft a little early because of 23andMe's marketing of travel destinations based upon ancestry, which I don't approve of (although I support other overall goals for 23andMe).
    • That said, I think it can be useful when digital advertisements allow you to comment on them, kind of like a mini self-reporting system.
  • If we are talking about a therapy (rather than a diagnostic), I would expect this should also decrease costs (and that is what I most commonly think of when I hear the word "generic").  Otherwise, I am mostly talking about experience with the exchange of information, often dependent upon sequencing/genotyping from another company (like an Illumina sequencer) that frequently makes use of open-source software (or analysis where unnecessary complexity may sometimes even cause problems).

If this makes production via non-profit preferable, then perhaps a penalty for not meeting the above requirements could be that an organization risks losing its non-profit status.  Otherwise, I am primarily concerned that the above conditions are met (at least in genomics), and I am just curious whether being a non-profit might help in sustainably accomplishing that goal (although I lack knowledge on many of the accounting and legal details, and I don't have experience running a non-profit or for-profit organization).

Possible Advantages to Providers?

  • Assuming expectations are defined clearly and appropriately, participation in "on-going research" may improve understanding (and forgiveness) when there are many unknowns (and possibly even limits to what can be known with high confidence in the immediate future)?
    • I called this "Decreased liability?" in an earlier version of this post, but I have gotten feedback that makes me question whether this is precisely what I want to describe.
    • If I understand things correctly (and given that precisely defining all costs to society is difficult), it seems like forgoing royalties / extra profits in exchange for limited liability (kind of like open-source software, as I understand it) could be appealing in certain situations.
    • Strictly speaking, I see a warning of limited liability within the 23andMe Terms of Service (if you actually read through it).  However, I also know that I am entitled to $40 off purchasing another kit, because of the KCC settlement.  So, I would expect actually enforcing limited liability would be easier for a non-profit (if their profits were limited to begin with, it is harder to get extra money from them).
    • So, even though I believe the concept of limited liability applies in other circumstances, I think public opinion of the organization is important in terms of being patient and understanding when difficulties are encountered.
  • Decreased or lack of taxes paid by non-profit?
    • I think part of the point of having a non-profit is making the primary focus something other than money.  However, I think this link describes some financial advantages and disadvantages to starting a non-profit.
    • There was one person who raised concerns that non-profits can't produce products (at least if I understood them correctly).  While I admit that I don't fully understand the tax law, I think connections to research, education, and/or "public goods" qualify for the examples that I am thinking of.  So, I can't tell if any rules need to be changed, but I found some summaries on-line that make me think things may currently be OK (such as here and here).
    • At least from my end, this page says what I thought of when I was saying something should be offered by a non-profit: "Charitable nonprofits typically have these elements:  1) a mission that focuses on activities that benefit society and whose goal is not primarily for profit, 2) public ownership where no person owns shares of the corporation or interests in its property, 3) income that must never be distributed to any owners but recycled back into the nonprofit corporation's public benefit mission and activities....In contrast, a for-profit business seeks to generate income for its founders and employees. Profits, made by sales of products or services, measure the success of for-profit companies and those profits are shared with owners, employees, and shareholders."

I also originally had a bullet point for "If profits are limited, what about refunds?".  However, I decided to place less emphasis on that point after additional feedback.  For example, I recently purchased an upgrade from 23andMe (for their V5 chip, from their V3 chip).  I noticed that I had to acknowledge that the purchase was non-refundable when I purchased the upgrade.  If it is possible (and/or tactful) for the company to provide refunds, then I think there are disadvantages to this style of not providing refunds.  However, this also made me think twice about how such an interaction would look if you were hesitant to give a refund because your profits were limited (and you have things like salary caps).  Most importantly, both non-profits and for-profits have to make sure they are not compromising safety (or unfairly representing their product).
While it is not the only reason why I think something should be provided from a non-profit, I think one characteristic of something that might need to be directly offered by a non-profit is a situation where there is a need to make sure the experts are in the habit of publicly announcing limitations (and mistakes) on a fairly regular basis.  In other words, if you can get an accurate estimate of a reasonable success rate, you can look more closely at situations where the success rate is either exceptionally low or exceptionally high (although I would expect gradual improvement over time).

Also, to be fair, I think of "ownership" as being different when you talk about "owning" a pet versus "owning" a product to sell.  However, I think the concept of responsibility for the former is important, and it is also definitely possible that there are misconceptions in my understanding about the ways to provide something through a for-profit organization.

If it doesn't exist already, perhaps there can be some sort of foundation whose goal is to fund diagnostics / therapies that start as generics (without a patent)?  If immediately offered as a generic, perhaps there could be a suggested non-profit donation at the pharmacy or doctor's office (to a foundation that helps develop medical applications without patents)?  Or, if this is not quite the right idea, perhaps another option for discussion could be that early development in a non-profit translates into decreased time to become a generic (so, even if the non-profit is not directly providing the product with limited profit margins, the non-profit's contribution can still decrease costs to society).  This relates in part to an earlier post on obligations to publicly funded research, but I believe my current point is a little different.

There is precedent for the polio vaccine not having a patent, but my understanding is that this came at a great financial cost to the March of Dimes (and that is why more treatments don't enter the market without patents, even though the fundraising strategy was targeted at a large number of individuals who were already on tight budgets).

Genomics Data and Diagnostics

In "The Language of Life" Francis Collins describes the discovery of the CFTR gene.  After describing the invalidation of gene patients for Myriad, he mentions "my own laboratory and that of Lap-Chee Tsui insisted that the discovery of the CF gene, in 1989, be available on a nonexclusive basis to any laboratory that was interested in offering testing" (page 112) as well as saying "I donated all of my own patent royalties from the CF gene discovery to the Cystic Fibrosis Foundation" (page 113).

My understanding is that the greatest barrier to having products frequently start out as generics is the cost of conducting the clinical trial.  I need to be careful because I don't have any first-hand experience with clinical trials, but here are some possible ideas that I thought might be worth throwing out for discussion:
  1. Allow data sharing to help with providing information to conduct clinical trials.  For example, let's say the infrastructure from a project like All of Us allows people to share raw data from all diagnostics (and electronic medical records), as well as archived blood draws and urine samples.  Now, let's say you have a diagnostic that you want to compare to previously available options.  If the government has access to the previous tests, the original samples, and the ability to test your new diagnostic, maybe combining use of that information with an agreement to provide your diagnostic as a generic (with the understanding that continued surveillance also serves as an additional type of validation) is a fair trade-off?
    • I'm not sure if this changes how we think of clinical trials, but I think participants should also be allowed to provide notes over the long-term (after you would usually think of the trial as ending).  This would be kind of like post-publication review for papers, and self-reporting in a system like PatientsLikeMe (which I talk about more in another post).
    • Side effects are already monitored for drugs on the market
  2. Define a status for something that can be more easily tested by other scientists if it passes safety requirements (Phase I?) but not efficacy requirements?  I guess this would be kind of like a "generic supplement," but it should probably have a slightly different name.
I also believe that all participants need to have access to their own data (including the ability to look up papers that use their data for publication), but I realize that this doesn't necessarily have to be part of a clinical trial, because I have accessed patient genomics data from archived samples and donors/subjects (for which I think the rules are a little different).  Nevertheless, I think it is important and relevant to the points that I am making about patients having access to their raw data.

For some personalized treatments, I would guess you might even have difficulties getting a large enough sample size to get beyond the "experimental" status (equivalent to not being able to complete the clinical trial?). Plus, if some drugs have 6-figure price tags (or even 7-figure price tags), maybe some people would even consider getting a plane ticket to see a specialist for an "experimental" trial / treatment.

Role of the FDA

From what I can read on-line, I believe there is some interest in the FDA helping with generic production, and this NYT article mentions "[the FDA] which has vowed to give priority to companies that want to make generics in markets for which there is little competition", in the context of a hospital producing drugs.  According to this reference, "80 percent of all drugs prescribed are generic, and generic drugs are chosen 94 percent of the time when they are available."

Perhaps it is a bit of a side note, but I was also playing around with the FDA NDC Database (which is a Text / Excel file that you can download and sort).  For example, I could tell that my Indomethacin from one pharmacy was produced by Camber Pharmaceuticals (NDC # 31722-543), and my Citalopram from another pharmacy was produced by Aurobindo Pharma Limited (even though Camber Pharmaceuticals also manufactures Citalopram, and Aurobindo also produces Indomethacin Extended-Release, according to the NDC Database).  I thought it was interesting to see how many companies produce the same generic and how many generics are produced by each company.  At least to some extent, this seems kind of like how similar topics may be studied by labs in different institutes across the world.  So, maybe there can even be some discussion about how to have both sharing of information for the public good and independent assessments of a product from different organizations (whether that be a lab in a non-profit or a company specializing in generics).
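If it helps anyone else explore the same download, here is a rough sketch of the kind of tallying that I mean.  This is illustrative only: the column names (NONPROPRIETARYNAME, LABELERNAME) and file name are assumptions based on one download of the product file, so please check them against your own copy before relying on the counts.

```python
# Hypothetical sketch for exploring the FDA NDC product file.
# Column names below are assumptions - verify against your downloaded file.
import pandas as pd

ndc = pd.read_csv("product.txt", sep="\t", dtype=str, encoding="latin-1")

# How many distinct companies list a product under each generic (nonproprietary) name?
companies_per_generic = (ndc.groupby("NONPROPRIETARYNAME")["LABELERNAME"]
                            .nunique()
                            .sort_values(ascending=False))
print(companies_per_generic.head(10))

# How many distinct generics does each company list?
generics_per_company = (ndc.groupby("LABELERNAME")["NONPROPRIETARYNAME"]
                           .nunique()
                           .sort_values(ascending=False))
print(generics_per_company.head(10))
```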

I also noticed that the FDA has a grant for "complex" generics, but I believe that is for drugs that are already off-patent where extra challenges with production make offering a generic version more difficult.  Nevertheless, it is evidence that there is some belief that academic and non-profit institutes may be able to help bring generics to the market more quickly.

Personal Experience / Open-Source Bioinformatics Software

I believe that I need to work on fewer projects more in-depth.  I wonder if there might be value in having a system for independence that would allow PIs to do the same (with increased responsibility/credit/blame at the level of the individual lab).  If something entered the market as a generic (possibly from a non-profit), perhaps the same individuals can be involved with both development and production of the generic.

Also, for my job as a Bioinformatics Specialist, I mostly use open-source software (but I sometimes use commercial software or software that is only freely available to non-profits/academics).  In particular, I think it is very important to have access to multiple freely available programs, and the topic of limits to precision in genomics methods (at least in the research context) is something I touch on in my post about emphasizing genomics for "hypothesis generation."

Concluding Thoughts

Even if it is not used in clinical trials (which, as far as I know, was not part of the original plan), I think All of Us matches some of what I am describing as a generic from a non-profit (even though it isn't called a "generic," it is a government operation, and free sequencing is not currently guaranteed after sample collection).  Nevertheless, non-profit (or academic) Direct-to-Consumer options that I think more people should know about include Genes for Good (free genotyping), American Gut (can still be ordered from Indiegogo?), and the UC-Davis Veterinary Genetics Lab, and I am excited to learn more about others.  I think this may also be in a similar vein to DIYbio clubs (for example, I believe Biocurious provides a chance to do MiSeq sequencing).  Cores (like where I work) also kind of do this (for labs), but I can tell that I need to work on fewer projects more in-depth (so, I think there would need to be some changes before adopting a "core" model for producing generics).

Finally, I want to make clear that this is something that I would like to gradually learn more about, but that is probably more on the scale of 5-10 years.  That is generally what I am trying to indicate when I add "Speculative Opinion" to a blog post title.  So, I very much welcome feedback, but my ability to have extended discussions on the topic may be limited.

The only things that I feel strongly about in the immediate future are not reversing the Supreme Court decision that disallowed gene patents, and acknowledging the limits to predictive power for some genomics methods (such as the concerns I expressed about the 23andMe ancestry results towards the beginning of this post, and why I don't believe it would be appropriate to encourage travel destinations to specific countries).

Change Log:

5/22/2019 - original post date
-I should probably give some amount of credit for the idea of emphasizing decreased health care costs to Ragan Robertson (for his answer to my SABPA/COH Entrepreneur Forum question about generics and providing something in a non-profit versus commercial setting).  However, his answer was admittedly more focused on mentioning how generics could be used for different "off-label" applications after they have entered the clinic (as well as connecting this to decreased health care costs).
5/23/2019 - update some information, after Twitter discussion
5/24/2019 - trim out 1st paragraph
5/25/2019 - move open-source software paragraph towards end.  Also, lots of editing for the overall post.
5/26/2019 - remove sentence with placeholder for shared resources post that is currently only a draft.  Add link to $2.1 million drug treatment tweet (with very interesting comments)
6/1/2019 - remove the word "their" from 23andMe travel sentence
6/27/2019 - update content in response to discussion with family member.  For example, I don't think I was making clear that I was primarily concerned about data sharing / transparency and continuing to not allow genetic testing / information to be patented, at least in the field of genomics (and I am curious if being a non-profit can play a helpful role if those requirements are met).
6/28/2019 - revise explanation for the previous change log entry
6/29/2019 - bring up tax details
7/13/2019 - add explanation for "Speculative Opinion"
7/30/2019 - add comments for Francis Collins's CFTR gene discovery
11/3/2019 - add "speculative opinion" tag
5/6/2020 - minor changes
8/22/2020 - add a couple additional links
10/4/2020 - add FDA "complex" generic grant link + minor changes + add section headers

Saturday, May 18, 2019

Emphasizing "Hypothesis Generation" in Genomics

With terms like "precision medicine," I believe there is an expectation that that genomics will remove ambiguity in medicine.  I certainly agree that it can help.  However, I think it is also important to realize that "hypothesis generation" and "genomic profiling" are also common terms in genomics, which imply some uncertainty and a need for further investigation.

I think more work is needed to provide a fair and clear description of the limitations of genomic analysis, but I believe concepts like over-fitting and variability are also important.

Here is a plot that I believe can help explain what I mean by over-fitting:



The plot above is from Warden et al. 2013.  Notice that very high accuracy in one dataset (from which a signature was defined) actually resulted in lower accuracy in other datasets (which is what I mean by "over-fitting").  The yellow horizontal line is meant to be like the diagonal line in AUC plots.  While 50% accuracy is not actually the baseline for all comparisons (for example, if an event or problem is rare, then saying something works 0% of the time or 100% of the time may actually be a better baseline), I still think this picture can be useful.
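To make the over-fitting concern more concrete, here is a toy sketch (synthetic data, not the analysis from the paper): when there are many more genes than samples, a classifier can look nearly perfect on the cohort used to define the signature while performing close to chance in an independent cohort.

```python
# Toy illustration of over-fitting (synthetic data only).
# With far more features (genes) than samples, a model can fit random labels
# almost perfectly in the training cohort but predict near 50% in a new cohort.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_samples, n_genes = 40, 2000                      # many more genes than samples
X_train = rng.normal(size=(n_samples, n_genes))
y_train = rng.integers(0, 2, size=n_samples)       # labels unrelated to "expression"
X_valid = rng.normal(size=(n_samples, n_genes))
y_valid = rng.integers(0, 2, size=n_samples)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("accuracy in the cohort used to define the signature:",
      accuracy_score(y_train, model.predict(X_train)))   # typically close to 1.0
print("accuracy in an independent cohort:",
      accuracy_score(y_valid, model.predict(X_valid)))   # typically close to 0.5
```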

When I briefly mentioned "variability," I think that is what is more commonly thought of for personalized / precision medicine (solutions that work in one situation may not work as well in others).  However, I also hope to eventually have an RNA-Seq paper to show that testing multiple open-source programs can help towards having an acceptable solution (even if you can't determine the exact methods that should be used for a given project ahead of time).  I think this is a slightly different point, in that it indicates limits to precision / accuracy for certain genomics methods, while still showing overall effectiveness in helping answer biological questions (even though you may need to plan to take more time to critically assess your results).  Also, in the interests of avoiding needless alarm, I would often recommend people/scientists visualize their alignments in IGV (a free genome browser), along with visualizing an independently calculated expression value (such as log2(FPKM+0.1), calculated using R-base functions).  So, if you think a gene is likely to be involved, that can sometimes be a way to help gauge whether the statistical analysis produced either a false positive or a false negative for that gene (and then possibly provide ideas of how to refine analysis for your specific project).
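The independent expression value I have in mind is simple enough to compute yourself; here is a minimal sketch (written in Python rather than the R-base functions mentioned above, and with made-up counts and gene lengths) of FPKM followed by the log2(FPKM+0.1) transformation:

```python
# Minimal sketch of log2(FPKM + 0.1) from raw counts and gene lengths
# (illustrative values only; in R this would just be arithmetic plus log2()).
import numpy as np

def log2_fpkm(counts, gene_lengths_bp, pseudocount=0.1):
    counts = np.asarray(counts, dtype=float)               # reads assigned to each gene
    length_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    million_mapped = counts.sum() / 1e6                     # library size, in millions
    fpkm = counts / (length_kb * million_mapped)
    return np.log2(fpkm + pseudocount)

# Three hypothetical genes in one sample
print(log2_fpkm(counts=[500, 0, 2500], gene_lengths_bp=[2000, 1500, 4000]))
```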

This is also a little different than my earlier blog post about predictive models, in that I am saying that over-fit models may be reported to be more accurate than they really are (whereas the predictive power of the associations described in that post clearly indicates that population differences have limited predictive power in individuals).  However, I think that level of predictive power for the SNP association is in some ways comparable to the "COH" gene expression model shown above (where roughly 80% accuracy is actually more robust, and therefore arguably more helpful, than a signature with >90% accuracy in one cohort but essentially random predictions in most other cohorts).

I think that also matches this Wu et al. 2021 commentary, where performance noticeably drops in Table 1 when an AI model is trained at one site and tested at another (with the highest performance coming from the same site as training).  However, this is a little different than what I showed above: in the BD-Func plot, lower performance on the training dataset may come with relatively better performance in different datasets, whereas in the AI commentary table the maximal performance is more correlated, with a loss in performance at new sites across all 3 rows.  If there was at least 1 additional row with non-AI models that had more similar performance at the same site and different sites (but lower performance than the AI model at the same site), then that would be more similar to the BD-Func example.

Also, I think it should be emphasized that precision medicine doesn't necessarily have to involve high-throughput sequencing, and I think using RNA-Seq for discovery and lower-throughput assays in the clinic is often a good idea.  For example, the goal of the paper was a little different, but the feature being predicted in the plot above is Progesterone Receptor immunostaining (I believe protein expression for ER, PR, and HER2 is often checked together for breast cancer patients).  So, just looking at the PGR mRNA might have produced more robust predictions in validation sets than the BD-Func "COH" score (which was a t-test between up- and down-regulated gene expression, per sample, as sketched below).
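To clarify what I mean by that per-sample score, here is a rough sketch of my understanding of the idea (with placeholder gene names and values; this is not the actual BD-Func implementation): compare expression of an "up-regulated" gene set against a "down-regulated" gene set with a t-test, computed separately for each sample.

```python
# Rough sketch of a per-sample score like the one described above
# (placeholder gene sets and values; not the actual BD-Func code).
import numpy as np
from scipy import stats

def per_sample_score(expression, up_genes, down_genes):
    """expression: dict of gene -> expression value for a single sample."""
    up = np.array([expression[g] for g in up_genes])
    down = np.array([expression[g] for g in down_genes])
    t_stat, _p = stats.ttest_ind(up, down)
    return t_stat   # higher score = "up" set more highly expressed than "down" set

sample = {"GENE_A": 8.2, "GENE_B": 7.9, "GENE_C": 2.1, "GENE_D": 1.4, "GENE_E": 3.0}
print(per_sample_score(sample, up_genes=["GENE_A", "GENE_B"],
                       down_genes=["GENE_C", "GENE_D", "GENE_E"]))
```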

There are positive outcomes from genomics research, and there are some things that can be known/performed with relatively greater confidence than others (such as well-established single-gene disorders). However, I think having realistic expectations is also important, and that is why I believe there should be emphasis on both "precision medicine" and "hypothesis generation" when discussing genomics.  Or, I actually prefer the term "personalized medicine" over "precision medicine," which I think can capture both of those concepts.

Change Log:

5/11/2019 - this tweet had some influence on re-arranging my draft (prior to public posting), in terms of the expectation that personalized medicine / genetics can explain / improve therapies that originally did not seem very effective.
5/18/2019 - public blog post
5/20/2019 - update link for "personalized medicine," add sentence in 1st paragraph, and remove "medicine" from title (and one sentence in concluding paragraph).
5/22/2019 - I don't remember why I had this in the draft for the generics post.  While I don't think it fits in with the flow of the main content, I wanted to add this as a side note relevant to general limitations in precision (even when a program is incredibly useful): As mentioned in this tweet, BLAST is of huge benefit to the bioinformatics / genomics community, even without choosing a "typical" 0.05 E-value cutoff (to be more like a p-value).
5/26/2019 - add Mendelian Disease as "success story" for genomics
4/8/2021 - add link to article about issue with retrospective studies and decrease in performance between sites

Saturday, May 4, 2019

precisionFDA and Custom Scripts for Variant Comparisons

After posting this reply to a tweet, I thought it might be a good idea to separate some of the points that I was making about comparing genotypes for the same individual (from this DeepVariant issue thread).

For those who might not know, precisionFDA provides a way to compare and re-analyze your data for free.  You need to create an account, but I could do so with a Gmail address (and an indication that you have data to upload).  I mostly show results for comparing .vcf files (either directly provided from different companies, or created via command line outside of precisionFDA).

I needed to do some minor formatting of the input files, but I provided this script to help others do the same.  I also have another script that I was using to compare .vcf files.

For the blog post, I'll start by describing the .vcf files provided from the different companies.  If readers are interested, I also have some messy notes in this repository (and subfolders), and I have raw data and reports saved on my Personal Genome Project page.

For example, these are the results for the SNPs from my script (comparing recovery of variants in my Genos Exome data within my Veritas WGS data):

39494 / 41450 (95.3%) full SNP recovery
39678 / 41450 (95.7%) partial SNP recovery

My script also compares indels (as you'll see below), but I left that out this time (because Veritas used freebayes, and I didn't convert between the two indel formats).

I defined "full" recovery as having the same genotype (such as "0/1" and "0/1", for a variant called as heterozygous by both variant callers).  I defined  "partial" recovery as having the same variant, but with a different zygosity (so, a variant at the same position, but called as "0/1" in one .vcf but called as "1/1" in the other .vcf would be a "partial" recovery but not a "full" recovery).

You can also see that same comparison in precisionFDA here (using the RefSeq CDS regions for the target regions), with a screenshot shown below:



So, I think these two strategies complement each other in terms of giving you slightly different views about your dataset.

If I re-align my reads with BWA-MEM and call variants with GATK (using some non-default parameters, like removing soft-clipped bases; similar to what is shown here, for the Exome file, but not the WGS file, and I used GATK version 3.x instead of 4.x) and filter for high-quality reads (within target regions), these are what the results look like (admittedly, using an unfiltered set of GATK calls to test recovery in my WGS data):

Custom Script:

20765 / 21141 (98.2%) full SNP recovery
20872 / 21141 (98.7%) partial SNP recovery
243 / 258 (94.2%) full insertion recovery
249 / 258 (96.5%) partial insertion recovery
208 / 228 (91.2%) full deletion recovery
213 / 228 (93.4%) partial deletion recovery

precisionFDA:



Since I was originally describing DeepVariant, I'll also show those as another comparison using re-processed data (with variants called from a BWA-MEM re-alignment):

Custom Script:

51417 / 54229 (94.8%) full SNP recovery
53116 / 54229 (97.9%) partial SNP recovery
1964 / 2391 (82.1%) full insertion recovery
2242 / 2391 (93.8%) partial insertion recovery
2058 / 2537 (81.1%) full deletion recovery
2349 / 2537 (92.6%) partial deletion recovery

precisionFDA:



So, one thing that I think is worth pointing out is that you can get better concordance if you re-process the data (although the relative benefits are a little different for the two strategies provided above).

Also, in terms of DeepVariant, I was a little worried about over-fitting, but that was not a huge issue (I think it was more like an unfiltered set of GATK calls, but requiring more computational resources).  Perhaps that doesn't sound so great, but I think it is quite useful to the community to have a variety of freely available programs; for example, if DeepVariant happened to be a little better at finding the mutations for your disease, that could be quite important for your individual sample.  Plus, I got a $300 Google Cloud credit, so it was effectively free for me to use on the cloud.

As a possible point of confusion, I am encouraging people to use precisionFDA to compare (and possibly re-analyze) new data.  However, there was also a precisionFDA competition.  While I should credit DeepVariant with causing me to test out the precisionFDA interface, my opinion is that the ability to make continual comparisons may actually be more important than that competition from a little while ago.  For example, I think different strategies with high values should be considered comparable (not really one being a lot better than the others, as might be implied from having a "winner"), and it should be noted that that competition focused on regions where "they were confident they could call variants accurately".  Perhaps that explains part of why the metrics are higher than for my data (within RefSeq CDS regions)?  Plus, I would encourage you to "explore results" for that competition to see statistics for subsets of variants, where I think the func_cds group may be more comparable to what I performed (or at least gives you an idea of how rankings can shuffle with a subset of variants that I would guess are more likely to be clinically actionable).
 
Creative Commons License
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.