Wednesday, January 30, 2013

Stormy's Feline DNA Test

As a cat lover and bioinformatics scientist, I got excited when I heard that the UC-Davis Veterinary Genetics Laboratory was offering a DNA ancestry test for cats.  I decided to apply this test to my girlfriend's cat Stormy, who is a Scottish Fold of modest internet fame.

You can click here to see Stormy's test results.

Given that Stormy is a Scottish Fold, it isn't surprising that he descends from cats of Western Europe.  However, I think this is worth noting because it suggests that the test result is accurate.

More importantly, I was hoping that this test would give me a chance to become more familiar with feline genetics / genomics research.  So, I was glad to see that the test also listed 10 phenotypic markers.

A vet has previously identified Stormy as a "Siamese point," so it was nice to see the precise mutation associated with this trait.  It was also interesting to see that he carries a long-hair marker, even though he is a short-haired cat.

In order to better understand the research behind these markers, I contacted Dr. Leslie Lyons (who developed the ancestry test, described in Kurushima et al. 2012).  She was kind enough to point out that detailed information for these markers can be found on the VGL website, and I have provided all the relevant links below.
Of course, I will also be excited to see future developments in this area.  For example, I think it will be cool to view a larger portion of Stormy's genome and perhaps see the specific alteration that causes him to be a Scottish Fold.

Monday, January 14, 2013

Testing Some Genomic Text-Mining Tools

I've wanted to learn more about the field of text-mining for some time, and I've recently had the goal of seeing if I can build a directed regulatory network using text-mining of papers in the scientific  literature.  Text-mining is certainly not a new area of genomics research: for example, Jenssen et al. 2001 built a co-citation network using text-mining.  However, this sort of network can *not* be used to identify genes that have been shown to activate or inhibit one another, and I was curious to see how easy it would be to currently build such a network.

I tested two text-mining tools on a couple of example queries: iHOP and PolySearch.

This is certainly not an exhaustive test of all the text-mining tools currently available, but these two tools had nice user interfaces and a relatively high number of citations.  For those who are interested, here are some reviews that I came across in my literature search: Altman et al. 2008, Ananiadou et al. 2010, Hoffmann et al. 2005, Jensen et al. 2006, Krallinger and Valencia 2005, and Skusa et al. 2005.

I found the test query for progesterone receptor (PGR) was a good example of the strengths / weaknesses of these two tools.

In both cases, estrogen receptor was found to be co-cited most frequently with PGR.  I was pleased with this result because the relationship between these genes is well characterized, and they are two markers commonly studied in breast cancer patients.
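
As a rough illustration of what co-citation counting involves, here is a minimal Python sketch.  It is not how iHOP or PolySearch work internally; the function names and the toy list of papers are made up for this example, and it assumes the genes mentioned in each paper have already been extracted.  Note that each pair is stored without an order, which is exactly why this kind of network cannot say which gene regulates which.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(papers):
    """Count how many papers mention each unordered pair of genes.

    `papers` is a list of gene-name lists, one list per paper (assumed input).
    """
    counts = Counter()
    for genes in papers:
        # Sort the de-duplicated names so (A, B) and (B, A) share one key.
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] += 1
    return counts


def top_partners(counts, gene):
    """Rank the genes co-cited with `gene`, most frequent first."""
    partners = Counter()
    for (a, b), n in counts.items():
        if gene == a:
            partners[b] += n
        elif gene == b:
            partners[a] += n
    return partners.most_common()


# Toy input invented for illustration; these are not real search results.
toy_papers = [
    ["PGR", "ESR1"],
    ["PGR", "ESR1", "IGFBP1"],
    ["ESR1", "ERBB2"],
]
toy_counts = cocitation_counts(toy_papers)
print(top_partners(toy_counts, "PGR"))
```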

On the other hand, these tools weren't very helpful for identifying several genes that are activated or inhibited by progesterone receptor.  I liked the fact that PolySearch allowed me to specify verbs to define the method of association, but I found that the search results for "activate; activated; activates" and "inhibit; inhibited; inhibits" produced practically identical gene lists.  As far as I could tell, iHOP didn't provide this feature for users, but I did write a short Perl script to parse sentences containing "activate" or "inhibit" (in the iHOP results) and I manually searched those sentences to try and find PGR targets.  This provided a small number of results which were not particularly useful: for example, iHOP provided citations that IGFBP-1 was both activated and inhibited by PGR.
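
The Perl script itself is not shown here.  To give a rough idea of what this kind of sentence filtering looks like, here is a minimal sketch in Python (not the original script).  The function name, the verb stems, and the example sentences are assumptions made for illustration, and the sketch assumes the result sentences have already been saved as plain strings.

```python
import re

# Assumed verb stems, based on the "activate" / "inhibit" searches described above.
# Note that the stems also match nouns such as "activation" or "inhibitor".
VERB_PATTERN = re.compile(r"\b(activat|inhibit)\w*", re.IGNORECASE)


def classify_sentences(sentences, gene="PGR"):
    """Group sentences that mention `gene` by the verb stem they contain."""
    hits = {"activat": [], "inhibit": []}
    gene_pattern = re.compile(r"\b" + re.escape(gene) + r"\b", re.IGNORECASE)
    for sentence in sentences:
        if not gene_pattern.search(sentence):
            continue  # skip sentences that never mention the gene of interest
        for match in VERB_PATTERN.finditer(sentence):
            stem = match.group(1).lower()
            if sentence not in hits[stem]:
                hits[stem].append(sentence)
    return hits


# Sentences invented for illustration; they are not actual iHOP output.
toy_sentences = [
    "PGR activates IGFBP-1 in stromal cells.",
    "IGFBP-1 is inhibited by PGR.",
    "ESR1 is expressed in breast tissue.",
]
for stem, found in classify_sentences(toy_sentences).items():
    print(stem, found)
```

A filter like this only narrows down the sentences to read.  It does not work out which gene is the regulator and which is the target, so the manual search described above is still needed.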

In short, I think these tools do a good job of determining generic associations between genes by determining which genes are commonly co-cited.  However, it didn't appear that either of these tools would be a good solution for building a directed regulatory network.  I also found that each of these tools had distinct advantages / disadvantages: iHOP provided results much more quickly than PolySearch, but PolySearch can define a broader range of associations (such as gene-disease) and I liked the PolySearch query and results interface better than the iHOP interface.

To be clear, I am certainly not saying that it is impossible to accomplish my goal of defining a directed regulatory network via text-mining.  I just did not find it to be practical for a text-mining neophyte like myself.  I should also specify that I was only interested in open-source tools: I know of some commercially available tools that provide this information (usually with the assistance of PhD-level curators), but I am trying to see if this can be done entirely in silico from scratch.  I am sure this is a feasible goal, and I would certainly appreciate any comments or suggestions on alternative ways to achieve it.
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.