Saturday, August 6, 2011

What is it like to be a "Bioinformatics Specialist"?

I recently received a request from a complete stranger to learn more about the field of bioinformatics.  Since I think others may also benefit from my answers, I've converted this e-mail conversation into a blog post.  I've made some modifications to the questions and my responses, but all the main ideas are the same.

FYI, I've provided a link to my CV, so you can get a better idea about my background.

Q)  Can you tell me a little bit about your work as a bioinformatics specialist and what a typical day looks like?

A) I think it is safe to assume that someone with this job description will work for at least one lab, and their goal will usually be to help biologists without a strong computational background analyze their data.  In particular, I assist with microarray and next-generation sequencing data analysis.  Some bioinformatics specialists work in the lab of an individual scientist, but I work in a shared resource facility, so I work for several scientists on campus.  The Bioinformatics Core is also in charge of software support, so I also help install and maintain software and hardware.  Additionally, I help write papers and grants, but I don’t know if it is safe to assume that all Bioinformatics Specialists will be authors on papers.

Q) When you completed your MS degree, did you find the job market to be favorable? 

A) I actually have an MA degree in Molecular Biology (I was in a PhD program and left with just a Master's degree), so my experience may be a little different from that of someone pursuing a standalone MS degree in Bioinformatics.  When I had to look for jobs, it took a lot of effort, and I basically accepted the first offer I could get after a few months of searching.  However, there are lots of people who are unemployed or go a year or more without a job, so it could have been a lot worse.

Q)  How deeply would you suggest that a person searching for a job similar to yours get into programming?  What programming languages are most useful for your job?
 
A) I would say that you pretty much can’t get a job with “Bioinformatics” in the title without significant programming experience and a firm grasp of statistics.

I am proficient in R and Perl.  SQL is also very important.  Python is especially useful for next-generation sequencing analysis.  It is also valuable to learn Java and PHP, as well as how to set up an Apache web server.
 
Q)  Do you know of any useful resources for job seekers?  For example, what do you know about bioinformatics internships?  

A) I think that the importance of internships varies with your career goal.  I think they are a little less important if you plan to eventually get a PhD, but I think they can be very important for individuals with a terminal BS or MS degree.

I would suggest e-mailing PIs/scientists whose work you find interesting to see if there are any jobs; this is how I have gotten all of my jobs.

Before I got paid to do research, I had to do research for academic credit for 1-2 years.  If you don’t have considerable research experience, I would consider offering to do volunteer work (or an unpaid internship).  

You should also apply for jobs that you see posted by companies.  However, I haven’t actually had much success with these.  I think companies are legally obligated to post jobs, even if they have already found an internal candidate for the position.  A lot of companies like to promote from within, so this may be something worth considering.  Nevertheless, I think anyone who is willing to pay money to post a job on an external website is probably serious about at least considering candidates from outside of the company.  Here are some useful resources when looking for bioinformatics jobs (in addition to places like Monster, etc.):
When you are in school, there should be some sort of career services department that might be able to help you.  For example, I have helped send job postings for the Bioinformatics Core to prestigious universities when we were looking to hire recent graduates.

Also, be sure to take advantage of research experience (even if it is not required) for networking purposes.  Plus, research experience has other direct benefits, such as practical experience that will almost certainly be useful for jobs later down the road.

Please feel free to continue the discussion with questions and comments below!

Friday, July 1, 2011

Review of the TCGA Ovarian Cancer Paper

Initial analysis of the TCGA data for ovarian cancer was published in Nature this week.  The Cancer Genome Atlas (TCGA) is a joint project by the NCI and NHGRI to study genomic changes that are associated with many different types of cancer by collecting a large number of patient samples for analysis using mRNA gene expression microarrays, copy number arrays, methylation arrays, miRNA microarrays, and exome sequencing.  These data can be freely downloaded from the TCGA Data Portal.

There is a huge amount of information presented in this paper.  For comparison, the first TCGA paper provided an overview of the glioblastoma data, mostly focusing on somatic mutations and copy number alterations.  There have been a number of subsequent papers studying the glioblastoma data, and the ones I am most familiar with focused on subtypes defined by gene expression patterns (Verhaak et al. 2010) and methylation patterns (Noushmehr et al. 2010).  The new ovarian cancer TCGA paper provides all of the information provided in the glioblastoma Nature paper, in addition to the kind of subtyping analysis that was covered for glioblastoma in multiple high-impact, highly cited papers.

I think one of the most important take-home messages was the extremely important role of p53 in ovarian cancer.  For example, 96% of high-grade tumors showed p53 mutations, which has also been shown previously in publications such as Ahmed et al. 2010.  In contrast, the TCGA glioblastoma paper showed p53 mutation rates of 38% and 58% for untreated and treated tumors, respectively.  Interestingly, the ovarian cancer TCGA paper also used PARADIGM to identify pathway alterations in the new TCGA data (where pathways were defined using the NCI Pathway Interaction Database), revealing that the high rate of p53 mutation in ovarian cancer contributes to FOXM1 overexpression.

Another striking result was how consistent the copy-number alterations were within each cancer type (ovarian tumors or glioblastomas) but how different the copy-number alterations were between the two cancer types (as shown in Figure 1a).

Although I was impressed that the study defined separate subtypes for mRNA gene expression, miRNA gene expression, and CpG methylation status, I had mixed feelings about the results.  For example, the subtypes defined by methylation only had "modest stability" (so, they have limited predictive power), and I thought the overlap between the mesenchymal mRNA subtype and the C2 miRNA subtype (and between the proliferative mRNA subtype and the C1 miRNA subtype) was overemphasized.  I was also a little disappointed that the integrative analysis didn't substantially enhance the subtype definitions (for example, I think Figure S6.4 in the ovarian cancer paper looks less impressive than Figure 3 in Verhaak et al.).  However, I did find it interesting that both the glioblastoma and ovarian cancers had a "mesenchymal" subtype (although I don't think these subtypes necessarily have the same biological meaning), and I think it will definitely be interesting to further characterize the subtypes defined based upon mRNA gene expression.


I was somewhat surprised at how much the survival curves varied for the 4 data sets shown in Figure 2c.  For example, the TCGA test set (N = 255) and the data from Tothill et al. 2008 (N = 237) had very different Cox p-values (0.02 and 0.00008, respectively).  Nevertheless, it is not trivial to get a statistically significant result in 4 independent data sets, and I think the survival results are certainly strong enough to warrant further investigation in order to understand the cause of this variation.

Overall, I would consider this a must-read for any bioinformatician interested in cancer research.

Sunday, June 26, 2011

Review of Biopunk

Biopunk is a book discussing biological research that isn't conducted in a traditional research setting (like an academic lab or a pharmaceutical company).  The book covers a wide variety of topics, such as a philosophical discussion about what motivates good scientists, how legal and political decisions affect scientific progress, and recent developments in the field of "DIY bio" (where the book mostly focuses on personalized medicine and synthetic biology).  Throughout the book, the author, Marcus Wohlsen, also provides several cool factoids, like the Bridges of Cherrapunji, which are grown from living tree roots.

One chapter focuses on direct-to-consumer (DTC) genetic testing, where Wohlsen provides both an overview of this industry and accounts of individuals who have used DTC testing.  For example, Raymond McCauley conducted his own DIY bio research on metabolites in his own blood in order to better understand his 23andMe result indicating an increased risk for macular degeneration.  Although Wohlsen acknowledges "McCauley did not hesitate to concede that the results do not show anything conclusive," I think this is a very cool example of how DIY bio can help inquisitive scientists learn more about themselves outside a formal research setting.

My subsequent research on Raymond McCauley also led me to learn more about DIYgenomics.org, which provides tools to help users further analyze their 23andMe data for health risk, drug response, and athletic performance for individual SNPs.  In some ways, this reminded me of the new, free Interpretome tool, but Interpretome can load my 23andMe data more quickly and with a more streamlined interface.  Nevertheless, I think it's good to know that this option is out there.

There were also a few aspects of the book that disappointed me.  For example, many accounts of biopunk research seem to focus more on buying used lab equipment off Craigslist or eBay than on new technological developments that can help democratize research.  It also seemed like a lot of the "biopunks" were pretty well-educated and not necessarily good examples of what I would consider amateur scientific research.  Also, I was somewhat disappointed at how difficult it was to find additional information on some of the start-ups / organizations that were mentioned in the book (which has only been out for a few months).

For example, the chapter "Cancer Kitchen" discusses how John Schloendorn and Eri Gentry studied the role that the immune system plays in cancer using Schloendorn's own cancer cells, which led to the creation of a DIY nonprofit called Livly to develop cancer immunotherapies (and Gentry later co-founded BioCurious, another DIY nonprofit).  However, the Livly website described in the book is no longer online (the old URL, provided on the Livly Facebook page, now links to an unrelated website).  Likewise, BioCurious only seems to have a Facebook page with limited information.  Even with limited funding, these organizations could at least create a free Google Sites website (like my personal website) in order to more effectively convey information about themselves.

I was also very interested in learning more about the Pink Army Cooperative (a DIY drug company attempting to deliver personalized treatments for breast cancer).  This time, I was able to find a generally well-designed and informative website, but I couldn't find much information about concrete research accomplishments (to be fair though, Wohlsen does warn readers that "so far, Pink Army is more a concept than an actual co-op").

Although it was frustrating that I couldn't learn much more about these specific non-profits, Biopunk has successfully encouraged me to learn more about the DIY bio movement.  Who knows, maybe I'll even stop by a meeting for my local DIYbio chapter!
 
Creative Commons License
Charles Warden's Science Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.