Wednesday, March 13, 2013

Bioinformatics 101: DNA Sequence Analysis

Genome Visualization Tools:

  • UCSC Genome Browser
    • popular, free genomic visualization tool for a wide variety of organisms
    • also serves as a database for genomic sequences and features
  • Integrative Genomics Viewer (IGV)
    • very efficient tool for visualizing almost any type of genomic data
    • open-source
  • GBrowse - open-source genome browser
  • Circos
    • creates circular genome plots
      • especially useful for plotting genomic interaction results
    • the official code has a steep learning curve, but it gives you a lot of options for precise formatting
    • also implemented in the R package RCircos
  • POMO
    • creates images similar to Circos plots
    • I find the input file much more intuitive than Circos configuration files, and plots are created via a web interface (instead of a local installation)
    • can be used to plot data from multiple species
    • I would recommend using Firefox; I've had some problems with Chrome and IE

Sequence Alignment:

  • BLAST - search for similar DNA sequences in GenBank
  • ClustalW - multiple sequence alignment
  • T-Coffee - multiple sequence alignment
  • Mauve - multi-species alignment and visualization tool to detect segments of conserved sequence
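BLAST and ClustalW both build on pairwise sequence comparison (BLAST uses a fast seed-and-extend heuristic, while ClustalW builds a guide tree from pairwise alignments before aligning progressively). For intuition, here is a minimal sketch of the classic Needleman-Wunsch dynamic-programming score for a global pairwise alignment. It is not the code any of these tools actually runs, and the scoring values (match +1, mismatch -1, gap -2) are arbitrary illustrative choices.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only two rows of the DP matrix, so memory is O(len(b)).
    """
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: (mis)match, gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

For example, "ACGT" vs "AGT" scores 1 (three matches and one gap). Real aligners use substitution matrices and affine gap penalties (separate gap-open and gap-extension costs), which this sketch omits.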

General DNA-Seq Tools:

  • samtools
    • popular, free tool to extract data from SAM/BAM alignment files
    • Picard - Java-based command-line tools for manipulating SAM/BAM files (complementary to samtools)
    • see short read aligners necessary for upstream analysis
  • Galaxy
    • open-source, cloud-based suite of popular sequence analysis tools (including deep sequencing analysis)
  • GATK
    • toolkit for analysis of next-generation sequencing data
    • previously open-source; newer versions are free for academic use but require a license for commercial use
  • CLC Bio Genomics Workbench
    • commercial software covering a wide variety of applications such as sequence alignment, SNP/DIP detection, de novo assembly, etc.
    • CLC Bio Genomics Workbench also has the functionality of CLC Bio Main Workbench for standard sequencing analysis (cloning, primer design, etc.)
      • both are commercial programs that require a purchased license
  • SeqAnswers Software List
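Since samtools and Picard both operate on SAM/BAM files, it may help to see what a single SAM alignment line contains. Below is a minimal sketch in plain Python (not samtools) that parses the 11 mandatory tab-separated fields, tests two FLAG bits (0x4 = unmapped, 0x10 = reverse strand), and computes how many reference bases the CIGAR string spans. The function names and the example read are hypothetical; for real data (and for BAM, which is compressed binary), use samtools or a library such as pysam.

```python
import re
from typing import NamedTuple

# CIGAR operations that consume reference bases (M, D, N, =, X per the SAM spec).
REF_CONSUMING_OPS = set("MDN=X")


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    seq: str


def parse_sam_line(line: str) -> SamRecord:
    """Parse a subset of the 11 mandatory SAM fields (optional tags are ignored)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 tab-separated fields")
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[9])


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)


def is_reverse_strand(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment ('*' means no CIGAR)."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in REF_CONSUMING_OPS)
```

For example, a read at POS 100 with CIGAR 5M2I3M1D4M spans 5 + 3 + 1 + 4 = 13 reference bases (the 2-base insertion does not consume reference), so it ends at position 112.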

Copy Number / Indel Tools:

  • CoNIFER
    • My favorite tool for making copy number calls in exon capture data
    • However, it works best on a pool of samples (say, more than 10); it is not well suited to analyzing only one or two samples. It can also create .bed files to import into DNAcopy.
  • PennCNV
    • Suite of tools for calling copy number alterations from microarray data
    • Uses a hidden Markov model that considers Log R Ratio (LRR) and B Allele Frequency (BAF) values
    • PennCNV-Affy is particularly useful for processing Affymetrix SNP chip data
    • PennCNV2 is designed to handle tumor-normal paired data, but I currently prefer the single-sample analysis from the original PennCNV package
  • ASCAT
    • Tool for calling somatic copy number alterations from SNP chip data
    • also estimates tumor purity and ploidy
  • DNAcopy
    • Bioconductor package that makes copy number calls using circular binary segmentation (either for a single sample or on log2 ratios for paired samples)
    • Works for either microarray or NGS data
  • ExomeCopy
    • Bioconductor package that can make copy number calls directly from .bam files
    • I have found it most useful for producing read counts that I can then analyze in DNAcopy
  • Nexus Copy Number
    • commercial software for analysis of copy number alterations
    • works for a variety of microarray platforms as well as for deep sequencing analysis
  • VarScan
    • Can make copy number calls for individual or paired samples (as well as SNP/small indel calls)
    • Single-sample copy number output is basically the same as a .pileup file, but the somatic (paired) calls are more useful
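DNAcopy can segment log2 ratios for paired samples. As a minimal sketch of one way to prepare that input (an assumption, not DNAcopy's or ExomeCopy's own code), per-region read counts from a tumor and its matched normal can be scaled by library size and converted to log2(tumor/normal). The pseudocount and total-read normalization are illustrative choices that are not part of either package; total-read scaling assumes most regions are copy-neutral.

```python
import math
from typing import List


def log2_ratios(tumor: List[int], normal: List[int],
                pseudocount: float = 0.5) -> List[float]:
    """Library-size-normalized log2(tumor / normal) for each region.

    Assumes both lists hold read counts for the same regions in the same order.
    The pseudocount (an arbitrary choice) avoids log(0) for empty regions.
    """
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must cover the same regions")
    tumor_total, normal_total = sum(tumor), sum(normal)
    if tumor_total == 0 or normal_total == 0:
        raise ValueError("each sample needs at least one read")
    # Fraction of each library falling in a region, then the log2 ratio of fractions.
    return [math.log2(((t + pseudocount) / tumor_total) /
                      ((n + pseudocount) / normal_total))
            for t, n in zip(tumor, normal)]
```

In a pure diploid sample, a ratio near 0 suggests two copies, a one-copy loss gives about log2(1/2) = -1, and a one-copy gain about log2(3/2) ≈ 0.58; tumor impurity pulls these values toward 0.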

Transcription Factor Motif Analysis:

  • TRANSFAC
    • database of transcription factor motifs
    • a subscription is required to access the most recent annotations, but older versions are freely available
    • A plug-in is available within CLC Bio (a commercial program for genomics analysis)
  • JASPAR
    • free database of transcription factor motif sequences
  • TFsitescan
    • free tool to search for transcription factor motifs
  • MEME Suite
    • tools for ab initio motif finding
  • rVista / VISTA Suite
    • tool for searching motifs conserved across closely related organisms
  • TESS
    • transcription factor search system
    • unfortunately, this tool now has to be run locally
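Motif databases such as JASPAR distribute position frequency matrices (counts of each base at each motif position). A common way to scan a sequence is to convert those counts to log-odds scores against a background model and slide the matrix along the sequence. The sketch below assumes a uniform background (0.25 per base), a pseudocount of 1, and a made-up 4-position matrix with consensus GATA (not taken from JASPAR or TRANSFAC). It only scans the forward strand, whereas most scanning tools also check the reverse complement.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"

# Hypothetical toy counts (10 sites) for a motif with consensus GATA.
TOY_COUNTS = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
              "G": [10, 0, 0, 0], "T": [0, 0, 10, 0]}


def log_odds_matrix(counts: Dict[str, List[int]], pseudocount: float = 1.0,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert a position frequency matrix to per-position log2-odds scores."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((counts[b][i] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (0-based start, site, score) for forward-strand windows >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(b not in BASES for b in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pwm, site))
        if score >= threshold:
            hits.append((start, site, score))
    return hits
```

With these toy counts a perfect GATA match scores about 6.6 and any single mismatch drops the score to about 3.1, so a threshold of 6.0 reports only exact matches. Choosing the threshold (often as a fraction of the maximum possible score) is the main tuning decision.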

Mutation Analysis:

  • VarScan
    • open-source variant calling tool
    • see short read aligners necessary for upstream analysis
    • usually also requires a tool such as samtools to create the input file (.pileup file)
  • SeattleSNPs Genome Variation Server
    • tool to filter candidate variants (based upon frequency, predicted function, etc.)
  • ANNOVAR (pronounced Anno-Var)
    • tool to annotate and filter candidate variants (based upon frequency, predicted function, etc.)
    • wANNOVAR is the web-based interface
  • GWAS Catalog
    • NHGRI database of SNP-based phenotypic / disease associations
  • Promethease
    • open-source tool for personalized genomic analysis
    • it is technically free to use, but you can pay $5 to get your report more quickly
    • uses annotations from SNPedia
  • Interpretome
    • Genome interpretation tool similar to Promethease
    • In my opinion, it has a nicer interface. However, it currently only works with raw data from 23andMe and Lumigenix.
  • SNPedia
    • crowdsourced annotation of SNP associations
    • includes some publicly available genomes
  • Geno2MP
    • Resource to look up information about rare variants
  • DECIPHER
    • Resource to look up clinical variant annotation (including copy number alterations)
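VarScan reads the pileup format produced by samtools, where the fifth column encodes every read base observed at one position. Below is a minimal sketch (not VarScan's algorithm, which also applies base-quality, coverage, and statistical filters) that tallies that column and computes a variant allele fraction. It follows the symbols documented for samtools pileup output; the function names and the example column are made up for illustration.

```python
import re
from collections import Counter


def count_pileup_bases(bases: str) -> Counter:
    """Tally reference matches and mismatched bases from a pileup read-bases column.

    '.' / ',' = match on forward / reverse strand; A/C/G/T/N (either case) = mismatch;
    '^' + one mapping-quality character = read start; '$' = read end;
    '+N<seq>' / '-N<seq>' = indel (skipped); '*' = deleted-base placeholder.
    Other symbols are ignored.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # skip '^' and its mapping-quality character
        elif c == "$":
            i += 1
        elif c in "+-":
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                raise ValueError(f"malformed indel at position {i}")
            # Skip the sign, the length digits, and the inserted/deleted bases.
            i += 1 + len(m.group()) + int(m.group())
        else:
            if c in ".,":
                counts["ref"] += 1
            elif c.upper() in "ACGTN":
                counts[c.upper()] += 1
            elif c == "*":
                counts["del"] += 1
            i += 1
    return counts


def variant_allele_fraction(bases: str, alt: str) -> float:
    """Fraction of base calls supporting `alt` (deletion placeholders excluded)."""
    counts = count_pileup_bases(bases)
    depth = counts["ref"] + sum(counts[b] for b in "ACGTN")
    return counts[alt.upper()] / depth if depth else 0.0
```

For the made-up column `..,,^FA.$a+2AC,T-1G.`, this counts 7 reference bases, 2 A's, and 1 T, giving an A allele fraction of 0.2.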

ChIP-Seq Tools:


de novo Assembly Algorithms:


Other Tools:

  • Primer3 - PCR primer design
  • RepeatMasker - identifies repetitive elements within a DNA sequence
  • Webcutter - detects restriction enzyme sites in a DNA sequence
  • Translate - a tool that allows translation of nucleotide (DNA / RNA) sequence into a protein sequence
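Two of the tasks above are simple enough to sketch directly. The code below translates frame 1 of a nucleotide sequence with the standard genetic code (NCBI translation table 1) and finds exact restriction sites on either strand, for example EcoRI (GAATTC) or BamHI (GGATCC). It is an illustrative sketch, not how Translate or Webcutter are implemented: it ignores alternative genetic codes, the other five reading frames, and degenerate recognition sites (such as those containing N).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate frame 1 of a DNA/RNA sequence ('*' = stop, 'X' = unknown codon)."""
    seq = seq.upper().replace("U", "T")
    # Trailing bases that do not form a full codon are ignored.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list:
    """0-based start positions where `site` or its reverse complement occurs."""
    seq, site = seq.upper(), site.upper()
    patterns = {site, reverse_complement(site)}  # a set, so palindromes count once
    width = len(site)
    return sorted(i for i in range(len(seq) - width + 1)
                  if seq[i:i + width] in patterns)
```

Because EcoRI and BamHI sites are palindromic, their reverse complement is identical, so each site is reported once; a non-palindromic site such as BsaI (GGTCTC) is also found when it appears as GAGACC on the given strand.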

Charles Warden's Science Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.