Wednesday, March 13, 2013

Bioinformatics 101: Bioinformatics Journals

Here are some of the journals that I check on a weekly basis.  I would strongly recommend subscribing to the relevant RSS feeds (using something like Google Reader).

Bioinformatics / Computational Biology:


Genomics:


Other Journals:
  • Nature Methods and Nature Biotechnology - not specific for bioinformatics articles, but many important programs / protocols are published here
  • PLOS ONE - general subject journal, but it has some good bioinformatics articles
  • PeerJ - similar to PLOS ONE, but utilizes a membership system (so, you pay by author instead of by article)
  • Nature, Science, PNAS, etc.
Tutorials / Blogs:

  • OpenHelix - tutorials for popular programs; some free, some require subscription
    • Open Helix Blog - this covers tutorials and FAQs for common bioinformatics tools. I mostly read it for the Friday SNPpets (collection of popular weekly twitter feeds)
  • Omixon Blog - Bioinformatics company that provides free tutorials for common tools
  • Core Genomics - "personal blog written by James Hadfield who runs a Genomics core facility Cambridge" - lots of interesting technical details about next-generation sequencing
  • MassGenomics - medical genomics blog by Dan Koboldt, a staff scientist at the Genome Institute at Washington University. Consistently great article reviews.
  • Genomes Unzipped - popular blog run by several genomics researchers. I would argue that it was made popular by Daniel MacArthur (who doesn't post there as often now), but there are still other contributors that keep the blog up to date.
  • Getting Genetics Done - a well-maintained blog written mostly by Stephen Turner (Bioinformatics Core director at University of Virginia). Focuses mostly on providing technical suggestions.
  • NIH Bioinformatics Support System - probably doesn't have a feed, but contains useful tutorials

Bioinformatics 101: RNA Sequence Analysis

miRNA Resources:

  • miRBase
    • free database of miRNA sequences
  • TarBase
    • free database of experimentally validated miRNA targets
  • miRecords
    • database of miRNA-target interactions
  • IPA miRNA-target analysis
    • commercial database that includes free databases as well as a proprietary list of miRNA-target interactions found using text-mining of the literature
  • TargetScan
    • free tool to predict miRNA targets
  • sylArray
    • tool to predict miRNA targets from gene expression data.  Uses gene ranking, so it doesn't require mRNA differential expression (although you will need to check that the miRNA regulator is differentially expressed)
In general, I think you really need both miRNA expression and mRNA expression data to get reliable results when trying to identify miRNA-target interactions.
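
As a rough illustration of why both data types help, here is a minimal Python sketch that keeps only the predicted targets whose mRNA expression falls as the miRNA rises across matched samples. Filtering on negative correlation is an assumption on my part about how the two data types might be combined, not something the tools above prescribe; the function names, gene names, cutoff, and expression values are all made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / sqrt(var_x * var_y)


def anticorrelated_targets(mirna_expr, mrna_expr_by_gene, cutoff=-0.8):
    """Return predicted targets whose expression is strongly anti-correlated
    with the miRNA (hypothetical cutoff; tune it for real data)."""
    return sorted(
        gene
        for gene, expr in mrna_expr_by_gene.items()
        if pearson(mirna_expr, expr) <= cutoff
    )


# Made-up log2 expression values across five matched samples (illustration only).
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_targets = {
    "GENE_A": [9.0, 8.1, 7.0, 6.2, 5.0],  # falls as the miRNA rises
    "GENE_B": [5.0, 5.2, 4.9, 5.1, 5.0],  # roughly flat
}
supported = anticorrelated_targets(mirna, predicted_targets)
```

With only a handful of samples a correlation like this is noisy, so I would treat it as a filter on top of the prediction tools rather than as evidence on its own.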

RNA-Seq Splicing Events:


  • JunctionSeq - extends DEXSeq to include junction coverage (including junctions not defined among isoforms in reference database).  Strictly speaking, it only calls differential exon and junction coverage (and provides a statistic at the gene-level), but the junction coverage can be helpful in identifying some other types of splicing events.
  • MATS - Provides differential splicing for skipped exon (SE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), mutually exclusive exons (MXE), and retained intron (IR)
  • MISO - Provides single-sample and differential splicing for skipped exon (SE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), mutually exclusive exons (MXE), retained intron (IR), tandem 3' UTRs (TandemUTR), alternative first exon (AFE), and alternative last exon (ALE)
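
For a sense of what a skipped exon (SE) call measures, here is a simplified Python sketch of a "percent spliced in" (PSI) estimate from junction read counts. This is only a back-of-the-envelope version: it does not reproduce the statistical models used by MATS or MISO, and the function names and read counts are hypothetical.

```python
def psi_skipped_exon(inclusion_reads, exclusion_reads):
    """Estimate PSI for a skipped exon.

    inclusion_reads: reads spanning the two junctions that include the exon
    exclusion_reads: reads spanning the single junction that skips the exon
    Returns None when there is no junction coverage at all.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts cannot be negative")
    # Two junctions support inclusion but only one supports skipping,
    # so halve the inclusion count to compare them per junction.
    inclusion_per_junction = inclusion_reads / 2
    total = inclusion_per_junction + exclusion_reads
    if total == 0:
        return None
    return inclusion_per_junction / total


def delta_psi(psi_condition_1, psi_condition_2):
    """Difference in PSI between two conditions (condition 2 minus condition 1)."""
    return psi_condition_2 - psi_condition_1


# Made-up counts: the exon is mostly included in control and mostly skipped in treatment.
control_psi = psi_skipped_exon(inclusion_reads=80, exclusion_reads=10)
treated_psi = psi_skipped_exon(inclusion_reads=20, exclusion_reads=40)
change = delta_psi(control_psi, treated_psi)
```

A large change in PSI from only a few reads is not trustworthy, which is part of why the dedicated tools model the counts instead of comparing raw proportions.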


RNA Secondary Structure:


RNA Domain Homology:

  • Rfam
    • may be helpful in predicting function of a non-coding RNA of unknown function

de novo Assembly Algorithms (RNA-Seq):

  • Oases
  • Trans-ABySS
  • Trinity
  • eXpress - mRNA quantification tool that works with de novo assembly transcripts (as well as transcripts from direct genome alignment)
RNA-Seq QC

  • FASTX-Toolkit - popular suite of tools to quantify and manipulate sequences in .fastq and .fasta files
  • samtools - popular suite of tools to quantify and reformat .sam/.bam files
  • Picard - Java-based implementation of samtools; CollectRnaSeqMetrics can produce a coverage plot (normalized from transcript start to transcript end)
  • RSeQC - package to produce a variety of RNA-Seq QC figures


General RNA-Seq Analysis

Bioinformatics 101: Literature / Text Mining

Search Engines:
  • PubMed
    • popular, free tool provided by NCBI to search biomedical journal articles
    • includes links to connected NCBI resources (GEO, RefSeq, etc.)
  • Google Scholar
    • popular, free tool to search the scientific literature
    • provides citation information
    • allows authors to create their own bibliographies (which provide author-level citation metrics) 
Gene-Centric Information:
  • NCBI Gene
    • free tool curated by the NCBI
    • includes literature citations, Gene Ontology categories, alternative and official gene symbols, etc.
  • iHOP (Information Hyperlinked Over Proteins)
    • free text-mining program that predicts interactions between genes
  • PolySearch
    • free text-mining program that predicts interactions between genes, diseases, drugs, metabolites, SNPs, pathways, and/or tissues
  • IPA (Ingenuity Pathway Analysis)
    • commercial program curating information about genes, metabolites, etc.
    • most popular use is for functional enrichment analysis, but it can also be used as a general tool for searching the literature

Bioinformatics 101: Pathway Analysis

Gene List Enrichment Tools (Requires Differential Expression Analysis):


Other Systems-Level Analysis Tools (No Upstream Filtering Necessary):

Bioinformatics 101: Gene Expression Analysis

Differential Expression Tools:

  • R - statistical programming language
    • most common statistical functions (t-test, ANOVA, etc.) are built in
    • Bioconductor - suite of R packages used for bioinformatic analysis
      • limma - most commonly used differential expression tool for microarray analysis
      • edgeR - R package for RNA-Seq differential expression analysis
      • DESeq - R package for RNA-Seq differential expression analysis
  • cuffdiff
    • differential expression package within cufflinks
    • cufflinks provides transcript abundance calculations
    • strictly speaking, the developers recommend using cuffdiff for differential expression, although it is relatively common to use edgeR, DESeq, etc. for differential expression following mRNA quantification via cufflinks
  • Java TreeView
    • free tool for clustering microarray data
  • OCplus - R package for statistical power calculations (and differential expression) for microarray studies
  • Scotty - web-based tool for statistical power calculations for RNA-Seq data
  • Partek Genomics Suite
    • Commercial program that includes a number of workflows, such as microarray gene expression and RNA-Seq analysis
    • Includes statistics for differential expression analysis as well as tools for downstream functional analysis and upstream quality control assessment
lncRNA Resources:


  • MiTranscriptome - known and novel lncRNAs with cancer-associated profiles
  • TANRIC - TCGA and CCLE expression analysis for lncRNAs (including correlations with protein-coding genes and miRNAs)
  • Expression Atlas - gene expression profiles for known genes across various datasets
  • lncrnadb - includes additional annotations for known lncRNAs
  • lncATLAS - contains subcellular location information for ENSEMBL-format lncRNAs for some cancer cell lines


Transcription Factor Motif Analysis:

  • IPA Upstream Regulator Analysis
    • Commercial tool that searches for enrichment of known targets for regulatory genes and molecules (such as transcription factors)
    • Can also detect if targets are consistent with activation or inhibition of the regulator
  • SCOPE
    • free tool that identifies upstream motifs enriched for gene lists
    • works on a wide variety of species, so it is useful for motif finding in less commonly studied organisms
  • Whole Genome rVISTA - calculate enrichment of transcription factor motifs predicted based upon evolutionary conservation
  • TRED (Transcriptional Regulatory Element Database) - database from CSHL for transcription factors.  Includes target gene lists for transcription factors in human, mouse, and rat
  • TRANSFAC - database of transcription factor motif sequences.  There are commercial and open-source versions of the database
  • JASPAR - open-source database of transcription factor motif sequences
General RNA-Seq Information:


Microarray Annotation Resources:
  • NetAffx
    • Affymetrix resource for probe design information
    • registration is free but required
  • GeneAnnot
    • an alternative resource for Affymetrix probe annotations

Bioinformatics 101: Protein Analysis

Protein Domain / Structure / Homology Tools:



3D-Structure Viewers:

Mass Spectrometry:
  • PRIDE - mass-spectrometry sample database managed by EMBL-EBI
  • PeptideAtlas - database for mass spectrometry data - includes links to relevant publications
  • MaxQuant - popular tool for mapping proteomics spectra from mass spectrometry data
  • ProteinProphet - another popular tool for mapping proteomics spectra to proteins
  • DanteR - R implementation of the popular DAnTE algorithm for differential expression of mass spectrometry proteomics data
  • LabKey / CPAS - open-source LIMS + basic analysis pipeline
  • PIR - UniProt Protein Information Resource: includes links to databases and peptide mapping tools
Protein-Protein Interaction Databases:
  • IntAct - database for protein-protein interactions
  • BioGRID - database of genetic and protein interactions
  • MINT - protein-protein interaction database
  • STRING - database for known and predicted protein-protein interactions
  • HIPPIE - database of human protein-protein interactions, integrating data from several other databases

Other
  • STITCH - database of drug-protein interactions
  • PaxDb - database of protein expression across different tissues and organisms
  • MOPED - database of protein expression across different tissues and model organisms

Bioinformatics 101: Genomic Databases

Genomic Annotations:

Systems Biology Databases:
  • Gene Ontology (GO)
    • Database of functional annotations for protein-coding genes
  • KEGG - Kyoto Encyclopedia of Genes and Genomes
    • primarily used as a pathway database
  • IntAct - database for protein-protein interactions
  • Reactome
  • Regulome Explorer - software to visualize integrative genomic data from the TCGA project
  • BioGRID - database of genetic and protein interactions
  • MINT - protein-protein interaction database
  • STRING - database for known and predicted protein-protein interactions
  • STITCH - database of drug-protein interactions

Microarray / Sequencing Databases:

  • GEO - microarray database
  • ArrayExpress - microarray database
  • SRA - sequencing archive; entries are often also indexed in GEO
  • BioGPS - similar to NCBI Gene, but also includes normal tissue expression levels (from microarray data)
  • TiGER - tissue-specific gene expression database
  • CellMiner - query NCI-60 cell line data
  • TCGA Data Portal - integrative genomic data for large cancer datasets

Genomic Variation Databases:


Disease-Centric Databases:

  • General
    • OMIM - Online Mendelian Inheritance in Man
      • database of human diseases
    • SIDER - EMBL side effect database
  • Cancer
    • cBioPortal
      • User-friendly interface for querying cancer datasets (including TCGA data)
    • TCGA - The Cancer Genome Atlas
      • includes microarray and sequencing data
    • Oncomine
      • database of gene expression and copy number data from patients
      • basic access is free, but license is required for premium access
    • caArray - NCI Cancer Database

Protein Databases:

Bioinformatics 101: Image Analysis

Microscopy Image Analysis / Visualization:

  • ImageJ - NIH image viewer and analysis tool
    • Fiji - Fiji Is Just ImageJ
      • ImageJ wrapper containing a number of plug-ins for advanced analysis
  • Cell Profiler
  • Cell Profiler Analyst
    • tool for high-throughput image analysis
  • LSM Image Viewer
    • free software to view .lsm images
    • more advanced software is commercially available

Medical Images:

  • MicroDicom - Open-Source program to view DICOM files; worked as an alternative solution to viewing a CD from a hospital (at least for my own files)
    • Tutorial to export images in more common format (JPEG)
      • You can remove the text from your medical records by unchecking "Show annotations" and then selecting the "Without overlay" radio button
      • I also increased the JPEG files to 100% quality for my files

General Tools
  • Inkscape
    • open-source version of Adobe Illustrator
    • Useful for creating figures for papers

Bioinformatics 101: DNA Sequence Analysis

Genome Visualization Tools:

  • UCSC Genome Browser
    • popular, free genomic visualization tool for a wide variety of organisms
    • also serves as a database for genomic sequences and features
  • Integrative Genomics Viewer (IGV)
    • very efficient tool for visualizing almost any type of genomic data
    • open-source
  • Gbrowse - open-source genome browser
  • Circos
    • circular genome plot
      • Especially useful for plotting genomic interaction results
    • official code has a steep learning curve, but you have a lot of options for precise formatting
    • also implemented in RCircos
  • POMO
    • creates image similar to circos plot
    • I consider the input file much more intuitive than circos configuration files, and plots are created via web interface (instead of local installation)
    • can be used to plot data from multiple species
    • I would recommend using Firefox; I've had some problems with Chrome and IE

Sequence Alignment:

  • BLAST - search for similar DNA sequences in GenBank
  • ClustalW - multi-species genome alignment
  • TCoffee - multi-species genome alignment
  • Mauve - multi-species alignment and visualization tool to detect segments of conserved sequence

General DNA-Seq Tools:

  • samtools
    • popular, free tool to extract data from .SAM alignment files
    • Picard - java-based version of samtools
    • see short read aligners necessary for upstream analysis
  • Galaxy
    • open-source, cloud-based suite of popular sequence analysis tools (including deep sequencing analysis)
  • GATK
    • toolkit for analysis of next-generation sequencing data
    • previously open-source, but now requires a commercial license
  • CLC Bio Genomics Workbench
    • commercial software covering a wide variety of applications such as sequence alignment, SNP/DIP detection, de novo assembly, etc.
    • CLC Bio Genomics Workbench also has the functionality of CLC Bio Main Workbench for standard sequencing analysis (cloning, primer design, etc.)
      • both are commercial programs that require a purchased license
  • SeqAnswers Software List
Copy Number / Indel Tools:

  • CoNIFER
    • My favorite tool for making copy number calls in exon capture data
    • However, you will want to analyze a pool of samples (say >10) because it is not ideal for analysis of one or two samples. Can also create .bed files to import into DNAcopy.
  • PennCNV
    • Suite of tools for calling copy number alterations from microarray data
    • Includes segmentation algorithm that considers LRR and BAF values
    • PennCNV-Affy is particularly useful for processing Affy SNP chip data
    • PennCNV2 is designed to handle tumor-normal paired data, but I currently prefer the single-sample analysis from the original PennCNV package
  • ASCAT
    • Tool for calling somatic copy number alterations from SNP chip data
    • estimates tumor purity
  • DNAcopy
    • Bioconductor package that makes copy number calls (either for single sample or log2ratio for paired samples).
    • Works for either microarray or NGS data
  • ExomeCopy
    • Bioconductor package that can make copy number calls directly from .bam files.
    • I have found it most useful to produce copy number counts that I can then use for analysis in DNAcopy
  • Nexus Copy Number
    • commercial software for analysis of copy number alterations
    • works for a variety of microarray platforms as well as for deep sequencing analysis
  • VarScan
    • Can make copy number calls for individual or paired samples (as well as SNP/small indel calls). 
    • Individual copy number calls are basically the same as a .pileup file, but somatic calls are relatively useful
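
Several of the tools above work from a log2 ratio between paired samples. As a minimal Python sketch of that quantity, the function below turns per-region read counts from a tumor and a matched normal into log2 ratios after scaling each sample by its total count. The function name, the pseudocount, and the counts are assumptions for illustration; real pipelines also correct for things like GC content and capture efficiency before segmentation.

```python
from math import log2


def log2_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Per-region log2(tumor / normal) after scaling by each sample's total count.

    The pseudocount (an arbitrary choice here) avoids taking log2 of zero
    for regions without coverage.
    """
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("both samples need counts for the same regions")
    tumor_total = sum(tumor_counts)
    normal_total = sum(normal_counts)
    if tumor_total == 0 or normal_total == 0:
        raise ValueError("each sample needs at least one read")
    ratios = []
    for tumor, normal in zip(tumor_counts, normal_counts):
        tumor_fraction = (tumor + pseudocount) / tumor_total
        normal_fraction = (normal + pseudocount) / normal_total
        ratios.append(log2(tumor_fraction / normal_fraction))
    return ratios


# Made-up exon counts: the third region has extra coverage in the tumor.
tumor = [100, 100, 200]
normal = [100, 100, 100]
ratios = log2_ratios(tumor, normal, pseudocount=0)
```

Because of the library-size scaling, the gained region comes out at log2(1.5) instead of log2(2), and the unchanged regions dip slightly below zero. A segmentation step (such as the one in DNAcopy) would then look for runs of consistently shifted values.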

Transcription Factor Motif Analysis:

  • TRANSFAC
    • database of transcription factor motifs
    • a subscription is required to access the most recent annotations, but older versions are freely available
    • A plug-in is available within CLC Bio (a commercial program for genomics analysis)
  • JASPAR
    • free database of transcription factor motif sequences
  • TFsitescan
    • free tool to search for transcription factor motifs
  • MEME Suite
    • tools for ab initio motif finding
  • rVista / VISTA Suite
    • tool for searching motifs conserved across closely related organisms
  • TESS
    • transcription factor search system
    • unfortunately, this tool now has to be run locally

Mutation Analysis:
  • VarScan
    • open-source variant calling tool
    • see short read aligners necessary for upstream analysis
    • usually also requires something like samtools to create input file (.pileup file)
  • SeattleSNPs Genome Variation Server
    • tool to filter candidate variants (based upon frequency, predicted function, etc.)
  • ANNOVAR (pronounced Anno-Var)
    • tool to filter candidate variants (based upon frequency, predicted function, etc.)
    • wANNOVAR is the web-based interface
  • GWAS Catalog
    • NHGRI database of SNP-based phenotypic / disease associations
  • Promethease
    • open-source tool for personalized genomic analysis
    • it is technically free to use, but you can pay $5 to get your report more quickly
    • uses annotations from SNPedia
  • Interpretome
    • Genome interpretation tool similar to Promethease
    • In my opinion, it has a nicer interface.  However, it currently only works with raw data from 23andMe and Lumigenix.
  • SNPedia
    • crowdsourced annotation of SNP associations
    • includes some publicly available genomes
  • Geno2MP
    • Resource to look up information about rare variants
  • DECIPHER
    • Resource to look up clinical variant annotation (including copy number alterations)
ChIP-Seq Tools:


de novo Assembly Algorithms:


Other Tools:
  • Primer3 - PCR primer design
  • Repeatmasker - identifies repetitive elements within a DNA sequence
  • Webcutter - detects restriction enzyme sites in a DNA sequence
  • Translate - a tool that allows translation of nucleotide (DNA / RNA) sequence into a protein sequence
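
To show what a translation tool does under the hood, here is a minimal Python sketch that translates a DNA or RNA sequence using the standard genetic code, reading frame 1 only. The function name is my own, and this is not the code behind the Translate tool listed above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each position (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a nucleotide sequence (DNA or RNA) in reading frame 1.

    Codons containing ambiguous bases become "X"; trailing bases that do
    not fill a complete codon are ignored.
    """
    dna = sequence.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


peptide = translate("ATGGCCTAA")
```

A full tool would also report the other five reading frames (including the reverse complement) and allow alternative genetic codes.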

Bioinformatics 101: Short Read Aligners

General Purpose Aligners:

  • BWA
  • Bowtie
  • Novoalign
    • commercial software covering a variety of alignment needs (RNA-Seq, miRNA-Seq, DNA-Seq, BS-Seq, etc.)
    • some functionality is also available in the free version

RNA-Seq Aligners:


BS-Seq Aligners:

Sequencing Technology Tutorials:

Bioinformatics 101: General Coding Information

UNIX:


Perl:

  • Downloading Perl
  • References
  • Parallel Computing
    • Stack Overflow discussion on limiting threads in Perl
    • PerlMonks discussion on threads in Perl
    • Semaphore example
    • Stack Overflow discussion on using for-loop to add threads
    • In a nutshell, I would say the steps are as follows:
      1. Load Dependencies
        • use threads;
        • use Thread::Semaphore;
      2. Create semaphore (with maximum number of threads)
        • our $semaphore = Thread::Semaphore->new($max_threads);
      3. Launch the subroutine in multiple threads.
        • That should look something like this (for each sample/process):
        • push @threads, threads->create(\&function_name, $func_var1, $func_var2,...)
        • foreach (@threads) {$_->join;}
      4. Within the subroutine (described as "function_name" above), control the number of ongoing threads as follows:
        • Start function with $semaphore->down(); (occupying thread)
        • End function with $semaphore->up(); (opening up thread)
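
For comparison, the same four steps can be sketched in Python with the standard `threading` module. This is an analogous pattern, not a translation of any particular Perl script: `run_with_thread_limit` and the sample names are hypothetical, and `str.upper` stands in for the real per-sample work.

```python
import threading


def run_with_thread_limit(samples, worker, max_threads):
    """Run worker(sample) in one thread per sample, with at most
    max_threads workers active at the same time."""
    semaphore = threading.Semaphore(max_threads)  # step 2: create the semaphore
    lock = threading.Lock()                       # protects the shared bookkeeping
    results = {}
    state = {"running": 0, "peak": 0}

    def limited_worker(sample):
        # Step 4: entering the block is like $semaphore->down(),
        # leaving it is like $semaphore->up().
        with semaphore:
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                value = worker(sample)
            finally:
                with lock:
                    state["running"] -= 1
            with lock:
                results[sample] = value

    # Step 3: create one thread per sample, then wait for all of them.
    threads = [threading.Thread(target=limited_worker, args=(s,)) for s in samples]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, state["peak"]


samples = ["sampleA", "sampleB", "sampleC", "sampleD"]
results, peak = run_with_thread_limit(samples, str.upper, max_threads=2)
```

As in the Perl outline, every thread is created up front and the semaphore only limits how many are doing work at once, so this suits a modest number of samples rather than thousands.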

Python:


R:


Docker:
  • Understanding Docker (high-level introduction)
  • Docker User Guide
  • My Notes (read tutorials first):
    • To mount Windows Documents folder, docker run -it -v /c/Users/[your username]/Documents:/mnt/[mounted name] [image]
      • If you need to re-enter an exited session, you can use docker start -ia container_ID to re-open it (note use of container ID instead of image ID)
    • docker ps -a to see exited interactive jobs
    • If you host your images on Docker Hub, try to keep them under 3 GB
      • To upload (after running an interactive session):
        • docker commit -m "update message" container_ID [image]
        • docker push [image]
  • Using Docker Through Singularity:
    • General Tutorial - for example, you can run interactive mode with singularity shell docker://user-name/repository
    • Mapping Folders - for example, as an extension of the above example, you can run a Docker image in interactive mode with a mapped drive using singularity shell -B /source/path:/docker/path  docker://user-name/repository
    • NOTE: this may not work for all Docker images (with errors not apparent until you try to run programs within the container), but I think it should work for some of them.
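
The Windows mount note above depends on rewriting a path like C:\Users\... into the /c/Users/... form. Here is a small Python sketch of that rewrite, assuming the same path convention as in my notes; the helper name, user name, mount name, and image name are all placeholders.

```python
from pathlib import PureWindowsPath


def docker_volume_arg(windows_path, mount_name):
    """Build the value for "docker run -v" from a Windows folder path,
    using the /c/Users/... style host path and mounting under /mnt."""
    path = PureWindowsPath(windows_path)
    drive_letter = path.drive.rstrip(":").lower()
    if len(drive_letter) != 1:
        raise ValueError("expected an absolute path with a drive letter")
    # parts[0] is the drive root (for example "C:\\"), so skip it.
    host_path = "/" + "/".join([drive_letter, *path.parts[1:]])
    return f"{host_path}:/mnt/{mount_name}"


volume = docker_volume_arg(r"C:\Users\example\Documents", "docs")
# "my-image" is a placeholder image name.
command = ["docker", "run", "-it", "-v", volume, "my-image"]
```

The resulting list could be passed to `subprocess.run`, but printing it first is a simple way to check the mount path before starting a container.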
C++:

  • C and R:
  • My Notes
    • For g++ compiler, binary output is created with "-o"
      • You can use "-g" option for debugging and "-Wall" for warning messages, but you'll still get error messages either way
    • If mixing your code with open-source code, take the compiler into consideration.  For example, some string functions work when compiling with gcc but not g++.
VirtualBox Virtual Machine:
  • Ubuntu .iso
  • Mounting shared folders
    • When you first open Virtual Box, choose settings for your image and define folder (under "Shared Folders")
    • To make that folder accessible, go to "Devices --> Insert Guest Additions CD image"
    • Probably should restart machine
    • Your folder should appear under /media/sf_[folder name]
    • However, you may still not have access to the contents of that folder.  To fix the permissions issue, run sudo mount -t vboxsf [folder name] /media/sf_[folder name]
    • This might not be sufficient to have the folder load every time you start the VM.  If you run into issues, try sudo usermod -G vboxsf -a [username] after re-mounting the folder
  • If you find yourself in a situation where Ubuntu won't load from a locked installation file, you can fix this by pressing "Left-Shift" before Ubuntu starts to load (and then use the GRUB menu to fix the installation file).  I don't think this is unique for the VM environment, but that is where I saw this could work.
    • This was a helpful blog post that reminded me about the alternative boot option
    • There were some issues with guest additions after that (at least one time), but some extra information about that process was described here.
Free Data / Code Sharing:
  • GitHub (up to 1 GB per repository, 100 MB per file)
  • SourceForge (honor system or 5 GB?)
  • FigShare (up to 5 GB)
  • Dryad (up to 20 GB)
  • Zenodo (up to 50 GB)
    • Has versioning (although it took me a little while to realize this)
    • However, this doesn't seem quite as flexible as GitHub.  For example, the upload comes with a warning that "File addition, removal or modification are not allowed after you have published your upload".

Setting up Ubuntu Server
  • I am not sure why, but I had better installation success using the "alternative" installation files (as suggested as a solution in this forum)
  • A little early to say how much of a "success" everything is.  However, at least for the installation step, this is the "No OS" computer that I am trying to set up as a server: Dell Server
  • Restart server via command line using `sudo reboot` or `sudo poweroff`
  • Tutorials to set up SSH key: here and here
    • I found the instructions to be a bit confusing.  However, to essentially have 2 passwords and require an SSH key, you will want to add "AuthenticationMethods publickey,password" to the /etc/ssh/sshd_config file and then restart the service using "sudo service ssh restart" (as described here)
  • Reformat SSH keys to use with PuTTY
  • Using SSH keys with WinSCP
    • Even though I provided a password, I still needed to enter the ssh passphrase as well as the server password (the way that I set things up)
  • Mounting an additional hard drive
    • Ask Ubuntu discussion
      • I was able to see my 2nd hard drive using "lsblk" (even though it wasn't accessible for storage yet)
      • For newer and larger drives, I think the answer whose first step is to run "sudo blkid" may be the most relevant.
    • Ubuntu community help
      • Even without being mounted, I could see information about my 2nd hard drive (which was /dev/sdb), using "sudo lshw -C disk"
      • My 2nd hard drive was 3 TB, and both discussions mention special steps need to be taken for more than 2 TB of space (specifically, fdisk should not be used to create an MBR partition with >2TB)
      • For a new external hard drive, "parted" is recommended to reformat the drive.
        • For each partition, I think "sudo mkfs.ext4 /dev/sd[x][n]" should work.  However, I think that should be for partitions like /dev/sdc1 not the full drive like /dev/sdc.
      • There is some information about command line formatting options here.
        • I also thought this YouTube video provided some general background, but it doesn't really provide as much Linux-specific information (if reformatting for primary use on Ubuntu, which I think would probably be ext4).
      • You probably don't want to have to use "sudo" for all commands within the mounted drive.
        • This discussion relates to that issue.
        • This also relates to the configuration in the /etc/fstab file for loading mounted drives.  There is a recommended set of settings in the Ubuntu guide for Systemwide Mounts (although I am using ext4 instead of vfat).
        • Also, I might need to change things in the future, but I needed to use "defaults" instead of the provided settings in order to get "sudo mount -a" to correctly load the drive (after editing the /etc/fstab file).
        • I think checking for the presence of the "lost+found" subfolder is another way to see if the mounting was successful.
        • Most of this is also discussed in the first Ubuntu community help link that started this section.
    • Information about RAID drives (which is what I had at one point)
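
The 2 TB limit mentioned above comes from simple arithmetic: an MBR partition table stores sector counts in 32-bit fields, so with traditional 512-byte sectors it cannot describe more than 2^32 x 512 bytes. Here is a short Python sketch of that calculation. The 512-byte sector size is an assumption (some drives report 4096-byte sectors), and the function names are my own.

```python
SECTOR_BYTES = 512         # assumed logical sector size
MBR_MAX_SECTORS = 2 ** 32  # MBR stores sector counts in 32-bit fields


def mbr_limit_bytes(sector_bytes=SECTOR_BYTES):
    """Largest size that an MBR partition table can describe."""
    return MBR_MAX_SECTORS * sector_bytes


def exceeds_mbr_limit(drive_bytes, sector_bytes=SECTOR_BYTES):
    """True when the drive is too large to be fully used with MBR."""
    return drive_bytes > mbr_limit_bytes(sector_bytes)


limit = mbr_limit_bytes()            # 2,199,023,255,552 bytes (2 TiB, about 2.2 TB)
three_tb_drive = 3 * 10 ** 12        # a "3 TB" drive as marketed (decimal terabytes)
too_big = exceeds_mbr_limit(three_tb_drive)
```

This is why a 3 TB drive needs a GPT partition table, which is the kind of label "parted" can create.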
  • Setting up a static IP
    • With the newest version of Ubuntu server, I think this probably uses "netplan" to create a static IP
    • I forget exactly what I used at first, but I used this to help me be able to access external servers (name servers map names to IP addresses, and you must list name servers to be able to do things like update programs, clone git repositories, etc).
    • The subnet mask also confused me, but I think you probably want "24" (where I found the definitions for 255.0.0.0, 255.255.0.0, and 255.255.255.0 masks, which are described on this page, and are 8, 16, and 24 respectively).
    • There is also a more formal website for netplan here.
    • On Windows, you can list IP addresses on your network using arp -a.
  • In general, there is some free information on Linux Journey.
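
For the subnet mask confusion mentioned above, Python's standard `ipaddress` module can convert between the dotted form and the prefix length. This is a minimal sketch with helper names of my own; the 192.168.1.0 network is only an example address.

```python
import ipaddress


def prefix_length(netmask):
    """Convert a dotted netmask such as 255.255.255.0 to a prefix length."""
    return ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen


def netmask_for_prefix(prefix):
    """Convert a prefix length such as 24 to its dotted netmask."""
    return str(ipaddress.IPv4Network(f"0.0.0.0/{prefix}").netmask)


prefixes = [prefix_length(mask) for mask in ("255.0.0.0", "255.255.0.0", "255.255.255.0")]
home_mask = netmask_for_prefix(24)

# A /24 network covers 256 addresses (example network only).
home_network = ipaddress.ip_network("192.168.1.0/24")
address_count = home_network.num_addresses
```

The prefix length is simply the number of leading 1 bits in the mask, which is why the three masks map to 8, 16, and 24.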
Other:

  • Vi Text Editor
  • Notepad++ Editor
    • With default settings, if you write code in Notepad++ and run the code on a Linux system, it may sometimes be helpful to run `dos2unix` on your code (to convert Windows line endings)
    • Ubuntu Notepad++ Alternatives (I recommend gedit)
  • Basic MS-DOS tutorial
  • LaTeX tutorial
  • MiKTeX - Windows software for processing TeX/LaTeX files; also useful for compiling R packages
  • MacTeX - Mac software for processing TeX/LaTeX files
  • Subversion high-speed tutorial
  • Using subversion for Bioconductor packages
  • Google Code University
  • Git Bioconductor Tricks
    • For managing GitHub repository and Bioconductor Repository: http://bioconductor.org/developers/how-to/git/sync-existing-repositories/ 
    • You can confirm that the upstream repository has been added with "git remote -v"
    • You may need passcode to run "git clone git@git.bioconductor.org:packages/[PACKAGE].git", but other users can clone repository with "git clone https://git.bioconductor.org/packages/[PACKAGE]"
    • If you prefer working with the GitHub interface ("origin" in the instructions above), you can indirectly update the Bioconductor repository as follows (except if Bioconductor changes a file, such as the description file in new releases) :
      • git clone https://github.com/[username]/[package]
      • cd [package folder]
      • git remote add upstream git@git.bioconductor.org:packages/[package].git
        • If needing to update release branch, please see Tutorial for fixing bugs
        • If already synced (and you have checked out the appropriate release), you can also update the branch with "git push" and "git push upstream".
      • git add [updated files]
      • git commit -m "update message"
      • git push upstream master
  • Amazon AWS (cloud computing)
    • Even though I still have some free Google Cloud credits, I encountered an issue with a newer gcsfuse interface, such that I thought it might be easier to go back to AWS (or purchase a Linux server for my apartment)
    • So, here are some general notes:
      • I would recommend using putty to connect to your EC2 instances
      • S3 storage and EFS storage are different (I would use S3 for sharing large datasets, and EFS for mounting internally shared data between EC2 instances)
      • Amazon provides a way to make EFS mounting easier using amazon-efs-utils, using the two commands:
        • sudo yum install -y amazon-efs-utils (installation)
        • sudo mount -t efs [file system ID]:/ /path/to/efs (mounting the EFS storage)
        • You can also see similar instructions when you view the full information about the file system that you created.
      • aws Command Line Interface (CLI) - includes commands to work with S3 storage, and it is already installed on EC2 instance (but I noticed a command to transfer data from S3 to EFS/EC2 didn't work exactly like planned)
      • Instead, if you are on Windows, I would recommend WinSCP to transfer data from your local computer to an EC2 instance (and, in turn, the EFS mounted storage)

Bioinformatics 101: DNA Methylation Analysis

Enrichment-Based Analysis Tools:


Bisulfite-Conversion Based Analysis Tools:

Bioinformatics 101

I thought it would be nice to provide a set of links for bioinformatics resources that I find to be useful.  A lot of this information comes from the experience that I gained working in the Bioinformatics Core at City of Hope.  Unlike my other blog posts, I will come back and modify these lists over time (since new bioinformatics resources are constantly being developed).

In order to help organize this huge amount of information, I have divided my annotations into the following sub-posts (all with the "Bioinformatics 101" label):


Also, feel free to leave your own suggestions as comments on the relevant pages!

Sunday, March 10, 2013

Tracking My Thoughts Using the Affectiv Suite

I was most pleased with the Affectiv Suite in the EPOC control panel (click here for more technical details from Emotiv tech support), so I will briefly describe my experience in this post.

The excitement / calm signal varied frequently, whereas the engagement / disinterest and meditation signals were harder to change.

Preparing images for this blog post (typical for a random task)

The engagement / disinterest signal seemed to require a sustained change in attention for an extended period of time, which was sometimes accompanied by a clear change in the excitement / calm signal (for example, see the slight long-term increase in the image below).

Watching Futurama:

I wasn't really able to see any clear peaks in the meditation signal, but I could see a decrease in the engagement signal when trying to relax (for example, see the long-term change in the image below).

Listening to music with eyes closed

In short, I think the engagement / disinterest signature was the most accurate, the excitement / calm signature may be too sensitive, and the meditation signature may not be sensitive enough.

Reading My Mind Using The Emotiv EPOC

Once I saw the TED talk on the Emotiv EPOC / EEG, I knew that I had to get my hands on that mind-reading gadget.

Emotiv offers two versions of the headset shown in the TED talk: the EEG (which allows users to access their raw EEG data) and the EPOC (which costs much less because it only allows you to run applications and doesn't give you access to the raw data).  Because I only had a casual interest in the topic, I bought the EPOC headset.

I tested the EPOC headset on myself as well as a few friends.  When my friends and I first saw the Cognitiv Suite, the response was similar to that of the crowd at the TED talk: people were impressed that the virtual box would move after training the first action.  However, the excitement faded after using the Cognitiv Suite for a few minutes and trying to control multiple actions, because it isn't highly accurate at detecting a specific thought.  For example, if you train "push" and "left", you will probably see the box move towards you more than it moves left (or vice versa), and the action probably won't really be in sync with your thoughts.

In short, I didn't find the EPOC headset to be as cool as I was hoping it would be because it wasn't a very effective tool for providing mind control.  Nevertheless, I do think it is interesting to be able to visualize my brain activity (see the links to additional posts below).

Part #1: Review of EPOC Apps

Part #2: Tracking My Thoughts Using the Affectiv Suite

To be fair, my friends and I are only a small pool of test subjects.  For example, the Emotiv website lists papers published using data from the EEG and EPOC headsets, and you can find various videos on YouTube for interesting applications (for example, click here or here).  Likewise, I found this presentation that showcases some examples of EPOC headset data with a (simple) technical explanation about what is going on.  However, my own personal experience was more similar to this EPOC review or this article, which points out that researchers could use the device to make non-random predictions about users' PIN numbers but the predictions were not super accurate.

Finally, it may be worth noting that there are other devices with similar functionality (for example, see this list from Wikipedia), which is something that I didn't initially realize.  The Emotiv headset may really be the best option, but I would at least recommend researching some other options if you are interested in trying out a "mind reading" device.

Review of EPOC Apps

Useful / Interesting Apps:

1) EPOC Control Panel (Free) - the basic suite of tools: allows you to check the quality of the signal from the sensors on the EPOC headset, detect facial expressions (Expressiv Suite), detect mental states (Affectiv Suite), perform actions (Cognitiv Suite, see my first post), and detect head motions (mouse emulator).  You can see this letter from Emotiv tech support for a more detailed, technical explanation of these features.

The mouse emulator worked perfectly, but it uses a gyroscope in the headpiece (so, it doesn't actually depend on signals from your brain).  I could detect some interesting changes in the Affectiv Suite (see this related post), but I couldn't really get the Expressiv Suite or Cognitiv Suite to work well for me.  I could always get the headset set up correctly, but fewer than half of my friends could achieve this step (i.e. for the others, the computer couldn't read any signal from the headset).

I did have one friend for whom the Expressiv Suite seemed to work well, but it was never very accurate for me: it thought that I was always trying to smile, and the overall patterns just got worse when I tried to retrain the algorithm.

2) Emotiv EPOC Brain Activity Map ($9.95) - allows you to visualize the signal for alpha, beta, delta, and theta waves for all of the sensors in a 2D map.  Provides 3 different visualization types.  It would be nice if it could record averaged activity over time, but I think it is still much better than the more expensive 3D version.

3) subConch (all users - Free) - creates sounds that are supposed to match your mental state (e.g. low pitched sound for a calm mind, high pitched sound for an excited mind).  I also found that the application has its own website, which shows how the software is utilized in an art exhibition.  FYI, I almost listed it as a "Disappointing App" because it didn't initially install correctly - be sure to extract a compressed file after the installation closes (this is what I failed to do the first time).

Disappointing Apps:

1) Spirit Mountain Demo Game (Free) - you move an avatar around a 3D world from a 1st person perspective, and you need to use the EPOC headset to accomplish various mind-controlled tasks.  To be clear, you move around the world with a mouse and keyboard: I initially thought you could control movement with your mind (and you probably can try to do this using some advanced option).  To be fair, I think it would be nearly impossible (at least for me) to achieve this level of control with the EPOC headset alone, and I did find a demo video where the narrator does explicitly say that he is using the mouse and keyboard.  It was also extremely slow on my computer (running Windows 7 with 1 GB of RAM).  I'm sure that I could modify the initial rendering options to improve this, but I honestly didn't think the game was fun enough to warrant the extra effort (probably because I already had so many difficulties with the Cognitiv Suite in the EPOC Control panel).

2) Emotiv EPOC 3D Brain Activity Map (Standard Edition - $49.95) - wraps the 2D signal for alpha, beta, delta, and theta waves (as well as a customized wavelength) around a 3D head.  This is essentially the same as a 2D image because there is no calculated depth within the brain.  I found the interface a little buggier than the 2D version.  For example, using the scroll button to zoom in and out didn't really work: once you started to zoom in, you had to be either fully zoomed in or fully zoomed out (it didn't seem to measure gradations of zoom).

Also, part of the reason I wanted to check it out was because it came with the ability to record measurements over time.  This is true, but I think it is really just intended for you to be able to see different sides of the head at the same measurement point: it really isn't practical for being able to see how your activity changes over long periods of time (for example, a 30 minute recording would have to be viewed in real time).

Finally, I was absolutely furious when the program wouldn't initially install correctly.  I did eventually figure out how to solve the problem (the .exe extension was simply missing from the executable file), and tech support was prompt to offer a solution.  Nevertheless, while this sort of thing might be reasonable to expect for a free app, I would hope these sorts of problems would have been figured out for a program that costs $50!

3) MindKeyboard (Free) - the idea is that you can type using your mind by moving a cursor left and right along a string of letters (and using another feature, like the "push" action, to select a letter).  I liked the simple idea, but I could never get it to work (again, probably because I already had so many difficulties with the Cognitiv Suite in the EPOC Control panel).

 
Creative Commons License
Charles Warden's Science Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.