Wednesday, March 13, 2013

Bioinformatics 101: General Coding Information

UNIX:


Perl:

  • Downloading Perl
  • References
  • Parallel Computing
    • Stack Overflow discussion on limiting threads in Perl
    • PerlMonks discussion on threads in Perl
    • Semaphore example
    • Stack Overflow discussion on using for-loop to add threads
    • In a nutshell, I would say the steps are as follows:
      1. Load Dependencies
        • use threads;
        • use Thread::Semaphore;
      2. Create semaphore (with maximum number of threads)
        • our $semaphore = Thread::Semaphore->new($max_threads);
      3. Use a subroutine to launch multiple threads.
        • That should look something like this (for each sample/process):
        • push @threads, threads->create(\&function_name, $func_var1, $func_var2,...);
        • foreach (@threads) {$_->join;}
      4. Within the subroutine (described as "function_name" above), control the number of threads running at the same time as follows:
        • Start function with $semaphore->down(); (occupying thread)
        • End function with $semaphore->up(); (opening up thread)
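
The steps above are for Perl. As a rough sketch only, the same pattern (a semaphore that limits how many threads do work at once) can be written in Python with threading.Semaphore. The names run_limited and worker, and the short sleep standing in for real work, are hypothetical and are not taken from any of the linked discussions.

```python
import threading
import time


def run_limited(samples, max_threads):
    """Start one thread per sample, but let only max_threads work at once."""
    # Step 2: the semaphore holds one slot per allowed concurrent worker
    semaphore = threading.Semaphore(max_threads)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0, "done": []}

    def worker(sample):
        # Step 4: acquire a slot on entry ("down"), release it on exit ("up")
        with semaphore:
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)  # placeholder for the real per-sample work
            with lock:
                state["running"] -= 1
                state["done"].append(sample)

    # Step 3: create and start a thread for each sample, then wait for all of them
    threads = []
    for sample in samples:
        thread = threading.Thread(target=worker, args=(sample,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()

    return sorted(state["done"]), state["peak"]


if __name__ == "__main__":
    done, peak = run_limited(["s1", "s2", "s3", "s4"], max_threads=2)
    print(done, "peak concurrent workers:", peak)
```

All of the threads are created right away, but the semaphore keeps the number doing work at or below the limit, which is the same idea as calling down() at the start and up() at the end of the Perl function.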

Python:


R:


Docker:
  • Understanding Docker (high-level introduction)
  • Docker User Guide
  • My Notes (read tutorials first):
    • To mount your Windows Documents folder, use docker run -it -v /c/Users/[your username]/Documents:/mnt/[mounted name] [image]
      • If you need to re-enter an exited session, you can use docker start -ia container_ID to re-open it (note use of container ID instead of image ID)
    • docker ps -a to see exited interactive jobs
    • If you host your images on Docker Hub, try to keep them under 3 GB
      • To upload (after running an interactive session):
        • docker commit -m "update message" container_ID [image]
        • docker push [image]
  • Using Docker Through Singularity:
    • General Tutorial - for example, you can run interactive mode with singularity shell docker://user-name/repository
    • Mapping Folders - for example, as an extension of the above example, you can run a Docker image in interactive mode with a mapped drive using singularity shell -B /source/path:/docker/path  docker://user-name/repository
    • NOTE: this may not work for all Docker images (with errors not apparent until you try to run programs within the container), but I think it should work for some of them.
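
For the Windows folder mount in my Docker notes above, here is a small Python sketch that rewrites a Windows path into the /c/Users/... form used in that command and builds the argument list. The function names, the user name "alice", the mount name "docs", and the image "ubuntu" are hypothetical, and whether your Docker installation expects the /c/... form is an assumption based on my own setup.

```python
from pathlib import PureWindowsPath


def to_docker_host_path(windows_path):
    """Turn C:\\Users\\name\\Documents into /c/Users/name/Documents."""
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":").lower()  # "C:" -> "c"
    folders = path.parts[1:]                # drop the "C:\\" anchor
    return "/" + "/".join([drive, *folders])


def docker_run_args(windows_path, mount_name, image):
    """Build the argument list for an interactive run with one mounted folder."""
    volume = f"{to_docker_host_path(windows_path)}:/mnt/{mount_name}"
    return ["docker", "run", "-it", "-v", volume, image]


if __name__ == "__main__":
    # Prints the command rather than running it
    print(" ".join(docker_run_args(r"C:\Users\alice\Documents", "docs", "ubuntu")))
```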
C++:

  • C and R:
  • My Notes
    • For g++ compiler, binary output is created with "-o"
      • You can use "-g" option for debugging and "-Wall" for warning messages, but you'll still get error messages either way
    • If mixing your code with open-source code, take the compiler into consideration.  For example, some string functions work when compiling with gcc but not g++.
VirtualBox Virtual Machine:
  • Ubuntu .iso
  • Mounting shared folders
    • When you first open Virtual Box, choose settings for your image and define folder (under "Shared Folders")
    • To make that folder accessible, go to "Devices --> Insert Guest Additions CD image"
    • You should probably restart the machine
    • Your folder should appear under /media/sf_[folder name]
    • However, you may still not have access to the contents of that folder.  To fix the permissions issue, run sudo mount -t vboxsf [folder name] /media/sf_[folder name]
    • This might not be sufficient to have the folder load every time you start the VM.  If you run into issues, try sudo usermod -G vboxsf -a [username] after re-mounting the folder
  • If you find yourself in a situation where Ubuntu won't load from a locked installation file, you can fix this by pressing "Left-Shift" before Ubuntu starts to load (and then use the GRUB menu to fix the installation file).  I don't think this is unique to the VM environment, but that is where I saw this could work.
    • This was a helpful blog post that reminded me about the alternative boot option
    • There were some issues with guest additions after that (at least one time), but some extra information about that process was described here.
Free Data / Code Sharing:
  • GitHub (up to 1 GB per repository, 100 MB per file)
  • SourceForge (honor system or 5 GB?)
  • FigShare (up to 5 GB)
  • Dryad (up to 20 GB)
  • Zenodo (up to 50 GB)
    • Has versioning (although it took me a little while to realize this)
    • However, this doesn't seem quite as flexible as GitHub.  For example, the upload comes with a warning that "File addition, removal or modification are not allowed after you have published your upload".

Setting up Ubuntu Server
  • I am not sure why, but I had better installation success using the "alternative" installation files (as suggested as a solution in this forum)
  • A little early to say how much of a "success" everything is.  However, at least for the installation step, this is the "No OS" computer that I am trying to set up as a server: Dell Server
  • Restart server via command line using `sudo reboot` or `sudo poweroff`
  • Tutorials to set up SSH key: here and here
    • I found the instructions to be a bit confusing.  However, to essentially have 2 passwords and require an SSH key, you will want to add "AuthenticationMethods publickey,password" to the /etc/ssh/sshd_config file and then restart the service using "sudo service ssh restart" (as described here)
  • Reformat SSH keys to use with PuTTY
  • Using SSH keys with WinSCP
    • Even though I provided a password, I still needed to enter the ssh passphrase as well as the server password (the way that I set things up)
  • Mounting an additional hard drive
    • Ask Ubuntu discussion
      • I was able to see my 2nd hard drive using "lsblk" (even though it wasn't accessible for storage yet)
      • For newer and larger drives, I think the answer whose first step is to run "sudo blkid" may be the most relevant.
    • Ubuntu community help
      • Even without being mounted, I could see information about my 2nd hard drive (which was /dev/sdb), using "sudo lshw -C disk"
      • My 2nd hard drive was 3 TB, and both discussions mention special steps need to be taken for more than 2 TB of space (specifically, fdisk should not be used to create an MBR partition with >2TB)
      • For a new external hard drive, "parted" is recommended to reformat the drive.
        • For each partition, I think "sudo mkfs.ext4 /dev/sd[x][n]" should work.  However, I think that should be for partitions like /dev/sdc1 not the full drive like /dev/sdc.
      • There is some information about command line formatting options here.
        • I also thought this YouTube video provided some general background, but it doesn't really provide as much Linux-specific information (if reformatting for primary use on Ubuntu, which I think would probably be ext4).
      • You probably don't want to have to use "sudo" for all commands within the mounted drive.
        • This discussion relates to that issue.
        • This also relates to the configuration in the /etc/fstab file for loading mounted drives.  There is a recommended set of settings in the Ubuntu guide for Systemwide Mounts (although I am using ext4 instead of vfat).
        • Also, I might need to change things in the future, but I needed to use "defaults" instead of the provided settings in order to get "sudo mount -a" to correctly load the drive (after editing the /etc/fstab file).
        • I think checking for the presence of the "lost+found" subfolder is another way to see if the mounting was successful.
        • Most of this is also discussed in the first Ubuntu community help link that started this section.
    • Information about RAID drives (which is what I had at one point)
  • Setting up a static IP
    • With the newest version of Ubuntu server, I think this probably uses "netplan" to create a static IP
    • I forget exactly what I used at first, but I used this to help me be able to access external servers (name servers map names to IP addresses, and you must list name servers to be able to do things like update programs, clone git repositories, etc).
    • The subnet mask also confused me, but I think you probably want "24".  The 255.0.0.0, 255.255.0.0, and 255.255.255.0 masks (which are described on this page) correspond to 8, 16, and 24, respectively.
    • There is also a more formal website for netplan here.
    • On Windows, you can list IP addresses on your network using arp -a.
  • In general, there is some free information on Linux Journey.
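
For the subnet mask note in the static IP section above, Python's standard ipaddress module can show how a dotted mask maps to the number used in a netplan address. This is only a sketch: the function names are hypothetical and 192.168.1.50 is a made-up example address, not one from my network.

```python
import ipaddress


def prefix_length(netmask):
    """Return the prefix length for a dotted mask, e.g. 255.255.255.0."""
    return ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen


def describe(address_with_prefix):
    """Return (dotted mask, number of addresses) for something like 192.168.1.50/24."""
    # strict=False allows a host address instead of requiring the network address
    network = ipaddress.ip_network(address_with_prefix, strict=False)
    return str(network.netmask), network.num_addresses


if __name__ == "__main__":
    for mask in ("255.0.0.0", "255.255.0.0", "255.255.255.0"):
        print(mask, "->", prefix_length(mask))
    print(describe("192.168.1.50/24"))
```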
Other:

  • Vi Text Editor
  • Notepad++ Editor
    • With default settings, if you write code in Notepad++ and run the code on a Linux system, it may sometimes be helpful to run `dos2unix` on your code
    • Ubuntu Notepad++ Alternatives (I recommend gedit)
  • Basic MS-DOS tutorial
  • LaTeX tutorial
  • MiKTeX - Windows software for processing TeX/LaTeX files; also useful for compiling R packages
  • MacTeX - Mac software for processing TeX/LaTeX files
  • Subversion high-speed tutorial
  • Using subversion for Bioconductor packages
  • Google Code University
  • Git Bioconductor Tricks
    • For managing GitHub repository and Bioconductor Repository: http://bioconductor.org/developers/how-to/git/sync-existing-repositories/ 
    • You can confirm that the upstream repository has been added with "git remote -v"
    • You may need a passcode to run "git clone git@git.bioconductor.org:packages/[PACKAGE].git", but other users can clone the repository with "git clone https://git.bioconductor.org/packages/[PACKAGE]"
    • If you prefer working with the GitHub interface ("origin" in the instructions above), you can indirectly update the Bioconductor repository as follows (except if Bioconductor changes a file, such as the description file in new releases) :
      • git clone https://github.com/[username]/[package]
      • cd [package folder]
      • git remote add upstream git@git.bioconductor.org:packages/[package].git
        • If needing to update release branch, please see Tutorial for fixing bugs
        • If already synced (and you have checked out the appropriate release), you can also update the branch with "git push" and "git push upstream".
      • git add [updated files]
      • git commit -m "update message"
      • git push upstream master
  • Amazon AWS (cloud computing)
    • Even though I still have some free Google Cloud credits, I encountered an issue with a newer gcsfuse interface, such that I thought it might be easier to go back to AWS (or purchase a Linux server for my apartment)
    • So, here are some general notes:
      • I would recommend using PuTTY to connect to your EC2 instances
      • S3 storage and EFS storage are different (I would use S3 for sharing large datasets, and EFS for mounting internally shared data between EC2 instances)
      • Amazon provides a way to make EFS mounting easier using amazon-efs-utils, using the two commands:
        • sudo yum install -y amazon-efs-utils (installation)
        • sudo mount -t efs [file system ID]:/ /path/to/efs (mounting the EFS storage)
        • You can also see similar instructions when you view the full information about the file system that you created.
      • aws Command Line Interface (CLI) - includes commands to work with S3 storage, and it is already installed on the EC2 instance (but I noticed a command to transfer data from S3 to EFS/EC2 didn't work exactly as planned)
      • Instead, if you are on Windows, I would recommend WinSCP to transfer data from your local computer to an EC2 instance (and, in turn, the EFS mounted storage)
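
Going back to the Notepad++ note earlier in this section: the usual reason `dos2unix` helps is that Windows editors may save lines ending in CRLF, while Linux expects LF. Here is a minimal Python sketch of that one conversion. It is not a replacement for the real tool (which handles more cases), and the function names are hypothetical.

```python
def crlf_to_lf(data):
    """Replace Windows (CRLF) line endings with Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file in place; return True if anything changed."""
    # Binary mode so Python does not translate line endings on its own
    with open(path, "rb") as handle:
        original = handle.read()
    converted = crlf_to_lf(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
    return converted != original
```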

Creative Commons License
Charles Warden's Science Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.