Wednesday, December 4, 2019

Experiences with On-Line Courses

I converted some notes from my Google Sites page to a blog post, in order to make things a little more clear and neat:

So far, I've tested a couple options for on-line courses:

Udemy (2 courses):

  • The first course that I completed was The Complete Ubuntu Linux Server Administration Course!
    • I thought the course that I took was a good overview of concepts / commands to use for data analysis on a personal server (or on a VM at work)
      • For example, at least currently, I thought this was a little better for learning how to set up an Ubuntu server for personal use (versus being a system admin in an enterprise system)
    • However, I am not saying this is sufficient for me to be a system admin for others.  That would require additional experience and probably some evidence that you don't make mistakes beyond some maximum acceptable level (certification is essentially an honor system, but I think that is OK as long as that is made clear)
    • I think there is usually some sort of discount - I paid $10.99, which I think is very reasonable (making up for typos and possibly slightly outdated information).
      • However, I am sorry, but I would not recommend the course at regular price.
      • Cost is for lifetime access - you don't need to pay a monthly fee to access contents from courses that you have taken
    • So, when a discount is offered again, I may try another Linux / UNIX course.
  • The second course that I completed was Learning FileMaker 18 - Complete Course
    • This provides a lot of good videos to watch (with lifetime access)
    • I purchased at full price, and I think that was OK.
    • However, again, I am not saying this is sufficient for me to be an intermediate or advanced FileMaker developer.
    • No quizzes or assignments, but I thought it was a good follow-up to some shorter LinkedIn Learning courses (such as Learning FileMaker 16 and FileMaker: Relational Database Design)
  • In both cases, you can add bookmarks to tag specific points in the lecture.

Lynda / LinkedIn Learning (multiple courses):

  • So far, all of the courses that I have completed (including Learning Bash Scripting, Building an Ubuntu Home Server, and Learning Ubuntu Server) were each shorter than the Udemy course on Linux administration (an estimated time of 1-3 hours)
    • Kind of like the Coursera classes, I currently already use UNIX commands (for bash scripting).  However, there are probably some extra skills that I can learn.
    • The Ubuntu server course is more like what I took from Udemy (some personal experience, but a larger fraction of material that I don't use on a regular basis)
  • Unlike the other sites, I also took some classes on social interaction (Developing Your Emotional Intelligence, Giving and Receiving Feedback, Unconscious Bias, Having Difficult Conversations, and Communicating with Diplomacy and Tact)
  • Similar to Udemy, there is no time limit to complete the courses
  • You can download exercise files, but you will get the certificate if you watch all of the material
    • For example, my bash class was set as complete before I finished the final quiz
    • Likewise, there were no quizzes or exercises for the "building" Ubuntu home server class (although see quizzes in Learning Ubuntu Server)
    • So, similar to Udemy, I would say this is for personal growth and shouldn't really count as certification for a current or future job
  • For LinkedIn Learning, there is a 1-month free trial, and then $29.99/month (monthly) or $19.99/month (for full year)
    • There is a "Notebook" to manually take notes (once you press enter, it logs the time of the video - so, don't wait to take notes after the presentation)
    • There is also an F.A.Q. section to ask questions, etc.
    • However, you can take as many classes as you like each month (without extra charges)
  • Also, I heard Lynda may be offered for free from some libraries.  For example, here are free courses offered through the LA County library system.  I hope this is still true for LinkedIn Learning.  That may make the difference between me preferring Lynda versus Udemy.
    • To be fair, I did pay for the Premium version of the Duolingo app on my phone (which is listed on my local library's website), but my concern about posting the "certificate" on LinkedIn is still valid (and I think Duolingo was $10/month as well as being more interactive than most of the Lynda classes that I have checked out, in addition to having a freely available version).


Coursera (2 individual courses, 2 specializations completed):

  • From what I have tested, the Coursera courses seem to have more structure / requirements than a Udemy course
    • If you want to provide these certifications to employers, then I think this is important - philosophically, I think you would be treating your work at a real job like a commitment to complete a Coursera course
    • For example, if you didn't have previous experience, I think it is important to take the courses in order and/or check if there are dependencies for previous skills (such as general coding in R, etc.)
    • That said, if you didn't want to learn, you can probably find a way to pass the course through either brute force and/or getting answers from others.  So, it is not perfect, but I think this is reasonable security for the price (and I really did learn some new material from the courses).
      • If this is for certification to keep an existing job, an additional rule to have to share your project with your supervisor (in addition to passing peer review) may also help make cheating harder?
  • As a technical note for all courses that I have seen, I find the feature to take notes / bookmarks in lecture videos to be useful.
  • Also, if you are worried about over-committing yourself, I don't know all of the rules, but there is some publicly available content to try and avoid starting a course that you can't handle.
    • For the Regression Models course that I took, there was code here, videos here, and reading material here.
      • I think this is my favorite course that I have taken so far, and I would recommend it to others.
      • The course was estimated to take 17 hours, and it took me ~24 hours.  Assuming that people with less experience will complete the course less quickly, I think that is at least a 30% underestimate of time.
        • However, if you assumed that you were supposed to spend 8 hours a week for 4 weeks, then this was less than 32 hours (so, perhaps expected time should have been provided as an interval).
      • If I had the ability to rate at the 0.5 star level, I would have given it 4.5 stars (my actual rating was 5 stars)
    • For the Practical Machine Learning course, I think the public materials are a little different, but I did learn about RWeka in the caret package (when I was previously only familiar with the Weka GUI) and this tutorial was linked in the slides.
      • This course has noticeably more bugs in the quizzes than the "Regression Models" class (and it also lacked any swirl practice exercises, which I thought were nice).  You can re-take quizzes and I did like learning about the caret package.  However, I am currently less confident about recommending this course.
      • Interestingly, more than one of the issues with the quizzes essentially revealed a limitation / problem with a strategy described in the videos, which might even be a somewhat popular method.  However, explicitly describing limitations / problems in the lectures (and the scientific literature) is a better way to find out about this (even if this causes a hopefully minor level of conflict with other researchers, or even your own earlier publication record), and errors / bugs should respectfully be acknowledged and fixed as soon as possible.  However, I think this is mostly due to changes in the dependencies over time - if the conclusions are likely to change, then that matters (but I think that may have to do with a general limitation in precision, which users need to be cautious about).
      • While the course primarily uses the caret package, there was at least one task that required the elasticnet R package for LASSO regression.
      • For the project, I think you should take a look at this forum discussion, in terms of posting a compiled HTML page the way that is requested.
      • This course was estimated to take 14 hours to complete.  While I didn't record all of my time as carefully, I would say that I spent at least 17-19 hours to complete the course.  While this doesn't seem as bad (at least a 20% underestimate of time), I am reducing the overall rating below since I thought the extra time for exercises helped with understanding.
      • If I had the ability to rate at the 0.5 star level, I would have given it 3.5 stars (but I may have rated it higher closer to the time the course was developed, and my actual rating was 4 stars)
      • To be fair, I thought the results of the course project were interesting, so I want to be clear that I did find the course to be useful.
        • I decided it was probably more tactful to not link to the submission, but I believe there were at least 3 other projects that were able to achieve better accuracy on the quiz.
        • I also thought we were supposed to limit ourselves to 4 types of measurements (but this might have just been my misunderstanding).  For example, I filtered "num_window" before doing any analysis, and I am not sure if that mattered.
        • However, my point is that I agree/believe that a better model can be created (but the need to be cautious about over-estimating accuracy is a real concern for a lot of projects).
    • The 2 individual Johns Hopkins data science courses that I listed above are also part of certificates (for 5 courses or 10 courses) - at $49 per month and an estimated time to completion of 6-8 months, I believe that should be a total cost of less than $500.
  • UCSD Bioinformatics Specialization: Finding Hidden Messages in DNA (Course 1, dropped out before being charged).
    • Even though the names of the classes seemed like a good fit, my experience from the 1st class and looking at the syllabus for the 2nd class made me decide this is not a good fit for what I was looking into in that I think the emphasis on more advanced coding or coding efficiency was more than needed for my current position (courses to introduce others to genomics and/or beginner coding, or provide intermediate level bioinformatics certification for myself).
      • For example, I found the course to be harder than I was expecting, even though I write or modify code (in R/Python/Perl) for my regular job.
    • I thought it was interesting that the basic requirements didn't require coding, but there was an "honors" track that allowed practice for testing applications with coding.  However, if you plan to meet the honors requirement, I would recommend taking at least 1 Python course in advance.  I needed to be proficient in coding in order to complete the first few coding exercises.
      • Some of the information needed to pass the quiz is only in the Stepik section.
      • If you don't actually complete all of the tasks for the past week, I think passing the next week becomes harder (even if you decided that you didn't need to get the honors designation).
      • So, even if I could have passed the course with the time allotted, I had other things that I needed to complete and I decided to continue this search (for courses to recommend to beginners or get intermediate-level certification while working a full time job) over completing this particular class.
    • I learned about the Biology Meets Programming: Bioinformatics for Beginners course to learn Python (although I have not currently taken a look at that, since I already use Python on a fairly regular basis)
    • I learned about the Ori-Finder program
    • I learned that courses are also available from Stepik, and you can see my profile here (currently, for content linked to the Coursera course).
    • I also found some of the optional calculations (which don't contribute points) required looking at the comments in order for me to be able to figure out the answer.
      • There are also no explanations for the specific answers if you get the question wrong.  For example, that can make it hard to find and confirm if there is a bug in my code, since there are functions that worked for the Stepik exercise and certain versions of the quiz, but I can get scored as having the wrong answer for some versions of the questions.
  • I completed the Epidemiology in Public Health Practice specialization from Johns Hopkins University.  This includes the following individual courses:
    • Essential Epidemiologic Tools for Public Health Practice
      • I thought the IHME plots were interesting, including projections for COVID-19
      • I learned about the open-source QGIS software
      • I learned about how Shapefiles can be downloaded for analysis of data displayed along geographical regions (along with US Census data to overlay)
      • I think the time-estimates were reasonable
    • Data and Health Indicators in Public Health Practice
      • Learned about Quality of Mortality Statistics (including sources like the PAHO/WHO and WHO)
      • Learned about the ill-defined cause of death measure/rate (as a quality metric to compare sources), and Quality of Mortality Index Score that was defined as 0.7 * percent under-registered deaths + 0.3 * percent ill-defined cause of death (ideally, less than 10%)
      • Consideration of artifacts (such as changes in the reporting systems affecting the rate estimates) was also discussed
      • Discussed common adjustments to rates in public health applications
      • I liked the use of partially completed Excel files to help provide practice with relevant calculations
      • I kept re-taking the quiz (to learn the right answers), but I had more difficulty getting >80% on my first try for this course (compared to the previous course)
      • I think there were also relatively more typos than the previous course (for example, a table is missing for one of the questions on the last quiz)
      • I think the required time estimate may be under-estimated (perhaps 5 hours should be the lower value for a time interval)
    • Surveillance Systems: The Building Blocks
      • I think the time-estimates were reasonable
      • I was more likely to pass the quizzes on the first try, but I usually went back to re-take the quiz and increase my score
    • Surveillance Systems: Analysis, Dissemination, and Special Systems
      • I think the time-estimates were reasonable
      • I think that I was able to pass all of the quizzes on the first try, but I usually went back to re-take the quiz and increase my score
    • Outbreaks and Epidemics
      • Explained the Basic Reproductive Number (R0, infections without immunity) and Reproductive Number (R, infections where a certain percent of the population already has immunity, may be estimated as R0 * percent susceptible)
      • There were optional exercises throughout the lectures, but answers were often not provided
        • I thought this was an interesting simulation for the scientific process (when the true answer may not be known), and I thought the exercises helped with understanding of the material
        • These could have been used as the quiz questions, but that probably would have decreased the chances of being able to pass on the 1st try.  This would bother me less than having difficulty passing due to typos and/or wrong "correct" answers, but I can see how some other students might view this negatively.
        • Completing the exercises takes extra time, but I think this was still OK
  • I have completed the Genomic Data Science specialization from Johns Hopkins University.  This includes the following individual courses:
    • Introduction to Genomic Technologies
      • The first week of the first class made me think this is probably a better fit for beginners than the UCSD Bioinformatics specialization.  For example, I could get 100% on the 1st quiz on my 1st try.
      • No coding is required for this course
      • I think there might be a benefit to requiring anybody who wants to do a genomics experiment to be able to pass a course like this (offered from an objective 3rd party).
    • Genomic Data Science with Galaxy
      • In general, I think Galaxy is a good intermediate step for becoming familiar with open-source genomics programs.
      • However, the local (or cloud) installation of Galaxy was something new for me.
      • While not directly part of the course, I found that I needed to learn how to manage Python packages in a virtual environment to troubleshoot local installation of Galaxy on a VirtualBox VM (specifically, an issue with importing "ensure_str" from the "six" package).  For example, you might find the section "Activating a virtual environment" of this tutorial to be useful (except that I used /path/to/galaxy/.venv/bin/activate, instead of env/bin/activate).
        • Strictly speaking, I have not yet solved this problem, but I tried to see what I could do for the course if I don't install Galaxy locally or use AWS.
        • The course project uses a relatively small set of reads (less than 100,000 paired-end reads per sample), so that should help with being able to use the main (free) Galaxy interface.
      • Similar to my own experiences (where I decided to buy / build a local server instead of using AWS), there is a warning in the Week4 discussion that you don't need to use AWS to complete the course and one student was charged $1700 to try and complete the course project.  This is why I was focusing more on the local installation, even though AWS is popular and it was probably a good idea to get exposure to how Galaxy could work on AWS.
      • Even if you don't take the course, you might find the Galaxy Training! website useful as an introduction to various types of analysis and using Galaxy.
        • However, you might need to be careful that you have all of the necessary dependencies on your version of Galaxy for a pre-existing workflow.
    • Python for Genomic Data Science
      • The first week describes Python through the interactive interface for basic concepts (while then showing how to combine those commands into a script).  I think this is good for beginners, such as biologists that want to learn to code.
      • Learned about Python resources that can be passed along, such as LearnPython.org
      • While I could pass (with >70%) on my first try, I thought the wording for the 2nd quiz added some unnecessary complications (kind of like SAT or GRE questions can be intentionally misleading).  I think this adds frustration for true beginners (who may understand the content better than they think the quiz reflects), even if you are allowed to re-take the quiz.
      • If it hadn't been for formatting issues for a question in Quiz 4 (in Week 2), I would have been able to get through Week 2 without writing any "long" scripts outside of the interactive interface (for a more complicated, multi-step process).
      • If you are using this for learning (rather than certification), then maybe it is worth mentioning that a lot of students thought the quizzes in week 2 were too difficult (without enough preparation from the lectures).
      • For the most part, you can get through Week3 with either writing no scripts or only short scripts (possibly with 1 exception, requiring the use of the clock function).
      • In general, you might find this tutorial helpful for using Biopython
      • You can also see a summary for the BLAST analysis here.  The run-time makes me question whether this is the optimal way to run BLAST (in other situations), but I think changing the parameters for NCBIWWW.qblast() to use expect=1 * 10**(-20), alignments=3 might help some.
      • The final example requires that you are able to write more complicated scripts.
    • Algorithms for DNA Sequencing
      • I think this is essentially the second part of the Python class (covering more intermediate-to-advanced skills), along with more discussion about the details of DNA sequencing.
      • Includes a link to Try Jupyter for Python analysis
      • There are some externally available materials in ads1-slides and ads1-notebooks, which I believe cover all 4 weeks.
      • While I think it should be OK for certification, I wonder if this might be a bit much for a true beginner (with some details like modules and objects not really explained on the programming side in the lectures).
        • I think you should also plan for 4-7 hours a week (rather than the 2-4 hours suggested)
        • That said, I thought Week 2 had the most difficult questions for me.  If this is true for others, then you should not assume the difficulty will continually increase.
      • Helped me better understand the concept of a De Bruijn graph (which I mostly remembered because of difficulty with pronunciation, but the instructor pronounced it the same in his lectures as in this other video, which I would describe as sounding as if you did not pronounce the "j").
      • I thought that there were fewer typos and less inaccurate information than I encountered in other Coursera courses, so I thought that was great.
    • Command Line Tools for Genomic Data Science
      • A CentOS VirtualBox environment is provided for the course
      • The course covers some basic unix commands, general genomics tools / resources (samtools, bedtools, IGV, NCBI, UCSC Table Browser, etc.), tools for DNA-Seq alignment and variant calling, and tools for RNA-Seq gene expression analysis
      • In Week 3, I learned about being able to use zcat to view compressed files without decompressing them.
      • I learned that you can use grep -v to exclude certain lines (such as headers with "#")
      • I also learned that you can use grep -P to search expressions with tabs (using Perl regular expressions).
      • I also gained experience learning how to interpret the cuffcompare output.
      • A lot of students (including myself) got a very low score on the first attempt at Exam 4 because the example code provides the GTF, but you need to not provide the GTF (at the TopHat2 alignment step) in order to get the right answers.
        • For me, that made the difference between getting <10% versus >95% on Exam 4
    • Bioconductor for Genomic Data Science
      • I believe the class was designed with packages from R-3.2.1
      • I believe that you can view many of the videos here.
      • While only a subset of the content relates to this course, there are some code examples here.
      • While it might be good for intermediate-to-advanced users, this course took me noticeably more time per week than the last course.  So, I am not sure if that might be a bit much for beginners.
      • I learned some new things about basic R structures (such as a limit on the size of a vector, the integer versus numeric type, etc.).  So, this was useful to someone who already has experience, but the instructor recommended having some experience using R before taking the course.
      • I learned about the plotRanges() function (for IRanges objects).
      • While not related to this course, you can also see some examples of those plots within this tutorial.
      • I learned about AnnotationHub search functions (including the display() function for interactive browsing).
      • Quiz 1 had a tip that was not precisely correct, and I think that caused some confusion for some students.
      • I believe that I have had more answers that were wrong (without an explanation) than any previous courses.  Sometimes, I received credit for my guesses (closest to but not exactly matching any of the options).  While this might not cause you to fail the course (and I got 100% for Quiz 2, even though my answer was noticeably different than the 4 provided options for 3/10 questions), it might be worth knowing about in advance.
    • Statistics for Genomic Data Science
      • There are public links for the course materials here, along with an R package here.
      • For the GitHub code provided in the introduction (linked above), I was a little confused why "biocLite("jtleek/genstats",ref="gh-pages")" was used instead of "devtools::install_github("jtleek/genstats")", and I admittedly got an error message with both strategies using R-3.4.1 on 12/26/2020 (even though the exact error messages were a little different).
      • Nevertheless, the first link has the R code from the lectures, which I think is what is most important.
      • I tend to prefer using regular R over Rstudio.  However, you can still complete the R Markdown tasks with rmarkdown::render('Question2.Rmd', 'pdf_document') (using the rmarkdown package installed with install.packages("rmarkdown") and loaded with library(rmarkdown), as described here and here).  This might require installing dependencies like pandoc, but I found I could get that working for the second question on Quiz 1 by running R within "Bash on Ubuntu" on Windows 10 (with R version 3.2.3).
      • I wish the statistics were easier to find, but I saw a pop-up saying that the average time to complete Week 1 was 6 hours.  This is noticeably longer than the 3 hours listed in the syllabus, but I think that could be true (especially if you have issues with compatibility with the current version of R and packages and what was used in the course, I believe in 2015).
      • I also needed to read forum discussions like this one (or this one) to realize that an additional line was needed to run the code for the quiz (as well as using R version 3.2.3 in Ubuntu, since I think the dependency commands also changed over time).
      • I needed to use a different version of R in Question 6 of Week 1 (relative to the earlier questions).
      • I learned about the cutree function in R, which can be tested as an alternative to kmeans.
      • I learned that there is an R-base lm.fit function to carry out linear regression for several comparisons faster than lm (in addition to other implementations like fastLmPure in the RcppArmadillo package, a manual calculation in C++ using Rcpp and the boost libraries in the BH package, etc.).
      • For logistic regression and generalized linear models, this link was provided as a class resource for more information.
      • I learned the genefilter package has rowttests and rowFtests implementations for carrying out several comparisons on a matrix (and is easier to use than implementing your own Rcpp calculation).
      • I learned that I can use the snp.rhs.tests function in the snpStats package to relatively quickly apply logistic regression with a table of SNPs.  I also learned how to use the slotNames() and chi.squared() functions to work with the resulting object.
      • There were concerns about incomplete or imprecise information in the forum (including quiz questions that did not have an answer).  If you are looking for certification of what you can currently do, perhaps this is OK.  However, if you are trying to learn the material for the first time, then this might be worth keeping in mind.
      • It was initially a misunderstanding on my part, but I think this discussion may be worth taking into consideration (where removing genes with average counts less than 100 before running any statistical test increased the correlation of test statistics and defined a less extremely large number of differentially expressed genes).
    • Genomic Data Science Capstone
      • This is spread out among 10 weeks (rather than 4 weeks, like the other courses).
      • More specifically, I think the course is designed as if it was for 8 weeks of work, but extra time is allowed for the first 2 "short" weeks (in order to help provide extra time to get through the alignment step in the 4th week, with the 1st task / quiz in the 3rd week).
      • Within 1 month of finishing the 7th "regular" course, I received an e-mail with a subject of "Unsubscribed after specialization completion" and a message that begins with "Congratulations! You earned your certificate".  However, I did not actually receive a certificate at that time, and I had not yet completed the Capstone.
      • Instead, the goal was to not charge me for the Capstone in the same way that I was charged when I was working on the other courses.
      • Access to the Capstone was extended when I tried to follow up on the message.  So, please note that there is a time limit to complete the Capstone after completing the 7th class, and there are still deadlines for each session of the Capstone (after enrolling on 1/24/2021, there were deadlines from 2/14/2021 until 4/7/2021 for me).
      • I thought the instructions were confusing for Week 5 and Week 9
      • I don't think others shared the same confusion for Week 9, but I think there were some difficulties expressed for Week 5.
      • I thought the instructions for Week 6 were OK, but most of the peer reports that I graded did not in fact upload a table with samples in columns and genes in rows.  So, I think something was in fact confusing to those new to the field.
      • I don't think the full documentation could be provided within 5 pages for Week 10, but I hosted my code and reports on GitHub
  • I think I already have a fair amount on my "To-Do" list, but I am also interested in checking out the Biostatistics in Public Health Specialization from Johns Hopkins University
  • Coursera lists the cost at $29-$99, either at the course or month level.
    • The JHU Data Science courses were $49 per month (but that counted for both the Regression Models and Practical Machine Learning courses)
    • The JHU Genomic Data Science courses were $39 per month
    • I wouldn't recommend taking more than 1 course at a time (with a full time job).
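
As a rough sketch of the two formulas that I mentioned above for the Epidemiology specialization (the Quality of Mortality Index Score, and the estimate of R from R0), the arithmetic could be written in Python as shown below.  The function names and the example inputs are hypothetical (they are not from the course), and this only reflects the calculations as I summarized them in these notes.

```python
def quality_of_mortality_index(pct_under_registered, pct_ill_defined):
    """Weighted score as summarized in the notes above.

    0.7 * percent under-registered deaths + 0.3 * percent ill-defined cause of death
    """
    return 0.7 * pct_under_registered + 0.3 * pct_ill_defined


def effective_reproductive_number(r0, fraction_susceptible):
    """Estimate R as R0 multiplied by the susceptible fraction of the population."""
    if not 0.0 <= fraction_susceptible <= 1.0:
        raise ValueError("fraction_susceptible must be between 0 and 1")
    return r0 * fraction_susceptible


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic
    score = quality_of_mortality_index(pct_under_registered=5.0, pct_ill_defined=12.0)
    print(f"Quality of Mortality Index Score: {score:.1f}%")

    r = effective_reproductive_number(r0=3.0, fraction_susceptible=0.4)
    print(f"Estimated R: {r:.2f}")
```

Per the notes above, a score under 10% would be the ideal range for the index, and the susceptible fraction is passed as a proportion (0.4) rather than a percent (40).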

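For the command line tips from the Genomic Data Science specialization (zcat to view compressed files, grep -v to exclude header lines, and grep -P to search for tabs), here is a minimal Python sketch of the same ideas.  This is only my own analog and not code from the course: the function names and the demo file contents are made up, and I am assuming that header lines start with "#" (rather than containing "#" anywhere in the line).

```python
import gzip
import os
import re
import tempfile


def iter_data_lines(path, comment_prefix="#"):
    """Stream a gzip-compressed text file, skipping header lines.

    Roughly analogous to: zcat file.gz | grep -v "^#"
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith(comment_prefix):
                yield line.rstrip("\n")


def lines_matching(lines, pattern):
    """Keep lines matching a regular expression (similar in spirit to grep -P)."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]


if __name__ == "__main__":
    # Build a tiny, made-up compressed file so the sketch can run on its own
    fd, demo_path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    with gzip.open(demo_path, "wt") as handle:
        handle.write("#gene\tchrom\tstart\n")
        handle.write("geneA\tchr1\t100\n")
        handle.write("geneB\tchr2\t200\n")

    data_lines = list(iter_data_lines(demo_path))
    # "\t" in the pattern matches a literal tab
    for line in lines_matching(data_lines, r"\tchr2\t"):
        print(line)
    os.remove(demo_path)
```

Reading through gzip.open means the file is never written to disk in decompressed form, which is the same benefit that I liked about zcat.
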
edX (1 certificate in progress):


  • I currently don't have experience with courses through this medium, but there are additional on-line courses / certifications / degrees listed here that aren't on Coursera
  • For example, I believe some of the courses for the Georgia Tech on-line data science program are listed here.
    • While I don't have direct experience, this student's GitHub content makes me think CSE 6040x (a core course in the data science curriculum) may cover some similar content as I have taken on Coursera.
    • You can also see some other courses / degrees from GTx here, which includes a MicroMasters in Analytics.
    • As I mention below, I think the on-campus MS in Bioinformatics may be a better fit than the MS in Analytics, for myself or somebody else with a similar background.
  • The University of Maryland, Global Campus has a MicroMasters Bioinformatics on edX
    • My understanding is that you can also get partial credit for the coursework (for BIOT 640, BIOT 630, BIFS 614 and BIFS 619) if you complete the MicroMasters for the Biotechnology : Bioinformatics on-line degree.
      • If I understand things correctly, this might also help with the cost difference for being out of state (~$17,000 in-state, ~$24,000 out-of-state).  However, these also seem like arguably the most important courses in the program, with the information that is most likely to be used in everybody's research.
      • However, I also believe this is no longer going to be offered, after 2020
    • I don't have direct experience with this either.  However, this seems like something that could theoretically be appropriate for somebody like myself.
  • While I am not sure if it is the best fit for me, there are also other related MicroMasters programs like Data Science from UCSD and Statistics and Data Science from MIT.
    • I believe these are all <$2,000
    • If I look at the MIT Data Analysis class, the MicroMasters link is only for the final exam (14.310Fx) whereas the course itself (14.310x) has a different timeline and registration.
    • There is also a "professional certificate" in Data Science from Harvard, which costs less and has less of a time commitment.
  • The Coursera Statistics for Genomic Data Science course references the edX Statistical Inference and Modeling for High-Throughput Experiments course for additional information (yes - I really mean the Coursera course recommended the edX course).
    • That page references professional certificates from Harvard (with a UNC - Chapel Hill co-instructor) in Data Analysis for Life Sciences and Data Analysis for Genomics
    • My understanding is that the pacing rules should be similar across courses (each course has deadlines, and the courses must all be completed within 24 months of the purchase date).
    • So, I am not completely sure what I could have done for free, but I did purchase the professional certificate material for Data Analysis for Life Sciences (I think theoretically completable in 4 months).
      • I am not sure of all of the implications, but I have been notified of a conversion to a for-profit model starting 11/16/2021.  If this causes some sort of fundamental change, then I may be much more hesitant to recommend edX to others.
      • With or without taking all of the courses, I think this HTML document may be a helpful reference.
      • Likewise, I think this GitHub page is worth knowing about.
      • While possibly confusing, this eBook is also free.  However, a donation is suggested.  I think this may be a good strategy for providing materials (including R packages), and I would make a donation if I had not already paid for the edX course.
      • Statistics and R (grade of 93%)
        • Introduction says "All of the material is available immediately, and the only deadline is the end of the course".
        • However, each week has homework and quiz deadlines.
        • My understanding is that you can't get an extension if you haven't completed all of the work before the final deadline.
        • Discussion group participation is not supported (for questions about the concepts in the material).
          • For example, for the first question, I am asked to enter the version of R.
          • However, this course was developed a while ago.  So, I tried to enter my version as 4.0.3, and I got an error message (which was true at the time, but it is no longer true).
          • I will report that through the formal method provided.  However, I thought having the discussion groups helped with identifying problems in Coursera.  Unfortunately, a number of them were not corrected, but having feedback from others was helpful (and sometimes there was an official moderator that could acknowledge the problem, even if the instructor was no longer directly providing support for the class and/or correcting errors).
            • As follow-up, I think issues about things that need to be changed in the course material are OK.  However, problems with the interface should be reported to edX, and questions about the concepts should be asked on public forums (to be answered by others).
          • So, edX can provide this functionality, but the communication about asking questions references support outside of the course (StackOverflow, etc.).  This list overlooked Biostars, and there was some other misunderstanding on my part.  However, I will check out all 4 courses, and provide an overall assessment / recommendation (even if I don't like this particular decision).
          • Different versions of R affect functionality.  So, in this case, I also had R 3.6.3 (or, generally, 3.x.x), and I went back to use that (instead of the version of RStudio that I had installed).
        • I have used Swirl for at least 1 Coursera course before (with material specifically designed for that course).  However, I found it useful to learn that there are basic R tutorials already built into the base Swirl package.
        • One of the early exercise questions asks you to use a for loop, without having previously described how to do that.  If you are like me, then the course may be OK for professional certification.  However, if you are just starting R, I am not sure if it is the best way to learn R (and the instructor did acknowledge that learning the basics of R can take time).
        • For the 2nd set of exercises, the part about "To create a vector with the numbers 3 to 7...," is more relevant for Question 6 than Question 5.  I found this confusing, but I eventually figured it out.
        • I don't plan to use it, but I did learn an explanation for something that I have noticed for a while
          • You can use "<-" or "=" to assign values to variables
          • I started using "<-" because that was in training materials
          • However, when I learned that I could do the same thing with just "=," I started saving characters (by using 1 that was easy to remember, instead of 2).
          • However, I have now seen an example where the function can be different, and there was the explanation that this was used for piping.
          • In other words, the following example was used to combine 3 lines of code to define the values for the variable "controls":
            • controls1 <- filter(dat, Diet=="chow") %>% select(Bodyweight) %>% unlist
          • I find this harder to read (instead of easier to read), so I would rather write out the 3 lines.  However, I did learn something new.
          • That said, the following also works: controls2 = filter(dat, Diet=="chow") %>% select(Bodyweight) %>% unlist
          • However, in later exercises, it seems like I might need to load the dplyr package in order to use that piping function (at least with the other functions).
          • I learned about the Median Absolute Deviation (MAD), so that the median and MAD can be used as an alternative to the mean and standard deviation (SD) for a collection of samples with outliers.  This can be calculated with the mad() function in R.
        • I learned about the ecdf() function
        • I am not sure how often I will use it, but I learned about the split() function (in the context of creating the input for an alternate way to use the boxplot() function)
        • The order of the boxplot exercises gives a clue, but I think using some extra words in the 3rd exercise to explain that the topic has returned to the data from the 1st exercise would be helpful (and explicitly saying that analysis of data from the 2nd exercise may help).  Again, this helps me learn more, but it makes me question whether this is the best course for a complete beginner.
        • I think the background for Week 2 was relatively better (perhaps more complete information for a beginner).  However, it did take me a little while to figure out what was being asked for Exercise 2 in the "CLT and t-distribution in Practice Exercises" subsection of the timeline, and figure out what I needed to modify for my code.  Sometimes you have 2 chances, and sometimes you have 5 chances to get the answer right.  In this case, I used the logic of what was being asked to figure out the right answer (with only 2 chances), and then I used the provided solution to go back and figure out how to modify my code and understand precisely what the question was asking.
      • Introduction to Linear Models and Matrix Algebra (grade of 100%)
        • I learned about using the solve() function to calculate the inverse matrix in R
        • I learned about the crossprod() function (faster version of t(x) %*% y) and tcrossprod() function (faster version of x %*% t(y)).  This is discussed in the context of using linear algebra to solve for the Residual Sum of Squares (RSS).
        • Standard errors for model coefficients are also described, with content similar to what is presented here.
        • I think this is the course where we first see lectures from Michael Love.
        • There is a page with content from those lectures here.  For example, I learned about the contrast() function and package from that lecture.
        • I also learned about the glht() function from the multcomp package.
        • While mostly familiar with the concept of confounded variables, I learned about that in the context of collinearity and the "rank" of a model matrix determined using qr().
        • There are some additional materials about QR decomposition and linear models/regression here.
        • I think there is also some additional useful / interesting information linked from the course to here.
        • During an early slide, I believe the limma package was referenced.  I can see how this content relates to what is being done in limma.  However, I don't believe there were any lectures or exercises describing applications with that package (in this particular course).
      • Statistical Inference and Modeling for High-Throughput Experiments (grade of 96%)
        • This course covers topics like multiple testing (the concept as well as solutions), statistical models (binomial, Poisson, etc.), maximum likelihood estimates, parametric model fitting, Bayes' theorem, etc.
        • This is also the course where application of the limma package for differential expression is discussed, which I believe helps me understand the method better.
      • High-Dimensional Data Analysis (grade of 98%)
        • I learned how to use cmdscale() to create an MDS plot (for 2 out of k dimensions) using base R code, with a distance object such as that created from dist().
        • I am not listing all examples here, but there are a number of uncorrected typos.  Also, there is at least 1 question where you can't receive credit unless you provide answers that are wrong by any interpretation (Question 4 of the Week 3 quiz, which is actually the 5th question because the question numbers start with the 2nd question).
        • I think the course is useful in a number of ways, but I think the errors mentioned above need to be taken into consideration (especially if you are new to the material).
      • You can view my overall certificate here.
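
As a rough illustration of the Statistics and R notes above (assignment with "<-" versus "=", piping, mad(), ecdf(), and split() as input for boxplot()), here is a minimal sketch.  The data frame is a made-up stand-in for the course data: the column names Diet and Bodyweight come from the example above, but the values (including the outlier) are invented.  The piped lines only run if dplyr happens to be installed.

```r
# Hypothetical stand-in for the course data; the values are invented
dat <- data.frame(
  Diet = c("chow", "chow", "chow", "hf", "hf", "hf"),
  Bodyweight = c(20, 22, 24, 25, 27, 90)
)

# Writing out the steps in base R (no extra packages needed)
chow_rows <- dat[dat$Diet == "chow", ]
controls_base <- chow_rows$Bodyweight

# The piped one-liner needs dplyr for %>%, filter(), and select()
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  controls1 <- filter(dat, Diet == "chow") %>% select(Bodyweight) %>% unlist
  controls2 = filter(dat, Diet == "chow") %>% select(Bodyweight) %>% unlist
}

# Median and MAD are less affected by the outlier (90) than mean and SD
hf <- dat$Bodyweight[dat$Diet == "hf"]
robust_summary <- c(median = median(hf), mad = mad(hf))
classic_summary <- c(mean = mean(hf), sd = sd(hf))

# ecdf() returns a function giving the proportion of values <= x
weight_ecdf <- ecdf(dat$Bodyweight)
prop_le_24 <- weight_ecdf(24)

# split() makes a named list of vectors, which boxplot() accepts directly
by_diet <- split(dat$Bodyweight, dat$Diet)
box_stats <- boxplot(by_diet, plot = FALSE)  # drop plot = FALSE to draw the figure
```

Comparing robust_summary with classic_summary shows how much a single outlier moves the mean and SD, while the median and MAD stay close to the bulk of the values.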
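
For the Linear Models and Matrix Algebra notes above, the following is a minimal sketch of how solve(), crossprod(), and qr() fit together.  It is not taken from the course: the simulated data, the assumed coefficients (1 and 2), and all of the variable names are only for illustration.

```r
# Simulated data; the "true" coefficients (1 and 2) are assumptions for illustration
set.seed(1)
n <- 20
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# Design matrix with an intercept column
X <- cbind(Intercept = 1, x = x)

# Least squares estimate: solve (X'X) beta = X'y
# crossprod(X) is t(X) %*% X, and crossprod(X, y) is t(X) %*% y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Residual Sum of Squares (RSS) at the estimate
resid <- y - X %*% beta_hat
rss <- as.numeric(crossprod(resid))

# Standard errors from the diagonal of sigma^2 * (X'X)^-1
sigma2 <- rss / (n - ncol(X))
se <- sqrt(diag(sigma2 * solve(crossprod(X))))

# A collinear column (here, 2 * x) adds a column without adding rank
X_confounded <- cbind(X, x2 = 2 * x)
rank_full <- qr(X)$rank                    # 2 columns, rank 2
rank_confounded <- qr(X_confounded)$rank   # 3 columns, still rank 2
```

The estimates and standard errors should agree with summary(lm(y ~ x)), which can be a useful check when writing out the matrix algebra by hand.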
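
For the High-Dimensional Data Analysis note above about cmdscale(), here is a minimal sketch with base R only.  The matrix is simulated, and the two groups (and the shift between them) are invented so that there is something to see in 2 dimensions.

```r
# Simulated matrix: 6 samples (rows) by 50 features; the two groups are invented
set.seed(1)
group <- rep(c("A", "B"), each = 3)
mat <- matrix(rnorm(6 * 50), nrow = 6)
mat[group == "B", ] <- mat[group == "B", ] + 3  # shift group B so the groups separate
rownames(mat) <- paste0(group, 1:3)

# dist() computes Euclidean distances between rows (samples)
d <- dist(mat)

# cmdscale() performs classical MDS; k = 2 keeps the first 2 dimensions
mds <- cmdscale(d, k = 2)
```

Calling plot(mds[, 1], mds[, 2]) on the result then draws the MDS plot, with one point per sample.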

While I am not sure how such certification is perceived by others, Coursera has less expensive options.  For example, University of Chicago has a MasterTrack for "Machine Learning for Analytics" (for $4,000).

If I understand everything correctly, then I believe an on-line degree in Analytics from Georgia Tech should cost a little more than $10,000 ($275 per credit hour + fees).  I should thank the contact for the Data Science on-line Master's Degree from UC Riverside for having me take a second look at that program (even as a current California resident, I think the Georgia Tech program would cost less, but I think the requirements are different).  However, I think the most advanced analysis courses in a Bioinformatics program (and/or the "MicroMasters" on edX that I list above) may be a better fit for me than the on-line Analytics program (at least for Georgia Tech, where I have more direct experience with understanding the difficulty of the on-campus courses, which I thought were supposed to be the same through edX).

For others that are interested, I think there is a Bioinformatics MS program at Indiana University - Purdue University Indianapolis, which I think is designed for those who may have other responsibilities (it is an on-campus program, but there are night classes and possibly an out-of-state scholarship waiver).  Under "Plan of Study", you can see the "Course Schedule" for individual classes (where it looks like a lot of classes start at 6 PM).  While an interesting point of discussion, I think this may end up being somewhat costly for me as an individual (as with most out-of-state or private on-campus programs).

As an undergrad, I commuted from home for 3 out of the 4 years (except freshman year, to make friends in the dorm).  So, if you live near your family, it might be a little more embarrassing as an adult, but that could be one way to get an on-campus master's degree at a lower total cost.  However, being a public school makes a difference (if you can qualify for in-state tuition).  As I mentioned in this post, I have partially completed coursework from the Bioinformatics MS degrees at Georgia Tech and University of Michigan (which I believe are a good reflection of what sort of courses I can handle).  However, limits on how much transfer credit can shorten the time to get the degree may be something to keep in mind for any MS program.  For example, even if I received credit for some required courses, I would still need to take 37 hours of courses after being admitted and enrolling in the Bioinformatics MS program at Georgia Tech.

If there are other suggestions regarding MS degrees in Bioinformatics / Analytics / Data Science that can be earned for less than $20,000 (and/or you have concerns that the true costs may be higher than any of the estimates above), please feel free to comment below.  If I didn't have any previous experience, I think the on-campus degree programs may have an advantage.  However, if I have ~10 years of experience (and essentially the combined equivalent of an MS degree, along with an MA degree from Princeton), then I am not sure if an on-campus degree (or even necessarily an on-line MS degree) is needed for somebody at this stage.  So, I am guessing this might be a relevant price point for others as well.

However, I really do like the idea of on-line courses for continuing education for those that already have full time jobs that they want to keep (with some necessary professional growth).


Change Log:

12/4/2019 - public post (convert draft due to this Biostars discussion)
12/5/2019 - add links to Lynda / LinkedIn Learning communication classes
12/6/2019 - add another Lynda / LinkedIn Learning communication class
12/7/2019 - minor change to reflect that I have taken more than 2 Lynda / LinkedIn Learning communication classes
12/8/2019 - update number of "Practical Machine Learning" courses with higher accuracy as well as add one more link to a Lynda / LinkedIn Learning class
12/9/2019 - add one more link to a Lynda / LinkedIn Learning class for communication
12/10/2019 - add benefit for no limit to source of classes for Lynda / LinkedIn Learning
4/23/2020 - add links to edX (without primary experience)
4/24/2020 - update information about GT Bioinformatics MS
4/26/2020 - add additional links; re-arrange some information
4/29/2020 - note that I have started to take some epidemiology courses
5/1/2020 - add epidemiology notes
5/3/2020 - add epidemiology notes
5/6/2020 - add epidemiology notes
5/7/2020 - add epidemiology notes
5/8/2020 - add epidemiology notes + start UCSD Bioinformatics notes
5/11/2020 - minor change + add UCSD Bioinformatics notes
5/12/2020 - add UCSD Bioinformatics notes
5/13/2020 - add UCSD Bioinformatics notes (date may not be exactly correct?)
5/17/2020 - add UCSD Bioinformatics notes
5/19/2020 - add JHU Genomic Data Science notes
5/22/2020 - add JHU Genomic Data Science notes
5/24/2020 - add JHU Genomic Data Science notes
5/28/2020 - add JHU Genomic Data Science notes
5/29/2020 - add UMGC MicroMasters notes (even though it is being discontinued)
6/11/2020 - add JHU Genomic Data Science notes
6/13/2020 - add JHU Genomic Data Science + JHU Biostatistics in Public Health notes
6/14/2020 - add JHU Genomic Data Science notes
6/21/2020 - add JHU Genomic Data Science notes
6/24/2020 - add JHU Genomic Data Science notes
6/25/2020 - minor formatting changes
6/28/2020 - add JHU Genomic Data Science notes
7/3/2020 - add JHU Genomic Data Science notes
7/4/2020 - add JHU Genomic Data Science notes
7/31/2020 - add JHU Genomic Data Science notes
8/2/2020 - add JHU Genomic Data Science notes
8/3/2020 - add JHU Genomic Data Science notes
8/5/2020 - add JHU Genomic Data Science notes
8/6/2020 - add JHU Genomic Data Science notes
8/8/2020 - add JHU Genomic Data Science notes
8/14/2020 - add JHU Genomic Data Science notes
12/26/2020 - add JHU Genomic Data Science notes
12/27/2020 - add JHU Genomic Data Science notes
12/29/2020 - add JHU Genomic Data Science notes
12/30/2020 - add JHU Genomic Data Science notes
12/31/2020 - add JHU Genomic Data Science notes
1/1/2021 - add link to earlier Udemy course
1/2/2021 - add JHU Genomic Data Science notes
1/7/2021 - add JHU Genomic Data Science notes
1/25/2021 - add JHU Genomic Data Science notes
3/3/2021 - add JHU Genomic Data Science notes
3/7/2021 - add JHU Genomic Data Science certificates
3/31/2021 - add Udemy FileMaker course + minor changes
6/12/2021 - minor change and reformatting for Harvard edX Data Science for the Life Sciences
6/13/2021 - add Harvard edX Data Science for the Life Sciences notes
6/14/2021 - minor changes
6/16/2021 - add Harvard edX Data Science for the Life Sciences notes
6/19/2021 - minor change
6/20/2021 - add Harvard edX Data Science for the Life Sciences notes
6/23/2021 - add Harvard edX Data Science for the Life Sciences notes
6/26/2021 - add Harvard edX Data Science for the Life Sciences notes
6/27/2021 - add Harvard edX Data Science for the Life Sciences notes
6/28/2021 - add Harvard edX Data Science for the Life Sciences notes
10/8/2021 - add Harvard edX Data Science for the Life Sciences notes
10/9/2021 - add Harvard edX Data Science for the Life Sciences notes
10/10/2021 - add Harvard edX Data Science for the Life Sciences notes
10/11/2021 - add Harvard edX Data Science for the Life Sciences notes
10/12/2021 - add Harvard edX Data Science for the Life Sciences notes
10/14/2021 - add Harvard edX Data Science for the Life Sciences notes
10/17/2021 - add Harvard edX Data Science for the Life Sciences notes
10/19/2021 - add Harvard edX Data Science for the Life Sciences notes
10/23/2021 - add Harvard edX Data Science for the Life Sciences notes
10/24/2021 - add Harvard edX Data Science for the Life Sciences notes
11/25/2021 - add Harvard edX Data Science for the Life Sciences notes
11/26/2021 - add Harvard edX Data Science for the Life Sciences notes
11/27/2021 - add Harvard edX Data Science for the Life Sciences notes

Creative Commons License
Charles Warden's Science Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.