R’s tryCatch function is a great tool for robust error handling. It lets you try to run a block of code and, if an error occurs, the handler you supply deals with the exception in a customized manner (as opposed to halting the entire script). I have personally been deploying this design pattern pretty regularly, and there are two situations in which I’ve found tryCatch to be an especially handy tool in my toolbox:
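As a minimal sketch of the pattern described above: the helper name `safe_log` and the inputs are hypothetical, chosen only for illustration. The error handler returns `NA` so that one bad input does not stop the rest of the run.

```r
# Hypothetical helper (not from the original post): take the log of x,
# returning NA instead of halting when the input is not numeric.
safe_log <- function(x) {
  tryCatch(
    {
      # The "try" part: code that may raise an error
      if (!is.numeric(x)) stop("input must be numeric")
      log(x)
    },
    error = function(e) {
      # The "catch" part: report the problem and return a placeholder
      message("Caught an error: ", conditionMessage(e))
      NA_real_
    }
  )
}

# The second input triggers the handler; the other two are processed normally
results <- sapply(list(10, "a", 100), safe_log)
```

Because the error is handled inside `safe_log`, the call to `sapply` completes and `results` holds a numeric vector with an `NA` in the position of the failed input.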
As part of my current postdoctoral research I’ve built the R package coil, which is designed to aid users in DNA barcode data cleaning and analysis. The package is available now on CRAN or through my GitHub! Below I’ve included the package’s vignette, which explains how you can get coil up and running.
#downloading the package from CRAN:
#install.packages('coil')
library(coil)

Abstract

coil is an R package designed for the cleaning, contextualization and assessment of cytochrome c oxidase I DNA barcode data (COI-5P, or the five prime portion of COI).
Note: Here you will find the raw RMarkdown file for this post, in case you want to follow along and execute the code yourself!
Introduction

R is considered to be a functional programming language. What this means is that the syntax and rules of the language are most effective when you write code built around the use of functions. Functions allow you to modularize code, isolating different blocks in a way that makes your code more general, reusable, readable and easier to debug.
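To make the idea concrete, here is a small sketch. The function name `c_to_f` and the temperature values are hypothetical, used only to show a block of logic being isolated in a function and then reused.

```r
# Hypothetical example: isolate one piece of logic in a named function
c_to_f <- function(celsius) {
  # Convert degrees Celsius to degrees Fahrenheit
  celsius * 9 / 5 + 32
}

# Functions are ordinary values in R, so they can be passed to other functions
temps_c <- c(0, 37, 100)
temps_f <- sapply(temps_c, c_to_f)
```

The conversion lives in one place, so a fix or change to `c_to_f` applies everywhere it is called.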
Older readers of this post may remember the boot screen from Windows XP. This featured a load bar that was there to essentially give a user the message: “Hold on a minute, the computer is starting. Please chill out and don’t turn the machine off, that might cause some problems!” This load bar was a bit of a hack, as it didn’t increment with the progress of the boot… it just played a little animation over and over again to calm the user down.
When applying a function to a vector, list or dataframe column, your first instinct may be to iterate across the series of inputs. By doing this, each value is touched one after the other and the outputs are generated consecutively. An extremely useful feature of R is that functions can be vectorized. What is meant by this is that instead of you calling the function on each member one at a time, the function is called once on the whole vector and the element-by-element work is handled internally, which is typically faster and more concise than an explicit loop.
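The difference between the two approaches can be sketched as follows. The input vector is made up for illustration, and `sqrt` stands in for any vectorized function.

```r
x <- c(1, 4, 9, 16)

# Iterative approach: visit each element in turn and store the result
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorized approach: a single call operates on the whole vector
roots_vec <- sqrt(x)

# Both approaches produce the same values
same_result <- identical(roots_loop, roots_vec)
```

The vectorized call needs no index bookkeeping or preallocated output vector, which removes two common sources of bugs.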
During a tutorial I gave for the University of Guelph R users group, we were going through how to generate summary stats & tidy dataframes from messy data sources. This involved working with text data, and the exercise called for us to process a series of sentences and answer 3 questions about each line:
Is the line dialogue? (presence of a quotation mark in the string)
Is the line a question?
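A rough sketch of how the two checks listed above could be computed in base R. The example sentences are invented, and treating any line that contains a question mark as a question is an assumption on my part, not necessarily the rule used in the tutorial.

```r
# Made-up example sentences (not from the tutorial data)
lines <- c(
  '"Where are you going?" she asked.',
  "The rain had stopped by morning.",
  "Was anyone still awake?"
)

line_stats <- data.frame(
  text = lines,
  # Dialogue: the line contains a double quotation mark
  is_dialogue = grepl('"', lines, fixed = TRUE),
  # Question (assumption): the line contains a question mark;
  # fixed = TRUE stops "?" being interpreted as a regex operator
  is_question = grepl("?", lines, fixed = TRUE),
  stringsAsFactors = FALSE
)
```

Since `grepl` is vectorized, each check runs over every line in one call and the results slot directly into a tidy dataframe with one row per line.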
Below is an IPython notebook I wrote with the goal of exploring some genome metadata from NCBI. I set out to use ggplot to try and find some interesting ways to visualize the data. My favourite new plot type is the last one I create below, a half violin/half scatter plot to display the distribution of the data in the three categories. Pairing the violin plot with the adjacent scatter plot lets a reader see the distribution curve via the violin, while the scatter plot helps visualize ‘hotspots’ where large numbers of data points cluster.