As the direct author of the code, you will have the most intimate knowledge of the code and project details. This means that your ability to communicate with others in your lab is critical to the smoothness and success of the project.
So much of data science is formatting and troubleshooting. See this guide for some handy tips for that. Bugs don’t always come in the form of actual error messages either. The worst bugs are often ones that are silent and have messed up your results without you knowing.
We are borrowing from the debugging guide from the Childhood Cancer Data Lab here:
This may seem like a silly thing to include as a tip, but it’s very easy to gloss over an error message without actually reading it. Often, R may be telling you exactly what is wrong, but if you don’t take the time to understand what the error message means, you will have trouble fixing the error. Error messages often refer to R terms (e.g., “argument”, “directory”), so if you need a refresher on what some terms mean, we suggest going through one of the intro to R tutorials we recommend.
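As a small illustration of reading the message itself, the sketch below deliberately triggers an error and captures its text. The call `log("ten")` is a made-up example, not part of any analysis here, and the exact wording of the message may differ by R version or language settings.

```r
# Deliberately cause an error and capture the message text instead of stopping.
# log() needs a number, but we hand it a character string (hypothetical mistake).
error_text <- tryCatch(
  log("ten"),
  error = function(e) conditionMessage(e)
)

# The message names the problem: the argument is not numeric.
print(error_text)
```

The captured text is also the part of the error that is most useful to paste into a search engine.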
Secondly, realize that just because you don’t receive an error message doesn’t mean that your code did what you intended it to. You will also need to carefully review your code (and your results) to try to find “silent” bugs (situations where R did exactly what you asked, but you didn’t get what you intended).
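One well-known example of a silent bug is converting a factor to numbers. The sketch below uses a made-up `dose` variable; R raises no error or warning, yet the first result is not what most people intend.

```r
# A factor whose labels look like numbers (hypothetical example data)
dose <- factor(c("10", "20", "40"))

# Silent bug: as.numeric() on a factor returns the internal level codes,
# not the labels, and produces no error or warning.
wrong <- as.numeric(dose)

# Converting to character first recovers the values that were intended.
right <- as.numeric(as.character(dose))

print(wrong)  # 1 2 3
print(right)  # 10 20 40
```

Checking intermediate results like this, rather than waiting for an error, is often the only way to catch this kind of problem.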
If you ran many lines of code, you may not know which part of your code is the origin of the error message. Isolating the source of the error and trying to better understand your problem should be your first course of action. The best way to determine this is by running each line, and each phrase, by itself, one at a time.
Chunk-out your code and test the individual bits of code. Do you have a lot of lines of code, a lot of arguments, or multiple functions at once? Try each piece by itself to narrow down what piece appears to be the origin of the problem.
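A minimal sketch of chunking out a nested call is shown below. The `values` vector and the intermediate object names are hypothetical; the point is only that each step gets its own line so it can be inspected.

```r
values <- c("3", "5", "seven", "11")  # hypothetical input read in as text

# All at once, it is hard to see which step introduces the NA:
# result <- round(mean(as.numeric(values)), 1)

# One piece at a time, looking at each intermediate object.
# This step emits a warning about NAs, which is itself a useful clue.
as_numbers <- as.numeric(values)
print(as_numbers)

# Find exactly which element could not be converted
bad_positions <- which(is.na(as_numbers))
print(values[bad_positions])
```

Once the problematic piece is identified (here, the third element), the rest of the code can usually be left alone.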
It could be that the problem with your code isn’t that it doesn’t work as it is written, but that you didn’t run it or didn’t run it in the correct order. This should be one of the first things you check, along with checking that the objects you believe should be in your environment are actually there.
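One quick way to check what is in your environment is to compare the names you expect against `ls()`. The object names below are hypothetical placeholders for whatever your later code depends on.

```r
# Names that later code depends on (hypothetical)
needed <- c("expression_df", "metadata_df")

# Suppose only one of them has actually been created so far
expression_df <- data.frame(gene = c("A", "B"), count = c(10, 0))

# ls() lists the objects in the current environment;
# setdiff() keeps the expected names that are not there yet
missing_objects <- setdiff(needed, ls())
print(missing_objects)
```

If anything is reported as missing, the chunk that creates it was probably skipped or run out of order.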
It’s also good practice to periodically quit your current R session and start a new one, in addition to clearing your R notebook output. If you are encountering problems and haven’t refreshed your R session, you may want to do that before further troubleshooting.
In the course of troubleshooting, you will want to re-run all of your code, perhaps many times, in order to get to the bottom of the problem.
The main advantage to Googling your errors is that you are likely not the first person to encounter the problem. Certain phrases and terms in the error message will yield more fruitful search results than others.
When you do Google, a few common sources that will probably come up that we recommend looking at are:
StackOverflow: this is a forum where people post questions and problems they encounter in their code.
People also will post their problems to GitHub issues. Often these are more geared toward fixing problems with the package or software itself, but this is a way to potentially get direct help on an issue from the authors of the package you are using.
R-bloggers has examples of R code that you can use to figure out how to construct various analyses. This is a good resource for example code, although its format isn’t built for asking exact questions like StackOverflow.
Once you’ve better determined the origin of the problem, you should use whatever documentation is available to you regarding the problematic code. When using a new function from a package you are unfamiliar with, it’s worthwhile to skim through the documentation so you know how to properly use the functions. For base R functions, Tidyverse functions, and some Bioconductor packages, the documentation will give you a lot of the information you need. However, you will also likely find that not all documentation is thorough or clear.
As we discussed in the intro_to_R notebook, objects have structures and types. Input that doesn’t match a function’s requirements is a common source of errors. Pay special attention to what the documentation says about what kind of input the function expects and what kind of output it returns.
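Base R has a few functions for checking what an object actually is before handing it to a function. The sketch below uses a made-up `measurements` vector in which numbers were read in as text.

```r
measurements <- c("1.2", "3.4", "5.6")  # hypothetical: numbers stored as text

# Inspect the type and structure before using the object
class(measurements)       # "character"
str(measurements)
is.numeric(measurements)  # FALSE

# mean() expects numeric or logical input; given character input it
# returns NA with a warning rather than stopping. Convert first.
measurements_num <- as.numeric(measurements)
average <- mean(measurements_num)
print(average)
```

Running `class()` or `str()` on an object is usually faster than guessing from the error message what type a function received.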
Here’s a screenshot from the help window in RStudio. Note that here we searched for the levels function. R documentation includes information about what the expected arguments are as well as examples of how to use a function. Note here that this documentation tells us that the input for x is probably a factor.
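To see why the expected input matters for `levels`, compare what it returns for a character vector and for a factor. The `tissue` vector below is a made-up example.

```r
tissue <- c("liver", "brain", "liver")  # hypothetical character vector

# A character vector has no levels, so levels() quietly returns NULL
char_levels <- levels(tissue)
print(char_levels)

# After converting to a factor, levels() returns the unique categories
factor_levels <- levels(factor(tissue))
print(factor_levels)

# In an interactive session, ?levels or help("levels") opens the help page
```

Note that the first call does not produce an error, so this is another case where the documentation, not an error message, tells you what went wrong.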
For Bioconductor package functions, look at their page on bioconductor.org. The documentation on Bioconductor pages has information that can be valuable for troubleshooting. Vignettes can have good example workflows to get started with (you can use the browseVignettes function in RStudio to open them). In addition, every Bioconductor package has a PDF reference where all the functions and objects for that package are described. They can take some getting used to, but they generally have helpful information.
Because it’s unlikely your first attempt at Googling will lead you straight to an answer, this is something you should continue to try with different wordings. Through trial and error, and also Google algorithms learning about what you look for, your search results can eventually lead you to helpful examples and forums.
This should rarely be your first approach to solving a problem, since this approach is difficult and doesn’t always pay off. This approach will require a bit more practice at reading code, so it may not be the most fruitful approach depending on the readability and complexity of the code.
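Assuming the approach described here is reading the source code of the function that is giving you trouble, R lets you look at most function definitions directly. The sketch below uses the base function `sd` purely as an example.

```r
# Typing a function's name without parentheses prints its definition
sd

# The same information is available piece by piece
sd_args <- names(formals(sd))  # argument names
print(sd_args)
body(sd)                       # the code that runs when sd() is called
```

For functions that only print a short call to `UseMethod()`, the real code lives in methods, which `methods()` and `getAnywhere()` can help locate.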
After you’ve tediously mined the internet for solutions to your problem and still not resolved your problem, you can post your problem to the internet for help.
Asking for help is a skill in itself! You will be able to more successfully receive help from others if you can better communicate the problem from the get go. Read this article for more details. We will cover the basics here.
The best coding help requests include (paraphrased from the article by Jere Xu):
- A description of the problem: Explain what you are doing in the first place and what language and frameworks you are using. Give context to what you are trying to do!
- What code did you use to get this problem: Provide the exact code and data you used, and detail exactly what you did in order to get to the problem.
- Expected result: Describe what the intended result of your code is and maybe even show a real-world example (assuming that what you’re building isn’t completely new).
- Actual result: Copy and paste the exact error message you are getting or otherwise provide a screenshot of the problem you are seeing.
- Environment: What operating system and version are you running on? What package managers and libraries are you using?
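Several of the items above can be gathered with base R functions. The sketch below is one possible way to do it; `example_df` is a hypothetical stand-in for a small piece of your own data that reproduces the problem.

```r
# A small, hypothetical data frame that reproduces the problem
example_df <- data.frame(sample = c("s1", "s2"), count = c(5, NA))

# dput() prints R code that recreates the object exactly,
# so others can paste it and work with the same data
dput(example_df)

# Environment details: R version, operating system, and loaded packages
r_version <- R.version.string
print(r_version)
session_details <- sessionInfo()
print(session_details)
```

Pasting the `dput()` output and the `sessionInfo()` output into your post covers the data and environment items without requiring anyone to guess at your setup.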
See our list of recommended resources to get going!