metricminer

metricminer is an R package that helps you mine metrics from common places on the web through the power of their APIs.

It also helps put the data into a format that is easy to use for a dashboard or other purposes. It will have an associated dashboard template and tutorials to help you make full use of the data you retrieve with metricminer (but these are still under development!).

You can read the metricminer package documentation here.

Apps supported

Currently metricminer supports mining data from:

Data format options

metricminer attempts to retrieve API data for you and give it to you as a tidy data.frame. This means metricminer has to be opinionated about which metrics it returns so that they fit in a useful, human-readable data frame.

If you find that the data returned is not what you need you have two options (these options can be pursued concurrently):

  1. You can set the dataformat argument to "raw" to see the original, unedited JSON formatted data as it was returned from the API. Then you can personally look for the data that you want and extract it.
  2. You can post a GitHub issue explaining why a metric that is missing from the data frame should be included. If possible and reasonable, we can work on including that data in the next version of metricminer.
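
As a rough illustration of the first option, the sketch below pulls selected fields out of a nested list, which is the kind of object parsed JSON usually becomes in R. The raw_result object and its field names are hypothetical stand-ins; the actual structure depends on the API you queried, so inspect it with str() first.

```r
# Hypothetical stand-in for a parsed JSON response; real responses will differ.
raw_result <- list(
  items = list(
    list(name = "repo-a", stats = list(stars = 12, forks = 3)),
    list(name = "repo-b", stats = list(stars = 7, forks = 1))
  )
)

# Pull one field out of each item and assemble a small data frame.
tidy_result <- data.frame(
  name = vapply(raw_result$items, function(x) x$name, character(1)),
  stars = vapply(raw_result$items, function(x) x$stats$stars, numeric(1))
)
```

vapply() is used instead of sapply() here so that a missing or differently typed field raises an error rather than silently changing the shape of the result.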

How to install

You can install metricminer from CRAN.

install.packages("metricminer")

If you want the development version (not advised), you can install it from GitHub using the remotes package.

# Install remotes only if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("fhdsl/metricminer")
library(metricminer)

Basic Usage

To start, you need to authorize() the package to access your data. If you run authorize() you will be asked which app you’d like to authorize and whether you’d like to cache that auth information. If you already know which app you’d like to authorize, like google for example, you can run authorize("google").

Then follow the instructions on the upcoming screens and select the scopes you feel comfortable sharing (you generally just need read permissions for metricminer to be able to collect data).

authorize()

If you want to clear out authorizations and caches stored by metricminer you can run:

delete_creds()

GitHub

You can retrieve metrics from a repository on GitHub like this:

authorize("github")
# Summary metrics for the repository
repo_summary <- get_github_repo_summary(repo = "fhdsl/metricminer")
# Metrics over time for the same repository
repo_timecourse <- get_github_repo_timecourse(repo = "fhdsl/metricminer")

Calendly

You can retrieve Calendly event information using this type of workflow:

authorize("calendly")
user <- get_calendly_user()
events <- list_calendly_events(user = user$resource$uri)

Google Analytics

You can retrieve Google Analytics data for websites like this.

First you have to retrieve your account information after you’ve authorized.

authorize("google")
accounts <- get_ga_user()

Then you need to retrieve the properties (usually the websites you are tracking) underneath that account.

properties_list <- get_ga_properties(account_id = accounts$id[1])

The property id is the part of the name string after properties/, so we strip that prefix off.

property_id <- gsub("properties/", "", properties_list$properties$name[1])
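
To see what this step does in isolation, here is the same idea applied to a made-up name string of the same shape. The example value is an assumption for illustration, and sub() with an anchored pattern is used as a slightly stricter variant that only removes the prefix at the start of the string.

```r
# Hypothetical name string in the "properties/<id>" shape
example_name <- "properties/422671031"

# Remove the leading "properties/" prefix, leaving only the id
example_id <- sub("^properties/", "", example_name)
```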

Now we can collect some stats.

In Google Analytics, metrics are your basic numbers (how many visits to your website, etc.).

metrics <- get_ga_stats(property_id, stats_type = "metrics")

Dimensions, by contrast, are more like a list of events that have happened. For example, here is a list of people who have logged on.

dimensions <- get_ga_stats(property_id, stats_type = "dimensions")

Lastly, we have a third option: collecting link_clicks, that is, the links that have been clicked. Google Analytics also treats this as a dimension, but link click data often cannot be downloaded in the same request as other dimension data, so metricminer collects it separately.

link_clicks <- get_ga_stats(property_id, stats_type = "link_clicks")

Google Forms

You can retrieve Google form information and responses like this:

authorize("google")
form_url <- "https://docs.google.com/forms/d/1Z-lMMdUyubUqIvaSXeDu1tlB7_QpNTzOk3kfzjP2Uuo/edit"
form_info <- get_google_form(form_url)

Slido

If you have used Slido for interactive slide sessions and exported the collected information to your Google Drive, you can use metricminer to collect that data as well.

drive_id <- "https://drive.google.com/drive/folders/0AJb5Zemj0AAkUk9PVA"
slido_data <- get_slido_files(drive_id)

YouTube

If you have a channel and its URL is https://www.youtube.com/channel/a_bunch_of_letters_here

Then you can extract stats for the videos on that YouTube channel using the ID at the end of that URL.

authorize("google")
youtube_stats <- get_youtube_stats("a_bunch_of_letters_here")
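
If you only have the full channel URL on hand, the ID can be pulled out with base R. This is a small sketch that assumes the URL has the /channel/ID shape shown above; other YouTube URL styles would need different handling.

```r
channel_url <- "https://www.youtube.com/channel/a_bunch_of_letters_here"

# basename() returns the last path component, which here is the channel ID
channel_id <- basename(channel_url)
```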

Bulk Retrievals

Maybe you just want to retrieve it ALL. We have some wrapper functions that will attempt to do this for you. These functions are a bit more precarious in that there may be reasons why certain websites, repos, events, or other data cannot be collected. Collecting items one by one will give you more insight into what is happening.

However, these bulk retrieval functions may help you if you want to grab ALL of your account's data in one swoop. Just make sure to carefully look over and curate the data after the collection attempt. You may find some retrievals are empty for potentially good reasons (for example, if a Google form has no responses to collect, it will show up with “no responses” in the respective part of the list).
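
One way to do that curation is to separate the list entries that came back as data frames from those that did not. The sketch below uses a hypothetical bulk_results list built by hand; the real return value of a bulk function may be structured differently, so treat this as a pattern rather than a recipe.

```r
# Hypothetical bulk result: two entries with data, one placeholder string
bulk_results <- list(
  form_a = data.frame(response = c("yes", "no")),
  form_b = "no responses",
  form_c = data.frame(response = "maybe")
)

# Flag which entries are data frames
has_data <- vapply(bulk_results, is.data.frame, logical(1))

# Keep the usable entries and note which ones need a closer look
usable <- bulk_results[has_data]
needs_review <- names(bulk_results)[!has_data]
```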

GitHub bulk

From GitHub you can attempt to collect metrics from all repositories belonging to an account.

authorize("github")
all_repos_metrics <- get_multiple_repos_metrics(owner = "fhdsl")

If you only want data from specific repositories, you can provide a vector of those repositories' names like this:

repo_names <- c("fhdsl/metricminer", "jhudsl/OTTR_Template")
some_repos_metrics <- get_multiple_repos_metrics(repo_names = repo_names)
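
If a bulk call fails and you want to see which repository is the problem, you can loop over the names yourself and catch errors per repository. In this sketch fetch_one() is a hypothetical stand-in for whichever single-repository function you are calling, and the second repository name is made up so that one retrieval fails.

```r
# Hypothetical stand-in for a single-repository retrieval that fails for one repo
fetch_one <- function(repo) {
  if (repo == "example/private-repo") stop("not accessible")
  data.frame(repo = repo, n_rows = 1)
}

example_repos <- c("fhdsl/metricminer", "example/private-repo")

# Collect each repository separately; on error, keep the message instead of stopping
results <- lapply(example_repos, function(repo) {
  tryCatch(fetch_one(repo), error = function(e) conditionMessage(e))
})
names(results) <- example_repos
```

Entries that are character strings rather than data frames then tell you which repositories failed and why.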

Google Analytics bulk

As with single website retrieval, we first need to authorize the package.

authorize("google")
accounts <- get_ga_user()

Then we can provide the account id to get_multiple_ga_metrics and it will attempt to grab all stats for all website properties underneath the provided account. Alternatively, we can provide a vector of specific property ids.

account_stats_list <- get_multiple_ga_metrics(account_id = 209776907)
stats_list <- get_multiple_ga_metrics(property_ids = c(422671031, 422558989))

Google Forms bulk

As always, we need to authorize the app.

authorize("google")

We can retrieve a list of form ids using the googledrive R package.

form_list <- googledrive::drive_find(
  shared_drive = googledrive::as_id("0AJb5Zemj0AAkUk9PVA"),
  type = "form")

Now we can provide this vector of form ids to get_multiple_forms:

multiple_forms <- get_multiple_forms(form_ids = form_list$id)

Non-interactive authorizing from secrets

If you’d like to authorize non-interactively (whether on GitHub Actions or locally), you can set your tokens using Sys.setenv().

Setting Calendly auth from secret

You can go here to get an API key. You will likely have to log in first.

Then you can store this by putting your API key in this type of command:

Sys.setenv(METRICMINER_CALENDLY = "Put calendly token here")

Now in your script if you run the following, you will have authorization to Calendly.

auth_from_secret("calendly", token = Sys.getenv("METRICMINER_CALENDLY"))

Setting GitHub auth from secret

Similar steps can be done for the GitHub personal access token.

First go here to get a GitHub PAT. You will likely have to log in first.

Then you can run this command, putting your GitHub PAT in place of the placeholder text.

Sys.setenv(METRICMINER_GITHUB_PAT = "Put GitHub PAT here")

Now in your script if you run the following, you will have authorization to GitHub.

# Authorize GitHub
auth_from_secret("github", token = Sys.getenv("METRICMINER_GITHUB_PAT"))

Setting Google auth from secret

For Google, you first authorize the normal interactive way using authorize("google"), but store the result like this:

token <- authorize("google")

Then you can use this object to extract two secrets by printing them out like this:

token$credentials$access_token
token$credentials$refresh_token

Then you can set these in your environment doing the same steps as before:

Sys.setenv(METRICMINER_GOOGLE_ACCESS = "Google access token here")

Sys.setenv(METRICMINER_GOOGLE_REFRESH = "Google refresh token here")

Now in your script, if you run the following, you will have authorization to Google apps.

# Authorize Google
auth_from_secret("google",
                 refresh_token = Sys.getenv("METRICMINER_GOOGLE_REFRESH"),
                 access_token = Sys.getenv("METRICMINER_GOOGLE_ACCESS"),
                 cache = TRUE
)

Authorizing on GitHub Actions

In GitHub Actions you can run metricminer with authorization if you use the above steps to retrieve the necessary keys and then store each of them as a GitHub secret.

Read here about how to store GitHub secrets

You’ll need the secrets to be stored as the respective key name we’ve referenced above:

METRICMINER_CALENDLY
METRICMINER_GITHUB_PAT
METRICMINER_GOOGLE_REFRESH
METRICMINER_GOOGLE_ACCESS
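
Before calling auth_from_secret(), it can help to confirm that the environment variables are actually populated, since a missing secret shows up in R as an empty string. The helper below, find_missing_secrets(), is a hypothetical convenience function written for this sketch and is not part of metricminer.

```r
required_keys <- c(
  "METRICMINER_CALENDLY",
  "METRICMINER_GITHUB_PAT",
  "METRICMINER_GOOGLE_REFRESH",
  "METRICMINER_GOOGLE_ACCESS"
)

# Hypothetical helper: return the names of any keys that are unset or empty
find_missing_secrets <- function(keys) {
  values <- Sys.getenv(keys, unset = "")
  keys[!nzchar(values)]
}

missing_keys <- find_missing_secrets(required_keys)
if (length(missing_keys) > 0) {
  message("Missing secrets: ", paste(missing_keys, collapse = ", "))
}
```

If you only use some of the apps, trim required_keys down to the ones you need.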

Then in your GitHub Actions YAML you’ll need something like this to make these secrets available in the environment and authorize with them.

      - name: Authorize metricminer
        env:
          METRICMINER_CALENDLY: ${{ secrets.METRICMINER_CALENDLY }}
          METRICMINER_GITHUB_PAT: ${{ secrets.METRICMINER_GITHUB_PAT }}
          METRICMINER_GOOGLE_ACCESS: ${{ secrets.METRICMINER_GOOGLE_ACCESS }}
          METRICMINER_GOOGLE_REFRESH: ${{ secrets.METRICMINER_GOOGLE_REFRESH }}
        run: |
          # Load the package so auth_from_secret() is available
          library(metricminer)

          # Authorize Calendly
          auth_from_secret("calendly", token = Sys.getenv("METRICMINER_CALENDLY"))

          # Authorize GitHub
          auth_from_secret("github", token = Sys.getenv("METRICMINER_GITHUB_PAT"))

          # Authorize Google
          auth_from_secret("google",
            refresh_token = Sys.getenv("METRICMINER_GOOGLE_REFRESH"),
            access_token = Sys.getenv("METRICMINER_GOOGLE_ACCESS"),
            cache = TRUE
          )

          ### Now run the R commands you want here or call an R script in a later step.
        shell: Rscript {0}

Session info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     desc_1.4.3        R6_2.5.1          fastmap_1.2.0    
#>  [5] xfun_0.49         cachem_1.1.0      knitr_1.49        htmltools_0.5.8.1
#>  [9] rmarkdown_2.29    lifecycle_1.0.4   cli_3.6.3         pkgdown_2.1.1    
#> [13] sass_0.4.9        textshaping_0.4.0 jquerylib_0.1.4   systemfonts_1.1.0
#> [17] compiler_4.4.2    tools_4.4.2       ragg_1.3.3        evaluate_1.0.1   
#> [21] bslib_0.8.0       yaml_2.3.10       jsonlite_1.8.9    rlang_1.1.4      
#> [25] fs_1.6.5