metricminer
metricminer
is an R package that helps you mine metrics
on common places on the web through the power of their APIs.
It also helps make the data in a format that is easily used for a
dashboard or other purposes. It will have an associated dashboard
template and tutorials to help you fully use the data you retrieve with
metricminer
(but these are still under development!)
You can read the metricminer package documentation here.
Apps supported
Currently metricminer
supports mining data from:
- Calendly
- GitHub
- Google Analytics
- Google Forms
- Slido export files stored on Googledrive
Data format options
metricminer
attempts to retrieve API data for you and
give you it to you in a format that is a tidy data.frame. this means
metricminer has to be opinionated about what metrics it returns so it
fits in a useful and human ready to read data frame.
If you find that the data returned is not what you need you have two options (these options can be pursued concurrently):
- You can set the
dataformat
argument to"raw"
to see the original, unedited JSON formatted data as it was returned from the API. Then you can personally look for the data that you want and extract it. - You can post a GitHub issue to explain why the metric missing from
the data frame formatted data should be included. And if possible and
reasonable, we can work on including that data in the next version of
metricminer
.
How to install
You can install metricminer from CRAN.
install.packages("metricminer")
If you want the development version (not advised) you can install
using the remotes
package to install from GitHub.
if (!("remotes" %in% installed.packages())) {
install.packages("remotes")
}
remotes::install_github("fhdsl/metricminer")
library(metricminer)
Basic Usage
To start, you need to authorize()
the package to access
your data. If you run authorize()
you will be asked which
app you’d like to authorize and whether you’d like to cache that auth
information. If you already know which app you’d like to authorize, like
google
for example, you can run
authorize("google")
.
Then follow the instructions on the upcoming screens and select the scopes you feel comfortable sharing (you generally just need read permissions for metricminer to be able to collect data).
authorize()
If you want to clear out authorizations and caches stored by
metricminer
you can run:
delete_creds()
GitHub
You can retrieve metrics from a repository on GitHub doing this:
authorize("github")
metrics <- get_github_repo_summary(repo = "fhdsl/metricminer")
authorize("github")
metrics <- get_github_repo_timecourse(repo = "fhdsl/metricminer")
Calendly
You can retrieve calendly events information using this type of workflow:
authorize("calendly")
user <- get_calendly_user()
events <- list_calendly_events(user = user$resource$uri)
Google Analytics
You can retrieve Google Analytics data for websites like this.
First you have to retrieve your account information after you’ve authorized.
authorize("google")
accounts <- get_ga_user()
Then you need to retrieve the properties (aka usually the websites you are tracking) underneath that account.
properties_list <- get_ga_properties(account_id = accounts$id[1])
Just need to shave off the properties/
bit from this
string.
property_id <- gsub("properties/", "", properties_list$properties$name[1])
Now we can collect some stats.
In Google Analytics metrics
are your basic numbers (how
many visits to your website, etc.).
metrics <- get_ga_stats(property_id, stats_type = "metrics")
Whereas dimensions
are more a list of events that have
happened. So here’s a list of people that have logged on.
dimensions <- get_ga_stats(property_id, stats_type = "dimensions")
Lastly, we have a third option of collecting link_clicks
and the links they have clicked. This is also known as a dimension
according to Google analytics, but often it isn’t compatible for us to
download link click data at the same time as other dimension data so in
metricminer
we collect them separately.
link_clicks <- get_ga_stats(property_id, stats_type = "link_clicks")
Google Forms
You can retrieve Google form information and responses like this:
authorize("google")
form_url <- "https://docs.google.com/forms/d/1Z-lMMdUyubUqIvaSXeDu1tlB7_QpNTzOk3kfzjP2Uuo/edit"
form_info <- get_google_form(form_url)
Slido
If you have used Slido for interactive slide sessions and collected
that info and exported it to your googledrive you can use
metricminer
to collect that data as well.
drive_id <- "https://drive.google.com/drive/folders/0AJb5Zemj0AAkUk9PVA"
slido_data <- get_slido_files(drive_id)
Youtube
If you have a channel and the URL is https://www.youtube.com/channel/a_bunch_of_letters_here
Then you can extract stats for the videos on that youtube channel using that URL.
authorize("google")
youtube_stats <- get_youtube_stats("a_bunch_of_letters_here")
Bulk Retrievals
Maybe you just want to retrieval it ALL. We have som wrapper functions that will attempt to do this for you. These functions are a bit more precarious/risky in that there may be reasons certain websites/repos/events/data may not be able to be collected. So collecting repositories one by one will allow you more insight into what is happening.
However, these bulk retrieval functions may help you if you want to grab ALL of your accounts data in one swoop. Just make sure to carefully look over and curate that data after it is attempted to be collected. You may find some retrievals are empty for potentially good reasons (for example if a google form has no responses to collect it will show up with “no responses” in the respective part of the list).
GitHub bulk
From GitHub you can attempt to collect repository metrics from all repositories from an account.
authorize("github")
all_repos_metrics <- get_multiple_repos_metrics(owner = "fhdsl")
If you want to do this by giving a list of specific repositories you want data from you can just provide a vector of those repository’s names like this:
repo_names <- c("fhdsl/metricminer", "jhudsl/OTTR_Template")
some_repos_metrics <- get_multiple_repos_metrics(repo_names = repo_names)
Google Analytics bulk
Similar to single website retrieval we need to authorize the package.
authorize("google")
accounts <- get_ga_user()
Then we can provide the account id to
get_multiple_ga_metrics
and it will attempt to grab all
stats for all website properties underneath the provided account.
account_stats_list <- get_multiple_ga_metrics(account_id = 209776907)
stats_list <- stats_list <- get_multiple_ga_metrics(property_ids = c(422671031, 422558989))
Google Forms bulk
As always, we need to authorize the app.
authorize("google")
We can retrieve a list of form ids using googledrive
R
package.
form_list <- googledrive::drive_find(
shared_drive = googledrive::as_id("0AJb5Zemj0AAkUk9PVA"),
type = "form")
Now we can provide this vector of form ids to
get_multiple_forms
multiple_forms <- get_multiple_forms(form_ids = form_list$id)
Non-interactive authorizing from secrets
If you’d like to authorize non-interactively (whether on GitHub
actions or locally) you can set your tokens using
Sys.setenv()
Setting Calendly auth from secret
You can go here to get an API key. You likely will have to login first.
Then you can store this by putting your API key in this type of command:
Sys.setenv(METRICMINER_CALENDLY = "Put calendly token here")
Now in your script if you run the following, you will have authorization to Calendly.
auth_from_secret("calendly", token = Sys.getenv("METRICMINER_CALENDLY"))
Setting GitHub auth from secret
Similar steps can be done for the GitHub personal access token.
First go here to get a GitHub PAT. You will likely have to login first.
Then you can run this command but put your GitHub PAT there.
Sys.setenv(METRICMINER_GITHUB_PAT = "Put GitHub PAT here")
Now in your script if you run the following, you will have authorization to GitHub.
# Authorize GitHub
auth_from_secret("github", token = Sys.getenv("METRICMINER_GITHUB_PAT"))
Setting Google auth from secret
For Google you can authorize from secret by doing the normal
interactive way using authorize("google")
but storing the
result like this:
token <- authorize("google")
Then you can use this object to extract two secrets by printing them out like this:
token$credentials$access_token
token$credentials$refresh_token
Then you can set these in your environment doing the same steps as before:
Sys.setenv(METRICMINER_GOOGLE_ACCESS = "Google access token here")
Sys.setenv(METRICMINER_GOOGLE_REFRESH = "Google refresh token here")
Now in your script if you run the following you will have authorization to Google Apps.
# Authorize Google
auth_from_secret("google",
refresh_token = Sys.getenv("METRICMINER_GOOGLE_REFRESH"),
access_token = Sys.getenv("METRICMINER_GOOGLE_ACCESS"),
cache = TRUE
)
Authorizing on GitHub Actions
In GitHub you can run metricminer
using authorization if
you use the above steps to retrieve the necessary keys but then store
them each as GitHub Secrets.
Read here about how to store GitHub secrets
You’ll need the secrets to be stored as the respective key name we’ve referenced above:
METRICMINER_CALENDLY
METRICMINER_GITHUB_PAT
METRICMINER_GOOGLE_REFRESH
METRICMINER_GOOGLE_ACCESS
Then in your GitHub action yaml you’ll need something like this to extract and authorize these secrets in the environment.
- name: Authorize metricminer
env:
METRICMINER_CALENDLY: ${{ secrets.METRICMINER_CALENDLY }}
METRICMINER_GITHUB_PAT: ${{ secrets.METRICMINER_GITHUB_PAT }}
METRICMINER_GOOGLE_ACCESS: ${{ secrets.METRICMINER_GOOGLE_ACCESS }}
METRICMINER_GOOGLE_REFRESH: ${{ secrets.METRICMINER_GOOGLE_REFRESH }}
run: |
# Authorize Calendly
auth_from_secret("calendly", token = Sys.getenv("METRICMINER_CALENDLY"))
# Authorize GitHub
auth_from_secret("github", token = Sys.getenv("METRICMINER_GITHUB_PAT"))
# Authorize Google
auth_from_secret("google",
refresh_token = Sys.getenv("METRICMINER_GOOGLE_REFRESH"),
access_token = Sys.getenv("METRICMINER_GOOGLE_ACCESS"),
cache = TRUE
)
### Now run the R commands you want here or call an R script in a later step.
shell: Rscript {0}
Session info
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 desc_1.4.3 R6_2.5.1 fastmap_1.2.0
#> [5] xfun_0.47 cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1
#> [9] rmarkdown_2.28 lifecycle_1.0.4 cli_3.6.3 pkgdown_2.1.1
#> [13] sass_0.4.9 textshaping_0.4.0 jquerylib_0.1.4 systemfonts_1.1.0
#> [17] compiler_4.4.1 tools_4.4.1 ragg_1.3.3 evaluate_1.0.0
#> [21] bslib_0.8.0 yaml_2.3.10 jsonlite_1.8.9 rlang_1.1.4
#> [25] fs_1.6.4