I’m very excited to announce the first release of {depigner} to CRAN!
Pigna [pìn’n’a] is the Italian word for pine cone. In jargon, it is used to identify a task which is boring, banal, annoying, painful, frustrating and maybe even with a not so beautiful or rewarding result, just like the obstinate act of trying to challenge yourself in extracting pine nuts from a pine cone, provided that, in the end, you will find at least one inside it.
The {depigner} package aims to provide useful functions for solving small everyday problems in coding or data analysis with R. The hope is to offer solutions to the kind of small problems that would normally be solved with quick-and-dirty (ugly and maybe even wrong) patches.
Installation
You can install the released version from CRAN directly by calling:
install.packages("depigner")
If you would like to stay up to date with the latest development version available, you can install it from its source on GitHub by calling:
# install.packages("devtools")
devtools::install_github("CorradoLanera/depigner")
Next, you can attach it to your session as usual by:
library(depigner)
## Welcome to depigner: we are here to un-stress you!
Provided Tools
Tools Category | Function(s) | Aim |
---|---|---|
Harrell’s verse | tidy_summary() | pander-ready data frame from Hmisc::summary() |
 | paired_test_continuous | Paired test for continuous variables in Hmisc::summary() |
 | paired_test_categorical | Paired test for categorical variables in Hmisc::summary() |
 | adjust_p() | Adjusts P-values for multiplicity of tests at tidy_summary() |
 | summary_interact() | Data frame of OR for interaction from rms::lrm() |
 | htypes() | Will your variables be continuous or categorical in Hmisc::describe()? |
Statistical | ci2p() | Get P-value from estimate and confidence interval |
Programming | pb_len() | Quick set-up of a progress::progress_bar() progress bar |
 | install_pkg_set() | Politely install sets of packages (topic-related sets at ?pkg_sets) |
Development | use_ui() | Activate the {usethis} user interface in your own package |
 | please_install() | Politely ask the user to install a package |
 | imported_from() | List packages imported by a package (which has to be installed) |
Telegram | start_bot_for_chat() | Quick start of a {telegram.bot} Telegram bot |
 | send_to_telegram() | Unified wrapper to send someRthing to a Telegram chat |
 | errors_to_telegram() | Divert all your error messages from the console to a Telegram chat |
Why not?! | gdp() | Do you have TOO many pignas on your back?! … try this out ;-) |
Harrell’s Verse Tools
Harrell’s packages {Hmisc} and {rms} are amazing packages for statistical analyses, especially for clinical data and purposes. If you do not know them, you should.
They are so useful and vast that they have earned the label “Harrell’s verse”.
One of the functions I use most is their summary() method, especially with the option method = "reverse" enabled. It is incredibly useful for getting “table one”-like information.
Pigna
Often, my colleagues and I have faced the problem of using the data inside the output of the summary() function, and I have seen a great many colorful patches to manage this issue.
{depigner}: has a pipable tidy_summary() function that outputs a lovely table, already adjusted to be processed by {pander}!
tidy_summary(): produces a data frame from the summary() functions provided by the {Hmisc} (Harrell 2020) and {rms} (Harrell, Jr. 2020) packages, ready to be pander::pander()ed (Daróczi and Tsegelskyi 2018).
Currently it is tested for method reverse only:
library(rms, quietly = TRUE)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
options(datadist = 'dd')
library(survival)
library(pander)
dd <- datadist(iris)
my_summary <- summary(Species ~., data = iris, method = "reverse")
tidy_summary(my_summary) %>%
  pander()
 | setosa (N=50) | versicolor (N=50) | virginica (N=50) |
---|---|---|---|
Sepal.Length | 4.800/5.000/5.200 | 5.600/5.900/6.300 | 6.225/6.500/6.900 |
Sepal.Width | 3.200/3.400/3.675 | 2.525/2.800/3.000 | 2.800/3.000/3.175 |
Petal.Length | 1.400/1.500/1.575 | 4.000/4.350/4.600 | 5.100/5.550/5.875 |
Petal.Width | 0.2/0.2/0.3 | 1.2/1.3/1.5 | 1.8/2.0/2.3 |
dd <- datadist(heart)
surv <- Surv(heart$start, heart$stop, heart$event)
f <- cph(surv ~ age + year + surgery, data = heart)
my_summary <- summary(f)
tidy_summary(my_summary) %>%
  pander()
 | Diff. | HR | Lower 95% CI | Upper 95% CI |
---|---|---|---|---|
age | 10.69 | 1.336 | 1.009 | 1.767 |
year | 3.374 | 0.6104 | 0.3831 | 0.9727 |
surgery | 1 | 0.5286 | 0.2574 | 1.085 |
Pigna
What if the data you would like to summary()ze contain paired samples?
{depigner}: provides the required object to pass to the summary() function to perform a paired test, both for continuous and categorical data with two or more groups.[1]
paired_test_*(): Paired test for categorical/continuous variables to be used in the summary() of the {Hmisc} (Harrell 2020) package:
data(Arthritis)
# categorical -------------------------
## two groups
summary(Treatment ~ Sex,
  data = Arthritis,
  method = "reverse",
  test = TRUE,
  catTest = paired_test_categorical
)
##
##
## Descriptive Statistics by Treatment
##
## +----------+--------------------+--------------------+------------------------------+
## | |Placebo |Treated | Test |
## | |(N=43) |(N=41) |Statistic |
## +----------+--------------------+--------------------+------------------------------+
## |Sex : Male| 26% (11)| 34% (14)|Chi-square=5.92 d.f.=1 P=0.015|
## +----------+--------------------+--------------------+------------------------------+
## more than two groups
summary(Improved ~ Sex,
  data = Arthritis,
  method = "reverse",
  test = TRUE,
  catTest = paired_test_categorical
)
##
##
## Descriptive Statistics by Improved
##
## +----------+-----------------+-----------------+-----------------+------------------------+
## | |None |Some |Marked | Test |
## | |(N=42) |(N=14) |(N=28) |Statistic |
## +----------+-----------------+-----------------+-----------------+------------------------+
## |Sex : Male| 40% (17)| 14% ( 2)| 21% ( 6)|chi2=1.71 d.f.=3 P=0.634|
## +----------+-----------------+-----------------+-----------------+------------------------+
# continuous --------------------------
## two groups
summary(Species ~.,
  data = iris[iris$Species != "setosa",],
  method = "reverse",
  test = TRUE,
  conTest = paired_test_continuous
)
##
##
## Descriptive Statistics by Species
##
## +------------+---------------------+---------------------+------------------------+
## | |versicolor |virginica | Test |
## | |(N=50) |(N=50) |Statistic |
## +------------+---------------------+---------------------+------------------------+
## |Sepal.Length| 5.600/5.900/6.300| 6.225/6.500/6.900| t=-5.28 d.f.=49 P<0.001|
## +------------+---------------------+---------------------+------------------------+
## |Sepal.Width | 2.525/2.800/3.000| 2.800/3.000/3.175| t=-3.08 d.f.=49 P=0.003|
## +------------+---------------------+---------------------+------------------------+
## |Petal.Length| 4.000/4.350/4.600| 5.100/5.550/5.875|t=-12.09 d.f.=49 P<0.001|
## +------------+---------------------+---------------------+------------------------+
## |Petal.Width | 1.2/1.3/1.5 | 1.8/2.0/2.3 |t=-14.69 d.f.=49 P<0.001|
## +------------+---------------------+---------------------+------------------------+
## more than two groups
summary(Species ~.,
  data = iris,
  method = "reverse",
  test = TRUE,
  conTest = paired_test_continuous
)
##
##
## Descriptive Statistics by Species
##
## +------------+--------------------+--------------------+--------------------+-----------------------+
## | |setosa |versicolor |virginica | Test |
## | |(N=50) |(N=50) |(N=50) |Statistic |
## +------------+--------------------+--------------------+--------------------+-----------------------+
## |Sepal.Length| 4.800/5.000/5.200| 5.600/5.900/6.300| 6.225/6.500/6.900| F=30.55 d.f.=2 P<0.001|
## +------------+--------------------+--------------------+--------------------+-----------------------+
## |Sepal.Width | 3.200/3.400/3.675| 2.525/2.800/3.000| 2.800/3.000/3.175| F=12.63 d.f.=2 P<0.001|
## +------------+--------------------+--------------------+--------------------+-----------------------+
## |Petal.Length| 1.400/1.500/1.575| 4.000/4.350/4.600| 5.100/5.550/5.875|F=322.89 d.f.=2 P<0.001|
## +------------+--------------------+--------------------+--------------------+-----------------------+
## |Petal.Width | 0.2/0.2/0.3 | 1.2/1.3/1.5 | 1.8/2.0/2.3 |F=234.21 d.f.=2 P<0.001|
## +------------+--------------------+--------------------+--------------------+-----------------------+
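As a point of reference for the two-group continuous output above, here is a minimal base-R sketch of a paired t-test. It assumes, purely for illustration, that the i-th versicolor flower is paired with the i-th virginica flower. It is not the internal code of paired_test_continuous, and the pairing convention is an assumption (see ?paired_test_continuous for how the data must actually be arranged).

```r
# Illustrative pairing (an assumption): the i-th versicolor row is
# paired with the i-th virginica row.
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
virginica  <- iris$Sepal.Length[iris$Species == "virginica"]

# Base-R paired t-test on the 50 pairs.
res <- t.test(versicolor, virginica, paired = TRUE)

res$parameter  # degrees of freedom: number of pairs minus one
res$p.value    # two-sided P-value
```

With 50 pairs the test has 49 degrees of freedom, which is the d.f. reported in the two-group table.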
Pigna
How often were you asked to provide “all vs. all” checks and tests to find that unique randomly sampled P-value, which is less than 0.05 by something of the order of \(10^{-15}\)?
{depigner}: gives you a function to automatically adjust for multiplicity the P-values reported in the summary() tables.
adjust_p(): Adjust the P-values of a tidy_summary object:
my_summary <- summary(Species ~., data = iris,
  method = "reverse",
  test = TRUE
)
tidy_summary(my_summary, prtest = "P") %>%
  adjust_p() %>%
  pander()
## ✓ P adjusted with BH method.
 | setosa (N=50) | versicolor (N=50) | virginica (N=50) | P-value |
---|---|---|---|---|
Sepal.Length | 4.800/5.000/5.200 | 5.600/5.900/6.300 | 6.225/6.500/6.900 | <0.001 |
Sepal.Width | 3.200/3.400/3.675 | 2.525/2.800/3.000 | 2.800/3.000/3.175 | <=0.001 |
Petal.Length | 1.400/1.500/1.575 | 4.000/4.350/4.600 | 5.100/5.550/5.875 | <=0.001 |
Petal.Width | 0.2/0.2/0.3 | 1.2/1.3/1.5 | 1.8/2.0/2.3 | <=0.001 |
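The message above mentions the BH (Benjamini-Hochberg) method. To see what that adjustment does to a vector of P-values, here is a small sketch using base R’s p.adjust(). The raw P-values are made up for illustration, and whether adjust_p() calls p.adjust() internally is an assumption, not something stated in this post.

```r
# Made-up raw P-values, for illustration only.
p_raw <- c(0.001, 0.012, 0.03, 0.04, 0.20)

# Benjamini-Hochberg adjustment, controlling the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

# Raw and adjusted values side by side.
data.frame(raw = p_raw, adjusted = p_bh)
```

An adjusted P-value is never smaller than the raw one, so some “significant” findings from an “all vs. all” screening may stop being so after the adjustment.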
Producing a lovely, concise, printable table when interactions kick into the model is the aim of the summary_interact() function.
summary_interact(): Produce a data frame of OR (with the corresponding 95% CI) for the interactions between different combinations of a continuous variable (for which it is possible to define the reference and the target values) and (every or a selection of the levels of) a categorical one in a logistic model provided by lrm() (from the {rms} package (Harrell, Jr. 2020)):
summary_interact(lrm_mod, age, abo) %>%
  pander()
 | Low | High | Diff. | Odds Ratio | Lower 95% CI | Upper 95% CI |
---|---|---|---|---|---|---|
age - A | 43 | 58 | 15 | 1.002 | 0.557 | 1.802 |
age - B | 43 | 58 | 15 | 1.817 | 0.74 | 4.463 |
age - AB | 43 | 58 | 15 | 0.635 | 0.186 | 2.169 |
age - O | 43 | 58 | 15 | 0.645 | 0.352 | 1.182 |
summary_interact(lrm_mod, age, abo, p = TRUE) %>%
  pander()
 | Low | High | Diff. | Odds Ratio | Lower 95% CI | Upper 95% CI | P-value |
---|---|---|---|---|---|---|---|
age - A | 43 | 58 | 15 | 1.002 | 0.557 | 1.802 | 0.498 |
age - B | 43 | 58 | 15 | 1.817 | 0.74 | 4.463 | 0.137 |
age - AB | 43 | 58 | 15 | 0.635 | 0.186 | 2.169 | 0.728 |
age - O | 43 | 58 | 15 | 0.645 | 0.352 | 1.182 | 0.883 |
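The calls above use an lrm_mod object that this post does not define. Here is a sketch of how such a model might be fitted. It assumes the transplant data shipped with {survival}, which has age and abo columns, and a hypothetical model specification, so the numbers it produces need not match the tables shown here.

```r
library(rms)

# Assumption: the `transplant` data from {survival}; the post does not
# say which data set was used.
data("transplant", package = "survival")

# Keep only non-censored records and drop the now-empty factor level.
transplant <- droplevels(transplant[transplant$event != "censored", ])

# {rms} needs a datadist object to compute summaries and contrasts.
dd <- datadist(transplant)
options(datadist = "dd")

# Logistic model with an age-by-blood-group interaction
# (hypothetical specification, for illustration only).
lrm_mod <- lrm(
  event ~ rcs(age, 3) * (sex + abo) + rcs(year, 3),
  data = transplant
)
```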
Pigna
Another super useful function provided by Hmisc is describe(), which has a lovely plot() method for its results. I ran into some issues when I needed to use those plots programmatically, because their output is not that clear or easy to predict before seeing it. For example, suppose you want to know which variables will be plotted, and in which plot (or even whether a plot will be produced at all). Hence, working with {Hmisc}, it is crucial to know which variables Harrell considers continuous or categorical. Among them, it is also necessary to know which ones he considers to carry enough (good) information to be plotted.
{depigner} solves those problems with its family of htypes() functions, which seem to give “reasonable” results… officially:
I only had time to glance at it; it looks reasonable. You can also just analyze the describe() source code :-)
— Frank Harrell (@f2harrell) June 5, 2020
htypes() and friends: get/check the types of variables with respect to the {Hmisc} ecosystem (Harrell 2020).
htypes(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## "con" "none" "con" "con" "con" "con" "con" "cat" "cat" "none" "none"
desc <- Hmisc::describe(mtcars)
htypes(desc)
## mpg cyl disp hp drat wt qsec vs am gear carb
## "con" "none" "con" "con" "con" "con" "con" "cat" "cat" "none" "none"
htype(desc[[1]])
## [1] "con"
is_hcat(desc[[1]])
## [1] FALSE
is_hcon(desc[[1]])
## [1] TRUE
Statistical Tools
Pigna
Some people can, and prefer to, read more information into a (single) value than into two…
{depigner}: has a wrapper to provide an answer for them.
ci2p(): compute the P-value related to a provided confidence interval (under all the necessary assumptions):
ci2p(1.125, 0.634, 1.999, log_transform = TRUE)
## [1] 0.367902
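To make the idea of going from an interval back to a P-value concrete, here is a sketch of the textbook normal-approximation route: recover the standard error from the width of the interval, build a z statistic, and read off the two-sided P-value. The helper p_from_ci() is hypothetical and not part of {depigner}; ci2p() may use a different approximation, so its results need not coincide with this sketch.

```r
# Hypothetical helper (NOT part of {depigner}): two-sided P-value from an
# estimate and its confidence interval, under a normal approximation.
p_from_ci <- function(est, lower, upper,
                      log_transform = FALSE, conf_level = 0.95) {
  # Ratios (OR, HR, RR) are approximately normal on the log scale.
  if (log_transform) {
    est <- log(est)
    lower <- log(lower)
    upper <- log(upper)
  }
  # The interval spans 2 * z_crit standard errors.
  z_crit <- stats::qnorm(1 - (1 - conf_level) / 2)
  se <- (upper - lower) / (2 * z_crit)
  z <- est / se
  2 * stats::pnorm(-abs(z))
}

# A ratio of 2 whose 95% CI (1 to 4) just touches the null value 1.
p_from_ci(2, 1, 4, log_transform = TRUE)
```

In this example the lower bound of the 95% interval sits exactly on the null value, so the sketch returns 0.05 (up to floating-point error), as expected.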
Programming Tools
Pigna
Progress bars provided by {progress} are great, complete, and powerful. They are also quite easy to set up, but for effortless usage (which is needed quite often) an even easier-to-use wrapper could be useful.[2]
{depigner}: helps you with this super simple wrapper. You only have to know the length (i.e., the number of steps) the progress bar should cover overall, and that’s it!
pb_len(): Progress bar of given length, a wrapper from the {progress} (Csárdi and FitzJohn 2019) package:
pb <- pb_len(100)
for (i in 1:100) {
  Sys.sleep(0.1)
  tick(pb, paste("i = ", i))
}
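For comparison, this is roughly what a plain {progress} setup looks like without a wrapper. It is only a sketch: the format string and the custom :what token are illustrative choices, not the ones pb_len() uses.

```r
library(progress)

n_steps <- 20

# Explicit set-up: a format string with a custom `:what` token, plus the
# total number of steps.
pb <- progress_bar$new(
  format = "[:bar] :percent :what",
  total = n_steps
)

for (i in seq_len(n_steps)) {
  Sys.sleep(0.01)
  # Advance by one step and fill in the custom token.
  pb$tick(tokens = list(what = paste("i =", i)))
}
```

Note that {progress} normally draws the bar only in interactive sessions.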
Pigna
How often have you (re)installed the same set of packages on (your or other people’s) computers? You need to remember which packages you or they need, you may forget one of them, and then… you have to repeat it all on the next computer!
{depigner}: gives you a function and a collection of predefined sets of packages, which you can use to install all the packages you need in a single call. Moreover, it does so politely, asking the user whether they agree to the installation and, possibly, to a general update.
install_pkg_set(): Simple and polite wrapper to install sets of packages. Moreover, {depigner} provides some sets already defined for common scenarios in R (analyses, production, documenting, …). See them by calling ?pkg_sets.
install_pkg_set() # this installs the whole `?pkg_all`
install_pkg_set(pkg_stan)
?pkg_sets
Development Tools
Pigna
When developing a package, using the {usethis} user interface would be fantastic. However, importing {usethis} may not be enough: you still need to write the calls to its functions explicitly, e.g., you could face the problem of writing something like
usethis::ui_stop('field {usethis::ui_field("foo")} must has value {usethis::ui_value(value)}, ...')
Which is not that nice, is it?
{depigner}: helps you, with a single call in the {usethis} style, to add the whole {usethis} UI to your NAMESPACE, allowing you to go straight to something like
ui_stop('field {ui_field("foo")} must has value {ui_value(value)}, ...')
use_ui(): Use the {usethis} user interface (Wickham and Bryan 2020) in your package
# in the initial setup steps of the development of a package
use_ui()
Pigna
When you need to write code that others should run, it is not kind to force the installation or the update of packages in their environments.
{depigner}: thanks to a prompt given by He The Hadley during a fantastic workshop in NYC, gives you a function to politely ask others to install the packages you need on their system.
please_install(): This is a polite wrapper to install.packages(), inspired (= with very minimal modifications) by a function Hadley showed us during a course.
# pick one package available on CRAN that is not installed yet
a_pkg_i_miss <- setdiff(
  rownames(available.packages()),
  rownames(installed.packages())
)[[1]]
please_install(a_pkg_i_miss)
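The general “ask before installing” pattern can be sketched in base R as follows. The helper ask_to_install() is hypothetical and is not the implementation of please_install(); it only illustrates the idea of never installing anything without the user’s explicit consent.

```r
# Hypothetical helper (NOT the {depigner} implementation): install a
# package only after the user has agreed.
ask_to_install <- function(pkg) {
  # Nothing to do if the package is already available.
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(TRUE))
  }
  # Never install silently in scripts or batch jobs.
  if (!interactive()) {
    stop("Package '", pkg, "' is required; please install it.", call. = FALSE)
  }
  ok <- utils::askYesNo(paste0("Package '", pkg, "' is missing. Install it?"))
  if (!isTRUE(ok)) {
    stop("Package '", pkg, "' is required but was not installed.", call. = FALSE)
  }
  utils::install.packages(pkg)
  invisible(TRUE)
}

# Already installed (it ships with R): returns TRUE without prompting.
ask_to_install("stats")
```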
Pigna
When you install or attach packages, often (almost always), other packages (i.e., dependencies) are installed/attached too. Where do they come from? Do we need to install two packages explicitly, or is one of them sufficient because the other will be automatically installed as a consequence?
{depigner}: has a simple single function to give you a quick answer to those questions.[3]
imported_from(): If you would like to know which packages are imported by a package (e.g., to know which packages are required for its installation or are installed along with it), you can use this function:
imported_from("depigner")
## [1] "desc" "dplyr" "fs" "ggplot2" "Hmisc"
## [6] "magrittr" "progress" "purrr" "rlang" "rms"
## [11] "rprojroot" "stats" "stringr" "telegram.bot" "tibble"
## [16] "tidyr" "usethis" "utils"
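The same kind of information can be read from the Imports field of an installed package’s DESCRIPTION file. The helper imports_of() below is a hypothetical base-R sketch of that idea; whether imported_from() works this way internally is an assumption.

```r
# Hypothetical helper (NOT the {depigner} implementation): names of the
# packages listed in the Imports field of an installed package.
imports_of <- function(pkg) {
  imports <- utils::packageDescription(pkg, fields = "Imports")
  # No Imports field at all: nothing is imported.
  if (is.na(imports)) {
    return(character(0))
  }
  pkgs <- strsplit(imports, ",")[[1]]
  # Drop version requirements such as "(>= 1.0.0)" and stray whitespace.
  pkgs <- gsub("\\s*\\([^)]*\\)", "", pkgs)
  trimws(pkgs)
}

imports_of("stats")
```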
Telegram Tools
Telegram… I love it![4]
Pigna
Have you ever run some long, very long, super long, extra long computation… on a server? How many times did you need to go back to your PC, or to your ssh + screen session, to check its state, to check the logs, to see whether some errors had evilly happened?
{depigner}: provides you with some super easy-to-use wrappers around some functionalities of the {telegram.bot} package, which let you be notified with custom messages, images, and even errors directly on your phone. You only need to define your bot (instructions in the help page), and you are ready to go!
- Wrappers for simple use of Telegram bots, from the {telegram.bot} package (Benedito 2019):
# Set up a Telegram bot. read `?start_bot_for_chat`
start_bot_for_chat()
# Send something to telegram
send_to_telegram("hello world")
library(ggplot2)
gg <- ggplot(mtcars, aes(x = mpg, y = hp, colour = cyl)) +
  geom_point()
send_to_telegram(
  "following an `mtcars` coloured plot",
  parse_mode = "Markdown"
)
send_to_telegram(gg)
# Divert output errors to the telegram bot
errors_to_telegram()
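How errors_to_telegram() works internally is not described in this post. The sketch below only shows the general idea of routing an error message to a notifier instead of letting it stop at the console. Both with_error_notification() and its notify argument are hypothetical; in practice notify could be a function such as send_to_telegram().

```r
# Hypothetical helper (NOT the {depigner} implementation): evaluate an
# expression and pass any error message to a notifier function.
with_error_notification <- function(expr, notify = message) {
  tryCatch(
    expr,
    error = function(e) {
      notify(paste("Error:", conditionMessage(e)))
      invisible(NULL)
    }
  )
}

# The error is reported through `notify` and the session keeps running.
with_error_notification(stop("model did not converge"))
```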
Why Not?!
Pigna
Sometimes pignas are too many to be solved, and you just have to give in to them.
{depigner}: is near you.[5]
gdp()
: A wrapper to relax.
gdp(7)
Acknowledgements
The {depigner} logo was lovingly designed by Elisa Sovrano.
References
[1] Pay particular attention to the ?paired_test_* documentation of interest, because data should be provided in a specific way due to the internal management of records in {Hmisc}’s functions.
[2] Note: currently, after erum2020, I am considering switching to the {progressr} (https://github.com/HenrikBengtsson/progressr) package. However, for the moment, I still consider {progress} and our {depigner} wrapper as simple to use as they are powerful for most cases.
[3] Note: it reports only the packages imported and not the suggested ones.
[4] Do not ask me why… It is wonderful! That’s it!
[5] Why “gdp”?… no one knows it…