**r_packages** (46)

tidygraph 1.1 – A tidy hope

february 2018 by hallucigenia

I am very pleased to tell you that the next version of tidygraph (1.1) is now available on CRAN. This is not a bug-fix release, nor a change-it-all release, but rather a more-of-it-all release, and in this post I’m going to tell you all about it.

The idea of tidygraph

Before we enter the goldmine of new features that make up this release, I’m going to talk a bit about my reasons for making tidygraph and what I want it to become. These ideas have been rummaging around in my head for a while and have taken more form as I prepared for my RStudio::conf 2018 talk. They will probably be fleshed out even more in a (series of) blog post(s), or — dare I say — a book, but you’ll get the earliest version of it here…

Network analysis is daunting… Sure, if you have spent the better part of your life working with it (I haven’t), it might seem like second nature, but for most people it will be an area they enter late, unprepared, and already well-versed in the manners of rectangular data analysis. For many, the instinct will be to quickly produce a plot, which will often end up creating very little insight due to the curse of the hairball, and they will leave the world of network analysis with a sense of broken promises. While all of this sounds overly melodramatic, I honestly feel that the tools we use to do network analysis can do better in guiding the user towards a meaningful network analysis workflow, and I hope that tidygraph (and ggraph) will prove to be a decent attempt at that.

With tidygraph I set out to make it easier to get your data into a graph and perform common transformations on it, but the aim has expanded since its inception. The goal of tidygraph is to empower the user to formulate complex questions regarding relational data as simple steps, thus enabling them to retrieve insights directly from the data itself. The central idea this all boils down to is this: you don’t have to plot a network to understand it. While I absolutely love the field of network visualisation, it is in many ways overused in data science — especially when it comes to extracting knowledge from a network. Just as you don’t need a plot to tell you which car in a dataset is the fastest, you don’t need a plot to tell you which pair of friends are the closest. What you do need, instead of a plot, is a tool that allows you to formulate your question as a logical sequence of operations. For many people in the world of rectangular data, this tool is increasingly dplyr (and friends), and I do hope that tidygraph can take on the same role in the world of relational data.

This is not just about preparing your data for a plot — this is about answering questions.
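
As a flavour of what that looks like in practice, here is a small sketch (my own, not from the post) that answers a question about a network with dplyr-style verbs alone, no plot required:

```r
library(tidygraph)
library(dplyr)

# Zachary's karate club network, available through create_notable().
# Which members are the most central? Ask with mutate()/arrange(),
# exactly as you would on a rectangular data frame.
create_notable("zachary") %>%
  activate(nodes) %>%
  mutate(centrality = centrality_degree()) %>%
  arrange(desc(centrality)) %>%
  as_tibble() %>%
  head()
```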

statistics:networks
R_packages
R
tidyverse
visualization
data_management

mjskay/tidybayes: Bayesian analysis + tidy data + geoms (R package)

statistics:bayesian
tidyverse
R
r_packages
visualization
to_try

february 2018 by hallucigenia

tidybayes is an R package that aims to make it easy to integrate popular Bayesian modelling methods into a tidy data + ggplot workflow.

Tidy data frames (one observation per row) are particularly convenient for use in a variety of R data manipulation and visualization packages. However, when using MCMC / Bayesian samplers like JAGS or Stan in R, we often have to translate this data into a form the sampler understands, and then after running the model, translate the resulting sample ...
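
In outline, the workflow the package aims for looks like the following (a hypothetical sketch, shown as comments because it needs a fitted model object to run):

```r
library(tidybayes)
library(dplyr)

# Assuming `fit` is a model fitted with Stan or JAGS that has a coefficient
# vector b indexed by condition, spread_draws() turns the posterior samples
# into a tidy frame with one draw per row, ready for dplyr and ggplot2:
# fit %>%
#   spread_draws(b[condition]) %>%
#   group_by(condition) %>%
#   summarise(estimate = median(b))
```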


Tidyverse: fs 1.0.0

january 2018 by hallucigenia

fs provides a cross-platform, uniform interface to file system operations. fs uses libuv under the hood, which gives a rock solid cross-platform interface to the filesystem.
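
A minimal sketch of the interface (function names from the fs documentation; the temp-directory layout is my own):

```r
library(fs)

# create, list, and inspect files with consistent, vectorised verbs
tmp <- dir_create(path(tempdir(), "fs-demo"))
file_create(path(tmp, "notes.txt"))
dir_ls(tmp)                              # returns a typed fs_path vector
file_info(path(tmp, "notes.txt"))$size   # tidy data frame of file metadata
```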

R
R_packages
tidyverse
computers
Programming
to_try

mgcViz: visual tools for GAMs

december 2017 by hallucigenia

The mgcViz R package offers visual tools for Generalized Additive Models (GAMs). The visualizations provided by mgcViz differ from those implemented in mgcv, in that most of the plots are based on ggplot2's powerful layering system. This has been implemented by wrapping several ggplot2 layers and integrating them with computations specific to GAM models. Further, mgcViz uses binning and/or sub-sampling to produce plots that can scale to large datasets (n = O(10^7)), and offers a variety of new methods for visual model checking/selection.
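
A minimal sketch along the lines of the package README (the simulated data and formula are illustrative):

```r
library(mgcv)
library(mgcViz)

dat <- gamSim(1, n = 1000, dist = "normal")        # mgcv's simulated example data
fit <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)
viz <- getViz(fit)                                 # wrap the fitted GAM
# build the first smooth's plot from ggplot2-style layers:
plot(sm(viz, 1)) + l_fitLine() + l_ciLine() + l_points()
```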

R
R_packages
visualization
statistics:gams
to_try
ggplot2
github

officer: office documents from R

november 2017 by hallucigenia

"The officer package lets R users manipulate Word (.docx) and PowerPoint (*.pptx) documents. In short, one can add images, tables and text into documents from R. An initial document can be provided, contents, styles and properties of the original document will then be available."

R
R_packages
to_try
word_processors
reproducible_research

strict R package

november 2017 by hallucigenia

The goal of strict is to make R behave a little more strictly, making base functions more likely to throw an error rather than return potentially ambiguous results.

library(strict) forces you to confront potential problems now, instead of in the future. This has both pros and cons: often you can most easily fix a potential ambiguity when you're working on the code (rather than in six months time when you've forgotten how it works), but it also forces you to resolve ambiguities that might never occur with your code/data.
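
One concrete example of the kind of ambiguity at stake (my illustration, not from the announcement): sapply() silently changes its return type with its input, while vapply() declares the expected type up front.

```r
# sapply() returns list() when given zero columns, a numeric vector otherwise:
sapply(mtcars[0], mean)
sapply(mtcars, mean)

# vapply() pins the output type, so surprises become errors instead:
vapply(mtcars, mean, numeric(1))
```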

reproducible_research
Programming
Programming:best_practices
R
R_packages

Glue 1.2.0

november 2017 by hallucigenia

glue 1.2.0 is now available on CRAN! glue is designed to make it easy to interpolate (“glue”) your data into strings. Compared to equivalents like paste() and sprintf() it is easier to write and less time consuming to maintain. It also has no non-base dependencies so is easy to include in packages.
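
For instance (my own toy strings):

```r
library(glue)

name <- "glue"
version <- "1.2.0"
glue("{name} {version} is now on CRAN!")
# versus the paste0() equivalent, which is noisier to read and maintain:
paste0(name, " ", version, " is now on CRAN!")
```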

R_packages
R
text_analysis
Programming
programming:strings

Introduction to Network Analysis with R

november 2017 by hallucigenia

"Over a wide range of fields network analysis has become an increasingly popular tool for scholars to deal with the complexity of the interrelationships between actors of all sorts. The promise of network analysis is the placement of significance on the relationships between actors, rather than seeing actors as isolated entities. The emphasis on complexity, along with the creation of a variety of algorithms to measure various aspects of networks, makes network analysis a central tool for digital...

networks
r_function
r_packages
to_try
statistics:networks
visualization

Bemovi, software to extract BEhaviour and MOrphology from VIdeos

october 2017 by hallucigenia

Bemovi is an R package that allows one to extract the abundance, behaviour and morphology of individual organisms from video sequences. The package relies on R (the statistical computing environment) and ImageJ, as well as the ParticleTracker plug-in developed for ImageJ.

For a high level description of the package and its functions, as well as information to its application and validation see the following publication (or run citation(“bemovi”) in R):

Pennekamp, Frank, Nicolas Schtickzelle, and Owen L. Petchey. 2015. “Bemovi, Software for Extracting BEhaviour and MOrphology from VIdeos, illustrated with analyses of microbes”, Ecology & Evolution, June 2015. DOI: 10.1002/ece3.1529

This web site provides accompanying information on how to get started with bemovi, from installing the necessary dependencies, conducting analyses and processing the data, to measuring morphological and behavioural traits and predicting species identities based on these traits.

movement_ecology
statistics:movement
video_analysis
image_processing
R
R_packages
imageJ

VAST: Spatio-temporal analysis of univariate or multivariate data, e.g., standardizing data for multiple species or stage

september 2017 by hallucigenia

VAST

Is an R package for implementing a spatial delta-generalized linear mixed model (delta-GLMM) for multiple categories (species, size, or age classes) when standardizing survey or fishery-dependent data.

Builds upon a previous R package, SpatialDeltaGLMM (publicly available here), and has unit-testing to automatically confirm that VAST and SpatialDeltaGLMM give identical results (to the 3rd decimal place for parameter estimates) for several varied real-world case-study examples

Has built in diagnostic functions and model-comparison tools

Is intended to improve analysis speed, replicability, peer-review, and interpretation of index standardization methods

Background

This tool is designed to estimate spatial variation in density using spatially referenced data, with the goals of identifying habitat associations (correlations among species and with habitat) and estimating total abundance for a target species in one or more years.

The model builds upon spatio-temporal delta-generalized linear mixed modelling techniques (Thorson Shelton Ward Skaug 2015 ICESJMS), which separately models the proportion of tows that catch at least one individual ("encounter probability") and catch rates for tows with at least one individual ("positive catch rates").

Submodels for encounter probability and positive catch rates by default incorporate variation in density among years (as a fixed effect), and can incorporate variation among sampling vessels (as a random effect, Thorson and Ward 2014) which may be correlated among categories (Thorson Fonner Haltuch Ono Winker In press).

Spatial and spatiotemporal variation are approximated as Gaussian Markov random fields (Thorson Skaug Kristensen Shelton Ward Harms Banante 2014 Ecology), which imply that correlations in spatial variation decay as a function of distance.

statistics:gams
statistics:time_series
statistics:fisheries
fisheries
fisheries:methods
statistics:bayesian
statistics:spatial
R_packages

inlabru

july 2017 by hallucigenia

What is this?

The inlabru R package is being developed as part of a research project entitled “Modelling spatial distribution and change from wildlife survey data”, which is funded by the UK Engineering and Physical Sciences Research Council, to develop and implement innovative methods to model spatial distribution and change from ecological survey data. It involves developing Integrated Nested Laplace Approximation (INLA) methods for fitting realistically complex spatial models to data obtained from surveys on which the probability of detecting population members is unknown.

The project is a collaborative effort between the Universities of St Andrews (David Borchers, Janine Illian, Steve Buckland and Joyce Yuan) and Edinburgh (Finn Lindgren and Fabian E. Bachl).

R_packages
R
statistics:point_processes
statistics:additive_models
statistics:spatial

Introducing tidygraph

july 2017 by hallucigenia

I’m very pleased to announce that my new package tidygraph is now available on CRAN. As the name suggests, tidygraph is an entry into the tidyverse that provides a tidy framework for all things relational (networks/graphs, trees, etc.). tidygraph is a relatively big package in terms of exported functions (280 exported symbols), so not all functions will be covered in this release note. I will however provide an overview of all the areas that tidygraph touches upon, so you should have a pretty good grasp of what the package can do for you.
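
As a taste of the entry point (a sketch of my own), any edge list stored in a data frame can be promoted to a tbl_graph and manipulated with familiar verbs:

```r
library(tidygraph)
library(dplyr)

edges <- data.frame(from = c("A", "A", "B"),
                    to   = c("B", "C", "C"))
g <- as_tbl_graph(edges)         # nodes are inferred from the edge list
g %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree())
```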

R_packages
r_hadley
networks
statistics:networks
visualization
to_try

smooth v2.0.0. What’s new

july 2017 by hallucigenia

Good news, everyone! The smooth package has recently received a major update. The version on CRAN is now v2.0.0. I thought that this was a big deal, so I decided to pause for a moment and explain what has happened, and why this new version is interesting.

First of all, there is a new function, ves(), that implements a Vector Exponential Smoothing model. This model allows estimating several series together and capturing possible interactions between them. It can be especially useful if you need to forecast several similar products and can assume that the smoothing parameter or initial seasonal indices are similar across all the series. Let’s say, you want to produce forecasts for several SKUs of cofvefe. You may unite the data of their sales in a vector and use one and the same smoothing parameter across the series using the parameter persistence="group". However, if you think that sales of one type of cofvefe may influence the sales of the other one, you may take this into account and set persistence="dependent". You can also switch between "group" or "individual" initial values, initialSeason, transition and phi (damping parameter). Just keep in mind that vector models can be greedy in the number of parameters and in order to use them efficiently, you may need to have large samples.
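
A sketch of the call described above (the sales data are simulated; the argument names come from the post itself, while the model specification is my assumption):

```r
library(smooth)

# two related series, e.g. sales of two similar SKUs
y <- cbind(sku1 = rnorm(48, mean = 100, sd = 5),
           sku2 = rnorm(48, mean =  90, sd = 5))
fit <- ves(y, model = "ANN", persistence = "group")   # shared smoothing parameter
# persistence = "dependent" would let one series influence the other
```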

smoothing_and_penalization
statistical_software
statistics:additive_models
statistics:time_series
R_packages

rprojroot: Finding files in project subdirectories

may 2017 by hallucigenia

The rprojroot package solves a seemingly trivial but annoying problem that occurs sooner or later in any largish project: How to find files in subdirectories? Ideally, file paths are relative to the project root.

Unfortunately, we cannot always be sure about the current working directory: For instance, in RStudio it’s sometimes:

the project root (when running R scripts),

a subdirectory (when building vignettes),

again the project root (when executing chunks of a vignette).

basename(getwd())

## [1] "vignettes"

In some cases, it’s even outside the project root.

This vignette starts with a very brief summary that helps you get started, followed by a longer description of the features.
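
In outline (the criterion and directory names are illustrative):

```r
library(rprojroot)

# locate the project root by a criterion, then build paths relative to it,
# independent of whatever the current working directory happens to be
root <- find_root(has_file("DESCRIPTION"))
file.path(root, "vignettes")
# criteria also bundle both steps into one call:
# is_rstudio_project$find_file("vignettes")
```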

R_packages
R
project_management
productivity_tool
to_try
Programming
Programs_to_use

odin 0.0.2

R
dynamics
dplyr
math:dynamical_systems
math_and_stats
to_try
library
r_packages

december 2016 by hallucigenia

odin implements a high-level language for describing and implementing ordinary differential equations in R. It provides a “domain specific language” (DSL) which looks like R but is compiled directly to C. The actual solution of the differential equations is done with the deSolve package, giving access to the excellent Livermore solvers (lsoda, lsode, etc).
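
A sketch of the DSL (logistic growth, my own example; the interface details follow the package documentation of that era, so treat them as approximate):

```r
library(odin)

# R-like code, compiled to C; deriv()/initial()/user() are odin keywords
gen <- odin::odin({
  deriv(N) <- r * N * (1 - N / K)   # logistic growth
  initial(N) <- 1
  r <- user(0.5)                    # user-settable parameters with defaults
  K <- user(100)
})
mod <- gen()                        # instantiate with default parameters
mod$run(seq(0, 30, by = 1))         # solved with deSolve under the hood
```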

