**hallucigenia + r_packages**
40

Do more with R: drag-and-drop ggplot

4 days ago by hallucigenia

Drag-and-drop interface for ggplot2 in RStudio. Definitely worth digging into further!

ggplot2
R
R_packages
GUI
plotting
graphics

tidygraph 1.1 – A tidy hope

february 2018 by hallucigenia

I am very pleased to tell you that the next version of tidygraph (1.1) is now available on CRAN. This is not a bug-fix release, nor a change-it-all release, but rather a more-of-it-all release, and in this post I’m going to tell you all about it.

The idea of tidygraph

Before we enter the goldmine of new features that make up this release, I’m going to talk a bit about my reasons for making tidygraph and what I want it to become. These ideas have been rummaging in my head for a while and have taken more form as I prepared for my RStudio::conf 2018 talk. They will probably be fleshed out even more in a (series of) blog post(s), or, dare I say, a book, but you’ll get the earliest version of it here…

Network analysis is daunting… Sure, if you have spent the better part of your life working with it (I haven’t), it might seem second nature, but for most people it will be an area they enter late, unprepared, and already well-versed in the manners of rectangular data analysis. For many, the instinct will be to quickly produce a plot, which will often end up creating very little insight due to the curse of the hairball, and they will leave the world of network analysis with a sense of broken promises. While all of this sounds overly melodramatic, I honestly feel that the tools we use to do network analysis can do better at guiding the user towards a meaningful network analysis workflow, and I hope that tidygraph (and ggraph) will prove to be a decent attempt at that.

With tidygraph I set out to make it easier to get your data into a graph and perform common transformations on it, but the aim has expanded since its inception. The goal of tidygraph is to empower the user to formulate complex questions regarding relational data as simple steps, thus enabling them to retrieve insights directly from the data itself. The central idea this all boils down to is this: you don’t have to plot a network to understand it. While I absolutely love the field of network visualisation, it is in many ways overused in data science, especially when it comes to extracting knowledge from a network. Just as you don’t need a plot to tell you which car in a dataset is the fastest, you don’t need a plot to tell you which pair of friends is the closest. What you do need, instead of a plot, is a tool that allows you to formulate your question as a logical sequence of operations. For many people in the world of rectangular data, this tool is increasingly dplyr (and friends), and I do hope that tidygraph can take on the same role in the world of relational data.

This is not just about preparing your data for a plot — this is about answering questions.
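To make the "which friend" point concrete, here is a minimal sketch, assuming tidygraph and dplyr are installed, that answers a question about a small network without plotting it. The names and friendships are made up for illustration, and degree centrality is only one possible measure of "most connected".

```r
library(tidygraph)
library(dplyr)

# Hypothetical friendship network, given as an edge list (one row per friendship)
friendships <- tibble::tibble(
  from = c("ann", "ann", "ann", "bob", "dan"),
  to   = c("bob", "cat", "dan", "cat", "eve")
)

# Treat the edge list as an undirected graph; node names end up in a `name` column
friends <- as_tbl_graph(friendships, directed = FALSE)

# Ask the question directly, as a sequence of verbs: who has the most friends?
most_connected <- friends %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree()) %>%
  as_tibble() %>%
  arrange(desc(degree))

print(most_connected)
```

The answer comes out as an ordinary tibble, so it can be filtered, joined or summarised with the same tools used for rectangular data.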

statistics:networks
R_packages
R
tidyverse
visualization
data_management

mjskay/tidybayes: Bayesian analysis + tidy data + geoms (R package)

statistics:bayesian
tidyverse
R
r_packages
visualization
to_try

february 2018 by hallucigenia

tidybayes is an R package that aims to make it easy to integrate popular Bayesian modelling methods into a tidy data + ggplot workflow.

Tidy data frames (one observation per row) are particularly convenient for use in a variety of R data manipulation and visualization packages. However, when using MCMC / Bayesian samplers like JAGS or Stan in R, we often have to translate this data into a form the sampler understands, and then after running the model, translate the resulting sample ...
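The reshaping problem described above can be sketched with tidyr. This is not tidybayes's own API; it is a hand-rolled illustration, assuming sampler output arrives as a matrix with one row per MCMC draw and one column per parameter (the parameter names here are hypothetical stand-ins).

```r
library(tidyr)

set.seed(1)
n_draws <- 1000

# Stand-in for sampler output: one row per draw, one column per parameter
draws <- cbind(
  alpha = rnorm(n_draws, 0, 1),
  beta  = rnorm(n_draws, 2, 0.5),
  sigma = abs(rnorm(n_draws, 1, 0.1))
)

draws_df <- data.frame(.draw = seq_len(n_draws), draws)

# Tidy ("long") form: one row per draw per parameter
tidy_draws <- pivot_longer(
  draws_df,
  cols = -.draw,
  names_to = "parameter",
  values_to = "value"
)

# Posterior summaries now use ordinary grouped operations
aggregate(value ~ parameter, data = tidy_draws, FUN = median)
```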

Tidyverse: fs 1.0.0

january 2018 by hallucigenia

fs provides a cross-platform, uniform interface to file system operations. fs uses libuv under the hood, which gives a rock-solid cross-platform interface to the filesystem.
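A minimal sketch of that uniform interface, assuming fs is installed. The directory and file names are made up, and everything happens inside the session's temporary directory.

```r
library(fs)

# Build paths portably; fs uses "/" as the separator on every platform
project <- path_temp("fs-demo")
dir_create(path(project, "data"))

# Create two empty files, then list only the CSV
file_create(path(project, "data", c("a.csv", "b.txt")))
csvs <- dir_ls(path(project, "data"), glob = "*.csv")

path_file(csvs)

# Clean up the demo directory
dir_delete(project)
```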

R
R_packages
tidyverse
computers
Programming
to_try

mgcViz: visual tools for GAMs

december 2017 by hallucigenia

The mgcViz R package offers visual tools for Generalized Additive Models (GAMs). The visualizations provided by mgcViz differ from those implemented in mgcv in that most of the plots are based on ggplot2's powerful layering system. This has been implemented by wrapping several ggplot2 layers and integrating them with computations specific to GAMs. Further, mgcViz uses binning and/or sub-sampling to produce plots that can scale to large datasets (n = O(10^7)), and offers a variety of new methods for visual model checking/selection.
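A rough sketch of the layered workflow, assuming mgcv and mgcViz are installed and using mgcv's simulated example data. `getViz()`, `sm()` and the `l_*()` layer functions come from mgcViz; the model formula is only illustrative.

```r
library(mgcv)
library(mgcViz)

set.seed(2)
# mgcv's built-in simulator: y depends additively on x0..x3
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
fit <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

# Convert the fitted gam into an object that mgcViz knows how to plot
viz <- getViz(fit)

# Plot the first smooth, adding ggplot2-style layers one at a time
p <- plot(sm(viz, 1)) + l_fitLine(colour = "red") + l_ciLine() + l_rug()
print(p)
```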

R
R_packages
visualization
statistics:gams
to_try
ggplot2
github

officer: office documents from R

november 2017 by hallucigenia

"The officer package lets R users manipulate Word (.docx) and PowerPoint (*.pptx) documents. In short, one can add images, tables and text into documents from R. An initial document can be provided, contents, styles and properties of the original document will then be available."

R
R_packages
to_try
word_processors
reproducible_research
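As a sketch of the basic officer workflow, assuming the package is installed: start from officer's default Word template, append a heading, a paragraph and a table, then write the file. The output file name and the content are arbitrary.

```r
library(officer)

doc <- read_docx()  # blank document based on officer's default template

doc <- body_add_par(doc, "Fuel economy summary", style = "heading 1")
doc <- body_add_par(doc, "First rows of the built-in mtcars data:", style = "Normal")
doc <- body_add_table(doc, head(mtcars[, 1:4]))

# Writing happens through print() with a target path
out_file <- file.path(tempdir(), "officer-demo.docx")
print(doc, target = out_file)
```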

strict R package

november 2017 by hallucigenia

The goal of strict is to make R behave a little more strictly, making base functions more likely to throw an error than to return potentially ambiguous results.

library(strict) forces you to confront potential problems now, instead of in the future. This has both pros and cons: often you can most easily fix a potential ambiguity when you're working on the code (rather than in six months' time when you've forgotten how it works), but it also forces you to resolve ambiguities that might never occur with your code/data.

reproducible_research
Programming
Programming:best_practices
R
R_packages

Glue 1.2.0

november 2017 by hallucigenia

glue 1.2.0 is now available on CRAN! glue is designed to make it easy to interpolate (“glue”) your data into strings. Compared to equivalents like paste() and sprintf() it is easier to write and less time-consuming to maintain. It also has no non-base dependencies, so it is easy to include in packages.
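A small comparison, assuming glue is installed: the same sentence built with `glue()`, `paste0()` and `sprintf()`. The variable names and values are arbitrary.

```r
library(glue)

name <- "Fred"
age  <- 50

# glue() evaluates the R expressions inside {} and splices the results in
msg <- glue("My name is {name} and I will be {age + 1} next year.")

# Base R equivalents for comparison
via_paste   <- paste0("My name is ", name, " and I will be ", age + 1, " next year.")
via_sprintf <- sprintf("My name is %s and I will be %d next year.", name, age + 1)

msg
```

The glue version reads like the final string, which is the maintainability argument made above.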

R_packages
R
text_analysis
Programming
programming:strings

Introduction to Network Analysis with R

november 2017 by hallucigenia

"Over a wide range of fields network analysis has become an increasingly popular tool for scholars to deal with the complexity of the interrelationships between actors of all sorts. The promise of network analysis is the placement of significance on the relationships between actors, rather than seeing actors as isolated entities. The emphasis on complexity, along with the creation of a variety of algorithms to measure various aspects of networks, makes network analysis a central tool for digital...

networks
r_function
r_packages
to_try
statistics:networks
visualization

Bemovi, software to extract BEhaviour and MOrphology from VIdeos

october 2017 by hallucigenia

Bemovi is an R package that allows users to extract abundance, behaviour and morphology of individual organisms from video sequences. The package relies on R (the statistical computing environment) and ImageJ, as well as the ParticleTracker plug-in developed for ImageJ.

For a high-level description of the package and its functions, as well as information on its application and validation, see the following publication (or run citation("bemovi") in R):

Pennekamp, Frank, Nicolas Schtickzelle, and Owen L. Petchey. 2015. “Bemovi, Software for Extracting BEhaviour and MOrphology from VIdeos, illustrated with analyses of microbes”, Ecology & Evolution, June 2015. DOI: 10.1002/ece3.1529

This web site provides accompanying information on how to get started with bemovi, from installing the necessary dependencies, conducting analyses and processing the data, to measuring morphological and behavioural traits and predicting species identities based on these traits.

movement_ecology
statistics:movement
video_analysis
image_processing
R
R_packages
imageJ

VAST: Spatio-temporal analysis of univariate or multivariate data, e.g., standardizing data for multiple species or stage

september 2017 by hallucigenia

VAST

Is an R package for implementing a spatial delta-generalized linear mixed model (delta-GLMM) for multiple categories (species, size, or age classes) when standardizing survey or fishery-dependent data.

Builds upon a previous R package, SpatialDeltaGLMM (publicly available here), and has unit tests to automatically confirm that VAST and SpatialDeltaGLMM give identical results (to the 3rd decimal place for parameter estimates) for several varied real-world case-study examples

Has built-in diagnostic functions and model-comparison tools

Is intended to improve analysis speed, replicability, peer-review, and interpretation of index standardization methods

Background

This tool is designed to estimate spatial variation in density using spatially referenced data, with the goal of estimating habitat associations (correlations among species and with habitat) and total abundance for a target species in one or more years.

The model builds upon spatio-temporal delta-generalized linear mixed modelling techniques (Thorson Shelton Ward Skaug 2015 ICESJMS), which separately model the proportion of tows that catch at least one individual ("encounter probability") and catch rates for tows with at least one individual ("positive catch rates").

Submodels for encounter probability and positive catch rates by default incorporate variation in density among years (as a fixed effect), and can incorporate variation among sampling vessels (as a random effect, Thorson and Ward 2014) which may be correlated among categories (Thorson Fonner Haltuch Ono Winker In press).

Spatial and spatiotemporal variation are approximated as Gaussian Markov random fields (Thorson Skaug Kristensen Shelton Ward Harms Banante 2014 Ecology), which imply that correlations in spatial variation decay as a function of distance.
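The "delta" idea above (model encounter probability and positive catch rate separately, then multiply) can be sketched with two ordinary GLMs on simulated tows. This is a deliberately stripped-down, non-spatial illustration with year as the only fixed effect, not VAST's API or its spatio-temporal machinery; the simulated data and the Gamma/log-link choice for positive catches are assumptions.

```r
set.seed(42)
n <- 1000
tows <- data.frame(year = factor(rep(c("2015", "2016"), each = n)))

# Simulated tows: encounter probability differs by year, positive catches are lognormal
p_enc <- ifelse(tows$year == "2015", 0.3, 0.5)
tows$encountered <- rbinom(nrow(tows), 1, p_enc)
tows$catch <- ifelse(tows$encountered == 1,
                     rlnorm(nrow(tows), meanlog = 1, sdlog = 0.5), 0)

# Submodel 1: probability that a tow catches at least one individual
enc_fit <- glm(encountered ~ year, family = binomial, data = tows)

# Submodel 2: catch rate, fitted only to tows with a positive catch
pos_fit <- glm(catch ~ year, family = Gamma(link = "log"),
               data = subset(tows, encountered == 1))

# Expected catch per tow = encounter probability x positive catch rate
new_years <- data.frame(year = factor(c("2015", "2016")))
index <- predict(enc_fit, new_years, type = "response") *
  predict(pos_fit, new_years, type = "response")
index
```

Because both submodels here are saturated in year, the product reproduces each year's raw mean catch; the delta structure earns its keep once covariates, vessel effects or spatial fields enter the submodels.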

statistics:gams
statistics:time_series
statistics:fisheries
fisheries
fisheries:methods
statistics:bayesian
statistics:spatial
R_packages

inlabru

july 2017 by hallucigenia

What is this?

The inlabru R package is being developed as part of a research project entitled “Modelling spatial distribution and change from wildlife survey data”, which is funded by the UK Engineering and Physical Sciences Research Council, to develop and implement innovative methods to model spatial distribution and change from ecological survey data. It involves developing Integrated Nested Laplace Approximation (INLA) methods for fitting realistically complex spatial models to data obtained from surveys on which the probability of detecting population members is unknown.

The project is a collaborative effort between the Universities of St Andrews (David Borchers, Janine Illian, Steve Buckland and Joyce Yuan) and Edinburgh (Finn Lindgren and Fabian E. Bachl).

R_packages
R
statistics:point_processes
statistics:additive_models
statistics:spatial

Introducing tidygraph

july 2017 by hallucigenia

I’m very pleased to announce that my new package tidygraph is now available on CRAN. As the name suggests, tidygraph is an entry into the tidyverse that provides a tidy framework for all things relational (networks/graphs, trees, etc.). tidygraph is a relatively big package in terms of exported functions (280 exported symbols), so not all functions will be covered in this release note. I will, however, provide an overview of all the areas that tidygraph touches upon, so you should have a pretty good grasp of what the package can do for you.

R_packages
r_hadley
networks
statistics:networks
visualization
to_try

smooth v2.0.0. What’s new

july 2017 by hallucigenia

Good news, everyone! The smooth package has recently received a major update: the version on CRAN is now v2.0.0. I thought that this was a big deal, so I decided to pause for a moment and explain what has happened and why this new version is interesting.

First of all, there is a new function, ves(), that implements the Vector Exponential Smoothing model. This model allows estimating several series together and capturing possible interactions between them. It can be especially useful if you need to forecast several similar products and can assume that the smoothing parameter or initial seasonal indices are similar across all the series. Let’s say you want to produce forecasts for several SKUs of cofvefe. You may unite the data of their sales in a vector and use one and the same smoothing parameter across the series using the parameter persistence="group". However, if you think that sales of one type of cofvefe may influence the sales of the other one, you may take this into account and set persistence="dependent". You can also switch between "group" and "individual" initial values, initialSeason, transition and phi (damping parameter). Just keep in mind that vector models can be greedy in the number of parameters, and in order to use them efficiently, you may need large samples.
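The "group" persistence idea, one smoothing parameter shared by several series, can be sketched in base R with simple exponential smoothing (level only, no trend or seasonality). This is a hypothetical illustration of the concept, not the smooth package's `ves()` implementation, which estimates full vector models; the helper names `ses_sse()` and `group_sse()` and the toy series are invented.

```r
# One-step-ahead sum of squared errors for simple exponential smoothing
ses_sse <- function(y, alpha) {
  level <- y[1]                   # initialise the level with the first observation
  sse <- 0
  for (t in 2:length(y)) {
    err <- y[t] - level           # forecast error for period t
    sse <- sse + err^2
    level <- level + alpha * err  # update the level
  }
  sse
}

# "Group" persistence: a single alpha is chosen to fit all series jointly
group_sse <- function(alpha, series) {
  sum(vapply(series, ses_sse, numeric(1), alpha = alpha))
}

set.seed(3)
# Hypothetical sales histories of three similar products
series <- list(
  sku_a = 20 + cumsum(rnorm(60)),
  sku_b = 35 + cumsum(rnorm(60)),
  sku_c = 50 + cumsum(rnorm(60))
)

shared_alpha <- optimize(group_sse, interval = c(0, 1), series = series)$minimum
shared_alpha
```

Pooling the errors means one parameter is estimated from all the data, which is why a shared smoothing parameter is attractive when individual series are short.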

smoothing_and_penalization
statistical_software
statistics:additive_models
statistics:time_series
R_packages

rprojroot: Finding files in project subdirectories

may 2017 by hallucigenia

The rprojroot package solves a seemingly trivial but annoying problem that occurs sooner or later in any largish project: How to find files in subdirectories? Ideally, file paths are relative to the project root.

Unfortunately, we cannot always be sure about the current working directory: For instance, in RStudio it’s sometimes:

the project root (when running R scripts),

a subdirectory (when building vignettes),

again the project root (when executing chunks of a vignette).

basename(getwd())

## [1] "vignettes"

In some cases, it’s even outside the project root.

This vignette starts with a very brief summary that helps you get started, followed by a longer description of the features.
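A minimal sketch, assuming rprojroot is installed: build a throwaway project in a temporary directory, start from its `vignettes/` subfolder (as happens when a vignette is built), and let rprojroot walk up to the folder that contains a `DESCRIPTION` file. `find_root()` and `has_file()` are rprojroot functions; the project layout and the `data/raw.csv` path are made up.

```r
library(rprojroot)

# Throwaway project: a DESCRIPTION file at the root and a vignettes/ subfolder
proj <- file.path(tempdir(), "demo-project")
dir.create(file.path(proj, "vignettes"), recursive = TRUE, showWarnings = FALSE)
writeLines("Package: demo", file.path(proj, "DESCRIPTION"))

# Start the search from the subdirectory instead of relying on getwd()
root <- find_root(has_file("DESCRIPTION"), path = file.path(proj, "vignettes"))

# File paths can now be written relative to the root, wherever the code runs from
data_path <- file.path(root, "data", "raw.csv")
```

rprojroot also ships ready-made criteria such as `is_r_package` and `is_rstudio_project`, so the criterion rarely needs to be spelled out by hand.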

R_packages
R
project_management
productivity_tool
to_try
Programming
Programs_to_use

odin 0.0.2

R
dynamics
dplyr
math:dynamical_systems
math_and_stats
to_try
library
r_packages

december 2016 by hallucigenia

odin implements a high-level language for describing and implementing ordinary differential equations in R. It provides a “domain specific language” (DSL) which looks like R but is compiled directly to C. The actual solution of the differential equations is done with the deSolve package, giving access to the excellent Livermore solvers (lsoda, lsode, etc.).
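For comparison, here is the kind of model odin targets, written directly against deSolve in plain R. This is not odin's DSL; it is a sketch, assuming deSolve is installed, of the R-level derivative function whose work odin moves into compiled C. The decay model and the rate `k` are arbitrary choices.

```r
library(deSolve)

# Exponential decay dy/dt = -k * y, written as a plain R derivative function
decay <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dy <- -k * y
    list(dy)
  })
}

times <- seq(0, 4, by = 0.1)
# ode() uses the Livermore solver lsoda by default
out <- ode(y = c(y = 1), times = times, func = decay, parms = c(k = 0.5))

tail(out, 1)
```

With a model this small the R callback is cheap; the overhead of calling back into R at every solver step is what a compiled DSL avoids for larger systems.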

lme4ord

february 2016 by hallucigenia

Extension to the lme4 package, allowing varying weights or correlation structures

lme4
statistics:multivariate
statistics:hierarchical
r_packages
r
statistics:ecological

An Introduction to merTools

september 2015 by hallucigenia

Working with generalized linear mixed models (GLMM) and linear mixed models (LMM) has become increasingly easy with recent advances in the lme4 package. As we have found ourselves using these models more and more within our work, we, the authors, have developed a set of tools for simplifying and speeding up common tasks for interacting with merMod objects from lme4. This package provides those tools.
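A minimal sketch, assuming lme4 and merTools are installed and using lme4's `sleepstudy` data. `predictInterval()` simulates prediction intervals for a fitted merMod object; the model, the interval level and the number of simulations are illustrative choices.

```r
library(lme4)
library(merTools)

# Random intercept and slope for each subject
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

set.seed(4)
# Simulation-based 90% prediction intervals for the first few observations
pi_df <- predictInterval(fm, newdata = sleepstudy[1:5, ], level = 0.9, n.sims = 500)
pi_df
```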

r_packages
to_try
R
visualization
statistics:multivariate
statistics:hierarchical
lme4

mapView: basic interactive viewing of spatial data in R

july 2015 by hallucigenia

"Working with spatial data in R I find myself quite often in the need to quickly visually check whether a certain analysis has produced reasonable results. There are two ways I usually do this. Either I:

(sp)plot the data in R and then toggle back and forth between the static plots (I use RStudio) or

save the data to the disk and then open in QGIS or similar to interactively examine the results.

Both these approaches are semi-optimal. Option 1. is fine for a quick glance at coarse patterns, but it lacks the possibility to take a closer look at the results via zooming and panning. Option 2. provides the interactivity, but the detour via the hard disk is annoying (at best), especially when fine-tuning and checking regularly.

I attended this year's useR! 2015 conference in Aalborg (which was marvelous!) and went to the session on interactive graphics in R, where Joe Cheng from RStudio presented the leaflet package. Leaflet is great, but it's rather geared towards manually setting up maps. What GIS-like functionality would need is some default behaviour for different objects from the spatial universe.

This got me thinking and sparked my enthusiasm to write some wrapper functions for leaflet to provide at least very basic GIS-like interactive graphing capabilities that are directly accessible within RStudio (or the web browser, if you’re not using RStudio). So I sat down and wrote a function called mapView()."
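That function lives on in the mapview package, which exports both `mapView()` and the lowercase alias `mapview()`. A minimal sketch, assuming mapview and its sf dependency are installed; `breweries` is a sample point dataset shipped with mapview, and `zcol` colours the points by one of its columns.

```r
library(mapview)

# One call gives a zoomable, pannable leaflet map of a spatial object
m <- mapview(breweries, zcol = "founded")

# Printing m (or typing m at the console) shows the map in the RStudio viewer or a browser
```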

Rstudio
R
GIS
R_packages
to_try
maps
visualization

CRAN - Package spBayes

june 2015 by hallucigenia

Fits univariate and multivariate spatio-temporal models with Markov chain Monte Carlo (MCMC).

statistics:spatial
statistics:time_series
statistics:multivariate
R_packages
R
to_try

hadley/readr · GitHub

databases
data_management
r_packages
R
library
to_try
r_hadley

march 2015 by hallucigenia

readr - Faster ways to read data
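A minimal sketch, assuming readr is installed; `readr_example()` locates sample files bundled with the package, here a CSV copy of `mtcars`.

```r
library(readr)

# readr ships a few example files; this one is a CSV version of mtcars
path <- readr_example("mtcars.csv")

cars <- read_csv(path)       # returns a tibble and reports the guessed column types
cars_base <- read.csv(path)  # base R equivalent, returns a plain data.frame
```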

CRAN - Package refund

r_packages
to_try
R
regression
statistics:multivariate
math_and_stats

january 2015 by hallucigenia

Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.

MCSim - meta-community simulator project for R

december 2014 by hallucigenia

"The overall goal of a metacommunity simulation package for R is to create a framework where ecologists can run "experiments" in silico to see how changing the properties of a metacommunity can result in shifts in emergent biodiversity patterns in a system. We have had success applying this approach to understand zooplankton biodiversity patterns in built ponds at the Baltimore Ecosystem Study Long Term Ecological Research site. I will post a link to the paper when it is available. It should be ...

ecology:community
ecology:landscape
ecology
simulation
r_packages

jimhester/lintr · GitHub

R
r_packages
library
programming
programs_to_use
software_engineering

october 2014 by hallucigenia

Put the file syntastic/lintr.vim in syntastic/syntax_checkers/r. If you are using pathogen this directory is ~/.vim/bundles/syntastic/syntax_checkers/r.

You will also need to add the following lines to your .vimrc.

let g:syntastic_enable_r_lintr_checker = 1

let g:syntastic_r_checkers = ['lintr']

Configuration

You can also configure what linters are used. e.g. using a different line
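The same configuration idea applies when calling lintr directly from R. A sketch, assuming a recent lintr where linters are passed as function calls; the 20-character cutoff and the temporary script are arbitrary.

```r
library(lintr)

# A throwaway script with one deliberately long line
script <- tempfile(fileext = ".R")
writeLines("greeting <- 'this line is longer than twenty characters'", script)

# Run only the line-length linter, with a short cutoff
lints <- lint(script, linters = line_length_linter(20))
print(lints)
```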

dgrtwo/broom · GitHub

data_management
R
r_packages
to_try
model_testing

october 2014 by hallucigenia

broom - Convert statistical analysis objects from R into tidy format
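A minimal sketch, assuming broom is installed: the three main verbs turn a fitted `lm` into data frames at three levels of detail. The model itself is just an example on the built-in `mtcars` data.

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)

coefs <- tidy(fit)    # one row per coefficient: term, estimate, std.error, statistic, p.value
stats <- glance(fit)  # one row of model-level statistics (r.squared, AIC, ...)
obs   <- augment(fit) # the original observations plus fitted values and residuals

coefs
```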

hadley/tidyr · GitHub

statistical_software
r_packages
R
data_management
statistical_computing

june 2014 by hallucigenia

tidyr - Easily tidy data with spread and gather functions.
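A minimal round trip with the two functions named above, assuming tidyr is installed; the country and case counts are made-up values.

```r
library(tidyr)

# Wide: one column per year
wide <- tibble::tibble(
  country = c("A", "B"),
  `1999`  = c(10, 20),
  `2000`  = c(30, 40)
)

# gather(): year columns become key/value pairs (wide -> long)
long <- gather(wide, key = "year", value = "cases", -country)

# spread(): back to one column per year (long -> wide)
wide_again <- spread(long, key = year, value = cases)
```

Since tidyr 1.0.0, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`, though they remain available.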
