r_packages   46

tidygraph 1.1 – A tidy hope
I am very pleased to tell you that the next version of tidygraph (1.1) is now available on CRAN. This is not a bug-fix release, nor a change-it-all release, but rather a more-of-it-all release, and in this post I’m going to tell you all about it.

The idea of tidygraph
Before we enter the goldmine of new features that make up this release, I’m going to talk a bit about my reasons for making tidygraph and what I want it to become. These ideas have been rummaging in my head for a while and have taken more form as I prepared for my RStudio::conf 2018 talk. They will probably be fleshed out even more in a (series of) blog post(s), or (dare I say) a book, but you’ll get the earliest version of it here…

Network analysis is daunting… Sure, if you have spent the better part of your life working with it (I haven’t), it might seem second nature, but for most people it will be an area they enter late, unprepared, and already well-versed in the manners of rectangular data analysis. For many, the instinct will be to quickly produce a plot, which will often yield very little insight due to the curse of the hairball, and they will leave the world of network analysis with a sense of broken promises. While all of this sounds overly melodramatic, I honestly feel that the tools we use for network analysis can do better at guiding the user towards a meaningful network analysis workflow, and I hope that tidygraph (and ggraph) will prove to be a decent attempt at that.

With tidygraph I set out to make it easier to get your data into a graph and perform common transformations on it, but the aim has expanded since its inception. The goal of tidygraph is to empower the user to formulate complex questions regarding relational data as simple steps, thus enabling them to retrieve insights directly from the data itself. The central idea this all boils down to is this: you don’t have to plot a network to understand it. While I absolutely love the field of network visualisation, it is in many ways overused in data science, especially when it comes to extracting knowledge from a network. Just as you don’t need a plot to tell you which car in a dataset is the fastest, you don’t need a plot to tell you which pair of friends is the closest. What you do need, instead of a plot, is a tool that allows you to formulate your question as a logical sequence of operations. For many people in the world of rectangular data, this tool is increasingly dplyr (and friends), and I do hope that tidygraph can take on the same role in the world of relational data.

This is not just about preparing your data for a plot; it is about answering questions.
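The "question instead of plot" idea can be sketched as a short verb pipeline. This assumes tidygraph and dplyr are installed, and the friendship edge list is invented for illustration. It asks which person is best connected by computing degree centrality per node and sorting. No plot is drawn.

```r
library(tidygraph)
library(dplyr)

# Hypothetical friendship network, given as an edge list
friends <- data.frame(
  from = c("Ann", "Ann", "Bob", "Cid", "Dee"),
  to   = c("Bob", "Cid", "Cid", "Dee", "Eve")
)

# Convert the edge list into an undirected tbl_graph
graph <- as_tbl_graph(friends, directed = FALSE)

# Work on the node table: add degree centrality, extract it, rank it
ranking <- graph %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree()) %>%
  as_tibble() %>%
  arrange(desc(degree))

print(ranking)
```

The answer is a plain tibble, so it can be filtered or joined like any other rectangular data.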
statistics:networks  R_packages  R  tidyverse  visualization  data_management 
february 2018 by hallucigenia
mjskay/tidybayes: Bayesian analysis + tidy data + geoms (R package)
tidybayes is an R package that aims to make it easy to integrate popular Bayesian modelling methods into a tidy data + ggplot workflow.

Tidy data frames (one observation per row) are particularly convenient for use in a variety of R data manipulation and visualization packages. However, when using MCMC / Bayesian samplers like JAGS or Stan in R, we often have to translate this data into a form the sampler understands, and then after running the model, translate the resulting sample ...
statistics:bayesian  tidyverse  R  r_packages  visualization  to_try 
february 2018 by hallucigenia
Tidyverse: fs 1.0.0
fs provides a cross-platform, uniform interface to file system operations. fs uses libuv under the hood, which gives a rock-solid cross-platform interface to the filesystem.
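The sketch below assumes only that the fs package is installed. It creates a few files in a scratch directory under the session's temporary directory, then lists the CSV files with a glob. The same calls should behave identically across operating systems.

```r
library(fs)

# Build a scratch directory under the session's temporary directory
demo_dir <- path(tempdir(), "fs-demo")
dir_create(demo_dir)

# Create a few empty files (file_create accepts a vector of paths)
file_create(path(demo_dir, c("a.csv", "b.csv", "notes.txt")))

# List only the CSV files, then keep just their file names
csv_files <- dir_ls(demo_dir, glob = "*.csv")
print(path_file(csv_files))
```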
R  R_packages  tidyverse  computers  Programming  to_try 
january 2018 by hallucigenia
mgcViz: visual tools for GAMs
The mgcViz R package offers visual tools for Generalized Additive Models (GAMs). The visualizations provided by mgcViz differ from those implemented in mgcv, in that most of the plots are based on ggplot2's powerful layering system. This has been implemented by wrapping several ggplot2 layers and integrating them with computations specific to GAM models. Further, mgcViz uses binning and/or sub-sampling to produce plots that can scale to large datasets (n = O(10^7)), and offers a variety of new methods for visual model checking/selection.
R  R_packages  visualization  statistics:gams  to_try  ggplot2  github 
december 2017 by hallucigenia
officer: office documents from R
"The officer package lets R users manipulate Word (.docx) and PowerPoint (.pptx) documents. In short, one can add images, tables and text into documents from R. An initial document can be provided; its contents, styles and properties will then be available."
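The following minimal sketch assumes officer is installed and relies on its default Word template. That template should provide the "heading 1" and "Normal" styles. The code adds a heading and a paragraph to a new document and writes it to a temporary .docx file. The file name is arbitrary.

```r
library(officer)

# Start from officer's default (empty) Word template
doc <- read_docx()

# Append a heading and a paragraph; style names must exist in the template
doc <- body_add_par(doc, "Results", style = "heading 1")
doc <- body_add_par(doc, "This paragraph was written from R.", style = "Normal")

# Write the document to a temporary file
out_file <- file.path(tempdir(), "officer-demo.docx")
print(doc, target = out_file)
```

To reuse an existing document's styles, you can pass its path to `read_docx()` as the starting template.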
R  R_packages  to_try  word_processors  reproducible_research 
november 2017 by hallucigenia
strict R package
The goal of strict is to make R behave a little more strictly, making base functions more likely to throw an error rather than returning potentially ambiguous results.

library(strict) forces you to confront potential problems now, instead of in the future. This has both pros and cons: often you can most easily fix a potential ambiguity when you're working on the code (rather than in six months' time when you've forgotten how it works), but it also forces you to resolve ambiguities that might never occur with your code/data.
reproducible_research  Programming  Programming:best_practices  R  R_packages 
november 2017 by hallucigenia
Glue 1.2.0
glue 1.2.0 is now available on CRAN! glue is designed to make it easy to interpolate (“glue”) your data into strings. Compared to equivalents like paste() and sprintf() it is easier to write and less time consuming to maintain. It also has no non-base dependencies so is easy to include in packages.
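The example below compares glue with paste0() and sprintf() and assumes only that glue is installed. The variable names are illustrative. All three calls should produce the same string. With glue, the expressions inside `{}` are read in place and evaluated in the calling environment.

```r
library(glue)

pkg <- "glue"
version <- "1.2.0"

# glue() evaluates the expressions inside {} in the calling environment
msg_glue <- glue("{pkg} {version} is now available on CRAN")

# Equivalent base R versions
msg_paste <- paste0(pkg, " ", version, " is now available on CRAN")
msg_sprintf <- sprintf("%s %s is now available on CRAN", pkg, version)

print(msg_glue)
```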
R_packages  R  text_analysis  Programming  programming:strings 
november 2017 by hallucigenia
Introduction to Network Analysis with R
"Over a wide range of fields network analysis has become an increasingly popular tool for scholars to deal with the complexity of the interrelationships between actors of all sorts. The promise of network analysis is the placement of significance on the relationships between actors, rather than seeing actors as isolated entities. The emphasis on complexity, along with the creation of a variety of algorithms to measure various aspects of networks, makes network analysis a central tool for digital...
networks  r_function  r_packages  to_try  statistics:networks  visualization 
november 2017 by hallucigenia
Bemovi, software to extract BEhaviour and MOrphology from VIdeos
Bemovi is an R package that allows users to extract abundance, behaviour and morphology of individual organisms from video sequences. The package relies on R (the statistical computing environment) and ImageJ, as well as the ParticleTracker plug-in developed for ImageJ.

For a high-level description of the package and its functions, as well as information on its application and validation, see the following publication (or run citation("bemovi") in R):

Pennekamp, Frank, Nicolas Schtickzelle, and Owen L. Petchey. 2015. “Bemovi, Software for Extracting BEhaviour and MOrphology from VIdeos, illustrated with analyses of microbes”, Ecology & Evolution, June 2015. DOI: 10.1002/ece3.1529

This web site provides accompanying information on how to get started with bemovi, from installing the necessary dependencies, conducting analyses and processing the data, to measuring morphological and behavioural traits and predicting species identities based on these traits.
movement_ecology  statistics:movement  video_analysis  image_processing  R  R_packages  imageJ 
october 2017 by hallucigenia
VAST: Spatio-temporal analysis of univariate or multivariate data, e.g., standardizing data for multiple species or stage

Is an R package for implementing a spatial delta-generalized linear mixed model (delta-GLMM) for multiple categories (species, size, or age classes) when standardizing survey or fishery-dependent data.
Builds upon a previous R package, SpatialDeltaGLMM (publicly available here), and has unit tests that automatically confirm that VAST and SpatialDeltaGLMM give identical results (to the 3rd decimal place for parameter estimates) for several varied real-world case-study examples.
Has built-in diagnostic functions and model-comparison tools.
Is intended to improve analysis speed, replicability, peer review, and interpretation of index standardization methods.

This tool is designed to estimate spatial variation in density using spatially referenced data, with the goals of estimating habitat associations (correlations among species and with habitat) and estimating total abundance for a target species in one or more years.
The model builds upon spatio-temporal delta-generalized linear mixed modelling techniques (Thorson Shelton Ward Skaug 2015 ICESJMS), which separately model the proportion of tows that catch at least one individual ("encounter probability") and catch rates for tows with at least one individual ("positive catch rates").
Submodels for encounter probability and positive catch rates by default incorporate variation in density among years (as a fixed effect), and can incorporate variation among sampling vessels (as a random effect, Thorson and Ward 2014) which may be correlated among categories (Thorson Fonner Haltuch Ono Winker In press).
Spatial and spatiotemporal variation are approximated as Gaussian Markov random fields (Thorson Skaug Kristensen Shelton Ward Harms Banante 2014 Ecology), which imply that correlations in spatial variation decay as a function of distance.
statistics:gams  statistics:time_series  statistics:fisheries  fisheries  fisheries:methods  statistics:bayesian  statistics:spatial  R_packages 
september 2017 by hallucigenia
What is this?
The inlabru R package is being developed as part of a research project entitled “Modelling spatial distribution and change from wildlife survey data”, which is funded by the UK Engineering and Physical Sciences Research Council, to develop and implement innovative methods to model spatial distribution and change from ecological survey data. It involves developing Integrated Nested Laplace Approximation (INLA) methods for fitting realistically complex spatial models to data obtained from surveys on which the probability of detecting population members is unknown.
The project is a collaborative effort between the Universities of St Andrews (David Borchers, Janine Illian, Steve Buckland and Joyce Yuan) and Edinburgh (Finn Lindgren and Fabian E. Bachl).
R_packages  R  statistics:point_processes  statistics:additive_models  statistics:spatial 
july 2017 by hallucigenia
Introducing tidygraph
I’m very pleased to announce that my new package tidygraph is now available on CRAN. As the name suggests, tidygraph is an entry into the tidyverse that provides a tidy framework for all things relational (networks/graphs, trees, etc.). tidygraph is a relatively big package in terms of exported functions (280 exported symbols), so not all functions will be covered in this release note. I will, however, provide an overview of all the areas that tidygraph touches upon, so you should have a pretty good grasp of what the package can do for you.
R_packages  r_hadley  networks  statistics:networks  visualization  to_try 
july 2017 by hallucigenia
smooth v2.0.0. What’s new
Good news, everyone! The smooth package has recently received a major update. The version on CRAN is now v2.0.0. I thought that this is a big deal, so I decided to pause for a moment and explain what has happened, and why this new version is interesting.

First of all, there is a new function, ves(), that implements the Vector Exponential Smoothing model. This model allows estimating several series together and capturing possible interactions between them. It can be especially useful if you need to forecast several similar products and can assume that the smoothing parameters or initial seasonal indices are similar across all the series. Let’s say you want to produce forecasts for several SKUs of cofvefe. You may unite the data on their sales in a vector and use one and the same smoothing parameter across the series by setting persistence="group". However, if you think that sales of one type of cofvefe may influence the sales of another, you may take this into account and set persistence="dependent". You can also switch between "group" and "individual" for the initial values, initialSeason, transition and phi (the damping parameter). Just keep in mind that vector models can be greedy in the number of parameters, and in order to use them efficiently you may need large samples.
smoothing_and_penalization  statistical_software  statistics:additive_models  statistics:time_series  R_packages 
july 2017 by hallucigenia
rprojroot: Finding files in project subdirectories
The rprojroot package solves a seemingly trivial but annoying problem that occurs sooner or later in any largish project: How to find files in subdirectories? Ideally, file paths are relative to the project root.
Unfortunately, we cannot always be sure about the current working directory: For instance, in RStudio it’s sometimes:
the project root (when running R scripts),
a subdirectory (when building vignettes),
again the project root (when executing chunks of a vignette).
In some cases, it’s even outside the project root.
This vignette starts with a very brief summary that helps you get started, followed by a longer description of the features.
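Here is a minimal sketch of the idea. It assumes rprojroot is installed and builds a hypothetical throwaway project under `tempdir()`. In that layout, a DESCRIPTION file marks the root. The code starts from the `vignettes/` subdirectory and walks upwards until it finds that file. Paths can then be built from the root regardless of the working directory.

```r
library(rprojroot)

# Hypothetical project: a root containing DESCRIPTION plus a vignettes/ subdirectory
project <- file.path(tempdir(), "demo-project")
dir.create(file.path(project, "vignettes"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project, "DESCRIPTION"))

# Walk up from the subdirectory to the first directory containing DESCRIPTION
root <- find_root(has_file("DESCRIPTION"), path = file.path(project, "vignettes"))

# Build a path relative to the root, independent of the working directory
data_path <- file.path(root, "data", "raw.csv")
print(root)
```

The DESCRIPTION criterion is just one choice. An RStudio project file or a version-control directory could serve as the marker in a different layout.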
R_packages  R  project_management  productivity_tool  to_try  Programming  Programs_to_use 
may 2017 by hallucigenia
odin 0.0.2
odin implements a high-level language for describing and implementing ordinary differential equations in R. It provides a “domain specific language” (DSL) which looks like R but is compiled directly to C. The actual solution of the differential equations is done with the deSolve package, giving access to the excellent Livermore solvers (lsoda, lsode, etc).
R  dynamics  dplyr  math:dynamical_systems  math_and_stats  to_try  library  r_packages 
december 2016 by hallucigenia
