
Radix for R Markdown
Radix combines the technical authoring features of Distill with R Markdown, enabling a fully reproducible workflow based on literate programming (Knuth 1984). Note that by default the Radix format does not display the code for chunks that are evaluated to produce output (knitr option echo = FALSE). The echo = FALSE default reflects the fact that Radix articles are often used to present the results of analyses rather than the underlying code. Radix provides a set of tools to facilitate clear scientific communication that links richly with other sources of knowledge and insight. Davide Cervone and Volker Sorge created the MathJax library for rendering mathematical notation on the web.
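For example, to show code in a Radix article despite that default, one could put this in the setup chunk (a sketch, not taken from the Radix docs):
knitr::opts_chunk$set(echo = TRUE)   # override the echo = FALSE default for all chunks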
R 
24 days ago by sechilds
The Roots of Quotation
But I have been taking some baby steps in that direction with Peter Seibel's excellent (free online) book Practical Common Lisp. It seems that when people talk about the birth of Lisp, they frequently cite a paper by John McCarthy called Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I. As told by Graham, in McCarthy's original paper in 1960, he laid out 7 primitive operators and then proved those 7 could be combined into a recursive evaluation system for arbitrary expressions of a list-based form. More exciting still is that quote can be described and understood in quite simple terms, owing to the simplicity and elegance of McCarthy's evaluation system. Tidy eval deserves our appreciation for doing something similar to what purrr did for functional programming: it's rounding off the jagged edges in the API and making metaprogramming in R much more stable and predictable.
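A tiny rlang illustration (mine, not from the post) of the quote-then-evaluate idea as it surfaces in tidy eval:
library(rlang)
q <- quo(x + y)                          # capture the expression plus its environment (a quosure)
eval_tidy(q, data = list(x = 1, y = 2))  # evaluate it later against a data mask: returns 3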
R  programming  Lisp  @followup 
24 days ago by sechilds
[1809.02264] Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations
Abstract: Despite the large body of research on missing value distributions and imputation, there is comparatively little literature on how to make it easy to handle, explore, and impute missing values in data. The new methodology builds upon tidy data principles, with the goal of integrating missing value handling as an integral part of data analysis workflows. New data structures are defined along with new functions (verbs) to perform common operations. Together these provide a cohesive framework for handling, exploring, and imputing missing values. From: Nicholas Tierney. [v1] Fri, 7 Sep 2018 01:01:38 GMT.
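A small sketch using naniar, which I understand to be the paper's companion package:
library(naniar)
miss_var_summary(airquality)     # per-variable missingness counts and percentages
gg_miss_var(airquality)          # plot the number of missings per variable
head(bind_shadow(airquality))    # append "shadow" columns recording missing/not-missing status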
R  data 
4 weeks ago by sechilds
Debug outputs with debugr
Whether or not a debug message is displayed can be made dependent on the evaluation of a criterion phrased as an R expression. The debug mode is activated and deactivated with debugr_switchOn() and debugr_switchOff(), respectively, which change the logical debugr.active value in the global options. The name of the function that is to be applied to our object z is provided in the funs argument as a string. Remove the upper and lower border of the dwatch() outputs by setting show.frame = FALSE. If you want to stop the execution of your code as soon as the crit criterion is fulfilled, use halt = TRUE.
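A sketch pieced together from the arguments mentioned above; check the vignette for the exact dwatch() signature before relying on it:
library(debugr)
debugr_switchOn()                     # flips the debugr.active option
z <- c(1, 5, 9)
dwatch(crit = "max(z) > 8",           # only print when this expression is TRUE
       objs = "z",                    # object to show
       funs = "summary",              # function applied to the object, given as a string
       show.frame = FALSE,            # drop the upper and lower border of the output
       halt = FALSE)                  # TRUE would stop execution once crit is fulfilled
debugr_switchOff()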
R 
10 weeks ago by sechilds
Joachim Zuckarelli on Twitter: Try my new R 📦 debugr: Intelligently print variables, expressions, environments & messages to the R console during runtime. 'debugr' calls can even remain in your code, as debug info will only show up when debug mode
@jsugarelli: Try my new R 📦 debugr: Intelligently print variables, expressions, environments & messages to the R console during runtime. 'debugr' calls can even remain in your code, as debug info will only show up when debug mode is activated. More: https://t.co/du7LAL2buY #rstats #cran

https://cloud.r-project.org/web/packages/debugr/vignettes/debugr.html
R 
10 weeks ago by sechilds
The R Shiny packages you need for your web apps! - Enhance Data Science
Since modal windows are now included in Shiny, the main interest of the packages lies in the popovers and tooltips. shinyTree is a wrapper for the jsTree library and introduces interactive tree inputs (for instance to explore folders and directories), with options for a search field, reorderable nodes, and CSS customization. It is also possible to insert raw HTML code (for instance to add buttons) and to modify your table style. Even if I tend to prefer DT for its interactivity, I found formattable easier to use for formatting and styling tables. The packages create concurrent sessions which perform a predefined set of actions to test the ability of your application to sustain the load.
R  R:Shiny 
june 2018 by sechilds
GitHub - m-clark/visibly: Functions related to R visualizations
Visibly is a handful of functions I use for color palettes, themes, etc.: some ready-made palettes (e.g. based on R blue and Stan red); a function to quickly and easily create palettes using colortools::complementary, colortools::adjacent, etc.; clean, web-friendly themes for ggplot2 and plotly; and a function to interact with colorgorical. For some additional palettes for those fond of another time, you might be interested in NineteenEightyR.
R  data:visualization 
june 2018 by sechilds
Thread by @dsquintana: "If you’re an academic you need a website so that people can easily find info about your research and publications. Here’s how to make your o […]" #Rstats
So why use blogdown? Sure, there are several free options available to start your own blog (e.g., Medium). However, you generally can’t list your publications or other information easily on these services. Also, who knows where these services will be in a few years?
There are also some great point-and-click services available (e.g., Squarespace). However, you need to pay about $10 a month for these services, and they’re generally not well suited for academic webpages.
Alternatively, R + blogdown is free and can integrate with the Hugo framework, which provides a ton of templates. It also uses Markdown, which is a straightforward markup language.
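A sketch of the usual starting point (the theme name is only an example, not a recommendation from the thread):
# install.packages("blogdown")
blogdown::new_site(theme = "gcushen/hugo-academic")   # scaffold a Hugo site in the current project
blogdown::serve_site()                                # live preview while you edit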
R 
may 2018 by sechilds
R: Substrings of a Character Vector
When extracting, if start is larger than the string length then "" is returned. For the extraction functions, x or text will be converted to a character vector by as.character if it is not already one. For substr, a character vector of the same length and with the same attributes as x (after possible coercion). This will have names taken from x (if it has any after coercion, repeated as needed), and other attributes copied from x if it is the longest of the arguments. That does not really work (you want to limit the width, not the number of characters, so it would be better to use strtrim), but at least make sure you use the default nchar(type = "c").
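For example:
x <- c("alpha", "beta")
substr(x, 2, 4)        # "lph" "eta"
substr("abc", 5, 7)    # "" because start is beyond the string length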
R 
may 2018 by sechilds
R: Convert an R Object to a Character String
This is a helper function for format to produce a single character string describing an R object. Optional arguments passed to or from methods. The default method first converts x to character and then concatenates the elements separated by ", ". A character vector of length 1 is returned.
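For example:
toString(1:5)            # "1, 2, 3, 4, 5"
toString(letters[1:3])   # "a, b, c"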
R 
may 2018 by sechilds
GitHub - psolymos/pbapply: Adding progress bar to '*apply' functions in R
A lightweight package that adds a progress bar to vectorized R functions (*apply). The implementation can easily be added to functions where showing the progress is useful (e.g. bootstrap). The type and style of the progress bar (with percentages or remaining time) can be set through options. Use the issue tracker to report a problem or to suggest a new feature. Use a conditional statement in your code to fall back on a base function in case pbapply is not installed:
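For instance, a minimal sketch (mine, not from the README):
apply_fun <- if (requireNamespace("pbapply", quietly = TRUE)) pbapply::pblapply else lapply
res <- apply_fun(1:5, function(i) { Sys.sleep(0.2); i^2 })   # progress bar shown only if pbapply is available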
R  @followup 
march 2018 by sechilds
Tidy evaluation in 5 mins - YouTube
This is really useful -- I am still wrapping my head around this.
R 
february 2018 by sechilds
Basic care and feeding of data in R
Data frames package related variables neatly together, keeping them in sync vis-à-vis row order and applying any filtering of observations uniformly. If you use homogeneous structures, like matrices, for data analysis, you are likely to make the terrible mistake of spreading a dataset out over multiple, unlinked objects. The tidyverse offers a special case of R's default data frame: the "tibble", which is a nod to the actual class of these objects, tbl_df. It is a tibble (and also a regular data frame) and the tidyverse provides a nice print method that shows the most important stuff and doesn't fill up your Console. This will provide a special type of data frame called a "tibble" that has nice default printing behavior, among other benefits.
R 
january 2018 by sechilds
Package Management for Reproducible R Code · R Views
Similarly, when beginning a new data science programming project, it is prudent to assess how much effort should be put into ensuring the code is reproducible. A simple Dockerfile like the following will copy the current project folder into the rstudio user’s home (within the container) and install the necessary dependencies using packrat . Note that doing more complex work typically involves a bit of foresight, familiarity with design conventions, and the creation of a custom Dockerfile . When it comes to the management of packages and other system dependencies, you will need to decide whether you want to spend more time setting up a reproducible environment, or if you want to start exploring immediately. Whether you are putting up a tent for the night or building a house that future generations will enjoy, there are plenty of tools to help you on your way and assist you if you ever need to change course.
R  @followup 
january 2018 by sechilds
Huxtable - an R package for writing LaTeX and HTML tables
Features include control over text styling, number format, background color, borders, padding and alignment. Tables can be manipulated with standard R subsetting or dplyr functions. Here are some quick examples: To learn more, check out the vignette in HTML or PDF format, or the original R Markdown. Or, read the design principles behind huxtable, including a comparison with other R packages to create tables. There are also new quick_pdf, quick_html and quick_docx functions, for quick output of data frames or similar objects in different formats.
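A minimal sketch (mine, not from the announcement):
library(huxtable)
ht <- as_hux(head(mtcars[, 1:3]))
ht <- set_bold(ht, 1, everywhere)      # bold the first row
ht <- set_all_borders(ht, 0.4)         # thin borders everywhere
quick_html(ht, file = "mtcars.html")   # quick_pdf() and quick_docx() work the same way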
R  @followup 
january 2018 by sechilds
Learn to Write Command Line Utilities in R - part 5 - blog.sellorm.com
We moved away from random assignment and implemented a simple binning system for our input names. While we fixed a major problem with our utility, we didn’t really learn much about writing command line tools, so in this post, we’ll look at implementing some debug logging to let us know what’s happening inside our app while it’s running. The following code will set a variable, app_debug, to either TRUE or FALSE depending on the presence of the word ‘debug’ in the args[2] position. This is great to have when you’re developing and testing utilities, but also useful for debugging issues in production, in the event that problems arise down the road. There are several R packages that implement argument parsing logic in ways that make it easier both for us as writers of these sorts of tools and for our end users.
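A sketch of the app_debug idea described above (mine, not the post's exact code):
args <- commandArgs(trailingOnly = TRUE)
app_debug <- length(args) >= 2 && args[2] == "debug"   # TRUE only when 'debug' is the second argument
if (app_debug) message("debug mode enabled")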
R 
january 2018 by sechilds
Learn to Write Command Line Utilities in R - part 4 - blog.sellorm.com
To be honest, a large part of the motivation for the approach that I’ve taken is that I didn’t want my kids to be able to easily figure out how it was working. In previous posts, we’ve been working on our command line Sorting Hat utility. We started out with a really simple tool that ran on the command line and just output a random Hogwarts house. Since then, we’ve extended that to accept an argument – in this case a name – and also added some input validation and an error message. The focus of this post will be a little different, we’ll be working on improving the Sorting Hat functionality, rather than anything specifically related to the command line operation of the script.
R 
january 2018 by sechilds
Learn to Write Command Line Utilities in R - part 3 - blog.sellorm.com
Yesterday we modified our simple sorting hat command line utility to accept its first argument, a name. Given that it’s not really possible to have a sensible default name we probably need to add some sort of input validation to make sure an argument has been provided. Always remember to include good error messages that tell the user of your command line application exactly what went wrong. Just because you’re working on a command line utility, doesn’t mean you shouldn’t take as much care as with any other class of application. Now that we’re properly checking for input and we have a suitable error message in place for those situations where the argument is missing, we can shift our focus to the next step.
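A sketch of that kind of validation (mine, with an illustrative usage message):
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 1 || !nzchar(args[1])) {
  stop("No name supplied. Usage: sortinghat <name>", call. = FALSE)   # clear error for the user
}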
R 
january 2018 by sechilds
Learn to Write Command Line Utilities in R - part 2 - blog.sellorm.com
If we take the humble ls command as an example, we can run it on its own, and it will just list the contents of the current working directory. Clearly arguments provide a great way to change the behaviour of our command line utility, so how do we go about doing that in an R script? The second change is that we’ve modified the final line to include the first argument, args[1], in a short message and we’re now outputting that instead of just the house name. This time around, we’ve extended our Sorting Hat to use an argument, in this case a name, and improved the output a little. In the meantime, have a think about other improvements you’d like to see, like perhaps getting rid of the random assignment of a house (we’ll fix that soon, I promise!)
R 
january 2018 by sechilds
Learn to Write Command Line Utilities in R - blog.sellorm.com
Things like grep for searching through text for a given string or regular expression, or wc for counting the number of words or lines in a file. For instance, the cd tool could also count words in files, like wc does, but that would make the functionality considerably more difficult to learn. It basically tells the bash interpreter that you want to run the rest of the code in this file through whichever version of ‘Rscript’ the env command knows about. Running command line tools on Windows is a little harder than on Linux and macOS, which are both derived from Unix and therefore have very similar underpinnings. This means we’ll be able to quickly add new features that improve dramatically on what we’ve done so far and will hopefully turn our Sorting Hat script into something a lot more fun.
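A minimal skeleton of such a script (my own, not the post's Sorting Hat code):
#!/usr/bin/env Rscript
# the first line asks env to run this file with whichever Rscript it finds on the PATH
cat("Hello from an R command line utility\n")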
R 
january 2018 by sechilds
Learn to Write Command Line Utilities in R - part 6 - blog.sellorm.com
Even on the Unix-derived OSes the convention is sometimes ignored; for example, the find and openssl command line utilities use a single dash for both long and short form options, but this is the exception rather than the rule. Naturally, each has its own approach to solving the problems outlined above and in the previous article, and the reader is encouraged to check them out and find one that suits them best. If I’m using R-based command line utilities to bootstrap a compute environment, the fact that argparser is written purely in R and has no external dependencies really makes a difference. The -x or --opts option is an interesting default, as this allows us to specify an RDS file containing the argument values, instead of supplying them on the command line. The output here is unchanged from the previous version, but the way that we obtain it, using either -d or --debug, makes our utility feel much more like a first-class command line citizen.
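A sketch of the argparser approach discussed above (argument names are illustrative):
library(argparser)
p <- arg_parser("Sorting Hat")
p <- add_argument(p, "name", help = "name to be sorted")
p <- add_argument(p, "--debug", help = "print debug output", flag = TRUE)
argv <- parse_args(p)
if (argv$debug) message("debug mode on")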
R 
january 2018 by sechilds
R for Excel users - Rex Analytics
One of the big stumbling blocks, in my view, is having a mental understanding of how we store data in structures in R. You can view your data structures in R, but unlike Excel where it’s in front of your face, it’s not always intuitive to the user just starting out. Homogeneous in this case just means all the ‘bits’ inside these structures need to be of the same type. Generally the content of each sub-list (column of the data frame) is the same (like you’d expect in a spreadsheet) but that’s not necessarily the case. The disadvantage of this structure is it can be slower to process – but if you’re at the stage of coding where you’re not sure if this matters to you, it probably doesn’t just now! That isn’t my idea – Hadley Wickham in Advanced R talks about it in much more detail.
R 
december 2017 by sechilds
set_name_formulas · GitHub
library(tidyverse)
library(rlang)
library(glue)
library(stringr)

small_iris <- slice(iris, 1)

set_names(small_iris, ~ glue("{.x}_small"))
set_names(small_iris, ~ tolower(.x))
set_names(small_iris, ~ str_replace_all(.x, "\\.", "_"))
set_names(small_iris, ~ str_replace_all(.x, "\\.", "_") %>% tolower())
R 
december 2017 by sechilds
bulk_cor · GitHub
library(purrr)
library(tidystringdist)

(comb <- tidy_comb_all(names(iris)))

pmap(comb, ~ cor.test(iris[[.x]], iris[[.y]])) %>%
  map_df(broom::tidy) %>%
  cbind(comb, .)
R 
december 2017 by sechilds
R: the least disliked programming language (Revolutions)
According to a recent analysis of Stack Overflow "Developer Stories", where programmer candidates list the technologies they would and would not like to work with, R is the least disliked programming language. This is probably related to the fact that there's high demand in the job market for fast-growing technologies, which is a disincentive for listing them on the "would not work with" section of an online resume. If you’ve read some of our other posts about the growing and shrinking programming languages, you might notice that the least disliked tags tend to be fast-growing ones. R, Python, Typescript, Go, and Rust are all fast-growing in terms of Stack Overflow activity (we’ve specifically explored Python and R before) and all are among the least polarizing languages. Read the complete analysis linked below for more insights, including the "most liked" languages and rivalries between languages (where liking one and disliking the other go hand-in-hand).
R 
december 2017 by sechilds
archivist: Boost the reproducibility of your research | SmarterPoland.pl
The common belief is that if one is able to replicate the execution environment then the same R source code will produce the same results. But what about more complicated results, like a random forest with 100k trees created with 100k variables, or some deep neural network? Use the addHooksToPrint() function to automatically keep copies of every plot, model or dataset created in a knitr report. Use the asearch() function to browse objects that fit specified search criteria, like class, date of creation, used variables, etc. Use asession() to access session info with detailed information about versions of packages available during the object creation.
R  @followup 
december 2017 by sechilds
A Crazy Little Thing Called {purrr} - Part 4: mappers - Colin FAY
I’ve been working lately on a new package called {trycatchthis}, which is an attempt at making tryCatch and condition handling in R a little bit friendlier. If you have some time to spare, I’ll be glad if you can give me some feedback about it: positive, negative, PRs, improvement suggestions https://github.com/ColinFay/trycatchthis … Feel free! Today I’m gonna talk briefly about this cool stuff in {purrr} called mappers. In the background, mappers are created with {rlang} as_function: so, as you could have guessed, this turns formulas into functions. As package developers, we need to keep in mind that everything is possible when it comes to functions being used in the wild (characters in place of numbers is just an example).
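A minimal sketch (mine) of a mapper built from a formula:
library(purrr)
plus_one <- as_mapper(~ .x + 1)   # the formula becomes a one-argument function
map_dbl(1:3, plus_one)            # 2 3 4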
R 
december 2017 by sechilds
GitHub - ropensci/testdat: A package to run unit tests on tabular data
This suite would be extremely useful alongside unit tests for code to ensure that data read into R do not have errors in them. One possible use case, then, is to print the results of these tests in your analysis or documentation immediately after loading the data. Using the testdat suite of functions allows you to create a convincing argument that you have properly dealt with data quality issues, in a way that is easily followed by readers of your analysis.
R  @followup 
december 2017 by sechilds
Functions with R and rvest: A Laymen’s Guide – peterjgensler – Medium
In addition to these tools, I would strongly encourage you to think about using a sandbox environment, using Docker or RStudio Cloud, as R has had some pain points with encodings on different operating systems. Determine what you would like to do for a given element, turn that "recipe" into a function, and apply the recipe with purrr over the object (and if necessary, create a new column to store the results in). One thing to note is that while R does have an iconv function, I have found the command line utility to be much more versatile for my needs, and you can simply put a bash chunk in an R Markdown notebook. Understand that when I first ran this script, the very first line failed right out of the gate due to encoding issues, but I never got an error message until actually trying to use the spread function. If anything, one of the biggest challenges I faced as I wrote this article is just where exactly to start when looking for a solution: the R manual, Stack Overflow, a particular package, or even the RStudio Community.
R 
november 2017 by sechilds
Manipulation de facteurs avec forcats – R-atique
As for reordering, it can be useful for reflecting a natural ordering of the levels (for example, the levels "Adolescent", "Adult", "Child", "Infant" of an "Age group" factor should normally be arranged in the order "Infant", "Child", "Adolescent", "Adult"). So I hurried off to read this blog post, of which what follows is my re-interpretation/illustration. For another post on the topic, also in French and much more complete (and with a real cat meme, not to be confused with the crazy cat lady), head over to the ThinkR blog. In some cases, the counts of the different levels can be very unbalanced. When producing plots involving a categorical variable, it can be useful to adjust the position/levels of the points/bars/boxplots so as to make the plot as readable as possible.
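A minimal forcats sketch of that kind of manual reordering (my own toy example):
library(forcats)
age <- factor(c("Adult", "Child", "Adolescent", "Infant"))
fct_relevel(age, "Infant", "Child", "Adolescent", "Adult")   # impose the natural order of the levels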
R 
november 2017 by sechilds
The R manuals in bookdown format (Revolutions)
While there are hundreds of excellent books and websites devoted to R, the canonical source of truth regarding the R system remains the R manuals. Unlike books, the R manuals are updated by the R Core Team with every new release, so if you're not sure how the base R system is supposed to work this is the place to check. R user Colin Fay recently converted the manuals to bookdown format , using the ePub file as the source. This manual also covers a few CRAN packages for accessing databases , and includes a succinct primer on SQL queries. This manual covers how to get R, how to install and configure R on Windows, Mac and Unix, and how to manage package libraries.
R 
november 2017 by sechilds
Saving High-Resolution ggplots: How to Preserve Semi-Transparency - Easy Guides - Wiki - STHDA
This article describes solutions for preserving semi-transparency when saving ggplot2-based graphs into a high quality postscript (.eps) file format. If you try to export the picture as a vector file (EPS), the 95% confidence interval will disappear and the saved plot looks as follows. In the following sections, we’ll describe convenient solutions to save high-quality ggplots by preserving semi-transparency. Note that the argument fallback_resolution is used to control the resolution in dpi at which semi-transparent areas are rasterized (the rest stays as vector format).
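A sketch of the cairo_ps route described above (ggplot2 and a Cairo-capable R build assumed):
library(ggplot2)
p <- ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth()      # the ribbon is semi-transparent
ggsave("plot.eps", p, device = cairo_ps, fallback_resolution = 600)   # rasterize only the transparent parts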
ggplot2  R 
november 2017 by sechilds
How do you convince other people to use R? · Simply Statistics
I’d like to give a big thanks to the organizers (Nick Tierney, Di Cook, Rob Hyndman and others) for putting on a great unconference. Unlike packages like SAS, Stata, or SPSS, R came with a robust and sophisticated Lisp-like programming language that was well-suited for data analysis applications. More importantly, the people contributing those packages and the greater R community have expanded tremendously over time, bringing in new users and pushing R to be useful in more applications. Not many other packages have something similar, so it makes an obvious selling point to people working in this area. The idea of using R end-to-end came up, meaning using R to clean up messy data and taking it all the way to some interactive Shiny app on the other end. The idea that you can use the same tool to do all the things in between made for a compelling case for R. For the spreadsheet audience, the dplyr package was sometimes a good selling point.
R 
november 2017 by sechilds
glue 1.2.0 - Tidyverse
Compared to equivalents like paste() and sprintf() it is easier to write and less time consuming to maintain. glue() works in a similar way to double quotes " in a shell or python’s String Interpolation . glue_data() works like glue() , but instead of looking up its variables from the calling environment it looks them up from the first argument (usually a data frame or tibble). A big thanks goes to all the community members who contributed code and opened issues since the last release! ( @artemklevtsov , @daroczig , @DarwinAwardWinner , @edarague , @hadley , @hughjonesd , @jennybc , @jimhester , @jjchern , @klmr , @krlmlr , @lionel- , @mgirlich , @mmuurr , @npjc , @pssguy , and @robinsones )
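A couple of minimal examples (mine, not from the release post):
library(glue)
name <- "world"
glue("hello {name}!")                                   # variables come from the calling environment
glue_data(mtcars[1:3, ], "{cyl} cylinders, {hp} hp")    # variables come from the data frame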
R  followup 
november 2017 by sechilds
Render reports directly from R scripts – andrew brooks
Maybe not gripes, maybe just feelings of uncertainty over whether it makes sense to contain your hard work in an R Markdown file or an R script, or both. I have no doubt there are tools that exist (or can be easily developed) to strip the code chunks from an R Markdown file, but this seems cumbersome. I’ve been tempted in the past to maintain both a bare-bones R script and a verbose flowery Rmd file describing the process. Run-time: this isn’t very well addressed by either method, but I certainly find it easier to work with bigger data or anything computationally intensive using native R scripts. Your team members might gaze at seemingly strange comments in your R scripts, but they can run, read, edit and pipe your code as if it was their own.
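For instance, a short sketch of rendering a report straight from a script (hypothetical file name); rmarkdown treats #' lines as prose and everything else as code:
rmarkdown::render("analysis.R", output_format = "html_document")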
R  @followup 
october 2017 by sechilds
For the Love of Statistics: A Conversation with Hadley Wickham | Coursera Blog
Hadley developed famous statistical analysis software packages for R (programming language) and co-wrote the book “R for Data Science.” He joined us at our headquarters in Mountain View to talk about the challenges of data science and we were able to ask him a few questions before: I originally wanted to become a genetic engineer so I did medical school for three years before deciding it wasn’t for me. This was before Data Science was prominent, so I double majored in statistics and computer science at The University of Auckland and it was the perfect combination because it’s the home of R. What inspired you to create tools for data scientists? I loved programming and thought “maybe I could make this easier for people.” What advice would you give to someone who is starting out? Whether you’re really into sports or knitting, try to find some data that’s of interest to you because that will help you get through the initial frustrations.
R 
september 2017 by sechilds
PURRRty PowerPoint with R · Len Kiefer
Let’s imagine that our task is to make a PowerPoint chartbook with a bunch of slides summarizing U.S. macroeconomic conditions. Obviously, we can use ggplot2 to make awesome charts in R. And if we’re talking U.S. macroeconomic data we can probably get it from the Saint Louis Federal Reserve Economic Database (FRED). FRED will make our lives easy (for demonstration purposes only, actual results may vary) by putting all the data we want in one place. What we are going to do is create a simple line plot for each series in our data and save it as its own individual image file (.png here). But I think we can all agree that this type of application is exactly what the creators of purrr, including Hadley Wickham, had in mind when they created this awesome library.
R 
september 2017 by sechilds
GitHub - r-hub/rhub: R-hub API client
The package tries to detect your email address, and if it fails to do this correctly, you'll need to specify it. rhub stores the token permanently on the machine, so you do not need to validate your email again. check_with_valgrind() runs the build and check on Linux, in valgrind to find memory leaks and pointer errors.
R 
september 2017 by sechilds
Data.Table by Example – Part 1 | Mathew Analytics
For many years, I actively avoided the data.table package and preferred to utilize the tools available in either base R or dplyr for data aggregation and exploration. For this series of posts, I will be working with data that comes from the Chicago Police Department’s Citizen Law Enforcement Analysis and Reporting system. This dataset contains information on reported incidents of crime that occurred in the city of Chicago from 2001 to present. If you want to plot the results as a line graph, just add another chain which executes the visualization or use the magrittr %>% operator. So if you are looking for a statistical analyst or data scientist with strong experience in programming in R, time series econometrics, and machine learning, please contact me at mathewanalytics@gmail.com or reach me through LinkedIn.
R 
september 2017 by sechilds
Ben Casselman on Twitter: Huzzah for #rstats 'chunked' package. Huge help for getting big datasets into R (and plays nicely with dbplyr's SQL tools).
@bencasselman: Huzzah for #rstats 'chunked' package. Huge help for getting big datasets into R (and plays nicely with dbplyr's SQL tools).
R 
september 2017 by sechilds
Using the iGraph package to Analyse the Enron Corpus
Written by: Peter Prevos on 13 July 2017. The Enron scandal is one of the most famous corporate governance failures in the history of capitalism. Network analysis is a versatile technique that can be used to add value to a lot of different data sets, including the complex corporate relationships of Donald Trump. As part of their inquiries, The Federal Energy Regulatory Commission used an extensive collection of emails from Enron employees. When analysing this data set, we need to keep in mind that the majority of former Enron employees were innocent people who lost their jobs due to the greed of their overlords. In the centre of the graph we see a few individuals who are connectors between groups because they send a lot of emails to people outside their community.
R 
august 2017 by sechilds
broom: let's tidy up a bit
The broom package takes the messy output of built-in functions in R, such as lm , nls , or t.test , and turns them into tidy data frames. In R, row names must be unique, so combining coefficients from many models (e.g., from bootstrap resamples, or subgroups) requires workarounds to avoid losing important information. broom is an attempt to bridge the gap from untidy outputs of predictions and estimations to the tidy data we want to work with. It centers around three S3 methods, each of which take common objects produced by R statistical functions ( lm , t.test , nls , etc) and convert them into a data frame. There is no augment function for htest objects, since there is no meaningful sense in which a hypothesis test produces output about each initial data point.
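A minimal sketch (mine) of the three verbs on an lm fit:
library(broom)
fit <- lm(mpg ~ wt, data = mtcars)
tidy(fit)     # coefficient table as a data frame
glance(fit)   # one-row model-level summary (R-squared, AIC, ...)
augment(fit)  # original data plus fitted values and residuals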
R 
august 2017 by sechilds
r - Create t.test table with dplyr? - Stack Overflow
I want to create a t.test table with a row for "A" and one for "B". I'm wondering if there is a cleaner way of doing this with dplyr. You should really replace the do.call function with rbindlist from data.table; you can add ncontrol and ntreatment manually. An old question, but the broom package has since been made available for this exact purpose (as well as other statistical tests):
R  dplyr 
august 2017 by sechilds
Multiple t-Tests with dplyr – Sebastian Sauer Stats Blog
Suppose you have a data set where you want to perform a t-Test on multiple columns with some grouping variable. As an example, say you have a data frame where each column depicts the score on some test (1st, 2nd, 3rd assignment…). Select is used to pick the columns we want to perform the t-Test on (here: tip and total_bill) plus the grouping variable (sex). We "melt" the data frame down, so that all numeric variables are put in one column (underneath each other). That means in practice that the following t-test will be applied to each member of this group (i.e., each variable, here tip and total_bill).
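A sketch of the same melt-then-test pattern, using built-in data in place of the post's tips example (dplyr, tidyr and broom assumed):
library(dplyr)
library(tidyr)
library(broom)
mtcars %>%
  pivot_longer(c(mpg, hp), names_to = "measure", values_to = "value") %>%
  group_by(measure) %>%
  group_modify(~ tidy(t.test(value ~ am, data = .x)))   # one t-test per measure, grouped by transmission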
R  dplyr 
august 2017 by sechilds
My First Steps into The World of Tidyeval
Jenny Bryan and Lionel Henry guided me to different ways to approach the problem, mentioning Tidyeval. The example I shared in the issue was having a list column Tags_terms in a dataframe dat_r, that I wanted to mutate with the count of tags in each row. As we can see, cname quotes the input Tags_terms and returns a quosure (a special type of formula). After getting acquainted with Tidyeval, I had other situations when I wanted to write general functions to avoid repetition: bigrams and trigrams tables with counts. In the normal case, this means writing identical lines three times (as shown below), with minor changes in the type of ngrams and the name of the output columns.
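A minimal sketch (mine, not from the post) of quoting a user-supplied column with enquo() and unquoting it with !!, in the spirit of the tag-counting example:
library(dplyr)
library(rlang)
count_distinct <- function(df, col) {
  col <- enquo(col)                        # capture the bare column name as a quosure
  df %>% summarise(n = n_distinct(!!col))  # unquote it inside the dplyr verb
}
count_distinct(iris, Species)              # n = 3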
R  dplyr 
august 2017 by sechilds
Feature Engineering with Tidyverse | Open Data Science
Feature engineering is crucial for a variety of reasons, and it requires some care to produce any useful outcome. There are 39 different crime categories, which explains the limits of the sum in the denominator, while 1 is added to the ratio since I will compute the log of this feature eventually. As a result, the trained model found much higher weights for these features, since they are highly correlated to the target by construction. However, if one splits the training data into 2 pieces, and construct crime by address ratios from piece_1 and merge them with piece_2 (and repeat vice versa from piece_2 to piece_1 ) then the overfitting could be mitigated. The reason that this works is because the new features are constructed by using out-of-sample target values and so the crime by address ratios of each piece is not memorized.
R  data:science 
august 2017 by sechilds
The magick package: Advanced Image-Processing in R
A relatively recent addition to the package is a native R graphics device which produces a magick image object. By default image_draw() sets all margins to 0 and uses graphics coordinates to match image size in pixels (width x height) where (0,0) is the top left corner. It is recommended to brew with at least --with-fontconfig and --with-librsvg to support high quality font / svg rendering (the CRAN OSX binary package enables these as well). The image_info function shows some meta data about the image, similar to the imagemagick identify command line utility. The grid package makes it easier to overlay a raster on the graphics device without having to adjust for the x/y coordinates of the plot.
R 
august 2017 by sechilds
Spatial Processing in R: Building a website with pkgdown: a short guide
To build a website using pkgdown, all you need is an R package hosted on GitHub, with a file structure "tweaked" with some functionality provided by devtools. build_site() will do several things: create a "docs" subfolder in your file structure, where it will place all the material needed for rendering the website; knit README.Rmd to "docs/index.html"; create additional material on the right-hand sidebar of the home page; and put everything together by some magic to build a working website, opening a preview in the RStudio Viewer or your browser. However, spending some time better configuring the structure of the site and tweaking some small things allows you to achieve a much nicer result, as explained below. If you build the site using build_site(run_dont_run = TRUE), the examples with the "dont_run" specification in the roxygen comment will be run, and their results appear in the documentation page of each function.
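For a package checked out locally, the basic workflow is just the two calls mentioned above (a sketch):
# run from the package root
pkgdown::build_site()                       # builds docs/ and opens a preview
pkgdown::build_site(run_dont_run = TRUE)    # also runs examples marked dont_run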
R  @followup 
august 2017 by sechilds
Random Forests in R | DataScience+
Ensemble Learning is a type of supervised learning technique in which the basic idea is to generate multiple models on a training dataset and then simply combine (average) their output rules or their hypotheses \( H_x \) to generate a strong model which performs very well, does not overfit, and balances the bias-variance tradeoff too. The idea is that instead of producing a single complicated and complex model, which might have a high variance that leads to overfitting or might be too simple with a high bias that leads to underfitting, we will generate lots of models by training on the training set and at the end combine them. Random Forests are similar to a famous ensemble technique called bagging but have a different tweak in it. In this article, I gave a simple overview of Random Forests, how they differ from other ensemble learning techniques, and how to implement such a complex and strong modelling technique in R with the simple package randomForest. Random Forests are a very nice technique to fit a more accurate model by averaging lots of decision trees, reducing the variance and avoiding the overfitting problem in trees. I hope the tutorial is enough to get you started with implementing Random Forests in R or at least to understand the basic idea behind how this amazing technique works.
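A minimal example (mine) with the randomForest package mentioned above:
library(randomForest)
set.seed(42)
fit <- randomForest(Species ~ ., data = iris, ntree = 500)
print(fit)        # OOB error estimate and confusion matrix
importance(fit)   # variable importance (mean decrease in Gini)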
R  machine_learning 
august 2017 by sechilds
A modern database interface for R (Revolutions)
The odbc package provides connections with any ODBC-compliant database, and has been comprehensively tested on SQL Server, PostgreSQL and MySQL. But the real power comes in being able to use high-level functions from the dplyr package and have the data processing run in the database , instead of in the local R session. In the demo starting at the 8:20 mark, Jim connects to SQL Server (here running in a VM on his laptop, in non-demo situations more likely to be on a remote machine). You'll even get a lesson on the importance of sanitizing user inputs to prevent SQL injection, and a solution based on parameterized queries. The DBI package has been around for a while, but has recently undergone a major upgrade thanks to Kirill Müller and funding from the R Consortium .
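A minimal sketch of the workflow, with hypothetical connection details (dbplyr assumed to be installed for the dplyr translation):
library(DBI)
library(dplyr)
con <- dbConnect(odbc::odbc(),
                 Driver   = "PostgreSQL",   # hypothetical driver and credentials
                 Server   = "localhost",
                 Database = "mydb",
                 UID      = "user",
                 PWD      = "secret")
tbl(con, "flights") %>%    # a lazy reference to a database table
  count(origin) %>%        # dplyr translates this to SQL
  collect()                # only now are results pulled into R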
R  data  database 
august 2017 by sechilds
r - Overriding "Variables not shown" in dplyr, to display all columns from df - Stack Overflow
This reveals that trunc_mat is the function responsible for what is printed and not, including which variables. Sadly, dplyr:::print.tbl_df does not pass on any parameters to trunc_mat, and trunc_mat also does not support choosing which variables are shown (only how many rows). I came up with this solution that holds to the spirit of piping but is identical in function to the accepted answer (note that the pipe symbol %.% is deprecated in favor of %>%). When you've finally whittled your stuff down to what you want and don't want to be saved from your own mistakes anymore, just stick print.default on the end to spit out everything. BTW, methods(print) shows how many packages need to write their own print functions (think about, e.g., igraph or xts --- these are new data types, so you need to tell them how to be displayed on the screen).
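With current tibbles the same effect is available without digging into trunc_mat (a sketch, not from the answers):
library(tibble)
print(as_tibble(mtcars), n = 5, width = Inf)   # width = Inf prints every column
options(tibble.width = Inf)                    # or set it session-wide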
R  dplyr 
august 2017 by sechilds
Glimpse (part of the tibble package)
This looks really useful for having a look at all the variables in a df.

This is like a transposed version of print: columns run down the page, and data runs across. This makes it possible to see every column in a data frame.
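For example (not from the linked page):
library(tibble)
glimpse(mtcars)   # one line per column: name, type, and the first few values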
R  dplyr  R:tidyverse  from notes
august 2017 by sechilds
The Split-Apply-Combine Strategy for Data Analysis | Wickham | Journal of Statistical Software
Authors: Hadley Wickham. Title: The Split-Apply-Combine Strategy for Data Analysis. Abstract: Many data analysis problems involve the application of a split-apply-combine strategy, where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together. This insight gives rise to a new R package that allows you to smoothly apply this strategy, without having to worry about the type of structure in which your data is stored. The paper includes two case studies showing how these insights make it easier to work with batting records for veteran baseball players and a large 3d array of spatio-temporal ozone measurements. Page views: 149507. Submitted: 2009-09-24.
Paper: The Split-Apply-Combine Strategy for Data Analysis, PDF (Downloads: 162271)
Supplements: plyr_1.4.1.tar.gz: R source package (Downloads: 2510; 509KB)
v40i01.R: R example code from the paper (Downloads: 4167; 8KB)
ozone-map.R: Supplementary R code for ozone map (Downloads: 2862; 1KB)
timings.R: Supplementary R code for timing comparisons (Downloads: 2800; 1KB)
DOI: 10.18637/jss.v040.i01. This work is licensed as follows. Paper: Creative Commons Attribution 3.0 Unported License. Code: GNU General Public License (at least one of version 2 or version 3) or a GPL-compatible license.
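A minimal sketch (mine, not from the paper) of the split-apply-combine strategy using the paper's plyr package:
library(plyr)
# split mtcars by cylinder count, apply a summary to each piece, combine the results
ddply(mtcars, "cyl", summarise, mean_mpg = mean(mpg), n = length(mpg))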
data  R  data:analysis  @followup 
august 2017 by sechilds
Securely store API keys in R scripts with the "secret" package
That's easy to do if you just include those keys as strings in your code — but it's not very secure. It's also really easy to inadvertently include those keys in a public repo if you use GitHub or similar code-sharing services. To address this problem, Gábor Csárdi and Andrie de Vries created the secret package for R. The secret package integrates with OpenSSH, providing R functions that allow you to create a vault for keys on your local machine, define trusted users who can access those keys, and then include encrypted keys in R scripts or packages that can only be decrypted by you or by people you trust. You can see how it works in the vignette secret: Share Sensitive Information in R Packages, and in this presentation by Andrie de Vries at useR!2017. To use the secret package, you'll need access to your private key, which you'll also need to store securely.
R  @followup 
july 2017 by sechilds
Teach the tidyverse to beginners – Variance Explained
There’s another debate that has popped up recently on Twitter and in conversations (many this week at the useR conference), about how to teach general R programming and data manipulation, and about the role of the “tidyverse” in such education. From experience, I promise that students who have never programmed before can complete and understand the above code in a 2-3 hour workshop, by learning each function one at a time. And teaching beginners how to make a loop efficient (e.g. pre-allocating memory rather than growing a vector) is an advanced topic that would send an introductory course off track. But they’re typically used in cases that involve processing unknown input (you wouldn’t use a conditional when you’re analyzing one dataset) or are implementing more complicated algorithms like expectation-maximization. The first chapter, for example, introduces %>% and several dplyr functions ( filter / mutate / select / group_by / summarize / arrange ), but also variable assignment, logical operators, %in% , and mean(x == 1) .
R  R:tidyverse 
july 2017 by sechilds
The tidyverse style guide
Use %>% when you find yourself composing three or more functions together into a nested call, or creating intermediate objects that you don’t care about. Reserve pipes for a sequence of steps applied to one primary object. There are meaningful intermediate objects that could be given informative names. magrittr allows you to omit () on functions that don’t have arguments. This is because the name acts as a heading, which reminds you of the purpose of the pipe.
R  R:programming 
july 2017 by sechilds
GitHub - mine-cetinkaya-rundel/2017-07-05-teach-ds-to-new-user: Slides and demo materials for the "Teaching data science to new useRs" talk at useR2017.
Slides and other materials for the "Teaching data science to new useRs" talk at useR2017.
R  @followup 
july 2017 by sechilds
The tidyverse style guide
If any description corresponding to a roxygen tag spans over multiple lines, add another two spaces of extra indention. For most tags, like @param, @seealso and @return, the text should be a sentence, starting with a capital letter and ending with a full stop. For all bullets, enumerations, argument descriptions and the like, use sentence case and put a period at the end of each text element, even if it is only a few words. However, avoid capitalization of function names or packages since R is case sensitive. When referring to other sections in the documentation, use single quotes and upper title style capitalization.
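A small roxygen block in that style (illustrative only, not from the guide):
#' Add two numbers.
#'
#' @param x A number.
#' @param y A number.
#' @return The sum of x and y.
add <- function(x, y) x + y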
R 
july 2017 by sechilds
user2017/user2017_shiny-collections.pdf at master · Appsilon/user2017 · GitHub
R 
july 2017 by sechilds
Down the rabbit hole with tidyeval — Part 1 - Colin FAY
You type/send something to the console (called a symbol), press enter, R does some magic stuff, and R returns you the value associated with the expression. In fact, R takes the symbol you’ve entered (here a), turns it into an internal representation, then looks in the direct environment of the expression in order to return the value associated with it. God damn, how is it that dplyr::select works with unquoted elements, while select_custom needs a quoted string? In the case of filter, R looks for a column named var in df (in practice, that’s not exactly how it works, but you get the point). So the thing is: dplyr functions work with a special type of object, called a quosure — this is how symbols are evaluated.
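A minimal sketch (mine, not the post's code) of a string-based select_custom() of the kind the post contrasts with dplyr::select(), using rlang to turn the string into a symbol and unquote it:
library(dplyr)
library(rlang)
select_custom <- function(df, col) {
  select(df, !!sym(col))   # sym() turns the string into a symbol; !! unquotes it
}
select_custom(iris, "Species")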
R  R:programming 
july 2017 by sechilds
Lesser known dplyr tricks - Econometrics and Free Software
In this blog post I share some lesser-known (at least I believe they are) tricks that use mainly functions from dplyr. Still using select(), it is easy to re-order columns in your data frame. It is easy to select the columns that start with "spam" with some helper functions. You might want to create a new variable conditionally on several values of another column. Then I converted the output to a tidy data frame using broom::tidy().
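A couple of the tricks in miniature (my own toy data, not the post's):
library(dplyr)
df <- tibble(spam_a = 1:3, spam_b = 4:6, other = 7:9)
df %>% select(starts_with("spam"))                   # helper-based column selection
df %>% mutate(flag = case_when(other > 8 ~ "high",   # conditional new variable
                               other > 7 ~ "mid",
                               TRUE      ~ "low"))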
R  dplyr 
july 2017 by sechilds
How to not be afraid of your data
Using shiny/dplyr to teach EDA. In contrast to Confirmatory Data Analysis (CDA), such as hypothesis testing, the goals of EDA are to: An experimental weight loss drug was first tested at one site with volunteers (DatasetA). Your goal is to conduct EDA on the two separate datasets to assess whether there was an effect from the weight loss drug.
data  data:analysis  R  R:Shiny 
july 2017 by sechilds
Getting Started with Shiny - data visualization - Bocoup
In this blog post we take a look at Rstudio’s Shiny package and the first steps toward creating a working interactive to explore your data with it. If you have a data analysis written in R and want to create an exploration tool or dashboard using that R code, Shiny is a great option! An important gotcha here is that due to the details of implementing reactive variables, filtered_data is actually a function that we call in order to get the updated data, hence the () at the end. You can imagine repeating this paradigm any number of times to create a variety of plots (and other outputs like tables ) for your users to consume and interact with. Shiny could also be a great solution for when a dashboard needs to be built and iterated upon quickly, or for when you need to view datasets that are too large to fit into the browser.
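A minimal reactive sketch in that spirit (mine, not the post's app):
library(shiny)
ui <- fluidPage(
  sliderInput("n", "Rows to show", min = 1, max = nrow(iris), value = 10),
  tableOutput("tbl")
)
server <- function(input, output) {
  filtered_data <- reactive(head(iris, input$n))   # a reactive is called like a function
  output$tbl <- renderTable(filtered_data())       # note the () when reading it
}
shinyApp(ui, server)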
R  R:Shiny  @followup 
july 2017 by sechilds
The current state of naming conventions in R - UseR 2017
This is a lightning talk I held at the UseR 2017 conference in Brussels. I talk about the current state of naming conventions used in the R community, what h...
video  R  @followup 
july 2017 by sechilds
HexJSON HTMLWidget for R, Part 2
In my previous post - HexJSON HTMLWidget for R, Part 1 - I described a first attempt at an HTMLwidget for displaying hexJSON maps using d3-hexJSON. I had another play today and added a few extra features, including the ability to: add a grid (as demonstrated in the original d3-hexJSON examples), modify the default colour…
R  JSON 
july 2017 by sechilds
bigrquery 0.4.0
> I’m pleased to announce that bigrquery 0.4.0 is now on CRAN. bigrquery makes it possible to talk to Google’s BigQuery cloud database. It provides both DBI and dplyr backends so you can interact with BigQuery using either low-level SQL or high-level dplyr verbs.
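A minimal connection sketch with the current bigrquery API (project, dataset and billing values are placeholders):
library(bigrquery)
library(DBI)
con <- dbConnect(bigrquery::bigquery(),
                 project = "my-gcp-project",
                 dataset = "my_dataset",
                 billing = "my-gcp-project")
dbListTables(con)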
R  BigQuery 
july 2017 by sechilds