Math in Data Science
Statistics is the only mathematical discipline we mentioned in that definition, but data science also regularly involves other fields within math. Learning statistics is a great start, but data science also uses algorithms to make predictions. These algorithms are called machine learning algorithms, and there are literally hundreds of them. Covering in depth how much math is needed for every type of algorithm is not within the scope of this post; instead, I will discuss how much math you need to know for each of the following commonly used algorithms:

Naive Bayes
Linear Regression
Logistic Regression
Neural Networks
K-Means clustering
Decision Trees
programming  maths  statistics 
15 days ago by sharon_howard
SKILLNET – Sharing Knowledge in Learned and Literary Networks | CEMROL
CEMROL (Crowdsourcing Epistolary Metadata of the Republic of Letters) is a crowdsourcing project in which anyone interested in history or letters can participate. The project focuses on letters from the 15th up to the 18th century with a learned character.

The aim is to map out who wrote letters with whom, from where and on what date. We call on the help of the public to type in the data of letters from scanned books in which these letters are printed. We challenge the public to indicate on their own computer screens where in these books letters begin and end, and where the names of the letter writers and recipients are, the date and place of dispatch, and preferably also the place of addressing. We also ask the public to type in the names and dates. You can choose the language of the letters you want to work with.
crowdsourcing  data  letters 
22 days ago by sharon_howard
The British Library / Qatar Foundation Partnership Imaging Hack Day - Digital scholarship blog
Our imaging team are a highly-skilled group, with a variety of backgrounds, experiences and talents, and we wished to harness these. Therefore, we decided to set aside a day for our Imaging team to use their creative and technical skills to ‘hack’ the material in our collection.
data  digital  hacking 
27 days ago by sharon_howard
Does the decline of gender within literary studies matter? – .txtLAB @ McGill
Since then I’ve been working with more models to understand how gender has played out as a category of research in the past half-century. The figure below shows the rise and fall of the “gender studies topic”, which was derived using LDA with 60 topics. As I mention elsewhere, it is one of the most stable topics across multiple runs and multiple topic-size parameters.
data  gender  scholarship 
4 weeks ago by sharon_howard
R can API and So Can You! – Heather Nolis – Medium
When our team was tasked with creating a customer-facing deep learning model, I proposed making it into an API, which was met with a swath of data science confusion. An API is the textbook way to allow other T-Mobile software to leverage our model. The most engineering-savvy data scientist on the team kept referring to it as “R as a web server.” While technically true, to me this was an amusingly spot-on, living example of the typical data scientist resistance towards APIs. On the opposite side of the fence is software engineering, where everything is an API.
r  api  data 
5 weeks ago by sharon_howard
How to Create a Correlation Matrix in R | Displayr
A correlation matrix is a table of correlation coefficients for a set of variables, used to determine whether a relationship exists between the variables. Each coefficient indicates both the strength of the relationship and its direction (positive vs. negative correlation).
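As a rough illustration of the idea in the excerpt above, here is a minimal sketch in plain Python (not the R approach the linked article describes). It assumes the Pearson coefficient is the one meant; the function names `pearson` and `correlation_matrix` and the sample columns are hypothetical, made up for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical sample data: three variables measured on five observations
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [2, 4, 6, 8, 10],
    "errors": [10, 8, 6, 4, 2],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

The sign of each entry gives the direction of the relationship and its magnitude gives the strength; the diagonal is always 1 because every variable correlates perfectly with itself.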
r  dataviz  correlation 
5 weeks ago by sharon_howard
Impact of Social Sciences – Excel is threatening the quality of research data — Data Packages are here to help
Excel has a well-documented history of silently corrupting data in unexpected ways which leads some, like data scientist Jenny Bryan, to compile lists of “Scary Excel Stories” advising researchers to choose alternative formats or, at least, treat data stored in Excel warily.

With data validation and long-term preservation in mind, we’ve created Data Packages which provide researchers with an alternative format to Excel by building on simpler, well-understood, text-based file formats like CSV and JSON and adding advanced features. Added features include providing a framework for linking multiple tables together; setting column types, constraints, and relations between columns; and adding high-level metadata like licensing information. Transporting research data with open, granular metadata in this format, paired with tools like Good Tables for validation, can be a safer and more transparent option than Excel.
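To make the features listed above more concrete, here is a minimal sketch in Python of what a Data Package descriptor might look like, with column types, a constraint, and licensing metadata sitting alongside a plain CSV file. The package name, file name, and field names are hypothetical. The `check_row` helper is an illustration written for this example only; it is not the Good Tables tool or any official validator, and it handles only three simple types.

```python
import json

# Hypothetical descriptor, loosely following the Data Package layout:
# package-level metadata plus one tabular resource with a typed schema.
descriptor = {
    "name": "example-survey",
    "licenses": [{"name": "CC-BY-4.0"}],
    "resources": [
        {
            "name": "responses",
            "path": "responses.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer",
                     "constraints": {"required": True}},
                    {"name": "respondent", "type": "string"},
                    {"name": "score", "type": "number"},
                ],
                "primaryKey": "id",
            },
        }
    ],
}

# The descriptor would normally be saved as JSON next to the CSV file.
descriptor_json = json.dumps(descriptor, indent=2)

# Illustrative casts for the three types used above (an assumption,
# far simpler than a real validator).
CASTS = {"integer": int, "number": float, "string": str}


def check_row(row, fields):
    """Return a list of problems found in one CSV row (a dict of strings)."""
    errors = []
    for field in fields:
        name = field["name"]
        value = row.get(name, "")
        required = field.get("constraints", {}).get("required", False)
        if value == "":
            if required:
                errors.append(f"{name}: required value is missing")
            continue
        try:
            CASTS[field["type"]](value)
        except ValueError:
            errors.append(f"{name}: {value!r} is not a valid {field['type']}")
    return errors


fields = descriptor["resources"][0]["schema"]["fields"]
print(check_row({"id": "1", "respondent": "A", "score": "3.5"}, fields))
print(check_row({"id": "x", "respondent": "B", "score": "3.5"}, fields))
```

Because the declared types travel with the data, a value that does not fit its column is reported explicitly instead of being silently reinterpreted.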
data  rdm  kill_excel 
5 weeks ago by sharon_howard
Financial sustainability of local authorities 2018 visualisation - National Audit Office (NAO)
The data we present shows changes in income and spending alongside analysis of factors such as budget overspends and use of reserves. These figures can change for a range of reasons such as local political priorities, changes in local demand and changes in government policy and priorities. Consequently, comparisons between places need to be undertaken with caution. The complexity of factors underlying the data means that differences in figures presented here should not be viewed as indicative in any way of the current ‘performance’ of an authority. Any apparent differences between places should be seen as an opportunity to gather more information and build a richer understanding.

The data in the report was used to present a picture of the key trends affecting the sector or particular groups of authorities. The analysis was not designed to identify specific authorities that were felt to be at risk. While we present data for individual authorities in the visualisation, the purpose is to allow for comparison between individual authorities and comparator groups on individual indicators. While the data in these visualisations is potentially relevant to an assessment of the financial sustainability of individual authorities it by no means represents a full assessment. As suc
data  local_authorities 
7 weeks ago by sharon_howard