Nonstop Metropolis: | Queens Museum
Includes a linguistic map of Queens plus information about its languages.
nyc  linguistics 
september 2016 by robincamille
Characterizing the Google Books corpus
TL;DR: not actually a reflection of language use in culture.
october 2015 by robincamille
HTRC Portal - About
Extracted Features From the HTRC
Note that this is an alpha data release. Please send feedback to

A great deal of fruitful research can be performed using non-consumptive pre-extracted features. For this reason, HTRC has put together a select set of page-level features extracted from the HathiTrust's non-Google-digitized public domain volumes. The source texts for this set of feature files are primarily in English.

Features are notable or informative characteristics of the text. We have processed a number of useful features, including part-of-speech tagged token counts, header and footer identification, and various line-level information. This is all provided per page. Providing token information at the page level makes it possible to separate text from paratext. (An example of the latter might be thirty pages of publishers’ ads at the back of a book.) We have also divided each page into three parts: header, body, and footer. The specific features that we extract from the text are described in more detail below.
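A minimal Python sketch of how page-level, per-section counts like these might be aggregated, for example summing body tokens across pages while ignoring headers and footers. The record layout (keys `header`, `body`, `footer`, `tokenPosCount`, mapping token to POS tag to count) is an assumption for illustration and may not match the actual release format; the sample pages are made up.

```python
from collections import Counter

# Hypothetical page records (assumed layout, not the official schema):
# each page is split into header, body, and footer, and each section maps
# token -> {POS tag -> count}.
pages = [
    {
        "header": {"tokenPosCount": {"CHAPTER": {"NN": 1}}},
        "body": {"tokenPosCount": {"whale": {"NN": 3}, "sailed": {"VBD": 1}}},
        "footer": {"tokenPosCount": {"12": {"CD": 1}}},
    },
    {
        "header": {"tokenPosCount": {}},
        "body": {"tokenPosCount": {"whale": {"NN": 2, "VB": 1}}},
        "footer": {"tokenPosCount": {}},
    },
]


def section_token_counts(pages, section="body"):
    """Sum token counts for one page section across pages, collapsing POS tags."""
    totals = Counter()
    for page in pages:
        # Missing sections or missing counts are treated as empty.
        token_pos = page.get(section, {}).get("tokenPosCount", {})
        for token, pos_counts in token_pos.items():
            totals[token] += sum(pos_counts.values())
    return totals


print(section_token_counts(pages).most_common())
```

Because counts are kept per page, paratext such as trailing ad pages could be excluded simply by passing a slice of the page list (e.g. `pages[:-30]`) before aggregating.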
linguistics  thesis 
june 2014 by robincamille
PLOS ONE: Digital Language Death
Of the approximately 7,000 languages spoken today, some 2,500 are generally considered endangered. Here we argue that this consensus figure vastly underestimates the danger of digital language death, in that less than 5% of all languages can still ascend to the digital realm. We present evidence of a massive die-off caused by the digital divide.
march 2014 by robincamille
Inflationary effects in language and elsewhere
Via Angus Grieve-Smith

Inflation is a well-known phenomenon to most of us. Together with unemployment, inflation is one of the typical diseases of modern economies. However, inflationary processes are not restricted to the economic sphere in the proper sense. Consider, for instance, the English words gentleman and lady, which in their original meaning denoted persons from the nobility but today are often used synonymously with man and woman. ... Intuitively, we may say that titles tend to lose their "value" over time, but exactly what is the parallel with money here?
january 2014 by robincamille
How to write 261 leads in a fraction of a second | Poynter
If a bot can write a better story than you can, it’s not you: it’s the story. It’s a crappy story and a human being shouldn’t be writing it.
september 2013 by robincamille
Language Log » Computational linguistics and literary scholarship
Wikipedia as a problematic dataset; science's savior complex. Good comments.
linguistics  criticism 
september 2013 by robincamille

