anglo   321

Why is Google Translate so bad for Latin? A longish answer. : latin
hmm:
> All it does is correlate sequences of up to five consecutive words in texts that have been manually translated into two or more languages.
That sort of system ought to be perfect for a dead language, though. Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.

We're not exactly inundated with brand new Latin to translate.
--
> Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.
What makes you think that the Google folks haven't done so and used that to create the language models they use?
> That sort of system ought to be perfect for a dead language, though.
Perhaps. But it will be bad at translating novel English sentences to Latin.
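The "sequences of up to five consecutive words" in the quote above are word n-grams. A minimal sketch of extracting them, assuming simple whitespace tokenization; the function name `ngrams` and the sample phrase are illustrative only, and this is not a description of how Google Translate is actually implemented:

```python
from collections import Counter

def ngrams(tokens, max_n=5):
    """Yield every run of 1..max_n consecutive tokens as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

# Hypothetical sample text; a real system would count n-grams over
# large collections of manually translated parallel texts.
tokens = "arma virumque cano Troiae qui primus".split()
counts = Counter(ngrams(tokens))

print(len(counts))                        # number of distinct n-grams
print(counts[("arma", "virumque")])       # how often this pair occurs
```

A fixed corpus such as the surviving Latin canon gives a fixed table of such counts, which is why the second commenter expects trouble with novel sentences: any word sequence absent from the table has no direct evidence behind it.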
foreign-lang  reddit  social  discussion  language  the-classics  literature  dataset  measurement  roots  traces  syntax  anglo  nlp  stackex  links  q-n-a  linguistics  lexical  deep-learning  sequential  hmm  project  arrows  generalization 
18 days ago by nhaliday
Hardware is unforgiving
Today, anyone with a CS 101 background can take Geoffrey Hinton's course on neural networks and deep learning, and start applying state of the art machine learning techniques in production within a couple months. In software land, you can fix minor bugs in real time. If it takes a whole day to run your regression test suite, you consider yourself lucky because it means you're in one of the few environments that takes testing seriously. If the architecture is fundamentally flawed, you pull out your copy of Feathers' “Working Effectively with Legacy Code” and you apply minor fixes until you're done.

This isn't to say that software isn't hard, it's just a different kind of hard: the sort of hard that can be attacked with genius and perseverance, even without experience. But, if you want to build a ship, and you "only" have a decade of experience with carpentry, milling, metalworking, etc., well, good luck. You're going to need it. With a large ship, “minor” fixes can take days or weeks, and a fundamental flaw means that your ship sinks and you've lost half a year of work and tens of millions of dollars. By the time you get to something with the complexity of a modern high-performance microprocessor, a minor bug discovered in production costs three months and five million dollars. A fundamental flaw in the architecture will cost you five years and hundreds of millions of dollars.

Physical mistakes are costly. There's no undo and editing isn't simply a matter of pressing some keys; changes consume real, physical resources. You need enough wisdom and experience to avoid common mistakes entirely – especially the ones that can't be fixed.
techtariat  comparison  software  hardware  programming  engineering  nitty-gritty  realness  roots  explanans  startups  tech  sv  the-world-is-just-atoms  examples  stories  economics  heavy-industry  hard-tech  cs  IEEE  oceans  trade  korea  asia  recruiting  britain  anglo  expert-experience  growth-econ  world  developing-world  books  recommendations  intricacy  dan-luu  age-generation  system-design  correctness 
4 weeks ago by nhaliday
List of languages by total number of speakers - Wikipedia
- has both L1 (native speakers) and L2 (second-language speakers)
- I'm guessing most of Mandarin's L2 speakers are Chinese natives. Lots of dialects and such (Cantonese) within the country.
wiki  reference  data  list  top-n  ranking  population  scale  language  linguistics  anglo  china  asia  foreign-lang  objektbuch  india  MENA  europe  gallic  demographics 
march 2019 by nhaliday
ellipsis - Why is the subject omitted in sentences like "Thought you'd never ask"? - English Language & Usage Stack Exchange
This is due to a phenomenon that occurs in intimate conversational spoken English called "Conversational Deletion". It was discussed and exemplified quite thoroughly in a 1974 PhD dissertation in linguistics at the University of Michigan that I had the honor of directing.

Thrasher, Randolph H. Jr. 1974. Shouldn't Ignore These Strings: A Study of Conversational Deletion, Ph.D. Dissertation, Linguistics, University of Michigan, Ann Arbor

...

"The phenomenon can be viewed as erosion of the beginning of sentences, deleting (some, but not all) articles, dummies, auxiliaries, possessives, conditional if, and [most relevantly for this discussion -jl] subject pronouns. But it only erodes up to a point, and only in some cases.

"Whatever is exposed (in sentence initial position) can be swept away. If erosion of the first element exposes another vulnerable element, this too may be eroded. The process continues until a hard (non-vulnerable) element is encountered." [ibidem p.9]
q-n-a  stackex  anglo  language  writing  speaking  linguistics  thesis  trivia  cocktail  parsimony  compression 
march 2019 by nhaliday
Verbal Edge: Borges & Buckley | Eamonn Fitzgerald: Rainy Day
At one point, Borges said that he found English “a far finer language” than Spanish and Buckley asked “Why?”

Borges: There are many reasons. Firstly, English is both a Germanic and a Latin language, those two registers.

...

And then there is another reason. And the reason is that I think that of all languages, English is the most physical. You can, for example, say “He loomed over.” You can’t very well say that in Spanish.

Buckley: Asomo?
Borges: No; they’re not exactly the same. And then, in English, you can do almost anything with verbs and prepositions. For example, to “laugh off,” to “dream away.” Those things can’t be said in Spanish.

http://www.oenewsletter.org/OEN/print.php/essays/toswell43_1/Array
J.L.B.: "You will say that it's easier for a Dane to study English than for a Spanish-speaking person to learn English or an Englishman Spanish; but I don't think this is true, because English is a Latin language as well as a Germanic one. At least half the English vocabulary is Latin. Remember that in English there are two words for every idea: one Saxon and one Latin. You can say 'Holy Ghost' or 'Holy Spirit,' 'sacred' or 'holy.' There's always a slight difference, but one that's very important for poetry, the difference between 'dark' and 'obscure' for instance, or 'regal' and 'kingly,' or 'fraternal' and 'brotherly.' In the English language almost all words representing abstract ideas come from Latin, and those for concrete ideas from Saxon, but there aren't so many concrete ideas." (P. 71) [2]

In his own words, then, Borges was fascinated by Old English and Old Norse.
interview  history  mostly-modern  language  foreign-lang  anglo  anglosphere  culture  literature  writing  mediterranean  latin-america  germanic  roots  comparison  quotes  flexibility  org:junk  multi  medieval  nordic  lexical  parallax 
february 2019 by nhaliday
Language Log » English or Mandarin as the World Language?
- writing system frequently mentioned as barrier
- also imprecision of Chinese might hurt its use for technical writing
- most commenters predict Mandarin won't replace English (but English might give way to the absence of any lingua franca, per Nicholas Ostler)
linguistics  language  foreign-lang  china  asia  anglo  world  trends  prediction  speculation  expert-experience  analytical-holistic  writing  network-structure  science  discussion  commentary  flux-stasis  nationalism-globalism 
february 2019 by nhaliday
A cross-language perspective on speech information rate
Figure 2.

English (IR_EN = 1.08) shows a higher Information Rate than Vietnamese (IR_VI = 1). On the contrary, Japanese exhibits the lowest IR_L value of the sample. Moreover, one can observe that several languages may reach very close IR_L with different encoding strategies: Spanish is characterized by a fast rate of low-density syllables while Mandarin exhibits a 34% slower syllabic rate with syllables ‘denser’ by a factor of 49%. Finally, their Information Rates differ only by 4%.

Is spoken English more efficient than other languages?: https://linguistics.stackexchange.com/questions/2550/is-spoken-english-more-efficient-than-other-languages
As a translator, I can assure you that English is no more efficient than other languages.
--
[some comments on a different answer:]
Russian, when spoken, is somewhat less efficient than English, and that is for sure. No one who has ever worked as an interpreter can deny it. You can convey somewhat more information in English than in Russian within an hour. The English language is not constrained by the rigid case and gender systems of the Russian language, which somewhat reduce the information density of the Russian language. The rules of the Russian language force the speaker to incorporate sometimes unnecessary details in his speech, which can be problematic for interpreters – user74809 Nov 12 '18 at 12:48
But in writing, though, I do think that Russian is somewhat superior. However, when it comes to common daily speech, I do not think that anyone can claim that English is less efficient than Russian. As a matter of fact, I also find Russian to be somewhat more mentally taxing than English when interpreting. I mean, anyone who has lived in the world of Russian and then moved to the world of English is certain to notice that English is somewhat more efficient in everyday life. It is not a night-and-day difference, but it is certainly noticeable. – user74809 Nov 12 '18 at 13:01
...
By the way, I am not knocking Russian. I love Russian, it is my mother tongue and the only language, in which I sound like a native speaker. I mean, I still have a pretty thick Russian accent. I am not losing it anytime soon, if ever. But like I said, living in both worlds, the Moscow world and the Washington D.C. world, I do notice that English is objectively more efficient, even if I am myself not as efficient in it as most other people. – user74809 Nov 12 '18 at 13:40

Do most languages need more space than English?: https://english.stackexchange.com/questions/2998/do-most-languages-need-more-space-than-english
Speaking as a translator, I can share a few rules of thumb that are popular in our profession:
- Hebrew texts are usually shorter than their English equivalents by approximately 1/3. To a large extent, that can be attributed to cheating, what with no vowels and all.
- Spanish, Portuguese and French (I guess we can just settle on Romance) texts are longer than their English counterparts by about 1/5 to 1/4.
- Scandinavian languages are pretty much on par with English. Swedish is a tiny bit more compact.
- Whether or not Russian (and by extension, Ukrainian and Belorussian) is more compact than English is subject to heated debate, and if you ask five people, you'll be presented with six different opinions. However, everybody seems to agree that the difference is just a couple percent, be it this way or the other.

--

A point of reference from the website I maintain. The files where we store the translations have the following sizes:

English: 200k
Portuguese: 208k
Spanish: 209k
German: 219k
And the translations are out of date. That is, there are strings in the English file that aren't yet in the other files.

For Chinese, the situation is a bit different because the character encoding comes into play. Chinese text will have shorter strings, because most words are one or two characters, but each character takes 3–4 bytes (for UTF-8 encoding), so each word is 3–12 bytes long on average. So visually the text takes less space but in terms of the information exchanged it uses more space. This Language Log post suggests that if you account for the encoding and remove redundancy in the data using compression you find that English is slightly more efficient than Chinese.
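The byte arithmetic in the paragraph above can be checked with a short Python sketch. The helper names `utf8_size` and `compressed_size` and the sample strings are illustrative assumptions, and `zlib` is used only as a convenient stand-in compressor, not necessarily the method used in the Language Log post:

```python
import zlib

def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def compressed_size(text: str) -> int:
    """Number of bytes after zlib compression of the UTF-8 encoding."""
    return len(zlib.compress(text.encode("utf-8"), 9))

english = "hello"
chinese = "你好"  # two common characters, 3 bytes each in UTF-8

print(len(english), utf8_size(english))  # 5 characters, 5 bytes
print(len(chinese), utf8_size(chinese))  # 2 characters, 6 bytes

# Characters outside the Basic Multilingual Plane take 4 bytes in UTF-8.
print(utf8_size("\U00020000"))           # 4

# Compression removes redundancy, so repetitive text shrinks a lot.
sample = "the quick brown fox " * 200
print(utf8_size(sample), compressed_size(sample))
```

Character count, raw byte count, and compressed byte count can rank two languages differently, so any comparison of this kind needs parallel texts of realistic length rather than single words.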

Is English more efficient than Chinese after all?: https://languagelog.ldc.upenn.edu/nll/?p=93
[Executive summary: Who knows?]

This follows up on a series of earlier posts about the comparative efficiency — in terms of text size — of different languages ("One world, how many bytes?", 8/5/2005; "Comparing communication efficiency across languages", 4/4/2008; "Mailbag: comparative communication efficiency", 4/5/2008). Hinrich Schütze wrote:
pdf  study  language  foreign-lang  linguistics  pro-rata  bits  communication  efficiency  density  anglo  japan  asia  china  mediterranean  data  multi  comparison  writing  meta:reading  measure  compression  empirical  evidence-based  experiment  analysis  chart  trivia  cocktail 
february 2019 by nhaliday
