Why is Google Translate so bad for Latin? A longish answer. : latin
> All it does is correlate sequences of up to five consecutive words in texts that have been manually translated into two or more languages.
That sort of system ought to be perfect for a dead language, though. Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.

We're not exactly inundated with brand new Latin to translate.
> Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.
What makes you think that the Google folks haven't done so and used that to create the language models they use?
> That sort of system ought to be perfect for a dead language, though.
Perhaps. But it will be bad at translating novel English sentences to Latin.
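
[note: a minimal sketch of the "sequences of up to five consecutive words" idea quoted above. It only enumerates and counts word n-grams in one sentence; it is not Google's pipeline, and the function names `ngrams` and `count_ngrams` are hypothetical, chosen for illustration. A real phrase-based system would additionally align these n-grams across manually translated text pairs.]

```python
from collections import Counter


def ngrams(words, max_n=5):
    """Yield every run of 1..max_n consecutive words as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def count_ngrams(sentence, max_n=5):
    """Count all n-grams (up to max_n words) in a whitespace-split sentence."""
    return Counter(ngrams(sentence.split(), max_n))


# Opening words of the Aeneid, used purely as sample input.
counts = count_ngrams("arma virumque cano Troiae qui primus ab oris")
print(len(counts), "distinct n-grams")
```

A novel sentence is covered only as far as its n-grams already appear in the stored counts, which is one way to read the objection that such a system will struggle with new English input.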
18 days ago by nhaliday
Hardware is unforgiving
Today, anyone with a CS 101 background can take Geoffrey Hinton's course on neural networks and deep learning, and start applying state of the art machine learning techniques in production within a couple months. In software land, you can fix minor bugs in real time. If it takes a whole day to run your regression test suite, you consider yourself lucky because it means you're in one of the few environments that takes testing seriously. If the architecture is fundamentally flawed, you pull out your copy of Feathers' “Working Effectively with Legacy Code” and you apply minor fixes until you're done.

This isn't to say that software isn't hard, it's just a different kind of hard: the sort of hard that can be attacked with genius and perseverance, even without experience. But, if you want to build a ship, and you "only" have a decade of experience with carpentry, milling, metalworking, etc., well, good luck. You're going to need it. With a large ship, “minor” fixes can take days or weeks, and a fundamental flaw means that your ship sinks and you've lost half a year of work and tens of millions of dollars. By the time you get to something with the complexity of a modern high-performance microprocessor, a minor bug discovered in production costs three months and five million dollars. A fundamental flaw in the architecture will cost you five years and hundreds of millions of dollars.

Physical mistakes are costly. There's no undo and editing isn't simply a matter of pressing some keys; changes consume real, physical resources. You need enough wisdom and experience to avoid common mistakes entirely – especially the ones that can't be fixed.
4 weeks ago by nhaliday
List of languages by total number of speakers - Wikipedia
- has both L1 (native speakers) and L2 (second-language speakers)
- I'm guessing most of Mandarin's L2 speakers are Chinese natives. Lots of dialects and such (Cantonese) within the country.
march 2019 by nhaliday
ellipsis - Why is the subject omitted in sentences like "Thought you'd never ask"? - English Language & Usage Stack Exchange
This is due to a phenomenon that occurs in intimate conversational spoken English called "Conversational Deletion". It was discussed and exemplified quite thoroughly in a 1974 PhD dissertation in linguistics at the University of Michigan that I had the honor of directing.

Thrasher, Randolph H. Jr. 1974. Shouldn't Ignore These Strings: A Study of Conversational Deletion, Ph.D. Dissertation, Linguistics, University of Michigan, Ann Arbor


"The phenomenon can be viewed as erosion of the beginning of sentences, deleting (some, but not all) articles, dummies, auxiliaries, possessives, conditional if, and [most relevantly for this discussion -jl] subject pronouns. But it only erodes up to a point, and only in some cases.

"Whatever is exposed (in sentence initial position) can be swept away. If erosion of the first element exposes another vulnerable element, this too may be eroded. The process continues until a hard (non-vulnerable) element is encountered." [ibidem p.9]
march 2019 by nhaliday
Verbal Edge: Borges & Buckley | Eamonn Fitzgerald: Rainy Day
At one point, Borges said that he found English “a far finer language” than Spanish and Buckley asked “Why?”

Borges: There are many reasons. Firstly, English is both a Germanic and a Latin language, those two registers.


And then there is another reason. And the reason is that I think that of all languages, English is the most physical. You can, for example, say “He loomed over.” You can’t very well say that in Spanish.

Buckley: Asomo?
Borges: No; they’re not exactly the same. And then, in English, you can do almost anything with verbs and prepositions. For example, to “laugh off,” to “dream away.” Those things can’t be said in Spanish.
J.L.B.: "You will say that it's easier for a Dane to study English than for a Spanish-speaking person to learn English or an Englishman Spanish; but I don't think this is true, because English is a Latin language as well as a Germanic one. At least half the English vocabulary is Latin. Remember that in English there are two words for every idea: one Saxon and one Latin. You can say 'Holy Ghost' or 'Holy Spirit,' 'sacred' or 'holy.' There's always a slight difference, but one that's very important for poetry, the difference between 'dark' and 'obscure' for instance, or 'regal' and 'kingly,' or 'fraternal' and 'brotherly.' In the English language almost all words representing abstract ideas come from Latin, and those for concrete ideas from Saxon, but there aren't so many concrete ideas." (P. 71) [2]

In his own words, then, Borges was fascinated by Old English and Old Norse.
february 2019 by nhaliday
Language Log » English or Mandarin as the World Language?
- writing system frequently mentioned as barrier
- also imprecision of Chinese might hurt its use for technical writing
- most predicting it won't (but English might be replaced by absence of lingua franca per Nicholas Ostler)
february 2019 by nhaliday
A cross-language perspective on speech information rate
Figure 2.

English (IR_EN = 1.08) shows a higher Information Rate than Vietnamese (IR_VI = 1). On the contrary, Japanese exhibits the lowest IR_L value of the sample. Moreover, one can observe that several languages may reach very close IR_L with different encoding strategies: Spanish is characterized by a fast rate of low-density syllables while Mandarin exhibits a 34% slower syllabic rate with syllables ‘denser’ by a factor of 49%. Finally, their Information Rates differ only by 4%.
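
[note: a back-of-the-envelope check of the Spanish vs. Mandarin trade-off in the excerpt. It assumes information rate is roughly syllable rate times information per syllable, and reads "denser by a factor of 49%" as 1.49x; both are my assumptions, not the paper's stated method. Variable names are hypothetical.]

```python
# Mandarin relative to Spanish, using only the two percentages in the excerpt.
relative_syllable_rate = 1.0 - 0.34   # 34% slower syllabic rate
relative_density = 1.0 + 0.49         # syllables 49% "denser" (assumed reading)

# Under the rate-times-density assumption, the two effects multiply.
relative_information_rate = relative_syllable_rate * relative_density

print(f"Mandarin/Spanish information rate ~ {relative_information_rate:.3f}")
```

The product comes out just under 1, so the slower rate and the higher density nearly cancel. That matches the excerpt's point that the two information rates end up close; the exact 4% figure is the paper's own, and this rough product is not expected to reproduce it.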

Is spoken English more efficient than other languages?:
As a translator, I can assure you that English is no more efficient than other languages.
[some comments on a different answer:]
Russian, when spoken, is somewhat less efficient than English, and that is for sure. No one who has ever worked as an interpreter can deny it. You can convey somewhat more information in English than in Russian within an hour. The English language is not constrained by the rigid case and gender systems of the Russian language, which somewhat reduce the information density of the Russian language. The rules of the Russian language force the speaker to incorporate sometimes unnecessary details in his speech, which can be problematic for interpreters – user74809 Nov 12 '18 at 12:48
But in writing, though, I do think that Russian is somewhat superior. However, when it comes to common daily speech, I do not think that anyone can claim that English is less efficient than Russian. As a matter of fact, I also find Russian to be somewhat more mentally taxing than English when interpreting. I mean, anyone who has lived in the world of Russian and then moved to the world of English is certain to notice that English is somewhat more efficient in everyday life. It is not a night-and-day difference, but it is certainly noticeable. – user74809 Nov 12 '18 at 13:01
By the way, I am not knocking Russian. I love Russian, it is my mother tongue and the only language, in which I sound like a native speaker. I mean, I still have a pretty thick Russian accent. I am not losing it anytime soon, if ever. But like I said, living in both worlds, the Moscow world and the Washington D.C. world, I do notice that English is objectively more efficient, even if I am myself not as efficient in it as most other people. – user74809 Nov 12 '18 at 13:40

Do most languages need more space than English?:
Speaking as a translator, I can share a few rules of thumb that are popular in our profession:
- Hebrew texts are usually shorter than their English equivalents by approximately 1/3. To a large extent, that can be attributed to cheating, what with no vowels and all.
- Spanish, Portuguese and French (I guess we can just settle on Romance) texts are longer than their English counterparts by about 1/5 to 1/4.
- Scandinavian languages are pretty much on par with English. Swedish is a tiny bit more compact.
- Whether or not Russian (and by extension, Ukrainian and Belorussian) is more compact than English is subject to heated debate, and if you ask five people, you'll be presented with six different opinions. However, everybody seems to agree that the difference is just a couple percent, be it this way or the other.


A point of reference from the website I maintain. The files where we store the translations have the following sizes:

English: 200k
Portuguese: 208k
Spanish: 209k
German: 219k
And the translations are out of date. That is, there are strings in the English file that aren't yet in the other files.

For Chinese, the situation is a bit different because the character encoding comes into play. Chinese text will have shorter strings, because most words are one or two characters, but each character takes 3–4 bytes (for UTF-8 encoding), so each word is 3–12 bytes long on average. So visually the text takes less space but in terms of the information exchanged it uses more space. This Language Log post suggests that if you account for the encoding and remove redundancy in the data using compression you find that English is slightly more efficient than Chinese.
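
[note: a small sketch of the characters-versus-bytes point above, using Python's standard `str.encode` and `zlib.compress`. The sample sentences and the helper name `sizes` are my own, picked for illustration. The zlib figure means little for strings this short, since the compressed form carries fixed overhead; a comparison like the Language Log one needs large parallel texts.]

```python
import zlib


def sizes(text):
    """Return character count, UTF-8 byte count, and zlib-compressed byte count."""
    raw = text.encode("utf-8")
    return {
        "chars": len(text),
        "utf8_bytes": len(raw),
        "zlib_bytes": len(zlib.compress(raw, 9)),
    }


english = "Knowledge is power."
chinese = "知识就是力量。"  # assumed to be a rough equivalent of the English sample

for label, text in (("English", english), ("Chinese", chinese)):
    print(label, sizes(text))
```

Here the Chinese string has fewer characters but more UTF-8 bytes, since each of these common characters encodes to 3 bytes. To compare languages after removing redundancy, pass whole parallel corpora to `sizes` and look at `zlib_bytes`.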

Is English more efficient than Chinese after all?:
[Executive summary: Who knows?]

This follows up on a series of earlier posts about the comparative efficiency — in terms of text size — of different languages ("One world, how many bytes?", 8/5/2005; "Comparing communication efficiency across languages", 4/4/2008; "Mailbag: comparative communication efficiency", 4/5/2008). Hinrich Schütze wrote:
february 2019 by nhaliday

