Thursday, August 8, 2013

The Awful Chemical Language

I was often asked to explain chemical nomenclature in the context of such and such intellectual property law matter and one day I surprised a trial lawyer — an elderly gent — with my knowledge. He was actually annoyed at first, perhaps because he felt hostage to knowledge which he did not possess. Actually, he probably just resented that I could bill time for knowledge which I already possessed — just like he could. Had he known what it had cost me to acquire my art, he would also have known that it would break any law firm to buy it. Meanwhile, I had been hard at work learning legal terminology for several weeks, and although I had made good progress, it had been accomplished under great difficulty and annoyance. But he was greatly impressed and after I had explained a while, he said my explanation of the chemical language was very rare, possibly "unique" and he wanted to add me to his litigation team.

Friedrich Wöhler, the father of modern organic chemistry, already remarked in 1835:
Organic chemistry just now is enough to drive one mad. It gives me the impression of a primeval forest full of the most remarkable things, a monstrous and boundless thicket, with no way of escape, into which one may well dread to enter.
A person who has not studied chemistry — especially organic chemistry — can form no idea of what a perplexing language describes that thicket. Or perhaps they can, but can come up with no logical explanation for why things are so. I aim here to simplify.

The Germans invented modern organic chemistry and they logically fashioned the nomenclature in their own image — and just as the German language is troublesome for the beginner — having so many parts of speech — it's no wonder organic nomenclature is so troublesome.

An average organic chemical name is a sublime and impressive curiosity; it may occupy several lines and comprise several unfamiliar names and numbers — things called moieties — and even Greek letters; it is built mainly of compound words synthesized by the writer around a core or parent name; it's quite often a word not to be found in any normal dictionary — several words compacted into one, but with joints and seams — that is, with hyphens; it may treat of up to umpteen different subunits, each enclosed in a parenthesis of its own, with here and there extra parentheses which reinclose three or four of the minor parentheses, making pens within pens: finally, all the parentheses and re-parentheses are massed together between a couple of king-parentheses, one of which is placed in the first line of the majestic structure and the other in the middle of the last line of it — after which comes the parent compound name, and you find out for the first time what the molecule is or at least what some chemical lexicographer thought it should be derived from. Sometimes, often as an afterthought — merely by way of differentiation — the writer shovels in the name of a salt, or in patent parlance "or salts thereof," signifying that the delicate molecular flower has been preserved as a salt, and the monument is finished. I suppose that this closing hurrah is often the doing of patent attorneys seeking to claim more broadly; it's not necessary, but covers the doctrine of equivalents.

To repeat, organic chemistry nomenclature was invented by 19th-century Germans who wanted to create a simple system which closely mimicked the logic of their own language. Full stop. Therein lies the secret of why that nomenclature is so seemingly abstruse: it is patterned after German syntax. In linguistics, syntax refers to the way in which morphemes are arranged. By analogy, "chemical morphemes" are irreducible units of metaphor: core words like "meth-," "eth-," "prop-," and "but-" and ringed ones like "phen-" or "benz-" represent chemical entities. [1] The studied reader may already recognize these morphemes in methane, ethane, propane, butane, phenyl, benzene, and the like. The endings "ane," "yl," and "ene" are, in a linguistic sense, inflections of the morphemes. Another word related to morphemes commonly used by chemists is moiety. A moiety is a small cluster of atoms with a recognizable function, for example, "acyl," "carboxyl," "alkyl," etc.
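For readers who think better in code, the morpheme-plus-inflection idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the stem table covers just the four chain stems named above, the suffix descriptions are simplified glosses, and the function name `parse_simple_name` is made up for this illustration. It is not a real nomenclature parser.

```python
# Chain-length "morphemes": stem -> number of carbon atoms.
STEMS = {"meth": 1, "eth": 2, "prop": 3, "but": 4}

# "Inflections": suffix -> simplified gloss of what it tells the reader.
SUFFIXES = {
    "ane": "saturated hydrocarbon (alkane)",
    "ene": "hydrocarbon with a C=C double bond (alkene)",
    "yl": "substituent group (one bond left open)",
}

def parse_simple_name(name):
    """Split a one-word name such as 'propane' into (carbon count, gloss)."""
    for stem, carbons in STEMS.items():
        if name.startswith(stem):
            suffix = name[len(stem):]
            if suffix in SUFFIXES:
                return carbons, SUFFIXES[suffix]
    raise ValueError(f"cannot parse {name!r}")

for word in ("methane", "ethane", "propane", "butyl"):
    print(word, parse_simple_name(word))
```

Anything beyond these one-stem, one-suffix words (rings, locants, parentheses) falls outside the sketch and raises `ValueError`.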

By way of example, consider the common pain reliever ibuprofen which more properly goes by the name
R,S-2-(4-(2-methylpropyl)phenyl)propanoic acid.

Chemical names are easier to read when you hold them next to the actual structure, which is like a pictograph (hold that thought for later), or read them backwards in a mirror, or stand on your head so as to reverse the construction. But because many refuse to learn the real language of chemistry, structural shorthand, I'll muddle through the name of ibuprofen by way of example. It's not a particularly elaborate molecule or name, but it strikes a nice balance between complexity and simplicity.

R,S-2-(4-(2-methylpropyl)phenyl)propanoic acid

First comes "R,S." The "R" stands for rectus (Latin for right) and the "S" stands for sinister (Latin for left). This gives the enlightened reader notice that chirality is at hand — more on this later.

R,S-2-(4-(2-methylpropyl)phenyl)propanoic acid

The second element in the name is the number 2, and because this number stands alone — outside of parentheses — the reader is asked to hold its meaning in abeyance until such a time as the parent morpheme is finally reached after much exhaustion of patience. Putting the "2" in front resembles the dreaded separable prefix verbs so common to German. Mark Twain wrote in his delightful essay, The Awful German Language:
The Germans have another kind of parenthesis, which they make by splitting a verb in two and putting half of it at the beginning of an exciting chapter and the other half at the end of it. Can any one conceive of anything more confusing than that? These things are called "separable verbs." The German grammar is blistered all over with separable verbs; and the wider the two portions of one of them are spread apart, the better the author of the crime is pleased with his performance.
R,S-2-(4-(2-methylpropyl)phenyl)propanoic acid

The third element, (4-(2-methylpropyl)phenyl) is a microcosm of the whole name writ larger; in it we have a 2-methylpropyl corralled by parentheses, which is itself corralled by "4-" and "phenyl." The name is starting to look like a matryoshka doll.
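The doll-within-a-doll structure can be made visible by letting the parentheses drive a small stack. The following is a minimal Python sketch, assuming only that parentheses are the sole grouping marks in the name; the function name `nest` is hypothetical, and the sketch does no chemistry at all, it merely groups text.

```python
def nest(name):
    """Turn a parenthesized name into nested lists, one list per pair of parentheses."""
    root = []
    stack = [root]          # stack[-1] is the list currently being filled
    token = ""
    for ch in name:
        if ch in "()":
            if token:       # flush any text collected so far
                stack[-1].append(token)
                token = ""
            if ch == "(":
                child = []
                stack[-1].append(child)
                stack.append(child)
            else:
                if len(stack) == 1:
                    raise ValueError("unbalanced ')'")
                stack.pop()
        else:
            token += ch
    if token:
        stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root

print(nest("2-(4-(2-methylpropyl)phenyl)propanoic acid"))
# ['2-', ['4-', ['2-methylpropyl'], 'phenyl'], 'propanoic acid']
```

Each inner list is one doll: the innermost holds 2-methylpropyl, the next wraps it between "4-" and "phenyl," and the outermost hangs the whole affair on position 2 of the parent.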

R,S-2-(4-(2-methylpropyl)phenyl)propanoic acid

At long last we arrive at the parent morpheme, which like the verb in a German sentence, tells us the key information: propanoic acid. In the lexicographer's mind, ibuprofen is a derivative of propanoic acid.

We have the Germanic parenthesis disease in our language, too, often expressed with em dashes and sometimes ellipses, and one may see cases of it every day in our books and blog posts. But with us it is, unless botched, the mark and sign of a practiced writer or a clear intellect, whereas with the Germans and chemical lexicographers it is doubtless the mark and sign of a practiced pen and of the presence of that sort of luminous intellectual fog which stands for clearness among these people. For surely it is not clearness; it necessarily can't be clearness.

Now, dear reader, allow me to introduce a better way to depict all the foregoing and to illustrate the foolishness:
Ibuprofen
R,S-2-(4-(2-methylpropyl)phenyl)propanoic acid

I have color-coded the three main parts of the molecule, both in name and in the depiction. The reader immediately grasps that the red propanoic acid portion has a three-carbon chain. The red number "2" in the name describes wherefrom the rest depends. The "R,S" refers to the two possible ways that the invisible hydrogen atom attached to carbon 2 may point: either out of or into the screen or page. The portion circled in light blue is a phenyl moiety having six carbons numbered as shown. The curious reader can attest that the portion in green indeed appends from carbon 4 of the blue phenyl. The left-most portion, circled in green, is the "2-methylpropyl" portion: it's really a 3-carbon propyl chain having a methyl affixed to carbon 2.
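The three color-coded pieces can also be tallied atom by atom to arrive at the molecular formula. Here is a minimal Python sketch. The atom counts for each piece are my own hand tally from the structure (an assumption, not something parsed from the name), and the function name `molecular_formula` is hypothetical.

```python
from collections import Counter

# Hand-counted atoms per piece (assumed from the structure, not parsed from the name).
FRAGMENTS = {
    # propanoic acid is C3H6O2; one H on carbon 2 gives way to the phenyl
    "propanoic acid, substituted at carbon 2": Counter(C=3, H=5, O=2),
    # benzene is C6H6; two H's give way, at ring carbons 1 and 4
    "phenyl ring, bonded at carbons 1 and 4": Counter(C=6, H=4),
    # propyl chain (3 C) plus a methyl on carbon 2, one open bond
    "2-methylpropyl": Counter(C=4, H=9),
}

def molecular_formula(fragments):
    """Sum the atom counts and write them with C first, then H, then the rest alphabetically."""
    total = Counter()
    for atoms in fragments.values():
        total += atoms
    order = [e for e in ("C", "H") if e in total]
    order += sorted(e for e in total if e not in ("C", "H"))
    return "".join(f"{e}{total[e] if total[e] > 1 else ''}" for e in order)

print(molecular_formula(FRAGMENTS))  # C13H18O2
```

The total, C13H18O2, is the accepted molecular formula of ibuprofen, which is a reassuring check that the three pieces account for every atom.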

Lastly, it is perhaps now apparent (to me at least) where the trivial name ibuprofen comes from. I parse ibuprofen into three separable pieces: ibu/pro/fen

"ibu" is short for "isobutyl" (another name for 2-methylpropyl);
"pro" is short for "propanoic acid";
"fen" stands for "phenyl."

Have you got a headache yet?
________________________
Suggested further reading:

[1] An Algorithm For Translating Chemical Names To Molecular Formulas
[2] Development Of Systematic Names For The Simple Alkanes

4 comments:

  1. Replies
    1. Take two doses of 2-acetoxybenzoic acid and call me in the morning.

  2. I thought of Transformers (the children's cartoon, later made into movies) while reading it.

    But I chickened out making that comment because of the obvious comic repercussions.

