Monday, 29 May 2017

How many wine-quality scales are there?

There are a number of ratings systems for describing wine quality, which use 100 points, 20 points, 5 stars, 3 glasses, etc. Unfortunately, there is usually no "gold standard" for these systems, and so no two wine commentators use these systems in quite the same way.

That is, when critics differ in their scores for a particular wine, it can be for one of two reasons: (i) their opinions on the wine's quality differ, or (ii) they are expressing their opinions using different numbers. So, when critics produce the same score, they may or may not be assessing the wine as having the same quality; and the same is true when they produce different scores. Each critic has their own personal version of the "100-point scale" or the "20-point scale".


This situation is similar to people speaking different languages. Simply looking at a word does not necessarily tell you what language is being used, because the same combination of letters can occur in different languages, with or without the same meaning. For example, the word "December" appears in both Swedish and English, and in this case it has the same meaning in both languages. However, the word "sex" also appears in both languages, but in Swedish it usually refers to the number 6, which is not necessarily related to any of the word's possible meanings in English.

So, if the Wine Spectator gives a wine 90 points, does that mean the same thing as when the Wine Advocate gives that same wine 90 points? Probably not. Just for variety, instead of using the 100-point scale to illustrate this topic, I will use the 20-point scale for wine quality — this emphasizes the need to translate the ratings systems to a common one.

20-point ratings systems

Many American wine drinkers are familiar with the 20-point scale developed in the 1950s by Maynard Amerine and his colleagues at the University of California, Davis, intended as a teaching tool for identifying faulty wines. This was, indeed, an attempt to produce a "gold standard" wine rating system. Each organoleptic characteristic of the wine is assigned a number of points based on its perceived quality, and these points are summed to produce the final score. In both theory and practice, everyone who uses the UCDavis scale should be "speaking the same language"; and therefore any differences in wine scores should represent differences in wine quality, not differences in language.

Sadly, not everyone has agreed with or used the UCDavis scale, especially as a general tool for wine tastings; this topic is discussed in detail in recommended books such as those by Clive S. Michelsen (Tasting and Grading Wine. 2005) and Andrew Sharp (Winetaster's Secrets. 2005). So, there are innumerable 20-point scales in use around the world, and they all seem to represent different languages. To illustrate the range of scales in use, we can compare the scores given to the same wines by different critics.

In order to standardize the scales for direct comparison, we need to translate the different languages into a common language. Jean-Marie Cardebat and Emmanuel Paroissien (American Association of Wine Economists Working Paper No. 180. 2015) have suggested doing this by converting the different scales to a single 100-point scale. The one they chose was the scale used by the Wine Advocate (which is not necessarily the same as that used by the Wine Spectator, or the Wine Enthusiast, etc.), and I will do the same here. Furthermore, I will compare the quality scales based on their scores for the five First Growth red wines of the Left Bank of Bordeaux (as described in the post How large is between-critic variation in quality scores?).

The scales for five different commentators are shown in the first graph. The original scores are shown on the horizontal axis, while the standardized score is shown vertically. The vertical axis represents the score that the Wine Advocate would give a wine of the same quality. If the critics were all speaking the same language to express their opinions about wine quality, then the lines would be sitting on top of each other; and the further apart they are, the more different are the languages.

Five different 20-point wine-quality ratings systems

Also shown is the difference in meaning for a wine that gets a score of 18 from each of the critics. If we see a wine score of 18, then La Revue du Vin de France, Jean-Marc Quarin and Bettane et Desseauve mean a somewhat better wine than does Jancis Robinson. On the other hand, Vinum Weinmagazin is indicating a somewhat worse wine. They are, indeed, all speaking different languages; and we readers need to translate between these languages in order to get their meaning.
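For readers who like to see the arithmetic, here is a minimal sketch of what a straight-line "translation" looks like. The critic names and the slope and intercept values are purely hypothetical placeholders, not the fitted values behind the graph; the only assumption carried over from the text is that each critic's 20-point score maps linearly onto the common 100-point scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearScale:
    """Straight-line mapping from a critic's 20-point score to the common 100-point scale."""
    slope: float
    intercept: float

    def to_common(self, score: float) -> float:
        return self.intercept + self.slope * score


# Hypothetical coefficients, for illustration only (not taken from the graph).
SCALES = {
    "critic_a": LinearScale(slope=2.5, intercept=50.0),
    "critic_b": LinearScale(slope=3.0, intercept=40.0),
}


def translate(critic: str, score: float) -> float:
    """Return the common-scale equivalent of one critic's 20-point score."""
    return SCALES[critic].to_common(score)


if __name__ == "__main__":
    # The same raw score of 18 lands on different points of the common scale.
    for name in SCALES:
        print(name, translate(name, 18))
```

With these made-up numbers, a score of 18 becomes 95 for one critic and 94 for the other, which is the kind of gap the graph displays.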

As another example, at the end of June 2012 Decanter magazine changed from using a 20-point ratings scale to a 100-point scale (see New Decanter panel tasting system). In order to do this, they had to convert their old scores to the new scores. They used a conversion that is precisely halfway between the scoring systems of Jancis Robinson and Bettane & Desseauve, as shown in the next graph (see How to convert Decanter wine scores and ratings to and from the 100 point scale). So, this is yet another different 20-point language.
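A conversion that sits "halfway between" two straight-line scales is itself a straight line, with the average slope and the average intercept. The sketch below shows this under stated assumptions: the two (slope, intercept) pairs are hypothetical stand-ins, not the actual conversions for Jancis Robinson, Bettane & Desseauve or Decanter.

```python
def midpoint_scale(scale_a, scale_b):
    """Average two (slope, intercept) pairs to get the line halfway between them."""
    slope = (scale_a[0] + scale_b[0]) / 2
    intercept = (scale_a[1] + scale_b[1]) / 2
    return slope, intercept


def to_common(scale, score):
    """Apply a (slope, intercept) conversion to a 20-point score."""
    slope, intercept = scale
    return intercept + slope * score


# Hypothetical conversions, for illustration only.
SCALE_A = (2.5, 50.0)
SCALE_B = (3.0, 40.0)
HALFWAY = midpoint_scale(SCALE_A, SCALE_B)

if __name__ == "__main__":
    # The halfway line gives the mean of the two translated scores at every point.
    print(to_common(HALFWAY, 18))
```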

Seven different 20-point wine-quality ratings systems

So far, I have assumed that there is a linear relationship between the scores from the different critics (i.e. the graph lines are straight). However, in an earlier post (Two centuries of Bordeaux vintages) I suggested that the relationship between the Bordeaux scores from Tastet & Lawton and from Jeff Leve (the Wine Cellar Insider) is curved instead. Indeed, The World of Fine Wine magazine explicitly indicates that their 20-point scoring system is non-linear, as shown in the second graph above. This makes for a very complex language translation, indeed.
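One simple way to handle a curved relationship is to record a handful of known equivalences and interpolate between them. The following is only a sketch of that idea: the anchor points are invented for illustration and are not the published World of Fine Wine conversion, and piecewise-linear interpolation is my own choice of technique rather than anything the magazine specifies.

```python
from bisect import bisect_right

# Hypothetical (20-point score, common 100-point score) pairs, in ascending order.
ANCHORS = [(10.0, 70.0), (14.0, 80.0), (17.0, 90.0), (20.0, 100.0)]


def to_common(score, anchors=ANCHORS):
    """Piecewise-linear interpolation between the anchor points."""
    xs = [x for x, _ in anchors]
    if not xs[0] <= score <= xs[-1]:
        raise ValueError("score lies outside the range covered by the anchors")
    # Index of the right-hand anchor of the segment that contains the score.
    i = min(bisect_right(xs, score), len(xs) - 1)
    x0, y0 = anchors[i - 1]
    x1, y1 = anchors[i]
    return y0 + (y1 - y0) * (score - x0) / (x1 - x0)


if __name__ == "__main__":
    for s in (12, 14, 18, 20):
        print(s, round(to_common(s), 1))
```

Because the segments have different slopes, one extra point near the top of the scale is worth more on the common scale than one extra point near the bottom.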

As we shall see in the next post, translating between 20- and 100-point scales is not straightforward, either.

Conclusions

The short answer to the question posed in the title is: pretty much one for each commentator. Fortunately, there are not quite as many wine-quality rating systems as there are languages. Nevertheless, the idea of translating among them is just as necessary in both cases, if we are to get any meaning.

Does all of this matter in practice? Quite definitely. Indeed, every time a wine retailer plies us with a combination of critics' scores, we have to translate those scores into a common language, in order to work out whether the critics are agreeing with each other or not. Since most of us are not doing this, we may well be fooling ourselves into seeing a false sense of agreement among those critics. The world of fine wine is more complex than most people realize, or would like.

Furthermore, this issue is at the heart of the objections that mathematicians have to simply averaging wine scores across different critics. If the critics are all using different ratings scales, then the average score has no mathematical meaning. That is, if the critics are speaking different languages, then what would the "average" of those languages mean? It would be gibberish, unintelligible to anyone, even if the combination of letters looks like it might be a real word. A classic example of this is the Judgment of Paris, from 1976, in which the "official" summed scores are meaningless, because the tasters were all using different versions of the 20-point scale (see A Mathematical Analysis of The Judgment of Paris). Note also that the scores using the UCDavis scale are much higher than the scores for the Judgment (see Was the Judgment of Paris repeatable?).
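The averaging problem can be shown with a few lines of arithmetic. In this sketch the critics and their conversion coefficients are hypothetical, and a straight-line conversion to a common 100-point scale is assumed; the point is only that the raw average and the average of the translated scores are different quantities.

```python
from statistics import mean

# Hypothetical (slope, intercept) conversions to a common 100-point scale.
SCALES = {"critic_a": (2.5, 50.0), "critic_b": (3.0, 40.0)}


def to_common(critic, score):
    """Translate one critic's 20-point score to the common scale."""
    slope, intercept = SCALES[critic]
    return intercept + slope * score


def compare_averages(scores):
    """Return (average of raw scores, average of translated scores)."""
    raw_average = mean(scores.values())
    common_average = mean(to_common(c, s) for c, s in scores.items())
    return raw_average, common_average


if __name__ == "__main__":
    # Raw scores of 18 and 16 average to 17, but 17 is not a score on either critic's scale.
    print(compare_averages({"critic_a": 18, "critic_b": 16}))
```

The raw average of 17 has no home: read on the first critic's scale it would mean 92.5, and on the second critic's scale 91, while the average of the translated scores is 91.5.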