Paper Rank: A Controversy on Academia

From the beginning Academia.edu has offered various metrics about its users and the material they post. The one that comes closest to measuring an author's ability, or badge of honour, or ego-stroking, is the 'top 2%' markers for how many times an author or paper has been viewed. These are really just a bit of fun, a feedback mechanism that keeps people engaged with the site. How do you 'game' these ratings? Simple: engage with the site on a routine basis, post your work regularly, and so on.

Recently, however, the site has been trialling a system of metrics based upon recommendations. At the moment you need to apply to get this power (I have not), but people who have it can recommend any paper (mine have been). The frontman on the team for this is Zachary Foster (https://team.academia.edu/ZacharyFoster), who recently opened a position paper on the subject up for comment (https://www.academia.edu/s/657715fb2b/academiaedu-launches-recommendations). This suggested several things: that Academia.edu believe there are serious systemic flaws in the assessment of research worldwide, that they think they are in a position to fix that, and that they intend to do it by introducing a metric. The vast bulk of the reaction was very negative. Academia users, mostly in the humanities, felt any attempt to define quality by a simple number was a bad thing, and were angered by many of the defences, which seemed to be predicated on assumptions about the sciences, particularly the medical sciences.

So I thought I would spend an afternoon looking into this problem in more detail. This note is not research: there is a big literature touching on this that I am entirely unfamiliar with. It probably does not even qualify as journalism and might be better thought of as editorializing. You have been warned.[1]

How does it work?
(http://support.academia.edu/customer/en/portal/articles/2201342-what-are-authorrank-and-paperrank-)

The system is admirably simple. Everyone has an author rank, which defaults to 1 if you have received no recommendations for your papers. When you recommend someone else's paper, that paper receives a paper rank. A paper's rank is the square root of the sum of the author ranks of all its recommenders: if four people with an author rank of 1 recommend a paper, the sum of those ranks is 4 and the paper receives a rank of 2. Once an author has received recommendations, their author rank is calculated from the ranks of their papers, as the square root of the sum of the paper ranks. So if you have three papers with ranks of 3, a total of 9, your author rank would also be 3.
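Written out as code the arithmetic is trivial. The sketch below is just those two formulas (it is not Academia.edu's implementation), and it ignores the circular case where people recommend each other's papers, which comes up later.

    from math import sqrt

    # The two formulas as described above (a sketch, not Academia.edu's actual code).

    def paper_rank(recommender_author_ranks):
        """A paper's rank: square root of the sum of its recommenders' author ranks."""
        return sqrt(sum(recommender_author_ranks))

    def author_rank(own_paper_ranks):
        """An author's rank: square root of the sum of their paper ranks,
        defaulting to 1 if none of their papers has been recommended."""
        return sqrt(sum(own_paper_ranks)) if own_paper_ranks else 1.0

    print(paper_rank([1, 1, 1, 1]))   # four rank-1 recommenders -> 2.0
    print(author_rank([3, 3, 3]))     # three papers ranked 3 -> 3.0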

[1] I have included screen shots with snippets of people's responses in the comment section of the position paper. I have not asked permission (if you took part in a public discussion you are fair game), but I have refrained from dissecting single comments, which often represent off-the-cuff remarks. I have also drawn public data from people's Academia pages. These are illustrative: do not make judgements unless you know a lot more context than I do.

What is the objective?

Well, the stated objectives involve addressing things like the reproducibility crisis (the claim that most submissions to journals specializing in the experimental sciences actually contain false conclusions) or artificial scarcity (the claim that prestigious journals turn down sound work to keep their acceptance rates, another sort of metric, low). Now these are things people are concerned about, but rating papers does not really address either of them, and their importance and validity are debatable issues which in many ways are not applicable to the humanities. The other, slightly more implicit, objective is to assist readers in finding good work in each field. It is not clear how the metric would be used in practice for this, but the principle is obvious. The fairly explicit claim is that this metric is better at allowing a reader to distinguish good papers from bad than the alternatives: citation counts, an author's institution, the publishing journal, skimming the abstract, and so on. No actual evidence is offered in the position paper as to why this particular metric would be better than any other, or why it will work at all. Establishing that would require some actual research.

Which brings us on to the biggest criticism: that a number assigned to papers or authors to assess their quality can be abused to make bad papers (and authors) look good, or can contain systematic biases in the way it is generated that encourage bad practice or make good authors and papers look bad.

Transparency

This is a central claim made in Foster's position paper: that transparency will stop abuses. Because you can see who recommends you, you can quickly spot bad behaviour. I have included Foster's comments in the subsequent discussions so you can read them yourself. Notice two elements. One is that he thinks the maths behind the system is hard to understand. He comes from a humanities field so this is not surprising, but the maths is actually very simple, as I explained above. Some implementation details (how you find the limit when two people are mutually recommending each other) must be a bit more complicated, but they are not relevant to understanding it. However, his response strongly implies that he is not sufficiently numerate to understand the mathematical implications of the system he is promoting. That should clearly be a worry. The second point is that he contrasts checking for abuses in the recommendation system with checking the citation system used in some scientific fields. Apparently checking citations is cumbersome and slow; by contrast, we are led to believe, checking recommendations is easy and fast. Well, I have done so, and will discuss that in a moment.

To check an author rank, you need to scroll through the author's papers looking for paper ranks visually (each is a small number next to an icon for a bar chart). The author rank must be the root of the sum of the paper ranks, so it is usually clear once you have found them all. Then for each paper you need to click through to it. A small icon appears at the top for each recommender. If you click on that it lists who they are and their author ranks (which is their contribution). You can then go through to each recommender's Academia page and repeat the process to understand how their author rank was arrived at. You can turn this into a graph, which allows you to visualize it. I did this for a few hours in a morning (18 September 2016) to generate a graph which I will discuss in a moment. It is not fast, it is cumbersome, and it is a lot like chasing citations. It is transparent only in a theoretical sense. Unless the system is accompanied by a visualisation tool there is no way it could be described as transparent. And even if it is, as we shall see in a moment, it is not at all obvious whether something is abuse or not. In other words the system remains very open to abuse.
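As an illustration of what that tracing amounts to, here is a minimal sketch that recomputes ranks from a hand-collected network. The names and numbers are invented, and iterating to a fixed point is my assumption about how the 'limit' for mutual recommendations might be found; the support article does not say how Academia.edu actually computes it. The point it makes is the one that matters below: two authors who simply cross-recommend several of each other's papers drift well above the default rank of 1 without any outside recommendations at all.

    from math import sqrt

    # Hypothetical hand-traced network (invented data): which author wrote each paper,
    # and who recommends it. Authors A and B cross-recommend several of each other's
    # papers; author C has no recommendations in either direction.
    paper_author = {
        "A1": "A", "A2": "A", "A3": "A",
        "B1": "B", "B2": "B",
        "C1": "C",
    }
    paper_recommenders = {
        "A1": ["B"], "A2": ["B"], "A3": ["B"],   # B recommends three of A's papers
        "B1": ["A"], "B2": ["A"],                # A recommends two of B's papers
        "C1": [],
    }

    def iterate_ranks(paper_author, paper_recommenders, rounds=50):
        """Repeatedly apply paper rank = sqrt(sum of recommender author ranks) and
        author rank = sqrt(sum of own paper ranks), starting everyone at the default
        rank of 1, until the mutual-recommendation feedback settles down."""
        author_rank = {a: 1.0 for a in set(paper_author.values())}
        paper_rank = {}
        for _ in range(rounds):
            paper_rank = {
                p: sqrt(sum(author_rank[r] for r in recs)) if recs else 0.0
                for p, recs in paper_recommenders.items()
            }
            for a in author_rank:
                total = sum(pr for p, pr in paper_rank.items() if paper_author[p] == a)
                author_rank[a] = sqrt(total) if total > 0 else 1.0
        return author_rank, paper_rank

    ranks, _ = iterate_ranks(paper_author, paper_recommenders)
    print(ranks)   # roughly A: 2.0, B: 1.7, C: 1.0 - A and B inflate each other, C stays at 1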

An example network of recommendations

So whose paper did I pick on? Well, Richard Price was one of the more hawkish supporters of the system, and has in fact posted two papers promoting Academia.edu, so he seemed like the logical choice. The graph below shows how one of his papers, ranked 3.2, arrives at its rating. It is not a complete graph: I did pursue onwards to Price and his co-authors, and I stopped when I hit Ahmed S. Yahya. I have encountered Yahya's surprisingly high author rank before (he is part of a cluster of very high author ranks).

What do we learn from this chart? Well, quite a few things. If you write a paper that a lot of people are interested in, it will get a higher rating. There are eight recommenders for Price's paper (which discusses metrics). That is the largest number I have seen for a paper in my limited viewing of Academia.edu papers. It gives Price himself a relatively unimpressive 1.8 author rating.

What is the second large cluster on the right? I have blown this up on the final page so you can take a closer look at it. The recommenders for Price's paper are mostly people who have not received recommendations. The little cluster at the top is a researcher who works on dreams and has received two. The exception is John Edgar Browning: he and the people he is clustered with work on literary and cultural studies around horror, except Joseph Carroll at the bottom, who is also in literature but works on an 'evolutionary' theory that Matthias Classen applies to monster movies and zombies. Classen and Carroll have much higher author rankings (3.4 and 4.1) than Price (1.8), yet they have fewer people recommending all of their work than Price has recommending just one paper. How? Well, Carroll recommends five of Classen's papers and Classen recommends six of Carroll's. This boosts both of their author ratings considerably. Is this an abuse? Well, both have written quite a few papers, Classen is clearly using Carroll's theory and therefore endorsing it, and maybe they genuinely do think each other's papers are good. The problem is, it is exactly the pattern one would expect to find with abuse. Will Academia.edu act on this, sending a polite reprimand to Classen and Carroll? I doubt it, and I am pretty sure they should not.

The graph raises an additional problem, which is calibration. How do I know what a good ranking is? An author rating of 2 in the area Price works in seems to indicate quite a good author, but in metrics a good paper should be a 3+. In evolutionary literary studies ratings are inflated and a 4+ would mark a good author, but because small numbers of people are tagging a lot of papers (rather than a lot of people tagging a few) a 2+ is probably a good paper. The exception is Carroll's paper on conflict, which is probably a bad paper whose 3.6 rating needs to be ignored, because it came from a single recommendation by someone in another field where 10+ author ratings are normal. And that last case is the most extreme: that 3.6 paper by Carroll should be considered a worse paper than Price's 3.2 paper, but how would any user know that without tracing this graph out? It is hard to imagine how a rating system that behaves like this could possibly be navigated, even by relatively savvy users.

Could this be fixed?

Building a good rating system is hard. It does become easier the more qualified people you have contributing to each rating (the so-called 'wisdom of crowds'), so something like Google's restaurant ratings is actually quite reliable at helping you pick a good restaurant. The current paper rank/author rank system, however, clearly is not. It suffers from three significant problems:

Imprecision. Much of what it measures is not what you are interested in. Author rank will respond to how many papers you have written, your popularity, and so on, and paper ranks to the type of paper (cross-disciplinary surveys will score higher than pure research or negative results).

Abuse. It is very easy to abuse the present system, and largely undetectable. In the graph I presented, two authors' rankings were inflated by the way their recommendations were arranged, but it is impossible to know whether the two authors were trying to do that or simply have a mutual respect for each other's work.

Calibration. The system creates very different rankings depending on behaviour in small fields, and the mathematics stops the contagion of one cluster's rankings from moving very far into other fields. This makes working out what is a good rating or a bad rating very difficult.

So, assuming that you really did want to pursue a metric, and the signs are that some people at Academia.edu are wedded to the idea, can you make a better one than the present proposal? Here are a few approaches to changing this from a terrible metric to a not very good metric:

The Sticking Plaster. If paper rank were calculated by summing the roots of the recommending author ranks, each divided by the number of papers by the same author that the recommender has recommended, this would remove several of the problems identified above (I sketch one reading of this after the list of approaches below). Setting the default rank below 1 would probably also help the system discern the quality of recommenders. The sticking-plaster approach is how, for example, Google operates its search engine: its ranking system was a powerful new tool when it was introduced, but it was so easily open to abuse that Google has had to commit to constant updates and changes to what is now a secret algorithm. Unfortunately that requires a dedicated effort and would destroy any transparency.

The 1-5 rating. Simply have recommenders rate a paper from 1 to 5. This would solve the calibration problem, though it depends on large numbers of people ranking the same thing. Some systems like this also incorporate 'dummy' votes, so that papers only acquire higher rankings once they have received a lot of recommendations. Since at least some commenters suggested boycotting the ranking system when it was opened up to all users, it is doubtful that the critical mass needed to make this work would appear.

Negative ratings. A lot of the justifications (the reproducibility crisis, the poor quality of other metrics) suggest a belief that many papers are actually bad research and need to be weeded out to find the quality. A negative system that allowed people to flag poor-quality work, highlight flaws in reasoning, and so on, could be seriously considered. It does not serve Academia.edu's business case and is contrary to a lot of the ethos of the site, so it probably will not be.

Do some research. This is probably the most important. Get in some statisticians who have done work with Elo and current metrics. Encourage sociologists and anthropologists familiar with coding exercises to look at how such systems operate in practice. Involve people who know about human behaviour (micro-economists, psychologists, historians) to do some serious thinking on the metric. Do not base your decision on some random editorialising by people with no expertise (like me). It would be reassuring if the Academia.edu team prepared a survey of the work on metrics (that literature is out there). This would not take the heat out of the current debate, but it might elevate how informed it is.
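To make the first two suggestions concrete, here is a minimal sketch of one possible reading of each. The function names and numbers are mine, and the sticking-plaster formula in particular is only one way of interpreting the wording above, not a specification.

    from math import sqrt

    # One reading of the "sticking plaster" variant: each recommender contributes the
    # square root of their author rank, diluted by how many papers by the same author
    # they have recommended, so a pair of authors bulk-recommending each other gains
    # much less than they do under the current scheme.
    def plastered_paper_rank(recommendations):
        """recommendations: list of (recommender_author_rank, papers_by_this_author_they_recommend)."""
        return sum(sqrt(rank) / n_same_author for rank, n_same_author in recommendations)

    # Under the current scheme a rank-4 author recommending six of your papers adds 4 to
    # each paper's sum; here each of those recommendations is worth only sqrt(4) / 6.
    print(plastered_paper_rank([(4.0, 6)]))      # ~0.33
    print(plastered_paper_rank([(1.0, 1)] * 4))  # four independent rank-1 recommenders -> 4.0

    # The 1-5 rating with 'dummy' votes: a paper starts at a neutral prior and only moves
    # away from it once enough real ratings accumulate (a standard damped average).
    def damped_rating(scores, prior=3.0, dummy_votes=5):
        return (sum(scores) + prior * dummy_votes) / (len(scores) + dummy_votes)

    print(damped_rating([5]))        # one 5-star vote barely moves the score: ~3.33
    print(damped_rating([5] * 20))   # twenty 5-star votes: 4.6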

How bad is this?

Academia.edu will want to push author rank and paper rank. They want them to be important, for hiring and promotion and evaluation, because if they are, the site can monetize that importance. In other words, commercial logic dictates that once you have a system like this you will lobby and push for it to be used. That is an issue, because the system being proposed will not work. Much of the opposition expressed in the comments on the position paper centred on how dangerous such a number is, but it ignored the fact that this particular implementation is about as bad a measurement as you could possibly imagine. I am nearly certain (though I have no evidence) that simply rating papers 1-5, like a restaurant review, would produce results that more accurately reflected quality.

Did we learn anything in this exchange?

Yes. It seems the idea is deeply unpopular amongst those people in the humanities who care enough to respond, and that Academia.edu will carry on regardless. A metric seems inevitable, so it is important to recognise that this is a bad one and to press for it to at least be better researched. On a criticism the site may actually listen to: we also learned that the comments function for papers works very badly once any significant number of people get involved.
