My love/hate relationship with impact metrics.

Academic impact metrics fascinate me. They always have. I’m the kind of person that loves to self-reflect in quantitative ways – to chart my own progress over time, and with NUMBERS. That go UP. It’s why I’ve been a Fitbit addict for five years. And it’s why I’ve joined endless academic networks that calculate various impact metrics and show me how they go UP over time. I love it. It’s satisfying.

Checkbox - Graph Going Up Icon PNG Image | Transparent PNG Free Download on  SeekPNG
Image from SeakPNG

But as with anything one tends to fangirl over, early on I started picking holes in the details. Some of the metrics overlook key papers of mine for no apparent reason. Almost all value citations above all else – and citations themselves are problematic to say the least.

Journal impact factor is a great example of a problematic and overly relied upon metric. I am currently teaching our MSc students about this, and I found some useful graphs from Nature that show exactly why (which you can read about here) – from to variations across disciplines & times, outlier effects and impact factor inflation, all of which were no surprise, to an over reliance on front matter – which was new to me!

There are problems.

They are noteworthy.

But we still use impact factor religiously regardless.

My husband used to run committee meetings for a funding body, where he would sometimes have to remind the members & peer reviewers that they should not take journal impact factor into account when assessing publication record in relation to researcher track record, as per the San Francisco declaration https://sfdora.org/read/. Naturally, these reminders would often be ignored.

There’s a bit of a false sense of security around ‘high impact’ journals. That feeling of surely this has been so thoroughly and rigorously peer reviewed that it MUST be true. But sadly this is not the case. Some recent articles published in very high impact journals (New England Journal of Medicine, Nature, Lancet) were retracted, having been found to include fabricated research or unethical research. These can be read about at the following links:

1. “New England Journal of Medicine reviews controversial stent study”: https://www.bmj.com/content/368/bmj.m878

2. “Two retractions highlight long-standing issues of trust and sloppiness that must be addressed”: https://www.nature.com/news/stap-retracted-1.15488

3. “Retraction—Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis”: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)31324-6/fulltext

Individual metrics such as H-index also typically rely on citations. An author’s H index is calculated as the number of papers (H) that have been cited at least H times. For example a researcher who has at least 4 papers that have each been cited at least 4 times, has a H index of 4. This researcher may have many more publications – but the rest have not been cited at least 4 times. Equally, this researcher may have one paper that has been cited 200 times – but their H index remains 4. The way in which the H index is calculated attempts to correct for unusually highly cited articles, such as the example given above, reducing the effects of outliers.

The H index is quite a useful measure of how highly cited an individual researcher is across their papers. However, as with impact factor – it is a metric based on citations, and citations do not necessarily imply quality or impact.

Another key limitation is that H index does not take into account authorship position. Depending on the field, the first author may have carried out the majority of the work, and written the majority of the manuscript – but the seventeenth author on a fifty author paper will get the same benefit from that paper to their own personal H index. In some studies hundreds of authors are listed – and all will benefit equally, though some will have contributed little.

An individual’s H index will also improve over time, given it takes into account the quantity of papers they have written, and the citations on those papers – which will themselves accumulate over time. Therefore, H index correlates with age, making it difficult to compare researchers at different career stages using this metric.

Then of course there’s also the sea of unreliable metrics dreamt up by specific websites trying to inflate their own readership and authority, such as Research Gate. This is one of the most blatant, and openly gives significant extra weight to reads, downloads, recommendations and Q&A posts within its own website in the calculation of its impact metrics, ‘RG Score’, and ‘Research Impact’ – a thinly veiled advertisement for Research Gate itself.

If you’re looking for a bad metric rabbit hole to go down, please enjoy the wide range of controversy both highlighted by and surrounding Beall’s lists: https://beallslist.net/misleading-metrics/

Altmetrics represent an attempt to broaden the scope of these types of impact metrics. While most other metrics focus on citations, altmetrics include other types of indicators. This can include journal article indicators (page views, downloads, saves to social bookmarks), social media indicators (tweets, Facebook mentions), non-scholarly indicators (Wikipedia mentions) and more. While it is beneficial that altimetrics rely on more than just citations, their disadvantages include susceptibility to gaming, data sparsity, and difficulties translating the evidence into specific types of impact.

Of course, despite all of the known issues with all kinds of impact metrics, I still have profiles on Google Scholar, Research Gate, LinkedIn, Mendelay, Publons, Scopus, Loop, and God knows how many others.

I can’t help it, I like to see numbers that go up!

In an effort to fix the issues, I did make a somewhat naive attempt at designing my own personal research impact metric this summer. It took into account authorship position, as well as weighting different types of articles differently (I’ve never thought my metrics should get as much of a bump from conference proceedings or editorials as they do from original articles, for example). I used it to rank my 84 Google Scholar items from top to bottom according to this attempted ‘metric’, and see which of my personal contributions to each paper represented my most significant contributions to the field. But beyond the extra weighting I brought in, I found myself falling into the pitfall of incorporating citations, journal impact factor etc. – so it was still very far from perfect.

If you know of a better attempt out there please let me know – I’m very curious to find alternatives & maybe even make my own attempt workable!

Many thanks to Prof Kuinchi Gurusamy for discussions and examples around this topic.