Research Gravitas: A PageRank-Based Measure of Academic Influence
Jan. 14, 2026 | By Billy Wong
Introduction
Measuring the true influence of researchers and institutions – the academic leaders in a field – is a complex challenge. Traditional metrics like total citation counts or the h-index have well-known limitations. For instance, the h-index (the largest h such that an author has h papers with at least h citations each) tries to balance quantity and quality, but it still treats all citations equally and can be inflated by large team collaborations or self-citation tactics[1][2]. Simply counting citations or publications often fails to capture the prestige or gravitas of a scholar’s work; we’ve all seen cases where a paper’s sheer citation count doesn’t reflect its true influence in the field[3][4]. To address these issues, we introduce “Research Gravitas,” a metric that uses a PageRank-based algorithm on citation networks to gauge the level of influence – or gravitas – of academic entities (universities, authors, journals, countries, etc.) within a specific domain. This approach considers not only how many citations an entity receives, but who those citations come from, weighting influential citations more heavily[5][6]. In essence, Research Gravitas extends the idea behind Google’s PageRank algorithm to academia, providing a fairer and more nuanced indicator of impact than traditional metrics.
Researchers have previously proposed similar network-based metrics to better capture influence. Notably, the Eigenfactor score (for journals) and the PageRank-index (π) for individual scientists follow the same principle: citations from important sources count more[7][5]. These approaches are thought to be more robust than simple citation counts or impact factors, which “purely count incoming citations without considering the significance of those citations”[8]. By leveraging the structure of citation networks, Research Gravitas aims to identify true academic leaders in a field – those whose work is highly respected and widely influential, not just prolific or popular.
Methodology: PageRank on Citation Networks
1. Building the Citation Network: At the core of the gravitas calculation is a directed, weighted citation graph. Nodes in this graph represent the entities we want to rank (e.g. institutions or authors), and a directed edge from Node A to Node B represents A citing B. The weight on an edge corresponds to the number of citations from A to B (i.e. how many times works affiliated with institution A cite works of institution B). To ensure data quality, we focus on substantive research outputs (e.g. articles, reviews, books) in a given timeframe (e.g. publications from 2020–2024 in our implementation) and exclude irrelevant items like retracted papers or non-research content. We also remove self-citations (edges where the citing and cited entity are the same) so that an institution or author does not artificially boost its own score[1]. The result is a network of scholarly influence: who cites whom, and how often.
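To make step 1 concrete, here is a minimal sketch of the graph construction using networkx. The input format (an iterable of `(citing, cited)` entity ID pairs, one per citation) and the function name are our illustration, not a fixed interface:

```python
import networkx as nx

def build_citation_graph(citation_pairs):
    """Build the directed, weighted citation graph for step 1.

    `citation_pairs` is an iterable of (citing, cited) entity IDs,
    one entry per citation between works.
    """
    G = nx.DiGraph()
    for citing, cited in citation_pairs:
        if citing == cited:
            continue  # drop self-citations so an entity cannot boost its own score
        if G.has_edge(citing, cited):
            G[citing][cited]["weight"] += 1  # edge weight = number of citations A -> B
        else:
            G.add_edge(citing, cited, weight=1)
    return G
```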
2. Domain-Specific Filtering: A key feature of our approach is the ability to target specific disciplines or topics, such as Sustainable Development Goals (SDGs) or traditional fields like Computer Science and Physics. This is achieved by filtering the set of works to only those relevant to the domain of interest before building the network. For example, to measure gravitas in SDG 11: Sustainable Cities and Communities, we first select all research works that have been tagged as contributing to SDG 11. Each work is linked to its authors and their institutions, so we can identify, say, all papers related to SDG 11 and group their citations by the institutions of the citing and cited authors. By restricting the network to these SDG 11 papers, we construct a domain-specific citation graph that reflects influence within that research area. We can do the same for a field like Physics or Computer Science by filtering works based on subject classifications. This ensures that the gravitas score truly reflects leadership in that particular domain, not just overall size or output in unrelated areas.
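As a sketch of the domain filter, the snippet below keeps only works tagged with a target SDG, loosely modeled on OpenAlex’s `sustainable_development_goals` field; the exact field names, ID format, and confidence cutoff are illustrative assumptions, not a prescribed schema:

```python
def filter_to_domain(works, sdg_id="https://metadata.un.org/sdg/11", min_score=0.4):
    """Keep only works whose SDG tags include the target goal.

    Assumes each work is a dict with a `sustainable_development_goals`
    list of {"id": ..., "score": ...} entries (as OpenAlex provides);
    the 0.4 confidence cutoff is an illustrative choice.
    """
    return [
        work for work in works
        if any(tag["id"] == sdg_id and tag["score"] >= min_score
               for tag in work.get("sustainable_development_goals", []))
    ]
```

The same filter works for a traditional field by swapping the SDG tag test for a subject-classification test.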
3. Applying the PageRank Algorithm: Once the domain-specific citation network is built, we run the PageRank algorithm on this directed graph to compute an influence score for each node (institution, author, etc.). PageRank treats each citation as a “vote” of importance, but crucially not all votes are counted equally. Intuitively, a citation from a highly influential paper or prestigious journal should carry more weight than a citation from an obscure source. Likewise, if a paper cites dozens of references, each of its citations is given less weight than a citation from a paper with only a few key references (to prevent one paper from unfairly boosting many others)[10]. The PageRank algorithm captures both of these intuitions automatically:
- Quality of Citations: Citations from already influential entities contribute more to your gravitas. In network terms, receiving a link from a high-ranked node is more valuable than one from a low-ranked node. This reflects the idea that a scholar or institution is influential if they are acknowledged by other influential peers[10][7]. As the Eigenfactor methodology puts it, citations from highly ranked journals (or authors) are weighted more in determining the score[7]. This reduces the impact of cliques or “self-referential groups” that only cite each other without broader recognition[11].
- Diminishing Returns for Bulk Citations: If a citing paper or institution references a long list of works, each individual citation it makes is slightly less impactful. PageRank models this by dividing a node’s influence among its outbound links[12]. In other words, a citation coming from a paper with 100 references counts less than a citation from a paper that cites only 5 references, since the latter is devoting more of its attention to the work in question[10]. This guards against inflated influence from sources that cite indiscriminately.
Mathematically, we construct a transition matrix of the citation network and perform an iterative computation of PageRank scores for each node. We typically use a damping factor (often 0.8, meaning a 20% chance of randomly “jumping” to another node at each step) to ensure convergence, as is standard in PageRank[13]. The result is a steady-state distribution of “influence scores” across all entities in the network.
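In networkx terms, this whole step is a single call. The sketch below assumes `G` comes from the `build_citation_graph` sketch above; the iteration it performs is the standard weighted PageRank update, PR(v) = (1 − d)/N + d · Σ over citing nodes u of PR(u) · w(u,v)/W(u), where W(u) is the total outbound weight of u and d is the damping factor:

```python
import networkx as nx

# G is the domain-specific graph from build_citation_graph above.
# alpha=0.8 is the damping factor: with probability 0.2 the "random
# surfer" jumps to a uniformly chosen node, which guarantees convergence.
# weight="weight" makes each source split its influence across its
# outbound citations in proportion to edge weight, so a source that
# cites many works passes less influence through each individual citation.
influence = nx.pagerank(G, alpha=0.8, weight="weight")
# `influence` maps each entity to a score in (0, 1); scores sum to 1.
```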
4. Deriving the Gravitas Score: The raw output of PageRank is a score (between 0 and 1, summing to 1 over all nodes) for each entity, which we interpret as its share of influence in the citation network. Higher scores indicate greater gravitas. In our implementation, after obtaining these scores, we apply an exponential cumulative distribution function (CDF) transformation to the PageRank values. This step maps the heavily skewed distribution of PageRank values onto a normalized 0–1 scale, which can be seen as a percentile or probabilistic rank. Essentially, using the exponential CDF (with the mean PageRank as the scale parameter) spreads out the top scores and helps differentiate leaders. An institution at the 95th percentile by this measure has very high gravitas compared to the median. This transformation isn’t strictly necessary for the metric to work, but it provides a more interpretable Gravitas Index – one can say, for example, “University X is in the top 5% (0.95) gravitas for SDG 11 research.”
Finally, we often impose a minimum citation threshold (for instance, an institution must have at least 50 citations in the domain network to be included) so that the ranking focuses on significant players and avoids statistical noise from very small nodes. The end product is a list of entities ranked by their Research Gravitas score for the given field, along with their citation counts and percentile score.
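Putting step 4 together, here is a minimal sketch of the CDF transformation plus the citation threshold; the helper name and the `citation_counts` mapping (entity → citations received within the domain network) are our assumptions:

```python
import math

def gravitas_scores(pagerank, citation_counts, min_citations=50):
    """Turn raw PageRank values into 0-1 Gravitas scores.

    Applies the exponential CDF, F(x) = 1 - exp(-x / mean_pr), with the
    mean PageRank as the scale parameter, then drops entities below the
    citation threshold so the ranking focuses on significant players.
    """
    mean_pr = sum(pagerank.values()) / len(pagerank)
    return {
        entity: 1.0 - math.exp(-score / mean_pr)
        for entity, score in pagerank.items()
        if citation_counts.get(entity, 0) >= min_citations
    }
```

Under this transform, a gravitas of 0.95 reads directly as “top 5% in the domain,” matching the interpretation above.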
Applications: Identifying Leaders by Field and SDG
One powerful aspect of the Research Gravitas approach is its flexibility to zoom in on different levels and areas of research. By changing the scope of the input network, we can identify academic leaders in virtually any context:
- By Research Discipline: We can compute gravitas rankings for traditional fields (e.g. Physics, Computer Science, Medicine) using subject classifications. For example, using a database like OpenAlex which links papers to topics and fields[9], we could isolate all publications in particle physics, build the citation network among institutions in that field, and run our algorithm. The output might reveal, say, that while big-name universities like Harvard or MIT score high (as expected), a smaller institution like CERN or a specialized research institute might outrank larger universities due to a few exceptionally influential papers – reflecting true leadership in that niche. In essence, this method highlights which institutions (or authors/journals) punch above their weight in terms of influence within the discipline. A similar analysis in Computer Science might show, for instance, that certain tech-focused universities or corporate research labs have high gravitas because their work is widely cited by top researchers, even if their total paper count is lower than larger universities.
- By Sustainable Development Goal (SDG): Research aligned with the UN SDGs cuts across disciplines and is of high interest to policymakers and funders. Our approach can identify which institutions or countries are at the forefront of research for each SDG. For example, using the SDG tags assigned to publications (via an AI classifier with a confidence cutoff, as implemented in OpenAlex[14]), we can focus on SDG 11 (Sustainable Cities) and find out which universities worldwide have the greatest influence in that area’s research. The gravitas metric might show that a technical university known for urban planning research has a higher score than some larger general universities, because its SDG 11 papers are heavily cited by other important works in urban sustainability. This complements simple output metrics – an institution might not have the highest number of SDG 11 publications, but if it has a few seminal papers that everyone cites, its gravitas will be high. Such insight is valuable: it identifies the true knowledge hubs driving progress on each goal. Policymakers could use this to find key research partners, and universities can benchmark their influence on global challenges.
- At Different Scales (Authors, Journals, Countries): While our example code focused on institutions, the gravitas methodology is general. By redefining nodes and edges (see the sketch after this list), we can rank authors (where an edge from Author A to Author B means A cites B’s work), journals (via a journal citation network, as Eigenfactor does), or even countries (aggregating all output from a country as one node). For authors, this becomes similar to the PageRank-index proposed by Senanayake et al., which was shown to highlight individual researchers who have made field-defining contributions rather than just those with many publications[2]. For example, in a case study on quantum game theory, one author with only a single highly cited paper had a PageRank-based score at the 90th percentile, far above what their low h-index would suggest[15]. This author’s lone paper was heavily cited by other influential papers, giving them a high gravitas despite a modest publication count. Such cases demonstrate the metric’s ability to surface “hidden gems” – researchers or works that influence the field out of proportion to their quantity of output[16][3].
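The sketch below shows the node-redefinition idea: the only thing that changes between institution-, author-, and country-level rankings is the function that maps a work to its credited entities. The names here (`node_of`, and the `references` and `authorships` fields) are hypothetical stand-ins for whatever your data source provides:

```python
def citation_pairs(works, node_of):
    """Yield (citing, cited) entity pairs at any chosen granularity.

    `node_of(work)` returns the set of entities credited with a work;
    `work["references"]` is assumed to resolve to the cited work records.
    """
    for citing_work in works:
        for cited_work in citing_work["references"]:
            for src in node_of(citing_work):
                for dst in node_of(cited_work):
                    yield (src, dst)

# Hypothetical granularity choices:
# institutions: lambda w: {a["institution_id"] for a in w["authorships"]}
# authors:      lambda w: {a["author_id"] for a in w["authorships"]}
# countries:    lambda w: {a["country"] for a in w["authorships"]}
```

The resulting pairs feed straight into the `build_citation_graph` sketch above, leaving the rest of the pipeline untouched.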
In all these applications, Research Gravitas serves as a lens to identify who the academic leaders are in a given context. Rather than just measuring productivity or average impact, it spotlights those who are shaping the direction of research. This can inform decisions like faculty hiring or institutional partnerships: for example, a university aiming to strengthen its AI research might use gravitas rankings to attract rising-star researchers or collaborate with the top-ranked institutions in that discipline.
Comparison to Traditional Metrics and Their Pitfalls
It’s important to understand how Research Gravitas differs from, and in many ways improves upon, conventional research metrics. Here we compare it to some widely used indicators and highlight why a PageRank-based approach addresses their weaknesses:
- Total Citations & h-index: Raw citation counts simply sum up how many times an entity is cited. The h-index adds some nuance by balancing quantity with at least moderate impact (e.g. requiring h papers with at least h citations each), but both metrics fundamentally count all citations the same. This opens the door to a number of issues. For one, citation circles and self-citations can inflate these metrics – e.g. a group of mediocre researchers could cite each other’s work extensively to prop up their counts. Such locally famous authors “whose research does not have global impact but gets cited by their close colleagues” can appear more influential than they really are[17]. Moreover, gaming the system becomes possible: one can publish a slew of low-quality papers in whatever venues are available “purely with the intention of citing” one’s other work (essentially a citation farm)[18]. The h-index only partially mitigates this, and it has its own documented shortcomings: it penalizes early-career researchers (who haven’t had time to accumulate many papers)[19], and it ignores extremely high citation outliers (once a paper is in the h-core, additional citations don’t increase the h-index)[20]. Variants like the g-index try to patch these gaps, but none of these metrics consider who is citing the work – they “still treat all citations equally”[1]. In contrast, Research Gravitas (via PageRank) inherently solves this by weighting citations by the citer’s influence. If a paper is cited only by low-impact papers or a tight clique of friends, its gravitas contribution will be low – thus reducing the effect of self-referential groups and gaming[11]. On the other hand, if a young researcher’s single paper is cited by Nobel laureates’ work, our metric will recognize that disproportionate impact, even if their total citations are modest. This makes the gravitas measure fairer and more discerning than citation totals or h-index[5][2]. In fact, one study found that when comparing h-index vs. a PageRank-based index, the top-ranked authors can differ significantly, with the PageRank method “highlighting authors who have made a relatively small number of definitive contributions… or worked in smaller groups,” whereas h-index favored those with many co-authored papers or large output[2]. This indicates gravitas can recognize quality-over-quantity in a way h-index does not.
- Journal Impact Factor vs. Eigenfactor/SJR: At the journal level, a useful analogy is the distinction between the Impact Factor (IF) and network-based metrics like Eigenfactor or SCImago Journal Rank (SJR). The Impact Factor essentially measures popularity – it averages how often articles in a journal are cited, giving equal weight to all citations. This has well-known biases: review journals, for example, often get very high IF simply because they accumulate many citations (every paper in a field might cite a review for background). But are those citations influential? Studies have shown that network-weighted measures tell a different story. Bollen et al. (2006) observed that if you compute a PageRank-like score for journals (sometimes called Journal PageRank or a weighted network score), it correlates with IF but not exactly – some journals have high IF but low prestige (PageRank), and vice versa[21]. In their results, many review journals fell into the “high popularity, low prestige” category[22]. The interpretation was that Impact Factor reflects how frequently a journal’s articles are cited (popularity), whereas a PageRank-based metric reflects the prestige of those citations[21]. A citation from an obscure conference proceeding counts the same as one from Nature in IF, but not in Eigenfactor. Thus, Eigenfactor and similar metrics were touted as more robust: “for a given number of citations, citations from more significant journals will result in a higher Eigenfactor score”[8]. Research Gravitas is essentially bringing that same prestige-based view down to the level of individual institutions or researchers. By doing so, it avoids pitfalls like rewarding journals (or researchers) that churn out lots of papers that get easy citations. Instead it rewards those that attract citations from the most respected sources. This makes it much harder to game or manipulate, since one cannot easily fabricate being cited by top-tier work – that has to be earned through genuine impactful research[23][6].
- Altmetrics and Other Measures: There are many other metrics (like altmetrics that track online attention, or composite scores), but those measure different kinds of “influence” (public or social influence rather than scholarly impact) and often suffer from their own gaming issues (e.g. buying social media mentions). Our focus here is on scholarly impact. Nonetheless, one might note that gravitas has the advantage of transparency and reproducibility when using open data. The underlying data (citations and affiliations) can be openly inspected, and the algorithm is well-defined, aligning with calls for responsible and open research evaluation[24][25]. This is in contrast to some proprietary metrics, which are black boxes.
In summary, the Research Gravitas approach addresses a fundamental flaw in traditional bibliometrics: the assumption that all citations are created equal. By accounting for the network dynamics of citations, it elevates the signal of truly influential scholarship while dampening the noise of mere popularity or self-promotion. As Chen et al. (2007) put it, this kind of algorithm naturally ensures that “the effect of receiving a citation from a more important paper is greater than that from a less popular one,” and that a citation from a paper with an extensive reference list is counted proportionally less[10]. These properties yield a metric of influence that better corresponds to our intuitive notion of gravitas or prestige in academia.
Conclusion and Example
Research Gravitas provides a fresh lens to evaluate academic impact, complementing existing metrics with a more quality-aware perspective. By leveraging the PageRank algorithm on carefully constructed citation networks, it identifies who truly matters in a given research arena – be it the universities leading in SDG-related research or the scholars pushing the boundaries of quantum computing. The metric’s strength lies in rewarding “true excellence”: work that is cited by other high-caliber work, as opposed to work that simply accumulates many shallow citations[6].
For example, when we applied this methodology to a dataset of Sustainable Development Goal publications, the resulting rankings did more than mirror the largest producers of papers. In SDG 3 (Health & Well-being) research, a few specialized medical research institutes emerged with high gravitas scores, indicating that their publications (though fewer in number) were heavily cited by other influential health studies. In contrast, some universities that produced dozens of SDG 3 papers ranked lower in gravitas because those papers were mostly cited by each other or by lower-impact outlets – even when their raw citation counts were higher. This demonstrates how gravitas can surface unexpected leaders: entities that might be overlooked by brute-force metrics but are in fact driving the intellectual discourse in the field.
Likewise, in a discipline like physics, classic papers that are universally regarded as foundational (“scientific gems”) receive a boost in gravitas. Chen et al. found that by using PageRank on the Physical Review citation network, they could identify exceptional papers that stand out from the crowd of merely well-cited papers[26][16]. These were papers familiar to virtually all physicists – anecdotally confirming that the algorithm was capturing a sense of renown or gravitas, not just counting citations[27]. Our Research Gravitas metric operates on the same principle for whatever set of entities we choose. It has the sensitivity to recognize, for instance, a theoretical computer science paper that is not the most cited in raw numbers but is cited by all the top experts in the area, marking it as a linchpin work in that domain.
In conclusion, Research Gravitas is a network-informed metric that offers a more credible, manipulation-resistant, and domain-specific measure of research influence. By accounting for who cites you – and how influential they are – it paints a richer picture of academic impact. This helps universities, funding agencies, and scholars themselves to identify academic leaders and high-impact work in any given field with greater confidence. As the creators of the pagerank-index argued, such a metric is “inherently above manipulation” because of its feedback loops, and it “rewards true excellence by giving higher weight to citations from documents with higher academic standing”[23][6]. In an era of information overload and strategic gaming of metrics, an approach like Research Gravitas provides a much-needed compass pointing to genuine influence and leadership in research.