By Ruth A. Pagell*
(5 February 2016) In early December 2015, MIT’s Technology Review provided a summary of a new paper on using data from Wikipedia to rank the world’s most INFLUENTIAL universities. A month later, ACCESS posted the abstract for another paper that introduced Wikiometrics, also using data from Wikipedia to rank universities. In between, SCImago and its research partners who provide Ranking Web of Universities introduced the first edition of the top 2000 Universities Ranked by Google Citations.
This article introduces the two Wikipedia rankings. While the two articles are technical, they are important to us because they are offering alternative ways to measure universities and alternative ways to think about ranking universities. The Google Scholar (GS) ranking uses the Webometrics platform, discussed in Ruth’s Rankings 9. Since the articles provide comparisons with existing world rankings, this critique provides comparison tables for Asia and my personal observations.
Lages, Patt and Shepelyansky (Lages 2015) introduce WRWU (Wikipedia Rankings of World Universities) to measure the influence of universities. They compare their results using a 2013 version of Wikipedia with ARWU 2013. They find a 60% overlap in the top 100 world universities. The article is very technical and statements in quotations below are from a clarification email from Prof Jose Lages. The MIT Technology Review article simplifies the math and results and I will simplify it further to focus on what is most important to us.
In our previous articles we noted a bias toward English language research. Our bibliometricians continue to massage the data from Web of Science and SCOPUS to reduce this and other biases. Other scholars and researchers decide the publication content for WOS and SCOPUS. Wikipedia’s content comes from the crowd. There are many language editions of Wikipedia. For example, there are about four million articles in English, 1.5 million in German and one million in French. I checked Web of Science and Scopus just for 2013 and found over two million articles in English with about 1% in German or French.
What is unique about WRWU is that it includes results from multiple languages. Lages and his group apply three algorithms similar to Google algorithms for extracting the top 100s from 24 language editions of Wikipedia. Asian countries such as India, Thailand and Malaysia are well represented because their Wikipedias are included. On the other hand, Hong Kong is underrepresented because it is included with China and the Chinese edition of Wikipedia, even though many of its top universities are English language based.
WRWU includes three separate indicators based on these algorithms. University names are manually cross-checked and pure research institutes are removed.
- PageRank (WPRWU) ranks university websites based on incoming links. With the PageRank algorithm “a given article is all the more important if other important articles point to it. This ranking places at the top the most referred to articles.”
- CheiRank (WCRWU) is based on outgoing links. With this algorithm “a given article is all the more important if it points to other important articles. This ranking places at the top the most diffusive/communicative articles.”
- 2DRank (W2RWU) is based on both incoming and outgoing links.
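For readers who want to see the difference between incoming and outgoing links in action, here is a minimal Python sketch. It is not the authors' code: the four-page link graph, the function names and the damping value of 0.85 are my own assumptions for illustration. It relies only on the standard definitions, PageRank computed by repeated redistribution of scores along links, and CheiRank as PageRank run on the same graph with every link reversed. 2DRank is left out because the description above does not give enough detail to reproduce it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


def cheirank(links, damping=0.85, iterations=100):
    """CheiRank: PageRank computed on the graph with every link reversed."""
    reversed_links = {}
    for page, targets in links.items():
        reversed_links.setdefault(page, [])
        for t in targets:
            reversed_links.setdefault(t, []).append(page)
    return pagerank(reversed_links, damping, iterations)


# Hypothetical toy graph: page D links to everyone, but nobody links to D.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A", "B", "C"]}
by_incoming = pagerank(toy_links)
by_outgoing = cheirank(toy_links)
```

In this toy graph, D receives no links and so comes last under PageRank, but because it points to every other page it comes first under CheiRank. That is the sense in which one indicator rewards being referred to and the other rewards being communicative.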
The article lists the top ten universities in the world for each indicator. Table 17.1 lists the top ten Asian universities for the three indicators and their overlaps with ARWU. 23 universities from Japan, China, Singapore, Malaysia, Thailand and India comprise the top 200 WPRWU. 26 make up the top 200 in WCRWU, adding Hong Kong and North Korea to the list of countries included. 20 also are on the list for W2RWU. Many of these names are unfamiliar and are newer private universities, including Kim Il-song University from North Korea. Additional datasets with over 1,000 universities are available on a separate website.
The authors use the adjective “influential” to describe their lists and highlight Wikipedia’s coverage of many centuries and cultures. The overlap for top 100 world universities of WPRWU and ARWU is 62% and the number of US universities drops from 58 in ARWU to 38 in WPRWU. German universities come in second.
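The overlap percentages quoted in this article are simple to reproduce once two ranked lists use the same spelling for each university. A minimal Python sketch follows; the function name and the single-letter placeholder names are my own and are not data from WRWU or ARWU.

```python
def overlap_share(ranking_a, ranking_b, top_n):
    """Share of the top_n entries of ranking_a that also appear in the top_n of ranking_b."""
    top_a = set(ranking_a[:top_n])
    top_b = set(ranking_b[:top_n])
    return len(top_a & top_b) / top_n


# Hypothetical placeholder lists, best-ranked first.
list_one = ["P", "Q", "R", "S", "T"]
list_two = ["Q", "P", "X", "S", "Y"]
share = overlap_share(list_one, list_two, top_n=5)
```

Here three of the five names (P, Q and S) appear in both lists, so the share is 0.6. Note that the measure ignores the order within the top group, so two lists can overlap heavily and still disagree about who is first.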
Older universities also do well in WRWU. 13 of the 19 universities ranked in 1925 (Hughes) are in the top 20. Table 17.2 compares the 90-year-old Hughes rankings with WRWU, Wikiometrics (to follow) and ARWU. The authors “believe the Wikipedia ranking provides the firm mathematical statistical evaluation of world universities which can be viewed as a new independent rankings being complementary to already existing approaches.”
We have examined many derivative indicators created from the underlying data in WOS and SCOPUS. Katz and Rokach (2016) propose “Wikiometrics, the derivation of metrics and indicators from Wikipedia.” They believe Wikipedia represents “the real world, due to its size, structure, editing policy and popularity.” They base their trust in Wikipedia on characteristics such as [my notes]:
- Size and scope that goes beyond existing encyclopedias [the equivalent of two current years of traditional academic papers in English]
- Timely and updated [no delays for peer review process]
- Tags and meta-data – [generated by the crowd]
- Wisdom of the crowd – a measure of popularity and importance
Much of the article consists of statistical comparisons of Wikiometrics to three of the rankings we have previously examined. The authors compare the top 20 universities using Wikiometrics from the December 2013 Wikipedia with 2011 ARWU, Times Higher Education and Webometrics (Katz, Table 4, pg 11). They queried DBpedia to extract data from Wikipedia on the word “university” and then included only those universities which appeared in two of the three rankings above, resulting in 389 universities. Correlations show a statistically significant relationship between the Wikiometrics top world universities and each of the other three. Note that THE has changed its underlying data since this comparison but not its weightings. Also note that the inclusion requirement results in the smallest number of Asian universities, with 8 in the top 200 and 53 in the entire dataset.
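To make the idea of correlating two rankings concrete, here is a minimal Python sketch of Spearman's rank correlation. This is an assumption on my part: I am not stating which correlation measure Katz and Rokach used, and the function name and placeholder names are hypothetical. The sketch keeps only universities present in both lists, re-ranks them within that common set, and assumes no tied positions.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation for two ordered lists without ties.

    Only names that appear in both lists are compared.
    """
    in_b = set(ranking_b)
    common = [u for u in ranking_a if u in in_b]
    n = len(common)
    if n < 2:
        raise ValueError("need at least two universities in common")
    # Position of each common university within each list, best-ranked first.
    pos_a = {u: i for i, u in enumerate(common)}
    pos_b = {u: i for i, u in enumerate(u for u in ranking_b if u in pos_a)}
    squared_diffs = sum((pos_a[u] - pos_b[u]) ** 2 for u in common)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical placeholder lists.
same_order = spearman_rho(["P", "Q", "R"], ["P", "Q", "R"])
opposite_order = spearman_rho(["P", "Q", "R"], ["R", "Q", "P"])
```

Identical orderings give 1 and exactly reversed orderings give -1. A value near 1 says the two rankings largely agree on order, which is a stronger statement than the overlap of the top groups.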
Wikiometrics uses three Wikipedia features to compute its rankings: links, page views and Infobox metadata. The authors apply their methodology to the ranking of both universities and journals but we will only cover universities.
- Links: similar to the incoming links above but with a different methodology
- Overall page views during a fixed time period
- Infobox data to identify universities, faculty and alumni. It counts all people affiliated with a university, whether or not their affiliations are “scholarly”. See Figure 17.1 for a sample of an institution and an individual Infobox.
Table 17.3 compares Asian universities in Wikiometrics to the other databases. The overlap for top 20 is 65% for THE and ARWU and 50% with Webometrics. Data are presented as extracted, without controls for size or subject.
The Google Scholar Citations (GSC) ranking from Ranking Web of Universities is “an experiment for testing the suitability of including GSC data in the Rankings Web” with “many shortcomings in its first iteration.”
1. Only institutional profiles are chosen
Google creates institutional profiles based on faculty setting up their own Google profiles, using a “normalized name” for themselves and their universities and using the institution’s email address. I looked at my own Google Scholar profile, which lists University of Hawaii as my affiliation, with an Emory University email. The list of “articles” is mostly (I have never known how much is “mostly”) accurate, with missing articles (including articles from ACCESS), non-articles and repeat entries, but it does not matter because,
2. Only data from the top 10 public profiles of each university are collected. This is to allow for a size independent comparison. According to Webometrics, more profiles will probably be added.
3. The top author is excluded, leaving 9 profiles. If an author has multiple profiles, the database includes only the “best”.
4. The “figures are valid only at the time of collection.”
5. Since Google metrics are derived automatically by algorithm, individual authors are responsible for setting up their profiles and keeping them up to date. It is the responsibility of the institutions to monitor the list to prevent fake entries. Checking a university with which I am familiar, one of the top faculty is no longer there and the Asian faculty members are inconsistent in how they present their names in the profile, using either western or eastern protocol.
The display includes rank, institution, country and number of citations.
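The selection rules above can be sketched in a few lines of Python. Webometrics describes which profiles are kept; that the remaining citation counts are simply added together is my assumption, and the function names and sample figures are hypothetical.

```python
def institution_score(profile_citations, top_n=10, skip_top=1):
    """Keep the top_n most-cited profiles, drop the single top one, add up the rest.

    Summing the remaining counts is an assumption, not a documented rule.
    """
    top_profiles = sorted(profile_citations, reverse=True)[:top_n]
    return sum(top_profiles[skip_top:])


def rank_institutions(citations_by_institution):
    """Return (institution, score) pairs, highest score first."""
    scores = {
        name: institution_score(counts)
        for name, counts in citations_by_institution.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical citation counts per public profile.
sample = {
    "University One": [100, 50, 40],
    "University Two": [500, 10, 5],
}
ranked = rank_institutions(sample)
```

In this made-up example University Two has the single most-cited author, yet University One ranks first because the top profile at each institution is excluded. That illustrates why the rule reduces the effect of one exceptional individual.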
While accounting for size, there is no normalization by subject or fractionalized count for multiple authors. There is no mention of date range. Seven of the top ten universities are from the United States and the other three from the UK. The first Asian university is National University of Singapore coming in at 40 and the first continental European university is Lund (Sweden) at 51. See Table 17.4 for a list of top Asian universities compared with the two commercial citation sources, WOS and SCOPUS. According to Isidro F. Aguillo “we have a lot of GSC based rankings: by countries, by people with h>100 and of course this one experimental for institutions. There will be more editions, probably on a regular basis.”
These rankings, along with the other alternative metrics, encourage us to think about the purpose of rankings and what is important to whom. Wikipedia rankings do not claim to measure scholarly output. Rather, the authors talk about influence, popularity and importance, as measured by the crowd rather than the scholars. The Wikipedia-based rankings do not filter by subject or size. The result is more inclusion of universities in social sciences and also older universities to which others link for historical information. WRWU is appealing because it incorporates universities from 24 language versions of Wikipedia. Google Scholar is always popular because it has more citations. However, it depends on institutions and authors providing stable, disambiguated entries for themselves. The authors of the two articles are not sure they will update their results. Webometrics acknowledged that it will continue testing Google Scholar citations and other Scholar metrics on its existing Ranking Web of Universities platform. Comparing the three, we see that the world universities at the very top do not change based on methodology. Are scholarly output and Nobel Prize winners (as in ARWU, for example) or inclusion of international students and faculty (THE) the only ways to define the very best? Do we move with the times and integrate the input of the crowd and consider influence and popularity as a measurement of importance in the real world?
See Appendix 17.1 for comparison of the three rankings, including a comparison with reputation rankings.
For background information on rankings discussed in this article see:
Ruth’s Rankings 5 – Times Higher Education
Ruth’s Rankings 6 – ARWU
Ruth’s Rankings 9 – Webometrics
Ruth’s Rankings 10 – InCites
Ruth’s Rankings 14 – SciVal
Aguillo, Isidro F., email received 3 February 2016
Katz, G. email received 13 January 2016, including a copy of the dataset
Lages, J. email received 24 January 2016
Hughes, R.M. (1925). A study of the graduate schools of America. Oxford, Ohio. Ranking republished in Magoun, H.W. (1966). The Cartter Report on quality in graduate education: Institutional and divisional standings compiled from the report. Journal of Higher Education, 37 (9) pg. 484.
Katz, G and Rokach, L. (8 Jan 2016). Wikiometrics: A Wikipedia based ranking system. Accessed 28 Jan 2016 at http://arxiv.org/abs/1601.01058
Lages, J., Patt, A. and Shepelyansky, D.L. (29 Nov 2015). Wikipedia ranking of world universities. arXiv (to be published in European Physical Journal). Accessed 28 Jan 2016 at http://arxiv.org/abs/1511.09021
MIT Technology Review (7 Dec 2015). Wikipedia-Mining Algorithm Reveals World’s Most Influential Universities. http://www.technologyreview.com/view/544266/wikipedia-mining-algorithm-reveals-worlds-most-influential-universities/
Ranking Web of Universities (Dec 2015). Top 2000 Universities by Google Citations. First edition, afhb.com accessed 22 January, 2016 at http://www.webometrics.info/en/node/169
A list of Ruth’s Rankings is here.
*Ruth A. Pagell is currently an adjunct faculty [teaching] in the Library and Information Science Program at the University of Hawaii. Before joining UH, she was the founding librarian of the Li Ka Shing Library at Singapore Management University. She has written and spoken extensively on various aspects of librarianship, including contributing articles to ACCESS – https://orcid.org/0000-0003-3238-9674