Mining 50 years of astronomy and astrophysics publications data

by Lucía Santamaría

A goal of Task 2 of the Gender Gap project is the understanding of publication patterns in diverse research fields and across countries and regions. To realize this objective and to be able to extract discipline-specific conclusions from bibliographic data, it is crucial to have access to high-quality, curated, comprehensive bibliographic collections on the fields of interest.

Without any doubt, the Astrophysics Data System (ADS) is the reference database in astronomy and astrophysics worldwide. Established in the early 1990s, it was funded by NASA and is developed and managed by the Harvard-Smithsonian Center for Astrophysics. What started as a mere listing of papers containing astronomical references is now a powerful online database that indexes peer-reviewed and non-peer-reviewed publications on astronomy and astrophysics, including planetary sciences and solar physics; physics and geophysics; as well as preprints published on the arXiv. Of those three collections, the astronomy one is by far the most advanced and its use accounts for about 85% of the total ADS usage.

The astronomy and astrophysics ADS database contains essentially all relevant publications in those fields, with complete coverage back to Volume 1 of most journals. An overview of the indexed and scanned sources is given here.

We have queried the ADS database and retrieved about 900,000 ADS records from peer-reviewed sources from the astronomy and astrophysics database dating back to 1970. This data set, together with its ongoing updates, will be the basis for our data-backed analysis of publication trends in the field of astronomy and astrophysics, one of the research areas of interest for the Gender Gap in Science project. Here we offer a sneak peek of the data for the interested reader.
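For readers who want to try a similar retrieval, here is a minimal sketch of how such a query could be put together. It is not the script used for this post. It assumes the public ADS search API and its `q`, `fl`, `rows` and `start` parameters; the function name, the chosen fields and the page size are illustrative assumptions, and sending a real request additionally requires a personal API token.

```python
# Assumed endpoint of the public ADS search API (not stated in this post).
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_ads_query(start_year, end_year, rows=200, start=0):
    """Return the query parameters for one page of refereed astronomy records.

    Hypothetical helper: the name and the defaults are illustrative only.
    """
    # Restrict to the astronomy collection, refereed sources, and a year range.
    query = f"database:astronomy property:refereed year:{start_year}-{end_year}"
    return {
        "q": query,
        # Fields to return: identifier, year, journal, authors, affiliations.
        "fl": "bibcode,year,pub,author,aff",
        "rows": rows,    # page size
        "start": start,  # offset used to page through the result set
    }
```

The returned dictionary can be passed as the query string of a GET request to `ADS_SEARCH_URL`, increasing `start` by `rows` until all records have been fetched.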

As expected, the number of records from refereed sources in astronomy and astrophysics indexed by ADS has been growing since the 1970s, although it seems to have stabilized over the last decade. It currently amounts to roughly 25,000 publications per year.


Another easily identifiable trend is the increasing number of authors per paper, as shown below. While in the 1970s about half of all papers were single-authored and over 90% of them had been written by at most three people, publications with one author now represent less than 15% of the total, and about half of all papers have four authors or more. Astronomy and astrophysics have followed the same route as other “big science” fields such as high-energy physics, and nowadays articles from collaborations with more than a thousand authors are not unheard of in ADS. This trend poses new challenges for the analysis of scientific output and the meaning of academic credit and contributions.
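The shares quoted above come down to counting authors per record and grouping the counts. A minimal sketch of that computation follows, assuming each record is a dictionary with an `author` list; the function name and the three buckets are illustrative choices, not the exact code behind the figures in this post.

```python
from collections import Counter


def author_count_shares(records):
    """Return the fraction of papers with 1, 2-3, and 4 or more authors."""
    buckets = Counter()
    for rec in records:
        n_authors = len(rec.get("author", []))
        if n_authors == 0:
            continue  # skip records without author information
        if n_authors == 1:
            buckets["1"] += 1
        elif n_authors <= 3:
            buckets["2-3"] += 1
        else:
            buckets["4+"] += 1
    total = sum(buckets.values())
    if total == 0:
        return {}
    return {key: buckets[key] / total for key in ("1", "2-3", "4+")}
```

Running this once per publication year, or per decade, would give the time evolution of single-author and multi-author shares.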


A good proxy for identifying high-quality astronomy and astrophysics research is the academic outlet in which articles are published. A look at the journals ranked by number of publications gives us a good overview of the most popular ones. Below we see that, in addition to traditional astronomy journals, the database also contains periodicals with a broader focus. Generally speaking, the most relevant “astro-journals” are “The Astrophysical Journal”, “Astronomy and Astrophysics”, “Monthly Notices of the Royal Astronomical Society”, “The Astronomical Journal”, “Publications of the Astronomical Society of the Pacific”, and “Publications of the Astronomical Society of Japan”. Of all peer-reviewed publications indexed in the ADS astronomy and astrophysics database, almost 30% belong to those six journals. This characteristic tradition of publishing in a relatively small set of venues helps tremendously to narrow down who is who within the astronomical scientific community, and will be relevant for our further analyses.
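The share of publications belonging to the six journals can be computed with a simple tally. The sketch below assumes each record carries its journal name in a `pub` field and that names match the spellings listed above exactly; real data would likely need name normalization first. The function and constant names are illustrative.

```python
from collections import Counter

# The six journals named in the text above.
CORE_JOURNALS = {
    "The Astrophysical Journal",
    "Astronomy and Astrophysics",
    "Monthly Notices of the Royal Astronomical Society",
    "The Astronomical Journal",
    "Publications of the Astronomical Society of the Pacific",
    "Publications of the Astronomical Society of Japan",
}


def core_journal_share(records, core=CORE_JOURNALS):
    """Return the fraction of records published in the core journals."""
    # Count records per journal, ignoring records with no journal name.
    counts = Counter(rec["pub"] for rec in records if rec.get("pub"))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_core = sum(n for journal, n in counts.items() if journal in core)
    return in_core / total
```

The intermediate `counts` object also yields the ranking of journals by number of publications through `counts.most_common()`.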


A fundamental task that we intend to tackle on this data set is the disambiguation of authors based on their names. This is not an easy task, as explained here. The availability of the authors’ affiliations is thus a powerful feature that can be used to increase the confidence in a correct author identification. Additionally, affiliations will help us realize our goal of analyzing geographical regions and countries. Fortunately, the ADS data contains author names plus their affiliations for almost 80% of all authorship instances.
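An "authorship instance" here means one author on one paper, so the coverage figure is a ratio over author slots rather than over papers. A minimal sketch of how it could be measured follows, assuming each record has parallel `author` and `aff` lists and that a missing affiliation appears as an empty string or a placeholder such as `"-"`; both the field layout and the placeholder are assumptions, not details confirmed in this post.

```python
def affiliation_coverage(records, missing="-"):
    """Return the fraction of authorship instances with a known affiliation."""
    total = 0
    with_affiliation = 0
    for rec in records:
        authors = rec.get("author", [])
        affiliations = rec.get("aff", [])
        for position in range(len(authors)):
            total += 1
            # Treat a too-short affiliation list as missing entries.
            if position < len(affiliations):
                aff = affiliations[position].strip()
            else:
                aff = ""
            if aff and aff != missing:
                with_affiliation += 1
    return with_affiliation / total if total else 0.0
```

The same loop is a natural place to collect (name, affiliation) pairs for later use in author disambiguation.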

Over the next months we will delve into the ADS data to extract the most meaningful insights from almost 50 years of astronomy and astrophysics publications. Stay tuned!