Finding high-quality articles from PLOS
Today I had to find a great research article for journal club. I wanted something influential and topical. PLOS made me happy by posting article-level statistics as an Excel file. I analyzed the data and produced a list of articles that were cited more than expected. They look interesting, so I am sharing them here. Enjoy!
PLOS papers published before 2009 with higher-than-expected citations
- Projections of Global Mortality and Burden of Disease from 2002 to 2030. 11/28/2006
- Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration. 2/26/2008
- Acquired Resistance of Lung Adenocarcinomas to Gefitinib or Erlotinib Is Associated with a Second Mutation in the EGFR Kinase Domain. 2/22/2005
- Human MicroRNA Targets. 10/5/2004
- Why Most Published Research Findings Are False. 8/30/2005
- KRAS Mutations and Primary Resistance of Lung Adenocarcinomas to Gefitinib or Erlotinib. 1/25/2005
- Mapping the Structural Core of Human Cerebral Cortex. 7/1/2008
- Randomized, Controlled Intervention Trial of Male Circumcision for Reduction of HIV Infection Risk: The ANRS 1265 Trial. 10/25/2005
- Senescence-Associated Secretory Phenotypes Reveal Cell-Nonautonomous Functions of Oncogenic RAS and the p53 Tumor Suppressor. 12/2/2008
- The Transcriptome of the Intraerythrocytic Developmental Cycle of Plasmodium falciparum. 8/18/2003
I did not know whether to use page views or citations as a measure of influence. When I plotted them against each other, it became clear that lots of citations implied lots of page views, but lots of page views did not imply lots of citations. I wanted articles that had both, so I used citations as my measure.
Articles are cited more over time, but I did not want to bias toward older articles. So I determined the best-fit line for number of citations versus publication date and then used that line to calculate the expected number of citations for a given date.
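Here is a minimal sketch of that step in Python, assuming an ordinary least-squares line and dates converted to day numbers. The `fit_line` and `expected_citations` helpers and the sample numbers are made up for illustration; they are not the PLOS data or the exact tool I used.

```python
from datetime import date


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (publication date, citation count) pairs, for illustration only.
articles = [
    (date(2004, 1, 1), 40),
    (date(2005, 1, 1), 30),
    (date(2006, 1, 1), 20),
    (date(2007, 1, 1), 10),
]

# Convert each date to a day number so it can be used as the x value.
days = [pub_date.toordinal() for pub_date, _ in articles]
cites = [count for _, count in articles]
slope, intercept = fit_line(days, cites)


def expected_citations(pub_date):
    """Citations predicted by the fitted line for a given publication date."""
    return slope * pub_date.toordinal() + intercept
```

With real data, the slope should come out negative, since older articles have had more time to collect citations.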
Last, I divided the observed number of citations by the expected number of citations for a given date. Unfortunately, the newest papers in the data set all appeared at the top of the list. This is because recent papers have a low expected number of citations, so even a few citations produce a large ratio and noise is amplified. To correct for this, I eliminated all of the papers that appeared after 2008.
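The ratio and the cutoff can be sketched like this. The records, the `rank_by_ratio` helper, and its `last_year` parameter are hypothetical and only meant to show the effect; the expected values are made up rather than taken from a real fit.

```python
from datetime import date

# Hypothetical records:
# (title, publication date, observed citations, expected citations).
papers = [
    ("Older paper A", date(2005, 3, 1), 120, 40.0),
    ("Older paper B", date(2006, 6, 1), 30, 30.0),
    ("Recent paper C", date(2009, 9, 1), 3, 0.5),
]


def rank_by_ratio(records, last_year=None):
    """Sort papers by observed / expected citations, highest first.

    If last_year is given, papers published after that year are dropped.
    Assumes every expected value is positive.
    """
    kept = [r for r in records if last_year is None or r[1].year <= last_year]
    scored = [(title, observed / expected) for title, _, observed, expected in kept]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


unfiltered = rank_by_ratio(papers)                # the recent paper lands on top
filtered = rank_by_ratio(papers, last_year=2008)  # only papers through 2008
```

In this toy example the recent paper has just 3 citations, but its expected count of 0.5 gives it a ratio of 6, which beats a paper with 120 citations. Dropping everything after 2008 removes that artifact.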