Most of my analyses of NIH data have been done using Excel. While Excel does have some useful features, it has many limitations. My son, who as an actuary does considerable data analysis for a living, urged me to migrate to a more powerful platform, R, for my analyses. He can be quite convincing, and over the past month I have spent time developing some rudimentary R skills (in part through an online course). I am now fully convinced that he was right.
I downloaded all of the data used by NIH RePORTER (from NIH ExPORTER) and wrote R scripts to parse the data into forms that could be easily analyzed in R. The full file has 1,907,841 grant records with readable contact PI numbers for fiscal years 1985 to 2014. These correspond to 216,521 unique contact PIs.
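The core counting step (one contact PI per grant record, then the number of distinct contact PIs in each fiscal year) can be sketched as below. This is a minimal Python illustration, not the R scripts described above. It assumes each record has already been reduced to a hypothetical (fiscal year, PI ID string) pair, that multi-PI awards mark the contact PI with "(contact)" in a semicolon-separated list, and that a "readable" contact PI number means a purely numeric ID; the function names are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def contact_pi(pi_ids: str) -> Optional[str]:
    """Return the contact PI ID from a semicolon-separated ID field.

    Assumed (hypothetical) format: "1234567 (contact); 7654321;" for
    multi-PI awards, or "1234567;" for single-PI awards.
    """
    ids = [p.strip() for p in pi_ids.split(";") if p.strip()]
    if not ids:
        return None
    for p in ids:
        if "(contact)" in p:
            return p.replace("(contact)", "").strip()
    # No explicit contact marker: fall back to the first listed ID (assumption).
    return ids[0]


def unique_contact_pis_by_year(records: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Count distinct contact PIs per fiscal year.

    A PI holding several grants in the same year is counted once for that year.
    """
    pis_by_year: Dict[int, Set[str]] = defaultdict(set)
    for fiscal_year, pi_ids in records:
        pi = contact_pi(pi_ids)
        # Keep only readable (numeric) contact PI IDs.
        if pi is not None and pi.isdigit():
            pis_by_year[fiscal_year].add(pi)
    return {fy: len(pis) for fy, pis in sorted(pis_by_year.items())}


if __name__ == "__main__":
    # Toy records for illustration only, not real ExPORTER data.
    sample = [
        (2013, "1001;"),
        (2013, "1001;"),                  # same PI, second grant
        (2013, "1002 (contact); 1003;"),  # multi-PI award
        (2014, "1002;"),
        (2014, "unknown;"),               # unreadable ID, dropped
    ]
    print(unique_contact_pis_by_year(sample))  # {2013: 2, 2014: 1}
```

Using a set per year is what makes the count "unique": repeated grants to the same contact PI collapse to one entry, so the resulting year-to-count mapping is exactly what would be plotted against fiscal year.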
As an initial exercise with these data, I decided to plot the number of unique contact PIs as a function of fiscal year. The result is shown below:
What I attempted as a test of my data analysis skills revealed a striking result. The number of unique contact PIs grew almost linearly from 1985 to about 2009-2010 (the ARRA years) but then dropped quite sharply from 2010 to 2014. This graph provides much clearer evidence for "the cull" than I had anticipated.
Despite this clear bottom line, considerable work remains to probe this further, since these data include a wide variety of grant mechanisms. With the powerful file manipulation and analysis tools in R, this should be relatively straightforward.
Let the analysis begin!