The Demise of 38,000 NIH-funded Investigators

In my first post using R to analyze NIH data, I examined the number of unique investigators funded by NIH per year as a function of time. The definition of "unique PIs" was based on the number of unique "Contact PI Person ID" numbers in the NIH RePORT database from 1985 to 2014. Overall, this number was 216,521.

As I prepared my data set for further analysis, I discovered that some investigators had more than one Contact PI Person ID number. I have spent the past 2 months trying to sort this out and I am still not done. One investigator in the intramural program has well over 100 ID numbers over time! Getting this sorted out is crucial for future analyses, particularly the longitudinal ones that are so important. Otherwise, an investigator might appear to have a gap or termination in funding just because their ID number changed.
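To make the longitudinal problem concrete, here is a minimal sketch of one way to collapse several ID numbers onto a single investigator before looking at funding histories. It is written in Python rather than R, and it is not the procedure actually used for this cleanup. The function names, the `(id, year)` record layout, and the list of ID pairs judged to belong to the same person are all hypothetical; deciding which IDs really belong together is the hard, manual part.

```python
def build_canonical_map(alias_pairs):
    """Map every ID to one representative ID per person (union-find).

    alias_pairs: iterable of (id_a, id_b) tuples that have been judged,
    by whatever means, to belong to the same investigator (hypothetical input).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in alias_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as the representative so results are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {x: find(x) for x in list(parent)}


def funded_years(records, canonical):
    """Group funding years by investigator after collapsing alias IDs.

    records: iterable of (person_id, fiscal_year) tuples (hypothetical layout).
    """
    years = {}
    for person_id, year in records:
        key = canonical.get(person_id, person_id)
        years.setdefault(key, set()).add(year)
    return years


# Made-up example: IDs 101 and 202 are the same person whose ID changed in 2003.
records = [(101, 2001), (101, 2002), (202, 2003), (202, 2004), (303, 2003)]
canonical = build_canonical_map([(101, 202)])
history = funded_years(records, canonical)
```

Without the mapping, ID 101 would look like an investigator whose funding ended in 2002 and ID 202 like a new investigator in 2003. With it, `history[101]` covers 2001 through 2004 with no gap.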

In addition, there are problems in the other direction, with multiple names associated with one ID number. A very small number of these appear to be cases where different people have been assigned the same ID number. Most are related to non-uniformity in how names are entered (e.g., with or without a middle initial, with or without a period after the middle initial). Some are worth having captured, such as PI name changes associated with changes in marital status.
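The data-entry variations described above (a middle initial present or absent, with or without a period) can be screened for mechanically. The following is a rough sketch in Python rather than R, and it is not the method used for this analysis. It assumes names are stored as "LAST, FIRST MIDDLE"; the helper names `parse_name` and `same_person_likely` are made up for illustration.

```python
import re
from typing import NamedTuple


class NameKey(NamedTuple):
    last: str
    first: str
    middle_initial: str  # "" when absent or recorded as NMN


def parse_name(raw: str) -> NameKey:
    """Split a 'LAST, FIRST MIDDLE' string (assumed format) into comparable parts."""
    # Uppercase, then turn periods and runs of whitespace into single spaces.
    cleaned = re.sub(r"[.\s]+", " ", raw.upper()).strip()
    last, _, rest = cleaned.partition(",")
    # Drop the NMN ("No Middle Name") placeholder.
    tokens = [t for t in rest.split() if t != "NMN"]
    first = tokens[0] if tokens else ""
    middle = tokens[1][0] if len(tokens) > 1 else ""
    return NameKey(last.strip(), first, middle)


def same_person_likely(a: str, b: str) -> bool:
    """True when two spellings differ only in how the middle initial was entered."""
    ka, kb = parse_name(a), parse_name(b)
    if (ka.last, ka.first) != (kb.last, kb.first):
        return False
    # A missing middle initial is compatible with any initial; two different ones are not.
    return (
        not ka.middle_initial
        or not kb.middle_initial
        or ka.middle_initial == kb.middle_initial
    )
```

Under these assumptions, "SMITH, JOHN Q.", "Smith, John Q" and "SMITH, JOHN NMN" would all be treated as compatible spellings, while "SMITH, JOHN Q" and "SMITH, JOHN R" would be flagged for a closer look. A changed last name, as with a change in marital status, is deliberately not matched here and would still need to be reviewed by hand.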

At this point, I am down to 178,122 unique ID numbers and I expect this number to fall further. While this has been a great exercise in learning R as well as in examining creative practices in data entry (I did not previously know that NMN would be entered in some cases where an individual gives No Middle Name), I am ready to finish up this stage and get on with more interesting analyses. But with "data science," as with other types of science, time spent checking the validity of raw data before other analyses are done is time well spent.

First Outstanding Investigator (R35) Awards from NCI

The R35 is emerging at NIH as a mechanism for providing more stable support (i.e., longer-term and for research programs rather than projects) for selected investigators. The first R35 program out of the gate was the NCI Outstanding Investigator Award, followed by the NIGMS MIRA Award. NINDS has also recently announced an R35 program of its own.

The first 17 R35 awards from NCI appeared in NIH RePORTER recently. These investigators cover the NCI mission fairly well (biology, genomics, surveillance, prevention including behavior, treatment). They also have a wide range of funding, with core support for FY2014 ranging from $230 K to $5.8 M in annual total costs and a median of approximately $700 K. These values are somewhat subject to judgment, since considerable support comes from P30 Cancer Center grants and program project grants (P01s); I tried to keep my estimates on the low side. The investigators are relatively diverse with regard to age, with estimated ages ranging from 41 to 74 and an estimated median age of 56. The initial group includes 13 men and 4 women.

More awards are appearing in RePORTER; 4 additional awards have appeared since I did this initial analysis, so expect updates.

