The ProliferativeIndex R package [1] provides users with R functions for calculating and analyzing the proliferative index (PI) from an RNA-seq dataset.
The PI was adapted from Venet et al. [2]:
“The proliferating cell nuclear antigen, PCNA, is a ring-shaped protein that encircles DNA and regulates several processes leading to DNA replication. As suggested by its name, this is one of the most widely used antigen target for immunohistochemical measures of the fraction of proliferating cells in tissues. Ge et al. profiled with microarrays 36 tissues from normal, healthy individuals encompassing 27 organs. We call ‘meta-PCNA’ the signature composed of the 1% genes the most positively correlated with PCNA expression across these 36 tissues. In plain language, meta-PCNA genes are consistently expressed when PCNA is expressed in normal tissues and consistently repressed when PCNA is repressed. We define the meta-PCNA index as the median expression of meta-PCNA genes.”
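A minimal base-R sketch of the index definition in the quote above, assuming a toy genes-by-samples matrix and a hypothetical three-gene signature (GENE_A, GENE_B and GENE_C are placeholders, not meta-PCNA genes). For each sample it takes the median expression of the signature genes. It illustrates the definition only and is not the package's code.

```r
# Toy expression matrix: rows are genes, columns are samples (values are made up).
expr <- matrix(c(5, 6, 7,
                 8, 9, 10,
                 2, 4, 6,
                 1, 1, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                               c("S1", "S2", "S3")))

# Hypothetical signature; GENE_D is deliberately not part of it.
signature <- c("GENE_A", "GENE_B", "GENE_C")

# Index for each sample = median expression of the signature genes in that sample.
medianSignatureIndex <- function(expr, signature) {
  genes <- intersect(signature, rownames(expr))
  apply(expr[genes, , drop = FALSE], 2, median)
}

toyIndex <- medianSignatureIndex(expr, signature)
toyIndex  # c(S1 = 5, S2 = 6, S3 = 7)
```

Genes outside the signature (here GENE_D) have no influence on the index, however they are expressed.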
IMPORTANT: Proliferative indices are interpretable only relative to other PIs, for example, a higher or lower PI in tumors compared with normal tissues, or in post-mitotic tissues compared with tissues that have high rates of cell turnover. Additionally, the PI measures proliferation-associated gene expression (as described above), not necessarily proliferation itself.
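A small base-R illustration of why a PI is meaningful only relative to other PIs, assuming (hypothetically) that the same toy data end up on scales that differ by a constant offset. Because the median of shifted values is the shifted median, the absolute values change while the differences and ranks between samples do not.

```r
# Toy genes-by-samples matrix (made-up values).
expr <- matrix(c(5, 6, 7,
                 8, 9, 10,
                 2, 4, 6),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                               c("S1", "S2", "S3")))

# Median-based index per sample on the original scale and on a shifted scale
# (the shift stands in for any processing step that moves all values equally).
indexA <- apply(expr, 2, median)
indexB <- apply(expr + 1.25, 2, median)

indexA                                   # c(S1 = 5, S2 = 6, S3 = 7)
indexB                                   # c(S1 = 6.25, S2 = 7.25, S3 = 8.25)
all.equal(indexB - indexA, c(S1 = 1.25, S2 = 1.25, S3 = 1.25))  # TRUE
identical(rank(indexA), rank(indexB))    # TRUE
```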
ProliferativeIndex contains the following functions:

- readDataForPI: Read in user data for use with package functions
- calculatePI: Calculate PI for user data
- comparePI: Compare PI across the user's dataset
- compareModeltoPI: Compare PI to model principal components (PCs)

Included with ProliferativeIndex, specifically for use with this vignette, is data from The Cancer Genome Atlas (TCGA) Adrenocortical Carcinoma (ACC) dataset [3].
After first loading the ProliferativeIndex library, this dataset, vstTCGA_ACCData_sub, can be accessed from the package:
```r
library(ProliferativeIndex)
data(vstTCGA_ACCData_sub)
# The dataset is large (20501 genes x 10 samples), so examine only the first few rows and columns:
dim(vstTCGA_ACCData_sub)
## [1] 20501 10
# Sample IDs are column names, HGNC gene IDs (http://www.genenames.org) are row names, and the vst data are numeric.
str(vstTCGA_ACCData_sub)
## 'data.frame': 20501 obs. of 10 variables:
## $ TCGA.OR.A5J1: num 5.87 4.19 5.92 8.43 6.99 ...
## $ TCGA.OR.A5J2: num 5.49 4.19 5.2 8.74 4.19 ...
## $ TCGA.OR.A5J3: num 6.04 4.52 5.44 8.04 4.76 ...
## $ TCGA.OR.A5J5: num 11.4 4.71 5.22 7.08 6.8 ...
## $ TCGA.OR.A5J6: num 10.07 4.19 5.11 8.8 4.66 ...
## $ TCGA.OR.A5J7: num 5.57 4.19 4.96 7.52 4.91 ...
## $ TCGA.OR.A5J8: num 6.86 4.19 4.19 6.91 5.1 ...
## $ TCGA.OR.A5J9: num 5.4 4.19 6.46 8.94 6.34 ...
## $ TCGA.OR.A5JA: num 6.8 4.19 5.25 8.77 6.36 ...
## $ TCGA.OR.A5JB: num 8.53 4.19 4.19 6.84 4.19 ...
```
| | TCGA.OR.A5J1 | TCGA.OR.A5J2 | TCGA.OR.A5J3 | TCGA.OR.A5J5 | TCGA.OR.A5J6 |
---|---|---|---|---|---|
A1BG | 5.871339 | 5.490145 | 6.036080 | 11.397348 | 10.065106 |
A1CF | 4.190503 | 4.190503 | 4.523434 | 4.713955 | 4.190503 |
A2BP1 | 5.915039 | 5.196520 | 5.443088 | 5.221104 | 5.112238 |
A2LD1 | 8.431843 | 8.741279 | 8.043286 | 7.075708 | 8.798831 |
A2ML1 | 6.986670 | 4.190503 | 4.764641 | 6.798125 | 4.657211 |
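Before using your own data, it may help to confirm it has the same layout as vstTCGA_ACCData_sub. The helper below, checkVstLayout, is a hypothetical base-R sketch (not part of ProliferativeIndex) that reports whether a data frame has numeric columns and non-default row names, plus its dimensions. The toy values are copied from the table above.

```r
# Hypothetical helper (not part of ProliferativeIndex): reports whether a data frame
# has genes as row names, samples as columns, and numeric values.
checkVstLayout <- function(df) {
  stopifnot(is.data.frame(df))
  allNumeric <- all(vapply(df, is.numeric, logical(1)))
  # A data frame without explicit row names reports "1", "2", ... as its row names.
  geneRowNames <- !identical(rownames(df), as.character(seq_len(nrow(df))))
  list(allNumeric = allNumeric, geneRowNames = geneRowNames,
       nGenes = nrow(df), nSamples = ncol(df))
}

# Toy data frame built from the first two genes and samples in the table above.
toy <- data.frame(TCGA.OR.A5J1 = c(5.871339, 4.190503),
                  TCGA.OR.A5J2 = c(5.490145, 4.190503),
                  row.names = c("A1BG", "A1CF"))
str(checkVstLayout(toy))
```

With the package loaded, checkVstLayout(vstTCGA_ACCData_sub) should report 20501 genes and 10 samples, matching the dim() output above.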
Functions in the ProliferativeIndex package come with help pages that can be accessed as usual (for example, ?readDataForPI).
The function readDataForPI reads in the user's data for use with the ProliferativeIndex package:
```r
# Inputs are the user's vst data frame and a model of interest (a character vector of gene IDs) for examining PI:
exampleTCGAData <- readDataForPI(vstTCGA_ACCData_sub,
                                 c("AIFM3", "ATP9B", "CTRC", "MCL1",
                                   "MGAT4B", "ODF2L", "SNORA65", "TPPP2"))
# The output is a list of two objects:
# exampleTCGAData$vstData is the user's vst data frame, and exampleTCGAData$modelIDs is a
# character vector of the user's gene IDs for their model of interest.
str(exampleTCGAData)
## List of 2
## $ vstData :'data.frame': 20501 obs. of 10 variables:
## ..$ TCGA.OR.A5J1: num [1:20501] 5.87 4.19 5.92 8.43 6.99 ...
## ..$ TCGA.OR.A5J2: num [1:20501] 5.49 4.19 5.2 8.74 4.19 ...
## ..$ TCGA.OR.A5J3: num [1:20501] 6.04 4.52 5.44 8.04 4.76 ...
## ..$ TCGA.OR.A5J5: num [1:20501] 11.4 4.71 5.22 7.08 6.8 ...
## ..$ TCGA.OR.A5J6: num [1:20501] 10.07 4.19 5.11 8.8 4.66 ...
## ..$ TCGA.OR.A5J7: num [1:20501] 5.57 4.19 4.96 7.52 4.91 ...
## ..$ TCGA.OR.A5J8: num [1:20501] 6.86 4.19 4.19 6.91 5.1 ...
## ..$ TCGA.OR.A5J9: num [1:20501] 5.4 4.19 6.46 8.94 6.34 ...
## ..$ TCGA.OR.A5JA: num [1:20501] 6.8 4.19 5.25 8.77 6.36 ...
## ..$ TCGA.OR.A5JB: num [1:20501] 8.53 4.19 4.19 6.84 4.19 ...
## $ modelIDs: chr [1:8] "AIFM3" "ATP9B" "CTRC" "MCL1" ...
```
*Note: the R package includes a data object, ‘exReadDataObj’, containing the output of readDataForPI for comparison.
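A quick check that may be useful before going further is whether every model gene ID actually appears among the row names of the vst data. missingModelIDs below is a hypothetical base-R helper (not a ProliferativeIndex function); the toy input mimics the two-element list shape shown by str() above, with made-up expression values.

```r
# Hypothetical helper (not a ProliferativeIndex function): reports how many model
# gene IDs appear among the row names of a vst data frame and returns the missing ones.
missingModelIDs <- function(vstData, modelIDs) {
  found <- modelIDs %in% rownames(vstData)
  message(sprintf("%d/%d model genes found in vstData", sum(found), length(modelIDs)))
  modelIDs[!found]
}

# Toy input with the same two elements as the str() output above (made-up values).
toyInput <- list(
  vstData = data.frame(S1 = c(5.9, 4.2, 6.1), S2 = c(5.5, 4.2, 5.8),
                       row.names = c("AIFM3", "MCL1", "TPPP2")),
  modelIDs = c("AIFM3", "ATP9B", "MCL1")
)

absent <- missingModelIDs(toyInput$vstData, toyInput$modelIDs)
absent  # "ATP9B"
```

On the vignette data, the equivalent call would be missingModelIDs(exampleTCGAData$vstData, exampleTCGAData$modelIDs).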
The function calculatePI calculates the PI for every sample in the user's vst data frame using a list of PCNA-associated genes collected from Venet et al. (including alternative gene names).
*Note: the function prints to the screen how many of the genes used to calculate the PI were found in vstData:

```r
## [1] "vstData contained 131/131 of the PI-associated genes"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.454 8.480 9.220 9.246 10.016 10.556
```
*Note: the R package includes a data object, ‘exVSTPI’, containing the output of calculatePI for comparison.
The summary above (minimum, quartiles, median, mean, and maximum) describes the PI values across the samples in the user's dataset.
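The calculatePI step, a median over a signature whose genes may appear under alternative names, can be sketched in base R as follows. Everything here is an assumption for illustration: the signature entries are placeholders (not the package's PCNA-associated gene list), the first matching ID per entry is used, and the printed message only mimics the wording of the output above.

```r
# Hypothetical signature: each entry lists a gene's primary ID followed by
# alternative IDs. Placeholders only, not the package's PCNA-associated gene list.
signature <- list(
  GENE1 = c("GENE1", "GENE1_ALT"),
  GENE2 = "GENE2",
  GENE3 = c("GENE3", "GENE3_ALT")
)

# Toy vst-like data (made-up values); GENE3 is present only under its alternative ID.
vst <- data.frame(S1 = c(8.0, 9.0, 7.0, 5.0),
                  S2 = c(10.0, 11.0, 9.5, 5.0),
                  S3 = c(7.5, 8.5, 6.0, 5.0),
                  row.names = c("GENE1", "GENE2", "GENE3_ALT", "UNRELATED"))

# For each signature entry, keep the first of its IDs found in the data (NA if none).
resolveIDs <- function(signature, ids) {
  vapply(signature, function(alts) {
    hit <- alts[alts %in% ids]
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1))
}

# Median expression of the resolved signature genes per sample, printing how
# many signature entries were found.
medianPI <- function(vst, signature) {
  resolved <- resolveIDs(signature, rownames(vst))
  found <- unname(resolved[!is.na(resolved)])
  print(sprintf("vstData contained %d/%d of the PI-associated genes",
                length(found), length(signature)))
  apply(as.matrix(vst[found, , drop = FALSE]), 2, median)
}

toyPI <- medianPI(vst, signature)  # prints "vstData contained 3/3 of the PI-associated genes"
summary(toyPI)
```

Resolving aliases before taking the median keeps a gene from being silently dropped just because the dataset uses an older symbol for it.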
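The function compareModeltoPI takes the user's data and model identifiers as input and compares the PI to the model's principal components (PCs), reporting for each PC its Spearman correlation with the PI, the corresponding p-value, and the proportion of variance the PC explains: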
| | SpearmanRho | SpearmanPvalue | PCAPropOfVariance |
---|---|---|---|
PC1 | 0.9878788 | 0.0000000 | 0.51527 |
PC2 | 0.0181818 | 0.9728412 | 0.11587 |
PC3 | -0.0909091 | 0.8114170 | 0.07491 |
PC4 | 0.1151515 | 0.7588331 | 0.06558 |
PC5 | -0.1757576 | 0.6319674 | 0.05897 |
PC6 | -0.0424242 | 0.9186333 | 0.05068 |
PC7 | 0.0424242 | 0.9186333 | 0.05002 |
PC8 | -0.0909091 | 0.8114170 | 0.03992 |
PC9 | 0.0424242 | 0.9186333 | 0.02878 |
PC10 | -0.4181818 | 0.2324181 | 0.00000 |
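The kind of comparison reported in the table can be sketched in base R with prcomp and cor.test. This assumes a toy random genes-by-samples matrix, PCA with samples as observations (centered, not scaled), and a hypothetical PI taken as the median of the first 10 genes; which genes and PCA settings compareModeltoPI actually uses are not stated here, so this is an illustration rather than the package's implementation.

```r
set.seed(42)
# Toy data: 50 genes x 10 samples of random values (made up for illustration).
expr <- matrix(rnorm(50 * 10, mean = 8), nrow = 50,
               dimnames = list(paste0("G", 1:50), paste0("S", 1:10)))

# Hypothetical PI: median of the first 10 genes, standing in for a real signature.
toyPI <- apply(expr[1:10, ], 2, median)

# PCA with samples as observations, so transpose to samples x genes.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
propVar <- pca$sdev^2 / sum(pca$sdev^2)

# For each PC, Spearman correlation between its sample scores and the PI.
cmp <- t(sapply(seq_len(ncol(pca$x)), function(k) {
  ct <- cor.test(pca$x[, k], toyPI, method = "spearman")
  c(SpearmanRho = unname(ct$estimate),
    SpearmanPvalue = ct$p.value,
    PCAPropOfVariance = propVar[k])
}))
rownames(cmp) <- colnames(pca$x)
round(cmp, 5)
```

With 10 centered samples the last PC carries essentially no variance (as with PC10 in the table), so its correlation with the PI is not meaningful.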
[1] Ramaker and Lasseigne, et al. bioRxiv, 2016.
[2] Venet, et al. PLoS Computational Biology, 2011 and Ge, et al. Genomics, 2005.
[3] The TCGA ACC dataset was obtained from the TCGA data portal (tcga-data.nci.nih.gov) in June 2015. Level 3 RNASeqV2 raw count data were variance-stabilized with the DESeq2 v1.8.2 ‘varianceStabilizingTransformation’ function.