Set the data directory. Note that you have to use a directory on your own system! Once it is set correctly, ‘Run->all’ in the menu will recompute everything. The call below prints the current working directory so you can check it.

getwd()
## [1] "/export/iwrk/closed/kemri/Francis_Final_TregData_Jan2020/Data"
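The output above only reports the current directory. A minimal sketch of actually setting it could look like the following; the path in `data_dir` is a hypothetical placeholder, not the author's path, and must be replaced with the location of the CSV files on your own system.

```r
# Hypothetical placeholder: replace with the folder that holds the CSV files.
data_dir <- "path/to/your/Data"

if (dir.exists(data_dir)) {
  # Change the working directory so relative file names resolve there.
  setwd(data_dir)
} else {
  # Leave the working directory unchanged and report where we are.
  message("Directory not found, staying in: ", getwd())
}

getwd()
```

Checking `dir.exists()` first avoids the error that `setwd()` raises for a missing directory.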

Read individuals and attributes

Load a table

ind_attr = read.csv("Individual_attributes.csv")
ind_attr[1:3,1:3]
##   SampleID   Phenotype Time_to_diagnosis
## 1  16K0021 Slow growth                21
## 2  16K0053 Slow growth                21
## 3  16K0109 Slow growth                21

Show data structure

summary(ind_attr)
##     SampleID          Phenotype  Time_to_diagnosis      Age         Gender  
##  16K0019: 1   Cleared      :12   Min.   :12.00     Min.   :19.00       :14  
##  16K0021: 1   Highly immune:25   1st Qu.:17.00     1st Qu.:24.00   F   :18  
##  16K0037: 1   Slow growth  :20   Median :21.00     Median :29.00   M   :53  
##  16K0053: 1   Treated      :28   Mean   :18.98     Mean   :29.27   NA's:11  
##  16K0054: 1   NA's         :11   3rd Qu.:21.00     3rd Qu.:33.00            
##  16K0061: 1                      Max.   :21.00     Max.   :43.00            
##  (Other):90                      NA's   :11        NA's   :25               
##      ELISA                  Location        New.phenotype
##  Min.   :    98.5   Ahero       :14   Cleared      :15   
##  1st Qu.:  1598.7   Kilifi North:18   Highly immune:28   
##  Median :  5838.4   Kilifi South:53   Slow growth  :21   
##  Mean   : 23595.7   NA's        :11   Treated      :30   
##  3rd Qu.: 17429.0                     NA's         : 2   
##  Max.   :402114.3                                        
##  NA's   :11
colnames(ind_attr)
## [1] "SampleID"          "Phenotype"         "Time_to_diagnosis"
## [4] "Age"               "Gender"            "ELISA"            
## [7] "Location"          "New.phenotype"
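Besides `summary()` and `colnames()`, base R's `str()` and `dim()` give a compact view of column types and table size. The sketch below runs on a toy data frame rebuilt from the three rows printed earlier (the name `toy_attr` is made up for illustration); on the real data you would call the same functions on `ind_attr`.

```r
# Toy data frame copied from the first three rows shown above.
# stringsAsFactors = TRUE mirrors the factor columns seen in the output.
toy_attr <- data.frame(
  SampleID = c("16K0021", "16K0053", "16K0109"),
  Phenotype = c("Slow growth", "Slow growth", "Slow growth"),
  Time_to_diagnosis = c(21, 21, 21),
  stringsAsFactors = TRUE
)

str(toy_attr)   # type and first values of every column
dim(toy_attr)   # number of rows and columns
```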

First three elements of the Phenotype column

ind_attr[["Phenotype"]][1:3]
## [1] Slow growth Slow growth Slow growth
## Levels: Cleared Highly immune Slow growth Treated

or

ind_attr$Phenotype[1:3]
## [1] Slow growth Slow growth Slow growth
## Levels: Cleared Highly immune Slow growth Treated
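A note on indexing: R vectors are 1-based, and an index of 0 is silently dropped, so `0:3` and `1:3` select the same three elements. The small sketch below shows this on a toy vector (`x` is made up for illustration), along with `head()` as an alternative.

```r
# Toy vector, for illustration only.
x <- c("a", "b", "c", "d")

x[1:3]                      # the first three elements
identical(x[0:3], x[1:3])   # index 0 selects nothing, so both agree
head(x, 3)                  # same selection without writing an index range
```

Writing `1:3` makes the intent explicit, which helps readers coming from 0-based languages.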

Let’s do a simple plot. Plot the ELISA values against the row index of each individual:

plot(ind_attr$ELISA)

Now let’s plot ELISA against time to diagnosis:

plot(ind_attr$ELISA ~ ind_attr$Time_to_diagnosis)
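The summary above shows ELISA ranging from 98.5 to 402114.3, with a mean far above the median, so a logarithmic y-axis may make the scatter easier to read. The following is a sketch only: `plot_elisa_vs_time` is a hypothetical helper name, and the log scale is a suggestion rather than part of the original analysis.

```r
# Hypothetical helper: scatter plot of ELISA against time to diagnosis
# with a log-scaled y-axis. Expects columns ELISA and Time_to_diagnosis.
plot_elisa_vs_time <- function(df) {
  plot(ELISA ~ Time_to_diagnosis, data = df, log = "y",
       xlab = "Time to diagnosis", ylab = "ELISA (log scale)")
}

# Only draw the plot if the table has already been loaded.
if (exists("ind_attr")) plot_elisa_vs_time(ind_attr)
```

Passing `data = df` to the formula interface avoids repeating `ind_attr$` for every variable.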

So, it looks like late diagnosis has an effect on the ELISA values. This is just a quick example; let’s continue by loading the data sets from the following files:

cytokines.csv
final_outcome_jan2020.csv
Individual_attributes.csv
pcr.csv
supernatant.csv
transcriptomics.csv
treg_phenotype_data.csv

cytokines = read.csv("cytokines.csv")
final = read.csv("final_outcome_jan2020.csv")
pcr = read.csv("pcr.csv")
supernatant = read.csv("supernatant.csv")
transcriptomics = read.csv("transcriptomics.csv")
treg = read.csv("treg_phenotype_data.csv")
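As an alternative to one `read.csv()` call per file, the tables can be read in a loop and kept in a named list. This is a sketch, not the author's method: the names `files` and `tables` are made up, and it assumes the files sit in the current working directory.

```r
# File names taken from the list above.
files <- c("cytokines.csv", "final_outcome_jan2020.csv",
           "Individual_attributes.csv", "pcr.csv", "supernatant.csv",
           "transcriptomics.csv", "treg_phenotype_data.csv")

# Report files that are missing instead of failing on the first one.
present <- file.exists(files)
if (any(!present)) {
  message("Missing files: ", paste(files[!present], collapse = ", "))
}

# Read every file that exists; name each table after its file stem.
tables <- lapply(files[present], read.csv)
names(tables) <- tools::file_path_sans_ext(files[present])

names(tables)
```

A single table is then available as, for example, `tables[["pcr"]]`.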

Once they have loaded, you can explore the data in the environment pane at the top right, or print values directly:

show(pcr$day[1:3])
## [1] 7.0 7.5 8.0

Exploring the tables will show that not all rows are labeled. That means we will need a way to cross-reference rows across tables by ID.
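One common way to cross-reference by ID in base R is `merge()`. The sketch below uses two toy data frames: the SampleID values come from the rows shown earlier, but the `value` column and the assumption that other tables carry a `SampleID` column are made up for illustration and need checking against the real files.

```r
# Toy attribute table, using SampleIDs printed earlier.
attrs <- data.frame(
  SampleID = c("16K0021", "16K0053", "16K0109"),
  Phenotype = c("Slow growth", "Slow growth", "Slow growth")
)

# Toy measurement table: made-up values, one individual missing.
measurements <- data.frame(
  SampleID = c("16K0053", "16K0021"),
  value = c(1.5, 2.5)
)

# Left join on SampleID: keep every individual, fill gaps with NA.
merged <- merge(attrs, measurements, by = "SampleID", all.x = TRUE)
merged
```

With `all.x = TRUE`, individuals without a measurement stay in the result with `NA`, which makes missing data visible instead of silently dropping rows.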