Application of Object Oriented System of R for generating descriptive statistics group-wise

Introduction

Many R packages, such as data.table, tables and psych, provide group-wise (factor-wise) descriptive statistics like the mean and standard deviation for numeric variables. This article generates a similar tabulated result using only functions that ship with a standard R installation (the base and stats packages) together with R's S3 object-oriented system. The main purpose of the exercise is to illustrate how that object-oriented system can be used to shape the output to our own requirements. For illustration, the iris data set is used; it holds four numeric variables, Sepal Length, Sepal Width, Petal Length and Petal Width, for three species (the factor levels). The following algorithm and R code calculate the mean and standard deviation of these four variables for each species and print them as a table similar to those produced by the packages above.

Algorithm and Code

1. To generate the mean and standard deviation of any numeric vector, a user-defined function meansd() is defined as follows:

meansd<-function(x) {
l<-list()
l$Mean<-mean(x)
l$SD<-sd(x)
return(l)
}

The above function receives a numeric vector as input, calculates its mean and standard deviation, and returns the two results as a list l with the components Mean and SD.
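As a quick check of meansd(), the sketch below applies it to a small made-up vector (the values 2, 4 and 6 are chosen only for illustration) and reads the two components back from the returned list.

```r
# meansd() as defined above, repeated so the snippet runs on its own
meansd <- function(x) {
  l <- list()
  l$Mean <- mean(x)
  l$SD <- sd(x)
  return(l)
}

v <- c(2, 4, 6)   # illustrative values, not taken from the article
out <- meansd(v)

out$Mean          # 4
out$SD            # 2
names(out)        # "Mean" "SD"
```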

2. Execution starts with a call to a function that takes just two arguments: i) the data frame containing all the variables for which the mean and standard deviation are required and ii) a vector containing the factor variable. A new function basstat() is therefore defined with these two arguments. For the iris data it is called as follows:

res<-basstat(iris[,1:4],iris[,5])

3. The basstat() function splits the iris data species-wise into a list of three sub data frames, one for each species. It then uses lapply(), which calls another function, result(), on each of these sub data frames and collects the aggregated results in the variable "res". To print these aggregated results in a neat tabular fashion, we take the help of R's object-oriented programming concepts: the class of res is set to "myclass" and the res object is returned. The code of the basstat() function is given below:

basstat<-function(df,f) {
l<-split(df,f)
res<-lapply(l,result)
class(res)<-"myclass"
return(res)
}
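To make the first line of basstat() concrete, the following sketch runs split() on the iris data by itself and inspects what comes back. The object name parts is an assumption used only for this illustration.

```r
# split() cuts the data frame into one sub data frame per factor level
parts <- split(iris[, 1:4], iris[, 5])

class(parts)          # "list"
names(parts)          # "setosa" "versicolor" "virginica"
sapply(parts, nrow)   # 50 rows in each sub data frame
class(parts$setosa)   # "data.frame"
```

The names of this list are the factor levels, and these are the names that later become the group labels in the printed table.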

4. The lapply() call in step 3 invokes result() with each sub data frame as its input argument. The result() function contains a call to sapply(), which in turn calls meansd() on each column (variable) of the sub data frame, one at a time, and receives the mean and standard deviation of every variable in that sub data frame. It captures them in the object "tres" and returns them to the calling lapply(). The code of the result() function is given below:

result<-function(x) {
tres<-sapply(x,meansd)
return(tres)
}
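The shape of the object returned by result() is easy to miss: because meansd() returns a list, sapply() simplifies the results into a matrix whose cells are list elements, with one row per statistic and one column per variable. The sketch below shows this on a tiny made-up data frame d (its column names and values are assumptions for illustration only).

```r
# meansd() and result() as in the article, repeated to keep the snippet self-contained
meansd <- function(x) {
  l <- list()
  l$Mean <- mean(x)
  l$SD <- sd(x)
  return(l)
}
result <- function(x) {
  tres <- sapply(x, meansd)
  return(tres)
}

d <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))   # illustrative data
tres <- result(d)

dim(tres)             # 2 2: rows Mean and SD, columns a and b
typeof(tres)          # "list": each cell holds one number
tres[["Mean", "a"]]   # 2
tres[["SD", "b"]]     # 10
```

This list-matrix layout is why print.myclass() can later transpose each group's table with t() and bind the groups together with cbind().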

Through these function calls, all the results are now available in the object res, which is of class "myclass". They are organized species-wise, but not yet in a neat, compact table, as shown below.

[Figure: raw results]
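Before formatting the output, it can be reassuring to confirm that the nested structure holds the expected numbers. The sketch below uses compact equivalents of the three functions from steps 1 to 4 and compares one cell of res with a mean computed by direct subsetting. The names m1 and m2 are assumptions for this illustration.

```r
# Compact equivalents of meansd(), result() and basstat() from the steps above
meansd <- function(x) list(Mean = mean(x), SD = sd(x))
result <- function(x) sapply(x, meansd)
basstat <- function(df, f) {
  res <- lapply(split(df, f), result)
  class(res) <- "myclass"
  res
}

res <- basstat(iris[, 1:4], iris[, 5])

# One cell of the nested structure: mean sepal length of setosa
m1 <- res$setosa[["Mean", "Sepal.Length"]]

# The same quantity computed directly from the raw data
m2 <- mean(iris$Sepal.Length[iris$Species == "setosa"])

all.equal(m1, m2)   # TRUE
```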

5. To print the results in a compact, neat table, a print method for myclass is defined as follows. This function cbinds the results of all groups, does the required string manipulation for the header row and finally prints the table. The code of the print.myclass() function is given below:

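print.myclass<-function(x, ...) {
  nm<-names(x)
  # print with 4 significant digits and restore the user's setting afterwards
  op<-options(digits=4)
  on.exit(options(op))
  # bind the transposed table of each group side by side
  finres<-vector()
  for(i in seq_along(x)) {
    finres<-cbind(finres,t(x[[i]]))
  }
  # leading blanks before the header row of group names
  cat(" ")
  tsp<-max(nchar(names(x)))
  isp<-paste(rep(" ",tsp),collapse="")
  cat(isp)
  # pad each group name so that it roughly spans its Mean and SD columns
  for(i in seq_along(nm)) {
    tt<-nchar(nm[i])
    esp<-if(tt<12) 12-tt else 1
    rsp<-paste(rep(" ",esp),collapse="")
    nm[i]<-paste(nm[i],rsp)
    cat(nm[i])
  }
  cat("\n")
  print(finres)
  invisible(x)
}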
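The mechanism that makes this work is S3 method dispatch: print() is a generic function, and for an object whose class attribute is "myclass" it looks for a function named print.myclass. The sketch below shows the same mechanism in isolation with a toy class; the class name "toyclass" and its contents are hypothetical and used only for this illustration.

```r
# A toy S3 object: an ordinary list with a class attribute attached
obj <- list(a = 1, b = 2)
class(obj) <- "toyclass"

# The print method for this class; print() dispatches to it by name
print.toyclass <- function(x, ...) {
  cat("toyclass object with components:", names(x), "\n")
  invisible(x)
}

print(obj)        # calls print.toyclass()
obj               # typing the name at the console dispatches the same way

# The underlying list is untouched and can still be inspected
unclass(obj)
```

Because only the print method is customized, the object remains a plain list for every other purpose, which is why res can still be indexed by species name.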

6. The results can now be printed simply by typing the name of the res object from step 2:

>res

The code for using the basstat() function and the results obtained are given below :

[Figure: iris results]

Some more results:

i) Descriptive statistics of six variables, mpg, disp, hp, drat, wt and qsec, for the factor cyl, whose levels/groups are 4, 6 and 8 cylinders, from the mtcars data set of the datasets package

res1<-basstat(mtcars[,c(1,3,4,5,6,7)],mtcars[,2])
res1

[Figure: mtcars results]

ii) Descriptive statistics of two variables, Prewt and Postwt, for the groups CBT, Cont and FT of the anorexia data set of the MASS package

[Figure: anorexia results]

iii) Descriptive statistics of three variables, Price, MPG.city and MPG.highway, for the groups Compact, Large, Midsize, Small, Sporty and Van of the Cars93 data set of the MASS package

[Figure: Cars93 results]

Conclusions

The tables produced in all the above examples are similar to those obtained from the data.table and tables packages. We could achieve this by using the object-oriented concepts of the R language. In this exercise, I have obtained the mean and standard deviation of a number of variables group-wise. The program can also be modified to obtain other statistics, such as the minimum, maximum, median and the 1st and 3rd quartiles, for a number of variables group-wise.
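As a rough sketch of that modification, the function below returns the minimum, quartiles and maximum instead of the mean and standard deviation, and is pushed through the same split/lapply/sapply pipeline. The function name quartstat() and the object names groups and res5 are hypothetical, and the quartiles use quantile() with its default method, which is an assumption about the definition wanted.

```r
# Hypothetical replacement for meansd(): min, quartiles and max of a vector
quartstat <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  list(Min = min(x), Q1 = q[1], Median = q[2], Q3 = q[3], Max = max(x))
}

# Same pipeline as basstat(): split by group, then summarise every column
groups <- split(iris[, 1:4], iris[, 5])
res5 <- lapply(groups, function(d) sapply(d, quartstat))

dim(res5$setosa)                         # 5 4: five statistics, four variables
unlist(res5$setosa[, "Sepal.Length"])    # the five statistics as plain numbers
```

The header spacing in print.myclass() is written with two statistics per group in mind, so it would likely need adjusting before such a result is printed through it.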
