LearningR: Data Processing
1. Built-in R functions

1.1 Transposing

The t() function transposes a matrix or a data frame. For a data frame, the row names become the variable (column) names.

    cars <- mtcars[1:5, 1:4]   # subset with [ ], not ( )
    cars
    t(cars)

To permute the dimensions of an array, use aperm():

    x <- array(1:24, 2:4)
    xt <- aperm(x, c(2, 1, 3))   # swap the first two dimensions
    dim(x)                       # 2 3 4
    dim(xt)                      # 3 2 4

1.2 Aggregating data

aggregate() collapses data using one or more by variables and a predefined function. The call format is:

    aggregate(x, by, FUN)

Here x is the data object to collapse, by is a list of variables that are crossed to form the new observations, and FUN is the scalar function used to compute the summary statistics that become the values of the new observations.

    options(digits = 2)
    mydata <- aggregate(mtcars, by = list(mtcars$cyl, mtcars$gear),
                        FUN = mean, na.rm = TRUE)
    mydata

The variables in by must be in a list, even if there is only one. You can also give the groups custom names in the list, for example by = list(Group.cyl = mtcars$cyl, Group.gears = mtcars$gear).

The remaining examples follow those on the aggregate help page (?aggregate).

    ## example with character variables and NAs
    testDF <- data.frame(v1 = c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9),
                         v2 = c(11, 33, 55, 77, 88, 33, 55, NA, 44, 55, 77, 99))
    by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)
    by2 <- c("wet", "dry", 99, 95, NA, "damp", 95, 99, "red", 99, NA, NA)
    aggregate(x = testDF, by = list(by1, by2), FUN = "mean")

    # and if you want to treat NAs as a group
    fby1 <- factor(by1, exclude = "")
    fby2 <- factor(by2, exclude = "")
    aggregate(x = testDF, by = list(fby1, fby2), FUN = "mean")

    ## Formulas, one ~ one, one ~ many, many ~ one, and many ~ many:
    aggregate(weight ~ feed, data = chickwts, mean)
    aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
    aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, mean)
    aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, sum)

    ## Dot notation:
    aggregate(. ~ Species, data = iris, mean)
    aggregate(len ~ ., data = ToothGrowth, mean)

    ## Often followed by xtabs():
    ag <- aggregate(len ~ ., data = ToothGrowth, mean)
    xtabs(len ~ ., data = ag)

    ## Compute the average annual approval ratings for American presidents.
    aggregate(presidents, nfrequency = 1, FUN = mean)
    ## Give the summer less weight.
    aggregate(presidents, nfrequency = 1, FUN = weighted.mean, w = c(1, 1, 0.5, 1))

1.3 apply

To be completed.

1.4 union and intersect

    x <- c(sort(sample(1:20, 9)), NA)
    y <- c(sort(sample(3:23, 7)), NA)
    union(x, y)
    intersect(x, y)
    setdiff(x, y)
    setdiff(y, x)
    setequal(x, y)

    # %in% tests membership element by element
    (1:10) %in% c(3, 12)
    "%w/o%" <- function(x, y) x[!x %in% y]   # x without y
    (1:10) %w/o% c(3, 12)
    sstr <- c("c", "ab", "B", "bba", "c", "@", "bla", "a", "Ba", "%")
    sstr %in% c(letters, LETTERS)

1.5 Combining with cbind and rbind

Vertical merging is typically used to add observations to a data frame.
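To make the point about adding observations concrete, here is a minimal sketch that stacks two small data frames with rbind(). The data frames df_a and df_b and their columns are invented for this illustration. Because df_b lacks one variable, the sketch creates it and sets it to NA first, which is only one possible way to align the columns.

```r
# Hypothetical data frames, invented for this illustration.
df_a <- data.frame(id = 1:3, score = c(90, 85, 70), group = c("a", "b", "a"))
df_b <- data.frame(id = 4:5, score = c(60, 75))

# df_b has no 'group' column, so create it and fill it with NA
# before stacking; rbind() on data frames matches columns by name.
df_b$group <- NA
combined <- rbind(df_a, df_b)

nrow(combined)          # number of observations after stacking
is.na(combined$group)   # TRUE for the rows that came from df_b
```

Without the added column, rbind() would stop with an error, because the two data frames would not have the same variables.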
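Note: the two data frames must have the same number of rows (for cbind) or columns (for rbind). If dataframeA has variables that dataframeB does not, do one of the following before combining them:
(1) delete the extra variables in dataframeA;
(2) create the additional variables in dataframeB and set their values to NA (missing).

    x1 <- 1:5
    x2 <- 21:25
    x3 <- 31:35
    r1 <- cbind(x1, x2)    # 5 x 2 matrix: each vector becomes a column
    r2 <- rbind(x1, x2)    # 2 x 5 matrix: each vector becomes a row
    r31 <- cbind(r1, x3)   # 5 x 3
    r32 <- rbind(r2, x3)   # 3 x 5

1.6 Matched merging with merge

merge() gives the same results as the dplyr join functions; the joins are generally more efficient.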
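As a small illustration of how the all.x, all.y, and all arguments of merge() correspond to the join types, the sketch below uses only base R. The tables left and right are invented for this illustration, and the join names in the comments follow the usual naming rather than anything specific to this post.

```r
# Hypothetical tables, invented for this illustration.
left  <- data.frame(key = c("a", "b", "c"), x = 1:3)
right <- data.frame(key = c("b", "c", "d"), y = c(20, 30, 40))

inner   <- merge(left, right, by = "key")                # keys present in both
left_j  <- merge(left, right, by = "key", all.x = TRUE)  # keep every row of left
right_j <- merge(left, right, by = "key", all.y = TRUE)  # keep every row of right
full    <- merge(left, right, by = "key", all = TRUE)    # keep every row of both

# Row counts differ by join type; unmatched rows are padded with NA.
sapply(list(inner = inner, left = left_j, right = right_j, full = full), nrow)
```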
    # authors and books (as in the examples of ?merge)
    authors <- data.frame(
      surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
      nationality = c("US", "Australia", "US", "UK", "Australia"),
      deceased = c("yes", rep("no", 4)))
    books <- data.frame(
      name = I(c("Tukey", "Venables", "Tierney",
                 "Ripley", "Ripley", "McNeil", "R Core")),
      title = c("Exploratory Data Analysis",
                "Modern Applied Statistics ...",
                "LISP-STAT",
                "Spatial Statistics", "Stochastic Simulation",
                "Interactive Data Analysis",
                "An Introduction to R"),
      other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"))

    m1 <- merge(authors, books, by.x = "surname", by.y = "name")
    m2 <- merge(books, authors, by.x = "name", by.y = "surname")
    # m1 and m2 hold the same data; only the column names (and their order) differ.

    # left join
    m3 <- merge(authors, books, by.x = "surname", by.y = "name",
                all.x = TRUE, all.y = FALSE)
    # right join
    m4 <- merge(authors, books, by.x = "surname", by.y = "name",
                all.x = FALSE, all.y = TRUE)
    # full join
    m5 <- merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

    # The same joins with dplyr (requires the dplyr package)
    library(dplyr)
    m11 <- inner_join(authors, books, by = c("surname" = "name"))
    m22 <- inner_join(books, authors, by = c("name" = "surname"))
    m33 <- left_join(authors, books, by = c("surname" = "name"))
    m44 <- right_join(authors, books, by = c("surname" = "name"))
    m55 <- full_join(authors, books, by = c("surname" = "name"))

1.7 Removing duplicates with unique

The unique() function removes duplicated elements from a vector, a data frame, or an array-like object.

    x <- c(9:20, 1:5, 3:7, 0:8)
    y <- unique(x)
    # The following also works, but unique() is more efficient.
    # duplicated() returns a logical vector that marks repeated elements.
    y1 <- x[!duplicated(x)]

2. The reshape2 package

First, "melt" the data so that each row is a unique ID-variable combination.
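To illustrate what melting produces, the sketch below builds the long layout by hand in base R, so it runs without the package. The data frame wide and the helper melt_by_hand are invented for this illustration and are not the reshape2 implementation. With reshape2 installed, melt(wide, id.vars = c("id", "time")) should give the same kind of result: the identifier columns plus a variable column and a value column.

```r
# Hypothetical wide data: two identifier columns and two measured variables.
wide <- data.frame(id = c(1, 1, 2, 2), time = c(1, 2, 1, 2),
                   x1 = c(5, 3, 6, 2), x2 = c(6, 5, 1, 4))

# Hand-rolled stand-in for melting (illustrative only):
# stack each measured column under a copy of the identifier columns.
melt_by_hand <- function(data, id_vars) {
  measure_vars <- setdiff(names(data), id_vars)
  pieces <- lapply(measure_vars, function(v) {
    data.frame(data[id_vars], variable = v, value = data[[v]])
  })
  do.call(rbind, pieces)
}

long <- melt_by_hand(wide, id_vars = c("id", "time"))
long   # one row per (id, time, variable) combination
```

Each of the 4 original rows contributes one row per measured variable, so the long table has 8 rows, and no (id, time, variable) combination appears twice.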