regex - gsub replace and preserve case
I've been using gsub to abbreviate words in longer strings. I'd like to abbreviate a word and have it inherit as much of the capitalization of the input as it can.
For example, turn hello into hi in this:
    x <- c("Hello world", "HELLO world", "hello world", "hEllo world")
but respect the case of hello in the original:
    c("Hi world", "HI world", "hi world", "hI world")
Most of the examples I want to match are "Hi", "HI", and "hi". I don't care much about "hI" but, for completeness, I leave in that possibility.
To get this done until now, I have used the tedious approach of maintaining vectors of strings for the targets and their replacements:
    xin <- c("Hello ", "HELLO ", "hello ", "hEllo ")
    xout <- c("Hi ", "HI ", "hi ", "hI ")
    mapply(gsub, xin, xout, x)
That gives the correct answer, see:
        Hello       HELLO       hello       hEllo
    "Hi world"  "HI world"  "hi world"  "hI world"
But it is embarrassing, time consuming, and inflexible! So far, I have a family of 50 words for which I seek abbreviations, and keeping track of the case combinations is tiresome.
The data is full of mixed-case chaos because humans typed in 78000 records and capitalized words like department and university in every conceivable way. The long sentences they typed don't fit in the space allowed on the printed page, and I was asked to shorten them to "dept" and "univ". I want to preserve capitalization if possible.
The idea I have does not look very R-like to me: split the original input and tabulate the existing capitalization of the first 2 letters.
    xcap <- sapply(strsplit(x, split = ""), function(x) x %in% LETTERS)[1:2, ]
    > t(xcap)
          [,1]  [,2]
    [1,]  TRUE FALSE
    [2,]  TRUE  TRUE
    [3,] FALSE FALSE
    [4,] FALSE  TRUE
I'm pretty sure I can use that capitalization information to make this work right, but I haven't yet succeeded. I've become aware of G. Grothendieck's package gsubfn, which might work, but the terminology there ("proto" objects) is new to me.
I'll keep going in that direction, probably, but I am asking if there is a more direct route.
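A minimal sketch of how the capitalization table might be put to use, assuming every string starts with the target word and has at least two characters. The names first2, abbrev, hi, and result are only illustrative.

```r
x <- c("Hello world", "HELLO world", "hello world", "hEllo world")

# One row per string, one column per letter; TRUE marks an upper-case letter.
first2 <- substring(x, 1, 2)
xcap <- t(vapply(strsplit(first2, ""), function(ch) ch %in% LETTERS, logical(2)))

# Upper-case each letter of the abbreviation wherever the table says TRUE.
abbrev <- c("h", "i")
hi <- apply(xcap, 1, function(up) {
  paste(ifelse(up, toupper(abbrev), abbrev), collapse = "")
})

# sub() uses only one replacement per call, so pair each string with its own.
result <- mapply(sub, "^hello", hi, x,
                 MoreArgs = list(ignore.case = TRUE), USE.NAMES = FALSE)
result
# "Hi world" "HI world" "hi world" "hI world"
```

This still handles only one target word at a fixed position, so it does not remove the need for a more general route.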
pj
Your idea inspired me to write some code. It is done in one sapply block. The toupper function is used to capitalize the split characters of the xout string.
    x <- c("Hello world", "HELLO world", "hello world", "hEllo world")
    sapply(x, function(x, xout) {
      # TRUE for each upper-case letter in the first word of the input
      xcap <- unlist(strsplit(unlist(strsplit(x, " "))[1], "")) %in% LETTERS
      n <- nchar(xout)
      if (length(xcap) >= n) {
        xcap <- xcap[1:n]
      } else {
        # replacement is longer than the word: repeat the last case flag
        xcap <- c(xcap, rep(tail(xcap, 1), n - length(xcap)))
      }
      # upper-case the letters of xout wherever the input had an upper-case letter
      xout <- paste(sapply(1:n, function(x) {
        if (xcap[x]) toupper(unlist(strsplit(xout, ""))[x]) else unlist(strsplit(xout, ""))[x]
      }), sep = "", collapse = "")
      xin <- "hello"
      gsub(xin, xout, x[1], ignore.case = TRUE)
    }, xout = "selamlar")

[output "selamlar"]
         Hello world      HELLO world      hello world      hEllo world
    "Selamlar world" "SELAMLAR world" "selamlar world" "sElamlar world"

[output "hi"]
    Hello world HELLO world hello world hEllo world
     "Hi world"  "HI world"  "hi world"  "hI world"
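A more compact variant of the same idea, offered only as a sketch in base R: regmatches<- rewrites every match in place, so the target word can sit anywhere in the string and several words can be handled in one pass. The helper names transfer_case and abbreviate_keep_case are hypothetical, and the sketch assumes the target words are plain letters with no regex metacharacters.

```r
# Hypothetical helper: copy the upper/lower pattern of `template` onto `replacement`.
transfer_case <- function(template, replacement) {
  flags <- strsplit(template, "")[[1]] %in% LETTERS
  repl  <- strsplit(replacement, "")[[1]]
  n     <- length(repl)
  # Repeat the last flag if the replacement is longer than the matched word.
  flags <- c(flags, rep(tail(flags, 1), max(0, n - length(flags))))[seq_len(n)]
  paste(ifelse(flags, toupper(repl), tolower(repl)), collapse = "")
}

# Hypothetical helper: `abbrev` is a named character vector,
# names = words to find (any case), values = their abbreviations.
abbreviate_keep_case <- function(x, abbrev) {
  for (word in names(abbrev)) {
    pattern <- paste0("\\b", word, "\\b")
    m <- gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
    # Replace each match with a case-adjusted copy of the abbreviation.
    regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
      vapply(hits, transfer_case, character(1),
             replacement = abbrev[[word]], USE.NAMES = FALSE)
    })
  }
  x
}

abbreviate_keep_case(
  c("Department of Physics, UNIVERSITY of Kansas", "department and university"),
  c(department = "dept", university = "univ")
)
# "Dept of Physics, UNIV of Kansas" "dept and univ"
```

With a lookup vector like this, a family of 50 words needs one entry per word rather than one entry per case combination.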