regex - gsub replace and preserve case


I've been using gsub to abbreviate words in longer strings. I'd like to abbreviate a word but inherit as much of the capitalization of the input as I can.

For example, turn hello into hi in this:

    x <- c("Hello World", "HELLO WORLD", "hello world", "hEllo world")

but respect the case of hello in the original:

    c("Hi World", "HI WORLD", "hi world", "hI world")

Most of the examples want to match "Hi", "HI", and "hi". I don't care so much about "hI", but for completeness I leave it as a possibility.

To get this done until now, I have used the tedious approach of maintaining vectors of target strings and replacements:

    xin <- c("Hello ", "HELLO ", "hello ", "hEllo ")
    xout <- c("Hi ", "HI ", "hi ", "hI ")
    mapply(gsub, xin, xout, x)

That gives the correct answer, see:

        Hello      HELLO      hello      hEllo
    "Hi World" "HI WORLD" "hi world" "hI world"

But that is embarrassing, time consuming, and inflexible! So far, I have a family of 50 words for which I seek abbreviations, and keeping track of the case combinations is tiresome.

The data is full of mixed-case chaos because humans typed in 78000 records and capitalized words like Department and University in every conceivable way. The long sentences they typed don't fit in the space allowed on the printed page, and I've been asked to shorten them to "Dept" and "Univ". I want to preserve the capitalization if possible.
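One way to avoid maintaining the case combinations by hand might be to generate them from a single lower-case word pair. The sketch below is only an illustration under assumptions: the helper names `case_variants` and `abbreviate_keep_case` and the sample strings are made up for this example, and only the three common patterns (lower, Title, UPPER) are handled, so oddly mixed input such as "dEpartment" is left unchanged.

```r
# Hypothetical helper: build the lower, Title and UPPER variants of a word.
case_variants <- function(word) {
  lower <- tolower(word)
  title <- paste0(toupper(substring(lower, 1, 1)), substring(lower, 2))
  c(lower, title, toupper(word))
}

# Hypothetical helper: replace each case variant of `from` with the
# matching case variant of `to`, matching whole words only.
abbreviate_keep_case <- function(x, from, to) {
  pats <- paste0("\\b", case_variants(from), "\\b")
  reps <- case_variants(to)
  for (i in seq_along(pats)) {
    x <- gsub(pats[i], reps[i], x, perl = TRUE)
  }
  x
}

# Made-up sample records, not taken from the real data
x <- c("Department of Physics", "UNIVERSITY OF KANSAS", "department store")
x <- abbreviate_keep_case(x, "department", "dept")
x <- abbreviate_keep_case(x, "university", "univ")
x
```

With a list of 50 word pairs, the same function could be called in a loop over the pairs, so only the lower-case spelling of each target and abbreviation has to be kept.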

The only idea I have does not look very R-like to me. Split the original input and tabulate the existing capitalization of the first 2 letters.

    xcap <- sapply(strsplit(x, split = ""), function(x) x %in% LETTERS)[1:2, ]
    > t(xcap)
          [,1]  [,2]
    [1,]  TRUE FALSE
    [2,]  TRUE  TRUE
    [3,] FALSE FALSE
    [4,] FALSE  TRUE

I'm pretty sure I can use that capitalization information to make this work right, but I haven't yet succeeded. I've become aware of G. Grothendieck's package gsubfn, which might work, but the terminology there ("proto" objects) is new to me.

I'll keep going in this direction, probably, but I am asking if there is a more direct route.

pj

Your idea inspired me to write the code below. It is done in one sapply block. The toupper function is used to capitalize the split characters of the xout string.

    x <- c("Hello World", "HELLO WORLD", "hello world", "hEllo world")

    sapply(x, function(x, xout) {
      # TRUE for each upper-case letter in the first word of the input
      xcap <- unlist(strsplit(unlist(strsplit(x, " "))[1], "")) %in% LETTERS
      n <- nchar(xout)
      if (length(xcap) >= n) {
        xcap <- xcap[1:n]
      } else {
        # replacement is longer than the first word: repeat the last flag
        xcap <- c(xcap, rep(tail(xcap, 1), n - length(xcap)))
      }
      # upper-case the characters of xout wherever the mask is TRUE
      xout <- paste(sapply(1:n, function(i) {
        if (xcap[i]) toupper(unlist(strsplit(xout, ""))[i])
        else unlist(strsplit(xout, ""))[i]
      }), sep = "", collapse = "")
      xin <- "hello"
      gsub(xin, xout, x[1], ignore.case = TRUE)
    }, xout = "selamlar")

Output with xout = "selamlar":

         Hello World      HELLO WORLD      hello world      hEllo world
    "Selamlar World" "SELAMLAR WORLD" "selamlar world" "sElamlar world"

Output with xout = "hi":

    Hello World HELLO WORLD hello world hEllo world
     "Hi World"  "HI WORLD"  "hi world"  "hI world"
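The block above takes the capitalization from the first word of each string. If the target can appear anywhere, or more than once, a variant built on base R's `gregexpr` and `regmatches<-` may be worth trying. This is a rough sketch, not a tested replacement for the code above: `recase` and `gsub_keep_case` are hypothetical names, and the rule of reusing the last letter's case when the replacement is longer than the match is carried over from the answer as an assumption.

```r
# Hypothetical helper: copy the upper/lower pattern of `original` onto
# `replacement`, letter by letter. If `replacement` is longer, the case of
# the last letter of `original` is reused for the remaining letters.
recase <- function(replacement, original) {
  o <- strsplit(original, "")[[1]]
  r <- strsplit(replacement, "")[[1]]
  upper <- o != tolower(o)                       # TRUE where original is upper case
  upper <- upper[pmin(seq_along(r), length(upper))]
  paste(ifelse(upper, toupper(r), tolower(r)), collapse = "")
}

# Hypothetical wrapper: find every case-insensitive match of `pattern`
# and replace each one with `replacement`, recased to fit that match.
gsub_keep_case <- function(pattern, replacement, x) {
  m <- gregexpr(pattern, x, ignore.case = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(hits, recase, character(1), replacement = replacement,
           USE.NAMES = FALSE)
  })
  x
}

x <- c("Hello World", "HELLO WORLD", "hello world", "hEllo world")
res <- gsub_keep_case("hello", "hi", x)
res
# "Hi World" "HI WORLD" "hi world" "hI world"
```

Because each match is recased on its own, a string such as "Hello, hello HELLO" would come back as "Hi, hi HI".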
