How to create bigrams using a dictionary in R?


I have a dictionary of words stored in a dictionary.txt file. It contains trigrams and bigrams. Given the following paragraph:

"in order perform operations inside abdomen, surgeons must make incision large enough offer adequate visibility, provide access abdominal organs , allow use of hand-held surgical instruments.  these incisions may placed in different parts of abdominal wall.  depending on size of patient , type of operation, incision may 6 12 inches in length.  there significant amount of discomfort associated these incisions can prolong time spent in hospital after surgery , can limit how patient can resume normal daily activities.  because traditional techniques have long been used , taught generations of surgeons, available , considered standard treatment newer techniques must compared." 

The dictionary.txt file includes the following entries:

hand-held surgical instruments
intensive care unit
traditional techniques

Now I want to create bigrams of the words in the paragraph that are not present in dictionary.txt.

I have used the following code in R:

library(RWeka)  # provides NGramTokenizer and Weka_control
bigramtokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))

Can anyone tell me how to code this in R?

Based on your text and dictionary, I created bigrams of both, and then removed the dictionary bigrams from the bigrams of the paragraph.

t <- "in order perform operations inside abdomen, surgeons must make incision large enough offer adequate visibility, provide access abdominal organs , allow use of hand-held surgical instruments.  these incisions may placed in different parts of abdominal wall.  depending on size of patient , type of operation, incision may 6 12 inches in length.  there significant amount of discomfort associated these incisions can prolong time spent in hospital after surgery , can limit how patient can resume normal daily activities.  because traditional techniques have long been used , taught generations of surgeons, available , considered standard treatment newer techniques must compared."

dictionary <- c("hand-held surgical instruments", "intensive care unit", "traditional techniques")

bigrams_dict <- bigramtokenizer(dictionary)
bigrams_text <- bigramtokenizer(t)

bigrams_text[!bigrams_text %in% bigrams_dict]
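If RWeka (which needs Java) is not available, the same idea can probably be sketched in base R. This is only a rough illustration, not the tokenizer used above: make_bigrams is a hypothetical helper, its splitting rule (lowercase, split on anything that is not a letter, digit or hyphen) is an assumption and may not match Weka's delimiters exactly, and txt is a short excerpt of the paragraph. Like the approach above, it pairs words across sentence boundaries.

```r
# Hypothetical helper (not part of RWeka): split each string into lowercase
# word tokens and pair every token with its successor to form bigrams.
# Hyphens are kept so that "hand-held" stays a single token.
make_bigrams <- function(x) {
  unlist(lapply(x, function(s) {
    words <- unlist(strsplit(tolower(s), "[^a-z0-9-]+"))
    words <- words[nzchar(words)]          # drop empty tokens
    if (length(words) < 2) return(character(0))
    paste(head(words, -1), tail(words, -1))
  }))
}

# Short excerpt of the paragraph, for illustration only.
txt <- "allow use of hand-held surgical instruments. because traditional techniques have long been used"

# Assumption: if dictionary.txt holds one entry per line, it could be
# loaded with readLines("dictionary.txt") instead of being typed in.
dictionary <- c("hand-held surgical instruments",
                "intensive care unit",
                "traditional techniques")

bigrams_dict <- make_bigrams(dictionary)
bigrams_text <- make_bigrams(txt)

# Keep only the text bigrams that do not occur among the dictionary bigrams.
result <- bigrams_text[!bigrams_text %in% bigrams_dict]
print(result)
```

Note that this only drops bigrams that lie entirely inside a dictionary entry. A bigram that overlaps an entry on one side, such as "of hand-held", is still kept.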
