How to create bigrams using a dictionary in R?


I have a dictionary of words stored in a dictionary.txt file. It contains trigrams and bigrams. I am given the following paragraph:

"in order perform operations inside abdomen, surgeons must make incision large enough offer adequate visibility, provide access abdominal organs , allow use of hand-held surgical instruments.  these incisions may placed in different parts of abdominal wall.  depending on size of patient , type of operation, incision may 6 12 inches in length.  there significant amount of discomfort associated these incisions can prolong time spent in hospital after surgery , can limit how patient can resume normal daily activities.  because traditional techniques have long been used , taught generations of surgeons, available , considered standard treatment newer techniques must compared." 

The dictionary.txt file includes the following entries:

hand-held surgical instruments
intensive care unit
traditional techniques

Now I want to create bigrams from the paragraph, keeping only those that are not present in dictionary.txt.

I have used the following code in R:

library(RWeka)  # provides NGramTokenizer() and Weka_control()
bigramtokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))

Can anyone tell me how to code this in R?

Based on the text and the dictionary, I created bigrams of both, and then removed the dictionary bigrams from the bigrams of the paragraph.

t <- "in order perform operations inside abdomen, surgeons must make incision large enough offer adequate visibility, provide access abdominal organs , allow use of hand-held surgical instruments.  these incisions may placed in different parts of abdominal wall.  depending on size of patient , type of operation, incision may 6 12 inches in length.  there significant amount of discomfort associated these incisions can prolong time spent in hospital after surgery , can limit how patient can resume normal daily activities.  because traditional techniques have long been used , taught generations of surgeons, available , considered standard treatment newer techniques must compared."

dictionary <- c("hand-held surgical instruments", "intensive care unit", "traditional techniques")

bigrams_dict <- bigramtokenizer(dictionary)
bigrams_text <- bigramtokenizer(t)

bigrams_text[!bigrams_text %in% bigrams_dict]
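
If installing RWeka (which needs Java) is not an option, the same idea can be sketched in base R. This is only a rough equivalent, not the RWeka tokenizer itself: make_bigrams is a hypothetical helper, the delimiter set in the regular expression is an assumption, and the shortened sample text is made up from the paragraph above. Like the approach above, it pairs words across sentence boundaries.

```r
# Hypothetical helper: word bigrams from one or more strings, base R only.
make_bigrams <- function(x) {
  unlist(lapply(x, function(s) {
    # Split on whitespace and basic punctuation (assumed delimiter set);
    # hyphens are not delimiters, so "hand-held" stays a single token.
    words <- strsplit(tolower(s), "[[:space:].,;:!?()\"]+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < 2) return(character(0))
    # Pair each word with the word that follows it.
    paste(head(words, -1), tail(words, -1))
  }))
}

# Shortened sample text, for illustration only.
text <- "surgeons allow use of hand-held surgical instruments. because traditional techniques have long been used"

# With a real file one might use readLines("dictionary.txt"),
# assuming one entry per line.
dictionary <- c("hand-held surgical instruments", "intensive care unit", "traditional techniques")

bigrams_dict <- make_bigrams(dictionary)
bigrams_text <- make_bigrams(text)

# Keep only the text bigrams that do not occur among the dictionary bigrams.
result <- bigrams_text[!bigrams_text %in% bigrams_dict]
print(result)
```

Here a bigram such as "of hand-held" is kept, while "hand-held surgical", "surgical instruments" and "traditional techniques" are dropped because they also occur among the dictionary bigrams.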
