How to create bigrams using a dictionary in R?
I have a dictionary of words stored in a dictionary.txt file. It contains trigrams and bigrams. Given the following paragraph:
"in order perform operations inside abdomen, surgeons must make incision large enough offer adequate visibility, provide access abdominal organs , allow use of hand-held surgical instruments. these incisions may placed in different parts of abdominal wall. depending on size of patient , type of operation, incision may 6 12 inches in length. there significant amount of discomfort associated these incisions can prolong time spent in hospital after surgery , can limit how patient can resume normal daily activities. because traditional techniques have long been used , taught generations of surgeons, available , considered standard treatment newer techniques must compared."
The dictionary.txt file includes the following phrases:
hand-held surgical instruments
intensive care unit
traditional techniques
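To read the phrases from dictionary.txt instead of typing them into the script, here is a minimal sketch using base R's readLines(). It assumes the file stores one phrase per line. A temporary file stands in for the real dictionary.txt so that the snippet runs on its own.

```r
# Assumption: dictionary.txt holds one phrase per line.
# A temporary file stands in for the real dictionary.txt here.
dict_file <- tempfile(fileext = ".txt")
writeLines(c("hand-held surgical instruments",
             "intensive care unit",
             "traditional techniques"), dict_file)

dictionary <- readLines(dict_file)
dictionary <- trimws(dictionary)            # strip stray leading/trailing spaces
dictionary <- dictionary[dictionary != ""]  # ignore blank lines
dictionary
```

With the real file, replace dict_file with the path to dictionary.txt.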
Now I want to create bigrams from the words that are not present in dictionary.txt.
I have used the following code in R:
library(RWeka)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))  # word bigrams via RWeka
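If RWeka (which depends on Java) is not available, you can sketch a rough base-R stand-in as follows. The name bigram_tokenize is hypothetical. The split pattern is an assumption that only approximates RWeka's default delimiters, so the output may differ from NGramTokenizer in ordering or in edge cases.

```r
# Hypothetical base-R stand-in for an RWeka bigram tokenizer (no Java needed).
# Assumption: words are split on whitespace and the punctuation . , ; : " ( ) ? !
# which only approximates RWeka's default delimiters; hyphens stay inside words.
bigram_tokenize <- function(x) {
  as.character(unlist(lapply(x, function(s) {
    words <- strsplit(s, "[[:space:].,;:\"()?!]+")[[1]]
    words <- words[words != ""]               # drop the empty token left by a leading delimiter
    if (length(words) < 2) return(character(0))
    paste(head(words, -1), tail(words, -1))   # pair each word with the next one
  }), use.names = FALSE))
}

# Each element of x is tokenized separately, so no bigram spans two phrases
bigram_tokenize(c("intensive care unit", "traditional techniques"))
# "intensive care" "care unit" "traditional techniques"
```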
Can someone tell me the code to do this in R?
Based on the text and the dictionary, I created bigrams of both, and then removed the dictionary bigrams from the bigrams of the paragraph.
library(RWeka)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
t <- "in order perform operations inside abdomen, surgeons must make incision large enough offer adequate visibility, provide access abdominal organs , allow use of hand-held surgical instruments. these incisions may placed in different parts of abdominal wall. depending on size of patient , type of operation, incision may 6 12 inches in length. there significant amount of discomfort associated these incisions can prolong time spent in hospital after surgery , can limit how patient can resume normal daily activities. because traditional techniques have long been used , taught generations of surgeons, available , considered standard treatment newer techniques must compared."
dictionary <- c("hand-held surgical instruments", "intensive care unit", "traditional techniques")
bigrams_dict <- BigramTokenizer(dictionary)    # bigrams inside the dictionary phrases
bigrams_text <- BigramTokenizer(t)             # bigrams of the paragraph
bigrams_text[!bigrams_text %in% bigrams_dict]  # paragraph bigrams not found in the dictionary
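The filtering approach above removes only those paragraph bigrams that also occur inside a dictionary phrase. A bigram that straddles a phrase boundary, such as "of hand-held", is kept. If the goal is stricter, so that no bigram may touch a dictionary phrase, one option is to cut the text at each dictionary phrase and build bigrams only within the remaining pieces. This is an assumption about the intent, not the answer's method. The helper names bigram_tokenize and split_on_phrases are hypothetical, and the tokenizer is a base-R approximation, not RWeka's.

```r
# Hypothetical base-R bigram helper (whitespace/punctuation split, hyphens kept).
bigram_tokenize <- function(x) {
  as.character(unlist(lapply(x, function(s) {
    words <- strsplit(s, "[[:space:].,;:\"()?!]+")[[1]]
    words <- words[words != ""]
    if (length(words) < 2) return(character(0))
    paste(head(words, -1), tail(words, -1))
  }), use.names = FALSE))
}

# Hypothetical helper: cut the text at every dictionary phrase.
split_on_phrases <- function(text, phrases) {
  segments <- text
  for (p in phrases) {
    # fixed = TRUE matches each phrase literally (no regex escaping needed)
    segments <- unlist(strsplit(segments, p, fixed = TRUE), use.names = FALSE)
  }
  segments
}

dictionary <- c("hand-held surgical instruments", "intensive care unit",
                "traditional techniques")
paragraph <- "allow use of hand-held surgical instruments. because traditional techniques have long been used"

# Bigrams are built only inside the pieces left between dictionary phrases
bigram_tokenize(split_on_phrases(paragraph, dictionary))
# "allow use" "use of" "have long" "long been" "been used"
```

Because strsplit(..., fixed = TRUE) matches literally and is case-sensitive, the text and the dictionary should use the same letter case. If one phrase can contain another, split on the longer phrases first so that no fragment of the longer phrase is left behind.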