Google Search Appliance - Extracting directory structure as metadata using entity recognition
I am attempting to extract values from within a URL pattern and apply them as metadata, using regular expressions with entity recognition applied to the URL.
URL: https://example.com/folder1/folder2/folder3/folder4/page.html
Regex: https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/
This should extract folder3. It has been tested and works on regex101, and also on reggyapp.com (which uses the Google RE2 engine, the same engine the GSA uses).
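As a quick sanity check of the pattern logic, here is a minimal sketch in Python. Note the assumptions: Python's `re` module is a backtracking engine, not RE2, so this only confirms what the first capture group matches, not how the GSA itself behaves. The function name `extract_folder` is hypothetical and used only for illustration.

```python
import re

# Pattern copied from the question; the first capture group is the
# path segment immediately after /folder1/folder2/.
PATTERN = re.compile(r"https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/")


def extract_folder(url):
    """Return the first capture group, or None if the URL does not match.

    Hypothetical helper for illustration; not part of any GSA tooling.
    """
    match = PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_folder("https://example.com/folder1/folder2/folder3/folder4/page.html"))
```

With the URL from the question, the capture group holds `folder3`. A URL that has no further slash after the captured segment (for example a page directly under folder2) does not match at all, because the pattern requires a trailing `/` somewhere after the group.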
However, when I upload the entity recognition file to the GSA, it does not recognise the pattern.
<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name>regex_tagged_as_first_group</store_regex_or_name>
  </instance>
</instances>
By default, the GSA's entity recognition stores the text extracted by the pattern. You can remove the following portion of the XML:
<store_regex_or_name>regex_tagged_as_first_group</store_regex_or_name>
Try it without the store_regex_or_name element.
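If the file contains many instances, removing the element by hand is tedious. The following is a rough sketch, using only Python's standard library, of stripping every `store_regex_or_name` element from a file shaped like the one in the question. The function name `strip_store_element` and the `SAMPLE` string are assumptions made for illustration; whether the resulting file is then accepted depends on the GSA itself and is not verified here.

```python
import xml.etree.ElementTree as ET

# Sample input mirroring the file in the question (raw string keeps the
# backslashes in the pattern intact).
SAMPLE = r"""<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name>regex_tagged_as_first_group</store_regex_or_name>
  </instance>
</instances>"""


def strip_store_element(xml_text):
    """Return the XML with every <store_regex_or_name> child removed.

    Hypothetical helper; the output omits the XML declaration, which can
    be added back when writing the file.
    """
    root = ET.fromstring(xml_text)
    for instance in root.findall("instance"):
        for child in instance.findall("store_regex_or_name"):
            instance.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(strip_store_element(SAMPLE))
```

The name and pattern elements are left untouched, so each instance falls back to whatever the appliance does by default.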