Google Search Appliance - Extracting directory structure as metadata using entity recognition
I am attempting to extract values from within a URL pattern and apply them as metadata, using regular expressions with entity recognition applied to the URL.
URL: https://example.com/folder1/folder2/folder3/folder4/page.html
Regex: https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/
This should extract folder3. It has been tested and works on regex101 and on reggyapp.com (which uses the Google RE2 engine, the same engine the GSA uses).
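As a quick sanity check of the capture group, here is a minimal sketch in Python. It assumes Python's `re` module behaves like RE2 for this particular pattern (the pattern uses no features that differ between the two, as far as I can tell); the helper name `first_group` is made up for illustration and is not part of the GSA.

```python
import re

# Pattern and URL copied from the question. Python's re is only a stand-in
# for RE2 here; this does not exercise the GSA itself.
PATTERN = r"https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/"
URL = "https://example.com/folder1/folder2/folder3/folder4/page.html"


def first_group(pattern: str, text: str):
    """Return the first capture group of the first match, or None if no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(first_group(PATTERN, URL))  # prints: folder3
```

This only confirms that the regex captures the third folder; it says nothing about how the GSA decides what to store.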
However, when I upload it to the GSA as an entity recognition file, the GSA does not recognise it.
<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>
  </instance>
</instances>
By default, the GSA's entity recognition stores the text extracted by the pattern, so you can remove the following portion of the XML: <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>. Try it without the store_regex_or_name element.
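The suggested change can be sketched as a small Python script that drops the store_regex_or_name element from the dictionary file. This is only an illustration using the standard library's `xml.etree.ElementTree`; the function name `strip_store_element` is hypothetical, and whether the resulting file is accepted by the GSA is exactly what the answer proposes to test, not something this script verifies.

```python
import xml.etree.ElementTree as ET

# XML copied from the question (raw string so the backslashes are kept as-is).
ORIGINAL = r"""<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>
  </instance>
</instances>"""


def strip_store_element(xml_text: str) -> str:
    """Return the XML with every <store_regex_or_name> element removed."""
    root = ET.fromstring(xml_text)
    for instance in root.iter("instance"):
        for child in instance.findall("store_regex_or_name"):
            instance.remove(child)
    # Note: tostring() does not emit the <?xml ...?> declaration.
    return ET.tostring(root, encoding="unicode")


print(strip_store_element(ORIGINAL))
```

The printed result keeps the name and pattern elements unchanged; add the XML declaration back by hand before uploading if it is needed.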