Google Search Appliance - Extracting directory structure as metadata using entity recognition


I am attempting to extract values from within a URL pattern and apply them as metadata, using regular expressions with entity recognition applied to the URL.

url: https://example.com/folder1/folder2/folder3/folder4/page.html

regex: https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/ 

This should extract folder3. It has been tested and works on regex101 and on reggyapp.com (which uses Google's RE2 engine, the same engine the GSA uses).

https://regex101.com/r/af2jr0/2
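As a quick local sanity check of the same claim, here is a minimal sketch that runs the pattern against the sample URL. It assumes Python's built-in `re` module as a stand-in (it is not the RE2 engine the GSA uses, although this particular pattern uses only syntax both engines share), and the helper name `extract_first_group` is hypothetical.

```python
import re

# Pattern and URL copied from the question. Python's `re` is only a stand-in
# here; it is not the RE2 engine that the GSA uses.
PATTERN = re.compile(r"https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/")
URL = "https://example.com/folder1/folder2/folder3/folder4/page.html"


def extract_first_group(url):
    """Return the text captured by the first group, or None if there is no match."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


print(extract_first_group(URL))  # prints: folder3
```

This only confirms that the regular expression itself captures folder3; it says nothing about how the GSA loads or applies the dictionary file.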

However, when I upload the entity recognition file to the GSA, it does not recognise the pattern.

<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>
  </instance>
</instances>

By default, the GSA's entity recognition stores the text extracted by the pattern, so you can remove the following portion of the XML: <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>. Try it without the store_regex_or_name element.
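If you would rather not edit the file by hand, the removal can be scripted. The following is a rough sketch, assuming Python's standard `xml.etree.ElementTree` module and the dictionary file from the question held in a string; the function name `strip_store_element` is hypothetical, and the script only rewrites the XML without touching the GSA.

```python
import xml.etree.ElementTree as ET

# The entity recognition file from the question, including the element
# that the answer suggests removing.
ENTITY_XML = r"""<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>
  </instance>
</instances>"""


def strip_store_element(xml_text):
    """Return the XML with every <store_regex_or_name> element removed."""
    root = ET.fromstring(xml_text)
    for instance in root.iter("instance"):
        # findall returns a list, so removing children while looping is safe.
        for child in instance.findall("store_regex_or_name"):
            instance.remove(child)
    return ET.tostring(root, encoding="unicode")


cleaned = strip_store_element(ENTITY_XML)
print(cleaned)
```

The printed output omits the XML declaration, so add it back before uploading if the GSA expects it.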

