Google Search Appliance - Extracting directory structure as metadata using entity recognition

I am attempting to extract values from within a URL pattern and apply them as metadata, using regular expressions and entity recognition applied to the URL.

url: https://example.com/folder1/folder2/folder3/folder4/page.html

regex: https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/

This should extract folder3. It has been tested and works on regex101, and also on reggyapp.com (which uses the Google RE2 engine, the same engine the GSA uses).

https://regex101.com/r/af2jr0/2
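
The same check can be reproduced locally. The following is a minimal sketch, assuming Python's built-in re module as a stand-in for RE2 (the two engines differ in general, but this pattern uses no features that RE2 lacks). The helper name extract_first_group is hypothetical and used only for illustration.

```python
import re

# Pattern and URL copied from the question above.
# Assumption: Python's re module stands in for the RE2 engine here.
PATTERN = r"https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/"
URL = "https://example.com/folder1/folder2/folder3/folder4/page.html"


def extract_first_group(pattern, url):
    """Return the text captured by the first group, or None if nothing matches."""
    match = re.search(pattern, url)
    return match.group(1) if match else None


print(extract_first_group(PATTERN, URL))
```

Run against the example URL, this prints folder3, which agrees with the regex101 result and suggests the pattern itself is not the problem.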

However, when the entity recognition file is uploaded to the GSA, the pattern is not recognised.

<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>
  </instance>
</instances>

By default, the GSA's entity recognition stores the text extracted by the pattern. You can remove the following portion of the XML: <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>. Try it without the store_regex_or_name element.
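
The suggested edit can also be applied programmatically. This is only a sketch, assuming Python's standard xml.etree.ElementTree module and a hypothetical helper named strip_store_element. It drops the store_regex_or_name element from every instance and leaves the name and pattern untouched. Whether the GSA then accepts the file is the answer's suggestion and is not verified here.

```python
import xml.etree.ElementTree as ET

# Entity recognition file from the question, as a raw string so the
# backslashes in the pattern are kept exactly as written.
DICTIONARY_XML = r"""<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>
  </instance>
</instances>"""


def strip_store_element(xml_text):
    """Return the XML with every store_regex_or_name element removed."""
    root = ET.fromstring(xml_text)
    for instance in root.findall("instance"):
        for element in instance.findall("store_regex_or_name"):
            instance.remove(element)
    return ET.tostring(root, encoding="unicode")


print(strip_store_element(DICTIONARY_XML))
```

Note that ET.tostring does not emit the XML declaration with these arguments, so it would need to be added back before uploading if the GSA requires it.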
