Google Search Appliance - Extracting directory structure as metadata using entity recognition

I am attempting to extract values from a URL pattern and apply them as metadata, using regular expressions with entity recognition applied to the URL.

url: https://example.com/folder1/folder2/folder3/folder4/page.html

regex: https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/

This should extract folder3. It has been tested and works on regex101, and also on reggyapp.com (which uses Google's RE2 engine, the same engine the GSA uses).
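As a quick local check of the same pattern, here is a minimal sketch using Python's built-in re module. Note the assumption: Python's engine is not RE2, so this only confirms the pattern's logic, not how the GSA will treat it. The names PATTERN, URL and extract_folder are made up for this illustration.

```python
import re

# Pattern and URL copied from the question. Python's "re" is a backtracking
# engine, not RE2, so this is only a sanity check of the pattern itself.
PATTERN = r"https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/"
URL = "https://example.com/folder1/folder2/folder3/folder4/page.html"


def extract_folder(url):
    """Return (whole match, first capture group), or None if there is no match."""
    match = re.search(PATTERN, url)
    if match is None:
        return None
    return match.group(0), match.group(1)


if __name__ == "__main__":
    print(extract_folder(URL))
```

The whole match and the first capture group are different strings here, which matters when deciding which of the two should end up stored as the metadata value.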

https://regex101.com/r/af2jr0/2

However, when I upload the entity recognition file to the GSA, it does not recognise the pattern.

<?xml version="1.0"?>
<instances>
  <instance>
    <name>ignoredname</name>
    <pattern>https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/</pattern>
    <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>
  </instance>
</instances>

By default, the GSA's entity recognition stores the text extracted by the pattern, so you can remove the following portion of the XML: <store_regex_or_name> regex_tagged_as_first_group </store_regex_or_name>. Try it without the store_regex_or_name element.
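If you would rather generate the file than edit it by hand, a rough sketch with Python's standard xml.etree.ElementTree could look like the following. It only reproduces the element names shown in the question (instances, instance, name, pattern) and leaves out store_regex_or_name; the helper name build_entity_xml is hypothetical, and whether the GSA accepts the output is not verified here.

```python
import xml.etree.ElementTree as ET

# Pattern copied from the question; element names are the ones in the posted XML.
PATTERN = r"https:\/\/example\.com\/folder1\/folder2\/([^\/]*).*[^\/]*\/"


def build_entity_xml(name, pattern):
    """Build the dictionary XML without a store_regex_or_name element."""
    instances = ET.Element("instances")
    instance = ET.SubElement(instances, "instance")
    ET.SubElement(instance, "name").text = name
    ET.SubElement(instance, "pattern").text = pattern
    # Serialise to a str and prepend the XML declaration used in the question.
    body = ET.tostring(instances, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body


if __name__ == "__main__":
    print(build_entity_xml("ignoredname", PATTERN))
```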
