USE CASE:
Identifying English keywords on a non-English site such as Thailand (Th-th) where the ILG Exclusion filter can’t catch them and perhaps the Specialist wishes to remove them from SWXL.
CONTRAINDICATIONS:
Be aware that some templates such as 1 and 115 do not have translated hotel or poi names, thus they will have a large number of English keywords generated. By removing all English keywords, you could be leaving a large number of URLs without associated keywords, effectively dropping the URLs from SWXL.
QUERY:
select top 1000
kw.id,
kw.keyword
from
dbo.kw kw with(nolock)
join kw_collection kwc with(nolock) on kwc.kw_id = kw.id
join collection c with(nolock) on kwc.collection_id = c.id
where
--Contains all normal English characters.
patindex(N'%[^ !-~()"]%' COLLATE Latin1_General_BIN, keyword) = 0
--In ILG Inclusion List
and c.name = 'ILG Inclusion List'
--Not in ILG Exclusion List
and not exists (
select 1 from
kw_collection kwc2 with(nolock)
join collection c2 with(nolock) on c2.id = kwc2.collection_id
where
kwc2.kw_id = kw.id
and c.name = 'ILG Exclusion List')
;
NOTES:
This query only differentiates between Latin characters and non-Latin characters. It can distinguish an English keyword like ‘hello’ from a Thai keyword like ‘เกสท์เฮาส์ อิแทวอน’, but it cannot distinguish a keyword like ‘Hoteles en Umeda’ as being either English or Spanish since there are no accents or wide characters.
Characters with accents will not be selected and this will need attention if you need to select for those as well. You can change the patindex clause to “>0” instead of “=0” to see the behavior.