SUBDOMAINS:
For a multi-language POS, URLs should be segregated into their respective GKMS; within each GKMS, all other URLs should be marked deleted. For example, gkms_th_en (a secondary-language POS) should have ONLY www.expedia.co.th/en/* URLs, and gkms_th_th (the primary-language POS) should NOT have any www.expedia.co.th/en/* URLs.
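A minimal sketch of this segregation for the Thai POS. It assumes each GKMS has its own ds.SEO_StaticURLs table and that the template_id = 102 filter used elsewhere in these notes applies here too; both are assumptions, so adjust the context and filters to the actual deployment.

```sql
-- Assumption: run in the gkms_th_th context (primary language).
-- Marks every /en/ URL deleted, since those belong to gkms_th_en.
update ds.SEO_StaticURLs set deleted = 1 where deleted = 0 and template_id = 102 and url like '%://www.expedia.co.th/en/%'

-- Assumption: run in the gkms_th_en context (secondary language).
-- Marks every URL outside www.expedia.co.th/en/ deleted.
update ds.SEO_StaticURLs set deleted = 1 where deleted = 0 and template_id = 102 and url not like '%://www.expedia.co.th/en/%'
```

Running the same where clause as a select first shows which rows would be affected.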
Also, after STAT and EDW URLs are imported, there will likely be many stray sub-domains, or even full domains, outside of the given POS. These should be marked “deleted” in the ds.SEO_StaticURLs feed, and SMG re-run to get rid of them. Note that some sub-domains are likely acceptable, so use discretion.
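One way to spot stray hosts is to count URLs per host. The sketch below assumes url_no_protocol in Site_URL starts with the host name, as the patterns used in these notes suggest; the host and url_count aliases are illustrative only.

```sql
-- Everything before the first '/' is treated as the host.
-- Appending '/' guards against values that contain no slash at all.
select left(url_no_protocol, charindex('/', url_no_protocol + '/') - 1) as host, count(*) as url_count
from Site_URL with(nolock)
where template_id = 102
group by left(url_no_protocol, charindex('/', url_no_protocol + '/') - 1)
order by url_count desc
```

Hosts with only a handful of URLs are usually the ones worth reviewing by hand.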
SEARCH PAGES:
Search pages within the POS’ domain are likely undesirable; e.g. http://www.expedia.de/Hotel-Search. Remove these in the same way, by marking them deleted in ds.SEO_StaticURLs.
An easy way to search for these is with a query like:
select * from Site_URL with(nolock) where template_id = 102 and url_no_protocol like 'www.expedia.de/%' order by len(url)
or
select * from Site_URL with(nolock) where template_id = 102 and url_no_protocol like '%search' and url_no_protocol not like '%.packagesearch' order by len(url)
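Once the results have been reviewed, the matching rows can be marked deleted. This sketch reuses the second pattern against the url column of ds.SEO_StaticURLs and assumes a case-insensitive collation, so that '%search' also matches URLs ending in "Search"; verify both assumptions before running it.

```sql
-- Preview first: the same filter as the update below, but read-only.
select url from ds.SEO_StaticURLs with(nolock) where deleted = 0 and template_id = 102 and url like '%search' and url not like '%.packagesearch' order by len(url)

-- Mark the reviewed search pages deleted.
update ds.SEO_StaticURLs set deleted = 1 where deleted = 0 and template_id = 102 and url like '%search' and url not like '%.packagesearch'
```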
MOBILE PAGES:
Mobile pages are generally undesirable; e.g. https://www.expedia.se/m/trips. Remove these pages by marking them as deleted in ds.SEO_StaticURLs.
An easy way to search for these is with a query like:
select * from Site_URL where template_id = 102 and url_no_protocol like 'www.expedia.se/m/%' order by len(url);
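The corresponding update might look like the sketch below. It assumes the url column in ds.SEO_StaticURLs includes the protocol, as in the CMS updates in these notes, so the pattern starts with '%://' to cover both http and https.

```sql
-- Marks all /m/ (mobile) URLs for www.expedia.se deleted.
update ds.SEO_StaticURLs set deleted = 1 where deleted = 0 and template_id = 102 and url like '%://www.expedia.se/m/%'
```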
CMS URLs:
CMS URLs canonically end with slashes, but often show up in EDW and STAT without them. The slashless variants should be filtered out manually with something like the following.
For VC deployments:
update ds.SEO_StaticURLs set deleted = 1 where deleted = 0 and template_id = 102 and url like 'http://{domain}/vc/{cms_top_dir}/%[^/]'
For non-VC deployments:
update ds.SEO_StaticURLs set deleted = 1 where deleted = 0 and template_id = 102 and url like 'http://{lob}.expedia.%[^/]'
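In these patterns, the trailing [^/] matches any single character other than a slash, so only URLs that do not end in "/" are selected. Before running either update, it may help to preview the affected rows with the same filter. The second query is a hypothetical variant whose 'http%://' prefix also matches https:// URLs, in case the feed holds both; the {domain} and {cms_top_dir} placeholders are left as in the updates above.

```sql
-- Preview for VC deployments: same filter as the update, but read-only.
select url from ds.SEO_StaticURLs with(nolock) where deleted = 0 and template_id = 102 and url like 'http://{domain}/vc/{cms_top_dir}/%[^/]' order by len(url)

-- Hypothetical variant that also covers https:// URLs.
select url from ds.SEO_StaticURLs with(nolock) where deleted = 0 and template_id = 102 and url like 'http%://{domain}/vc/{cms_top_dir}/%[^/]' order by len(url)
```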
“expedia-de” completed.
“travelocity-com” completed.
“lastminute-au” completed.
“expedia-th-en” completed.
“expedia-th-th” completed.