Fetch-TG Crawl & Import

DEPENDENCIES:

  • Import/Update: Atlas Region Data
  • Import/Update Data: Hotel Property Data
  • Current URL template_elements for all travel guide (TG) URLs

OVERVIEW:
Crawl the site and extract the correct TG names and URLs for the following:

  • TG Hotels (hotel properties)
  • TG POI
  • TG Airports
  • TG Region Names for each LOB: Hotels, Flights, Car Rentals and Vacations

We run every region through the key URL templates for each LOB. Each LOB is crawled separately in case region names differ between LOBs, as they have in the past.
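
As a rough illustration of running every region through each LOB's templates, the sketch below builds the crawl list as a cross product. The template strings, the `{region}` placeholder, the region slugs and the example.com host are all hypothetical; the real templates come from the template_elements data listed under DEPENDENCIES.

```python
from itertools import product

# Hypothetical URL templates, one per LOB; the real templates are not shown
# in these notes and will differ.
LOB_TEMPLATES = {
    "Hotels": "https://www.example.com/hotels/{region}",
    "Flights": "https://www.example.com/flights/{region}",
    "Car Rentals": "https://www.example.com/cars/{region}",
    "Vacations": "https://www.example.com/vacations/{region}",
}


def build_crawl_list(region_slugs, templates=LOB_TEMPLATES):
    """Return (lob, region, url) for every region under every LOB template."""
    return [
        (lob, region, template.format(region=region))
        for region, (lob, template) in product(region_slugs, templates.items())
    ]


# Two made-up regions across four LOBs give eight URLs to crawl.
crawl_list = build_crawl_list(["paris", "new-york"])
```

Keeping the LOB in each tuple makes it possible to compare the extracted region names per LOB afterwards.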

Note that each crawl will very likely produce some 4xx or 5xx status codes, most probably failures caused by high load. Retry these URLs: set is_processed = 0 and re-run 2-fetch.pl.
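
A minimal sketch of the requeue step, using an in-memory SQLite database so it can be run anywhere. Only the is_processed column comes from these notes; the `fetch_queue` table name, the `http_status` column and the sample rows are assumptions made for illustration, and the real schema may look quite different.

```python
import sqlite3

# Hypothetical schema: only is_processed appears in the notes above;
# the table name and the other columns are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fetch_queue ("
    " url TEXT PRIMARY KEY,"
    " http_status INTEGER,"
    " is_processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO fetch_queue (url, http_status, is_processed) VALUES (?, ?, ?)",
    [
        ("https://www.example.com/a", 200, 1),
        ("https://www.example.com/b", 503, 1),
        ("https://www.example.com/c", 404, 1),
    ],
)


def requeue_failures(conn):
    """Mark every 4xx/5xx row as unprocessed so the next fetch run retries it."""
    cur = conn.execute(
        "UPDATE fetch_queue SET is_processed = 0"
        " WHERE http_status BETWEEN 400 AND 599"
    )
    conn.commit()
    return cur.rowcount


requeued = requeue_failures(conn)
pending = [
    row[0]
    for row in conn.execute(
        "SELECT url FROM fetch_queue WHERE is_processed = 0 ORDER BY url"
    )
]
```

After the reset, only the two failed URLs are pending again; the successful row is left alone.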

Reference: https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/fetch-tg

PROCESS:

  1. Crawl site and extract correct TG names and URLs (WARNING: this script is destructive):
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/fetch-tg/all-create-and-populate.sh
  2. Fetch and parse:
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/fetch-tg/all-fetch-and-parse.sh
  3. Monitor status with:
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/fetch-tg/all-status.pl
  4. Requeue failures with "-r", then repeat from step 2.
  5. Update sitemap with correct TG names for each LOB. Run:
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/site_map_validator/queries/import-validation-from-fetch-tg.sql
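
Steps 2 to 4 form a loop: fetch, check status, requeue, fetch again. The sketch below shows one way to bound that loop so it cannot run forever. The three callables are hypothetical stand-ins for the real scripts, the `max_rounds` limit is an assumption, and the fake fetcher only simulates failures clearing over time.

```python
def run_until_clean(fetch, count_failures, requeue, max_rounds=5):
    """Repeat fetch -> status check -> requeue until nothing fails.

    Stops early when no failures remain, or after max_rounds rounds.
    Returns the number of failures seen in the last status check.
    """
    remaining = 0
    for _ in range(max_rounds):
        fetch()                       # step 2: fetch and parse
        remaining = count_failures()  # step 3: check status
        if remaining == 0:
            break
        requeue()                     # step 4: requeue, then back to step 2
    return remaining


# Simulated stand-ins: each fetch round clears one of three failing URLs.
state = {"failures": 3, "rounds": 0}


def fake_fetch():
    state["rounds"] += 1
    state["failures"] = max(0, state["failures"] - 1)


leftover = run_until_clean(fake_fetch, lambda: state["failures"], lambda: None)
```

A non-zero return value after the last round suggests the remaining failures are not load-related and need a manual look before step 5.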
