Fetch-TG Crawl & Import

DEPENDENCIES:

  • Import/Update: Atlas Region Data
  • Import/Update Data: Hotel Property Data
  • Current URL template_elements for all travel guide (TG) URLs

OVERVIEW:
Crawl the site and extract the correct TG names and URLs for the following:

  • TG Hotels (hotel properties)
  • TG POI
  • TG Airports
  • TG Region Names for each LOB: Hotels, Flights, Car Rentals and Vacations

We run all regions through key URL templates for each LOB. Each LOB is run separately in case region names differ between LOBs, as they have in the past.

Note that each crawl will very likely return some 4xx or 5xx status codes, which are probably failures caused by high load. Retry these URLs: set is_processed = 0 and re-run 2-fetch.pl.
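The retry rule above can be sketched as follows. This is only an illustration of the selection logic, not the actual fetch-tg code: the row layout (a dict with 'url', 'status_code' and 'is_processed' keys) and both function names are assumptions. Only is_processed comes from this page.

```python
def should_retry(status_code):
    """Return True for 4xx and 5xx responses, which this page treats as retryable."""
    return 400 <= status_code <= 599


def requeue_failures(rows):
    """Reset is_processed to 0 on every row with a retryable status code.

    `rows` is a list of dicts with hypothetical keys 'url', 'status_code'
    and 'is_processed'. Returns the URLs that were requeued.
    """
    requeued = []
    for row in rows:
        if should_retry(row["status_code"]):
            row["is_processed"] = 0  # mark the URL so the next fetch run picks it up again
            requeued.append(row["url"])
    return requeued
```

After requeueing, re-run 2-fetch.pl as described above.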

Reference: https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/fetch-tg

PROCESS:

  1. Crawl site and extract correct TG names and URLs (warning: this script is destructive):
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/fetch-tg/all-create-and-populate.sh
  2. Fetch and parse:
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/fetch-tg/all-fetch-and-parse.sh
  3. Monitor status with:
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/fetch-tg/all-status.pl
  4. Requeue failures with “-r”, then return to step 2.
     
  5. Update sitemap with correct TG names for each LOB. Run:
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/site_map_validator/queries/import-validation-from-fetch-tg.sql

Client SWXL Workbook Integration

If the specialist specifies that URL templates should be added/enabled, check the following:

  • Ensure that root-variants are set correctly and reference the intended datafeeds.
  • Confirm that we have received datafeeds for all new custom templates.
  • Confirm that all custom datafeeds received contain the required information.

 

Make Main Assignments

OVERVIEW:

Once all associators have been updated, update the main assignments. This step picks the winning associations from all associators and must be run for every language.

PROCESS:

exec kua.usp_make_main_assignments;

Review KW/URL Associations

OVERVIEW:

Ask the client to review and approve or annotate the KW-URL associations for at least the G1 terms. The full review should happen on the primary language. For secondary languages, the review should focus on STAT Google associations; the remaining associations are all KWG and User Associations and can be assumed to be trustworthy.

PROCESS:

  • Review KW-URL Associations:
    https://docs.google.com/a/myersmediagroup.com/document/d/1gtUgKgA4YYxhm6JVTp4bC4lYKhwXHgNrdTO1EJFb3Rg/edit#heading=h.66578c6fc556
  • Adjust associations based on the Specialist/analyst review.

NOTES:

https://mmgteam.atlassian.net/wiki/display/SWXL/PD%3A+SWXL+QA+Processes+and+Procedures
For specifics, see the PowerPoint referenced on the Confluence page linked above.

 

Generate XML Sitemaps

OVERVIEW:

  • Generate updated XML Sitemap – All Languages
  • Upload all sitemap files – All Languages
  • Notify PM to inform SEO Specialist of updated Sitemap drop – All Languages

PROCESS:

  1. Generate the updated XML Sitemap, creating a new sitemap_index file:
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SiteMapManager/trunk/SMMXMLConverter/bin
  2. Download the existing sitemap_index file from E.com and open it to see which other sitemaps Expedia devs have added to it.
  3. Copy those entries into the new sitemap_index file you have just created.
  4. Place the result in the POS’ Google Drive (client shared) folder, in a directory called “XML Sitemap”, and notify the POS SEO Specialist. For non-primary languages, add a suffix to the .zip file to distinguish it from the primary-language .zip file, then upload all sitemap files to the project shared folder in Google Drive.
  5. Notify the PM to inform the SEO Specialist of the updated Sitemap drop.
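Carrying over the entries that Expedia devs added to the existing sitemap_index can be sketched as below. This is a minimal illustration, assuming both files follow the standard sitemaps.org sitemap index format. The function name is hypothetical and is not part of SMMXMLConverter.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def merge_sitemap_indexes(new_index_xml, existing_index_xml):
    """Append to the new index every <sitemap> entry that only the existing index has.

    Both arguments are XML strings; entries are compared by their <loc> value.
    Returns the merged index as an XML string.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialize without an ns0: prefix
    new_root = ET.fromstring(new_index_xml)
    existing_root = ET.fromstring(existing_index_xml)
    sitemap_tag = f"{{{SITEMAP_NS}}}sitemap"
    loc_tag = f"{{{SITEMAP_NS}}}loc"

    # <loc> values already present in the freshly generated index
    known = {el.findtext(loc_tag, "").strip() for el in new_root.findall(sitemap_tag)}

    for entry in existing_root.findall(sitemap_tag):
        loc = entry.findtext(loc_tag, "").strip()
        if loc and loc not in known:
            new_root.append(entry)
            known.add(loc)
    return ET.tostring(new_root, encoding="unicode")
```

Comparing by <loc> keeps the merge idempotent, so running it twice against the same downloaded file does not duplicate entries. Review the merged file by hand before uploading.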

Update Product-Hotels

 

NEW METHOD

  1. Go to
    https://expediaintegration.zendesk.com/hc/en-us/restricted?return_to=https%3A%2F%2Fexpediaintegration.zendesk.com%2Fhc%2Fen-us
    • Email: it-services-signup@myersmediagroup.com
    • Username: MMG IT
    • Password: markthequark
  2. Go to “Hotels”
  3. Go to “Hotel Inventory (Hotel ID) List” and download Agency+Merchant for your Point of Sale (four files, everything but descriptions)
  4. Go to “Hotel Descriptions & Amenities” and download Agency+Merchant for your Point of Sale (Descriptions file)
  5. Import downloaded files manually
    • You should have five .csv.gz files (Added, Deleted, Updated, All and Description)
    • Unzip all files to .csv files
    • Run https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/DataSources/Product/trunk/script/1-download-and-import.pl
      • Note: the “Download” and “Unzip” parts of the script have been commented out
    • QA to make sure that:
      • The Product Hotels view refers to the new files
      • The Product Hotels synonym refers to the new view
      • dbo.config has the new Product Hotels version
  6. Clone updated hotels from Primary to 2nd language.
  7. Clone updated hotels from Primary to 3rd language.
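The manual unzip in step 5 can be sketched as follows. This is only a convenience illustration, assuming all downloaded .csv.gz files sit in one directory. The function name is hypothetical and is not part of 1-download-and-import.pl.

```python
import gzip
import shutil
from pathlib import Path


def unzip_csv_gz(directory):
    """Decompress every *.csv.gz file in `directory` to a sibling .csv file.

    Returns the list of .csv paths written, sorted by file name.
    """
    outputs = []
    for gz_path in sorted(Path(directory).glob("*.csv.gz")):
        csv_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs.append(csv_path)
    return outputs
```

Checking that the returned list has five entries is a quick way to confirm that the Added, Deleted, Updated, All and Description files were all downloaded.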

 

OLD METHOD (OBSOLETE)

Download and import latest Hotels property data from Expedia API

Run https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/DataSources/Product/trunk/script/1-download-and-import.pl .

This script handles downloading and importing of Hotel Property data.

 

Review LOB Link Distributions

OVERVIEW:

  • Review Link Distributions – Primary Language
  • Review Link Distributions – 2nd language
  • Review Link Distributions – 3rd language

PROCESS:

An important element of any SWXL update is ensuring that the link distributions across Lines of Business (LOBs) are in line with the business objectives identified by the Specialist. These objectives are usually discussed in the kickoff meeting at the beginning of the project.

Specifically, we should consider the relative % of links assigned to Hotels, Flights, and Vacation Packages, and compare them to LOBs that are less profitable or less heavily targeted, such as Cruises, Cars, and General.

While Flights are not as profitable, they typically have a fairly high relative % of links (and URLs and KWs) because the inventory usually contains a high count of Flights-related URLs, especially if the POS has a Flights CMS. Keyword counts tend to be fairly high as well, as there are a number of variations on the core keyword “flights to _”.

The QA team should also take into consideration live versus update statistics, to see that growth across LOBs in terms of URLs, KWs and Links makes sense – did new pages (inventory) get added? Did the ratio of KW:URL change? Did the specialist add or delete KW variants?
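The relative-% comparison described above can be sketched as below. This is an illustrative calculation only. The function names and the input shape (a dict mapping each LOB name to its link count) are assumptions, and the counts would come from whatever live and update statistics the QA team already pulls.

```python
def link_share_by_lob(link_counts):
    """Return each LOB's share of total links as a percentage (0 to 100)."""
    total = sum(link_counts.values())
    if total == 0:
        return {lob: 0.0 for lob in link_counts}
    return {lob: 100.0 * count / total for lob, count in link_counts.items()}


def share_shift(live_counts, update_counts):
    """Return the change in percentage points per LOB, update minus live.

    An LOB missing from one side is treated as having a 0% share there.
    """
    live = link_share_by_lob(live_counts)
    update = link_share_by_lob(update_counts)
    lobs = sorted(set(live) | set(update))
    return {lob: update.get(lob, 0.0) - live.get(lob, 0.0) for lob in lobs}
```

A large shift for one LOB is a prompt to ask the questions above (new inventory, changed KW:URL ratio, added or deleted KW variants) rather than a failure on its own.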