Run ILG Inclusion/Exclusion Filter

OVERVIEW:

  • Define ILG Inclusion List criteria
  • Define ILG Exclusion List criteria
  • Run ILG Inclusion / Exclusion process to filter out any brand or bad keywords

PROCESS:

  • Define ILG Inclusion List criteria
    • The criteria are stored in table dbo.ILG_Inclusion_List_Config. Generally,
      • If keyword is from user or KWG, then it should be in ILG inclusion list
      • If keyword is from EDW, it needs to have Google SEO clicks >= 3 and is not longer than 40 characters
  • Define ILG Exclusion List criteria
  • There are three places script finds excluded keywords automatically:
    • Site_Brand_KW_Pattern: Patterns for brand keywords
      • Bad_KW_Pattern: Patterns that are bad for using as keywords
      • Negative_KW_Pattern: Negative strings like ‘sex’

When each site was set up it gets default lists of Site_Brand_KW_Pattern and Bad_KW_Pattern. After that, we don’t maintain the same list across different POS, i,e. each POS can have its own lists.

  • Run ILG Inclusion / Exclusion process to filter

Rebuild the keyword index and execute stored procedure to update ILG inclusion/exclusion list.

https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/ILG/trunk/util/update-ilg-inclusion-exclusion-list.sql

NOTES:

To check how many keywords need to be checked overall:

USE gkms_{POS}

declare @max int, @value int

set @max = (select top 1 id from kw with(nolock) order by id desc)
set @value = (select value from config with(nolock) where name = 'Exclusion_Checked_Max_KW_ID')

select
 (@max - @value) diff

Generate ILG QA Report

OVERVIEW:

Review ILG results and confirm no outstanding changes, issues or anomalies.

PROCESS:

https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/ILG/trunk/qa/ILG_QA_Report.sql

NOTES:

https://drive.google.com/drive/u/1/folders/0BxGkM7mSsh-4bUxUMHBWYnBieDQ, “{pos} QA ILG yyyymmdd.xlsx”

KWG QA

OVERVIEW:

QA the eSV generated from the KWG run. Look at eSV factors if they need adjusting. Check Hotel Properties eSV calculation because the eSV for T1 terms is generated from Hotels GB and not rSV. This is an internal QA review of the report itself; it is a preliminary review before we ask the SEO Specialist to review Keyword Generation in the GSI.

PROCESS:

The scripts needed to generate the KWG QA report can be found in:
https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/KWGeneration/trunk/KWExplosion/qa

Particularly, the following scripts may be most useful:
https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/KWGeneration/trunk/KWExplosion/qa/kw-count-by-template.sql

CHECK:

  1. All URL templates have had the correct KW variants successfully generated for them.
  2. That Multi-City and Neighborhood URL templates have had the most important terms generated and assigned to them.

NOTES:

https://docs.google.com/a/myersmediagroup.com/document/d/1gtUgKgA4YYxhm6JVTp4bC4lYKhwXHgNrdTO1EJFb3Rg/edit#heading=h.f8220438e5cc

Import Default SWXL Data

Import Default SWXL Data from another POS with the same language.

  • URL template definitions and priorities
  • Sitemap datafeeds
  • Root variants
  • KW variants
  • KW-URL associators
  • Datasources
  • Other, as needed
  • Check to see if any localization needs to be done

Consider the size, location and language of the POS to others like or nearby it.

Compute KW Variant eSV Factors

DEPENDENCIES:

  • Request review of Top 100 Regions
  • Request review of LOB KW Variants
  • Run 03a-Updated_Latest_Data.sql

OVERVIEW:

Generate eSV Factors after the average SV has been calculated for all keyword variants. This includes standard and new URL templates. eSV Factors will be used to
generate KW priority in future steps.

PROCESS:

  1. After getting SEOClarity data back, to update eSV factors, run https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/KWGeneration/trunk/KWExplosion/sql/update-eSV-factor.sql
    (This is also step 5B of eSV-Research-process.txt. Run first with defaults to do a dry-run and review.)
  2. Review all eSV factors on kw variants for all URL templates. This should be done once on any new SWXL to ensure the data didn’t incorrectly assign any factors.
    https://docs.google.com/a/myersmediagroup.com/document/d/1gtUgKgA4YYxhm6JVTp4bC4lYKhwXHgNrdTO1EJFb3Rg/edit#heading=h.f8220438e5cc

NOTES:

When manually setting eSV factors, avoid using the same value. For example, you can set one factor to be 0.011 and the other one 0.009, instead of setting them both to be 0.01.

Run URL Prioritization (Steps 1-6)

OVERVIEW:

https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/URLPriority/note.txt (Steps 1-6)

PROCESS:

1. Apply migrations under https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/URLPriority/db/migration

2. Make sure the following columns are populated for every template. Note the default is 0 for those columns, so 0-values could mean the correct value was not set, especially for custom templates.

  • url_priority_value
    • Higher value means higher priority.
    • Avoid using the same value across different templates.
  • url_region_priority_damping_factor
    • The multiplying factor for region priority of the URLs in this template.
    • For example Hotels Reviews (t5) should have a very low value, to get lower region priority.
  • url_region_priority_default
    • Default value when region priority of the URLs in this template cannot be calculated.
    • If all URLs in a given template should have region information (region_id), then the url_region_priority_default should be 0.

3. Create entries in URL_Priority_Factor

insert into up.URL_Priority_Factor (name, description, query, need_normalized, weight) values ('Visits', 'Total visits from SEO, Google channel only during last 12 months.', 'select suv.site_url_id as url_id, isnull(e.visitor_count, 0) as value from dbo.SMM_URL_Validation suv with(nolock) left outer join (select site_url_id, sum(visitor_count) as visitor_count from dbo.EDW_URLBased_12mo with(nolock) where site_url_id is not null and search_engine_id = 1 and search_type_id = 1 group by site_url_id) e on suv.site_url_id = e.site_url_id', 1, 10001);
 insert into up.URL_Priority_Factor (name, description, query, need_normalized, weight) values ('Transactions', 'Total transactions from SEO, Google channel only during last 12 months.', 'select suv.site_url_id as url_id, isnull(e.transactions, 0) as value from dbo.SMM_URL_Validation suv with(nolock) left outer join (select site_url_id, sum(transactions) as transactions from dbo.EDW_URLBased_12mo with(nolock) where site_url_id is not null and search_engine_id = 1 and search_type_id = 1 group by site_url_id) e on suv.site_url_id = e.site_url_id', 1, 9001);
 insert into up.URL_Priority_Factor (name, description, query, need_normalized, weight) values ('Gross Bookings', 'Total gross bookings from SEO, Google channel only during last 12 months.', 'select suv.site_url_id as url_id, isnull(e.gross_bookings_usd, 0) as value from dbo.SMM_URL_Validation suv with(nolock) left outer join (select site_url_id, sum(gross_bookings_usd) as gross_bookings_usd from dbo.EDW_URLBased_12mo with(nolock) where site_url_id is not null and search_engine_id = 1 and search_type_id = 1 group by site_url_id) e on suv.site_url_id = e.site_url_id', 1, 7001);
 insert into up.URL_Priority_Factor (name, description, query, need_normalized, weight) values ('Shoppers', 'Total shoppers from SEO, Google channel only during last 12 months.', 'select suv.site_url_id as url_id, isnull(e.shopper_counts, 0) as value from dbo.SMM_URL_Validation suv with(nolock) left outer join (select site_url_id, sum(shopper_counts) as shopper_counts from dbo.EDW_URLBased_12mo with(nolock) where site_url_id is not null and search_engine_id = 1 and search_type_id = 1 group by site_url_id) e on suv.site_url_id = e.site_url_id', 1, 6001);
 insert into up.URL_Priority_Factor (name, description, query, need_normalized, weight) values ('Region Priority', 'URL''s region priority, calculated from multiplying region priority with dbo.Template.url_region_priority_damping_factor.', 'select suv.site_url_id as url_id, t.url_region_priority_damping_factor * isnull(rpf.priority_factor, t.url_region_priority_default) as value from dbo.SMM_URL_Validation suv with(nolock) join dbo.Template t with(nolock) on suv.template_id = t.template_id left outer join dbo.Site_URL_Contextual suc with(nolock) on suv.site_url_id = suc.site_url_id left outer join dbo.Region_Priority_Factor rpf with(nolock) on suc.region_id = rpf.region_id where suv.template_id <> 3008 union select suv.site_url_id as url_id, t.url_region_priority_damping_factor * isnull((r1.priority_factor+r2.priority_factor)/2, t.url_region_priority_default) as value from dbo.SMM_URL_Validation suv with(nolock) join dbo.Template t with(nolock) on suv.template_id = t.template_id left outer join ds.Route_Region rr with(nolock) on suv.site_url_id = rr.site_url_id left outer join dbo.Region_Priority_Factor r1 with(nolock) on rr.origin_region_id = r1.region_id left outer join dbo.Region_Priority_Factor r2 with(nolock) on rr.destination_region_id = r2.region_id where suv.template_id = 3008', 0, 19.1);
 insert into up.URL_Priority_Factor (name, description, query, need_normalized, weight) values ('Template Priority', 'URL''s template priority calculated from dbo.Template.url_priority_value.', 'select suv.site_url_id as url_id, t.url_priority_value as value from dbo.SMM_URL_Validation suv with(nolock) join dbo.Template t with(nolock) on suv.template_id = t.template_id', 0, 10.1);
 insert into up.URL_Priority_Factor (name, description, query, need_normalized, weight) values ('eSV', 'Estimated Search Volume for URL.', 'select site_url_id as url_id, sv as value from (select kuma.*, isnull(vks.monthly_sv, -1) as sv, row_number() over (partition by kuma.site_url_id order by isnull(vks.monthly_sv, -1) desc) as rn from dbo.KW_URL_Main_Assignment_Distribution kuma with(nolock) left outer join kwg.vw_KW_SV vks with(nolock) on kuma.kw_id = vks.kw_id and vks.row_num = 1 where exists (select 1 from dbo.KW_Collection kc with(nolock) join dbo.Collection c with(nolock) on kc.collection_id = c.id and c.name = ''ILG Inclusion List'' where kc.kw_id = kuma.kw_id) ) t where rn = 1', 1, 5001);

* Please validate the “query” column if necessary, e.g. different (custom) templates might need ways to calculate ‘Region Priority’

4a. (Optional) Purge SEO_StaticURLs table to improve performance of the next step.

delete ds.SEO_StaticURLs
from ds.SEO_StaticURLs s
join SMM_URL_Validation v on dbo.udf_strip_url_protocol(s.url) = v.url_no_protocol and s.template_id != v.template_id;

4. Repopulate edw_urlbased_12mo table, which used to calculate URL priority.

4.1. For non-primary-language POS, import EDW data from the primary-language POS first with something like:

4.1.1 Method A

-- If one does not already exist, create a dummy EDW_File entry:
 insert into dbo.EDW_File (name, status, updated, note) values (
 '[dummy]', 0, getdate(), 'created manually by {user}');

-- Populate dbo.EDW_URL primary table:
 set identity_insert EDW_URL on;
 insert into dbo.EDW_URL (id, url)
 select distinct id, url
 from {primary_gkms}.dbo.EDW_URL u
 where url_no_protocol like '{domain}/{secondary_site_identifier}/%';

set identity_insert EDW_URL off;

-- Populate EDW_URLBased_Weekly table:
 insert into dbo.EDW_URLBased_Weekly (
 edw_file_id, rpt_wk_begin_date, url, edw_url_id, omniture_market_clicks_page, search_engine, search_engine_id, search_type_id, mktg_sub_chnnl_name, is_branded, lob, visitor_count, shopper_counts, bounce_visitor_cnt, transactions, gross_bookings_usd, gross_profit_usd)
 select {edw_file_id}, rpt_wk_begin_date, url, edw_url_id, omniture_market_clicks_page, search_engine, search_engine_id, search_type_id, mktg_sub_chnnl_name, is_branded, lob, visitor_count, shopper_counts, bounce_visitor_cnt, transactions, gross_bookings_usd, gross_profit_usd
 from {primary_gkms}.dbo.EDW_URLBased_Weekly
 where url_no_protocol like '{domain}/{secondary_site_identifier}/%'

4.1.1 Method B

-- Copy EDW_File entry:
 :setvar primary_gkms ""

SET IDENTITY_INSERT EDW_File ON
 go

declare @last_id int;
 select @last_id = max(id) from EDW_File;

insert into EDW_File (id, name, status, updated, note, year_month)
 select id, name, status, updated, note, year_month
 from $(primary_gkms).dbo.EDW_File
 where id > @last_id;

SET IDENTITY_INSERT EDW_File OFF
 go

-- Populate dbo.EDW_URL primary table:
truncate table edw_urlbased_weekly;
 delete from EDW_URL;

set identity_insert EDW_URL on;

insert into dbo.EDW_URL (id, url)
 select distinct id, url
 from $(primary_gkms).dbo.EDW_URL u
 where -- list all posibilities
 (url_no_protocol like 'www.expedia.ca/fr/%' or
 url_no_protocol like 'flights.expedia.ca/ca-fr/%' or
 url_no_protocol like 'cruise.expedia.ca/Cruise/fr-CA/%' or
 url_no_protocol like 'thingstodo.expedia.ca/ca-fr/%' or
 url_no_protocol like 'www.expedia.ca/things-to-do/fr/%')

set identity_insert EDW_URL off;
 
 insert into dbo.EDW_URLBased_Weekly (
 edw_file_id, rpt_wk_begin_date, url, edw_url_id, omniture_market_clicks_page, search_engine, search_engine_id, search_type_id, mktg_sub_chnnl_name, is_branded, lob, visitor_count, shopper_counts, bounce_visitor_cnt, transactions, gross_bookings_usd, gross_profit_usd)
 select edw_file_id, rpt_wk_begin_date, url, edw_url_id, omniture_market_clicks_page, search_engine, search_engine_id, search_type_id, mktg_sub_chnnl_name, is_branded, lob, visitor_count, shopper_counts, bounce_visitor_cnt, transactions, gross_bookings_usd, gross_profit_usd
 from $(primary_gkms).dbo.EDW_URLBased_Weekly
 where -- list all posibilities
 (dbo.udf_strip_url_protocol(url) like 'www.expedia.ca/fr/%' or
 dbo.udf_strip_url_protocol(url) like 'flights.expedia.ca/ca-fr/%' or
 dbo.udf_strip_url_protocol(url) like 'cruise.expedia.ca/Cruise/fr-CA/%' or
 dbo.udf_strip_url_protocol(url) like 'thingstodo.expedia.ca/ca-fr/%' or
 dbo.udf_strip_url_protocol(url) like 'www.expedia.ca/things-to-do/fr/%');

4.2. For all languages, run

exec usp_populate_edw_urlbased_12mo
  • Note that the output message might indicate a need to run SMG on template 102.
  • Check to see if most of the URLs in edw_urlbased_12mo have been generated:
select count(*), count(distinct site_url_id) from edw_urlbased_12mo;
select count(*) from edw_urlbased_12mo where site_url_id is null;

5. Scrub Foreign Domain URLs.

  • Check and remove all URLs that do match the correct domain. i.e. hotels.com URLs in expedia.co.th

5.1. Check sv_factor values in kwg.KW_Variant.

  • Check variants (in variant_group 0) with sv_factor > 1. Be aware of following issues:
    • sv_factor should not be too high.
    • For variants which exist in multiple templates, the highest-priority region-type should get the highest sv_factor.
  • Check variants for Flights Routes template (t3008).
  • From experience, the highest sv_factor value of all variants should be only around 0.005.
  • If necessary, make change and re-run KWG to recalculate eSV. Update Keyword Priority afterwards.

5.2. Use up.Collection.url_priority_multiplier to adjust URL priorities.

  • Create an entry in up.Collection
  • Group URLs into up.URL_Collection with given collection_id
  • Set up url_priority_multiplier value. A value > 1 will boost URL priority. Values less than 1 but great than 0 will damp URL priority. Value 0 or 1 has no effect. Negative values are not allowed.
  • If a URL has multiple url_priority_multiplier values from different collections, the highest value will be used.
  • Rule of thumb is to apply the boost gradually (e.g. start with url_priority_multiplier = 1.5) but damper more (e.g. url_priority_multiplier = 0.01)

6. Populate up.URL_Priority_Value. URL priority (up.vw_URL_Priority) will be updated automatically.

exec [up].[usp_populate_url_priority_value];

Copy SWXL Links to CMS Tables

OVERVIEW:

  • Export SWXL tables to data.myersmediagroup.com
  • Import SWXL tables to CMS
  • Publish CMS with new SWXL links

PROCESS:

Export SWXL tables to data.myersmediagroup.com

  1. Connect to cms staging server
$ cmssh staging01-cms
  1. Copy and customize db/sqlserver/conf.sample.mk
    (“SWXL_TEMPLATE_{START,END}”should be the SWXL CMS template ID range)
  2. Generate the CMS files
$ make swxl_{categories,keywords,relations,urls}.csv.gz

NOTE: Do not run ‘svn update’ on CMS servers. These production sites are managed by the CMS team. You may need to replace “MakeFile” and “conf.mk” first with a newer version temporarily, ie.

$ svn cat https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SEOCluster/trunk/cms-v2/db/sqlserver/Makefile > Makefile

$ svn cat https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SEOCluster/trunk/cms-v2/db/sqlserver/conf.sample.mk > conf.mk
  1. On prod02> D:\MMG\data\public\cluster\{pos}\{lob}\
    • Move existing swxl_*.csv.gz files to the _old directory and date-stamp the file names with the original file date (not the current date).
    • Remove old swxl_*.ok files
    • Remove old swxl_*.db files
  2. Copy new swxl_*.csv.gz files to prod02 (D:\MMG\data\public\cluster\{pos}\{lob})
  3. Commit a new entry in:
    https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/SEOCluster/trunk/cms-v2/doc/data-repo/CHANGELOG.md
  4. Repeat for each LOB

Import SWXL tables to CMS

  1. $ cmssh {lob}-staging-{pos}
  2. Delete old files
    • db/swxl_*.db files
    • db/data/swxl_*.csv files
    • db/.deps/swxl_*.d files
  3. Create new db files
    $ make db/swxl_{categories,keywords,relations,urls}
  4. Confirm data has been imported
$ make mysql
mysql> show tables;
mysql> select 'swxl_categories' as tbl, count(*) as cnt from swxl_categories
union all select 'swxl_keywords' as tbl, count(*) as cnt from swxl_keywords
union all select 'swxl_relations' as tbl, count(*) as cnt from swxl_relations
union all select 'swxl_urls' as tbl, count(*) as cnt from swxl_urls;

Publish CMS with new SWXL links

This is managed by the CMS team; simply notify the PM in charge that the given sites are ready.

Run KWG

OVERVIEW:

Process KWG script to generate and assign KWs to URL templates and to generate eSV for all keywords.

Review KWG Order and set Statuses for all URL templates. Need to ensure that URLs of different region types, but the same kind, are generating and getting best keywords correctly. Also must make sure each template has at least one unique keyword variant so that it retains some keywords assigned to it and keeps all URLs.

PROCESS:

Reset all templates to run KWG

update template set kw_explosion_pid = 0, kw_explosion_status_id = 0 where is_active = 1

https://dev.myersmediagroup.com/mmgsvn/KMS/trunk/KWGeneration/trunk/KWExplosion/script/KWExplosion.pl

$ ./KWExplosion.pl --run --nozip --conn_name={gkms_pos}

A nice way to see real-time progress of KWG is by running the following from another shell. Note, you may need to kill and re-execute it after a while if you see no activity.

$ watch -cd -n 2 ls -la log*log log*err

NOTE:

KWG is not finished until you see “log.log”

Request Live SWXL Service Activation

OVERVIEW:
Send an email to the Expedia SEO dev team requesting that they schedule turning on the SWXL service in an upcoming sprint.

EMAIL TO SEND:

TO:
Oscar Carreras (ocarreras@expedia.com)
Matt Lindsay (mlindsey@expedia.com)

CC: The SEO Specialist

SUBJECT: Turning on SWXL Service for (POS)

Hi Oscar,

We are in the process now of starting to work on
deploying the SWXL for the Singapore and India POSs. So I wanted to contact you early to request that the SWXL service be turned on for those sites. We have test SWXL data ready to go once you say you are ready, but figured it would be good to schedule the work while there was time. Trying to avoid last minute stuff.

We hope to be ready to go live by Oct 22, 2012. Ideally we should run the test before that time so we can identify any potential issues. So could you let me know when you would be able to schedule that work?

Thanks!