Difference between revisions of "Retrieve data from top-1000-company site"
(Created page with '== Introduction == This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system == Solution == We have * Server Path: \catproc\Sha…') |
|||
Line 1: | Line 1: | ||
== Introduction == | == Introduction == | ||
− | This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system | + | This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system |
+ | |||
+ | <br> | ||
== Solution == | == Solution == | ||
− | We have | + | We have |
− | * Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\ | + | *Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\ |
− | * Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path | + | *Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path |
− | * Customize data, import to Excel file | + | *Customize data, import to Excel file |
− | * Run Macro (get Macro source code from from server path | + | *Run Macro (get Macro source code from from server path<br> |
+ | * | ||
+ | === Step 1 - Generate files by Internal Tool === | ||
− | + | *Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time. | |
− | + | [[Image:Top1000Tool 1.png]]<br> | |
− | + | *Go to your local C:\ to get exported files | |
− | + | <br> | |
=== Step 2 - Customize raw data === | === Step 2 - Customize raw data === | ||
− | * Merge all files we get from step1 to one file | + | *Merge all files we get from step1 to one file |
− | * Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel) | + | *Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel) |
− | < | + | <br> |
− | * Save file as txt format | + | [[Image:Customize Data.png]] |
+ | |||
+ | <br> | ||
+ | |||
+ | *Save file as txt format | ||
+ | |||
+ | <br> | ||
=== Step 2 - Import to Excel === | === Step 2 - Import to Excel === | ||
− | * Open Excel application - | + | *Open Excel application -> select Data tab -> Get External data -> From Text |
− | * Browse the txt file we get from step2 | + | *Browse the txt file we get from step2 |
− | * Follow step 1: Choose Delimited option | + | *Follow step 1: Choose Delimited option |
− | + | <br> | |
− | |||
− | + | [[Image:Import step 1.png]] | |
+ | |||
+ | <br> | ||
+ | |||
+ | *Follow step 2: Unselect "Tab" option, Select "Other" option = @ | ||
+ | |||
+ | <br> | ||
+ | |||
+ | [[Image:Import step 2.png]] | ||
+ | |||
+ | <br> | ||
=== Step 3 - Customize Excel data === | === Step 3 - Customize Excel data === | ||
− | Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..." | + | Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..." |
− | |||
− | |||
− | + | *Open file "Unicode name" from server path to get list of incorrect names and replacements. | |
+ | *Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions | ||
− | + | [[Image:Vlookup function.png]] | |
+ | *Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies" | ||
+ | |||
+ | <br> | ||
=== Step 3 - Run Macro === | === Step 3 - Run Macro === | ||
− | * Get Macro source code from server path | + | *Get Macro source code from server path |
− | * Orderly run these functions: | + | *Orderly run these functions: |
− | - Run Macro "top1000One" <br> | + | |
− | - Run Macro "top1000Two" <br> | + | - Run Macro "top1000One" <br> - Run Macro "top1000Two" <br> - Run Macro "top1000Three" <br> - Run Macro "top1000Four" <br> |
− | - Run Macro "top1000Three" <br> | ||
− | - Run Macro "top1000Four" <br> | ||
[[Category:Production_guidelines]] | [[Category:Production_guidelines]] |
Revision as of 07:36, 7 April 2010
Contents
Introduction
This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system
Solution
We have
- Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\
- Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path
- Customize data, import to Excel file
- Run Macro (get Macro source code from from server path
Step 1 - Generate files by Internal Tool
- Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.
- Go to your local C:\ to get exported files
Step 2 - Customize raw data
- Merge all files we get from step1 to one file
- Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)
- Save file as txt format
Step 2 - Import to Excel
- Open Excel application -> select Data tab -> Get External data -> From Text
- Browse the txt file we get from step2
- Follow step 1: Choose Delimited option
- Follow step 2: Unselect "Tab" option, Select "Other" option = @
Step 3 - Customize Excel data
Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."
- Open file "Unicode name" from server path to get list of incorrect names and replacements.
- Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions
- Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"
Step 3 - Run Macro
- Get Macro source code from server path
- Orderly run these functions:
- Run Macro "top1000One"
- Run Macro "top1000Two"
- Run Macro "top1000Three"
- Run Macro "top1000Four"