Retrieve data from top-1000-company site: Difference between revisions
From Catglobe Wiki
More actions
 Created page with '== Introduction  ==  This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system  == Solution  ==  We have  * Server Path: \catproc\Sha…'  | 
				No edit summary  | 
				||
| Line 1: | Line 1: | ||
== Introduction  ==  | == Introduction  ==  | ||
This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system  | This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system    | ||
<br>   | |||
== Solution  ==  | == Solution  ==  | ||
We have  | We have    | ||
* Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\  | *Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\    | ||
* Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path  | *Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path    | ||
* Customize data, import to Excel file  | *Customize data, import to Excel file    | ||
* Run Macro (get Macro source code from from server path  | *Run Macro (get Macro source code from from server path<br>   | ||
*  | |||
=== Step 1 - Generate files by Internal Tool  ===  | |||
*Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.  | |||
[[Image:Top1000Tool 1.png]]<br>   | |||
*Go to your local C:\ to get exported files  | |||
<br>   | |||
=== Step 2 - Customize raw data  ===  | === Step 2 - Customize raw data  ===  | ||
* Merge all files we get from step1 to one file  | *Merge all files we get from step1 to one file    | ||
* Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)  | *Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)  | ||
<  | <br>    | ||
* Save file as txt format  | [[Image:Customize Data.png]]   | ||
<br>   | |||
*Save file as txt format  | |||
<br>   | |||
=== Step 2 - Import to Excel  ===  | === Step 2 - Import to Excel  ===  | ||
* Open Excel application -  | *Open Excel application -> select Data tab -> Get External data -> From Text    | ||
* Browse the txt file we get from step2  | *Browse the txt file we get from step2    | ||
* Follow step 1: Choose Delimited option  | *Follow step 1: Choose Delimited option  | ||
<br>    | |||
[[Image:Import step 1.png]]   | |||
<br>   | |||
*Follow step 2: Unselect "Tab" option, Select "Other" option = @  | |||
<br>   | |||
[[Image:Import step 2.png]]   | |||
<br>    | |||
=== Step 3 - Customize Excel data  ===  | === Step 3 - Customize Excel data  ===  | ||
Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."  | Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."    | ||
*Open file "Unicode name" from server path to get list of incorrect names and replacements.   | |||
*Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions  | |||
[[Image:Vlookup function.png]]   | |||
*Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"  | |||
<br>   | |||
=== Step 3 - Run Macro  ===  | === Step 3 - Run Macro  ===  | ||
* Get Macro source code from server path  | *Get Macro source code from server path    | ||
* Orderly run these functions:  | *Orderly run these functions:  | ||
- Run Macro "top1000One" <br>  | |||
- Run Macro "top1000Two" <br>  | - Run Macro "top1000One" <br> - Run Macro "top1000Two" <br> - Run Macro "top1000Three" <br> - Run Macro "top1000Four" <br>    | ||
- Run Macro "top1000Three" <br>  | |||
- Run Macro "top1000Four" <br>  | |||
[[Category:Production_guidelines]]  | [[Category:Production_guidelines]]  | ||
Revision as of 06:36, 7 April 2010
Introduction
This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system
 
Solution
We have
- Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\
 - Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path
 - Customize data, import to Excel file
 - Run Macro (get Macro source code from from server path
 
Step 1 - Generate files by Internal Tool
- Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.
 
- Go to your local C:\ to get exported files
 
 
Step 2 - Customize raw data
- Merge all files we get from step1 to one file
 - Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)
 
 
 
- Save file as txt format
 
 
Step 2 - Import to Excel
- Open Excel application -> select Data tab -> Get External data -> From Text
 - Browse the txt file we get from step2
 - Follow step 1: Choose Delimited option
 
 
 
- Follow step 2: Unselect "Tab" option, Select "Other" option = @
 
 
 
Step 3 - Customize Excel data
Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."
- Open file "Unicode name" from server path to get list of incorrect names and replacements.
 - Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions
 
- Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"
 
 
Step 3 - Run Macro
- Get Macro source code from server path
 - Orderly run these functions:
 
- Run Macro "top1000One" 
 - Run Macro "top1000Two" 
 - Run Macro "top1000Three" 
 - Run Macro "top1000Four" 
 
				



