Difference between revisions of "Retrieve data from top-1000-company site"

From Catglobe Wiki
Jump to: navigation, search
(Created page with '== Introduction == This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system == Solution == We have * Server Path: \catproc\Sha…')
 
Line 1: Line 1:
 
== Introduction  ==
 
== Introduction  ==
  
This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system
+
This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system  
 +
 
 +
<br>
  
 
== Solution  ==
 
== Solution  ==
  
We have
+
We have  
  
* Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\
+
*Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\  
* Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path
+
*Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path  
* Customize data, import to Excel file
+
*Customize data, import to Excel file  
* Run Macro (get Macro source code from from server path
+
*Run Macro (get Macro source code from from server path<br>
 +
*
  
 +
=== Step 1 - Generate files by Internal Tool  ===
  
=== Step 1 - Generate files by Internal Tool  ===
+
*Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.
  
* Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.
+
[[Image:Top1000Tool 1.png]]<br>
  
<Top1000Tool_1 Image>
+
*Go to your local C:\ to get exported files
  
* Go to your local C:\ to get exported files
+
<br>
  
 
=== Step 2 - Customize raw data  ===
 
=== Step 2 - Customize raw data  ===
  
* Merge all files we get from step1 to one file
+
*Merge all files we get from step1 to one file  
* Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)
+
*Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)
  
<Customize data Image>
+
<br>  
  
* Save file as txt format
+
[[Image:Customize Data.png]]
 +
 
 +
<br>
 +
 
 +
*Save file as txt format
 +
 
 +
<br>
  
 
=== Step 2 - Import to Excel  ===
 
=== Step 2 - Import to Excel  ===
  
* Open Excel application -> select Data tab -> Get External data -> From Text
+
*Open Excel application -&gt; select Data tab -&gt; Get External data -&gt; From Text  
* Browse the txt file we get from step2
+
*Browse the txt file we get from step2  
* Follow step 1: Choose Delimited option
+
*Follow step 1: Choose Delimited option
  
  <Import 1 Image>
+
<br>  
* Follow step 2: Unselect "Tab" option, Select "Other" option = @
 
  
  <Import 2 Image>
+
[[Image:Import step 1.png]]
 +
 
 +
<br>
 +
 
 +
*Follow step 2: Unselect "Tab" option, Select "Other" option = @
 +
 
 +
<br>
 +
 
 +
[[Image:Import step 2.png]]
 +
 
 +
<br>  
  
 
=== Step 3 - Customize Excel data  ===
 
=== Step 3 - Customize Excel data  ===
  
Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."
+
Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."  
* Open file "Unicode name" from server path to get list of incorrect names and replacements.
 
* Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions
 
  
<Vlookup Image>
+
*Open file "Unicode name" from server path to get list of incorrect names and replacements.
 +
*Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions
  
* Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"
+
[[Image:Vlookup function.png]]
  
 +
*Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"
 +
 +
<br>
  
 
=== Step 3 - Run Macro  ===
 
=== Step 3 - Run Macro  ===
  
* Get Macro source code from server path
+
*Get Macro source code from server path  
* Orderly run these functions:
+
*Orderly run these functions:
- Run Macro "top1000One" <br>
+
 
- Run Macro "top1000Two" <br>
+
- Run Macro "top1000One" <br> - Run Macro "top1000Two" <br> - Run Macro "top1000Three" <br> - Run Macro "top1000Four" <br>  
- Run Macro "top1000Three" <br>
 
- Run Macro "top1000Four" <br>
 
  
 
[[Category:Production_guidelines]]
 
[[Category:Production_guidelines]]

Revision as of 07:36, 7 April 2010

Introduction

This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system


Solution

We have

  • Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\
  • Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path
  • Customize data, import to Excel file
  • Run Macro (get Macro source code from from server path

Step 1 - Generate files by Internal Tool

  • Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.

Top1000Tool 1.png

  • Go to your local C:\ to get exported files


Step 2 - Customize raw data

  • Merge all files we get from step1 to one file
  • Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)


Customize Data.png


  • Save file as txt format


Step 2 - Import to Excel

  • Open Excel application -> select Data tab -> Get External data -> From Text
  • Browse the txt file we get from step2
  • Follow step 1: Choose Delimited option


Import step 1.png


  • Follow step 2: Unselect "Tab" option, Select "Other" option = @


Import step 2.png


Step 3 - Customize Excel data

Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."

  • Open file "Unicode name" from server path to get list of incorrect names and replacements.
  • Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions

Vlookup function.png

  • Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"


Step 3 - Run Macro

  • Get Macro source code from server path
  • Orderly run these functions:

- Run Macro "top1000One"
- Run Macro "top1000Two"
- Run Macro "top1000Three"
- Run Macro "top1000Four"