Toggle menu
876
3.8K
30.2K
279.1K
Catglobe Wiki
Toggle preferences menu
Toggle personal menu
Not logged in
Your IP address will be publicly visible if you make any edits.

Retrieve data from top-1000-company site: Difference between revisions

From Catglobe Wiki
Created page with '== Introduction == This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system == Solution == We have * Server Path: \catproc\Sha…'
 
No edit summary
Line 1: Line 1:
== Introduction  ==
== Introduction  ==


This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system
This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system  
 
<br>


== Solution  ==
== Solution  ==


We have
We have  


* Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\
*Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\  
* Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path
*Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path  
* Customize data, import to Excel file
*Customize data, import to Excel file  
* Run Macro (get Macro source code from from server path
*Run Macro (get Macro source code from from server path<br>
*


=== Step 1 - Generate files by Internal Tool  ===


=== Step 1 - Generate files by Internal Tool  ===
*Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.


* Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.
[[Image:Top1000Tool 1.png]]<br>


<Top1000Tool_1 Image>
*Go to your local C:\ to get exported files


* Go to your local C:\ to get exported files
<br>


=== Step 2 - Customize raw data  ===
=== Step 2 - Customize raw data  ===


* Merge all files we get from step1 to one file
*Merge all files we get from step1 to one file  
* Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)
*Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)


<Customize data Image>
<br>  


* Save file as txt format
[[Image:Customize Data.png]]
 
<br>
 
*Save file as txt format
 
<br>


=== Step 2 - Import to Excel  ===
=== Step 2 - Import to Excel  ===


* Open Excel application -> select Data tab -> Get External data -> From Text
*Open Excel application -&gt; select Data tab -&gt; Get External data -&gt; From Text  
* Browse the txt file we get from step2
*Browse the txt file we get from step2  
* Follow step 1: Choose Delimited option
*Follow step 1: Choose Delimited option


  <Import 1 Image>
<br>  
* Follow step 2: Unselect "Tab" option, Select "Other" option = @


  <Import 2 Image>
[[Image:Import step 1.png]]
 
<br>
 
*Follow step 2: Unselect "Tab" option, Select "Other" option = @
 
<br>
 
[[Image:Import step 2.png]]
 
<br>  


=== Step 3 - Customize Excel data  ===
=== Step 3 - Customize Excel data  ===


Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."
Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."  
* Open file "Unicode name" from server path to get list of incorrect names and replacements.
* Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions


<Vlookup Image>
*Open file "Unicode name" from server path to get list of incorrect names and replacements.
*Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions


* Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"
[[Image:Vlookup function.png]]


*Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"
<br>


=== Step 3 - Run Macro  ===
=== Step 3 - Run Macro  ===


* Get Macro source code from server path
*Get Macro source code from server path  
* Orderly run these functions:
*Orderly run these functions:
- Run Macro "top1000One" <br>
 
- Run Macro "top1000Two" <br>
- Run Macro "top1000One" <br> - Run Macro "top1000Two" <br> - Run Macro "top1000Three" <br> - Run Macro "top1000Four" <br>  
- Run Macro "top1000Three" <br>
- Run Macro "top1000Four" <br>


[[Category:Production_guidelines]]
[[Category:Production_guidelines]]

Revision as of 06:36, 7 April 2010

Introduction

This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system


Solution

We have

  • Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\
  • Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path
  • Customize data, import to Excel file
  • Run Macro (get Macro source code from from server path

Step 1 - Generate files by Internal Tool

  • Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.


  • Go to your local C:\ to get exported files


Step 2 - Customize raw data

  • Merge all files we get from step1 to one file
  • Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)



  • Save file as txt format


Step 2 - Import to Excel

  • Open Excel application -> select Data tab -> Get External data -> From Text
  • Browse the txt file we get from step2
  • Follow step 1: Choose Delimited option



  • Follow step 2: Unselect "Tab" option, Select "Other" option = @



Step 3 - Customize Excel data

Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."

  • Open file "Unicode name" from server path to get list of incorrect names and replacements.
  • Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions

  • Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"


Step 3 - Run Macro

  • Get Macro source code from server path
  • Orderly run these functions:

- Run Macro "top1000One"
- Run Macro "top1000Two"
- Run Macro "top1000Three"
- Run Macro "top1000Four"