Toggle menu
876
3.8K
30.2K
279.1K
Catglobe Wiki
Toggle preferences menu
Toggle personal menu
Not logged in
Your IP address will be publicly visible if you make any edits.

Retrieve data from top-1000-company site

From Catglobe Wiki
Revision as of 09:22, 4 October 2011 by Cg_pham (talk | contribs)

<accesscontrol>Administrators,,Cem,,Maysunshine</accesscontrol>

Introduction

This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system


Solution

We have

  • Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\
  • Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path
  • Customize data, import to Excel file
  • Run Macro (get Macro source code from from server path

Step 1 - Generate files by Internal Tool

  • Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.


  • Go to your local C:\ to get exported files


Step 2 - Customize raw data

  • Merge all files we get from step1 to one file
  • Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)



  • Save file as txt format


Step 2 - Import to Excel

  • Open Excel application -> select Data tab -> Get External data -> From Text
  • Browse the txt file we get from step2
  • Follow step 1: Choose Delimited option



  • Follow step 2: Unselect "Tab" option, Select "Other" option = @



Step 3 - Customize Excel data

Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."

  • Open file "Unicode name" from server path to get list of incorrect names and replacements.
  • Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions

  • Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"


Step 3 - Run Macro

  • Get Macro source code from server path
  • Orderly run these functions:

- Run Macro "top1000One"
- Run Macro "top1000Two"
- Run Macro "top1000Three"
- Run Macro "top1000Four"

Step 4 -

Test your work, compare to original site http://top1000.dk/da/top1000/1/rank/ASC