Retrieve data from top-1000-company site

From Catglobe Wiki
Revision as of 07:32, 7 April 2010 by Catglobe (talk | contribs) (Created page with '== Introduction == This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system == Solution == We have * Server Path: \catproc\Sha…')
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Introduction

This Solution is used to retrieve data from site http://top1000.dk/ and then import to Catglobe system

Solution

We have

  • Server Path: \catproc\Share\CatGlobe Teams\Production\Top 1000\
  • Use Tool Top1000dk.exe to get raw data from http://top1000.dk/ site from server path
  • Customize data, import to Excel file
  • Run Macro (get Macro source code from from server path


Step 1 - Generate files by Internal Tool

  • Run Top1000dk.exe Tool, input index of page. Generate all pages. Note: run only one instance of this tool at a time.

<Top1000Tool_1 Image>

  • Go to your local C:\ to get exported files

Step 2 - Customize raw data

  • Merge all files we get from step1 to one file
  • Replace character "~" as "$$" and "|" as "$" (the purpose is separating into columns when importing these files to Excel)

<Customize data Image>

  • Save file as txt format

Step 2 - Import to Excel

  • Open Excel application -> select Data tab -> Get External data -> From Text
  • Browse the txt file we get from step2
  • Follow step 1: Choose Delimited option
  <Import 1 Image>
  • Follow step 2: Unselect "Tab" option, Select "Other" option = @
  <Import 2 Image>

Step 3 - Customize Excel data

Because of limitation of the original site http://top1000.dk/, we encounter some problems with Unicode and long names which are displayed as "..."

  • Open file "Unicode name" from server path to get list of incorrect names and replacements.
  • Replace all incorrect unicode character in file we get from Step2 by correct ones. Tips: we use IFERROR and VLOOKUP functions

<Vlookup Image>

  • Similarly, replace three-dot companies from the file 3-dot companies. We get the list of replacements of three-dot companies from file "3-dot companies"


Step 3 - Run Macro

  • Get Macro source code from server path
  • Orderly run these functions:

- Run Macro "top1000One"
- Run Macro "top1000Two"
- Run Macro "top1000Three"
- Run Macro "top1000Four"