Author Archives: Adrian Balcan

What you need to know before requesting web scraping services

Before you request web scraping services, you need to know your needs: what data you need, how it should be structured, and where it can be found.

Step 1: Define what data you need

Data needs depend on your purpose. If you want to find new customers, you probably need contact data for players in your industry. If you want to study your competitors, you first need to define who they are. Only after that can you select the data sources (websites, feeds, or other electronic sources) for the extraction.

Search engines such as Google, Bing, and Yahoo are often used to discover and define data sources.

Step 2: Structure of data

The data structure is directly linked to the intended use. In many cases it is a table where each row represents an entity and each cell in that row represents a property of the entity. In other cases it is a chart or another graphic representation built from data extracted from a web source.
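The row-per-entity table described above can be sketched in Python as follows. The hotel names, phone numbers, and URLs are invented placeholders, and `to_csv` is a hypothetical helper used only for illustration.

```python
import csv
import io

# Hypothetical records: each dict is one entity (a row),
# each key is one property of that entity (a column).
rows = [
    {"name": "Hotel A", "phone": "555-0100", "website": "https://example.com/a"},
    {"name": "Hotel B", "phone": "555-0101", "website": "https://example.com/b"},
]

def to_csv(records):
    """Serialize a list of dicts as a CSV table with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

table = to_csv(rows)
print(table)
```

Agreeing on the column names up front makes it easier to describe the expected output when requesting an extraction.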

Step 3: Number of data extractions

In many cases a one-time data extraction is enough. In other cases, such as when you need a regular report, periodic extractions are needed.

If you have defined all of the above points, you are ready to request a quote and an estimate through this contact form.

What you need to know before building a web crawler

This article is for people with technical skills who have some experience in the internet field.

A web spider, or web crawler, is a program built and used for extracting data from a specific website.

Before you start coding a web crawler, you need to know the following:

1. What is your data source (website URL)

2. What is your crawling strategy

If you get data from multiple URLs, how do you start: from an index page, or from a list of all the URLs of interest?

3. Common elements

Crawling is about finding common elements and extracting different data from different locations (URLs), where the data is contained in elements with the same structure, such as a div with a specific class or another HTML element.

4. Programming language

Which programming language and which libraries will you use? This is also the point at which you need to decide whether to use a DOM parser or regex to find the common elements and extract data from them.
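As an illustration of points 2 and 3, here is a minimal Python sketch that starts from an index page and collects the links found inside elements sharing the same structure. The page content, the `item` class name, and the `example.com` base URL are assumptions made for this example. A real crawler would download the page from the data source instead of using an inline string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical index page; a real crawler would fetch this from the source URL.
INDEX_HTML = """
<html><body>
  <div class="item"><a href="/hotel/1">Hotel One</a></div>
  <div class="item"><a href="/hotel/2">Hotel Two</a></div>
  <div class="footer"><a href="/about">About</a></div>
</body></html>
"""

class ItemLinkParser(HTMLParser):
    """Collect links found inside <div class="item"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.div_stack = []  # one bool per open <div>: is it an "item" div?
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("item" in classes)
        elif tag == "a" and any(self.div_stack) and attrs.get("href"):
            # Resolve relative links against the index page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()

parser = ItemLinkParser("https://example.com/")
parser.feed(INDEX_HTML)
links = parser.links
print(links)
```

The link in the footer is skipped because it is not inside an element with the common structure. The collected URLs would then form the list of pages to visit next.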

DOM versus Regex in web scraping

In web scraping there are two methods for filtering data, and the question is which one is best.

The correct answer is: it depends.

The first is to use a DOM (Document Object Model) parser and the second is regex matching (regex is short for regular expressions). Both have advantages and disadvantages.

DOM Parser

Advantages: simple to code
Disadvantages: uses more memory; sensitive to bad HTML

Regex

Advantages: insensitive to bad HTML
Disadvantages: uses more CPU; more difficult to code
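The trade-off around bad HTML can be sketched in Python. This example uses the strict XML parser `xml.dom.minidom` as a stand-in for a DOM parser, which is an assumption of the sketch: many HTML-specific parsers are more forgiving of broken markup. The markup samples and the `price` class are invented for illustration.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

GOOD = '<ul><li class="price">10</li><li class="price">12</li></ul>'
BAD = '<ul><li class="price">10<li class="price">12</ul>'  # unclosed <li> tags

def prices_dom(markup):
    """Build a document tree, then select <li class="price"> elements."""
    doc = minidom.parseString(markup)
    return [
        li.firstChild.data
        for li in doc.getElementsByTagName("li")
        if li.getAttribute("class") == "price"
    ]

# The pattern is tied to the exact attribute layout, which makes it harder to maintain.
PRICE_RE = re.compile(r'<li class="price">\s*([^<]+)')

def prices_regex(markup):
    """Match the text that follows each opening tag, ignoring document structure."""
    return [match.strip() for match in PRICE_RE.findall(markup)]

def dom_fails(markup):
    """Return True if the strict parser rejects the markup."""
    try:
        prices_dom(markup)
        return False
    except ExpatError:
        return True

print(prices_dom(GOOD), prices_regex(GOOD))
print(dom_fails(BAD), prices_regex(BAD))
```

On the well-formed sample both approaches return the same values. On the broken sample the strict parser raises an error, while the regex still finds both prices.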

How to convert .XLS files to .CSV and vice versa

XLS (Microsoft Excel spreadsheet format)

A binary format used by Microsoft Excel for storing data.

CSV (Comma-Separated Values)

A file format for storing data in text files. Every value is separated from the next by a delimiter (often , or ;). The content of a CSV file looks like the following lines:

Year;Make;Model;Length
1997;Ford;E350;2,34
2000;Mercury;Cougar;2,38
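A file like this can also be read in code. The following Python sketch parses the sample above with the standard `csv` module, assuming a semicolon delimiter and a decimal comma in the Length column. `read_rows` is a hypothetical helper name.

```python
import csv
import io

SAMPLE = (
    "Year;Make;Model;Length\n"
    "1997;Ford;E350;2,34\n"
    "2000;Mercury;Cougar;2,38\n"
)

def read_rows(text, delimiter=";"):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

rows = read_rows(SAMPLE)

# Lengths use a decimal comma, so convert it before treating them as numbers.
lengths = [float(row["Length"].replace(",", ".")) for row in rows]

print(rows[0]["Make"], lengths)
```

Because the decimal separator here is a comma, the semicolon delimiter avoids ambiguity between values and decimals.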

Convert XLS to CSV

This is very simple:

1. Open the XLS (Excel spreadsheet) file with Microsoft Excel

2. Choose Save As (from the File or Office menu)

3. Enter a file name

4. Select CSV (Comma delimited) (*.csv) from Save as type (below File name)

5. Press Save, then OK and Yes. That is all.

Convert CSV to XLS file

1. Open the CSV file with Microsoft Excel

2. Choose Save As (from the File or Office menu)

3. Enter a file name

4. Select Excel 97-2003 Workbook (*.xls) from Save as type (below File name)

5. Press Save. That is all.

 

Web scraping in email marketing

Many businesses have difficulty bringing their products to market. Often their market is easy to define. In a well-defined market you can easily research your competitors and your customers and, most valuably, you can find details about every potential customer and send each one an offer.

A real example:

If your business sells cleaning solutions for swimming pools, you may be interested in the contact details (contact person, phone number, email address, website) of all hotels with swimming pools near you, or even in the entire country. With this data you can do direct marketing or email marketing (sending service or product offers by email).