
DOM versus Regex in web scraping

In web scraping there are two common methods for extracting data from a page, and the question is which one is best.

The correct answer is: it depends.

The first is to use a DOM (Document Object Model) parser, and the second is regex matching (regex is short for regular expressions). Both have advantages and disadvantages.

DOM Parser

Advantages        Disadvantages
Simple to code    Uses more memory
                  Sensitive to bad HTML
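
As a rough illustration of both columns, here is a minimal sketch that extracts link targets with a DOM tree. It assumes Python's standard-library xml.dom.minidom, which is a strict XML parser chosen here only because it ships with Python; HTML-aware DOM parsers are usually more forgiving. The function name extract_links_dom and the GOOD and BAD sample strings are made up for this example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def extract_links_dom(markup):
    """Return the href of every <a> element, read from a DOM tree."""
    # The whole document is parsed into an in-memory tree first,
    # which is where the extra memory use comes from.
    doc = minidom.parseString(markup)
    return [
        a.getAttribute("href")
        for a in doc.getElementsByTagName("a")
        if a.hasAttribute("href")
    ]


# Hypothetical sample pages: one well formed, one with unclosed tags.
GOOD = '<html><body><a href="/one">One</a><a href="/two">Two</a></body></html>'
BAD = '<html><body><a href="/one">One<a href="/two">Two</a></body>'

links = extract_links_dom(GOOD)

# The strict parser refuses the malformed page instead of guessing.
try:
    extract_links_dom(BAD)
    bad_parsed = True
except ExpatError:
    bad_parsed = False
```

The extraction itself is a single tree query, which is the "simple to code" part. The malformed page fails before any data can be read, which is the "sensitive to bad HTML" part.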

Regex

Advantages                Disadvantages
Insensitive to bad HTML   Uses more CPU
                          More difficult to code
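
For comparison, a minimal sketch of the regex approach on a malformed page, using Python's standard re module. The pattern, the function name extract_links_regex and the BAD sample string are assumptions made for this example, and the pattern only handles quoted href attributes on <a> tags, so it is not a general solution.

```python
import re

# Match an opening <a ...> tag and capture the quoted href value.
# Only quoted attribute values are handled (an assumption of this sketch).
HREF_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def extract_links_regex(markup):
    """Return every href found by pattern matching, with no tree built."""
    return HREF_RE.findall(markup)


# Hypothetical malformed page: the first <a> and <html> are never closed.
BAD = '<html><body><a href="/one">One<a href="/two">Two</a></body>'

links = extract_links_regex(BAD)
```

The unclosed tags do not matter because the pattern only looks at the text around each href. The cost is the pattern itself, which is harder to write and to read than a tree query and has to be extended by hand for every variation in the markup.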