In web scraping there are two common methods for filtering data out of a page, and the question is: which one is best?
The correct answer is: it depends.
The first is to use a DOM (Document Object Model) parser, and the second is regex matching ("regex" is short for "regular expression"). Both have advantages and disadvantages.
DOM Parser
| Advantages | Disadvantages |
|---|---|
| Simple to code | Uses more memory |
| | Sensitive to malformed HTML |
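
To make the trade-off concrete, here is a minimal sketch of the DOM approach in Python. It assumes the standard library's `xml.dom.minidom`, which is a strict XML parser standing in for a DOM parser here; dedicated HTML parsers are usually more forgiving. The sample markup and the `extract_links` helper are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample pages: the second one is missing a closing </a> tag.
GOOD_HTML = "<html><body><a href='/a'>First</a><a href='/b'>Second</a></body></html>"
BAD_HTML = "<html><body><a href='/a'>First<a href='/b'>Second</a></body></html>"


def extract_links(markup):
    """Return the href of every <a> element, or None if the markup cannot be parsed."""
    try:
        # The whole document is loaded into an in-memory tree.
        document = minidom.parseString(markup)
    except ExpatError:
        # A strict parser gives up on malformed markup.
        return None
    return [a.getAttribute("href") for a in document.getElementsByTagName("a")]


good_links = extract_links(GOOD_HTML)
bad_links = extract_links(BAD_HTML)
print(good_links)
print(bad_links)
```

The extraction itself is a single line once the tree exists, which is the "simple to code" part. The cost is that the whole tree is built in memory, and with this strict parser one unclosed tag is enough to make parsing fail.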
Regex
| Advantages | Disadvantages |
|---|---|
| Insensitive to malformed HTML | Uses more CPU |
| | More difficult to code |
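
For comparison, here is a rough sketch of the regex approach using Python's `re` module. The pattern and the `extract_links_regex` helper are illustrative assumptions, not a complete solution: the pattern only handles quoted `href` values and will miss other valid ways of writing a link.

```python
import re

# Hypothetical sample page with a missing closing </a> tag.
BAD_HTML = "<html><body><a href='/a'>First<a href='/b'>Second</a></body></html>"

# Match an opening <a ...> tag and capture the quoted href value.
HREF_PATTERN = re.compile(
    r"""<a\s[^>]*?href\s*=\s*["']([^"']*)["']""",
    re.IGNORECASE,
)


def extract_links_regex(markup):
    """Return every quoted href value found in an <a> tag, in document order."""
    return HREF_PATTERN.findall(markup)


regex_links = extract_links_regex(BAD_HTML)
print(regex_links)
```

The pattern never looks at how tags are nested or closed, so the unclosed `<a>` makes no difference to it. The price is the pattern itself, which is harder to write and read than a tree lookup and has to be adjusted for each variation in the markup.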