In web scraping there are two common methods for filtering data out of a page, and the question is: which one is best?
The correct answer is: it depends.
The first is to use a DOM (Document Object Model) parser, and the second is regex matching (regex is short for regular expressions). Both have advantages and disadvantages.
DOM Parser
| Advantages | Disadvantages |
|---|---|
| Simple to code | Uses more memory |
| | Sensitive to malformed HTML |
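
To make the trade-off concrete, here is a minimal sketch of the DOM approach using Python's standard `xml.dom.minidom`. This is a strict XML parser, used here only as a stand-in for a DOM parser; a real scraper would more likely use a dedicated HTML parser, many of which are more forgiving. The sample markup and the names `PAGE` and `extract_prices_dom` are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical, well-formed sample page (not from a real site).
PAGE = """<html><body>
<ul>
  <li class="price">19.99</li>
  <li class="price">5.49</li>
</ul>
</body></html>"""


def extract_prices_dom(markup):
    """Build the whole DOM tree in memory, then read every <li class="price">."""
    doc = minidom.parseString(markup)
    prices = []
    for li in doc.getElementsByTagName("li"):
        if li.getAttribute("class") == "price" and li.firstChild is not None:
            prices.append(li.firstChild.data.strip())
    return prices


prices_dom = extract_prices_dom(PAGE)

# A single unclosed <li> is enough to make this strict parser give up.
try:
    extract_prices_dom("<ul><li class='price'>19.99</ul>")
    broken_ok = True
except ExpatError:
    broken_ok = False
```

The extraction logic is short and readable, but the full tree is held in memory and the malformed snippet is rejected outright, which matches the two disadvantages listed above.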
Regex
| Advantages | Disadvantages |
|---|---|
| Insensitive to malformed HTML | Uses more CPU |
| | More difficult to code |
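
For comparison, a rough sketch of the regex approach with Python's standard `re` module, run against deliberately broken markup. The pattern, the sample string, and the names `PRICE_RE`, `BROKEN_PAGE` and `extract_prices_regex` are assumptions chosen for this example. A pattern like this only covers the attribute layouts it was written for.

```python
import re

# Matches <li class="price"> or <li class='price'> and captures the text
# that follows, up to the next "<". Other attribute orders or extra
# attributes would need a more elaborate pattern.
PRICE_RE = re.compile(r"""<li\s+class=["']price["']\s*>([^<]+)""", re.IGNORECASE)


def extract_prices_regex(markup):
    """Pull prices straight out of the text, without building a tree."""
    return [match.strip() for match in PRICE_RE.findall(markup)]


# Hypothetical malformed input: the <li> elements are never closed.
BROKEN_PAGE = "<ul><li class='price'>19.99<li class=\"price\"> 5.49 </ul>"

prices_regex = extract_prices_regex(BROKEN_PAGE)
```

The unclosed tags do not matter because the pattern never looks at document structure. The cost is that the pattern itself is harder to write and maintain than the DOM traversal.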