Capturing Baidu Spider entries from an Apache Web log file

Q

How can I capture the Baidu Spider entries from an Apache Web log file?

Here are some Web log file entries:

✍: Guest

A

Baidu Spider is a Web crawler from Baidu.com that collects content for the Baidu search engine. Baidu Spider identifies itself with the following user agent string:

Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
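Because the crawler is identified by the "Baiduspider" token in its user agent, one way to test a single log line is to pull out the user agent field and check it for that token. The sketch below assumes the Apache "combined" log format, in which the user agent is the last double-quoted field; the pattern and the helper name is_baidu_spider are illustrative, not part of the original answer.

```python
import re

# Assumption: lines follow Apache's "combined" log format, where the
# user agent is the final double-quoted field on the line.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def is_baidu_spider(line: str) -> bool:
    """Hypothetical helper: True if the line's user agent contains Baiduspider."""
    m = COMBINED.match(line)
    # Lines that do not fit the combined format are treated as non-matches.
    return bool(m) and "Baiduspider" in m.group("agent")

# Hypothetical sample line (192.0.2.1 is a documentation address).
sample = ('192.0.2.1 - - [04/Feb/2013:10:00:00 +0800] "GET / HTTP/1.1" 200 512 '
          '"-" "Mozilla/5.0 (compatible; Baiduspider/2.0; '
          '+http://www.baidu.com/search/spider.html)"')
print(is_baidu_spider(sample))
```

Checking only the user agent field, rather than the whole line, avoids false matches when "Baiduspider" happens to appear in a requested URL or referer.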

The regular expression to capture Baidu Spider entries from an Apache Web log file can be written as follows, with the multi-line modifier "m" specified:
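As an illustration of how the "m" modifier works, here is a minimal Python sketch, assuming a simple pattern of the form ^.*Baiduspider.*$ (not necessarily the exact expression intended above). With re.MULTILINE, ^ and $ match at every line boundary, and since "." does not match a newline, each match is exactly one log line. The function name baidu_entries is hypothetical.

```python
import re

# Illustrative pattern: any whole line containing the Baiduspider token.
# re.MULTILINE is Python's equivalent of the "m" modifier.
BAIDU_LINE = re.compile(r"^.*Baiduspider.*$", re.MULTILINE)

def baidu_entries(log_text: str) -> list[str]:
    """Return every line of log_text that mentions Baiduspider."""
    return BAIDU_LINE.findall(log_text)

log = "a Googlebot\nb Baiduspider/2.0 x\nc Baiduspider\n"
print(baidu_entries(log))
```

For a real log file, the whole file could be read into one string (for example with open(path).read()) and passed to baidu_entries; for very large logs, testing one line at a time avoids loading everything into memory.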


2013-02-04, 6551👍, 0💬