Robots.txt Catchall Picks

No matter what anyone says, these robots.txt lines work on the Googlebot horde:

User-agent: *
Disallow: /*.epub$
Disallow: /*.swf$
Disallow: /*.xml$
Disallow: /*?
Disallow: /*%20
Disallow: /%3F*
Disallow: /*=
Disallow: */activity
Disallow: */author/
Disallow: */page/
Disallow: */category/
Disallow: */tag/
Disallow: */comments
Disallow: */members
Disallow: */register/
Disallow: */trackback/
Disallow: */user
Disallow: */wp-
Disallow: */wp
Disallow: */xml
Disallow: */1
Disallow: */2

Paste them into the robots.txt checker in Webmaster Tools (under Blocked URLs) and test against URLs you don't want crawled.
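If you'd rather sanity-check patterns offline first, here is a minimal Python sketch of Googlebot-style wildcard matching. It assumes Google's documented semantics (an asterisk matches any run of characters, a trailing dollar sign pins the pattern to the end of the URL); the function names pattern_to_regex and is_blocked are my own, and note that Python's stdlib urllib.robotparser does not understand these wildcards at all.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a Disallow pattern into an anchored regex:
    * matches any run of characters, a trailing $ pins the end."""
    anchor_end = pattern.endswith("$")
    if anchor_end:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore * as a wildcard.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchor_end else ""))

def is_blocked(url_path: str, rules: list[str]) -> bool:
    """True if any Disallow rule matches the URL path from the left."""
    return any(pattern_to_regex(r).match(url_path) for r in rules)

rules = ["/*.epub$", "/*?", "*/tag/", "*/wp-"]
print(is_blocked("/books/novel.epub", rules))    # True: ends in .epub
print(is_blocked("/post?replytocom=42", rules))  # True: query string
print(is_blocked("/about/", rules))              # False: nothing matches
```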

The pages, directories, and formats above are examples from my own robots.txt, written for my needs. But they show the patterns to play with, and from there you can work out what fits your site.

For example, if you change your permalinks from date-based to no date, */1 will block anything under a year starting with 1 (1998, 1999, etc.), and */2 does the same for the 2000s. That let me get rid of a lot of lines.
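A quick hypothetical check of that trick, reusing the is_blocked sketch from above:

```python
# The two year catchalls from the list above.
year_rules = ["*/1", "*/2"]
for path in ["/1998/05/old-post/", "/2009/11/other-post/", "/about/"]:
    print(path, "blocked" if is_blocked(path, year_rules) else "allowed")
# /1998/05/old-post/ blocked    (*/1 catches the 19xx years)
# /2009/11/other-post/ blocked  (*/2 catches the 2xxx years)
# /about/ allowed
```

Being catchalls, these two rules also block any other path segment starting with 1 or 2 (like /page/1 or /2-column-layout/), so test before you commit.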