This proved very helpful to me. It's a concise explanation of how to extract data from the structured markup of a website in Python. It's a no-brainer, but sometimes you need someone to show you the no-brainer before it becomes one for you.
The bottom line is that HTML is a string. It's also a tree. You can parse it either way: with string tools like regular expressions, or with a parser that walks the nested tags. Microformats and the semantic web aren't only useful to big spidering companies like Google; they're useful to all of us.
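
To make the string-versus-tree point concrete, here's a minimal sketch using only the Python standard library. It isn't taken from the explanation I'm pointing to. The sample markup, the hCard-style `fn` class, and the names `fn_by_regex`, `ClassTextParser`, and `fn_by_parser` are all my own assumptions for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup using an hCard-style "fn" (formatted name) class.
HTML = (
    '<div class="vcard">'
    '<span class="fn">Jane Doe</span> '
    '<a class="url" href="http://example.com/">site</a>'
    '</div>'
)


def fn_by_regex(html):
    """String view: grab the text right after an element with class="fn"."""
    match = re.search(r'class="fn"[^>]*>([^<]*)<', html)
    return match.group(1) if match else None


class ClassTextParser(HTMLParser):
    """Tree view: collect text nested inside any element carrying a class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.depth = 0    # greater than 0 while inside a matching element
        self.chunks = []  # text fragments found inside matching elements

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Simplification: assumes every start tag has an end tag, so void
        # elements such as <br> inside a match would throw the depth off.
        if self.depth or self.wanted in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def fn_by_parser(html):
    parser = ClassTextParser("fn")
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(fn_by_regex(HTML))
print(fn_by_parser(HTML))
```

Both functions pull the same name out of this tidy sample. The regex version is shorter but only works when the attribute is written exactly as `class="fn"` and the text has no nested tags; the parser version copes with extra classes and nested markup, which is usually why the tree view wins on real pages.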
Hello Dude,
A microformat is a web-based approach to semantic markup which seeks to re-use existing HTML or XHTML tags to convey metadata and other attributes in web pages and other contexts that support HTML, such as RSS. Thanks a lot.
Web Harvester