Today I am going to talk about a very useful code library for parsing HTML (at least it was useful for me and saved me a ton of time). It's called HtmlAgilityPack and can be found on CodePlex at the link below:
Recently at work I had to go through the websites of various parliamentary parties in the UK and extract each candidate's name, the constituency they were representing, and the link to their individual profile. With around 10 sites to load and a couple of hundred candidates' details to gather, my manager asked me to make the process handy and reusable for the future. The trick was to read the HTML embedded in each page and extract all the info by parsing it. This also gave me an opportunity to get hands-on experience with XPath and XQuery 🙂, which was not as scary as I thought.
Following is an excerpt of the code-behind file in VB, which runs on the Button Click event to load the page www.labour.co.uk/ppc and extract all the relevant info. I have put comments in the code as well. The logic is very simple, and once you debug the app everything should make sense. I am writing the output to a text file for later retrieval. Hope that helps you get started with HTML Agility Pack.
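As a rough illustration of that logic, here is a minimal sketch rather than the exact project code. It is written as a small console module so it runs on its own; in a web app you would call ExtractCandidates from the Button Click handler. The sample markup, the XPath expressions, the "constituency" class name and the candidates.txt file name are all assumptions made up for the example, so adjust them to whatever the real page source looks like.

```vb
Imports System.IO
Imports HtmlAgilityPack

Module CandidateExtractor

    ' Parses the given HTML and writes one tab-separated line per candidate:
    ' name, constituency, profile link.
    Sub ExtractCandidates(ByVal html As String, ByVal writer As TextWriter)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' ASSUMPTION: every candidate sits in an <li> that contains a profile link.
        Dim items As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//li[a[@href]]")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If items Is Nothing Then
            writer.WriteLine("No candidates found; check the XPath against the page source.")
            Return
        End If

        For Each item As HtmlNode In items
            ' The leading "./" keeps the query relative to the current <li>.
            Dim link As HtmlNode = item.SelectSingleNode("./a[@href]")
            Dim seat As HtmlNode = item.SelectSingleNode("./span[@class='constituency']")

            Dim name As String = HtmlEntity.DeEntitize(link.InnerText).Trim()
            Dim profileUrl As String = link.GetAttributeValue("href", String.Empty)
            Dim constituency As String = String.Empty
            If seat IsNot Nothing Then
                constituency = HtmlEntity.DeEntitize(seat.InnerText).Trim()
            End If

            writer.WriteLine(name & vbTab & constituency & vbTab & profileUrl)
        Next
    End Sub

    Sub Main()
        ' Made-up sample markup so the sketch runs offline. For a live page you
        ' could instead load the document with: New HtmlWeb().Load(url)
        Dim sampleHtml As String = _
            "<ul>" & _
            "<li><a href='/ppc/jane-doe'>Jane Doe</a> <span class='constituency'>Anytown North</span></li>" & _
            "<li><a href='/ppc/john-smith'>John Smith</a> <span class='constituency'>Anytown South</span></li>" & _
            "</ul>"

        ' Write the output to a text file for later retrieval (hypothetical file name).
        Using writer As New StreamWriter("candidates.txt")
            ExtractCandidates(sampleHtml, writer)
        End Using
    End Sub

End Module
```

The only XPath you really need to get started is a query for the repeating element (here the list item) plus a couple of relative queries inside it. Once that works for one party's site, reusing it for the others is mostly a matter of swapping the XPath strings.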
Apologies for not being able to upload the text file.
Happy Coding 🙂