ASp.Net day

Micro blog

C# HTML Parser

by Satalaj 22. January 2010 10:44

If you came here searching for a C#.NET or VB.NET HTML parser, your search ends here.

While working on screen scraping with C#.NET, I came across a very rich open source HTML parser, the Html Agility Pack, which is capable of parsing an entire HTML document.

You can download it from the link below and use it to parse the fields you need from an HTML page:
http://www.codeplex.com/htmlagilitypack

It is very similar to the libraries that .NET provides for parsing or iterating over XML documents (XmlDocument).
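For comparison, here is a minimal sketch of the built-in XmlDocument API that the sentence above refers to. The sample XML string and the class name are made up for illustration. Note that XmlDocument only accepts well-formed XML, which is why it is usually not enough for real-world HTML.

```csharp
using System;
using System.Xml;

class XmlComparison
{
    static void Main()
    {
        // Hypothetical, well-formed sample; LoadXml throws on markup that is not well-formed.
        string xml = "<form><input name='user' /><input name='pass' /></form>";

        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);

        // Same XPath style as the HTML parser: select every <input> element.
        XmlNodeList inputs = xd.SelectNodes("//input");
        foreach (XmlNode input in inputs)
        {
            // Print the value of each element's name attribute.
            Console.WriteLine(input.Attributes["name"].Value);
        }
    }
}
```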

Code snippet:

This gives you a list of all the input nodes in the document. Here str is the HTML string that you read from the HttpWebResponse object.

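// str holds the HTML text read from the HttpWebResponse
HtmlAgilityPack.HtmlDocument hd = new HtmlAgilityPack.HtmlDocument();
hd.LoadHtml(str);
// SelectNodes returns null when no node matches the XPath
HtmlAgilityPack.HtmlNodeCollection hc = hd.DocumentNode.SelectNodes("//input");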


You can use XPath queries to select nodes, and then iterate over the results and their ChildNodes.
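A slightly fuller sketch of that iteration is below. It assumes the Html Agility Pack assembly is referenced in your project. The sample HTML string, the class name and the "box" id are invented for illustration; in real code str would come from your HttpWebResponse.

```csharp
using System;
using HtmlAgilityPack; // third-party library, must be referenced separately

class InputLister
{
    static void Main()
    {
        // Hypothetical sample markup standing in for the downloaded page.
        string str = "<html><body><div id='box'>"
                   + "<p>Login</p>"
                   + "<input type='text' name='user'>"
                   + "<input type='password' name='pass'>"
                   + "</div></body></html>";

        HtmlDocument hd = new HtmlDocument();
        hd.LoadHtml(str);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection hc = hd.DocumentNode.SelectNodes("//input");
        if (hc != null)
        {
            foreach (HtmlNode input in hc)
            {
                // The second argument is the fallback used when the attribute is absent.
                Console.WriteLine(input.GetAttributeValue("name", "(no name)"));
            }
        }

        // ChildNodes lists the direct children of a single node.
        HtmlNode box = hd.DocumentNode.SelectSingleNode("//div[@id='box']");
        if (box != null)
        {
            foreach (HtmlNode child in box.ChildNodes)
            {
                Console.WriteLine(child.Name);
            }
        }
    }
}
```

The null checks matter because both SelectNodes and SelectSingleNode return null when the XPath matches nothing, so looping over the result directly can throw a NullReferenceException on pages that lack the element.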

Satalaj

Tags:

Asp.net


About Satalaj

My name is Satalaj. I'm a 2010 ASP.NET MVP. I write technical stuff here. www.satalaj.com
