Sunday, September 12, 2004 12:12 AM bart

Screenscraping my "number of ASP.NET posts"

Ever wondered how I get the number of my ASP.NET Forums posts on my homepage? The answer is screenscraping with a regular expression. Here's the code:

<%@ OutputCache Duration="30" VaryByParam="none" %>
<%@ Control Language="C#" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Net" %>

<script runat="server">
 private string URL = "http://www.asp.net/Forums/User/UserProfile.aspx?tabindex=1&UserName=bdesmet";

 public void Page_Load(object sender, System.EventArgs e)
 {
  try
  {
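   // Download the profile page; the using blocks dispose the client, stream and reader.
   string res;
   using (WebClient clnt = new WebClient())
   using (Stream s = clnt.OpenRead(URL))
   using (StreamReader r = new StreamReader(s))
   {
    res = r.ReadToEnd();
   }
 
   // Capture whatever sits between "contributed to" and "out of".
   Regex regex = new Regex("contributed to ((.|\n)*?) out of", RegexOptions.IgnoreCase);
   Match oM = regex.Match(res);
 
   // Strip the thousands separator; show the error text if the phrase was not found.
   lblPosts.Text = oM.Success ? oM.Groups[1].ToString().Replace(",","") : "unable to retrieve";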
  }
  catch
  {
   lblPosts.Text = "unable to retrieve";
  }
 }
</script>

<asp:Label id="lblPosts" runat="server" />

Pretty simple, isn't it? However, don't forget to cache the whole thing: this is the code of an .ascx, so the OutputCache directive causes "partial page caching" of the homepage, and the remote page is fetched at most once every 30 seconds. The code should also sit in a try...catch block to handle cases such as "scraped site down" or "scraped site redesigned".
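To try the extraction step on its own, without hitting the forums, the regular expression part could be pulled into a small helper. This is only a sketch: the class name PostCountParser and the sample sentence in the usage note are made up for illustration, and the narrower pattern assumes the post count is always written with digits and commas.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original control.
public static class PostCountParser
{
    // Assumption: the count between "contributed to" and "out of"
    // consists only of digits and thousands separators.
    private static readonly Regex PostCount =
        new Regex(@"contributed to\s+([\d,]+)\s+out of", RegexOptions.IgnoreCase);

    // Returns the count without commas, or null when the phrase is missing
    // (for example because the scraped page was redesigned).
    public static string Extract(string html)
    {
        if (html == null) return null;
        Match m = PostCount.Match(html);
        return m.Success ? m.Groups[1].Value.Replace(",", "") : null;
    }
}
```

Calling PostCountParser.Extract("bart has contributed to 1,234 out of 500,000 total posts") returns "1234", and a page without the phrase returns null, so the caller can show "unable to retrieve" instead of an empty label.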
