Apr 25 2006

Lightbox usage for calendar

by John Dyer

I just made my first implementation of Lightbox (the "Gone Wild" edition). Lightbox (original, Gone Wild) is a small JavaScript tool for making accessible modal windows. I've implemented a slightly tweaked version of it on a calendar page (see pics below) and on a page that shows a faculty member's publications (example). I've also used some IE hacks to get IE6 to render alpha-transparent PNGs properly, which gives a nicer overlay. Rather than use the full-fledged ASP.NET Calendar control, I wrote a custom ASP.NET control that renders a CSS-only table. I would have liked to make the calendar URL-hackable (able to move through months via URLs like /calendar/2006/09/), but for now I'm using postbacks.

Before clicking the link
After clicking on a calendar entry
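For the URL-hackable calendar idea, the month could be read from the request path instead of from postback state. This is a minimal sketch, assuming paths of the form /calendar/2006/09/; the CalendarUrl class and its TryParse and Build methods are hypothetical names, not part of the custom control described above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the calendar control above):
// maps /calendar/yyyy/mm/ paths to a month and back.
public static class CalendarUrl
{
    // Matches /calendar/2006/09 with or without a trailing slash.
    private static readonly Regex MonthPath =
        new Regex(@"^/calendar/(\d{4})/(\d{2})/?$", RegexOptions.IgnoreCase);

    // Returns true and the first day of the month when the path is a valid calendar URL.
    public static bool TryParse(string path, out DateTime month)
    {
        month = DateTime.MinValue;
        Match m = MonthPath.Match(path);
        if (!m.Success) return false;

        int year = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        int monthNumber = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);

        // Reject values DateTime cannot represent, e.g. /calendar/2006/13/ or year 0000.
        if (year < 1 || monthNumber < 1 || monthNumber > 12) return false;

        month = new DateTime(year, monthNumber, 1);
        return true;
    }

    // Builds the path for a given month, e.g. for "previous" and "next" links.
    public static string Build(DateTime month)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "/calendar/{0:D4}/{1:D2}/", month.Year, month.Month);
    }
}
```

Previous and next links could then be built with CalendarUrl.Build(month.AddMonths(-1)) and CalendarUrl.Build(month.AddMonths(1)). Routing such paths to the calendar page would still need URL rewriting, which this sketch does not cover.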
Apr 19 2006

WebServices Single Sign-On for ASP.NET 2.0

by John Dyer

A while back I wrote about a Login provider for ASP.NET that uses WebServices.

The code is now available at http://freetextbox.com/files/ under "Code Lab." The direct link is here: Single Sign-On.

Apr 7 2006

WebCrawler Engine in C# (first draft)

by John Dyer

A few weeks ago, I wrote about using SearchAroo as a spider to index a site with DotLucene. I've written a new WebCrawler using SearchAroo as a base and turned it into a library that can be reused for other applications.

Download Web Crawler (zip file with WebCrawler engine and sample web and forms apps)

Here are the improvements I've made:

  1. Gets text from the following HTML tag attributes: alt, title, summary, longdesc
  2. Better resolution of relative URLs
  3. The WebDocument object keeps a record of every file it links to, including internal and external links as well as images. This is useful for finding missing images or broken outgoing links on your site.
  4. Compiled into a reusable library (the author of SearchAroo preferred not to ship a DLL, but I find it much more usable this way), so it can be plugged into any indexing framework or used for other purposes such as simple link checking.
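Improvements 1 through 3 can be illustrated with standard .NET classes. This is a minimal sketch, not the WebCrawler's actual code: it assumes a regular expression is good enough for quoted attribute values (a real HTML parser is more robust), and the LinkTools class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helpers illustrating the crawler improvements; not the library's API.
public static class LinkTools
{
    // Matches alt, title, summary or longdesc attributes with single- or double-quoted values.
    private static readonly Regex TextAttribute = new Regex(
        @"\b(?:alt|title|summary|longdesc)\s*=\s*(?:""([^""]*)""|'([^']*)')",
        RegexOptions.IgnoreCase);

    // Collects the non-empty text of the attributes listed in improvement 1.
    public static List<string> ExtractAttributeText(string html)
    {
        List<string> values = new List<string>();
        foreach (Match m in TextAttribute.Matches(html))
        {
            // Group 1 holds a double-quoted value, group 2 a single-quoted one.
            string value = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
            if (value.Length > 0) values.Add(value);
        }
        return values;
    }

    // Resolves an href against the page it appeared on, e.g. "../a.htm" or "/b.htm".
    public static Uri Resolve(Uri pageUri, string href)
    {
        return new Uri(pageUri, href);
    }

    // Treats a link as internal when it points at the same host as the start URL.
    public static bool IsInternal(Uri baseUri, Uri link)
    {
        return string.Equals(baseUri.Host, link.Host, StringComparison.OrdinalIgnoreCase);
    }
}
```

Comparing hosts is one simple way to split internal from external links; a crawler might also compare scheme and port, or decide whether subdomains count as internal.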

Here is the basic code to get it running:

using Refresh.Web; // the library's namespace (see the note at the end of this post)

class CrawlerSample {
    public void Run() {
        string baseUrl = "http://mywebsite.com/";
        CrawlerEngine crawler = new CrawlerEngine();
        crawler.OnDocumentLoaded += new DocumentHandler(crawler_OnDocumentLoaded);
        crawler.Crawl(baseUrl);
    }

    void crawler_OnDocumentLoaded(WebDocumentBase webDocument, int level) {
        // do indexing code here
        // WebDocumentBase is the base class for all documents that are downloaded and spidered.
        // It has the following properties: Uri, ContentType, MimeType, Encoding, Length,
        // TextData, InternalLinks, ExternalLinks, ImageSrcs

        // if the file is an HTML file, it can be cast to HtmlDocument,
        // which adds the properties Title, Description, Keywords and Html
        HtmlDocument htmlDocument = webDocument as HtmlDocument;
        if (htmlDocument != null) {
            // e.g. index htmlDocument.Title and htmlDocument.Description as well
        }

        // future additions will hopefully have plugins for PdfDocument and WordDocument
    }
}

Future things I'd like to add:

  1. Support for other document types (PDF, Word and other Office formats), as DotLucene's indexer has.
  2. More events to help steer the crawling
  3. Extra weight for text in heading tags (h1, h2, etc.)

Please note: the namespace "Refresh.Web" is for a future business endeavor. The code is released under a Creative Commons Attribution license. If you're interested in using it, please leave a comment about additional features you'd like to see.