Immediate TODOs

12 03 2007

- Learn to read a String and output a String
- Learn to use the Socket and ServerSocket classes
- Learn to use the URL class
- Learn to use Threads to serve each request
- Code a server and a client. The server should print the message that the client sends.
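
A minimal sketch of the last item, assuming a line-based text protocol: the server accepts connections with ServerSocket, hands each one to its own Thread, and prints the message the client sends. The class name MessageServer, the echo reply, and the use of port 0 (any free port) are our own choices for illustration, not something we have settled on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class MessageServer {
    // Accepts connections until the server socket is closed;
    // each client is served on its own thread.
    static void serve(ServerSocket server) {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();
                new Thread(() -> handle(client)).start();
            } catch (IOException e) {
                break; // accept() fails once the socket is closed
            }
        }
    }

    // Reads one line from the client, prints it, and echoes it back
    // (the echo is an assumption so the client can confirm delivery).
    static void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String msg = in.readLine();
            System.out.println("Client says: " + msg);
            out.println(msg);
        } catch (IOException e) {
            System.err.println("Connection error: " + e.getMessage());
        }
    }

    // Client side: sends one line and returns the server's reply.
    static String send(String host, int port, String msg) throws IOException {
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(msg);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(0); // port 0 = any free port
        Thread acceptor = new Thread(() -> serve(server));
        acceptor.setDaemon(true);
        acceptor.start();
        System.out.println("Reply: " + send("localhost", server.getLocalPort(), "hello"));
        server.close();
    }
}
```

One thread per connection is the simplest thing that works for a first experiment; a thread pool would be the next step if the number of clients grows.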

Books to use:
- Learning Java, 3rd Edition
- Java professional
- Networking in Java.





Stage 1 Complete

8 03 2007

We have just finished Stage 1. We could call it the analysis phase.

These are the things we’ve decided upon.

  • Use Java.
  • Use Pastry, an implementation of a DHT (distributed hash table) in Java.
  • Code our project as a proxy
  • This proxy should implement the caching mechanism and use the DHT to route requests
  • Cache only static HTML pages
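
To make the routing decision concrete, here is a rough sketch of how a proxy could map a requested URL to the node responsible for it. It is a toy model, not Pastry's actual API: it hashes the URL and each node name with SHA-1 and picks the node whose ID is numerically closest to the URL's key on a circular ID space. The class name HomeNodeLookup, the node names, and the choice of SHA-1 are assumptions for illustration; a real DHT routes in several hops without knowing every node.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

class HomeNodeLookup {
    // Size of the circular ID space: SHA-1 produces 160-bit values.
    static final BigInteger RING = BigInteger.ONE.shiftLeft(160);

    // Hashes a string (URL or node name) to a non-negative 160-bit key.
    static BigInteger key(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Shortest distance between two IDs, going either way around the ring.
    static BigInteger distance(BigInteger a, BigInteger b) {
        BigInteger d = a.subtract(b).abs();
        return d.min(RING.subtract(d));
    }

    // Returns the node whose ID is closest to the URL's key.
    // Scanning a full node list is a simplification for illustration.
    static String responsibleNode(String url, List<String> nodes) {
        BigInteger k = key(url);
        String best = null;
        for (String node : nodes) {
            if (best == null
                    || distance(key(node), k).compareTo(distance(key(best), k)) < 0) {
                best = node;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c"); // hypothetical names
        System.out.println(responsibleNode("http://example.com/index.html", nodes));
    }
}
```

Because the choice depends only on the hashes, every proxy that sees the same set of nodes sends a given URL to the same place, which is what lets the caches be shared.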




A few more good links on web caching

2 03 2007

Here are a few more good links that give an idea of how URLs are extracted from an HTML page, algorithms for replacing old, unused URLs in the cache, and the storage format of URLs in the cache.

http://www.mnot.net/cache_docs/#WORK
http://www.seoconsultants.com/articles/1000/cache-control.asp
http://www8.org/w8-papers/2a-webserver/caching/paper2.html
http://www.eecs.harvard.edu/~vino/web/usenix.196/
http://download-east.oracle.com/docs/cd/B15904_01/caching.1012/b14046/cache.htm
http://feedparser.org/docs/http-etag.html
http://searchoracle.techtarget.com/generic/0,295582,sid41_gci1049915,00.html
http://www.gii.upv.es/web_architecture/download/paper-20061031125802-apont.pdf
http://www8.org/w8-papers/2a-webserver/caching/paper2.html#wcaching
http://webjunction.org/do/DisplayContent?id=933
http://www.panicware.com/resource_cookies.html
http://www.cisco.com/web/about/ac123/ac147/ac174/ac199/about_cisco_ipj_archive_article09186a00800c8903.html
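
One common policy for replacing old, unused URLs in a cache is least-recently-used (LRU): when the cache is full, drop the entry that has gone longest without being read. A small sketch using java.util.LinkedHashMap in access order follows. The class name LruPageCache and the idea of storing page bodies as Strings keyed by URL are assumptions for illustration, not a storage format we have decided on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// URL -> page body, evicting the least recently used entry when full.
class LruPageCache extends LinkedHashMap<String, String> {
    private final int capacity;

    LruPageCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after every put; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruPageCache cache = new LruPageCache(2);
        cache.put("http://example.com/a.html", "<html>A</html>");
        cache.put("http://example.com/b.html", "<html>B</html>");
        cache.get("http://example.com/a.html");                   // a is now the most recent
        cache.put("http://example.com/c.html", "<html>C</html>"); // evicts b
        System.out.println(cache.keySet());
    }
}
```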

Each web page within a website is an HTML file with its own URL. Once the pages are created, they are typically linked together through a navigation menu made of hyperlinks.

http://www2003.org/cdrom/papers/refereed/p096/p96-broder.html#Knut73
http://www.clevercomponents.com/articles/article010/urlextractor.asp
http://www.nirsoft.net/utils/addrview.html
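
As a first attempt at pulling the hyperlinks out of a page, a regular expression over the href attributes of anchor tags is enough to experiment with. This is only a sketch: the class name LinkExtractor is ours, the pattern handles quoted href values only, and a real HTML parser would be more robust against malformed markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    // Group 1 is the quote character, group 2 the URL between the quotes.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values in the order they appear in the page.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<ul><li><a href=\"/home.html\">Home</a></li>"
                + "<li><a href='http://example.com/about.html'>About</a></li></ul>";
        System.out.println(extract(html));
    }
}
```

The extracted values can be relative (like /home.html), so they would still need to be resolved against the page's own URL before being used as cache keys.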

The HTTP GET message is used to retrieve a document given its URL. It is clear from the HTTP specification that when GET is passed to a cache, the cache may choose to return a cached document; GET alone does not guarantee that it will return a fresh page.
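
A sketch of how our proxy might decide whether a cached copy can answer a GET. It assumes the only freshness information is the max-age directive of the response's Cache-Control header, and that a client asking for Cache-Control: no-cache always forces a trip to the origin server. The class and method names are hypothetical, and the real HTTP rules (Expires, Age, validators such as ETag) are more involved than this.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Freshness {
    private static final Pattern MAX_AGE =
            Pattern.compile("\\bmax-age\\s*=\\s*(\\d+)", Pattern.CASE_INSENSITIVE);

    // Returns the max-age value in seconds, or -1 if the header has none.
    static long maxAgeSeconds(String cacheControl) {
        if (cacheControl == null) {
            return -1;
        }
        Matcher m = MAX_AGE.matcher(cacheControl);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    // True if the cached copy may be returned without contacting the origin server.
    static boolean canServeFromCache(String requestCacheControl,
                                     String responseCacheControl,
                                     long storedAtMillis,
                                     long nowMillis) {
        // The client explicitly asked for a fresh page.
        if (requestCacheControl != null
                && requestCacheControl.toLowerCase(Locale.ROOT).contains("no-cache")) {
            return false;
        }
        long maxAge = maxAgeSeconds(responseCacheControl);
        if (maxAge < 0) {
            return false; // no lifetime given: be conservative and refetch
        }
        long ageSeconds = (nowMillis - storedAtMillis) / 1000;
        return ageSeconds < maxAge;
    }

    public static void main(String[] args) {
        long storedAt = System.currentTimeMillis() - 30_000; // cached 30 seconds ago
        System.out.println(canServeFromCache(null, "public, max-age=60",
                storedAt, System.currentTimeMillis()));
    }
}
```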

http://iw3c2.cs.ust.hk/WWW5/www5conf.inria.fr/fich_html/papers/P2/Overview.html

http://www.codersource.net/csharp_screen_scraping.html





WEB CACHING

1 03 2007




The Project Topic

16 02 2007

The project topic has been decided as “Decentralized Web Cache”. First, an introduction to the project.

The key idea is to enable web browsers on desktop machines to share their local caches, forming an efficient and scalable web cache without the need for dedicated hardware and the associated administrative cost. We propose and evaluate a decentralized web caching algorithm called the “home store” method. It achieves the benefits of decentralization, such as being scalable, self-organizing and resilient to node failures, while imposing low overhead on the participating nodes.

Some suggested readings are:





Quick Start

18 12 2006