Scribes for the Class on 5 Oct 2001 (Viraj Bhat)

I have put together the scribe notes for the class. In some places I have referred to the web, and in some places I have put in my own opinion (bracketed). Please take some time out to read this. It also summarizes the important deadlines.

Today's class discussion was centred around discovery in two peer-to-peer applications:
1) Gnutella, discussed by Manish
2) Pastry, discussed by Dennis

Before starting the discussion we agreed on the new deadlines and submissions:
Next week, 12 Oct: we submit the names of 3-4 driving applications and submit a report by that day.
19 Oct: submit a 1-2 page individual report of the study.

Gnutella was the first application chosen for discussion.

1) The first few slides were on the protocol header that Gnutella uses to discover its peers. Gnutella runs over TCP/IP. The Gnutella header has these fields:

   Name              Bytes
   1) Message ID     0-15
   2) Function ID    16
   3) TTL remaining  17
   4) Hops taken     18
   5) Data length    19-22

The main messages it sends are:
a) Ping: Used to actively discover hosts on the network. A servent receiving a Ping is expected to respond with one or more Pong descriptors.
b) Pong: The response to a Ping. Includes the address of a connected Gnutella servent and information regarding the amount of data it is making available to the network.
c) Query: The primary mechanism for searching the distributed network. A servent receiving a Query descriptor will respond with a QueryHit if a match is found against its local data set.
d) QueryHit: The response to a Query. This descriptor provides the recipient with enough information to acquire the data matching the corresponding Query.
e) Push: A mechanism that allows a firewalled servent to contribute file-based data to the network.

Note: The documentation at clip2.com refers to each peer as a "servent" (server + client). If both peers are behind firewalls, there can be no interaction between them.
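A minimal Python sketch of the 23-byte header listed above, assuming the byte layout from the clip2.com Gnutella v0.4 document (data length stored little-endian) and its function IDs (Ping 0x00, Pong 0x01, Push 0x40, Query 0x80, QueryHit 0x81). The names `Header`, `new_ping`, `forward` and `flood_reach` are hypothetical. `forward` shows how the TTL and hops fields change each time a peer relays a descriptor, and `flood_reach` gives the upper bound on how many hosts a flooded Ping or Query can reach, assuming a tree-shaped overlay in which no host is reached twice.

```python
import struct
import uuid
from dataclasses import dataclass
from typing import Optional

# Function IDs as listed in the Gnutella v0.4 protocol document.
PING, PONG, PUSH, QUERY, QUERYHIT = 0x00, 0x01, 0x40, 0x80, 0x81

# Message ID (16 bytes), function ID, TTL, hops (1 byte each),
# data length (4 bytes, little-endian): 23 bytes, no padding.
HEADER = struct.Struct("<16sBBBI")


@dataclass
class Header:
    message_id: bytes
    function_id: int
    ttl: int
    hops: int
    data_length: int

    def pack(self) -> bytes:
        return HEADER.pack(self.message_id, self.function_id,
                           self.ttl, self.hops, self.data_length)

    @classmethod
    def unpack(cls, raw: bytes) -> "Header":
        # Only the first 23 bytes are the header; the payload follows it.
        return cls(*HEADER.unpack(raw[:HEADER.size]))


def new_ping(ttl: int = 7) -> Header:
    # A Ping carries no payload, so its data length is 0.
    return Header(uuid.uuid4().bytes, PING, ttl, 0, 0)


def forward(h: Header) -> Optional[Header]:
    """Header a peer would relay: TTL down by one, hops up by one.

    Returns None when the decremented TTL would reach 0 (flooding stops).
    """
    if h.ttl <= 1:
        return None
    return Header(h.message_id, h.function_id, h.ttl - 1, h.hops + 1,
                  h.data_length)


def flood_reach(connections: int, ttl: int) -> int:
    """Upper bound on the number of hosts a flooded message reaches.

    Assumes every host has `connections` neighbours and no host is reached
    twice: the originator sends to all of them, and every later host relays
    to its other `connections - 1` neighbours, for `ttl` hops.
    """
    total, frontier = 0, connections
    for _ in range(ttl):
        total += frontier
        frontier *= connections - 1
    return total


if __name__ == "__main__":
    ping = new_ping()
    raw = ping.pack()
    print(len(raw), Header.unpack(raw) == ping)  # 23 True
    print(flood_reach(8, 7))                     # 1098056
    # Assumption: each reached host receives one copy of an 83-byte packet.
    print(flood_reach(8, 7) * 83)                # 91138648
```

With 8 connections per host and a TTL of 7 the bound comes to 1,098,056 hosts. Real overlays contain cycles, so some copies land on hosts that have already seen the message and the number of distinct hosts reached is lower.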
I had posed a question as to whether Gnutella can run over UDP. TCP connections can get through firewalls more easily, and security issues in Gnutella may also have prompted it to be implemented over TCP.

There were some figures illustrating Gnutella's Ping and Query mechanisms.

Gnutella is anonymous. When you send a query to the GnutellaNet, there is not much in it that can link that query to you. It is not totally impossible to figure out who is searching for what, but it is fairly unlikely, and each time your query is passed on, the chance of discovering who originated it is reduced exponentially.
------------------------------------------------------------------------
One problem with Gnutella was that it did not scale. There is also the problem of free riding. I did not understand what free riding meant in class, but when I went back to my desk and read about it, I found that it means many peers download files without sharing any of their own; moreover, the peers that volunteer to share files are not necessarily those who have desirable ones. This can lead to degradation of the system.

There was a table showing that the maximum number of hosts a Gnutella query can reach is around 1,098,056, with 8 connections per host and a hop count (TTL) of 7. There is an 83-byte overhead on the packet size.

There was a discussion of an algorithm: a Gnutella crawler developed at the University of Cincinnati, which they used to study the structure of the Gnutella network. They report that the network has small-world characteristics.

Software that implements Gnutella includes LimeWire, BearShare, Gnucleus, Phex, SwapHut and XoloX.
---------------------------------------------------------------------
Pastry (A Peer-to-Peer Object Location and Routing Infrastructure)
---------------------------------------------------------------------
Pastry is another approach to peer-to-peer. The contributors are from Rice University and Microsoft.
Pastry uses unique IDs to locate nodes. Each node in a Pastry network has a unique identifier called its nodeId. Given a message and a key, a Pastry node efficiently routes the message to the node whose nodeId is numerically closest to the key, among all currently live Pastry nodes. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries. Pastry takes network locality into account; it seeks to minimize the distance messages travel, according to a scalar proximity metric such as the number of IP routing hops. (This is from the abstract of the paper.)

Each Pastry node has a routing table with O(log_{2^b} N) rows, for N nodes, and 2^b columns, one for each possible value of a base-2^b digit of a nodeId.

There was a brief description of Pastry's API:
a) pastryInit(Credentials, Application)
b) route(msg, key)
c) deliver(msg, key)
d) forward(msg, key, nextId)
e) newLeafs(leafSet)

In the absence of node failures, Pastry routes a message to any node in the overlay network in O(log N) steps.

The experiments were performed on a quad-processor Compaq AlphaServer ES40 (500 MHz 21264 Alpha CPUs) with 6 GB of main memory, running Tru64 UNIX, version 4.0F. The Pastry node software was implemented in Java and executed using Compaq's Java 2 SDK, version 1.2.2-6 and the Compaq FastVM, version 1.2.2-4. (I included this out of my own interest.)
----------------------------------------------------------------------
At the end of the class:
There was a general consensus that we should look up the graph algorithms relevant to these systems. Also, the presenters should put their slides on the web and submit a written report (these are important too).
----------------------------------------------------------------------
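To make the Pastry routing table discussed above more concrete, here is a minimal Python sketch of how a node might choose the next hop for a key. It follows the three-step procedure described in the Pastry paper: use the leaf set if the key falls within its range; otherwise use the routing-table entry whose row is the length of the prefix shared with the key and whose column is the key's next digit; in the rare case that entry is empty, use any known node that shares at least as long a prefix and is numerically closer to the key. Assumptions: nodeIds are equal-length hex strings (b = 4), the `(row, digit)` dictionary layout and the names `shl`, `dist` and `next_hop` are hypothetical, the wrap-around of the nodeId ring is ignored, and the neighborhood set that the paper also consults in the rare case is left out.

```python
from typing import Dict, Set, Tuple

# Hypothetical routing-table layout: (row, digit) -> nodeId. The row is the
# length of the prefix shared with the local nodeId and the digit is the
# key's next hex digit (hex digits correspond to b = 4).
RoutingTable = Dict[Tuple[int, str], str]


def shl(a: str, b: str) -> int:
    """Length, in digits, of the prefix shared by two nodeIds."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def dist(a: str, b: str) -> int:
    """Numerical distance between two nodeIds (ring wrap-around ignored)."""
    return abs(int(a, 16) - int(b, 16))


def next_hop(local: str, key: str, table: RoutingTable,
             leaf_set: Set[str]) -> str:
    """Node to forward a message for `key` to; `local` means deliver here."""
    leaves = leaf_set | {local}
    ordered = sorted(leaves, key=lambda n: int(n, 16))
    # 1) Key within the leaf set's range: go to the numerically closest node.
    if int(ordered[0], 16) <= int(key, 16) <= int(ordered[-1], 16):
        return min(leaves, key=lambda n: dist(n, key))
    # 2) Routing table: row = shared prefix length, column = key's next digit.
    row = shl(local, key)
    entry = table.get((row, key[row]))
    if entry is not None:
        return entry
    # 3) Rare case: a known node sharing at least as long a prefix with the
    #    key and numerically closer to it than the local node.
    known = leaves | set(table.values())
    closer = [n for n in known
              if shl(n, key) >= row and dist(n, key) < dist(local, key)]
    return min(closer, key=lambda n: dist(n, key), default=local)


if __name__ == "__main__":
    local = "65a1fc"
    table = {(0, "d"): "d13da3", (1, "4"): "6489ab"}  # hypothetical entries
    leaf_set = {"65a1f0", "65a200"}
    print(next_hop(local, "d46a1c", table, leaf_set))  # d13da3 (row 0)
    print(next_hop(local, "65a1f1", table, leaf_set))  # 65a1f0 (leaf set)
```

Each hop taken from the routing table lands on a node that shares at least one more digit with the key, which is why the number of routing steps grows only logarithmically with N.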