Confusion regarding following a link
Taha Masood (taha.masood@streaming-networks.com)
Fri, 20 Apr 2001 11:37:17 -0700

Hi folks,

I have a little confusion regarding HTTP, and I would appreciate it if someone could help me solve it.

The problem is as follows. Suppose I give a browser the following URL:

http://directory.google.com/Top/Computers/Algorithms/

It builds a request that, apart from other things, contains the following lines, which are the part I am interested in now:

GET /Top/Computers/Algorithms/ HTTP/1.1
Host: directory.google.com
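
Turning an absolute URL into that request line and Host header can be sketched as below. This is a minimal Python illustration, not how any particular browser does it; `build_get_request` is a hypothetical helper, and it assumes an `http` URL with no `user:password@` part (so the URL's network location can be used directly as the Host value).

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> str:
    """Build a minimal HTTP/1.1 GET request for an absolute http URL.

    Hypothetical helper; assumes the URL carries no user:password@ part.
    """
    parts = urlsplit(url)
    # The request target is the path plus any query; an empty path becomes "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # netloc is "host" or "host:port", which is what the Host header carries.
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"

print(build_get_request("http://directory.google.com/Top/Computers/Algorithms/"))
```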

Fine, the server responds with an HTML page, which I render in my GUI. The HTML contains the following link:

<a href="/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/">Cryptography</a>

My confusion is this: if my user clicks the hyperlink above, what request should I generate?

Until now I have classified the situation as follows. Whenever we are viewing a page on the web and try to follow a link to another page, there can be three cases. In all of them, say the current page is www.abc.com/help/u1/myHelp.html.

FIRST CASE:
The link I try to follow is: "/yourHelp.com"
The effective URL should be: www.abc.com/help/u1/yourHelp.com

SECOND CASE:
The link I try to follow is: "../../TopLevelHelp.com"
The effective URL should be: www.abc.com/TopLevelHelp.com

THIRD CASE:
The link I try to follow is: "www.beta.com/OtherHelp.com"
The effective URL should be: www.beta.com/OtherHelp.com
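
For comparison, the generic URI syntax (RFC 2396 section 5.2, since replaced by RFC 3986 section 5.2) defines how a link is resolved against the current page. Python's `urllib.parse.urljoin` follows the RFC 3986 rules in modern Python 3, so it can be used to check the three cases. The base below assumes an `http://` scheme on the example page, since resolution needs a full base URL; the last two links are extra forms added for contrast.

```python
from urllib.parse import urljoin

# Assumed base: the example page from the message, with an http scheme added.
BASE = "http://www.abc.com/help/u1/myHelp.html"

links = [
    "/yourHelp.com",                      # absolute path: replaces the whole base path
    "../../TopLevelHelp.com",             # relative path: climbs two directories
    "www.beta.com/OtherHelp.com",         # no scheme and no "//": still a relative path
    "//www.beta.com/OtherHelp.com",       # network-path reference: switches host
    "http://www.beta.com/OtherHelp.com",  # absolute URI: used as-is
]

for link in links:
    print(f"{link!r:40} -> {urljoin(BASE, link)}")
```

Under these rules a leading "/" discards the base path rather than appending to it, so "/yourHelp.com" resolves to http://www.abc.com/yourHelp.com, and a link only reaches another host when it starts with a scheme or with "//".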

I implemented a little parser in my application. It is given the URL of the resource currently being displayed and the link we are trying to follow (the value of the href attribute in the <a> tag), and it returns the effective URL that should actually be shown. From that URL I separate the host part and the relative part (the path), build an HTTP request, and send it to the server. This worked fine until today, when I ran into an error that made me believe I was probably NOT understanding things properly.
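
The parsing step described above can be sketched against the resolution algorithm in RFC 3986 section 5.2 (the successor to RFC 2396). This is a minimal Python sketch, not a full resolver: the helper names are hypothetical, loosely named after the RFC's pseudocode functions, and they handle only path references (no scheme, no "//host", no query or fragment) against a base URL that has a host.

```python
def remove_dot_segments(path: str) -> str:
    """Drop "." and ".." segments, following RFC 3986 section 5.2.4."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]              # "/./x" -> "/x"
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]              # "/../x" -> "/x", and drop the last output segment
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)

def merge(base_path: str, ref_path: str) -> str:
    """Append a relative path to the base's "directory" (RFC 3986 section 5.2.3).
    Assumes the base URL has a host, so an empty base path counts as "/"."""
    if base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path

def resolve_path(base_path: str, ref_path: str) -> str:
    """Resolve a path-only reference against a base path (RFC 3986 section 5.2.2).
    Assumes ref_path has no scheme, no query and no fragment."""
    if ref_path.startswith("//"):
        raise ValueError("network-path reference changes the host; not handled here")
    if ref_path == "":
        return base_path
    if ref_path.startswith("/"):
        # Absolute path: the base path is ignored entirely.
        return remove_dot_segments(ref_path)
    return remove_dot_segments(merge(base_path, ref_path))

print(resolve_path("/help/u1/myHelp.html", "../../TopLevelHelp.com"))
print(resolve_path("/Top/Computers/Algorithms/",
                   "/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/"))
```

The key branch is the one for a leading "/": such a link is never appended to the current page's path.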

The problem occurred on the page

http://directory.google.com/Top/Computers/Algorithms/

which contains this HTML:

<a href="/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/">Cryptography</a>

When my user clicks this link, it falls into my FIRST CASE, so the effective URL I construct is:

http://directory.google.com/Top/Computers/Algorithms/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/

I then split off the host and the relative part and build the HTTP request:

GET /Top/Computers/Algorithms/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/ HTTP/1.1
Host: directory.google.com

The server replies that this resource does not exist.

When I follow the same link in MS Internet Explorer, the request it generates is:

GET /Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/ HTTP/1.1
Host: directory.google.com
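
Resolving the Google link with the standard rules reproduces the request target Internet Explorer sent. A short Python check using `urllib.parse.urljoin` and `urlsplit`; the base URL and href are copied from the message above.

```python
from urllib.parse import urljoin, urlsplit

base = "http://directory.google.com/Top/Computers/Algorithms/"
href = "/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/"

# A leading "/" makes href an absolute path: only the scheme and host of the base are kept.
effective = urljoin(base, href)
parts = urlsplit(effective)

print(f"GET {parts.path} HTTP/1.1")
print(f"Host: {parts.netloc}")
```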

I don't understand what the general rules for following links are. Which part of the RFC covers this?

I would really appreciate it if someone could explain this to me.

Thanks in advance,
Regards,
Taha