Understanding web URLs

Consider the following URL

http://216.187.231.34/akc/servlet/DisplayServlet?url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni

This URL is a request to the web server located at 216.187.231.34 to render a web page for the report whose ID is 1356. The URL also identifies the owner of the document, so that the document can be displayed inside a master page that belongs to that user. A master page is like a background page.

This URL has the following properties

URL part                        Explanation
http                            Protocol name
://                             Separates the protocol from the rest of the URL string
216.187.231.34                  Web server name or address
/akc/servlet/DisplayServlet     The web server's internal address or identifier for the resource
?                               Start of the arguments
url=DisplayNoteMPURL            Argument 1
&                               Separator before each additional argument
reportId=1356                   Argument 2
ownerUserId=hari_komatineni     Argument 3
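
The breakdown above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's urllib.parse module; the split_url function and the dictionary keys are names chosen here for illustration.

```python
from urllib.parse import urlsplit

URL = ("http://216.187.231.34/akc/servlet/DisplayServlet"
       "?url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni")


def split_url(url):
    """Return the main parts of a URL as a plain dictionary."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # text before "://"
        "host": parts.hostname,      # web server name or address
        "path": parts.path,          # the server's internal identifier
        "arguments": parts.query,    # everything after "?"
    }


if __name__ == "__main__":
    for name, value in split_url(URL).items():
        print(f"{name}: {value}")
```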

Protocol

The first part of the URL, "http", is called the protocol identifier. It tells the web browser or the web server how to interpret the rest of the string. Other well-known protocol identifiers include "ftp", "file", and "mailto". This article focuses on the "http" protocol; the others are easy to look up with a search engine.

Host and port

The second part of the URL, following the protocol, identifies a host or machine using either its name or an IP address. Both forms reach the same machine. Using a name is advantageous where available, as the IP address can change.

A single machine can host several web servers. In that case the client needs to say which web server it wants to reach. This is done through a port number. When a web server starts on a machine, it uses one of the available ports on the machine, and a client needs to know what this port number is. A port number is a logical number, between 0 and 65535, that the machine keeps track of in something like a table, so there are tens of thousands of ports to hand out. Think of it as a notebook in your house where you keep the names of people that you deal with: if you need a new account, you just create a new page with a new number.

But usually, if you don't specify a port in the URL, the client assumes port 80 for the "http" protocol. This is called a well-known port: by convention, a web server on that machine is waiting on this port to service web clients. So with this logic the above URL can be rewritten as

http://216.187.231.34:80/akc/servlet/DisplayServlet ....

Notice how the port number is now explicitly specified.
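
When a URL carries no port, a client falls back to the default for the protocol, which is 80 for "http". The following sketch shows one way to compute the port a client would actually use; the effective_port function and the DEFAULT_PORTS table are illustrative names, not part of any library.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two most common web protocols.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port if the URL has one, else the protocol default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # Returns None for protocols that are not in the table.
    return DEFAULT_PORTS.get(parts.scheme)


if __name__ == "__main__":
    print(effective_port("http://216.187.231.34/akc/servlet/DisplayServlet"))
    print(effective_port("http://216.187.231.34:8080/akc/servlet/DisplayServlet"))
```

The first call has no port in the URL and so yields the default; the second yields whatever port was written explicitly.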

URI: Uniform Resource Identifier

The next part of the URL, "/akc/servlet/DisplayServlet", is called a resource identifier. This is really an instruction to the web server to deliver a particular web page. On a static web server it usually points to a file name, so the web server looks up the file and returns that HTML file.

But on a dynamic web server this string is handed over to server-side logic that decides which web page to deliver. In this particular case it identifies a Java servlet, that is, a Java class that is responsible for delivering the web page. The same servlet, based on the arguments that follow, may deliver several different web pages. For example, in Aspire all web pages are delivered through the same servlet. This means this part of the URL stays the same, and which page gets delivered depends entirely on the arguments that follow.
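
The idea of a single entry point choosing among several pages can be sketched as a lookup keyed on the "url" argument. This is only an illustration: the PAGES table and the choose_page function are hypothetical names, and the sketch says nothing about how Aspire is actually implemented.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical table mapping the value of the "url" argument to a page.
PAGES = {
    "DisplayNoteMPURL": "note page shown inside a master page",
}


def choose_page(request_url):
    """Pick a page from the 'url' argument; the path itself never changes."""
    query = parse_qs(urlsplit(request_url).query)
    # parse_qs returns a list of values per key; take the first one if present.
    page_key = query.get("url", [None])[0]
    return PAGES.get(page_key, "page not found")


if __name__ == "__main__":
    print(choose_page("/akc/servlet/DisplayServlet?url=DisplayNoteMPURL&reportId=1356"))
```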

Arguments

Arguments are a list of key=value pairs that begins after the "?", with the pairs separated from one another by "&". Each pair identifies something meaningful to the web process or the servlet identified by the URI.
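
The argument string from the example URL can be split into its key, value pairs as follows. This is a minimal sketch using parse_qsl from Python's standard library; parse_arguments is a name chosen here for illustration.

```python
from urllib.parse import parse_qsl

QUERY = "url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni"


def parse_arguments(query):
    """Split an argument string on '&' and '=' into a dictionary."""
    # parse_qsl returns a list of (key, value) tuples in their original order.
    return dict(parse_qsl(query))


if __name__ == "__main__":
    for key, value in parse_arguments(QUERY).items():
        print(f"{key} = {value}")
```

Note that every value comes back as a string, so "1356" would need an explicit conversion before being used as a number.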

Relative URL

The URL structure that I have discussed so far is called an "absolute URL", because every aspect of the URL is fully specified. If that is an absolute URL, what is a relative URL?

A relative URL is one in which you don't specify the protocol, the web server, and the port. Let me rewrite the above URL as a relative URL:

/akc/servlet/DisplayServlet?url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni

Notice how the protocol and the web server are removed from the URL. If you type this URL into your web browser on its own, you won't see the document, because the browser doesn't know what the protocol or the web server is. So relative URLs make sense only when you are linking from an existing web page.

If your browser has downloaded a web page from a web server, then the browser knows which web server the page came from. If there are any relative links on that web page, the browser can fill in the protocol and the web server from the page itself. The advantage of relative URLs is that when you change web servers, the links don't have to change, as they are relative to whatever web server their parent pages are located on.

So when you are creating links on web pages, use these relative URLs as much as possible.
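
What the browser does with a relative link can be imitated with urljoin from Python's standard library. In this sketch the first page address is the one used in this article, while www.example.com is a placeholder host introduced only to show that the same link follows the page to a different server.

```python
from urllib.parse import urljoin

RELATIVE_LINK = ("/akc/servlet/DisplayServlet"
                 "?url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni")


def resolve(page_url, link):
    """Combine the address of the current page with a link found on it."""
    return urljoin(page_url, link)


if __name__ == "__main__":
    # The same relative link, found on pages served by two different hosts.
    print(resolve("http://216.187.231.34/akc/servlet/DisplayServlet", RELATIVE_LINK))
    print(resolve("http://www.example.com/index.html", RELATIVE_LINK))
```

The link text is identical in both cases; only the protocol and host borrowed from the parent page differ.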

Webapp: A further refinement of the URL

The URI part of the above URL is

/akc/servlet/DisplayServlet

In Java-based web servers this is further subdivided into two parts: a "web application prefix" and a "servlet identifier".

/akc - web application prefix
/servlet/DisplayServlet - Servlet identifier

For example, if you have two applications on a Java web server, you can separate them as follows:

http://host/application1/servlet/myservlet
http://host/application2/servlet/myservlet

See how "application1" and "application2" reside on the same host or web server.
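
The split into a web application prefix and a servlet identifier can be sketched as follows. The sketch assumes the prefix is always the first path segment, which is a simplification: a real Java web server decides the prefix from its own configuration. The function name split_webapp_path is chosen here for illustration.

```python
def split_webapp_path(path):
    """Split a path into (web application prefix, servlet identifier).

    Assumes the prefix is exactly the first segment of the path.
    """
    first, _, rest = path.lstrip("/").partition("/")
    return "/" + first, "/" + rest


if __name__ == "__main__":
    for path in ("/akc/servlet/DisplayServlet",
                 "/application1/servlet/myservlet",
                 "/application2/servlet/myservlet"):
        prefix, servlet = split_webapp_path(path)
        print(f"{prefix}  {servlet}")
```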

Shortening the servlet names or hiding the servlets

Ordinary web users usually do not care about these nuances, especially the long servlet names. For this reason Java servers have a way to declare short aliases for their servlets, or to associate part of a URL with a servlet invocation. Using this scheme, let me revisit the relative URL I introduced earlier:

/akc/servlet/DisplayServlet?url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni

Now, given a path definition of "display = /servlet/DisplayServlet" on a Java web server, the above URL can be rewritten as

/akc/display?url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni
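
The effect of such a path definition can be imitated with a small lookup table. This is a hypothetical sketch of the idea only: the ALIASES table and the resolve_alias function are invented for illustration, and a real Java web server performs this mapping through its own configuration rather than through code like this.

```python
# Hypothetical alias table: short name -> servlet identifier.
ALIASES = {"/display": "/servlet/DisplayServlet"}


def resolve_alias(path, webapp_prefix="/akc"):
    """Expand a short alias under the web application prefix, if one matches."""
    if not path.startswith(webapp_prefix + "/"):
        return path  # not part of this web application; leave untouched
    local = path[len(webapp_prefix):]
    # Fall back to the original local path when no alias is defined for it.
    return webapp_prefix + ALIASES.get(local, local)


if __name__ == "__main__":
    print(resolve_alias("/akc/display"))
    print(resolve_alias("/akc/servlet/DisplayServlet"))
```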

In summary

In summary, while creating links on a web page, a long URL like

http://216.187.231.34/akc/servlet/DisplayServlet?url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni

can be rewritten as


/akc/display?url=DisplayNoteMPURL&reportId=1356&ownerUserId=hari_komatineni

by taking advantage of relative URLs and some server-specific path mappings.