Final Programming Project

You may work in teams of 2 or 3. Please submit your assignment before midnight on April 8th on Richard's server: http://leguen.ca/soen229/assignments/view/programming%20project

Overview

So far we have been using Perl for scripts that a user might run at a command line. In this assignment we will move from local users to non-local users by writing the equivalent of the web server software (circled in red below).

Programming Project: simplistic Web Server

  1. Read an HTTP request out of STDIN
  2. Output a corresponding HTTP response to STDOUT
  3. Log the server activities
  4. Write a simple server-side CGI script that saves a file to the server

You need to test your server with an HTML file, a JPG file and a server-side script (all provided below). Your server should behave differently when it is asked for an HTML or JPG file than when it is asked for a server-side script. After you have written your server, you will know enough about how a server works to write a Perl script that runs on and interacts with your server. For part 4 of the project, you need to write a server-side script that uploads a file.

Marking Scheme

  1. Theory (30%)
    1. Demonstrate understanding of your server script (readme, comments and usage) (10%)
    2. Demonstrate understanding of HTTP requests and responses (10%)
    3. Demonstrate understanding of GET and POST methods (5%)
    4. Demonstrate knowledge of Perl (5%)
  2. Coding (70%)
    1. Server script (60%)
      1. ENV hash (10%)
      2. Logging (10%)
      3. Handle CGI requests (test with both GET and POST samples) (10%)
      4. Print HTTP response (10%)
        1. GET method
        2. POST method
      5. Error pages (10%)
      6. Clarity of algorithm design (10%)
  3. BONUS: CGI script to upload a file (10% for uploading a text file, 15% for uploading a binary file)



A detailed tutorial explaining what a web server does, along with suggestions for implementation, is provided below.

0 Open a port so you can access localhost:8080

Richard has provided a script that allows your web server script to communicate with a browser. It only runs on Unix computers. Save the script in the folder where you are developing your project. The script takes your server script as an argument, like this:
perl HttpWebServer.pl perl my_server_script.pl 

After you call the script you can contact your webserver script at http://localhost:8080.

Refer to the instructions in this tutorial.

1 Read an HTTP request

An HTTP (Hypertext Transfer Protocol) request, at its most basic, is a single line that asks the server to "get" a file, like the one below. Since the first text-based browsers this has become more complicated; you can read more about it on Wikipedia.

GET /somedirectory/filename.html

The sample input below shows an HTTP request from a Firefox browser, and the sample output shows how you would save it in a Perl hash. This is an overview; the following sections give more details on how to do this.

Preview of what you will do

Sample input (each HTTP request line), followed by the sample output (the hash entries it becomes):

GET /index.html?var2=hi&var3=yo HTTP/1.1
    REQUEST_METHOD => GET
    QUERY_STRING => var2=hi&var3=yo
    FILE_REQUESTED => /index.html
    CONTEXT_TYPE => html
    HTTP_var2 => hi
    HTTP_var3 => yo
Host: localhost:8083
    SERVER_NAME => localhost:8083
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8
    HTTP_USER_AGENT => Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    HTTP_ACCEPT => text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
    HTTP_ACCEPT_LANGUAGE => en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
    HTTP_ACCEPT_ENCODING => gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    HTTP_ACCEPT_CHARSET => ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
    HTTP_KEEP_ALIVE => 300
Connection: keep-alive
    HTTP_CONNECTION => keep-alive
Cache-Control: max-age=0
    HTTP_CACHE_CONTROL => max-age=0

You can see the HTTP request that your browser sent by reading it in and printing it out. To test this, you can run the following script in place of your web server script:
#!/usr/bin/perl -w
use strict;
my $request = '';

######## loop to read the request into a variable, stopping at the blank line
######## that ends the request headers (or at the end of the input)
while (defined(my $line = <STDIN>)) {
        $request .= $line;
        last unless $line =~ /\S/;
}

####### print the headers; the empty line after them tells the browser
####### that the page content starts here
print "HTTP/1.0 200 OK\n";
print "Content-Type: text/plain\n\n";

######## print the HTTP request in the browser
print "The HTTP request your browser sent:\n$request\n";

The next sections explain how to extract the information from the HTTP Request and store it into a hash in your script.

1.1 A data structure to hold the HTTP request information

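You need the information that is in the HTTP request. To have easy access to it as key=>value pairs, you should put it into the %ENV hash. %ENV is a special hash in which Perl keeps the environment variables of your process. Because programs started with system() inherit these variables, anything you store in %ENV will also be visible to the CGI scripts your server runs. Here is an example: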

Sample %ENV hash
QTINC => /usr/lib/qt-3.3/include
HOST => akchin.cs.concordia.ca
SSH_ASKPASS => /usr/libexec/openssh/gnome-ssh-askpass
NNTPSERVER => newsflash.concordia.ca
REMOTEHOST => 
OSTYPE => linux
SSH_AUTH_SOCK => /tmp/keyring-ncjkax/ssh
LESSOPEN => |/usr/bin/lesspipe.sh %s
KDE_IS_PRELINKED => 1
PWD => /nfs/www/home/v/v_cook/teaching/soen229/assignment3/solution
QT_PLUGIN_PATH => /usr/lib/kde4/plugins
USER => v_cook
LANG => en_US.UTF-8
VISUAL => vim
GROUP => v_cook
GNOME_DESKTOP_SESSION_ID => Default
G_BROKEN_FILENAMES => 1
LOGNAME => v_cook
ORGANIZATION => ENCS - Concordia University
SHLVL => 3
XDG_SESSION_COOKIE => 1c92605ceaa193a7ddd277f948a2f128-1238277167.670071-271864797
INPUTRC => /etc/inputrc
QTLIB => /usr/lib/qt-3.3/lib
PATH => /encs/bin:/usr/bin:/bin:/usr/sbin:/sbin
WINDOWID => 48238120
MODULEPATH => /usr/share/Modules/modulefiles:/etc/modulefiles:
GTK_MODULES => gnomebreakpad
COLORTERM => gnome-terminal
HOSTTYPE => i386-linux
HISTSIZE => 1000
TERM => xterm
DM_CONTROL => /var/run/xdmctl
KDEDIRS => /usr
PAGER => less
XMODIFIERS => @im=none
HOME => /home/v/v_cook
DBUS_SESSION_BUS_ADDRESS => unix:abstract=/tmp/dbus-qE4PrY8ORQ,guid=f37a24cbd6e9aeaf88d7350549ce9c30
SSH_AGENT_PID => 13109
GNOME_KEYRING_PID => 12940
MANPATH => :/encs/man:/usr/share/man
WINDOWPATH => 7
DISPLAY => :0.0
GTK_RC_FILES => /etc/gtk/gtkrc:/home/v/v_cook/.gtkrc-1.2-gnome2
MODULESHOME => /usr/share/Modules
XDM_MANAGED => method=classic
MAIL => /var/spool/mail/v/v_cook
EDITOR => vim
QTDIR => /usr/lib/qt-3.3
VENDOR => intel
LOADEDMODULES => 
GNOME_KEYRING_SOCKET => /tmp/keyring-ncjkax/socket
HOSTNAME => akchin.cs.concordia.ca
OLDPWD => /nfs/www/home/v/v_cook/teaching/soen229/lab9_scripts
DESKTOP_SESSION => default
_ => /encs/bin/perl
SDL_AUDIODRIVER => esd
DESKTOP_STARTUP_ID => 
LS_COLORS => 
SHELL => /encs/bin/tcsh
BASH_ENV => /encs/Share/bash/profile
MACHTYPE => i386
SESSION_MANAGER => local/unix:@/tmp/.ICE-unix/13108,unix/unix:/tmp/.ICE-unix/13108
HISTCONTROL => ignoredups


If you're curious, you can see what is in your %ENV hash using this script:
#!/usr/bin/perl -w
use strict;

my %hash = %ENV;
for my $key ( keys %hash ) {
    my $value = $hash{$key};
    print "$key => $value\n";
}

The next section will talk about how to format the information you got from the HTTP Request, and store it into the %ENV hash.

1.2 Formatting the hash key value pairs

There are a number of small changes you need to make to the HTTP request lines to turn them into %ENV key=>value pairs.

1.2.1 Formatting the Request Method

Let's start with the first line of an HTTP request, which contains several useful pieces of information. Adding all of it to %ENV will take a number of steps, which you need to figure out. Example:
URL:
http://localhost:8080/index.html?var2=hi&var3=yo
The browser will send this HTTP request:
GET /index.html?var2=hi&var3=yo HTTP/1.1
From the first line of the HTTP request your script must create this hash:
HTTP_var3 => yo
FILE_REQUESTED => /index.html
REQUEST_METHOD => GET
HTTP_var2 => hi
QUERY_STRING => var2=hi&var3=yo
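The processing of the first line can be sketched as follows. This is a minimal sketch, not the required solution. The subroutine name parse_request_line is hypothetical. It also stores SERVER_PROTOCOL, which is listed in the table of %ENV keys at the end of this page. It does not decode URL-encoded values such as + or %20.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the first line of an HTTP request into %ENV entries,
# using the key names from the example above (plus SERVER_PROTOCOL).
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                       # drop the CRLF (or LF) line ending
    my ($method, $target, $protocol) = split ' ', $line, 3;
    return 0 unless defined $target;            # not a valid request line

    my ($file, $query) = split /\?/, $target, 2;
    $ENV{REQUEST_METHOD}  = $method;
    $ENV{FILE_REQUESTED}  = $file;
    $ENV{QUERY_STRING}    = $query    // '';
    $ENV{SERVER_PROTOCOL} = $protocol // '';

    # Each name=value pair in the query string gets its own HTTP_name entry.
    # Values are stored exactly as sent; no URL-decoding is done here.
    for my $pair (split /&/, $ENV{QUERY_STRING}) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $ENV{"HTTP_$name"} = $value // '';
    }
    return 1;
}

parse_request_line("GET /index.html?var2=hi&var3=yo HTTP/1.1\r\n");
print "$_ => $ENV{$_}\n"
    for qw(REQUEST_METHOD FILE_REQUESTED QUERY_STRING HTTP_var2 HTTP_var3);
```

Splitting on the first "?" only, and on the first "=" of each pair, keeps any later "?" or "=" characters inside the value.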

1.2.2 If the line is about the host

Another detail: you need to save the Host line as SERVER_NAME, for example:
Host: localhost:8080
$ENV{'SERVER_NAME'} = 'localhost:8080';

1.2.3 For all other lines

As you may have noticed, the keys in %ENV are all uppercase, with an underscore _ between words, and keys that come from request headers start with HTTP_. For help doing this, see the tutorial on regular expressions.
Sample request lines:
Keep-Alive: 300
Sample perl lines to add these to the %ENV:
$ENV{'HTTP_KEEP_ALIVE'} = '300';
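Sections 1.2.2 and 1.2.3 can be combined into one subroutine, called once for each header line between the first line and the blank line. This is a sketch under assumptions. The name store_header_line is hypothetical. It stores Content-Length and Content-Type as CONTENT_LENGTH and CONTENT_TYPE, with no HTTP_ prefix. That choice follows the table of %ENV keys at the end of this page, and validatelogin.cgi relies on it when it reads $ENV{"CONTENT_LENGTH"}.

```perl
use strict;
use warnings;

# Hypothetical helper: store one request header line in %ENV.
#   Host                          -> SERVER_NAME                  (section 1.2.2)
#   Content-Length / Content-Type -> CONTENT_LENGTH / CONTENT_TYPE (table of keys)
#   anything else, e.g. Keep-Alive -> HTTP_KEEP_ALIVE             (section 1.2.3)
sub store_header_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or return 0;

    (my $key = uc $name) =~ tr/-/_/;            # Keep-Alive -> KEEP_ALIVE
    if ($key eq 'HOST') {
        $ENV{SERVER_NAME} = $value;
    } elsif ($key eq 'CONTENT_LENGTH' or $key eq 'CONTENT_TYPE') {
        $ENV{$key} = $value;
    } else {
        $ENV{"HTTP_$key"} = $value;
    }
    return 1;
}

for my $line ("Host: localhost:8080\r\n", "Keep-Alive: 300\r\n") {
    store_header_line($line);
}
print "SERVER_NAME => $ENV{SERVER_NAME}\n";
print "HTTP_KEEP_ALIVE => $ENV{HTTP_KEEP_ALIVE}\n";
```

The header name stops at the first colon. A value such as localhost:8080 therefore keeps its own colon.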

2 Print the HTTP response

After you read the HTTP request, you need to reply with the file requested, along with information about the file, so that the browser knows how to display it and when it has received all of it.

2.1 Locate the file

Just as your command-line shell has a PATH variable that says where to find programs, your server needs a setting that says where its files are; in this case it is called the document root. Store the value of your DOCUMENT_ROOT in the %ENV hash.

A URL (Uniform Resource Locator) on the internet is composed of 2 primary parts:
  1. The domain name, which is licensed to you. Eg: concordia.ca
  2. The path to the file on the server that should be displayed. Eg: info/futurestudents/undergraduate/
A path on a server is also composed of 2 primary parts:
  1. The server root (the document root). Eg: /www/home/c/c_concordia
  2. The path to the file on the server that should be displayed. Eg: info/futurestudents/undergraduate/index.html

If the user gives a path to a directory, the script should display the "index.html" in that directory if there is one.

Before you open the file, make sure that it exists and that the "world" has permission to read it. You will need to reuse some of your code from Assignment 2 to check the permissions on the file before you open it.
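A minimal sketch of locating the file, assuming DOCUMENT_ROOT and FILE_REQUESTED have already been parsed. The helper name locate_file is hypothetical, and so is the rule that refuses any path containing a ".." component. The sketch checks the "world" read bit in the file's mode rather than using -r, because -r only tells you whether the user running the script can read the file.

```perl
use strict;
use warnings;

# Hypothetical helper: map FILE_REQUESTED onto a file under DOCUMENT_ROOT and
# decide on a status code. Returns (status, full_path).
sub locate_file {
    my ($doc_root, $requested) = @_;
    (my $root = $doc_root) =~ s{/+\z}{};        # avoid a doubled slash

    # Refuse requests that try to climb out of the document root with "..".
    return (403, undef) if $requested =~ m{(?:^|/)\.\.(?:/|\z)};

    my $path = $root . $requested;
    if (-d $path) {                             # directory: serve its index.html
        $path .= '/' unless $path =~ m{/\z};
        $path .= 'index.html';
    }
    return (404, $path) unless -e $path;

    # -r only checks *your* permissions; the "world" read bit is 0004 in the mode.
    my $mode = (stat $path)[2];
    return (403, $path) unless $mode & 0004;
    return (200, $path);
}
```

For example, `my ($status, $path) = locate_file($ENV{DOCUMENT_ROOT}, $ENV{FILE_REQUESTED});`. The status can then decide between sending the file and sending a 403 or 404 error page.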

2.2 Content Type

The most important header in your HTTP response is the Content-Type.

Before you can print out the contents of the file, you need to print out some headers so that the browser knows how to display the contents. You can work out the Content-Type from the extension of the file that the user requested.

Here are some common file extensions and their matching Content-Type:
Extension Content-Type
.html text/html
.htm text/html
.css text/css
.txt text/plain
.xml text/xml
.gif image/gif
.jpeg image/jpeg
.jpg image/jpeg
.png image/png
.mp3 audio/mpeg
.pdf application/pdf
other application/octet-stream

Sample Perl code:

print "Content-type: text/html\n\n";
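The extension table above maps directly onto a Perl hash. Below is a minimal sketch built around a hypothetical helper, content_type_for, which takes the requested path. Matching is case-insensitive, so photo.JPG is treated like photo.jpg. Anything not in the table falls back to application/octet-stream.

```perl
use strict;
use warnings;

# The extension table above, as a Perl hash.
my %CONTENT_TYPES = (
    html => 'text/html',  htm => 'text/html',  css => 'text/css',
    txt  => 'text/plain', xml => 'text/xml',   gif => 'image/gif',
    jpeg => 'image/jpeg', jpg => 'image/jpeg', png => 'image/png',
    mp3  => 'audio/mpeg', pdf => 'application/pdf',
);

# Hypothetical helper: pick a Content-Type from the file name's extension.
sub content_type_for {
    my ($path) = @_;
    my ($ext) = $path =~ /\.([^.\/]+)\z/;       # text after the last dot in the name
    return 'application/octet-stream' unless defined $ext;
    return $CONTENT_TYPES{lc $ext} // 'application/octet-stream';
}

print "Content-type: ", content_type_for('/images/photo.JPG'), "\n\n";
```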

2.3 Request normal files

Requests for non-.cgi files are handled as described above. If there are problems, you will need to generate error pages.
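For a normal file, the response is the status line, the Content-Type, a Content-Length, a blank line, and then the file itself. The Content-Length tells the browser when it has received the whole file. The sketch below assumes the file has already been checked as in section 2.1. The name send_file is hypothetical, and the optional third argument exists only so the output can be captured. The file is read in raw (binary) mode so that images such as the provided JPG are not altered.

```perl
use strict;
use warnings;

# Hypothetical helper: send a normal (non-.cgi) file as a complete HTTP response.
# $out defaults to STDOUT; pass another handle to capture the output.
sub send_file {
    my ($path, $type, $out) = @_;
    $out //= \*STDOUT;

    open my $in, '<:raw', $path or return 0;    # raw: binary files stay intact
    my $body = do { local $/; <$in> } // '';    # slurp the whole file
    close $in;

    binmode $out;
    print {$out} "HTTP/1.0 200 OK\r\n";
    print {$out} "Content-Type: $type\r\n";
    print {$out} "Content-Length: ", length($body), "\r\n";
    print {$out} "\r\n";                        # blank line ends the headers
    print {$out} $body;
    return 1;
}
```

The sketch ends header lines with \r\n, which is what the HTTP specification asks for. Browsers generally also accept the plain \n used in the other samples.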

2.3.1 Test your server with these files

2.3.2 Errors

If something goes wrong your server script needs to display information about the error. You should probably use a subroutine with a template to create a minimal error page like the Apache page on the left. If you want to have some fun you can generate a fancier page like the Concordia page on the right.

Sample Error pages:

Your error pages should contain an error code number (e.g. 404, 403), a message about what the problem might be, and the server information. Try creating errors on various websites and see the kinds of information they print (for example, http://users.encs.concordia.ca/~v_cook/looking). HINT: you should be able to replicate this information using the information you stored in %ENV.
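Below is a minimal template-style subroutine for an Apache-like error page. It is a sketch under assumptions. The names error_page and html_escape are hypothetical, as is the server name AFMPSTA/0.1. It assumes SERVER_SOFTWARE, SERVER_NAME and FILE_REQUESTED are already in %ENV. The requested path is escaped because it comes straight from the user.

```perl
use strict;
use warnings;

# Standard HTTP reason phrases for the errors this server is likely to send.
my %REASON = (400 => 'Bad Request', 403 => 'Forbidden',
              404 => 'Not Found',   500 => 'Internal Server Error');

# Escape text before putting it into HTML (the requested path comes from the user).
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: a complete error response built from one template,
# filled in from %ENV (SERVER_SOFTWARE, SERVER_NAME, FILE_REQUESTED).
sub error_page {
    my ($code, $message) = @_;
    my $reason   = $REASON{$code}        // 'Error';
    my $software = $ENV{SERVER_SOFTWARE} // 'unknown server';
    my $host     = $ENV{SERVER_NAME}     // 'localhost';
    my $file     = $ENV{FILE_REQUESTED}  // '';

    my $body = join "\n",
        "<html><head><title>$code $reason</title></head><body>",
        "<h1>$code $reason</h1>",
        '<p>' . html_escape($message) . '</p>',
        '<p>Requested URL: ' . html_escape($file) . '</p>',
        '<hr>',
        '<address>' . html_escape("$software at $host") . '</address>',
        '</body></html>', '';

    return "HTTP/1.0 $code $reason\r\n"
         . "Content-Type: text/html\r\n"
         . 'Content-Length: ' . length($body) . "\r\n\r\n"
         . $body;
}

$ENV{SERVER_SOFTWARE} = 'AFMPSTA/0.1';     # hypothetical name and version
$ENV{SERVER_NAME}     = 'localhost:8080';
$ENV{FILE_REQUESTED}  = '/missing.html';
print error_page(404, 'The requested file was not found on this server.');
```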

Error pages:

2.4 Request for CGI (Server-Side Scripts)

Your server must also handle server-side scripts. In this project, a server-side script written in Perl is called a ‘.cgi’ script and uses the .cgi extension instead of .pl. Other than that, it is just a normal Perl file.

A CGI script generates the content of the page instead of having the server send an existing file, so you need to follow a different procedure than for the other content types. If the .cgi script is executable, print a status line and then call system() to execute the script. The script will generate the Content-Type header and the body of the response.
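The procedure above can be sketched as two hypothetical helpers, cgi_status and run_cgi. One detail is easy to miss. Perl buffers STDOUT, so without autoflush ($|) the status line could reach the browser after the output of the child script. The block form of system used here runs the script directly, without a shell.

```perl
use strict;
use warnings;

# Hypothetical helper: can this .cgi be run? Returns an HTTP status code.
sub cgi_status {
    my ($path) = @_;
    return 404 unless -e $path;
    return 403 unless -f $path && -x $path;     # must be an executable file
    return 200;
}

# Hypothetical helper: print the status line, then let the script print the
# Content-Type header and the body. The child inherits %ENV, so it sees
# REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH and the rest.
sub run_cgi {
    my ($path) = @_;
    local $| = 1;                  # flush our output before the child writes
    print "HTTP/1.0 200 OK\r\n";
    my $rc = system { $path } $path;            # run it directly, no shell
    return $rc == 0;
}
```

In the server, you would call run_cgi only when cgi_status returns 200, and send an error page otherwise. One caveat is worth checking with the POST samples. If your server read the request headers with <STDIN>, Perl's input buffering may already hold part of the POST body, and the child script will not see it. One way around this is to read the body in the server with read and pass it to the script through a pipe (open with '|-').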

2.4.1 Test your server with these files

After you know your server works you should write a .cgi to go with the upload.html. See Section 4 for details.

2.4.2 Errors


3 Keep a Logfile

If your script is run with the ‘l’ option set, it should write to a log file. Since your script only runs for the duration of one request, it must append its output to the log file rather than overwrite it. Every time your script is invoked, it must append a line to the log file similar to the following:

[10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "Mozilla/4.08 [en] (Win98; I ;Nav)"

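This logging includes the following information:
  1. The date and time of the request
  2. The request line sent by the browser
  3. The status code returned by your server
  4. The size of the response in bytes
  5. The User-Agent string of the browser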


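The logging step can be sketched with the core modules Getopt::Std, for the ‘l’ option, and POSIX, for strftime. The helper names and the log file name server.log are hypothetical. How the option actually reaches your script depends on how you start it through HttpWebServer.pl. The %z timezone field comes from the system's strftime.

```perl
use strict;
use warnings;
use Getopt::Std qw(getopts);
use POSIX qw(strftime);

# The 'l' flag turns logging on (e.g. "... my_server_script.pl -l").
my %opts;
getopts('l', \%opts);

# Hypothetical helper: one log line in the format of the example above.
sub log_entry {
    my ($time, $request_line, $status, $bytes, $agent) = @_;
    my $stamp = strftime('%d/%b/%Y:%H:%M:%S %z', localtime $time);
    return sprintf qq{[%s] "%s" %d %d "%s"\n},
        $stamp, $request_line, $status, $bytes, $agent // '-';
}

# Hypothetical helper: open in append mode (>>) so every request adds a line
# instead of replacing the file.
sub append_log {
    my ($logfile, $line) = @_;
    open my $log, '>>', $logfile or return 0;
    print {$log} $line;
    close $log;
    return 1;
}

if ($opts{l}) {
    append_log('server.log',                   # hypothetical log file name
        log_entry(time, 'GET /apache_pb.gif HTTP/1.0', 200, 2326,
                  'Mozilla/4.08 [en] (Win98; I ;Nav)'));
}
```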
4 Write a .cgi that uses POST to upload a file

Now that you have written a web server, you know how to read and write HTTP requests and responses. Your .cgi script will do the reading of the HTTP request itself, and the writing of the response itself. This is particularly useful with the POST method, which sends the data you are transmitting in the body of the request rather than in the URL.

To begin with, I suggest exploring the sample login.html and validatelogin.cgi. There are separate versions of each file for the GET and POST methods, as well as the default validatelogin.cgi, which uses if statements to accept either method.

For more information about writing a .cgi in Perl you can look at the slides from Web Applications.

I recommend writing your script so that it accepts POST, GET or even command-line input, so that you can debug offline. To see how to do this, look at the source code for validatelogin.cgi.
With these if statements, you can even test a .cgi at the command line:

4.0 Sample Source

#!/usr/bin/perl
#alarm=60;
#
print "Content-type: text/html\n\n";
print "This uses either GET or POST to validate the info that was entered on the login.html page.\n";
print "You can also use this script at the command line.\n\n";

# Script can handle either GET or POST, or even command line input
# For more info: http://www.cknuckles.com/webapps/downloads/PowerPoint/chap06.ppt
my $datastring;
if (defined $ENV{"REQUEST_METHOD"} && $ENV{"REQUEST_METHOD"} eq "POST") {
    read(STDIN, $datastring, $ENV{"CONTENT_LENGTH"});
} elsif (exists $ENV{"REQUEST_METHOD"}) {
    $datastring = $ENV{"QUERY_STRING"};
} else {
    print "Offline execution detected\n";
    print "Please enter some data. This is the equivalent to a Query-String in a browser.\n";
    $datastring = <>;
    chomp $datastring;
}
my ($user, $pass) = split(/&/, $datastring, 2);
my $junk;
($junk, $user) = split(/=/, $user, 2);
($junk, $pass) = split(/=/, $pass, 2);
print "This is the username entered $user\n";
print "This is the password entered $pass\n";
my $filename = "passwd.txt";
open(IN, "<", $filename) or die "Cannot open $filename: $!";
# .. more code checking if the user and password are valid see the source file ...

Output at the command line


> perl validatelogin.cgi
Content-type: text/html

This uses either GET or POST to validate the info that was entered on the login.html page.
Offline execution detected
Please enter some data. This is the equivalent to a Query-String in a browser.
username=gina&pass=hi
This is the username entered gina
This is the password entered hi
I looked in the passwd.txt file and the user was not found.
>
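The sample source splits the data string on & and =, but it leaves the browser's URL encoding in place. A password typed as "this isa pass" therefore arrives as this+isa+pass, as the QUERY_STRING example in the table at the end of this page shows. The decoding sketch below uses hypothetical helper names and works the same way for GET, POST and command-line input, since it only needs the data string.

```perl
use strict;
use warnings;

# Hypothetical helper: undo URL encoding in one name or value.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                               # '+' stands for a space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;    # %XX is one byte in hex
    return $s;
}

# Hypothetical helper: "user=a&pass=b" -> (user => 'a', pass => 'b').
sub parse_form_data {
    my ($datastring) = @_;
    my %form;
    for my $pair (split /&/, $datastring // '') {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $form{ url_decode($name) } = url_decode($value // '');
    }
    return %form;
}

my %form = parse_form_data('user=michael&pass=this+isa+pass');
print "user: $form{user}\npass: $form{pass}\n";
```

Each pair is split before decoding. An encoded & or = (%26, %3D) inside a value therefore does not break the pairs apart.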

4.1 HTTP Request processing by your .cgi

You need to write a .cgi script that uses the POST method to upload a file. Read the data in with the read function and save it to a file on the server.

4.1.1 Sample Input

Here is a sample text file to upload.

4.1.2 Sample Output HTTP Request using POST

 -----------------------------196291262324084
Content-Disposition: form-data; name="uploaded-file"; filename="081115assign3.notes.txt"
Content-Type: text/plain

notes from 12.4

search engine

inverted index/inverted file, 

1. occurance list of index terms in dictionary format:
keys are words, proper nouns that might be searched for
Entry(word,Lcollection of pages containing the word)

2. compressed trie for the entries in the dictionray (uindex terms 
the external nodes store the index of the occurance listfor tha tturm

the occurance lists will be big, so just hve a poitner to them in the tree to keep the leaves cleen
find the keyword in the tree, then return the associated occurance list

to facilitate intersection of multiple keywords' occurance list the occurance list should also be a dictionary collection see 11.6



-----------------------------196291262324084--
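The sample above is the body of the POST request. Each part starts with a line made of two dashes followed by the boundary string, and the last delimiter has two extra dashes at the end. The browser announces the boundary itself in the Content-Type header (multipart/form-data; boundary=...), and lines end with CRLF. The sketch below uses hypothetical helper names. It reads the body with read, as section 4.1 asks, and extracts the first uploaded file. It assumes the server has stored the request's Content-Type and Content-Length as CONTENT_TYPE and CONTENT_LENGTH. It is a simplified parser, not a full implementation of the multipart format.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $length bytes of the POST body from $fh.
sub read_post_body {
    my ($fh, $length) = @_;
    $length //= 0;
    binmode $fh;                              # binary uploads must stay intact
    my $body = '';
    while (length($body) < $length) {
        my $got = read($fh, $body, $length - length($body), length($body));
        last unless $got;                     # 0 = end of input, undef = error
    }
    return $body;
}

# Hypothetical helper: return (filename, data) for the first file part of a
# multipart/form-data body, or an empty list if there is none.
sub parse_multipart {
    my ($content_type, $body) = @_;
    my ($boundary) = ($content_type // '') =~ /boundary="?([^";]+)"?/
        or return;
    my $delimiter = "--$boundary";

    for my $part (split /\Q$delimiter\E/, $body) {
        # Part headers and part data are separated by the first empty line.
        my ($head, $data) = split /\r\n\r\n/, $part, 2;
        next unless defined $data;
        my ($filename) = $head =~ /filename="([^"]*)"/ or next;
        $data =~ s/\r\n\z//;                  # CRLF that precedes the delimiter
        return ($filename, $data);
    }
    return;
}
```

In the .cgi this might be used as `my ($name, $data) = parse_multipart($ENV{CONTENT_TYPE}, read_post_body(\*STDIN, $ENV{CONTENT_LENGTH}));`.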



4.2 HTTP Response processing by your .cgi

The response body should say whether the upload was successful and where the file was uploaded, and give a link to the file so that the user can download it again.
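Saving the file and building the response can be sketched as two hypothetical helpers. The upload directory and its URL path (for example /uploads under your document root) are assumptions you would choose yourself. The sketch keeps only the last component of the browser-supplied file name, because some browsers send a full path. It also makes names such as ../../x harmless.

```perl
use strict;
use warnings;

# Hypothetical helper: save uploaded data in $upload_dir. Only the last part of
# the browser-supplied name is kept, so names like "../../x" cannot escape.
sub save_upload {
    my ($upload_dir, $filename, $data) = @_;
    (my $safe = $filename // '') =~ s{.*[/\\]}{}s;   # drop any directory part
    $safe =~ s/[^A-Za-z0-9._-]/_/g;                  # simple, URL-safe name
    return if $safe eq '' || $safe =~ /^\.+\z/;
    open my $out, '>:raw', "$upload_dir/$safe" or return;
    print {$out} $data;
    close $out or return;
    return $safe;
}

# Hypothetical helper: the HTML response described in section 4.2.
sub upload_response {
    my ($saved_name, $url_dir) = @_;
    my $html = "Content-type: text/html\n\n";
    if (defined $saved_name) {
        my $url = "$url_dir/$saved_name";
        $html .= "<p>Upload successful. The file was saved as $url</p>\n"
               . qq{<p><a href="$url">Download it again</a></p>\n};
    } else {
        $html .= "<p>The upload failed.</p>\n";
    }
    return $html;
}
```

Opening the output file in raw mode means the same code handles both the text upload and the binary upload from the bonus.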

Details of the %ENV keys and the HTTP request headers they come from

CONTENT_LENGTH (request header: Content-Length)
    The length (in bytes) of the body of the HTTP request, if provided by the HTTP request's header of the same name.
    Example: ??
CONTENT_TYPE (request header: Content-Type)
    The content type of the body of the HTTP request, if provided by the HTTP request's header of the same name.
    Example: text/html
DOCUMENT_ROOT (no request header)
    The home directory of your "web server"; all requested URLs are considered to be relative to the document root.
    Example: /www/home/u/u_user/
QUERY_STRING (from the request line)
    The query information obtained from the requested URL (anything after a '?' character).
    Example: for validatelogin.cgi?user=michael&pass=this+isa+pass, the query string is user=michael&pass=this+isa+pass
REQUEST_METHOD (from the request line)
    The HTTP method used by this request, provided by the HTTP request's request line.
    Example: GET
SERVER_NAME (request header: Host)
    The server's name or IP address, provided by the HTTP request's Host header.
    Example: users.encs.concordia.ca
SERVER_PROTOCOL (from the request line)
    The name and revision of the HTTP request's protocol, provided by the HTTP request's request line.
    Example: HTTP/1.1
SERVER_SOFTWARE (no request header)
    The name and version of your "web server" script that is answering these requests. Try to have fun here, and give your script a cool name!
    Example: Apache/1.3.31 (Unix) mod_ssl/2.8.19 OpenSSL/0.9.7l
    Example of your own software: AFMPSTA: A Far More Patchy Server Than Apache :)