Web Technologies Web Architectures Prof. Beat Signer Department of Computer Science Vrije Universiteit Brussel
http://www.beatsigner.com 2 December 2005
Basic Client-Server Web Architecture
[Figure: the client sends an HTTP request over the Internet to the server, which answers with an HTTP response]
Effect of typing http://www.vub.ac.be in the browser bar
  (1) use the Domain Name System (DNS) to get the IP address for www.vub.ac.be (answer 134.184.129.2)
  (2) create a TCP connection to 134.184.129.2
  (3) send an HTTP request message over the TCP connection
  (4) visualise the received HTTP response message in the browser
October 6, 2017
Beat Signer - Department of Computer Science -
[email protected]
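The four steps can be sketched with plain JDK classes. This is only an illustration under assumptions: the class and method names (BrowserRequestSketch, buildGetRequest, fetch) are made up for this example, "Connection: close" is added so that the server ends the connection after one response, and a real browser does far more (caching, redirects, rendering).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class BrowserRequestSketch {

    // Step (3): the text of a minimal HTTP/1.1 GET request.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: one request per connection
             + "\r\n";
    }

    // Steps (1) to (3), then reads the raw response that a browser would visualise in step (4).
    static String fetch(String host, String path) throws IOException {
        InetAddress address = InetAddress.getByName(host);   // (1) DNS lookup
        try (Socket socket = new Socket(address, 80)) {       // (2) TCP connection to port 80
            OutputStream out = socket.getOutputStream();
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII)); // (3)
            out.flush();
            InputStream in = socket.getInputStream();
            return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(buildGetRequest("www.vub.ac.be", "/"));
        // Uncomment to contact the server (needs network access):
        // System.out.println(fetch("www.vub.ac.be", "/"));
    }
}
```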
Web Server
Tasks of a web server
  (1) set up connection
  (2) receive and process HTTP request
  (3) fetch resource
  (4) create and send HTTP response
  (5) logging
[Figure: Worldwide Web Servers, http://news.netcraft.com]
The most prominent web servers are the Apache HTTP Server and Microsoft's Internet Information Services (IIS)
A lot of devices have an embedded web server
  printers, WLAN routers, TVs, ...
Example HTTP Request Message
    GET / HTTP/1.1
    Host: www.vub.ac.be
    User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-gb,en;q=0.5
    Accept-Encoding: gzip, deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Connection: keep-alive
Example HTTP Response Message
    HTTP/1.1 200 OK
    Date: Thu, 03 Oct 2013 17:02:19 GMT
    Server: Apache/2.2.14 (Ubuntu)
    X-Powered-By: PHP/5.3.2-1ubuntu4.15
    Content-Language: nl
    Set-Cookie: lang=nl; path=/; domain=.vub.ac.be; expires=Mon, 18-Sep-2073 17:02:16 GMT
    Content-Type: text/html; charset=utf-8
    Keep-Alive: timeout=15, max=987
    Connection: Keep-Alive
    Transfer-Encoding: chunked

    ... Vrije Universiteit Brussel | Redelijk eigenzinnig ...
HTTP Protocol
Request/response communication model
[Figure: HTTP request from the client to the server, HTTP response back to the client]
Communication always has to be initiated by the client
Stateless protocol (no sessions)
HTTP can be used on top of various reliable protocols
  TCP is by far the most commonly used one
  runs on TCP port 80 by default
Latest version: HTTP/2 (May 2015)
HTTPS scheme used for encrypted connections
Uniform Resource Identifier (URI)
A Uniform Resource Identifier (URI) uniquely identifies a resource
There are two types of URIs
  Uniform Resource Locator (URL)
    - contains information about the exact location of a resource
    - consists of a scheme, a host and the path (resource name)
    - e.g. https://vub.academia.edu/BeatSigner
    - problem: the URL changes if resource is moved!
      • idea of Persistent Uniform Resource Locators (PURLs) [https://purl.oclc.org]
  Uniform Resource Name (URN)
    - unique and location independent name for a resource
    - consists of a scheme name, a namespace identifier and a namespace-specific string (separated by colons)
    - e.g. urn:ISBN:3837027139
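The parts of the two example identifiers can be pulled apart with a few lines of Java. A minimal sketch: java.net.URI is part of the JDK, whereas the class UriPartsSketch, its two methods and the labels "nid" and "nss" are names chosen only for this illustration.

```java
import java.net.URI;

final class UriPartsSketch {

    // Describes a URL by its scheme, host and path.
    static String describeUrl(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme() + " host=" + uri.getHost() + " path=" + uri.getPath();
    }

    // Describes a URN by its scheme name, namespace identifier (nid)
    // and namespace-specific string (nss).
    static String describeUrn(String text) {
        String[] parts = text.split(":", 3);   // at most three parts, so colons in the last part survive
        if (parts.length != 3 || !parts[0].equalsIgnoreCase("urn")) {
            throw new IllegalArgumentException("not a URN: " + text);
        }
        return "scheme=" + parts[0] + " nid=" + parts[1] + " nss=" + parts[2];
    }

    public static void main(String[] args) {
        System.out.println(describeUrl("https://vub.academia.edu/BeatSigner"));
        // prints: scheme=https host=vub.academia.edu path=/BeatSigner
        System.out.println(describeUrn("urn:ISBN:3837027139"));
        // prints: scheme=urn nid=ISBN nss=3837027139
    }
}
```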
HTTP Message Format
  start line
    HTTP/1.1 200 OK
  header field(s)
    Date: Thu, 03 Oct 2013 17:02:19 GMT
    Server: Apache/2.2.14 (Ubuntu)
    X-Powered-By: PHP/5.3.2-1ubuntu4.15
    Transfer-Encoding: chunked
    Content-Type: text/html
  blank line (CRLF)
  message body (optional)
    ...
Request and response messages have the same format
    HTTP_message = start_line , {header} , "CRLF" , {body};
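The message format can be read back with a small parser. This is a sketch under assumptions, not a complete HTTP parser: the class name HttpMessageSketch is invented here, header names are kept exactly as written (real header names are case-insensitive), and chunked bodies or repeated headers are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HttpMessageSketch {
    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpMessageSketch(String raw) {
        // The first blank line (CRLF CRLF) separates start line and headers from the optional body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split < 0 ? raw : raw.substring(0, split);
        body = split < 0 ? "" : raw.substring(split + 4);

        String[] lines = head.split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        HttpMessageSketch m = new HttpMessageSketch(
            "HTTP/1.1 200 OK\r\n"
          + "Server: Apache/2.2.14 (Ubuntu)\r\n"
          + "Content-Type: text/html\r\n"
          + "\r\n"
          + "<html>...</html>");
        System.out.println(m.startLine);
        System.out.println(m.headers);
        System.out.println(m.body);
    }
}
```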
HTTP Request Message
Request-specific start line
    start_line = method , " " , resource , " " , version;
    method = "GET" | "HEAD" | "POST" | "PUT" | "TRACE" | "OPTIONS" | "DELETE";
    resource = complete_URL | path;
    version = "HTTP/" , major_version , "." , minor_version;
Methods
  GET: get a resource from the server
  HEAD: get the header only (no body)
  POST: send data (in the body) to the server
  PUT: store request body on server
  TRACE: get the "final" request (after it has potentially been modified by proxies)
  OPTIONS: get a list of methods supported by the server
  DELETE: delete a resource on the server
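The start line grammar translates almost directly into a regular expression. A hedged sketch: RequestLineSketch and its members are illustrative names, and only the seven methods listed on this slide are accepted.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RequestLineSketch {
    static final Set<String> METHODS =
        Set.of("GET", "HEAD", "POST", "PUT", "TRACE", "OPTIONS", "DELETE");

    // method SP resource SP "HTTP/" major_version "." minor_version
    static final Pattern START_LINE = Pattern.compile("(\\S+) (\\S+) HTTP/(\\d+)\\.(\\d+)");

    // Returns {method, resource, version} or throws if the line does not follow the grammar.
    static String[] parse(String line) {
        Matcher m = START_LINE.matcher(line);
        if (!m.matches() || !METHODS.contains(m.group(1))) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) + "." + m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("GET /beat-signer HTTP/1.1");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
        // prints: GET | /beat-signer | 1.1
    }
}
```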
HTTP Response Message
Response-specific start line
    start_line = version , " " , status_code , " " , reason;
    version = "HTTP/" , major_version , "." , minor_version;
    status_code = digit , digit , digit;
    reason = string_phrase;
Status codes
  100-199: informational
  200-299: success (e.g. 200 for 'OK')
  300-399: redirection
  400-499: client error (e.g. 404 for 'Not Found')
  500-599: server error (e.g. 503 for 'Service Unavailable')
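Since the class of a status code is given by its first digit, a client can classify a response with an integer division. A small sketch; the class and method names are chosen only for this example.

```java
final class StatusClassSketch {

    // Maps a three-digit status code to the class it belongs to.
    static String classify(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: throw new IllegalArgumentException("not an HTTP status code: " + statusCode);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(200)); // prints: success
        System.out.println(classify(404)); // prints: client error
        System.out.println(classify(503)); // prints: server error
    }
}
```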
HTTP Header Fields
There exist general headers (for requests and responses), request headers, response headers, entity headers and extension headers
Some important headers
  Accept
    - request header defining the Media Types that the client will accept
      • Media Types were formerly known as Multipurpose Internet Mail Extensions types (MIME types)
  User-Agent
    - request header specifying the type of client
  Keep-Alive (HTTP/1.0) and persistent connections (default in HTTP/1.1)
    - improve the performance since otherwise a new TCP connection has to be established for every single webpage element
  Content-Type
    - entity header specifying the body's MIME type
HTTP Header Fields ...
Some important headers ...
  If-Modified-Since
    - request header that is used in combination with a GET request (conditional GET); the resource is only returned if it has been modified since the specified date
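On the server side, the conditional GET comes down to one date comparison. The following sketch assumes that the server knows the resource's last modification time and answers 304 (Not Modified) when nothing has changed; ConditionalGetSketch and statusFor are illustrative names, while the date parsing uses the JDK's RFC_1123_DATE_TIME formatter.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class ConditionalGetSketch {

    // Status code for a conditional GET: 200 (resource is returned) if the resource changed
    // after the If-Modified-Since date, otherwise 304 (Not Modified, no body).
    static int statusFor(String ifModifiedSince, ZonedDateTime lastModified) {
        ZonedDateTime since =
            ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
        return lastModified.toInstant().isAfter(since.toInstant()) ? 200 : 304;
    }

    public static void main(String[] args) {
        ZonedDateTime lastModified = ZonedDateTime.parse(
            "Thu, 03 Oct 2013 17:02:19 GMT", DateTimeFormatter.RFC_1123_DATE_TIME);
        System.out.println(statusFor("Wed, 02 Oct 2013 00:00:00 GMT", lastModified)); // prints: 200
        System.out.println(statusFor("Fri, 04 Oct 2013 00:00:00 GMT", lastModified)); // prints: 304
    }
}
```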
Media Types
The Media Type defines the request or response body's content (used for appropriate processing)
  7 top-level media types
    mediaType = toplevel_type , "/" , subtype;
Standard Media Types are registered with the Internet Assigned Numbers Authority (IANA) [RFC-6838]

    Media Type    Description
    text/plain    Human-readable text without formatting information
    text/html     HTML document
    image/jpeg    JPEG-encoded image
    ...           ...
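A Content-Type value can be split according to the mediaType rule as follows. This is a sketch with invented names (MediaTypeSketch, parse); parameters such as charset are simply dropped and the result is lower-cased.

```java
import java.util.Locale;

final class MediaTypeSketch {

    // Splits e.g. "text/html; charset=utf-8" into {"text", "html"}; parameters are ignored.
    static String[] parse(String value) {
        String type = value.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        int slash = type.indexOf('/');
        if (slash <= 0 || slash == type.length() - 1) {
            throw new IllegalArgumentException("not a media type: " + value);
        }
        return new String[] { type.substring(0, slash), type.substring(slash + 1) };
    }

    public static void main(String[] args) {
        String[] parts = parse("text/html; charset=utf-8");
        System.out.println(parts[0] + " / " + parts[1]); // prints: text / html
    }
}
```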
HTTP Message Information
Various tools for HTTP message logging
  e.g. HttpFox add-on for Firefox browser
Simple telnet connection
    telnet wise.vub.ac.be 80 (press Enter)
    GET /beat-signer HTTP/1.1 (press Enter)
    Host: wise.vub.ac.be (press Enter 2 times)
Until 1999 the W3C worked on HTTP Next Generation (HTTP-NG) as a replacement for HTTP/1.1
  never introduced
  recently HTTP/2 has been released
    - inspired by Google's development of SPDY
Proxies
[Figure: client, proxy and server connected via the Internet]
A web proxy is situated between the client and the server
  acts as a server to the client and as a client to the server
  can for example be specified in the browser settings; used for
    - firewalls and content filters
    - transcoding (on the fly transformation of HTTP message body)
    - content router (e.g. select optimal server in content distribution networks)
    - anonymous browsing, ...
Caches
[Figure: Client 1 and Client 2 access the server over the Internet via a shared proxy cache]
A proxy cache is a special type of proxy server
  can reduce server load if multiple clients share the same cache
  often multi-level hierarchies of caches (e.g. continent, country and regional level) with communication between sibling and parent caches as defined by the Internet Cache Protocol (ICP)
  passive or active (prefetching) caches
Caches ...
Special HTTP cache control header fields
  Expires
    - expiration date after which the cached resource has to be refetched
  Cache-Control: max-age
    - maximum age of a document (in seconds) after it has been added to the cache
  Cache-Control: no-cache
    - response cannot be directly served from the cache (has to be revalidated first)
  ...
Validators
  Last-modified time as validator
    - cache with resource that has been last modified at time t uses an If-Modified-Since t request for updates
  Entity tags (ETag)
    - changed by the publisher if content has changed; If-None-Match etag request
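How a cache might combine max-age with an entity tag can be sketched as follows. The class CacheEntrySketch, its fields and the ETag value "v1" are assumptions made for this illustration; a real cache also has to consider Expires, no-cache and further directives.

```java
import java.time.Duration;
import java.time.Instant;

final class CacheEntrySketch {
    final Instant storedAt;     // when the response was added to the cache
    final long maxAgeSeconds;   // value of Cache-Control: max-age
    final String etag;          // validator received from the server

    CacheEntrySketch(Instant storedAt, long maxAgeSeconds, String etag) {
        this.storedAt = storedAt;
        this.maxAgeSeconds = maxAgeSeconds;
        this.etag = etag;
    }

    // A fresh entry can be served directly; a stale one has to be revalidated first.
    boolean isFresh(Instant now) {
        return Duration.between(storedAt, now).getSeconds() < maxAgeSeconds;
    }

    // Header line of the revalidation request for a stale entry.
    String revalidationHeader() {
        return "If-None-Match: " + etag;
    }

    public static void main(String[] args) {
        Instant storedAt = Instant.parse("2013-10-03T17:02:19Z");
        CacheEntrySketch entry = new CacheEntrySketch(storedAt, 60, "\"v1\"");
        System.out.println(entry.isFresh(storedAt.plusSeconds(30))); // prints: true
        System.out.println(entry.isFresh(storedAt.plusSeconds(90))); // prints: false
        System.out.println(entry.revalidationHeader());              // prints: If-None-Match: "v1"
    }
}
```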
Caches ...
Advantages
  reduces latency and network bandwidth usage
  reduces server load (client and reverse proxy caches)
  transparent to client and server
Disadvantages
  additional resources (hardware) required
  might get stale data out of the cache
  creates additional network traffic if we use an active caching approach (prefetching) but achieve a low cache hit rate
  server loses control (e.g. access statistics) since not all requests reach the server any more
Tunnels
[Figure: an SSL client and an SSL server exchange SSL traffic that is carried inside HTTP (HTTP[SSL]) across the Internet]
Implement one protocol on top of another protocol
  e.g. HTTP as a carrier for SSL connections
Often used to "open" a firewall to protocols that would otherwise be blocked
  e.g. tunneling of SSL connections through an open HTTP port
Gateways
[Figure: an HTTP client talks HTTP to an HTTP/FTP gateway, which talks FTP to an FTP server]
A gateway can act as a kind of "glue" between applications (client) and resources (server)
  translate between two protocols (e.g. from HTTP to FTP)
  security accelerator (e.g. HTTPS/HTTP on the server side)
  often the gateway and destination server are combined in a single application server (HTTP to server application translator)
Session Management
HTTP is a stateless protocol
Session (state) tracking solutions
  use of IP address
    - problem: IP address is often not uniquely assigned to a single user
  browser login
    - use of special HTTP authenticate headers
    - after a login the browser sends the user information in each request
  URL rewriting
    - add information to the URL in each request
  hidden form fields
    - similar to URL rewriting but information can also be in body (POST request)
  cookies
    - the server stores a piece of information on the client which is then sent back to the server with each request
Cookies
Introduced by Netscape in June 1994
A cookie is a piece of information that is assigned to a client on their first visit
  list of key/value pairs
  often just a unique identifier
  sent via Set-Cookie or Set-Cookie2 HTTP response headers (the latter is obsolete)
Browser stores the information in a "cookie database" and sends it back every time the same server is accessed
Potential privacy issues
  third-party websites might use persistent cookies for user tracking
Cookies can be disabled in the browser settings
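The round trip of a cookie can be sketched with the Set-Cookie value from the example HTTP response message earlier in this lecture. CookieSketch and its methods are illustrative names; a real browser additionally checks the domain, path and expiry date before sending a cookie back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CookieSketch {

    // Splits a Set-Cookie value into the cookie pair (first entry) and its attributes.
    static Map<String, String> parseSetCookie(String value) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String part : value.split(";")) {
            String[] kv = part.trim().split("=", 2);
            parts.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return parts;
    }

    // Header line the browser sends on later requests: only name=value, no attributes.
    static String cookieHeader(String setCookieValue) {
        return "Cookie: " + setCookieValue.split(";", 2)[0].trim();
    }

    public static void main(String[] args) {
        String value = "lang=nl; path=/; domain=.vub.ac.be; expires=Mon, 18-Sep-2073 17:02:16 GMT";
        System.out.println(parseSetCookie(value));
        System.out.println(cookieHeader(value)); // prints: Cookie: lang=nl
    }
}
```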
Hypertext Markup Language (HTML)
[Example HTML document (markup not preserved): Beat Signer: Interactive Paper, PaperWorks, Paper++, ... Beat Signer is Associate Professor of Computer Science at the VUB and co-director of the WISE laboratory ...]
Dominant markup language for webpages
If you have never heard about HTML have a look at
  http://www.w3schools.com/html/
More details in the exercise and in the next lecture
Dynamic Web Content
Often it is not enough to serve static web pages but content should be changed on the client or server side
Server-side processing
  Common Gateway Interface (CGI)
  Java Servlets
  JavaServer Pages (JSP)
  PHP: Hypertext Preprocessor (PHP)
  ...
Client-side processing
  JavaScript
  Java Applets
  Adobe Flash
  ...
Common Gateway Interface (CGI)
[Figure: the client sends an HTTP request over the Internet to the server, which either serves HTML pages or forwards the request via CGI to a program in Perl, Tcl, C, C++, Java, ...; the HTTP response is returned to the client]
CGI was the first server-side processing solution
  transparent to the user
  certain requests (e.g. /account.pl) are forwarded via CGI to a program by creating a new process
  program processes the request and creates an answer with optional HTTP response headers
Common Gateway Interface (CGI) ...
CGI Problems
  a new process has to be started for each request
  if the CGI program for example acts as a gateway to a database, a new DB connection has to be established for each request which results in very poor performance
FastCGI solves some of the problems by introducing persistent processes and process pools
CGI/FastCGI is increasingly being replaced by other technologies (e.g. Java Servlets)
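A CGI program is an ordinary executable: the web server starts it for each request, passes request data in environment variables such as QUERY_STRING and forwards whatever the program writes to standard output. A minimal sketch in Java; the class name and the greeting are made up for this example, and the JVM start-up per request illustrates the performance problem described above.

```java
final class CgiProgramSketch {

    // Builds the CGI output: response header(s), a blank line, then the body.
    static String respond(String queryString) {
        String who = (queryString == null || queryString.isEmpty()) ? "World" : queryString;
        return "Content-Type: text/plain\r\n"
             + "\r\n"
             + "Hello " + who + "\n";
    }

    public static void main(String[] args) {
        // The web server passes the part of the URL after '?' in QUERY_STRING.
        System.out.print(respond(System.getenv("QUERY_STRING")));
    }
}
```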
Java Servlets
[Figure: the client sends an HTTP request over the Internet to the server; requests are either answered with HTML pages or forwarded to servlets running in a servlet container; the HTTP response is returned to the client]
A Java servlet is a Java class that has to extend the abstract HttpServlet class
The Java servlet class is loaded by a servlet container and relevant requests (based on a servlet binding) are forwarded to the servlet instance for further processing
Java Servlets ...
Main HttpServlet methods
  doGet(HttpServletRequest req, HttpServletResponse resp)
  doPost(HttpServletRequest req, HttpServletResponse resp)
  init(ServletConfig config)
  destroy()
Servlet life cycle
  a servlet is initialised once via the init() method
  the doGet(), doPost() methods may be executed multiple times (by different HTTP requests)
  finally the servlet container may unload a servlet (upcall of the destroy() method before that happens)
Servlet container (e.g. Apache Tomcat) either integrated with web server or as standalone component
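The life cycle can be imitated without a servlet container. In this sketch MiniServlet is a hypothetical stand-in interface and not the javax.servlet API; the run method plays the role of the container and records the order of the calls.

```java
import java.util.ArrayList;
import java.util.List;

final class ServletLifeCycleSketch {

    // Hypothetical stand-in for a servlet; not the javax.servlet API.
    interface MiniServlet {
        void init();
        String doGet(String path);
        void destroy();
    }

    // Plays the container: init() once, doGet() once per request, destroy() before unloading.
    static List<String> run(List<String> requestPaths) {
        List<String> log = new ArrayList<>();
        MiniServlet servlet = new MiniServlet() {
            public void init() { log.add("init"); }
            public String doGet(String path) { log.add("doGet " + path); return "Hello from " + path; }
            public void destroy() { log.add("destroy"); }
        };
        servlet.init();
        for (String path : requestPaths) {
            servlet.doGet(path);
        }
        servlet.destroy();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("/a", "/b")));
        // prints: [init, doGet /a, doGet /b, destroy]
    }
}
```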
Java Servlet Example
    package org.vub.wise;

    import java.io.*;
    import java.util.Date;
    import javax.servlet.http.*;
    import javax.servlet.*;

    public class HelloWorldServlet extends HttpServlet {
        // Called by the servlet container for each HTTP GET request
        @Override
        public void doGet(HttpServletRequest req, HttpServletResponse res)
                throws ServletException, IOException {
            res.setContentType("text/html");
            PrintWriter out = res.getWriter();
            out.println("<html><body>");
            out.println("<h1>Hello World</h1>");
            out.println("<p>The time is " + new Date().toString() + "</p>");
            out.println("</body></html>");
            out.close();
        }
    }
In the exercise you will learn how to process parameters etc.
JavaServer Pages (JSP)
A "drawback" of Java servlets is that the whole page (e.g. HTML) has to be defined within the servlet
  not easy to share tasks between web designer and programmer
Add program code through scriptlets and markup to existing HTML pages
These JSP documents are then either interpreted on the fly (Apache Tomcat) or compiled into Java servlets
The JSP approach is similar to PHP or Active Server Pages (ASP)
Note that Java Servlets are increasingly becoming an enabling technology (as with JSP)
JavaScript
Interpreted scripting language for client-side processing
JavaScript functionality often embedded in HTML documents but can also be provided in separate files
JavaScript often used to
  validate data (e.g. in a form)
  dynamically add content to a webpage
  process events (onLoad, onFocus, etc.)
  change parts of the original HTML document
  create cookies
  ...
Note: Java and JavaScript are completely different languages!
JavaScript Example
    document.write("Hello World!");
More details about JavaScript in lecture 6 and in the exercise session
Java Applets
A Java applet is a program delivered to the client side in the form of Java bytecode
  executed in the browser using a Java Virtual Machine (JVM)
  an applet has to extend the Applet or JApplet class
  runs in the sandbox
Advantages
  the user automatically always has the most recent version
  high security for untrusted applets
  full Java API available
Disadvantages
  requires a browser Java plug-in
Java Applets ...
Disadvantages ...
  only signed applets can get more advanced functionality
    - e.g. network connections to other machines than the source machine
More recently Java Web Start (JavaWS) is replacing Java Applets
  program no longer runs within the browser
    - less problematic security restrictions
    - fewer browser compatibility issues
Math and Physics Applet Examples
  http://www.falstad.com/mathphysics.html
Exercise 2 Hands-on experience with the HTTP protocol
References
David Gourley et al., HTTP: The Definitive Guide, O'Reilly Media, September 2002
R. Fielding et al., RFC2616 - Hypertext Transfer Protocol - HTTP/1.1
  http://www.faqs.org/rfcs/rfc2616.html
N. Freed et al., RFC6838 - Media Type Specifications and Registration Procedures
  http://www.faqs.org/rfcs/rfc6838.html
HTML and JavaScript Tutorials
  http://www.w3schools.com
References ...
M. Knutson, HTTP: The Hypertext Transfer Protocol (refcardz #172)
  http://refcardz.dzone.com/refcardz/http-hypertext-transfer-0
W. Jason Gilmore, PHP 5.4 (refcardz #23)
  http://refcardz.dzone.com/refcardz/php-54-scalable
Java Servlet Tutorial
  http://www.tutorialspoint.com/servlets/
Next Lecture HTML5 and the Open Web Platform