Web Architectures - Web Technologies (1019888BNR)

Web Technologies
Web Architectures

Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
2 December 2005

Basic Client-Server Web Architecture

[Figure: a client and a server connected via the Internet; the client sends an HTTP request and the server returns an HTTP response]

▪ Effect of typing http://www.vub.ac.be in the browser bar
  (1) use the Domain Name System (DNS) to get the IP address for www.vub.ac.be (answer 134.184.129.2)
  (2) create a TCP connection to 134.184.129.2
  (3) send an HTTP request message over the TCP connection
  (4) visualise the received HTTP response message in the browser

October 6, 2017
Beat Signer - Department of Computer Science
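The four steps can be sketched in plain Java. This is a minimal illustration rather than how a browser is implemented: the class name FetchPage and its methods are hypothetical, the sketch speaks unencrypted HTTP on port 80 only, and it prints the raw response instead of rendering it. A server may well answer with a redirect to HTTPS rather than with the page itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class FetchPage {

    // Builds a minimal HTTP/1.1 GET request. Every line ends with CRLF and
    // an empty line terminates the header section.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Performs steps (1) to (3) and returns the raw response message as text.
    static String fetch(String host, String path) throws IOException {
        InetAddress address = InetAddress.getByName(host);      // (1) DNS lookup
        try (Socket socket = new Socket(address, 80)) {         // (2) TCP connection
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path)
                    .getBytes(StandardCharsets.US_ASCII));      // (3) send request
            out.flush();
            InputStream in = socket.getInputStream();
            // "Connection: close" asks the server to close the connection,
            // so reading until end of stream yields the whole response.
            return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // (4) a browser would render the body; this sketch only prints it
            System.out.println(fetch(args[0], "/"));
        } else {
            // Without a host argument, only show the request that would be sent.
            System.out.print(buildRequest("www.vub.ac.be", "/"));
        }
    }
}
```

Running the class with a host name as its single argument performs the network round trip; without arguments it only prints the request message.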

Web Server

▪ Tasks of a web server
  (1) set up connection
  (2) receive and process HTTP request
  (3) fetch resource
  (4) create and send HTTP response
  (5) logging

[Figure: Worldwide Web Servers, http://news.netcraft.com]

▪ The most prominent web servers are the Apache HTTP Server and Microsoft's Internet Information Services (IIS)
▪ A lot of devices have an embedded web server
  - printers, WLAN routers, TVs, ...

Example HTTP Request Message

GET / HTTP/1.1
Host: www.vub.ac.be
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive

Example HTTP Response Message

HTTP/1.1 200 OK
Date: Thu, 03 Oct 2013 17:02:19 GMT
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.15
Content-Language: nl
Set-Cookie: lang=nl; path=/; domain=.vub.ac.be; expires=Mon, 18-Sep-2073 17:02:16 GMT
Content-Type: text/html; charset=utf-8
Keep-Alive: timeout=15, max=987
Connection: Keep-Alive
Transfer-Encoding: chunked

...
Vrije Universiteit Brussel | Redelijk eigenzinnig
...

HTTP Protocol

▪ Request/response communication model
  [Figure: the client sends an HTTP request and the server answers with an HTTP response]
▪ Communication always has to be initiated by the client
▪ Stateless protocol (no sessions)
▪ HTTP can be used on top of various reliable protocols
  - TCP is by far the most commonly used one
  - runs on TCP port 80 by default
▪ Latest version at the time of this lecture: HTTP/2 (May 2015)
▪ HTTPS scheme used for encrypted connections

Uniform Resource Identifier (URI)

▪ A Uniform Resource Identifier (URI) uniquely identifies a resource
▪ There are two types of URIs
  - Uniform Resource Locator (URL)
    - contains information about the exact location of a resource
    - consists of a scheme, a host and the path (resource name)
    - e.g. https://vub.academia.edu/BeatSigner
    - problem: the URL changes if the resource is moved!
      • idea of Persistent Uniform Resource Locators (PURLs) [https://purl.oclc.org]
  - Uniform Resource Name (URN)
    - unique and location-independent name for a resource
    - consists of a scheme name, a namespace identifier and a namespace-specific string (separated by colons)
    - e.g. urn:ISBN:3837027139
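The difference between the two URI types can be made visible with the standard java.net.URI class. The sketch below is illustrative only: the class UriParts and its describe method are hypothetical names, and splitting the URN at its first colon is a simplification of the namespace rules.

```java
import java.net.URI;

final class UriParts {

    // Returns a short description of the components of a URL or URN.
    static String describe(String text) {
        URI uri = URI.create(text);
        if (uri.isOpaque()) {
            // A URN has no host or path; everything after the scheme is an
            // opaque scheme-specific part such as "ISBN:3837027139".
            String[] parts = uri.getSchemeSpecificPart().split(":", 2);
            String name = parts.length == 2 ? parts[1] : "";
            return "scheme=" + uri.getScheme()
                 + " namespace=" + parts[0]
                 + " name=" + name;
        }
        // A URL is hierarchical: scheme, host and path.
        return "scheme=" + uri.getScheme()
             + " host=" + uri.getHost()
             + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("https://vub.academia.edu/BeatSigner"));
        System.out.println(describe("urn:ISBN:3837027139"));
    }
}
```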

HTTP Message Format

HTTP/1.1 200 OK                               <- start line
Date: Thu, 03 Oct 2013 17:02:19 GMT           <- header field(s)
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.15
Transfer-Encoding: chunked
Content-Type: text/html
                                              <- blank line (CRLF)
...                                           <- message body (optional)

▪ Request and response messages have the same format

HTTP_message = start_line , {header} , "CRLF" , {body};
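Since requests and responses share this layout, one small parser can split either kind of message into its three parts. The following is a simplified sketch with a hypothetical HttpMessage class: it assumes the whole message is available as one string, and it ignores repeated header fields and chunked bodies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HttpMessage {
    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    private HttpMessage(String startLine, String body) {
        this.startLine = startLine;
        this.body = body;
    }

    // Splits a raw message into start line, header fields and body.
    static HttpMessage parse(String raw) {
        // The first empty line (CRLF CRLF) separates the headers from the body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split >= 0 ? raw.substring(0, split) : raw;
        String body = split >= 0 ? raw.substring(split + 4) : "";

        String[] lines = head.split("\r\n");
        HttpMessage message = new HttpMessage(lines[0], body);
        for (int i = 1; i < lines.length; i++) {
            // Each header field has the form "Name: value"; only the first
            // colon separates name and value.
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                message.headers.put(lines[i].substring(0, colon).trim(),
                                    lines[i].substring(colon + 1).trim());
            }
        }
        return message;
    }

    public static void main(String[] args) {
        HttpMessage message = parse(
            "HTTP/1.1 200 OK\r\n"
          + "Server: Apache/2.2.14 (Ubuntu)\r\n"
          + "Content-Type: text/html\r\n"
          + "\r\n"
          + "<html></html>");
        System.out.println(message.startLine);
        System.out.println(message.headers);
        System.out.println(message.body);
    }
}
```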

HTTP Request Message

▪ Request-specific start line

start_line = method , " " , resource , " " , version;
method = "GET" | "HEAD" | "POST" | "PUT" | "TRACE" | "OPTIONS" | "DELETE";
resource = complete_URL | path;
version = "HTTP/" , major_version , "." , minor_version;

▪ Methods
  - GET: get a resource from the server
  - HEAD: get the header only (no body)
  - POST: send data (in the body) to the server
  - PUT: store request body on server
  - TRACE: get the "final" request (after it has potentially been modified by proxies)
  - OPTIONS: get a list of methods supported by the server
  - DELETE: delete a resource on the server

HTTP Response Message

▪ Response-specific start line

start_line = version , " " , status_code , " " , reason;
version = "HTTP/" , major_version , "." , minor_version;
status_code = digit , digit , digit;
reason = string_phrase;

▪ Status codes
  - 100-199: informational
  - 200-299: success (e.g. 200 for 'OK')
  - 300-399: redirection
  - 400-499: client error (e.g. 404 for 'Not Found')
  - 500-599: server error (e.g. 503 for 'Service Unavailable')
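Because the class of a status code is given by its first digit, a client can react to whole ranges instead of individual codes. A small sketch follows; the class StatusClass and its method names are hypothetical.

```java
final class StatusClass {

    // Extracts the three-digit status code from a response start line
    // such as "HTTP/1.1 404 Not Found".
    static int parseStatusCode(String startLine) {
        String[] parts = startLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    // Maps a status code to its class based on the first digit.
    static String classify(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default:
                throw new IllegalArgumentException("not an HTTP status code: " + statusCode);
        }
    }

    public static void main(String[] args) {
        String startLine = "HTTP/1.1 404 Not Found";
        System.out.println(classify(parseStatusCode(startLine)));
    }
}
```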

HTTP Header Fields

▪ There exist general headers (for requests and responses), request headers, response headers, entity headers and extension headers
▪ Some important headers
  - Accept
    - request header defining the media types that the client will accept
      • media types were formerly known as Multipurpose Internet Mail Extensions (MIME) types
  - User-Agent
    - request header specifying the type of client
  - Keep-Alive (HTTP/1.0) and persistent connections (HTTP/1.1)
    - general header helping to improve performance, since otherwise a new TCP connection has to be established for every single webpage element
  - Content-Type
    - entity header specifying the body's media type

HTTP Header Fields ...

▪ Some important headers ...
  - If-Modified-Since
    - request header that is used in combination with a GET request (conditional GET); the resource is only returned if it has been modified since the specified date

Media Types

▪ The media type defines the request or response body's content (used for appropriate processing)
  - 7 top-level media types

mediaType = toplevel_type , "/" , subtype;

▪ Standard media types are registered with the Internet Assigned Numbers Authority (IANA) [RFC-6838]

Media Type    Description
text/plain    Human-readable text without formatting information
text/html     HTML document
image/jpeg    JPEG-encoded image
...           ...
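The mediaType grammar maps directly onto a small parser. The sketch below uses a hypothetical MediaType class; it also strips optional parameters such as "; charset=utf-8", which appear in Content-Type values but are not part of the grammar shown on the slide.

```java
import java.util.Locale;

final class MediaType {
    final String topLevel;
    final String subtype;

    private MediaType(String topLevel, String subtype) {
        this.topLevel = topLevel;
        this.subtype = subtype;
    }

    // Parses a value such as "text/html; charset=utf-8" into its
    // top-level type and subtype.
    static MediaType parse(String headerValue) {
        // Drop parameters that may follow the media type after a semicolon.
        String type = headerValue.split(";", 2)[0].trim();
        int slash = type.indexOf('/');
        if (slash <= 0 || slash == type.length() - 1) {
            throw new IllegalArgumentException("not a media type: " + headerValue);
        }
        // Type and subtype names are case-insensitive, so normalise them.
        return new MediaType(type.substring(0, slash).toLowerCase(Locale.ROOT),
                             type.substring(slash + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        MediaType type = parse("text/html; charset=utf-8");
        System.out.println(type.topLevel + " / " + type.subtype);
    }
}
```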

HTTP Message Information

▪ Various tools for HTTP message logging
  - e.g. HttpFox add-on for Firefox browser
▪ Simple telnet connection

    telnet wise.vub.ac.be 80     (press Enter)
    GET /beat-signer HTTP/1.1    (press Enter)
    Host: wise.vub.ac.be         (press Enter 2 times)

▪ Until 1999 the W3C was working on HTTP Next Generation (HTTP-NG) as a replacement for HTTP/1.1
  - never introduced
  - more recently HTTP/2 has been released
    - inspired by Google's development of SPDY

Proxies

[Figure: a proxy placed between the client and the server on the Internet]

▪ A web proxy is situated between the client and the server
  - acts as a server to the client and as a client to the server
  - can for example be specified in the browser settings; used for
    - firewalls and content filters
    - transcoding (on-the-fly transformation of the HTTP message body)
    - content routing (e.g. selecting the optimal server in content distribution networks)
    - anonymous browsing, ...

Caches

[Figure: Client 1 and Client 2 reach the server over the Internet through a shared proxy cache]

▪ A proxy cache is a special type of proxy server
  - can reduce server load if multiple clients share the same cache
  - often multi-level hierarchies of caches (e.g. continent, country and regional level) with communication between sibling and parent caches as defined by the Internet Cache Protocol (ICP)
  - passive or active (prefetching) caches

Caches ...

▪ Special HTTP cache control header fields
  - Expires
    - expiration date after which the cached resource has to be refetched
  - Cache-Control: max-age
    - maximum age of a document (in seconds) after it has been added to the cache
  - Cache-Control: no-cache
    - response cannot be directly served from the cache (has to be revalidated first)
  - ...
▪ Validators
  - Last-modified time as validator
    - a cache holding a resource that was last modified at time t uses an If-Modified-Since t request for updates
  - Entity tags (ETag)
    - changed by the publisher if content has changed; If-None-Match etag request
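How max-age and entity tags work together can be sketched as follows. This is a simplified model, not a complete cache: the class CacheEntry and its methods are hypothetical, times are plain seconds, and only a single ETag is compared. The status code 304 (Not Modified) is what a server returns when the validator still matches.

```java
final class CacheEntry {
    final String etag;            // validator received with the cached response
    final long storedAtSeconds;   // time at which the response was cached
    final long maxAgeSeconds;     // value of Cache-Control: max-age

    CacheEntry(String etag, long storedAtSeconds, long maxAgeSeconds) {
        this.etag = etag;
        this.storedAtSeconds = storedAtSeconds;
        this.maxAgeSeconds = maxAgeSeconds;
    }

    // A cached response may be served directly while its age is below max-age.
    boolean isFresh(long nowSeconds) {
        return nowSeconds - storedAtSeconds < maxAgeSeconds;
    }

    // Header a cache adds to its conditional GET once the entry is stale.
    String revalidationHeader() {
        return "If-None-Match: " + etag;
    }

    // Server side of the conditional GET: 304 tells the cache to keep its
    // copy, 200 means a new body is sent along with the response.
    static int statusFor(String currentEtag, String ifNoneMatch) {
        return currentEtag.equals(ifNoneMatch) ? 304 : 200;
    }

    public static void main(String[] args) {
        CacheEntry entry = new CacheEntry("\"v1\"", 1000L, 60L);
        System.out.println(entry.isFresh(1030L));
        System.out.println(entry.revalidationHeader());
        System.out.println(statusFor("\"v2\"", entry.etag));
    }
}
```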

Caches ...

▪ Advantages
  - reduces latency and used network bandwidth
  - reduces server load (client and reverse proxy caches)
  - transparent to client and server
▪ Disadvantages
  - additional resources (hardware) required
  - might get stale data out of the cache
  - creates additional network traffic if we use an active caching approach (prefetching) but achieve a low cache hit rate
  - server loses control (e.g. access statistics) since not all requests reach the server anymore

Tunnels

[Figure: an SSL client and an SSL server communicate over the Internet; between the tunnel endpoints the SSL traffic is carried inside HTTP (HTTP[SSL])]

▪ Implement one protocol on top of another protocol
  - e.g. HTTP as a carrier for SSL connections
▪ Often used to "open" a firewall to protocols that would otherwise be blocked
  - e.g. tunneling of SSL connections through an open HTTP port

Gateways

[Figure: an HTTP client talks HTTP over the Internet to an HTTP/FTP gateway, which talks FTP to an FTP server]

▪ A gateway can act as a kind of "glue" between applications (client) and resources (server)
  - translate between two protocols (e.g. from HTTP to FTP)
  - security accelerator (e.g. HTTPS/HTTP on the server side)
  - often the gateway and destination server are combined in a single application server (HTTP to server application translator)

Session Management

▪ HTTP is a stateless protocol
▪ Session (state) tracking solutions
  - use of IP address
    - problem: IP address is often not uniquely assigned to a single user
  - browser login
    - use of special HTTP authenticate headers
    - after a login the browser sends the user information in each request
  - URL rewriting
    - add information to the URL in each request
  - hidden form fields
    - similar to URL rewriting but information can also be in the body (POST request)
  - cookies
    - the server stores a piece of information on the client which is then sent back to the server with each request
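The cookie-based solution boils down to two header lines: the server hands out a value in a Set-Cookie response header and the client returns it in a Cookie request header. The sketch below shows both sides with a hypothetical CookieSession class; it handles only simple name=value pairs and none of the cookie attributes beyond Path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CookieSession {

    // Server side: response header that stores a value on the client.
    static String setCookieHeader(String name, String value) {
        return "Set-Cookie: " + name + "=" + value + "; Path=/";
    }

    // Server side: reads the pairs that the client sent back in the value
    // of its Cookie request header, e.g. "lang=nl; session=abc123".
    static Map<String, String> parseCookieHeader(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : headerValue.split(";")) {
            int equals = pair.indexOf('=');
            if (equals > 0) {
                cookies.put(pair.substring(0, equals).trim(),
                            pair.substring(equals + 1).trim());
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        System.out.println(setCookieHeader("session", "abc123"));
        System.out.println(parseCookieHeader("lang=nl; session=abc123"));
    }
}
```

With a unique identifier stored in the cookie, the server can look up the matching session state on every request even though HTTP itself remains stateless.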

Cookies

▪ Introduced by Netscape in June 1994
▪ A cookie is a piece of information that is assigned to a client on their first visit
  - list of name-value pairs
  - often just a unique identifier
  - sent via Set-Cookie or Set-Cookie2 HTTP response headers
▪ Browser stores the information in a "cookie database" and sends it back every time the same server is accessed
▪ Potential privacy issues
  - third-party websites might use persistent cookies for user tracking
▪ Cookies can be disabled in the browser settings

Hypertext Markup Language (HTML)

[Example HTML document with the title "Beat Signer: Interactive Paper, PaperWorks, Paper++, ..." and the body text "Beat Signer is Associate Professor of Computer Science at the VUB and co-director of the WISE laboratory ..."]

▪ Dominant markup language for webpages
▪ If you have never heard of HTML, have a look at
  - http://www.w3schools.com/html/
▪ More details in the exercise and in the next lecture

Dynamic Web Content

▪ Often it is not enough to serve static web pages; content has to be changed on the client or server side
▪ Server-side processing
  - Common Gateway Interface (CGI)
  - Java Servlets
  - JavaServer Pages (JSP)
  - PHP: Hypertext Preprocessor (PHP)
  - ...
▪ Client-side processing
  - JavaScript
  - Java Applets
  - Adobe Flash
  - ...

Common Gateway Interface (CGI)

[Figure: the client sends an HTTP request over the Internet to the server, which either serves HTML pages or forwards the request via CGI to a program written in Perl, Tcl, C, C++, Java, ...; the HTTP response travels back to the client]

▪ CGI was the first server-side processing solution
  - transparent to the user
  - certain requests (e.g. /account.pl) are forwarded via CGI to a program by creating a new process
  - program processes the request and creates an answer with optional HTTP response headers

Common Gateway Interface (CGI) ...

▪ CGI problems
  - a new process has to be started for each request
  - if the CGI program for example acts as a gateway to a database, a new DB connection has to be established for each request, which results in very poor performance
▪ FastCGI solves some of the problems by introducing persistent processes and process pools
▪ CGI/FastCGI is increasingly being replaced by other technologies (e.g. Java Servlets)
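To make the per-request cost concrete, here is a sketch of what a CGI program looks like when written in Java. The class HelloCgi is hypothetical. The parts that follow the CGI convention are the QUERY_STRING environment variable, through which the web server passes the query part of the URL, and the output format: header lines, an empty line, then the body on standard output.

```java
final class HelloCgi {

    // Escapes characters with a special meaning in HTML so that request
    // data cannot inject markup into the generated page.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Produces the complete CGI output: response headers, an empty line
    // and the HTML body.
    static String respond(String queryString) {
        String who = (queryString == null || queryString.isEmpty())
                ? "World"
                : escapeHtml(queryString);
        return "Content-Type: text/html\n"
             + "\n"
             + "<html><body>Hello " + who + "</body></html>\n";
    }

    public static void main(String[] args) {
        // The web server starts this program as a new process for every
        // request and relays whatever it writes to standard output.
        System.out.print(respond(System.getenv("QUERY_STRING")));
    }
}
```

Every request pays for starting a new process, here a whole Java virtual machine, which is the overhead that FastCGI and servlet containers avoid by keeping processes alive.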

Java Servlets

[Figure: the client sends an HTTP request over the Internet to the server, which either serves HTML pages or forwards the request to servlets running in a servlet container; the HTTP response travels back to the client]

▪ A Java servlet is a Java class that has to extend the abstract HttpServlet class
▪ The Java servlet class is loaded by a servlet container and relevant requests (based on a servlet binding) are forwarded to the servlet instance for further processing

Java Servlets ...

▪ Main HttpServlet methods

doGet(HttpServletRequest req, HttpServletResponse resp)
doPost(HttpServletRequest req, HttpServletResponse resp)
init(ServletConfig config)
destroy()

▪ Servlet life cycle
  - a servlet is initialised once via the init() method
  - the doGet() and doPost() methods may be executed multiple times (by different HTTP requests)
  - finally the servlet container may unload a servlet (upcall of the destroy() method before that happens)
▪ Servlet container (e.g. Apache Tomcat) either integrated with web server or as standalone component

Java Servlet Example

package org.vub.wise;

import java.io.*;
import java.util.Date;
import javax.servlet.http.*;
import javax.servlet.*;

public class HelloWorldServlet extends HttpServlet {
  public void doGet(HttpServletRequest req, HttpServletResponse res)
      throws ServletException, IOException {
    // declare the media type of the response body
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();
    out.println("<html>");
    out.println("<head><title>Hello World</title></head>");
    out.println("<body>The time is " + new Date().toString() + "</body>");
    out.println("</html>");
    out.close();
  }
}

▪ In the exercise you will learn how to process parameters etc.

JavaServer Pages (JSP)

▪ A "drawback" of Java servlets is that the whole page (e.g. HTML) has to be defined within the servlet
  - not easy to share tasks between web designer and programmer
▪ Add program code through scriptlets and markup to existing HTML pages
▪ These JSP documents are then translated into Java servlets, either on the fly when first requested (e.g. Apache Tomcat) or precompiled before deployment
▪ The JSP approach is similar to PHP or Active Server Pages (ASP)
▪ Note that Java servlets are increasingly becoming an enabling technology (as with JSP)

JavaScript

▪ Interpreted scripting language for client-side processing
▪ JavaScript functionality is often embedded in HTML documents but can also be provided in separate files
▪ JavaScript is often used to
  - validate data (e.g. in a form)
  - dynamically add content to a webpage
  - process events (onLoad, onFocus, etc.)
  - change parts of the original HTML document
  - create cookies
  - ...
▪ Note: Java and JavaScript are completely different languages!

JavaScript Example

document.write("Hello World!");

▪ More details about JavaScript in lecture 6 and in the exercise session

Java Applets

▪ A Java applet is a program delivered to the client side in the form of Java bytecode
  - executed in the browser using a Java Virtual Machine (JVM)
  - an applet has to extend the Applet or JApplet class
  - runs in the sandbox
▪ Advantages
  - the user automatically always has the most recent version
  - high security for untrusted applets
  - full Java API available
▪ Disadvantages
  - requires a browser Java plug-in

Java Applets ...

▪ Disadvantages ...
  - only signed applets can get more advanced functionality
    - e.g. network connections to machines other than the source machine
▪ More recently Java Web Start (JavaWS) is replacing Java Applets
  - program no longer runs within the browser
    - less problematic security restrictions
    - fewer browser compatibility issues
▪ Math and Physics Applet Examples
  - http://www.falstad.com/mathphysics.html

Exercise 2

▪ Hands-on experience with the HTTP protocol

References

▪ David Gourley et al., HTTP: The Definitive Guide, O'Reilly Media, September 2002
▪ R. Fielding et al., RFC2616 - Hypertext Transfer Protocol - HTTP/1.1
  - http://www.faqs.org/rfcs/rfc2616.html
▪ N. Freed et al., RFC6838 - Media Type Specifications and Registration Procedures
  - http://www.faqs.org/rfcs/rfc6838.html
▪ HTML and JavaScript Tutorials
  - http://www.w3schools.com

References ...

▪ M. Knutson, HTTP: The Hypertext Transfer Protocol (refcardz #172)
  - http://refcardz.dzone.com/refcardz/http-hypertext-transfer-0
▪ W. Jason Gilmore, PHP 5.4 (refcardz #23)
  - http://refcardz.dzone.com/refcardz/php-54-scalable
▪ Java Servlet Tutorial
  - http://www.tutorialspoint.com/servlets/

Next Lecture
HTML5 and the Open Web Platform
