

Introduction

The Hypertext Markup Language (HTML) is a simple, standard markup language that was designed to create web documents (known as webpages, which are hosted on websites). The traditional use of a markup language is as a document language: markup languages were widely used by book publishing companies to automate the process of editing and publishing their books. Markup languages use 'elements' to distinguish two things: 1) annotation (markup that describes the text) and 2) the text itself.

Sir Tim Berners-Lee is generally credited as the inventor of HTML, as well as the inventor of the World Wide Web. The HTML standard was maintained by the World Wide Web Consortium (W3C) and developed in parallel by the Web Hypertext Application Technology Working Group (WHATWG); since 2019, the WHATWG's HTML Living Standard has been the single authoritative version. When the World Wide Web was launched, HTML was the only markup language used to design webpages. In 1994, a style sheet language named Cascading Style Sheets (CSS) was proposed to Tim Berners-Lee by Håkon Wium Lie; CSS describes the visual presentation of documents written in markup languages and has been in use since 1995.

The first official standard for HTML was published in 1995: it was HTML version 2.0, and it roughly corresponded to the capabilities of HTML that were in use before June 1994. RFC 1866 (the specification for HTML 2.0) stated that HTML 2.0 documents are SGML documents. The latest version of HTML is HTML5, which abandoned HTML's formal link with SGML: it defines its own parsing rules, and also offers an optional XML-based syntax (XHTML). A version of HTML based on HTML 4.01 Strict, known as ISO HTML, has been standardised by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) in document ISO/IEC 15445.

History

Sir Tim Berners-Lee is generally credited as the inventor of HTML, but he was aided in its development by a number of computer scientists, most notably Dan Connolly and Robert Cailliau. One of the first mentions of HTML was in a HyperText.m document written on the 25th of September 1990, which is still available online. HTML was created during the later stages of the CERN WWW Project, when Tim Berners-Lee and Robert Cailliau visited the European Conference on Hypertext (ECHT 90) in November 1990, probably to seek ideas from the likes of Prof. Peter Brown (University of Kent, UK), who gave a keynote address at the conference.

The first standard specification for HTML (RFC 1866) states that HTML documents are Standard Generalized Markup Language (SGML) documents. HTML 2.0 was defined as an application of SGML: its elements are declared in an SGML document type definition (DTD), which is why HTML is viewed as an SGML-based language. SGML is a markup language that evolved from IBM's Generalized Markup Language (GML).

The earliest HTML documents were uploaded between November and December 1990, and are still available online. These documents are basic and contain only a title element, a heading element and a hyperlink. The full code of the earliest webpage (13th of November 1990) is as follows:

<title>Hypertext Links</title><h1>Links and Anchors</h1>A link is the connection between one piece of <a href=WhatIs.html>hypertext</a> and another.

Only by the end of 1991, when Dan Connolly joined the CERN WWW Project, did HTML start to 'take shape', with more elements and features being added and greater overall stability. That said, HTML still suffered from a lack of standardisation, with browsers using a range of custom tags and parsing methods. Tim Berners-Lee did release a document titled "HTML Tags" describing HTML in 1991, but it was only in 1995 that a standard specification was written by Tim Berners-Lee and Dan Connolly.

Since 1995, HTML has evolved to include more elements and attributes, and it has been developed by a number of organisations and working groups:

  1. Web Hypertext Application Technology Working Group (WHATWG)
  2. World Wide Web Consortium
  3. Internet Engineering Task Force

As of 2015, HTML has been released in the following draft and standardised versions.

  1. HTML Tags: released in October, 1991. (draft version)
  2. HTML DTD: released in June, 1992. (draft version)
  3. HTML DTD 1.1: released in November, 1992. (draft version)
  4. HTML 1.0: released in June, 1993. (draft version)
  5. HTML 2.0: released on the 24th of November, 1995.
  6. HTML 3.0: released in April, 1995. (draft version)
  7. HTML 3.2: released in January, 1997.
  8. HTML 4.0: released in December, 1997.
  9. HTML 4.01: released in December, 1999.
  10. HTML5: released in October, 2014.

The last SGML-based version of HTML was version 4.01. The World Wide Web Consortium (W3C) has not only developed HTML; it has also developed the XML and XHTML markup languages. XHTML was developed after the publication of a W3C Working Draft titled 'Reformulating HTML in XML' (1998), and XHTML 1.0 was released in 2000. By 2010, both HTML and XHTML were commonly used on the World Wide Web. HTML5 was developed as an attempt to create a single markup language that encompasses both HTML and XHTML. HTML5 is not an application of SGML, nor is it an application of XML: it defines its own parsing rules for the HTML syntax and an alternative XML syntax (XHTML), and it aims to increase the interoperability of future web content.

Structure of HTML documents

HTML documents are written using tags, and each tag contains the name of an HTML element: the element is created when a web browser parses the document. An HTML parser is a software component, within a browser, that interprets the element names within tags to build a data structure (the Document Object Model, or DOM) that is then rendered into a visual display. Typically, an HTML element has a start tag and an end tag, <tag></tag>: tags are written within angle brackets, and content is placed between the start tag and the end tag. As stated, tags contain an element name: elements are the 'building blocks' of a webpage, allowing text, objects and images to be embedded into a webpage to create a structured document.
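Python's standard html.parser module is not a browser engine, but it illustrates the first step described above: reading element names out of tags and separating them from the text. A minimal sketch, where the class name TagLogger is hypothetical, that records each event the parser reports:

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records each start tag, end tag and run of text the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text between tags
            self.events.append(("text", data.strip()))


logger = TagLogger()
logger.feed("<p>Hello <b>world</b></p>")
logger.close()
for event in logger.events:
    print(event)
```

A browser goes further and uses these events to build the DOM tree; this sketch only lists them in order.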

Elements can be categorised into two broad kinds: void elements and normal elements. Void elements only have a start tag, and therefore cannot contain any content. A prime example of a void element is the image element (which loads an image into a webpage): Void Element: <img src="picture.gif">. Normal elements have a start tag <tag> and an end tag </tag>, and allow content and additional tags to be inserted between the start tag and the end tag. A prime example of a normal element is the paragraph element: Normal Element: <p>Text inserted in between the tags of a normal element</p>.
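The practical difference between the two kinds shows up when a parser builds a tree: a normal element becomes the parent of whatever follows until its end tag, while a void element is closed immediately. A simplified sketch, assuming a hand-picked subset of void elements (the HTML standard lists them all) and a hypothetical TreeBuilder class; a real browser's tree-construction algorithm is far more involved:

```python
from html.parser import HTMLParser

# A subset of the HTML void elements; the full list is in the HTML standard.
VOID_ELEMENTS = {"area", "br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Builds nested (tag, children) tuples, never waiting for a void element to close."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", [])
        self.stack = [self.root]          # open elements; the last one is the current parent

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)    # attach to the current parent
        if tag not in VOID_ELEMENTS:      # only normal elements can hold content
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags (e.g. </img>).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())


builder = TreeBuilder()
builder.feed('<p>Before <img src="picture.gif"> after</p>')
builder.close()
print(builder.root)
```

Because img is never pushed onto the stack, the text "after" stays inside the paragraph rather than becoming content of the image.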

Void and normal elements can be classified as being:

  1. Structural element: either <html>, <head> or <body> tag
  2. Head element (placed inside <head>): an example would be a <meta> tag
  3. Body element: an example would be a <p> tag
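The three categories above can be told apart by where an element appears in the document. A minimal sketch, assuming a document with explicit <head> and <body> tags (browsers can infer these when they are omitted, which this sketch does not attempt); the class name ElementClassifier is hypothetical:

```python
from html.parser import HTMLParser

STRUCTURAL = {"html", "head", "body"}


class ElementClassifier(HTMLParser):
    """Sorts each start tag into structural, head or body, by where it appears."""

    def __init__(self):
        super().__init__()
        self.section = None       # "head" or "body" while inside one of them
        self.categories = {}      # tag name -> category

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            self.categories[tag] = "structural"
            if tag in ("head", "body"):
                self.section = tag
        elif self.section is not None:
            self.categories[tag] = self.section

    def handle_endtag(self, tag):
        if tag in ("head", "body"):
            self.section = None


classifier = ElementClassifier()
classifier.feed('<html><head><meta charset="utf-8"><title>Example</title></head>'
                '<body><p>Hello</p></body></html>')
classifier.close()
print(classifier.categories)
```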

The first tag written in an HTML document is the structural tag <html>. Before this tag, the document can include a document type declaration, whose purpose is to declare to the client reading the document (the browser) which version of HTML the document is written in, so that the browser can render the document in the required standards mode. In HTML5, the declaration is simply <!DOCTYPE html>. HTML tags can also include attributes: an attribute modifies the element when it is parsed by a web browser. Consider the alt attribute, which modifies an image element by providing alternative text that should describe what the image represents. The browser displays this text if the image cannot be loaded, and screen readers read it aloud to visually impaired users; some older browsers also showed it when a user hovered the cursor over the image, although the title attribute is the standard way to provide such a tooltip. Without the alt attribute, a browser has no text alternative to offer in place of the image.
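Both the document type declaration and attributes such as alt are visible to a parser, so they can be checked automatically. A minimal sketch using Python's html.parser, where the class name DoctypeAndAltChecker and the sample file names are hypothetical; it records the declaration and lists images with no alt attribute at all (an empty alt="" counts as present, since that is how purely decorative images are marked):

```python
from html.parser import HTMLParser


class DoctypeAndAltChecker(HTMLParser):
    """Records the document type declaration and any <img> without an alt attribute."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.images_missing_alt = []

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>; decl is the text between "<!" and ">".
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "img" and "alt" not in attributes:
            self.images_missing_alt.append(attributes.get("src"))


page = """<!DOCTYPE html>
<html><body>
<img src="logo.gif" alt="Company logo">
<img src="picture.gif">
</body></html>"""

checker = DoctypeAndAltChecker()
checker.feed(page)
checker.close()
print(checker.doctype, checker.images_missing_alt)
```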

Shown below is the basic structure of an HTML document:

[Figure: basic HTML code and document showing the essential elements and tags]

Description of the HTML tags (in the document above):

<html> - Everything within this tag constitutes the document.
<head> - Contains information about the document (metadata such as its title), which is not displayed as page content.
<title> - The title of the document, which describes its content; browsers show it in the title bar or tab.
<body> - The content of the document.
<h1> - Top-level heading, used to title the page or a major section of it.
<p> - Paragraph; serves the same purpose as a paragraph in a written document.
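The tags described above can be combined into a minimal document. As a sketch (the title and text content are made up, and the class name StartTagCollector is hypothetical), the document is held in a Python string and parsed to confirm that the elements appear in the expected order:

```python
from html.parser import HTMLParser

# A minimal document using the tags described above (the text content is made up).
BASIC_DOCUMENT = """<!DOCTYPE html>
<html>
  <head>
    <title>My first webpage</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <p>This is a paragraph of text.</p>
  </body>
</html>
"""


class StartTagCollector(HTMLParser):
    """Collects start-tag names in the order they appear."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = StartTagCollector()
collector.feed(BASIC_DOCUMENT)
collector.close()
print(collector.tags)  # ['html', 'head', 'title', 'body', 'h1', 'p']
```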

List of additional (popular) HTML tags:

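<applet> - An applet is a (Java) program which can be embedded into an HTML document; this element is obsolete in HTML5.
<frame> - Divides the browser window into sections, called frames, each showing a separate document; also obsolete in HTML5.
<header> - Defines introductory content for a page or section, such as a heading, logo or navigation links.
<meta> - Defines metadata about the document; for example, a 'refresh' redirect is written as a meta tag.
<noscript> - Defines alternative content to display when scripts can't be run by the user agent (browser).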
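Since some of the tags listed above are no longer part of HTML5, an existing page can be scanned for them. A minimal sketch, assuming a small, hand-picked set of elements that the HTML5 standard lists as obsolete (it is not exhaustive); the class name ObsoleteElementFinder is hypothetical:

```python
from html.parser import HTMLParser

# A few elements the HTML5 standard lists as obsolete; not an exhaustive list.
OBSOLETE_ELEMENTS = {"applet", "frame", "frameset", "noframes", "font", "center"}


class ObsoleteElementFinder(HTMLParser):
    """Collects obsolete element names, in document order, without duplicates."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in OBSOLETE_ELEMENTS and tag not in self.found:
            self.found.append(tag)


finder = ObsoleteElementFinder()
finder.feed('<header><h1>Site</h1></header><applet code="Game.class"></applet>'
            '<noscript>Please enable JavaScript.</noscript>')
finder.close()
print(finder.found)  # ['applet']
```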

Creating HTML documents

Once the structure and elements of HTML are understood, it is simple to create a static HTML document: all a developer requires is a text editor (like Notepad in Windows), in which the code can be written and then saved with an .html or .htm extension. When designing a template for a website, Cascading Style Sheets (CSS) can be used to define the appearance of a webpage. The World Wide Web Consortium (W3C), which develops and maintains HTML, recommends the use of CSS.
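Saving a static page is nothing more than writing plain text to a file with an .html extension, which is exactly what a text editor does. The sketch below does the same from Python and also writes a separate stylesheet that the page links to; the file names index.html and style.css, and the temporary folder standing in for a website's document folder, are assumptions for illustration:

```python
from pathlib import Path
import tempfile

PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>Hello</title>
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <p>A static page, written by hand.</p>
  </body>
</html>
"""

STYLE = "body { font-family: sans-serif; }\np { color: #333; }\n"

site = Path(tempfile.mkdtemp())  # stand-in for a website's document folder
(site / "index.html").write_text(PAGE, encoding="utf-8")
(site / "style.css").write_text(STYLE, encoding="utf-8")
print(sorted(p.name for p in site.iterdir()))  # ['index.html', 'style.css']
```

Keeping the presentation in style.css means the look of every page that links to it can be changed in one place, which is the main reason CSS is recommended.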

While webmasters can write HTML code by hand, there is an even simpler way to create HTML documents: using a visual HTML editor like Macromedia (now Adobe) Dreamweaver. HTML editors automatically generate HTML code while the user designs the webpage in a graphical interface, so a webpage can be created with little to no knowledge of HTML. Likewise, there are numerous software applications that can generate webpage images, such as banners, animated GIFs and image maps. Interactive content can also be embedded into HTML documents; it was traditionally created using Flash or Java, both of which have since been phased out of web browsers in favour of HTML5 and JavaScript.

While it is easy to create a static HTML document, it is rather more complex to create dynamic HTML documents. Dynamic HTML content is generated 'on the fly' each time a page is requested, and is often tailored to the visitor by using session IDs. Creating dynamic web content typically requires the use of CGI, JavaScript, Perl or PHP. Dynamic content typically comes from a database; therefore, websites that generate dynamic content are often described as database driven. Data feeds (such as CSV feeds) and database systems (such as MySQL) are widely used to create and maintain database driven websites. Server-side scripts are used to manage dynamic web content, so that content from databases can be easily inserted into web documents.
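The server-side step described above, reading rows from a database and inserting them into HTML, can be sketched in a few lines. This sketch uses Python's built-in sqlite3 module as a stand-in for a server database such as MySQL; the articles table and the render_article_list function are hypothetical. On a real site, a function like this would run on every request (for example from a CGI or PHP script) rather than once:

```python
import sqlite3
from html import escape

# sqlite3 stands in for a server database such as MySQL; the table is made up.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (title TEXT, summary TEXT)")
db.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("History of HTML", "From CERN to HTML5."),
     ("CSS basics", "Styling <p> and <h1> elements.")],
)


def render_article_list(connection):
    """Build an HTML fragment from database rows, escaping text so it is not read as markup."""
    items = []
    for title, summary in connection.execute(
        "SELECT title, summary FROM articles ORDER BY title"
    ):
        items.append(f"  <li><b>{escape(title)}</b>: {escape(summary)}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_article_list(db))
```

Escaping matters here: without html.escape, the literal "<p>" stored in the database would be parsed by the browser as a real paragraph tag.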

While webmasters can create a database driven website from scratch, they typically use 'off the shelf' software applications. For example, most webmasters do not create their own bulletin board software; instead they download and use a professionally developed application. There are numerous 'content applications' that help webmasters create and maintain web content, the most obvious of which is the wiki system (originally designed using CGI), which has resulted in websites like WikiLeaks being developed.

The homepage of a website (also known as the entry page) has a default file name, which is usually index (for example, index.html). Before content is uploaded to a website, an 'under construction' holding page is sometimes uploaded as the index page. During the construction of a website, an .htaccess file (a configuration file read by the Apache web server) can be uploaded to the web server to redirect visitors.
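The reason an index page acts as the homepage is that web servers map a request for a folder to a default file inside it. A simplified sketch of that lookup, assuming the default names index.html and index.htm (real servers make the list configurable, for example with Apache's DirectoryIndex directive); the function resolve_request is hypothetical and, unlike a real server, does not guard against ".." in paths:

```python
from pathlib import Path
import tempfile

# Default file names tried in order (an assumption; servers make this configurable).
DEFAULT_INDEX_NAMES = ("index.html", "index.htm")


def resolve_request(document_root, url_path):
    """Map a URL path to a file, serving the index page when a folder is requested.

    Illustration only: no protection against '..' path traversal.
    """
    target = document_root / url_path.lstrip("/")
    if target.is_dir():
        for name in DEFAULT_INDEX_NAMES:
            candidate = target / name
            if candidate.is_file():
                return candidate
        return None  # no index page: a server might list the folder or refuse
    return target if target.is_file() else None


root = Path(tempfile.mkdtemp())
(root / "index.html").write_text("<p>Under construction</p>", encoding="utf-8")
print(resolve_request(root, "/").name)  # index.html
```

Uploading an 'under construction' page as index.html is therefore enough for it to appear whenever visitors request the site's address.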

Some factors to consider when building a webpage:

  1. Make sure the webpage is compatible with a wide range of browsers.
  2. Avoid using annoying images and text that flash or flicker.
  3. Make sure your pages are small (below 40K), so that they load quickly in mobile browsers.
  4. Avoid using too many images on your pages, for the same reason.
  5. Search engines favour content-rich (text) websites.
  6. Banner exchanges and tacky promotion gimmicks leave a poor impression.
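Items 3 and 4 in the list above can be checked automatically before a page is uploaded. A minimal sketch, assuming "40K" means 40 KiB of HTML (it does not count the images or stylesheets the page loads) and using an arbitrary example limit of 10 images; the names check_page and ImageCounter are hypothetical:

```python
from html.parser import HTMLParser

MAX_PAGE_BYTES = 40 * 1024  # the "below 40K" guideline from item 3, read as 40 KiB


class ImageCounter(HTMLParser):
    """Counts <img> elements in a page."""

    def __init__(self):
        super().__init__()
        self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1


def check_page(html_text, max_images=10):
    """Return warnings for items 3 and 4; max_images is an arbitrary example threshold."""
    warnings = []
    size = len(html_text.encode("utf-8"))  # size of the HTML alone, not images or CSS
    if size > MAX_PAGE_BYTES:
        warnings.append(f"page is {size} bytes (limit {MAX_PAGE_BYTES})")
    counter = ImageCounter()
    counter.feed(html_text)
    counter.close()
    if counter.images > max_images:
        warnings.append(f"page has {counter.images} images (limit {max_images})")
    return warnings


print(check_page("<p>Short page</p>" + '<img src="a.gif">' * 12))
```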