Craig Francis


Coding Standard

When developing websites, I try to follow these standards.

Basic overview of the HTML Standards

All web pages should be coded using XHTML (1.0 Strict) and should be capable of being parsed in browsers, like Firefox, which accept the XML mime type (application/xhtml+xml). This mime type should be used in development to enforce the XML syntax (no syntax errors) and improve forwards compatibility with future standards (e.g. XHTML2 or HTML5). However, when the website goes live, it will use the 'text/html' mime type to avoid issues where pages do not display due to trivial XML errors.

When building the XHTML document, relevant mark-up will be used to form a semantic document. This will enforce the correct use of tables for tabular data (not page layout) and block-quotes for quotations (not for indentation).

CSS will be used to apply styles to the web pages. As all browsers have slightly different interpretations of the CSS spec (something which is improving), we will test on Firefox, Safari, Opera and iCab to ensure these standards are being upheld. Additional CSS will be added to fix display problems in IE7, IE6, IE5.5 and IE5 on the Windows platform. IE5 on the Mac should still be tested, but changes will only be made to make the website 'work', not to match the design of the website. This is due to the small market share of this browser, which has now been discontinued.

User interaction with the website may be enhanced with unobtrusive JavaScript. This will work by extending a working, non-JavaScript version of the functionality. As the development will be done with the pages served under the 'application/xhtml+xml' mime type, we will be using DOM scripting (where possible) instead of legacy methods such as 'document.write'. Additionally all JavaScript should be developed in an appropriate development environment with strict error checking - currently the recommended environment is Firefox with the strict JavaScript warnings enabled.

The JavaScript and CSS that we develop should not appear within the document; instead it should be pulled in from external resources (files found in the /a/ directory) which can be cached by the browser. This is to ensure correct separation of the content, design and client side functionality. There are occasional exceptions to this rule... for example, some JavaScript files are common throughout all of the websites, but need to be configured on each page. In these (rare) cases, it would be counter-intuitive to create a separate JavaScript file for each page. This also extends to some stats packages, whereby the code they provide must be added to the page's content.

The website's accessibility should be checked against WAI Priority 1+2, although trivial issues which fail Priority 3 should also be corrected. For example, identifying the primary natural language of a document is relatively easy to uphold by setting the 'lang' attribute.

Each website should provide the following accessibility features:

Folder structure

Folder names should use camelNotation... as in, all letters are lower cased, including the acronyms. Non alphanumeric characters are removed. Then the first letter of each subsequent word is upper cased, and all words joined together. For example if a page was to be called 'About PHP setup', then its folder would be named 'aboutPhpSetup'.
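As a rough illustration of the naming rule above, the conversion could be sketched in PHP as follows. The folderName() function is a hypothetical helper written for this example (it is not part of any existing library), and it assumes that words in the page name are separated by whitespace.

```php
<?php

// Hypothetical helper: turns a page name into a camelNotation folder name.
function folderName($title) {

    // Lower case everything first, so acronyms such as 'PHP' become 'php'.
    $title = strtolower($title);

    // Remove non alphanumeric characters, keeping whitespace for now
    // (assumption: whitespace is what separates the words).
    $title = preg_replace('/[^a-z0-9\s]/', '', $title);

    // Split into words, ignoring any empty entries.
    $words = preg_split('/\s+/', $title, -1, PREG_SPLIT_NO_EMPTY);

    // Upper case the first letter of each subsequent word, then join.
    $name = '';
    foreach ($words as $k => $word) {
        $name .= ($k == 0 ? $word : ucfirst($word));
    }

    return $name;

}

echo folderName('About PHP setup'); // aboutPhpSetup
```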

Although the live server should be set up with mod_speling to avoid issues with case sensitivity... the website should be constructed to run on the case-sensitive (demo) server.

All of the assets should be stored in the relevant folders:

All pages should be named either 'index.php' or 'index.html', and sit within a folder... no links within the website should refer to these index files directly, as it should be possible to change the file's extension (e.g. from '.html' to '.php') without breaking those links.

For example, the file:

/register/index.php

Should be linked to with the URL:

/register/
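If links are built in code, the mapping from file to URL shown above might be sketched like this. The pageUrl() function is a hypothetical helper for illustration only, and it assumes the path always ends in one of the two index file names.

```php
<?php

// Hypothetical helper: returns the URL to link to, given the path of a
// page's index file.
function pageUrl($filePath) {

    // Replace a trailing '/index.php' or '/index.html' with just the
    // slash, so the link does not depend on the file's extension.
    return preg_replace('#/index\.(php|html)$#', '/', $filePath);

}

echo pageUrl('/register/index.php'); // /register/
```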

Flash Replace

Any Flash movies should be removed.

Database conventions

The MySQL database will be used for all projects, and as such, it should be used to its full potential. Instead of wasting resources making generic SQL statements which can be used in any DBMS, or using a complicated or limiting database wrapper, all code should be written explicitly for MySQL. This allows the use of auto_increment fields, the INSERT SELECT statement and the LIMIT clause.

Dates should be stored in a DATETIME, not as a UNIX timestamp in an INT field... although this may require more processing power (strtotime), it allows quick referencing when using tools like phpMyAdmin to answer quick data questions, instead of having to construct long queries or using a conversion tool.
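To show the extra processing involved, here is a minimal sketch of working with a DATETIME value in PHP. The $created value is made up for the example, and the timezone is set to UTC only so that the result is predictable.

```php
<?php

// Assumption: fixed timezone, so the arithmetic below is predictable.
date_default_timezone_set('UTC');

// Assumption: this value came from a DATETIME field, which MySQL returns
// as a string in the 'YYYY-MM-DD HH:MM:SS' format.
$created = '2007-08-05 14:30:00';

// strtotime() converts it to a UNIX timestamp for any date arithmetic.
$timestamp = strtotime($created);

// date() converts a timestamp back into the DATETIME format for storage.
$tomorrow = date('Y-m-d H:i:s', $timestamp + (60 * 60 * 24));

echo $tomorrow; // 2007-08-06 14:30:00
```

The value in the database stays readable in tools like phpMyAdmin, and the conversion only happens where PHP needs to do the arithmetic.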

Following is a list of naming conventions, mostly written by Justin Watt.

Variable handling

All variables within PHP should use camelNotation - lower case, but new words starting with an uppercase letter... abbreviations like RADAR and PHP should be treated as single words and similarly lowercased.

All variables should be initially defined... although scripts should be running on servers with register_globals disabled, they will still run with error_reporting set to E_ALL, where all logs are analysed for potential errors.
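One possible way to make sure a request variable is always defined, so that no notices are logged under E_ALL, is sketched below. The getValue() function is a hypothetical helper, not something provided by PHP.

```php
<?php

error_reporting(E_ALL);

// Hypothetical helper: returns a GET value, or a default if it was not
// sent, so the calling code never reads an undefined index.
function getValue($name, $default = '') {
    return (isset($_GET[$name]) ? $_GET[$name] : $default);
}

// Both variables are now defined, whether or not the request set them.
$name = getValue('name');
$page = intval(getValue('page', 1));
```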

Strings should use single quotes; this avoids the (trivial) processing for variable substitution, but mainly it reduces the number of characters which need to be escaped.

All variables which are printed out to the browser should go through the html() function to avoid XSS security vulnerabilities... this is a simple helper function which sets the optional 2nd and 3rd parameters in the native htmlentities() function.
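A minimal sketch of what such a helper might look like is shown below. The values used for the 2nd and 3rd parameters (ENT_QUOTES and 'UTF-8') are assumptions made for this example; the real helper may use different ones.

```php
<?php

// Sketch of the html() helper. Assumption: ENT_QUOTES (convert both
// double and single quotes) and the UTF-8 character set.
function html($text) {
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

echo '<p>Hello ' . html('<b>"Craig" & Co</b>') . '</p>';
// <p>Hello &lt;b&gt;&quot;Craig&quot; &amp; Co&lt;/b&gt;</p>
```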

All variables which are used in SQL should be passed through the $db->escape() function found in the database class... this is just a wrapper for mysql_real_escape_string().

If a variable will be used in a URL, then it must go through the urlencode() function.

Variable naming

Variables use a version of Hungarian notation (apps)... whereby the name shows the type of variable, not in terms of its data type (int, string, float etc)... but instead the type of string - plain text, HTML, SQL, URL, etc.

For example, when taking user input from a URL (a GET request), you cannot simply print that out onto the page, as it could contain unsafe HTML code, which potentially opens an XSS security hole... so all GET variables printed to the browser need to have the HTML entities converted to protect the user... if this needs to be stored in a variable, then that variable is prefixed with 'html'.

The same is true with variables which will be added to an SQL statement... these get the prefix of 'sql', and so can be treated as 'safe'... this is to prevent SQL injection attacks.

Variables used in URLs, where characters like the ampersand (&) need to be converted, are prefixed with 'url'.

The same applies to values being passed into a regular expression.
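For the regular expression case, a small sketch using the native preg_quote() function might look like this. The 'reg' prefix is a hypothetical choice made for this example, as the prefix for regular expressions is not stated here.

```php
<?php

// Plain text which needs to be matched literally.
$search = 'index.php?id=1';

// preg_quote() escapes characters with a special meaning in a pattern;
// the 2nd parameter also escapes the delimiter used below.
// Assumption: 'reg' is a hypothetical prefix for regexp-safe values.
$regSearch = preg_quote($search, '/');

// The escaped value only matches the literal text.
$matchesLiteral = preg_match('/^' . $regSearch . '$/', 'index.php?id=1');
$matchesOther = preg_match('/^' . $regSearch . '$/', 'indexXphp?id=1');
```

Here $matchesLiteral is 1 and $matchesOther is 0, as the escaped '.' no longer matches any character.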

So for a quick and dirty example, which does not check for undefined variables:

<?php

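// Plain text from the user (not checked for being undefined).
$name = $_GET['name'];

// URL context: the value is encoded before it is added to the URL.
$urlProfile = '/profile/?name=' . urlencode($name);

// HTML context: html() is the helper described under 'Variable handling'.
$htmlLink = '<a href="' . html($urlProfile) . '">Profile</a>';

echo '<p>' . $htmlLink . '</p>';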

?>

Any feedback would be greatly appreciated. I don't include comments due to the admin time required, but if you email me, I will reply and make appropriate updates. Also, if you would like to take a copy of this article, please read the terms this article is released under. This article was originally written Sunday 5th August 2007.