From Wikipedia, the free encyclopedia
Clean URLs, RESTful URLs, user-friendly URLs or SEO-friendly URLs are purely structural URLs that do not contain a query string [e.g., action=delete&id=91] and instead contain only the path of the resource (after the scheme [e.g., http] and the authority [e.g., example.org]). This is often done for aesthetic, usability, or search engine optimization (SEO) purposes. Other reasons for designing a clean URL structure for a website or web service include ensuring that individual web resources remain under the same URL for years, which makes the World Wide Web a more stable and useful system, and making URLs memorable, logical, easy to type, human-centric, and long-lived.
Examples of "unclean" versus "clean" URLs follow:
|Unclean URL||Clean URL|
The most often cited reason for using clean URLs is search engine optimization, but clean URLs can also greatly improve usability and accessibility. Removing unnecessary parts simplifies URLs and makes them easier to type and remember.
The general format of an unclean URL involves a query string with implementation details, ids, illegible encodings, long names, etc.:
A clean URL should have all components legible, and in terms of the URI scheme have no query string, but only a hierarchical part, similar to a path with filename. The hierarchical components should reflect a logical structure, while the last component, called the slug, is analogous to the basename in a filename:
A fragment identifier can be included at the end, for references within a page, and should also be user-readable.
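The components described above can be inspected programmatically. The following is a minimal sketch using Python's standard `urllib.parse` module; the function name `describe_url` and both example URLs (including the `/articles/medical-patents` path and the `#history` fragment) are assumptions made for illustration, not part of any particular site.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL and report the parts that matter for URL cleanliness."""
    parts = urlsplit(url)
    # Non-empty path segments, in hierarchical order.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "has_query": bool(parts.query),   # a clean URL has no query string
        "segments": segments,             # the hierarchical part
        "slug": segments[-1] if segments else "",  # last path component
        "fragment": parts.fragment,       # optional in-page reference
    }


# Hypothetical URLs for illustration only.
unclean = describe_url("http://example.org/index.php?action=delete&id=91")
clean = describe_url("http://example.org/articles/medical-patents#history")

print(unclean["has_query"], unclean["slug"])
print(clean["has_query"], clean["slug"], clean["fragment"])
```

In this sketch the unclean URL reports a query string and exposes the script name as its last segment, while the clean URL reports no query string and a readable slug.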
There are also different levels of cleanliness. For usability and search engine optimization, web developers usually recommend making URLs descriptive, so when planning the structure of clean URLs, webmasters often take the opportunity to include relevant keywords in the URL and remove irrelevant words from it. Common words like "the", "and", "an", "a", etc. are often stripped out to further trim down the URL, while descriptive keywords are added to increase user-friendliness and improve search engine ranking. This includes replacing hard-to-remember numerical IDs with the names of the resources they refer to. Similarly, it is common practice to replace cryptic variable names and parameters with friendly names or to do away with them altogether. Shorter URLs that contain no abbreviations or complex syntax unfamiliar to the average user are less intimidating and contribute to overall usability.
A slug is the part of a URL which identifies a page using human-readable keywords. It is usually the end part of the URL, which can be interpreted as the name of the resource, similar to the basename in a filename or the title of a page. The name is based on the use of the word slug in the news media to indicate a short name given to an article for internal use. For example, in
the slug is medical-patents. This can be generated automatically from a page title or specified manually.
If generated automatically, characters in the original title may be substituted to avoid percent-encoding due to restrictions on web URLs, and common words may be omitted to minimise the final length of the slug. It is common practice to make the slug all lowercase. Accented characters are usually replaced by letters from the English alphabet, punctuation marks are generally removed, and long page titles may be truncated to keep the final URL to a reasonable length. For example, "Nuts & Raisins in the News!" URL-encodes to Nuts%20%26%20Raisins%20in%20the%20News%21, but could have been simplified automatically to nuts-raisins-news. Automatically creating a slug from a title ("slugging") can be seen as a form of munging or wrangling. It often involves the use of regular expression substitutions.
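The slugging steps described above might be sketched as follows. This is one possible approach, not a standard algorithm: the function name `slugify`, the stop-word list (which adds "in" and "of" to the common words mentioned earlier), and the default length limit of 60 characters are all assumptions chosen for illustration.

```python
import re
import unicodedata
from urllib.parse import quote

# Assumed stop-word list; real systems choose their own.
STOP_WORDS = {"a", "an", "and", "the", "in", "of"}


def slugify(title: str, max_length: int = 60) -> str:
    """Turn a page title into a lowercase, hyphen-separated slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then replace runs of punctuation and spaces with one space.
    words = re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).split()
    # Omit common words to shorten the slug.
    slug = "-".join(w for w in words if w not in STOP_WORDS)
    if len(slug) > max_length:
        # Cut to the limit, then drop the trailing (possibly partial) word.
        slug = slug[:max_length].rsplit("-", 1)[0]
    return slug


title = "Nuts & Raisins in the News!"
print(quote(title))    # percent-encoded form of the raw title
print(slugify(title))  # simplified slug
```

For the title used in the text, `quote` yields the percent-encoded form quoted above, while `slugify` yields `nuts-raisins-news`.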
Instead of automatic slugging, a slug can also be entered or altered manually so that while the page title remains designed for display and human readability, its slug may be optimized for brevity or for consumption by search engines.
Another aspect of clean URLs is that they do not contain implementation details of the underlying web application. For example, many URLs include the filename of a server-side script, such as "example.php" or "example.asp", or a directory name such as "cgi-bin". Such details are irrelevant to the user, do not serve to identify the content, and make it harder to change the implementation of the server at a later date. For example, if a script "example.php" is rewritten in Python, URLs that include the name of the script have to change, but clean URLs that leave out such cruft stay the same, in keeping with the principle that "Cool URIs don't change". Typically, clean URLs use rewrite rules to select which script (if any) to run, rather than putting the name of the script in the URL.
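The idea behind rewrite rules can be illustrated with a small dispatch table. The sketch below is not the syntax of any particular web server; the rule patterns, the script names `article.php` and `user.php`, and the function name `rewrite` are hypothetical and serve only to show how a clean path can be mapped to an internal script without exposing that script in the URL.

```python
import re
from typing import Optional

# Hypothetical rewrite table: clean path pattern -> internal script name.
REWRITE_RULES = [
    (re.compile(r"^/articles/(?P<slug>[a-z0-9-]+)$"), "article.php"),
    (re.compile(r"^/users/(?P<id>\d+)$"), "user.php"),
]


def rewrite(path: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return the internal script and its parameters for a clean path."""
    for pattern, script in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            # Named groups become the parameters passed to the script.
            return script, match.groupdict()
    return None  # no rule matched; the server would answer "not found"


print(rewrite("/articles/medical-patents"))
print(rewrite("/users/91"))
print(rewrite("/unknown/path"))
```

Under this arrangement, reimplementing `article.php` in another language would require changing only the script name in the table; the public path `/articles/medical-patents` would remain the same.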