What does a URL consist of?
We use URL addresses all the time as we click on links and buttons in our messengers, emails, and on websites. Some frequently used social media and search engine URLs, the likes of vk.com or ya.ru, we even know by heart. We can type them into the address bar in our sleep.
The URL is the system for composing web addresses invented by CERN fellow Tim Berners-Lee in the 1990s. The very first website to get a URL was http://info.cern.ch. This URL now leads to a memorial website celebrating the birth of the World Wide Web.
The web was then a minuscule fraction of what it is now, three decades on, and URLs are everywhere. Below, we break down the structure of a URL, which has mandatory and optional parts. Knowing them is useful to beginner-level developers and regular web users alike.
Mandatory parts of a URL address
A URL has a rigid structure: some parts are compulsory, others optional. Some mandatory parts may be omitted when typing an address, in which case default values kick in. The first part of the URL is the scheme, the codeword for the protocol the browser must use. If the scheme is omitted, the browser fills in a default: traditionally http, although many current browsers try https first.
The scheme indicates the protocol, a set of rules by which machines exchange data over the network. With HTTP, the browser sends a request for the URL, and the web server responds with the page's hypertext. The browser renders the received code into what we see onscreen. HTTP is not the only protocol in use. FTP is a common protocol for transferring files, and HTTPS is a secure extension of HTTP that encrypts the messages.
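As a rough illustration, the scheme can be read off a URL with Python's standard urllib.parse module. The ftp address below is a made-up example, not a real server.

```python
from urllib.parse import urlsplit

# The ftp URL is hypothetical and only there to show a third scheme.
urls = [
    "http://info.cern.ch",
    "https://ispmanager.com/news",
    "ftp://ftp.example.org/pub/readme.txt",
]

schemes = [urlsplit(u).scheme for u in urls]
print(schemes)  # ['http', 'https', 'ftp']

# urlsplit does not guess a missing scheme; filling in a default is the browser's job.
no_scheme = urlsplit("ispmanager.com/news")
print(repr(no_scheme.scheme))  # ''
```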
The scheme is followed by the host, which is a domain name or an IP address. A domain name is effectively an alias for the IP address of a specific server, written in a way that is easier to memorize. For example, one can reach the google.com website using the link http://172.217.22.14, but the host name google.com sticks in the memory far more readily than a sequence of four numbers ever could. Numeric IP addresses, although perfectly legit, are typically used for technical purposes. A domain name consists of groups of characters separated by dots. For example, in the URL of our blog page https://ispmanager.com/news, ispmanager.com is the domain name.
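To tell the two kinds of host apart in code, one option is to check whether the host parses as an IP address. The sketch below uses only the Python standard library; host_kind is a helper name invented for this example.

```python
import ipaddress
from urllib.parse import urlsplit

def host_kind(url: str) -> str:
    """Return 'ip' if the URL's host is a literal IP address, else 'domain'."""
    host = urlsplit(url).hostname  # host part only, without the port
    if host is None:
        raise ValueError(f"no host in {url!r}")
    try:
        ipaddress.ip_address(host)
        return "ip"
    except ValueError:
        # Not a valid IPv4/IPv6 literal, so treat it as a domain name.
        return "domain"

print(host_kind("http://172.217.22.14"))         # ip
print(host_kind("https://ispmanager.com/news"))  # domain
```

No network lookup happens here: the check is purely about how the host is written.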
The port is a number that identifies a particular process running on the server, including the service that handles the protocol. For example, port 80 is the default reserved for the HTTP scheme, and port 443 for HTTPS. Each scheme usually has one default port, so the port rarely needs to be written in the address. Keep in mind, though, that the ports used in web development often differ from the defaults. When the port is specified, it follows the domain name, separated by a colon. For instance, a more detailed version of the URL from the earlier example would look like this: https://ispmanager.com:443/news.
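The fallback to a default port can be sketched in a few lines of Python. The DEFAULT_PORTS table and the effective_port helper are assumptions made for this example, and the table covers only the two defaults named above.

```python
from urllib.parse import urlsplit

# Only the two defaults mentioned in the text; a real table would be longer.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the explicit port if present, otherwise the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("https://ispmanager.com/news"))      # 443
print(effective_port("https://ispmanager.com:443/news"))  # 443
print(effective_port("http://localhost:8000/"))           # 8000
```

The last URL shows the web development case: a local server listening on a non-default port, which therefore has to be written out.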
What other parts may a URL have?
Knowing the domain name is enough to reach the website. Since every page must have a unique URL, the host address is followed, after a slash, by relative paths to individual pages. Together they resemble a tree, with the host as the common trunk and the relative paths as the boughs extending to every page, or "leaf".
A relative path is the path, relative to the domain name, at which the target page is located. In our example URL https://www.ispmanager.com/news, the relative path is the string /news. If we go to a page nested under the news page, the relative path gets longer. Let's say we target the SSL certificates article in the news section: https://www.ispmanager.com/news/ssl-for-ip-address. Now the relative path is /news/ssl-for-ip-address. Such a path often mirrors the file structure on the server, although the server is free to map it to content in any way.
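The tree-like nature of the path is easy to see when it is split into segments. A minimal sketch in Python follows; path_segments is a helper name made up for this example.

```python
from urllib.parse import urlsplit

def path_segments(url):
    """Split the URL path into its non-empty segments."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]

url = "https://www.ispmanager.com/news/ssl-for-ip-address"
print(urlsplit(url).path)  # /news/ssl-for-ip-address
print(path_segments(url))  # ['news', 'ssl-for-ip-address']
```

Each segment is one step further from the trunk: first the news section, then the article inside it.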
An anchor is a link that "bookmarks" a specific spot within a page. An anchor starts with the pound sign: #. Anchors are used, for example, in tables of contents to make it easier to move between sections of a page. Let's say a Python lead developer tells a beginner to reread the language documentation and sends this link: https://docs.python.org/3/tutorial/datastructures.html#dictionaries. When you follow the link, the browser scrolls the page to the target spot, which is marked in the page's HTML code by an element whose ID is the string "dictionaries".
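In code, the anchor can be separated from the rest of the address. The sketch below uses urldefrag from Python's standard library on the documentation link from the example.

```python
from urllib.parse import urldefrag

url = "https://docs.python.org/3/tutorial/datastructures.html#dictionaries"

# urldefrag returns the URL without the anchor, plus the anchor itself.
page, fragment = urldefrag(url)
print(page)      # https://docs.python.org/3/tutorial/datastructures.html
print(fragment)  # dictionaries
```

The part before # is what the browser requests from the server. The anchor is used by the browser itself to find the matching ID on the loaded page.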
Here is another common case: what does a web developer do when a link has been designed into the page but its target does not exist yet? The solution is to put a temporary placeholder where the link should be: a solitary # character. It is a valid link. Although it does not lead anywhere, at least the page won't reload when it is clicked.
Parameters. Where the website has a public database, the URL may carry parameters that filter search results. The parameters follow a question mark, each written as a name=value pair, with the pairs separated by ampersands. Let's say a user is shopping for shoes on a website, and the relative path is followed by this string: ?search=shoes&fbrand=1&fsize=27. With this URL, the user can save a set of filters and share the search results.
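On the receiving end, such a parameter string is typically turned into a dictionary of filters. Here is a minimal sketch with Python's parse_qs. The shop address is hypothetical; only the query string comes from the example above.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical shop URL; the host and path are invented for illustration.
url = "https://shop.example.com/catalog?search=shoes&fbrand=1&fsize=27"

filters = parse_qs(urlsplit(url).query)
print(filters)  # {'search': ['shoes'], 'fbrand': ['1'], 'fsize': ['27']}
```

Every value comes back as a list because the same parameter name may legitimately appear more than once in a URL.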
Structurally, UTM tags resemble search filters, yet they in no way affect how the page is displayed. These are the URL parameters marketers use to keep track of ad campaigns. For example, a marketer may use the tag ?utm_medium=social&utm_source=facebook.com to track how many users reached the website through a social media link.
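Because UTM tags are ordinary parameters, they can be removed without changing which page the URL points to. The sketch below shows one possible way to do that in Python; strip_utm is a helper name invented for this example, and the second URL is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_utm(url):
    """Drop every parameter whose name starts with 'utm_', keep the rest."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

tagged = "https://ispmanager.com/news?utm_medium=social&utm_source=facebook.com"
print(strip_utm(tagged))  # https://ispmanager.com/news

# Hypothetical URL mixing a search filter with a UTM tag.
mixed = "https://shop.example.com/catalog?search=shoes&utm_source=facebook.com"
print(strip_utm(mixed))   # https://shop.example.com/catalog?search=shoes
```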
Acceptable URL characters
As the examples above show, a URL may contain a great variety of characters, but not every character can be used. In most cases, a URL consists of characters used "as is":
- letters of the Latin alphabet from A to Z in either case, digits from 0 to 9, and the characters "-", ".", "_", "~"
- or reserved characters: ":", "/", "?", "#", "[", "]", "@", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "="
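The two groups above can be written down as sets and used to classify any character. This is only a sketch: the set names and the classify helper are made up for this example, and the labels are ours.

```python
import string

# Characters from the first group: letters, digits and "-", ".", "_", "~".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# Characters from the second group, copied from the list above.
RESERVED = set(":/?#[]@!$&'()*+,;=")

def classify(ch):
    """Say which group a single character belongs to."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    return "not allowed as is"

for ch in ["a", "~", "+", "?", " ", "é"]:
    print(repr(ch), classify(ch))
```

A space and an accented letter both fall outside the two groups, so they cannot appear in a URL unchanged.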
Sometimes, however, a URL has to include characters outside this set. In this case, the solution is percent encoding.
Percent encoding. Each unacceptable character is converted to its UTF-8 bytes, and every byte is written as a percent sign followed by two hexadecimal digits. The same procedure applies to reserved characters that must appear as plain data rather than as delimiters. A link to an article about C++, when sent, is apt to look something like this: https://en.wikipedia.org/wiki/C%2B%2B
Here every + character has been encoded as %2B. Modern browsers and servers generally cope with both forms: the more familiar https://en.wikipedia.org/wiki/C++ leads to the same article, and the browser automatically encodes characters that cannot appear in a URL as is, such as spaces.
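Percent encoding is easy to try out with Python's quote and unquote functions. In this sketch, safe="" is our choice for the example: it asks quote to encode reserved characters as well, instead of leaving the slash untouched as it does by default.

```python
from urllib.parse import quote, unquote

# Reserved "+" becomes %2B, as in the Wikipedia link.
encoded = quote("C++", safe="")
print(encoded)           # C%2B%2B
print(unquote(encoded))  # C++

# A non-ASCII character is first turned into UTF-8 bytes (two bytes for "é"),
# then each byte is written as % plus two hexadecimal digits.
accented = quote("é", safe="")
print(accented)          # %C3%A9

# A space is not allowed in a URL as is, so it is encoded too.
space = quote(" ", safe="")
print(space)             # %20
```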
Things to keep in mind
The URL is a unique address that tells one machine how to connect to another, which port to use for data transfer, and where to find the data. But URLs do not serve machines alone. A URL makes it easier for users to navigate the web, to link to a specific spot on a page, or to save a view of a page with a preset array of filters.
In your comments, tell us if there's anything else you would like to know about the URL or domains, and we'll write about it. Subscribe to stay posted on our new articles.