A Quick and Dirty Primer on HTTP

If you're a web developer you need to know about the HTTP protocol.

After all, you'll be dealing with it every day.

But what is it exactly and how does it work? Here's a quick intro on the topic that's enough to get you up and running with a better understanding.

HTTP In Simple Terms

Put really simply, HTTP is a communication protocol. It's a common language that computers use to communicate with each other over the internet.

Humans communicate with each other using written or spoken language. Browsers and web servers communicate using HTTP.

HTTP follows a request-response, client-server model. A client (such as a web browser) makes a request to a server (such as an Apache server running on a dedicated machine somewhere), which computes a response and sends it back to the client.
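To make the request-response cycle concrete, here's a minimal sketch in Python that starts a tiny server on your own machine and then plays the client against it. It only uses the standard library. HelloHandler and fetch_once are names I made up for this illustration, and a real web server does a lot more than this.

```python
import http.client
import socketserver
import threading
from http.server import BaseHTTPRequestHandler


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a fixed text body."""

    def do_GET(self):
        body = b"hello from the server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the demo quiet by skipping the default request logging.
        pass


def fetch_once(path="/"):
    # Port 0 asks the operating system for any free port.
    server = socketserver.TCPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client side: send one request and read the response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)
        response = connection.getresponse()
        result = (response.status, response.read().decode())
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once()
print(status, body)
```

The server never speaks first: it sits and waits until the client sends a request, then answers it.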

What an HTTP Request Looks Like

Here's an example HTTP request to google.com:

GET /search HTTP/2
Host: google.com
Cache-Control: no-cache

The HTTP request consists of these parts:

  • The HTTP method - in this case, GET. Different types of interactions between clients and servers are handled using different HTTP methods. GET is typically used for fetching resources, such as pages, images, JS, etc. POST is typically used for submitting form data to a server. Other methods exist, but GET and POST are by far the most common.
  • The path name - in this case, /search.
  • The HTTP version - in our request, it's version 2. Several versions of HTTP exist, but the two you'll see most are HTTP/1.1 and HTTP/2. HTTP/1.1 has been around for a long time and is supported by virtually every web server and client. HTTP/2 is newer and is now widely adopted.
  • HTTP headers - these are key-value pairs in the format key: value, one per line. The Host header is required in HTTP/1.1 and contains the hostname of the site you're talking to (in our case, google.com). Other headers may be specified - in our example, there's a Cache-Control header, which tells caches how to handle the request (no-cache asks them to check with the origin server before reusing a stored response).
  • Request body (optional) - I didn't specify a request body here since this is a GET request. You can include a request body with a GET request if you want, but it has no defined meaning and most servers will ignore it. With other HTTP methods, such as POST, we often include a request body. You'll see an example of an HTTP POST request with a request body later on.
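If it helps to see those parts pulled apart in code, here's a rough sketch of a parser for the text form of a request. parse_request is a hypothetical helper, not something from a library, and I've used HTTP/1.1 in the sample because that version really is sent as plain text like this (HTTP/2 uses a binary format on the wire, and tools just display it in this familiar shape).

```python
def parse_request(raw: str) -> dict:
    """Split a plain-text HTTP request into the parts described above."""
    # A blank line separates the request line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # First line: method, path and HTTP version, separated by spaces.
    method, path, version = lines[0].split(" ", 2)

    # Remaining lines: "key: value" headers. Header names are
    # case-insensitive, so lower-case them for easy lookup.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "method": method,
        "path": path,
        "version": version,
        "headers": headers,
        "body": body,
    }


raw_request = (
    "GET /search HTTP/1.1\r\n"
    "Host: google.com\r\n"
    "Cache-Control: no-cache\r\n"
    "\r\n"
)

parsed = parse_request(raw_request)
print(parsed["method"], parsed["path"], parsed["headers"]["host"])
```

This is only meant to show the structure. Real parsers have to cope with malformed input, repeated headers and much more.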

What an HTTP response looks like

Here's a sample HTTP response from google.com:

HTTP/2 200
date: Fri, 17 Aug 2018 07:37:56 GMT
expires: -1
cache-control: private, max-age=0
content-type: text/html; charset=ISO-8859-1
server: gws
x-xss-protection: 1; mode=block
x-frame-options: SAMEORIGIN
set-cookie: 1P_JAR=2018-08-17-07; expires=Sun, 16-Sep-2018 07:37:56 GMT; path=/; domain=.google.com
set-cookie: NID=136=ndoBDBOltK2AgrQA-Ous7ZUpRjJj7ilEqOQPv7TyXERHZcPFrvHxiTSe-EpDdprkKnkDwBshcrCYxpL1QnebKfZhURr0QJRsLi9Sy2rKJlC7qCg9dO-fPKPmEZtCKfgc; expires=Sat, 16-Feb-2019 07:37:56 GMT; path=/; domain=.google.com; HttpOnly
accept-ranges: none
vary: Accept-Encoding

<!doctype html><html itemscope=""...

There's a lot of stuff here but I'll break it down for you:

  • The HTTP version - in this case, HTTP/2. I made the request using HTTP/2 and this is basically the server responding to my request saying that it does indeed support HTTP/2.
  • The status code - for this request, 200. HTTP response codes are used to indicate various things, such as whether the request was successful or not. More on this later.
  • HTTP headers - Like HTTP requests, HTTP responses also include HTTP headers. There are a lot of headers included in this response, and it's common in general for there to be far more response headers than request headers. Some headers relate to the response body (see below), such as content-type, which tells us if the response is an HTML document, image, or whatever. Others have different meanings/purposes. Don't worry if you don't know about all these different headers yet.
  • The response body - this appears after the HTTP headers. In this case, it's the HTML code for the page I requested (I've trimmed it to include only the first few bits of HTML).
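The same breakdown can be sketched in code for a response. Again, parse_response is a made-up helper and the sample response is a shortened, hypothetical one in the HTTP/1.1 text format. I keep the headers as a list of pairs rather than a dictionary because a header such as set-cookie can legitimately appear more than once.

```python
def parse_response(raw: str):
    """Split a plain-text HTTP response into version, status, headers, body."""
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")

    # Status line: version, numeric status code, optional reason phrase.
    parts = status_line.split(" ", 2)
    version = parts[0]
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Keep every header line, including repeated names.
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))

    return version, status, reason, headers, body


raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "content-type: text/html; charset=ISO-8859-1\r\n"
    "set-cookie: first=1; path=/\r\n"
    "set-cookie: second=2; path=/\r\n"
    "\r\n"
    "<!doctype html><html>"
)

version, status, reason, headers, body = parse_response(raw_response)
cookies = [value for name, value in headers if name == "set-cookie"]
print(status, reason, len(cookies))
```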

How to see HTTP requests in your browser

If you visit a website and open up Chrome DevTools (or the Firefox developer tools if you use Firefox), you should see a 'Network' tab.

Click on the network tab and refresh the page. You'll see a log build up of all the HTTP requests for the page.

Click on any request and another set of tabs will appear. You can use them to see request and response headers, the response body, and other stuff.

HTTP Status codes

The status code for the example response I showed was 200, which signifies the request was successful.

But there are many different status codes that mean different things. Technically, a web server can output any status code for any reason if it really wanted to. But most servers set response status codes in line with the following conventions:

  • 1xx codes: Codes in the 100 range are informational. They indicate that a request was received by a server and is being processed. You typically won't deal with these codes yourself.
  • 2xx codes: Codes in the 200 range indicate a successful request and response.
  • 3xx codes: Redirection. The request was made OK, and the server's response was to redirect you to another URL.
  • 4xx codes: The client made an erroneous request. For example, the famous 404 code indicates that a resource was not found. 401 means the client needs to authenticate before it can request a particular resource, while 403 means the client is not allowed to access it at all.
  • 5xx codes: An error occurred on the server. An example is the 500 'Internal Server Error' code which you've probably seen before.
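Since the class of a status code is just its first digit, the conventions above are easy to sketch in code. status_class is a name I've invented for this example. The HTTPStatus enum is part of Python's standard library and knows the standard reason phrase for each code.

```python
from http import HTTPStatus

# The five classes described above, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of an HTTP status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "-", status_class(code))
```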

If you'd like to learn more about status codes, the MDN documentation has a good page about them: HTTP response status codes

What is HTTPS?

By default the HTTP protocol is insecure. Requests and responses are not encrypted but rather are sent in plain text, meaning attackers can intercept requests and responses and read their contents.

For example, an attacker could install malicious software on your router that spies on your HTTP requests.

So if you entered your credit card information into a form and sent it with an insecure HTTP request, someone could read the request and steal your card info.

HTTPS is the solution to this problem. With HTTPS, all requests and responses are encrypted using a security protocol called Transport Layer Security (TLS).

You may have heard the term SSL (Secure Sockets Layer). This is the predecessor of TLS. Every version of SSL is now deprecated and considered insecure, so you shouldn't use a web server that doesn't support TLS.

SSL Certificates

You've probably heard of SSL certificates. Many people still call them that even though SSL is deprecated in favour of TLS, so strictly speaking they should be called TLS certificates. Anyway, certificates are a way of proving that the site you are connecting to really is who it says it is. So if you're connecting to google.com, you can cryptographically verify that it really is Google you are making a request to.
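As a small illustration of what "verifying" means to an HTTPS client, here's a sketch that inspects the default TLS settings Python's standard ssl module hands out. It doesn't connect to anything. describe_default_tls_context is a hypothetical helper name, and other languages and tools expose these settings differently.

```python
import ssl


def describe_default_tls_context() -> dict:
    """Report the certificate checks a default client-side context performs."""
    # create_default_context() returns settings intended for clients
    # that connect to servers, such as an HTTPS client.
    context = ssl.create_default_context()
    return {
        # The server must present a certificate that chains to a trusted CA.
        "certificate_required": context.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must also match the hostname being requested.
        "hostname_checked": context.check_hostname,
    }


print(describe_default_tls_context())
```

Both checks matter: a valid certificate for some other site shouldn't convince your client that it's talking to google.com.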

If you'd like to learn about certificates - why they're needed, how they work - here is a good introduction: A Primer on SSL Certificates

What about HTTP/2?

HTTP/2 is a newer version of the HTTP protocol that is designed to be faster than HTTP/1.1, with more efficient utilization of network resources. All major browsers support it, and adoption among websites has been rising quickly.

The meaning of requests and responses is the same, in terms of HTTP methods, response codes, headers and so on. But the way requests and responses are sent over the network changes a lot (for example, HTTP/2 uses a binary format, compresses headers, and can send many requests over a single connection at once), which is worth knowing about.

I found a good overview of the main features of HTTP/2 here if you are interested.

Curl

Curl is a powerful tool that is used to transfer data over a network. It supports many different communication protocols, but it's most commonly used to make HTTP requests and view the responses in your terminal.

It's a good idea to become familiar with curl. Being able to make HTTP requests in your terminal is very convenient, for example when you are debugging something or when making the request in a browser would be difficult.

It's also a great way to learn about HTTP since you can use curl to hand craft HTTP requests, and view the responses you get back.

I've provided some sample curl commands below for you to study.

GET examples

For these examples I'll be making requests to this JSON placeholder API. It's basically a free 'fake' API that provides canned responses to various requests.

I'd encourage you to try the commands yourself and play around with them.

Example GET request:

curl https://jsonplaceholder.typicode.com/posts/1

Response:

{
  "userId": 1,
  "id": 1,
  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
  "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}

All I had to do was write curl followed by the request url. I didn't even have to specify it was a GET request, since GET is the default HTTP method curl uses.
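If you'd rather poke at this from code than from the terminal, here's a rough Python counterpart using the standard urllib module. This sketch only builds the request and inspects it, so nothing goes over the network. Passing the object to urllib.request.urlopen is what would actually send it.

```python
import urllib.request

# Build (but don't send) a request for the same URL used with curl above.
request = urllib.request.Request("https://jsonplaceholder.typicode.com/posts/1")

# Like curl, urllib defaults to GET when no method or body is given.
print(request.get_method())

# The URL is split into the hostname (which ends up in the Host header)
# and the path (which ends up in the request line).
print(request.host)
print(request.selector)
```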

You might have noticed that curl only returned the response body for the request. If you want to see the full response, including the status code and headers, you must specify the -i option.

curl -i https://jsonplaceholder.typicode.com/posts/1

Response:

HTTP/1.1 200 OK
Date: Fri, 24 Aug 2018 07:46:31 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 292
Connection: keep-alive
Set-Cookie: __cfduid=d518185d060ca5519395b07ed0a7e23dd1535096791; expires=Sat, 24-Aug-19 07:46:31 GMT; path=/; domain=.typicode.com; HttpOnly
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: public, max-age=14400
Pragma: no-cache
Expires: Fri, 24 Aug 2018 11:46:31 GMT
X-Content-Type-Options: nosniff
Etag: W/"124-yiKdLzqO5gfBrJFrcdJ8Yq0LGnU"
Via: 1.1 vegur
CF-Cache-Status: HIT
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Server: cloudflare
CF-RAY: 44f434a03c9d6b97-LHR

{
  "userId": 1,
  "id": 1,
  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
  "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}

If you don't want to see the response body, you can specify the -I option. This makes an HTTP HEAD request - the same as a GET request, except the server omits the response body.

curl -I https://jsonplaceholder.typicode.com/posts/1

Response:

HTTP/1.1 200 OK
Date: Fri, 24 Aug 2018 07:47:54 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 292
Connection: keep-alive
Set-Cookie: __cfduid=d46fe546b82dafdd3afb398c8afe555b41535096874; expires=Sat, 24-Aug-19 07:47:54 GMT; path=/; domain=.typicode.com; HttpOnly
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: public, max-age=14400
Pragma: no-cache
Expires: Fri, 24 Aug 2018 11:47:54 GMT
X-Content-Type-Options: nosniff
Etag: W/"124-yiKdLzqO5gfBrJFrcdJ8Yq0LGnU"
Via: 1.1 vegur
CF-Cache-Status: HIT
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Server: cloudflare
CF-RAY: 44f436aa498fbba2-LHR

POST examples

Here's an example of a POST request. Here I am posting some JSON data to an endpoint.

curl -X POST -H "Content-Type: application/json; charset=UTF-8" https://jsonplaceholder.typicode.com/posts -d '{"title":"foo","body":"bar","userId":1}'

Response:

{
  "title": "foo",
  "body": "bar",
  "userId": 1,
  "id": 101
}

The -X option is for specifying an HTTP method. The -H option is for specifying an HTTP request header (you can specify any number of -H options to set multiple request headers).

The -d option is what you use to specify the request body. In this case, it's the JSON data I'm sending to the endpoint.

I set the Content-Type header to application/json because it's best practice to make sure the server knows I am sending JSON data. Many web servers won't process my request correctly unless I do this.
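For comparison, here's a sketch of how the same POST request could be put together in Python with the standard library. build_json_post is a hypothetical helper. As before, the request is only built and inspected here, and urllib.request.urlopen would be the call that sends it.

```python
import json
import urllib.request


def build_json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request whose body is the payload encoded as JSON."""
    # The request body has to be bytes, so serialise and encode it.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=UTF-8"},
    )


request = build_json_post(
    "https://jsonplaceholder.typicode.com/posts",
    {"title": "foo", "body": "bar", "userId": 1},
)

# Supplying a body makes urllib use POST, much as -d does for curl.
print(request.get_method())
# urllib stores header names as "Content-type"; HTTP header names are
# case-insensitive, so this is the same header.
print(request.get_header("Content-type"))
print(request.data)
```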

Further reading

For more stuff on curl, here's a nice gist detailing more options and examples.

Conclusion

Hopefully you now feel much more comfortable with the HTTP protocol. If you have any questions feel free to give me a shout on Twitter.