
Parsing post data 3 different ways in Node.js without third-party libraries

application/json, application/x-www-form-urlencoded, and multipart/form-data




Alright, I'll be honest: this one was not easy. Everywhere I looked, the solutions either relied on third-party libraries or offered only partial theory. As you might already know, I'm learning Node.js and web technologies from scratch so I can understand what's going on under the hood.

The full source code is available at the end of this post, so if you know what you're doing you can dive straight into it. If you're also learning, enjoy the read.

This project builds on what I shared in my previous posts, so I won't be covering those topics here.

1. application/x-www-form-urlencoded

The default enctype of HTML forms is application/x-www-form-urlencoded. It sends data formatted the same way you sometimes see in URLs when visiting websites, for example: name1=value1&name2=value2. To post files, the encoding type has to be changed to multipart/form-data; we will show how to parse that kind of request later in the story.
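
To make the format concrete, here is a small sketch of how two field values end up on the wire. It uses the URLSearchParams class that ships with Node.js to mimic what the browser does; the field values are made up for illustration.

```javascript
// Mimic how a browser serializes form fields as application/x-www-form-urlencoded.
// The field values below are made-up examples.
const fields = { username: 'sal', password: 'p@ss word' }

// URLSearchParams is a global in Node.js; toString() joins the pairs with '&',
// percent-encodes reserved characters and turns spaces into '+'.
const encoded = new URLSearchParams(fields).toString()

console.log(encoded) // username=sal&password=p%40ss+word
```

Note how the @ and the space are escaped; the server has to undo that escaping when it parses the body.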

The HTML snippet below creates a form that encodes requests as application/x-www-form-urlencoded:

<form method="post">
  <input id="username1" type="text" name="username">
  <input id="password1" type="password" name="password">
  <input type="submit">
</form>

The form above sends data using the POST method. You can also send this kind of data as a GET request, but I'd rather parse data from the body of the request than from the URL.

Within the form tags there are 3 different input types, text, password and submit.

  • text: used to capture text.

  • password: same as text, but hides what is typed and shows • for each character.

  • submit: creates the button to trigger the post request.

The ids on the inputs are useful for styling with CSS or for finding the fields easily from JavaScript.

The name is really important: if you don't include it, the browser won't send that input's data to the server. The name is used to identify the incoming data on the server side.

Because this form doesn't include the action attribute in the opening <form> tag, it sends the request to the same URL loaded in the browser.

Now how do we read and parse this kind of data from Node.js?

To read the data I added callbacks to the data and end events as follows:

const querystring = require('querystring')

let rawData = ''
request.on('data', chunk => {
  rawData += chunk
})

request.on('end', () => {
  let parsedData = querystring.decode(rawData)
  ...
})

Before registering the callbacks I declare and initialize a string named rawData to append all the incoming data in order. The data event is emitted every time data is available for the server to read, and the end event is emitted when there is no more incoming data. I use the end event to parse the complete captured data and to send the response to the client.

Parsing application/x-www-form-urlencoded is a one-liner! Node.js provides a built-in module for this type of parsing named querystring.
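
Putting the pieces together, a minimal sketch of the reading and parsing steps could look like the following. The function names readFormBody and handleRequest are hypothetical, and echoing the fields back as JSON is just an assumption to keep the example short.

```javascript
const querystring = require('querystring')

// Collect the request body and hand the parsed fields to a callback.
// `request` is anything that emits 'data' and 'end', such as http.IncomingMessage.
function readFormBody(request, callback) {
  let rawData = ''
  request.on('data', chunk => {
    rawData += chunk
  })
  request.on('end', () => {
    // querystring.parse() is the same function as querystring.decode()
    callback(querystring.parse(rawData))
  })
}

// Hypothetical handler that echoes the parsed fields back as JSON.
function handleRequest(request, response) {
  readFormBody(request, fields => {
    response.writeHead(200, { 'content-type': 'application/json' })
    response.end(JSON.stringify(fields))
  })
}

// To try it out: require('http').createServer(handleRequest).listen(3000)
```

Keeping the body reading in its own function makes it easy to exercise with any event emitter, without starting a server.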

2. application/json

Parsing JSON POST requests is also a one-liner. The tedious part is on the HTML side, because we have to hack the form to prevent its default behavior, manually read the data from the form, manually create the JSON and create the request.

<form action="javascript:" onsubmit="onFormSubmit(this)">
  <input id="username2" type="text">
  <input id="password2" type="password">
  <input type="submit">
</form>

In the above HTML form, we use action to tell the form that it will trigger JavaScript instead of the default behavior, and we use onsubmit to specify which JavaScript function to call when the user presses the submit button. We pass this as a parameter to the function so that we're able to read the information from the form.

To keep it simple, I have created the onFormSubmit(form) function in the same HTML page as follows:

<head>
...
  <script>
    'use strict'
    function onFormSubmit(form) {
      const username = form["username2"].value
      const password = form["password2"].value
      let body = JSON.stringify({
        username: username,
        password: password
      });
      (async () => {
        try {
          const response = await fetch('/', {
            headers: {
              'content-type': 'application/json'
            },
            method: 'POST',
            body: body
          })
          const text = await response.text()
          if (response.status !== 200) {
            if (text && text.length > 0) {
              console.error(text)
            } else {
              console.error('There was an error')
            }
            return
          }
          document.body.innerHTML = text
        } catch (e) {
          console.error(e.message)
        }
      })()
    }
  </script>
</head>

First we read the values from username and password by using the given ids in the form.

We then create the JSON string for the request body using JSON.stringify().

I then create a POST fetch request specifying that the data is application/json and the JSON data as body.

The next few lines read the server response, and with the document.body.innerHTML = text line we replace the current content of the page with the data returned to the browser, to get behavior similar to application/x-www-form-urlencoded above.

On the server side I read the data using the same data and end events and then parse the data with JSON.parse(rawData).
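
One thing to keep in mind is that JSON.parse() throws on malformed input. A minimal sketch of guarding against that is below; the parseJsonBody name and the choice of answering with status 400 are assumptions, not something taken from the full source.

```javascript
// Parse a JSON request body without letting malformed input throw.
// Returns { data } on success or { error } with a message on failure.
function parseJsonBody(rawData) {
  try {
    return { data: JSON.parse(rawData) }
  } catch (e) {
    return { error: 'Invalid JSON: ' + e.message }
  }
}

// Inside the end callback this could be used like so (status code is an assumption):
// const { data, error } = parseJsonBody(rawData)
// if (error) { response.writeHead(400); response.end(error); return }

console.log(parseJsonBody('{"username":"sal","password":"pass"}').data.username) // sal
console.log('error' in parseJsonBody('{"username":')) // true
```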

3. multipart/form-data

This one was the trickiest of all, and it is nowhere near a one-liner. If it's so complicated, why bother? The simple answer is that it lets you transfer files together with form data in a single request.

There are different approaches to parsing this data, but for the purpose of this example I load all the data into memory, store the given file on the server and then, as in the previous examples, return the request data as response data. I remove the binary content from the response because I'm showing the data on the website and loading binary data takes a long time. The HTML form is simpler than the application/json one; it's basically the same as the application/x-www-form-urlencoded form but with a different encoding (enctype):

<form method="post" enctype="multipart/form-data">
  <input id="username3" type="text" name="username">
  <input id="password3" type="password" name="password">
  <input id="picture3" type="file" name="picture">
  <input type="submit">
</form>

What's different from x-www-form-urlencoded is that we now explicitly set enctype to multipart/form-data and add a file input named picture to the mix.

I use the same data and end event callbacks as in the previous methods, but this time, before registering the callbacks, I change the request encoding to latin1 so that the binary data is read correctly.

// The header value also carries the boundary, so only check how it starts
if ((request.headers['content-type'] || '').startsWith('multipart/form-data')) {
  // Use latin1 encoding to parse binary files correctly
  request.setEncoding('latin1')
}
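
To see why latin1 matters here, the following sketch pushes a few arbitrary bytes through a string and back. The byte values are made up; they simply stand in for the kind of content found in an image file.

```javascript
// A few bytes that are not valid UTF-8 on their own, like those found in image files.
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0xff, 0x00])

// latin1 maps every byte to exactly one character, so the round trip is lossless.
const viaLatin1 = Buffer.from(original.toString('latin1'), 'latin1')

// utf8 replaces invalid sequences with U+FFFD, so the original bytes are lost.
const viaUtf8 = Buffer.from(original.toString('utf8'), 'utf8')

console.log(original.equals(viaLatin1)) // true
console.log(original.equals(viaUtf8))   // false
```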

The data sent by multipart/form-data is split into parts, divided by a boundary specified in the content-type header sent by the browser. An example is below:

------WebKitFormBoundaryxJi9AgGdxx83BunR
Content-Disposition: form-data; name="username"

sal
------WebKitFormBoundaryxJi9AgGdxx83BunR
Content-Disposition: form-data; name="password"

pass
------WebKitFormBoundaryxJi9AgGdxx83BunR
Content-Disposition: form-data; name="picture"; filename="Chromium_11_Logo.svg"
Content-Type: image/svg+xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->

<svg>
...
</svg>

------WebKitFormBoundaryxJi9AgGdxx83BunR--

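The delimiter lines in the example above start with ------WebKitFormBoundaryxJi9AgGdxx83BunR, which is the boundary from the content-type header, ----WebKitFormBoundaryxJi9AgGdxx83BunR, prefixed with two extra hyphens. I created a helper function to extract the boundary from the request header as follows: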

function getBoundary(request) {
  let contentType = request.headers['content-type']
  const contentTypeArray = contentType.split(';').map(item => item.trim())
  const boundaryPrefix = 'boundary='
  let boundary = contentTypeArray.find(item => item.startsWith(boundaryPrefix))
  if (!boundary) return null
  boundary = boundary.slice(boundaryPrefix.length)
  if (boundary) boundary = boundary.trim()
  return boundary
}

In the above function, I read the content-type header, split it by ; and trim the whitespace from each entry. I then get the boundary entry by running find() on the array, checking for boundary= as the prefix, and return the string after this prefix.
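
As a quick check of that logic, here is a sketch that applies the same steps to a header value directly, so it can be tried without a real request. The getBoundaryFromHeader name is hypothetical; the header value matches the example body shown earlier.

```javascript
// Same logic as getBoundary() above, but taking the header value directly
// so it can be tried without a real request.
function getBoundaryFromHeader(contentType) {
  const prefix = 'boundary='
  const part = contentType.split(';').map(item => item.trim())
    .find(item => item.startsWith(prefix))
  return part ? part.slice(prefix.length).trim() : null
}

const header = 'multipart/form-data; boundary=----WebKitFormBoundaryxJi9AgGdxx83BunR'
const boundary = getBoundaryFromHeader(header)

console.log(boundary)        // ----WebKitFormBoundaryxJi9AgGdxx83BunR
// Each part in the body starts with '--' followed by the boundary.
console.log('--' + boundary) // ------WebKitFormBoundaryxJi9AgGdxx83BunR
console.log(getBoundaryFromHeader('application/json')) // null
```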

Now that we know how to get the boundary, we're going to use it to split the request post body as follows:

const boundary = getBoundary(request)
let result = {}
const rawDataArray = rawData.split(boundary)
for (let item of rawDataArray) {
  ...
}

Within the for loop the first thing I check is for the name:

// Use non-capturing groups to exclude part of the result
let name = getMatching(item, /(?:name=")(.+?)(?:")/)
if (!name || !(name = name.trim())) continue

I use non-capturing groups in the regex to exclude part of the matching string, so that it returns just the data I'm looking for, which is the name matched with (.+?). If no name was found, we continue to the next item.
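
The getMatching() helper itself is not shown in this post. A minimal stand-in, assuming it returns the first capture group of the first match or null when nothing matches, could be sketched like this:

```javascript
// Hypothetical stand-in for the post's getMatching() helper:
// returns the first capture group of the first match, or null when nothing matches.
function getMatching(string, regex) {
  const matches = string.match(regex)
  if (!matches || matches.length < 2) return null
  return matches[1]
}

const header = 'Content-Disposition: form-data; name="username"'
console.log(getMatching(header, /(?:name=")(.+?)(?:")/)) // username
console.log(getMatching('no name here', /(?:name=")(.+?)(?:")/)) // null
```

Because the surrounding name=" and " sit in non-capturing groups, only the text between the quotes lands in the first capture group.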

Next I get the value for the given named input with a somewhat complicated regex search, also using non-capturing groups:

let value = getMatching(item, /(?:\r\n\r\n)([\S\s]*)(?:\r\n--$)/)
if (!value) continue

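The above matches all characters, including spaces, newlines and tabs, between the blank line that ends the part's headers and the \r\n-- that precedes the next boundary. The value is matched with ([\S\s]*), and if no value was found we continue to the next item in the array.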

Next I check whether this entry is a file, which is the case when it has a filename:

let filename = getMatching(item, /(?:filename=")(.*?)(?:")/)
if (filename && (filename = filename.trim())) {
...
}

The filename can be empty if the user didn't select any file before pressing the submit button; that's why I match the filename using (.*?), which matches 0 or more characters.

If the filename is valid I also search for the content type as follows:

let contentType = getMatching(item, /(?:Content-Type:)(.*?)(?:\r\n)/)

The above regex matches the content type with (.*?).
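
The steps so far can be combined into one function. The sketch below is an approximation under stated assumptions: getMatching() is a hypothetical stand-in for the helper in the full source, and the { fields, files } shape of the result is made up for this example, with each file's content stored under its input name.

```javascript
// Hypothetical helper: first capture group of the first match, or null.
function getMatching(string, regex) {
  const matches = string.match(regex)
  return matches && matches.length > 1 ? matches[1] : null
}

// Split a multipart body (read as a latin1 string) into fields and files.
// The shape of the result, { fields, files }, is an assumption for this sketch.
function parseMultipart(rawData, boundary) {
  const result = { fields: {}, files: [] }
  for (const item of rawData.split(boundary)) {
    let name = getMatching(item, /(?:name=")(.+?)(?:")/)
    if (!name || !(name = name.trim())) continue

    const value = getMatching(item, /(?:\r\n\r\n)([\S\s]*)(?:\r\n--$)/)
    if (!value) continue

    let filename = getMatching(item, /(?:filename=")(.*?)(?:")/)
    if (filename && (filename = filename.trim())) {
      const contentType = getMatching(item, /(?:Content-Type:)(.*?)(?:\r\n)/)
      // Store the file content under the input's name, next to its metadata
      const file = { filename, contentType: contentType ? contentType.trim() : null }
      file[name] = value
      result.files.push(file)
    } else {
      result.fields[name] = value
    }
  }
  return result
}

// A tiny hand-built body; real bodies use \r\n line endings just like this one.
const boundary = '----WebKitFormBoundaryxJi9AgGdxx83BunR'
const rawData = [
  '--' + boundary,
  'Content-Disposition: form-data; name="username"',
  '',
  'sal',
  '--' + boundary,
  'Content-Disposition: form-data; name="picture"; filename="logo.svg"',
  'Content-Type: image/svg+xml',
  '',
  '<svg></svg>',
  '--' + boundary + '--',
  ''
].join('\r\n')

const parsedBody = parseMultipart(rawData, boundary)
console.log(parsedBody.fields.username)   // sal
console.log(parsedBody.files[0].filename) // logo.svg
```

Splitting on the boundary from the header leaves a trailing \r\n-- on each part, which is exactly what the value regex expects at the end of the string.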

Because there can be several files in a post request, in the example I collect all the files in a resulting array. To store the files on the server I do the following:

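for (let file of data.files) {
  const stream = fs.createWriteStream(file.filename)
  // 'binary' is an alias for latin1, the encoding the request was read with
  stream.write(file.picture, 'binary')
  // end() flushes pending writes and closes the file
  stream.end()
  // Replace the content with a placeholder before echoing the data back
  file.picture = 'bin'
}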

Notice that when writing the file I set the encoding to binary; this is required to store the file correctly.
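
As a small illustration of that point, the sketch below writes a latin1 string to disk and reads the bytes back. It uses fs.writeFileSync instead of a stream to stay short, and it strips any directory part from the client-supplied filename with path.basename(); both choices, as well as the saveUpload name and the sample bytes, are assumptions of this sketch rather than part of the full source.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Write a file that was read from the request as a latin1 string.
// path.basename() drops any directory part from the client-supplied name;
// that precaution is an addition in this sketch.
function saveUpload(directory, filename, content) {
  const target = path.join(directory, path.basename(filename))
  fs.writeFileSync(target, content, 'latin1')
  return target
}

// Example: raw bytes as they would arrive in a latin1-decoded request body
const bytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0xff])
const saved = saveUpload(os.tmpdir(), '../picture.bin', bytes.toString('latin1'))
const roundTrip = fs.readFileSync(saved)
fs.unlinkSync(saved) // clean up the example file

console.log(path.basename(saved))    // picture.bin
console.log(roundTrip.equals(bytes)) // true
```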

This is not all there is to know about parsing every combination of parameters and form configurations, but this example should give you a very good start for creating your own complete parser, or just for knowing what's going on under the hood of many libraries.

The complete source code shows you the implementation of getMatching(), used in the examples above to find matching strings with regex. You will also find some conditions to validate requests and a security measure that rejects requests larger than a set number of bytes. All the parsing code is in a file named security-utils.js, and the 3 HTML forms are available in index.html.
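
To give an idea of what such a size check might look like, here is a rough sketch. It is not the code from security-utils.js: the readBodyWithLimit name, the callback signature and the limit passed in are all assumptions, and it expects the chunks to be Buffers (no setEncoding() call).

```javascript
// Collect the request body but give up once it grows past maxBytes.
// The callback receives (error, body); names and the limit are assumptions.
// Assumes chunks are Buffers, so chunk.length is a byte count.
function readBodyWithLimit(request, maxBytes, callback) {
  const chunks = []
  let received = 0
  let done = false

  request.on('data', chunk => {
    if (done) return
    received += chunk.length
    if (received > maxBytes) {
      done = true
      chunks.length = 0 // drop what was collected so far
      // The caller can answer with status 413 and close the connection
      callback(new Error('Request body too large'))
      return
    }
    chunks.push(chunk)
  })

  request.on('end', () => {
    if (done) return
    done = true
    callback(null, Buffer.concat(chunks))
  })
}
```

Counting bytes as they arrive means an oversized upload is rejected early instead of being loaded into memory in full.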
