Data Storage

1.1 Data File Formats

# Introduction

Welcome to MAD9022. This course has changed many times over the years to become its current incarnation. With the current version, our topics will include:

  • Ways to organize your code for modern web applications
  • Intermediate level JavaScript
  • More HTML5 APIs
  • Approaches to data storage and caching
  • How to build installable, offline capable, Progressive Web Apps
  • How to create cross-platform Mobile Apps using the Dart programming language and the Flutter framework.

# Data File Formats

Data can be stored in many different text file formats: json, xml, csv, tsv, yaml, txt, and even html. These are all text files, meaning that you can open them with any text editor and make direct changes to the text data inside them.

Moving beyond text files, there are many ways to store data that require specialized software. Databases are a common example. Databases save your information in binary files that are organized in efficient ways. Each database (MySQL, SQL Server, IndexedDB, MongoDB, etc.) has its own file format and approach to saving the information. They all save their information in files, but not files that you can open and read in a text editor.

As web developers we need to be comfortable working with data that is coming from a variety of formats. For client-side web development we need to be able to use JavaScript to work with json, xml and html text files. yaml files are also commonly used for web development. On the client side the only database that is embedded in the browser is IndexedDB. Other databases are typically accessed from the web server. We will use HTTP Requests to upload and download data from the server. A server-side script is used to access the database and complete the CRUD (Create, Read, Update, Delete) operations.
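The CRUD operations mentioned above are commonly mapped onto HTTP methods. The following is a minimal sketch of that idea, assuming the usual REST-style convention (POST, GET, PUT, DELETE). The `buildRequest` helper and the `https://example.com/api/students` endpoint are hypothetical names used only for illustration; the server-side script is not shown.

```javascript
// Conventional (REST-style) mapping of CRUD operations to HTTP methods.
// This is a common convention, not something HTTP or a database enforces.
const CRUD_METHODS = {
  create: 'POST',
  read: 'GET',
  update: 'PUT',
  delete: 'DELETE',
};

// Build the two arguments for fetch(url, options).
// `url` and `data` are hypothetical values supplied by the caller.
function buildRequest(operation, url, data) {
  const method = CRUD_METHODS[operation];
  if (!method) {
    throw new Error(`Unknown CRUD operation: ${operation}`);
  }
  const options = { method, headers: { Accept: 'application/json' } };
  // Only create and update send a JSON body to the server.
  if (data !== undefined && (method === 'POST' || method === 'PUT')) {
    options.headers['Content-Type'] = 'application/json';
    options.body = JSON.stringify(data);
  }
  return { url, options };
}

// Prepare (but do not send) a request that creates a student record.
const createRequest = buildRequest('create', 'https://example.com/api/students', {
  name: 'Joanne',
  id: 123,
});
console.log(createRequest.options.method); // POST
console.log(createRequest.options.body);   // {"name":"Joanne","id":123}

// To actually send it from a browser:
// fetch(createRequest.url, createRequest.options).then((response) => response.json());
```

Building the request separately from sending it keeps the sketch runnable without a network connection.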

There are some solutions, like Firebase Firestore, which allow us to talk to a server-side database from our client-side JavaScript. However, we are still really just using JavaScript to make HTTP Requests to a script running on the server that accesses the Firestore database and runs the CRUD operations.

# JSON

JSON - JavaScript Object Notation is the most popular format for sending data between clients and servers. It is called JSON because it uses a JavaScript-compatible syntax for encoding the information inside the file.

However, it is NOT JavaScript. It is just a text file with a single long string. For this exact reason, we cannot save things like functions or DOM elements inside of JSON files. We can only save String, Number, Boolean, and null (Primitive values) plus Array literals and Object literals.
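To see why functions cannot be saved, here is a small sketch using the built-in `JSON.stringify()`. The `student` object, including its `nickname` and `greet` properties, is made up for illustration.

```javascript
const student = {
  name: 'Joanne',
  id: 123,
  active: true,
  nickname: undefined, // undefined is not a JSON value
  greet() {            // functions are not JSON values
    return `Hi, ${this.name}`;
  },
};

// JSON.stringify() returns one string and silently leaves out
// object properties whose value is a function or undefined.
const json = JSON.stringify(student);
console.log(typeof json); // string
console.log(json);        // {"name":"Joanne","id":123,"active":true}
```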

localStorage and sessionStorage can only hold strings, so JSON is the usual format for saving structured data in the browser with them. More on this next week.
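As a preview, here is a minimal sketch of saving and loading JSON through the `setItem()` and `getItem()` methods that localStorage and sessionStorage provide. The helper names `saveData`, `loadData`, and `createMemoryStorage` are hypothetical; the in-memory stand-in exists only so the sketch can run outside a browser.

```javascript
// Save any JSON-compatible value under a key.
// In a browser, `storage` would be localStorage or sessionStorage.
function saveData(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

// Read a value back. getItem() returns null when the key does not exist.
function loadData(storage, key) {
  const text = storage.getItem(key);
  return text === null ? null : JSON.parse(text);
}

// Hypothetical in-memory stand-in that imitates only the two methods used above.
function createMemoryStorage() {
  const items = new Map();
  return {
    setItem: (key, value) => items.set(key, String(value)),
    getItem: (key) => (items.has(key) ? items.get(key) : null),
  };
}

const storage = createMemoryStorage(); // in a browser, pass localStorage instead
saveData(storage, 'student', { name: 'Joanne', courses: ['HRT100', 'HRT200'] });

const saved = loadData(storage, 'student');
console.log(saved.courses[1]);             // HRT200
console.log(loadData(storage, 'missing')); // null
```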

The primary differences between a JS object and a JSON object are:

  • All object keys must be wrapped in double quotes.
  • All string values must be wrapped in double quotes.
  • No trailing commas are allowed after array or object values.
  • No comments are allowed in the file.
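These rules can be checked with the built-in `JSON.parse()`, which throws an error when the text is not valid JSON. The `isValidJSON` helper below is a hypothetical name for a small sketch of that check.

```javascript
// Returns true when `text` is valid JSON, false when JSON.parse() rejects it.
function isValidJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidJSON('{"name": "Joanne", "id": 123}')); // true
console.log(isValidJSON('{name: "Joanne"}'));              // false: key without double quotes
console.log(isValidJSON('{"name": \'Joanne\'}'));          // false: single-quoted string
console.log(isValidJSON('{"courses": ["HRT100",]}'));      // false: trailing comma
console.log(isValidJSON('{"id": 123} // student id'));     // false: comment
```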

Here is a JavaScript Object:

let obj = {
  name: 'Joanne',
  id: 123,
  active: true,
  courses: ['HRT100', 'HRT200', 'HRT300']
}

and here is the same information as JSON:

{"name":"Joanne", "id":123, "active":true, "courses":["HRT100", "HRT200", "HRT300"]}

Notice all the double quotes around all the string values. No quotes around the number or boolean values.
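In practice you rarely type JSON by hand. A short sketch of converting between the two forms with the built-in `JSON.stringify()` and `JSON.parse()`:

```javascript
const obj = {
  name: 'Joanne',
  id: 123,
  active: true,
  courses: ['HRT100', 'HRT200', 'HRT300'],
};

// JavaScript object -> JSON text
const text = JSON.stringify(obj);
console.log(text);
// {"name":"Joanne","id":123,"active":true,"courses":["HRT100","HRT200","HRT300"]}

// JSON text -> a new, independent JavaScript object
const copy = JSON.parse(text);
console.log(copy.courses[0]); // HRT100
console.log(typeof copy.id);  // number
console.log(copy === obj);    // false
```

The parsed result is a separate object, so changing `copy` does not change `obj`.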

# XML

XML - eXtensible Markup Language, created in 1998, was the first file format that was widely used in client-side web development for the transfer of data between clients and servers. As the name suggests, it is a markup language. Angle brackets < > are used to wrap the tag names, which label and describe the information in the file.

The most important rule for writing XML files is that they must be human readable.

This one rule meant that XML rapidly became a very popular format with the thousands of new developers who started working in web development in the late 90s and early 2000s. The format was adopted by nearly all major software providers and is still widely used today.

An example of the widespread support for XML was Microsoft's adoption of it as the basis for all their MS Office files in Office 2007. With this release, file formats changed from .doc to .docx, .xls to .xlsx, and so on. The name change reflected that XML had become a core part of the file format. A .docx file is really a zipped package of XML files, along with any embedded media such as images. New features for MS Word are described in the XML inside that package.

JSON overtook XML as the most popular web development format during the last decade because it was Developer Readable and because the file size was noticeably smaller than XML.

Here is the same data as above, as an XML file.

<?xml version="1.0" encoding="utf-8"?>
<student xmlns="https://com.algonquincollege/student">
  <name>Joanne</name>
  <id>123</id>
  <active>true</active>
  <courses>
    <course>HRT100</course>
    <course>HRT200</course>
    <course>HRT300</course>
  </courses>
</student>

You can see how much more typing is required to output that small amount of information.
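As a rough check on the size difference, this sketch compares the character counts of the same student record in both formats. The XML is written without indentation or a declaration so that the comparison is not inflated by whitespace; the exact numbers apply only to this one small record.

```javascript
// The student record as compact JSON.
const asJson =
  '{"name":"Joanne","id":123,"active":true,"courses":["HRT100","HRT200","HRT300"]}';

// The same record as XML with all optional whitespace removed.
const asXml = [
  '<student>',
  '<name>Joanne</name>',
  '<id>123</id>',
  '<active>true</active>',
  '<courses>',
  '<course>HRT100</course>',
  '<course>HRT200</course>',
  '<course>HRT300</course>',
  '</courses>',
  '</student>',
].join('');

console.log('JSON characters:', asJson.length);
console.log('XML characters:', asXml.length);
console.log(asXml.length > asJson.length); // true
```

Most of the extra characters come from repeating every tag name in both an opening and a closing tag.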

# YAML

YAML - YAML Ain't Markup Language is another simple text format for data. See the yaml website. It is primarily used as a way to save settings for other applications. It uses line breaks and indentation with spaces (tab characters are not allowed for indentation) to indicate grouping and structure in the file instead of {}, [], or < >.
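For comparison with the JSON and XML examples, here is the same student record written as YAML and held in a JavaScript template literal. JavaScript has no built-in YAML parser, so this sketch does not parse the data; it only measures the indentation of each line to show how YAML expresses structure. The `indentOf` helper is a hypothetical name.

```javascript
// The same student record written as YAML.
const yamlText = `name: Joanne
id: 123
active: true
courses:
  - HRT100
  - HRT200
  - HRT300`;

// Count the leading spaces on a line. YAML uses this indentation,
// rather than brackets or tags, to show which lines belong together.
function indentOf(line) {
  return line.length - line.trimStart().length;
}

for (const line of yamlText.split('\n')) {
  console.log(indentOf(line), line.trim());
}
// The three course lines report an indent of 2; every other line reports 0.
```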

Here is a quick 3 minute video about YAML file format.

All three of these formats have native support across many, many programming languages.

# Other Formats

csv stands for Comma Separated Values. These are files that save data in columns and rows just like a spreadsheet. Each line break marks the end of a row. The values in a row are separated by commas to create the columns.

tsv stands for Tab Separated values. These files are very similar to the CSV files in the way that the information is saved in rows and columns. The primary difference between csv and tsv is that the tab separated value files use the tab character instead of a comma to create the columns.
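Because csv and tsv differ only in the separator character, one small function can read both. This is a minimal sketch with a hypothetical `parseDelimited` helper and made-up sample rows. It assumes there are no quoted fields, so a value may not contain the delimiter or a line break; real csv files often need a proper parsing library.

```javascript
// Split delimited text into an array of rows, each an array of column values.
// Assumes no quoted fields. Every value comes back as a string.
function parseDelimited(text, delimiter) {
  return text
    .split(/\r?\n/)                 // one row per line
    .filter((row) => row !== '')    // ignore blank lines, e.g. a trailing newline
    .map((row) => row.split(delimiter));
}

// Made-up sample data: a header row followed by two records.
const csv = 'name,id,active\nJoanne,123,true\nSam,456,false\n';
const tsv = 'name\tid\tactive\nJoanne\t123\ttrue\nSam\t456\tfalse\n';

const csvRows = parseDelimited(csv, ',');
const tsvRows = parseDelimited(tsv, '\t');

console.log(csvRows.length);            // 3
console.log(csvRows[1].join(' | '));    // Joanne | 123 | true
console.log(tsvRows[2][0]);             // Sam
```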

fixed-width files are closely related to tsv files. They have rows of entries, and each entry (column) has a fixed number of characters allowed. Space characters are used to pad the data to fill the available width. Visually, they look very much like tsv files.
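The padding idea can be sketched with the built-in string methods `padEnd()` and `slice()`. The column widths and the helper names `formatRow` and `parseRow` are assumptions chosen for this example; a real fixed-width file defines its own layout.

```javascript
// Column widths for this example: name, id, active.
const WIDTHS = [10, 5, 6];

// Build one fixed-width row, padding each value with spaces.
// Values longer than their column are cut off to keep the row aligned.
function formatRow(values) {
  return values
    .map((value, i) => String(value).padEnd(WIDTHS[i]).slice(0, WIDTHS[i]))
    .join('');
}

// Read a row back by slicing at the known column positions.
function parseRow(line) {
  const values = [];
  let start = 0;
  for (const width of WIDTHS) {
    values.push(line.slice(start, start + width).trimEnd());
    start += width;
  }
  return values;
}

const row = formatRow(['Joanne', 123, true]);
console.log(JSON.stringify(row));       // "Joanne    123  true  "
console.log(row.length);                // 21
console.log(parseRow(row).join(' | ')); // Joanne | 123 | true
```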

txt text files are the most basic version of the data files. It is up to the developer to decide how they want to arrange the data inside these files. This type and the others listed so far are often called flat-file databases.

html HyperText Markup Language files are very familiar to web developers. While you may not think of HTML as a data format, these files do contain all the text information for a webpage. The attributes in the HTML elements can be used to store extra non-visual information. This approach is so common that data-* attributes were officially added to the HTML specification. Any attribute whose name starts with data- can be read and written from JS through the element's dataset property.
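The attribute name in the HTML and the property name in JS are spelled differently: the `data-` prefix is dropped and the remaining dashes become camelCase. The sketch below imitates that naming rule with a hypothetical `toDatasetKey` helper so it can run without a browser; the `data-student-id` attribute is made up for illustration.

```javascript
// Convert an attribute name such as "data-student-id" into the matching
// dataset key ("studentId"): drop the "data-" prefix, then turn each
// dash followed by a lowercase letter into the uppercase letter.
function toDatasetKey(attributeName) {
  if (!attributeName.startsWith('data-')) {
    throw new Error(`Not a data-* attribute: ${attributeName}`);
  }
  return attributeName
    .slice('data-'.length)
    .replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
}

console.log(toDatasetKey('data-id'));         // id
console.log(toDatasetKey('data-student-id')); // studentId

// In a browser, for <li data-student-id="123">Joanne</li>:
//   const li = document.querySelector('li');
//   li.dataset.studentId;        // "123" (values are always strings)
//   li.dataset.active = 'true';  // adds data-active="true" to the element
```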

# What to do this week

TODO

Things to do before next week.

  • Read all the content from Modules 1.1, 1.2, and 2.1.
  • Prepare questions to ask in class.
Last Updated: 1/7/2022, 4:36:48 PM