1.1 Data File Formats
# Introduction
Welcome to MAD9022. This course has changed many times over the years to become its current incarnation. With the current version, our topics will include:
- Ways to organize your code for modern web applications
- Intermediate level JavaScript
- More HTML5 APIs
- Approaches to data storage and caching
- How to build installable, offline capable Progressive Web Apps
- How to create cross-platform Mobile Apps using the Dart programming language and the Flutter framework.
# Data File Formats
Data can be stored in many different types of text file formats: json, xml, csv, tsv, yaml, txt, and even html. These are all text files, meaning that you can open them with any text editor and make direct changes to the text data inside them.
Moving beyond just text files, there are lots of ways to store and save data that require some other specialized software. Databases are a common example of this. Databases save your information in binary files that are organized and compressed in efficient ways. Each brand of database (MySQL, SQL Server, IndexedDB, MongoDB, etc.) has its own file format and approach to saving the information. They all save their information in files, but not files that you can open and read in a text editor.
As web developers we need to be comfortable working with data that is coming from a variety of formats. For client-side web development we need to be able to use JavaScript to work with json, xml, and html text files. yaml files are also commonly used for web development. On the client side the only database that is embedded in the browser is IndexedDB. Other databases are typically accessed from the web server. We will use HTTP Requests to upload and download data from the server. A server-side script is used to access the database and complete the CRUD (Create, Read, Update, Delete) operations.
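The request side of that flow can be sketched in JavaScript. This is only an illustration: the URL is a placeholder, the function names are hypothetical, and the pairing of CRUD operations with HTTP methods shown here is one common convention rather than something a particular server is guaranteed to follow.

```javascript
// Hypothetical endpoint; replace it with the URL of your own server-side script.
const API_URL = 'https://example.com/api/students';

// One common convention for matching CRUD operations to HTTP methods (an assumption).
const METHODS = { create: 'POST', read: 'GET', update: 'PUT', delete: 'DELETE' };

// Build the options object that fetch() expects for a given CRUD operation.
function buildRequest(operation, data) {
  const method = METHODS[operation];
  if (!method) {
    throw new Error(`Unknown operation: ${operation}`);
  }
  const options = { method, headers: { Accept: 'application/json' } };
  // GET requests do not carry a body; other requests send the data as JSON text.
  if (data !== undefined && method !== 'GET') {
    options.headers['Content-Type'] = 'application/json';
    options.body = JSON.stringify(data);
  }
  return options;
}

// Send the request and parse the JSON that the server-side script returns.
async function sendRequest(operation, data) {
  const response = await fetch(API_URL, buildRequest(operation, data));
  if (!response.ok) {
    throw new Error(`HTTP error ${response.status}`);
  }
  return response.json();
}
```

Calling `sendRequest('create', { name: 'Joanne' })` would upload the data as JSON, leaving the database work itself to the script on the server.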
There are some solutions, like Firebase Firestore, which allow us to talk to a server-side database from our client-side JavaScript. However, we are still really just using JavaScript to make HTTP Requests to a script running on the server that accesses the Firestore database and runs the CRUD operations.
# JSON
JSON (JavaScript Object Notation) is the most popular format for sending data between clients and servers. It is called JSON because it uses a JavaScript-compatible syntax for encoding the information inside the file.
However, it is NOT JavaScript. It is just a text file with a single long string. For this exact reason, we cannot save things like functions or DOM elements inside of JSON files. We can only save String, Number, Boolean, and null (Primitive values) plus Array literals and Object literals.
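A minimal sketch of that restriction, using the built-in `JSON.stringify()` and `JSON.parse()` methods. The `student` object, including its `advisor` property and `greet()` method, is a made-up example for illustration.

```javascript
// A hypothetical object holding every kind of value JSON allows, plus one function.
const student = {
  name: 'Joanne',
  id: 123,
  active: true,
  advisor: null,
  courses: ['HRT100', 'HRT200'],
  greet() {
    return `Hi, ${this.name}`;
  },
};

// JSON.stringify() skips any property whose value is a function.
const jsonText = JSON.stringify(student);

// JSON.parse() turns the string back into a plain object.
const copy = JSON.parse(jsonText);

console.log(typeof jsonText); // the JSON is a single string
console.log('greet' in copy); // false: the function was not saved
```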
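The JSON format is commonly used with localStorage and sessionStorage to hold data in the browser. Both of them store only strings, so objects and arrays are converted to JSON text before they are saved. More on this next week.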
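Web Storage keeps its values as strings, so a common pattern is to convert data to JSON on the way in and parse it on the way out. Here is a small sketch of that pattern; `saveData` and `loadData` are hypothetical helper names, and the `storage` parameter is assumed to be `localStorage`, `sessionStorage`, or anything else with the same `setItem()` and `getItem()` methods.

```javascript
// Convert the value to JSON text and save it under the given key.
function saveData(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

// Read the JSON text back and parse it. getItem() returns null for a missing key.
function loadData(storage, key) {
  const text = storage.getItem(key);
  return text === null ? null : JSON.parse(text);
}

// In a browser you could call:
//   saveData(localStorage, 'student', { name: 'Joanne', id: 123 });
//   const student = loadData(localStorage, 'student');
```

Passing the storage object in as a parameter means the same two functions work for both localStorage and sessionStorage.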
The primary differences between a JS object literal and JSON are that, in JSON:
- All object keys must be wrapped in double quotes.
- All string values must be wrapped in double quotes.
- No trailing commas are allowed after array or object values.
- No comments are allowed in the file.
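These rules are enforced by `JSON.parse()`, which throws an error when the text breaks any of them. A quick way to see that is a small checking function; `isValidJSON` is a hypothetical helper name, not a built-in.

```javascript
// Returns true when the text can be parsed as JSON, false otherwise.
function isValidJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidJSON('{"name":"Joanne","id":123}')); // true
console.log(isValidJSON("{name:'Joanne'}")); // false: unquoted key, single quotes
console.log(isValidJSON('{"id":123,}')); // false: trailing comma
console.log(isValidJSON('{"id":123} // comment')); // false: comments are not allowed
```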
Here is a JavaScript Object:
let obj = {
  name: 'Joanne',
  id: 123,
  active: true,
  courses: ['HRT100', 'HRT200', 'HRT300']
};
and here is the same information as JSON:
{"name":"Joanne", "id":123, "active":true, "courses":["HRT100", "HRT200", "HRT300"]}
Notice the double quotes around every key and every string value. There are no quotes around the number or boolean values.
# XML
XML (eXtensible Markup Language), created in 1998, was the first file format that was used for client-side web development for the transfer of data between clients and servers. As the name suggests, it is a markup language. Angle brackets < > are used to wrap the tag names, which are used to label and describe the information in the file.
The most important rule for writing XML files is that they must be Human Readable.
This one rule meant that XML rapidly became a very popular format with the thousands of new developers who started working in web development in the late 90s and early 2000s. The format was adopted by nearly all major software providers and is still widely used today.
An example of the widespread support for XML was the Microsoft adoption of it as the basis for all their MS Office files in Office 2007. With this release file formats changed from .doc to .docx and .xls to .xlsx and so on. The name change reflected that XML had become a core part of the file format. A .docx file is really a zipped package of XML files that describe the document's content, styles, and settings. New features for MS Word have since been added through the XML inside that package.
JSON overtook XML as the most popular web development format during the last decade because it was Developer Readable and because the file size was noticeably smaller than XML.
Here is the same data as above, as an XML file.
<?xml version="1.0" encoding="utf-8"?>
<student xmlns="https://com.algonquincollege/student">
  <name>Joanne</name>
  <id>123</id>
  <active>true</active>
  <courses>
    <course>HRT100</course>
    <course>HRT200</course>
    <course>HRT300</course>
  </courses>
</student>
You can see how much more typing is required to output that small amount of information.
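To put a rough number on the difference, the two versions of the student data can be held as strings and their lengths compared. This is only a sketch: both strings are written without extra whitespace, and the XML declaration and namespace are left out, so the XML here is the smallest it could reasonably be.

```javascript
// The student data as compact JSON text.
const jsonText =
  '{"name":"Joanne","id":123,"active":true,"courses":["HRT100","HRT200","HRT300"]}';

// The same data as compact XML text (no declaration, namespace, or indentation).
const xmlText =
  '<student><name>Joanne</name><id>123</id><active>true</active>' +
  '<courses><course>HRT100</course><course>HRT200</course>' +
  '<course>HRT300</course></courses></student>';

// Every XML value needs an opening and a closing tag, so the text is longer.
console.log(`JSON: ${jsonText.length} characters`);
console.log(`XML: ${xmlText.length} characters`);
```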
# YAML
YAML (YAML Ain't Markup Language) is another simple text format for data. See the yaml website for details. It is primarily used as a way to save settings for other applications. It uses line breaks and indentation with spaces (tabs are not allowed for indentation) to indicate grouping and structure in the file instead of {}, [], or < >.
All three of these formats (JSON, XML, and YAML) are supported, either natively or through libraries, across many, many programming languages.
# Other Formats
csv stands for Comma Separated Values. These are files that save data in columns and rows just like a spreadsheet. Each end of line (carriage return) marks the end of a row. Each of the values in a row is separated by a comma, to create the columns.
tsv stands for Tab Separated Values. These files are very similar to the csv files in the way that the information is saved in rows and columns. The primary difference between csv and tsv is that the tab separated value files use the tab character instead of a comma to create the columns.
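Because csv and tsv differ only in the separator character, one small function can read both. The sketch below assumes the first row holds the column names and that no value contains the separator, a quote, or a line break; real CSV files can include quoted values, which this does not handle. The function name and the sample rows are made up for illustration.

```javascript
// Split delimited text into an array of objects, one per data row.
function parseDelimited(text, delimiter = ',') {
  // Each end of line is the end of a row; skip any empty lines.
  const lines = text.split(/\r?\n/).filter((line) => line !== '');
  // Assumption: the first row contains the column names.
  const headers = lines[0].split(delimiter);
  return lines.slice(1).map((line) => {
    const values = line.split(delimiter);
    const row = {};
    headers.forEach((header, i) => {
      row[header] = values[i];
    });
    return row;
  });
}

// Hypothetical sample data.
const csv = 'name,id,active\nJoanne,123,true\nSam,456,false';
const tsv = 'name\tid\tactive\nJoanne\t123\ttrue';

console.log(parseDelimited(csv)); // comma is the default separator
console.log(parseDelimited(tsv, '\t')); // pass a tab for tsv data
```

Note that every value comes back as a string, so numbers and booleans need to be converted afterwards.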
fixed-width files are closely related to the tsv files. They have rows of entries. Each entry (column) has a fixed number of characters allowed. Space characters are used to pad the data within the available spaces. Visually, they look very much like the tsv files.
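As a sketch of how the padding works, the code below writes and reads one fixed-width row using the built-in string methods `padEnd()`, `slice()`, and `trimEnd()`. The column layout and function names are hypothetical, chosen only for this example.

```javascript
// Hypothetical layout: each column has a name and a fixed number of characters.
const COLUMNS = [
  { name: 'name', width: 10 },
  { name: 'id', width: 5 },
  { name: 'course', width: 6 },
];

// Pad each value with spaces (or cut it off) so it fills its column exactly.
function formatRow(record) {
  return COLUMNS.map(({ name, width }) =>
    String(record[name]).padEnd(width).slice(0, width)
  ).join('');
}

// Cut the line at the column boundaries and remove the padding spaces.
function parseRow(line) {
  const record = {};
  let start = 0;
  for (const { name, width } of COLUMNS) {
    record[name] = line.slice(start, start + width).trimEnd();
    start += width;
  }
  return record;
}

const line = formatRow({ name: 'Joanne', id: 123, course: 'HRT100' });
console.log(JSON.stringify(line)); // shows the padding spaces inside the quotes
console.log(parseRow(line));
```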
txt text files are the most basic version of the data files. It is up to the developer to decide how they want to arrange the data inside these files. This type and the others listed so far are often called flat-file databases.
html (HyperText Markup Language) files are very familiar to web developers. While you may not think of them as a data format, they do contain all the text information for a webpage. The attributes in the HTML elements can be used to store extra non-visual information. This approach is actually so common that dataset properties were officially added to the HTML specification. Any attribute name that starts with data- is one of the dataset properties, and its value can be read and written from JS through the element's dataset property.
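Here is a small sketch of reading and writing those values through an element's `dataset` property. The function names and the `data-student-id` and `data-active` attributes are hypothetical; the `element` parameter is assumed to be a DOM element such as `<li data-student-id="123" data-active="true">Joanne</li>`.

```javascript
// Read the data- attributes from an element.
// An attribute like data-student-id becomes the camelCase property dataset.studentId.
function readStudentData(element) {
  return {
    // dataset values are always strings, so convert them as needed.
    id: Number(element.dataset.studentId),
    active: element.dataset.active === 'true',
  };
}

// Writing to dataset updates the matching data- attribute on a real element.
function markInactive(element) {
  element.dataset.active = 'false';
}

// In a browser you could call:
//   const li = document.querySelector('li[data-student-id]');
//   console.log(readStudentData(li));
//   markInactive(li);
```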
# What to do this week
TODO
Things to do before next week.
- Read all the content from Modules 1.1, 1.2, and 2.1.
- Prepare questions to ask in class.