50.043 - SQL¶
Learning Outcomes¶
By the end of this unit you should be able to use SQL to
- create and alter databases and tables
- create and alter table constraints
- inject data into tables
- retrieve data from tables
- update/delete data from tables
SQL¶
SQL is a high-level language for data definition and data manipulation. By high-level, we mean it is * expressive * closer to the programmers
SQL is a declarative programming language. By declarative, we use SQL to specify what to do but not how to do. (The "how-to" parts are left to the underlying runtime to decide, i.e. the DBMS query operation module).
SQL is almost universal and cross platforms. Modern big data and non-relation databases extend to support SQL.
Note that different DBMSes have different subset of SQL statements. In this unit, we try to cover the common ones.
Data definition language¶
Let's consider the DDL of SQL.
Create and drop database¶
Note that in a DBMS, there may be many different databases, identified by their names.
To create a database, we may use the following SQL statement.
where db_name is the name of the database; if not exists means the create statement only applies when there is no existing database in the DBMS with name db_name. 
Note that SQL is case-insensitive.
To drop a database
To rename a database
However in some DBMS, e.g. MySQL, the above statement is rejected. Instead, we need to dump the old database and load it into the new database.
Create table¶
In the above statement, we create the article table in the database my_db. article has two attributes, id and name. Both attributes are of type char(100), i.e. character sequence with max length = 100. id is set to be primary key of the table. 
Similarly, we create the book, publisher and publish tables.
Finally, we create the table publish which was translated from a tertiary relationship.
article_id, book_id and publisher_id to ensure their existence in the entity table article, book and publisher. 
Alter table¶
We can alter a table (e.g. name, attribute name, attribute type, constraints) using the alter statement. For instance,
The first alter statement drop the primary key, and the second one recreate a new primary using (article_id, publisher_id) composition. 
Some DBMS Implementation Specific Details¶
Note that in some DBMS implementation, we need to drop the foreign key constraints before we can drop the primary key. For instance MySQL, the foreign key constraint automatically create an index on the attribute, in this case article_id and book_id. It uses the existing index from the existing primary key constraint. 
In case of MySQL, we need to first find out the name of the foreign key constraints
Then execute the following statements before 
the drop primary key statement.
Then execute the following statements after the add primary key statement
For more detals of alter table statement for MySQL, refer to the MySQL documentation.
Drop table¶
To drop a table, we could run
Injecting value¶
To inject values, we use the insert statement.
Note that we can omit the schema (id, name) when values for all columsn are present. Furthermore, when inserting values into a table with foreign key constraint, e.g. publish references article, book and publisher, the values to be inserted must respect the existence of the referenced keys from the referenced tables.
Mass importing and exporting¶
In some situation, it is inefficient to inject values one by one via insert statement. Many DBMS implementations offer means to import and export data from text file or other format. For MySQL, please refer to this document and this document.
Querying table¶
To retrieve data stored in a table, we use the select statement.
publish table. In this case, since we are retrieving all columns, we could re-write the above as follows,
Export to CSV¶
In some implemntation, such as MySQL, we could use the select statement to export the data in a table into a CSV file.
Join-Query¶
When querying multiple table, we would use the inner join.
For breivity, we could give aliases to the tables being joined.
The above queries produceThe left- and right- outter join queries can be expressed in a similar way by replacing inner by left or right.
Where clause¶
Suppose we would like to find all the article names that are published by publisher p1.
Note that instead of inner join, we can rewrite the above query using equi-join which is pushing the id matching to the filtering operation.
This is because the following equation holds in relational algebra
Self join¶
Suppose we want to find all articles that are published by both publisher p1 and p2.
The following query
where clause is applied to per tuple level. Since there is no tuple having publisher_id as p1 and p2 at the same time, the result is an empty set.
In such situation, we need to join a table to itself.
publish table twice, then the join are performed among the two clones and the article table.
Nested query¶
Alternatively to self-join, we could express the above query using nested query.
In the above, we find a nested query, the outer query joins publish with article and filters out those tuples with publisher_id as p1. The last predicate checks the article id must be found in the result of the nested query. The nested query joins the clones of the two tables and filter tuples with publisher_id equal to p2. 
Aggregation¶
For analytic purpose, we need to aggregate values by group.
For example, the following statement counts the number of tuples in the publish table.
Suppose we would like to counts the number of published article published by publisher p1.
In the above the group by clause specifies the attribute publisher_id is the attribute the groups created by, i.e. tuples within each group should have the same publisher_id. 
The above SQL statement is equivalent to the following relational algebra express
Sorting¶
Suppose we want to sort the result of the last query by the counts in ascending order.
In the above the as cnt creates an alias for the column count(*) for the ease of references. The order by clause specifies the order of the returned results.
Update and delete¶
To update tuples/records in table, we use the update statement.
The above SQL statement updates all tuples's name topublisher one in publisher table with the existing name as publisher 1.
To delete tuples/records, we use the delete statement.