50.043 - SQL¶
Learning Outcomes¶
By the end of this unit you should be able to use SQL to
- create and alter databases and tables
- create and alter table constraints
- inject data into tables
- retrieve data from tables
- update/delete data from tables
SQL¶
SQL is a high-level language for data definition and data manipulation. By high-level, we mean it is * expressive * closer to the programmers
SQL is a declarative programming language. By declarative, we use SQL to specify what to do but not how to do. (The "how-to" parts are left to the underlying runtime to decide, i.e. the DBMS query operation module).
SQL is almost universal and cross platforms. Modern big data and non-relation databases extend to support SQL.
Note that different DBMSes have different subset of SQL statements. In this unit, we try to cover the common ones.
Data definition language¶
Let's consider the DDL of SQL.
Create and drop database¶
Note that in a DBMS, there may be many different databases, identified by their names.
To create a database, we may use the following SQL statement.
where db_name
is the name of the database; if not exists
means the create statement only applies when there is no existing database in the DBMS with name db_name
.
Note that SQL is case-insensitive.
To drop a database
To rename a database
However in some DBMS, e.g. MySQL, the above statement is rejected. Instead, we need to dump the old database and load it into the new database.
Create table¶
In the above statement, we create the article
table in the database my_db
. article
has two attributes, id
and name
. Both attributes are of type char(100)
, i.e. character sequence with max length = 100. id
is set to be primary key of the table.
Similarly, we create the book
, publisher
and publish
tables.
Finally, we create the table publish
which was translated from a tertiary relationship.
article_id
, book_id
and publisher_id
to ensure their existence in the entity table article
, book
and publisher
.
Alter table¶
We can alter a table (e.g. name, attribute name, attribute type, constraints) using the alter statement. For instance,
The first alter statement drop the primary key, and the second one recreate a new primary using (article_id, publisher_id)
composition.
Some DBMS Implementation Specific Details¶
Note that in some DBMS implementation, we need to drop the foreign key constraints before we can drop the primary key. For instance MySQL, the foreign key constraint automatically create an index on the attribute, in this case article_id
and book_id
. It uses the existing index from the existing primary key constraint.
In case of MySQL, we need to first find out the name of the foreign key constraints
Then execute the following statements before
the drop primary key
statement.
Then execute the following statements after the add primary key
statement
For more detals of alter table statement for MySQL, refer to the MySQL documentation.
Drop table¶
To drop a table, we could run
Injecting value¶
To inject values, we use the insert statement.
Note that we can omit the schema (id, name)
when values for all columsn are present. Furthermore, when inserting values into a table with foreign key constraint, e.g. publish
references article
, book
and publisher
, the values to be inserted must respect the existence of the referenced keys from the referenced tables.
Mass importing and exporting¶
In some situation, it is inefficient to inject values one by one via insert statement. Many DBMS implementations offer means to import and export data from text file or other format. For MySQL, please refer to this document and this document.
Querying table¶
To retrieve data stored in a table, we use the select statement.
publish
table. In this case, since we are retrieving all columns, we could re-write the above as follows,
Export to CSV¶
In some implemntation, such as MySQL, we could use the select statement to export the data in a table into a CSV file.
Join-Query¶
When querying multiple table, we would use the inner join.
For breivity, we could give aliases to the tables being joined.
The above queries produceThe left- and right- outter join queries can be expressed in a similar way by replacing inner
by left
or right
.
Where clause¶
Suppose we would like to find all the article names that are published by publisher p1
.
Note that instead of inner join
, we can rewrite the above query using equi-join
which is pushing the id matching to the filtering operation.
This is because the following equation holds in relational algebra
Self join¶
Suppose we want to find all articles that are published by both publisher p1
and p2
.
The following query
where
clause is applied to per tuple level. Since there is no tuple having publisher_id
as p1
and p2
at the same time, the result is an empty set.
In such situation, we need to join a table to itself.
publish
table twice, then the join are performed among the two clones and the article table.
Nested query¶
Alternatively to self-join, we could express the above query using nested query.
In the above, we find a nested query, the outer query joins publish
with article
and filters out those tuples with publisher_id
as p1
. The last predicate checks the article id must be found in the result of the nested query. The nested query joins the clones of the two tables and filter tuples with publisher_id
equal to p2
.
Aggregation¶
For analytic purpose, we need to aggregate values by group.
For example, the following statement counts the number of tuples in the publish
table.
Suppose we would like to counts the number of published article published by publisher p1
.
In the above the group by
clause specifies the attribute publisher_id
is the attribute the groups created by, i.e. tuples within each group should have the same publisher_id
.
The above SQL statement is equivalent to the following relational algebra express
Sorting¶
Suppose we want to sort the result of the last query by the counts in ascending order.
In the above the as cnt
creates an alias for the column count(*)
for the ease of references. The order by
clause specifies the order of the returned results.
Update and delete¶
To update tuples/records in table, we use the update statement.
The above SQL statement updates all tuples's name topublisher one
in publisher
table with the existing name as publisher 1
.
To delete tuples/records, we use the delete statement.