There are several statistical functions available. Some of the simplest are max(), min(), avg(), and count(), for the maximum value, minimum value, average value, and number of values in a set of records.
For example, if we want to find out the most recent purchase date, we can use:
select max(purchasedate) from albums
On March 31, 2005, we purchased something. Now, what is the average year that our albums came out?
select avg(year) from albums
The average year is right about 1847. Even if we’re really into the oldies that seems awfully odd. Let’s take a look at maximum and minimum years:
select max(year), min(year) from albums
The most recent year was 2004, but the earliest is the year 0!
What happened is that, for some of our albums, we didn’t know what the year was, and when we imported the records, somewhere in the import the data went from empty to zero. Display those albums with:
select album, artist, year from albums where year=0
No one came out with any albums in the year zero. We don’t want our unknown years to affect our minimums, our averages, or even our counts. MySQL, as it turns out, has a special value that stands for nothing. This is the NULL value. We should replace zero with NULL in this case so that albums without years do not affect year calculations.
update albums set year=NULL where year=0
select max(year), min(year), avg(year) from albums
Now, our earliest known year is 1949, and the average year that our albums came out was sometime in 1979.
Let’s take a look at grouping data. For example, we know how to find out the average year of all of the albums we purchased, but what about the same information for a single artist?
We can select out only the data we’re interested in by using a where clause in our select statement:
select min(year), max(year), avg(year) from albums where artist="Alice Cooper"
But what if we want to see all artists in our list? That’s what the group by clause is for. We can group our data by a field or set of fields. In this case, let’s group by artist.
select artist, min(year), max(year), count(album) from albums group by artist
Our Abba albums came out between 1974 and 1980. Our Alice Cooper albums came out between 1971 and 2003, and there were 22 of them.
Who do we have the most albums of? We can give any function a name and than sort by that name.
select artist, count(album) as albumcount from albums group by artist order by albumcount desc, artist
Alice Cooper and Elton John top the list at 22 and 21 albums, respectively. That “desc” means descending. By default, sorting goes from least to greatest. When we specify desc it will go from greatest to least. In this case, it puts the artists with the most albums on top of the list.
But what are those albums? When we group data, we only see one entry for each group. One way of seeing all of the data in one field is to use the “group_concat()” function.
select artist, count(album) as albumcount, group_concat(album) as album_list from albums group by artist order by albumcount desc, artist
Sometimes you’ll want to replace text inside of other text. You’ll use the “update” statement for this as well, using the body() function. For example, if you want to replace all occurrences of “Beetles” in your album names with “Beatles”, whether the word occurs on its own or inside some title such as “The Beetles at Carnegie Hall”, you would do this:
update albums set album=replace(album, "Beetles", "Beatles")
Since there is no where portion of the statement, this will affect all records in the albums table.