Primer¶
You’ll need an account to mysql if you don’t have one. I think Bill, Jeff or Molly can give one to you.
Once you have that, you can type
mysql -u <your usr name> -h wolphin.fsl.noaa.gov -p. You’ll then be asked for your password, which was set up when you got a mysql account.If that succeeds, you can make subsequent connections to mysql by setting up a
.my.conffile in your home directory. (If you’re in the amb-verif role account, there’s already such a file in the amb-verif home directory)----------------------- .my.cnf -------------------- [mysql] user=<your username> password=<your password> host=wolphin.fsl.noaa.gov skip-auto-rehash [mysqld_safe] timezone = '+0:00' ---------------------------------------------------------Once this file exists, you simply need to type
mysqlat the prompt to login.Here are some useful commands. All commands end with a semicolon.
show databases(lists all the databases)use <database>(e.g. ‘use ruc_ua’)show tables(will show all the tables in your selected database--probably a huge number)show tables like “%RAP%”;(to show all tables with “RAP” in the name. “%” matches any number of characters; “_” matches a single character)describe <table>shows the columns in the table.select * from <table>shows all the data in the tableselect * from <table> where <column> = <value> [and <another_column> = <another value> …pulls selected data out of a table. For the full instructions, see https://dev .mysql .com /doc /refman /5 .7 /en /select .html
To see if a query is going to take a long time, you can preface the query with ‘describe’:
describe <query>, where<query>could be any one of the select statements shown above.For instance:
mysql> describe select * from RAP_OPS_130_fract_bad_10 where hour = 0 order by fcst_len,date;Yields:
+----+-------------+--------------------------+------+---------------+------+---------+------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------------------+------+---------------+------+---------+------+------+-----------------------------+ | 1 | SIMPLE | RAP_OPS_130_fract_bad_10 | ALL | NULL | NULL | NULL | NULL | 112 | Using where; Using filesort | +----+-------------+--------------------------+------+---------------+------+---------+------+------+-----------------------------+ 1 row in set (0.00 sec)The number of rows considered column (“rows”) tells us that this is likely to be a very fast query. If the number of very large (over a million), or if multiple tables are listed in the describe output and they all have a large number of rows (the total number of rows considered is the product of all the rows listed for multiple tables), then your query will take a long time, but likely can be optimized to take a reasonable time. (Using multiple tables in a query isn’t discussed here specifically, but you may have a query you’ve borrowed that you’d like to edit. ‘Describe’ will be helpful to tell you if you’ve accidentally turned a reasonably fast query into a slow one.)
To pull data from a query into a text file If you’d like to get the your data into a text file for further processing, create a file like getit.sql:
----------------- getit.sql ----------------- use ruc_ua; select * from RAP_OPS_130_fract_bad_10 where hour = 0 order by fcst_len,date ; -----------------------------------------------Then do this “mysql < getit.sql >! output.txt
After this, ‘output.txt’ will have a tab-delimited copy of the output that can be pasted into a spreadsheet or otherwise processed.
Examples¶
Here are some examples, explained.
mysql> describe RAP_OPS_130_fract_bad_10;
+-----------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------+------+-----+---------+-------+
| hour | tinyint(4) | NO | | 0 | |
| date | date | YES | | NULL | |
| week | int(2) | YES | | NULL | |
| fcst_len | tinyint(4) | NO | | 0 | |
| avg_ob_ws | decimal(9,0) | YES | | NULL | |
| N | bigint(21) | NO | | 0 | |
| N_10 | decimal(23,0) | YES | | NULL | |
| fract | decimal(27,4) | YES | | NULL | |
+-----------+---------------+------+-----+---------+-------+...shows how the table RAP_OPS_130_fract_bad_10 is constructed.
mysql> select * from RAP_OPS_130_fract_bad_10 limit 10;
+------+------------+------+----------+-----------+------+------+--------+
| hour | date | week | fcst_len | avg_ob_ws | N | N_10 | fract |
+------+------------+------+----------+-----------+------+------+--------+
| 0 | 2020-01-04 | 0 | 3 | 44 | 3515 | 146 | 0.0415 |
| 0 | 2020-01-11 | 1 | 3 | 38 | 6391 | 152 | 0.0238 |
| 0 | 2020-01-18 | 2 | 3 | 39 | 6365 | 128 | 0.0201 |
| 0 | 2020-01-25 | 3 | 3 | 32 | 6392 | 122 | 0.0191 |
| 0 | 2020-02-01 | 4 | 3 | 32 | 6308 | 131 | 0.0208 |
| 0 | 2020-02-08 | 5 | 3 | 43 | 5304 | 152 | 0.0287 |
| 0 | 2020-02-15 | 6 | 3 | 41 | 6362 | 101 | 0.0159 |
| 0 | 2020-02-22 | 7 | 3 | 40 | 6430 | 50 | 0.0078 |
| 0 | 2020-02-29 | 8 | 3 | 38 | 6617 | 122 | 0.0184 |
| 0 | 2020-03-07 | 9 | 3 | 39 | 6645 | 166 | 0.0250 |
+------+------------+------+----------+-----------+------+------+--------+
10 rows in set (0.00 sec)...gives a glimpse of what’s in the table.
mysql> select * from RAP_OPS_130_fract_bad_10 where hour = 0 order by fcst_len,date;… pulls data out of the table only where ‘hour =0’, and orders the output by fcst_len (first) and then by date.
mysql> select hour,max(date) as date
,week(date) as week
,fcst_len
,round(avg(o_ws)/100,0) as avg_ob_ws
,count(*) as N
,sum(if(dw > 1000,1,0)) as N_10
, sum(if(dw > 1000,1,0)) /count(*) as fract
from RAP_OPS_130_dw
group by fcst_len,hour,week
order by fcst_len,hour,week
;… groups data together by week (“group by fcst_len,hour,week”). That is, for a given forecast_len, and (valid) hour, all data for a given week (mysql has a function called ‘week’ that finds the week of the year for a given date: “week(date)”) will be grouped together.
When grouping like this is done, ‘group’ functions such as ‘sum’, ‘avg’, ‘count’ can be used.
This also uses the if function: if(<condition>,<return this value if true>,<return this value if false>) to find the number of observations (during a week) when dw was > 1000, and we call that column “N_10”.
...more examples later...