What is a DB?
A database is a rectangular storage system, columns and rows of data, just like a spreadsheet. Unlike a spreadsheet, the way you read and write data in a DB is through code or the command line, not through a point-and-click interface like Google Sheets or Excel.
I’ll mostly be using “database” to mean a relational database management system, like PostgreSQL or SQLite, but there are other paradigms.
The code you use to interact with the DB is called “SQL” (Structured Query Language), but you don’t have to know SQL. In the code examples I’ll show, there will be a “vanilla SQL” version and a “tidy R” version.
Where is a DB?
A database consists of a server and one or more clients. “Server” and “clients” here just mean computer applications; the server can be a webserver living out in the cloud, or it can be your own laptop. The client can be a command line terminal, an R session, a SAS session, a webpage running JavaScript, etc.
I’ll demonstrate later on how to use a local DB, where the server and the client are both on your computer. Then I’ll discuss the PSA On-Farm database, which is hosted on Azure and you can access from anywhere with an internet connection.
What’s inside?
Inside a database there are one or more tables, which are analogous to individual tabs in a regular spreadsheet. What makes these tables different is that:
- Each column only contains one type of data (numbers, text, dates)
- The type of each column must be decided before you put any data in it
- There is nothing outside the “rectangle” (no plots, no notes in the margins, no comment boxes floating over top)
There are additional benefits to using a structured relational database:
- Each table has a “key” that connects an observation between tables
- Keys can map one row to another row, or many rows to one row, or vice versa
- Planning a structure (schema) helps you decide how to organize your data
Are there downsides to using a database?
- You have to learn how to use them, by reading this tutorial
Example
Let’s look at a toy database with two tables:
plot_id crop tillage
1 101 corn no-till
2 102 corn strip-till
3 103 soy strip-till
4 104 soy no-till
5 201 corn strip-till
6 202 soy no-till
7 203 corn no-till
8 204 soy strip-till
plot_id Mg_ha
1 101 10.0
2 102 12.0
3 103 4.5
4 104 4.5
5 201 11.0
6 202 3.0
7 203 9.5
8 204 5.0
If you were making a spreadsheet of yields, you would probably want all of this in one table, so you could associate each observation with the treatments. But I’m sure you’ve noticed that copy-pasting columns of treatments inevitably ends up with some errors that you have to go back and clean up. Then when you want to have another observation type, like cover crop biomass or ground cover or weed count, you have to do it all over again.
With a database, instead you record the minimally identifiable information, like plot_id and the observation. Then once all your data is recorded and stored and cleaned, you can join tables together for your analyses. Note that here we’re using plot_id as our key. That’s how we know which row from one table is connected to a row from another table.
This sort of structure is especially helpful where information from one table would need to be repeated many times. Imagine an experiment with 3 or 4 columns of treatment variables, and then a table of sensor readings every 15 minutes for a year (~35000 rows per plot!). You’d have a lot of duplicated information, and any mistakes (misspelling, miscoding treatments, rows out of order) would break the whole thing.
What do I do with it though?
In general, databases are for storage and organization, and you use other tools like R or SAS for analysis. Once you have the two tables, you join them either in the DB (via SQL) or locally (using merge or dplyr::full_join):
plot_id crop tillage Mg_ha
1 101 corn no-till 10.0
2 102 corn strip-till 12.0
3 103 soy strip-till 4.5
4 104 soy no-till 4.5
5 201 corn strip-till 11.0
6 202 soy no-till 3.0
7 203 corn no-till 9.5
8 204 soy strip-till 5.0
And then analysis is straightforward, using your normal tools.
Analysis of Variance Table
Response: Mg_ha
Df Sum Sq Mean Sq F value Pr(>F)
crop 1 81.281 81.281 173.4000 0.0001921 ***
tillage 1 3.781 3.781 8.0667 0.0468587 *
crop:tillage 1 0.281 0.281 0.6000 0.4818174
Residuals 4 1.875 0.469
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
crop tillage Mg_ha
1 corn no-till 9.75
2 soy no-till 3.75
3 corn strip-till 11.50
4 soy strip-till 4.75
Or if you prefer tidyverse-flavored R:
# A tibble: 4 x 4
# Groups: crop [2]
crop tillage mean sd
<chr> <chr> <dbl> <dbl>
1 corn no-till 9.75 0.354
2 corn strip-till 11.5 0.707
3 soy no-till 3.75 1.06
4 soy strip-till 4.75 0.354
LS0tCnRpdGxlOiAiMDAxIC0gV2hhdCBpcyBhIGRhdGFiYXNlPyIKb3V0cHV0OgogIGh0bWxfbm90ZWJvb2s6IAogICAgaGlnaGxpZ2h0OiB0YW5nbwogICAgbnVtYmVyX3NlY3Rpb25zOiB5ZXMKICAgIHRoZW1lOiByZWFkYWJsZQogICAgdG9jOiB5ZXMKICAgIHRvY19mbG9hdDogeWVzCiAgICBjb2RlX2ZvbGRpbmc6IG5vbmUKICAgIGluY2x1ZGVzOgogICAgICBiZWZvcmVfYm9keTogYXNzZXRzL2hlYWRlci5odG1sCiAgICAgIGFmdGVyX2JvZHk6IGFzc2V0cy9mb290ZXIuaHRtbAotLS0KIAojIFdoYXQgaXMgYSBEQj8KQSBkYXRhYmFzZSBpcyBhIHJlY3Rhbmd1bGFyIHN0b3JhZ2Ugc3lzdGVtLCBjb2x1bW5zIGFuZCByb3dzIG9mIGRhdGEsIGp1c3QgbGlrZSBhIHNwcmVhZHNoZWV0LiBVbmxpa2UgYSBzcHJlYWRzaGVldCwgdGhlIHdheSB5b3UgcmVhZCBhbmQgd3JpdGUgZGF0YSBpbiBhIERCIGlzIHRocm91Z2ggY29kZSBvciB0aGUgY29tbWFuZCBsaW5lLCBub3QgdGhyb3VnaCBhIHBvaW50LWFuZC1jbGljayBpbnRlcmZhY2UgbGlrZSBHb29nbGUgU2hlZXRzIG9yIEV4Y2VsLgoKSSdsbCBtb3N0bHkgYmUgdXNpbmcgImRhdGFiYXNlIiB0byBtZWFuIGEgcmVsYXRpb25hbCBkYXRhYmFzZSBtYW5hZ2VtZW50IHN5c3RlbSwgbGlrZSBQb3N0Z3JlU1FMIG9yIFNRTGl0ZSwgYnV0IHRoZXJlIGFyZSBvdGhlciBwYXJhZGlnbXMuCiAKVGhlIGNvZGUgeW91IHVzZSB0byBpbnRlcmFjdCB3aXRoIHRoZSBEQiBpcyBjYWxsZWQgIlNRTCIgKFN0cnVjdHVyZWQgUXVlcnkgTGFuZ3VhZ2UpLCBidXQgeW91IGRvbid0ICpoYXZlKiB0byBrbm93IFNRTC4gSW4gdGhlIGNvZGUgZXhhbXBsZXMgSSdsbCBzaG93LCB0aGVyZSB3aWxsIGJlIGEgInZhbmlsbGEgU1FMIiB2ZXJzaW9uIGFuZCBhICJ0aWR5IFIiIHZlcnNpb24uCgojIyBXaGVyZSBpcyBhIERCPwoKQSBkYXRhYmFzZSBjb25zaXN0cyBvZiBhIHNlcnZlciBhbmQgb25lIG9yIG1vcmUgY2xpZW50cy4gIlNlcnZlciIgYW5kICJjbGllbnRzIiBoZXJlIGp1c3QgbWVhbiBjb21wdXRlciBhcHBsaWNhdGlvbnM7IHRoZSBzZXJ2ZXIgY2FuIGJlIGEgd2Vic2VydmVyIGxpdmluZyBvdXQgaW4gdGhlIGNsb3VkLCBvciBpdCBjYW4gYmUgeW91ciBvd24gbGFwdG9wLiBUaGUgY2xpZW50IGNhbiBiZSBhIGNvbW1hbmQgbGluZSB0ZXJtaW5hbCwgYW4gUiBzZXNzaW9uLCBhIFNBUyBzZXNzaW9uLCBhIHdlYnBhZ2UgcnVubmluZyBKYXZhU2NyaXB0LCBldGMuIAoKSSdsbCBkZW1vbnN0cmF0ZSBsYXRlciBvbiBob3cgdG8gdXNlIGEgbG9jYWwgREIsIHdoZXJlIHRoZSBzZXJ2ZXIgYW5kIHRoZSBjbGllbnQgYXJlIGJvdGggb24geW91ciBjb21wdXRlci4gVGhlbiBJJ2xsIGRpc2N1c3MgdGhlIFBTQSBPbi1GYXJtIGRhdGFiYXNlLCB3aGljaCBpcyBob3N0ZWQgb24gQXp1cmUgYW5kIHlvdSBjYW4gYWNjZXNzIGZyb20gYW55d2hlcmUgd2l0aCBhbiBpbnRlcm5ldCBjb25uZWN0aW9uLgoKIyMgV2hhdCdzIGluc2lkZT8KCkluc2lkZSBhIGRhdGFiYXNlIHRoZXJlIGFyZSBvbmUgb3IgbW9yZSB0YWJsZXMsIHdoaWNoIGFyZSBhbmFsb2dvdXMgdG8gaW5kaXZpZHVhbCB0YWJzIGluIGEgcmVndWxhciBzcHJlYWRzaGVldC4gV2hhdCBtYWtlcyB0aGVzZSB0YWJsZXMgZGlmZmVyZW50IGlzIHRoYXQ6CgogKiBFYWNoIGNvbHVtbiBvbmx5IGNvbnRhaW5zIG9uZSB0eXBlIG9mIGRhdGEgKG51bWJlcnMsIHRleHQsIGRhdGVzKQogKiBUaGUgdHlwZSBvZiBlYWNoIGNvbHVtbiBtdXN0IGJlIGRlY2lkZWQgYmVmb3JlIHlvdSBwdXQgYW55IGRhdGEgaW4gaXQKICogVGhlcmUgaXMgbm90aGluZyBvdXRzaWRlIHRoZSAicmVjdGFuZ2xlIiAobm8gcGxvdHMsIG5vIG5vdGVzIGluIHRoZSBtYXJnaW5zLCBubyBjb21tZW50IGJveGVzIGZsb2F0aW5nIG92ZXIgdG9wKQogClRoZXJlIGFyZSBhZGRpdGlvbmFsIGJlbmVmaXRzIHRvIHVzaW5nIGEgc3RydWN0dXJlZCByZWxhdGlvbmFsIGRhdGFiYXNlOgoKICogRWFjaCB0YWJsZSBoYXMgYSAia2V5IiB0aGF0IGNvbm5lY3RzIGFuIG9ic2VydmF0aW9uIGJldHdlZW4gdGFibGVzCiAqIEtleXMgY2FuIG1hcCBvbmUgcm93IHRvIGFub3RoZXIgcm93LCBvciBtYW55IHJvd3MgdG8gb25lIHJvdywgb3IgdmljZSB2ZXJzYQogKiBQbGFubmluZyBhIHN0cnVjdHVyZSAoc2NoZW1hKSBoZWxwcyB5b3UgZGVjaWRlIGhvdyB0byBvcmdhbml6ZSB5b3VyIGRhdGEKIApBcmUgdGhlcmUgZG93bnNpZGVzIHRvIHVzaW5nIGEgZGF0YWJhc2U/CgogKiBZb3UgaGF2ZSB0byBsZWFybiBob3cgdG8gdXNlIHRoZW0sIGJ5IHJlYWRpbmcgdGhpcyB0dXRvcmlhbAoKIyMgRXhhbXBsZQpMZXQncyBsb29rIGF0IGEgdG95IGRhdGFiYXNlIHdpdGggdHdvIHRhYmxlczoKCmBgYHtyIGVjaG8gPSBGQUxTRX0KdHJlYXRtZW50cyA8LSBkYXRhLmZyYW1lKAogIHBsb3RfaWQgPSBjKDEwMSwgMTAyLCAxMDMsIDEwNCwgMjAxLCAyMDIsIDIwMywgMjA0KSwKICBjcm9wID0gYygiY29ybiIsICJjb3JuIiwgInNveSIsICJzb3kiLCAiY29ybiIsICJzb3kiLCAiY29ybiIsICJzb3kiKSwKICB0aWxsYWdlID0gYygibm8tdGlsbCIsICJzdHJpcC10aWxsIiwgInN0cmlwLXRpbGwiLCAibm8tdGlsbCIsICJzdHJpcC10aWxsIiwgIm5vLXRpbGwiLCAibm8tdGlsbCIsICJzdHJpcC10aWxsIiksCiAgc3RyaW5nc0FzRmFjdG9ycyA9IEZBTFNFCikKCnlpZWxkcyA8LSBkYXRhLmZyYW1lKAogIHBsb3RfaWQgPSBjKDEwMSwgMTAyLCAxMDMsIDEwNCwgMjAxLCAyMDIsIDIwMywgMjA0KSwKICBNZ19oYSA9IGMoMTAsIDEyLCA0LjUsIDQuNSwgMTEsIDMsIDkuNSwgNSksCiAgc3RyaW5nc0FzRmFjdG9ycyA9IEZBTFNFCikKYGBgCgpgYGB7ciBwYWdlZC5wcmludCA9IEZ9CnRyZWF0bWVudHMKeWllbGRzCmBgYApJZiB5b3Ugd2VyZSBtYWtpbmcgYSBzcHJlYWRzaGVldCBvZiB5aWVsZHMsIHlvdSB3b3VsZCBwcm9iYWJseSB3YW50IGFsbCBvZiB0aGlzIGluIG9uZSB0YWJsZSwgc28geW91IGNvdWxkIGFzc29jaWF0ZSBlYWNoIG9ic2VydmF0aW9uIHdpdGggdGhlIHRyZWF0bWVudHMuIEJ1dCBJJ20gc3VyZSB5b3UndmUgbm90aWNlZCB0aGF0IGNvcHktcGFzdGluZyBjb2x1bW5zIG9mIHRyZWF0bWVudHMgaW5ldml0YWJseSBlbmRzIHVwIHdpdGggc29tZSBlcnJvcnMgdGhhdCB5b3UgaGF2ZSB0byBnbyBiYWNrIGFuZCBjbGVhbiB1cC4gVGhlbiB3aGVuIHlvdSB3YW50IHRvIGhhdmUgYW5vdGhlciBvYnNlcnZhdGlvbiB0eXBlLCBsaWtlIGNvdmVyIGNyb3AgYmlvbWFzcyBvciBncm91bmQgY292ZXIgb3Igd2VlZCBjb3VudCwgeW91IGhhdmUgdG8gZG8gaXQgYWxsIG92ZXIgYWdhaW4uCgpXaXRoIGEgZGF0YWJhc2UsIGluc3RlYWQgeW91IHJlY29yZCB0aGUgbWluaW1hbGx5IGlkZW50aWZpYWJsZSBpbmZvcm1hdGlvbiwgbGlrZSBgcGxvdF9pZGAgYW5kIHRoZSBvYnNlcnZhdGlvbi4gVGhlbiBvbmNlIGFsbCB5b3VyIGRhdGEgaXMgcmVjb3JkZWQgYW5kIHN0b3JlZCBhbmQgY2xlYW5lZCwgeW91IGNhbiAqKmpvaW4qKiB0YWJsZXMgdG9nZXRoZXIgZm9yIHlvdXIgYW5hbHlzZXMuIE5vdGUgdGhhdCBoZXJlIHdlJ3JlIHVzaW5nIGBwbG90X2lkYCBhcyBvdXIgKiprZXkqKi4gVGhhdCdzIGhvdyB3ZSBrbm93IHdoaWNoIHJvdyBmcm9tIG9uZSB0YWJsZSBpcyBjb25uZWN0ZWQgdG8gYSByb3cgZnJvbSBhbm90aGVyIHRhYmxlLiAKClRoaXMgc29ydCBvZiBzdHJ1Y3R1cmUgaXMgZXNwZWNpYWxseSBoZWxwZnVsIHdoZXJlIGluZm9ybWF0aW9uIGZyb20gb25lIHRhYmxlIHdvdWxkIG5lZWQgdG8gYmUgcmVwZWF0ZWQgbWFueSB0aW1lcy4gSW1hZ2luZSBhbiBleHBlcmltZW50IHdpdGggMyBvciA0IGNvbHVtbnMgb2YgdHJlYXRtZW50IHZhcmlhYmxlcywgYW5kIHRoZW4gYSB0YWJsZSBvZiBzZW5zb3IgcmVhZGluZ3MgZXZlcnkgMTUgbWludXRlcyBmb3IgYSB5ZWFyICh+MzUwMDAgcm93cyBwZXIgcGxvdCEpLiBZb3UnZCBoYXZlIGEgbG90IG9mIGR1cGxpY2F0ZWQgaW5mb3JtYXRpb24sIGFuZCBhbnkgbWlzdGFrZXMgKG1pc3NwZWxsaW5nLCBtaXNjb2RpbmcgdHJlYXRtZW50cywgcm93cyBvdXQgb2Ygb3JkZXIpIHdvdWxkIGJyZWFrIHRoZSB3aG9sZSB0aGluZy4KCiMjIFdoYXQgZG8gSSBkbyB3aXRoIGl0IHRob3VnaD8KCkluIGdlbmVyYWwsIGRhdGFiYXNlcyBhcmUgZm9yICoqc3RvcmFnZSoqIGFuZCAqKm9yZ2FuaXphdGlvbioqLCBhbmQgeW91IHVzZSBvdGhlciB0b29scyBsaWtlIFIgb3IgU0FTIGZvciAqKmFuYWx5c2lzKiouIE9uY2UgeW91IGhhdmUgdGhlIHR3byB0YWJsZXMsIHlvdSBqb2luIHRoZW0gZWl0aGVyIGluIHRoZSBEQiAodmlhIFNRTCkgb3IgbG9jYWxseSAodXNpbmcgYG1lcmdlYCBvciBgZHBseXI6OmZ1bGxfam9pbmApOgoKYGBge3IgcGFnZWQucHJpbnQgPSBGfQp5aWVsZF9kYXRhIDwtIG1lcmdlKHRyZWF0bWVudHMsIHlpZWxkcykKeWllbGRfZGF0YQpgYGAKCkFuZCB0aGVuIGFuYWx5c2lzIGlzIHN0cmFpZ2h0Zm9yd2FyZCwgdXNpbmcgeW91ciBub3JtYWwgdG9vbHMuCgpgYGB7ciBwYWdlZC5wcmludCA9IEZ9Cm1vZGVsIDwtIGxtKE1nX2hhIH4gY3JvcCAqIHRpbGxhZ2UsIGRhdGEgPSB5aWVsZF9kYXRhKQphbm92YShtb2RlbCkKYWdncmVnYXRlKAogIE1nX2hhIH4gY3JvcCp0aWxsYWdlLCAKICBkYXRhID0geWllbGRfZGF0YSwgCiAgRlVOID0gbWVhbgogICkKYGBgCgpPciBpZiB5b3UgcHJlZmVyIHRpZHl2ZXJzZS1mbGF2b3JlZCBSOgoKYGBge3IgcGFnZWQucHJpbnQgPSBGfQpsaWJyYXJ5KGRwbHlyLCB3YXJuLmNvbmZsaWN0cyA9IEZBTFNFKQoKeWllbGRfZGF0YSAlPiUgCiAgZ3JvdXBfYnkoY3JvcCwgdGlsbGFnZSkgJT4lIAogIHN1bW1hcmlzZV9hdCgKICAgIHZhcnMoTWdfaGEpLCAKICAgIGxpc3Qofm1lYW4oLiksIH5zZCguKSkKICAgICkKYGBgCg==