<div dir="ltr"><div>Ha!  Well, sometimes excellence goes unrecognized and even unnoticed.  Your writing and explanations deserve recognition and applause!  I humbly offer mine.  Too often we have to suffer through mediocre writing.  Not so with you.  So thanks again!<br></div>Dave<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 20, 2015 at 7:51 PM, Alex Carver <span dir="ltr">&lt;<a href="mailto:agcarver+ale@acarver.net" target="_blank">agcarver+ale@acarver.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">You&#39;re quite welcome, I&#39;m glad you got something out of it.  As for<br>

technical writing, I wrote too much in grad school and still have to<br>

write too much in my current job to want to go into voluntary technical<br>

writing professionally. ;)<br>

<br>

Of course that changes if I ever attempt to secure a professorship at a<br>

university somewhere in which case I&#39;d be writing lesson plans and<br>

textbooks.  But that&#39;s a bit further off and the writing is still not<br>

completely voluntary but at least I&#39;d have some advance warning. ;)<br>

<div class="HOEnZb"><div class="h5"><br>

On 2015-03-20 15:44, David Jackson wrote:<br>

&gt; Hi Alex,<br>

&gt;<br>

&gt; No doubt we&#39;ll get to Normal Form at some point, but I want to thank you<br>

&gt; for your most excellent explanation here.  Whatever text we read when we do<br>

&gt; get to it would most definitely benefit from your write-up!  Have you ever<br>

&gt; considered technical writing?  I would bet you&#39;ve done a lot, because I<br>

&gt; know that explanations this clear don&#39;t just happen accidentally.  I wish<br>

&gt; my texts were as clear as your explanations!<br>

&gt;<br>

&gt; Thanks again!<br>

&gt; Dave<br>

&gt;<br>

&gt; On Sat, Mar 14, 2015 at 7:09 PM, Alex Carver &lt;<a href="mailto:agcarver%2Bale@acarver.net">agcarver+ale@acarver.net</a>&gt;<br>

&gt; wrote:<br>

&gt;<br>

&gt;&gt; I know it was an assignment so you&#39;re limited but it was something to<br>

&gt;&gt; keep in mind as the construction of that database is not the best<br>

&gt;&gt; example of a proper design.<br>

&gt;&gt;<br>

&gt;&gt; The rule isn&#39;t for MySQL specifically, it&#39;s for database design in<br>

&gt;&gt; general and is called Normal Form.  It applies to all relational<br>

&gt;&gt; databases independent of the engines that support them (MySQL, Postgres,<br>

&gt;&gt; etc.)  The concept isn&#39;t that the data is atomic more that it is not<br>

&gt;&gt; repeated unnecessarily which maximizes storage (this is still a critical<br>

&gt;&gt; thing for hypergiant databases used by NASA, Google, Amazon, etc.) and<br>

&gt;&gt; also allows better query structure and operation.<br>

&gt;&gt;<br>

&gt;&gt; There are multiple levels of Normal Form starting with Zero, which has<br>

&gt;&gt; no optimizations, going up to Five, which has the most (and sometimes<br>

&gt;&gt; very complicated) optimizations (there are sometimes &quot;extra&quot; unofficial<br>

&gt;&gt; levels referenced above Five but they dont&#39; make much sense.)  Where you<br>

&gt;&gt; stop all depends on how easy it is to visualize the database structure<br>

&gt;&gt; at the higher NF levels (so that it can be implemented) and whether the<br>

&gt;&gt; data is a natural fit.  In most cases you end up with a hybrid database<br>

&gt;&gt; where most is at some NF level and small portions are at a lower level.<br>

&gt;&gt;<br>

&gt;&gt; The most common textbook examples for addressing NF start off with basic<br>

&gt;&gt; address books/contacts lists.  You start with an address book with a<br>

&gt;&gt; person&#39;s name and postal address:<br>

&gt;&gt;<br>

&gt;&gt; John Smith  123 Main St, Springville, IL 12345<br>

&gt;&gt; Jane Doe    456 Side St, Springhills, IL 12345<br>

&gt;&gt; Jim  Johns  789 Road St, Springvalley, IL 12345<br>

&gt;&gt; Jill Dale   123 High Rd, Fallglen, OH 23456<br>

&gt;&gt; Joe  Jims   456 Low Rd,  Fallcrest, OH 23456<br>

&gt;&gt;<br>

&gt;&gt; In a zero NF database, these would be recorded one at a time, lots of<br>

&gt;&gt; repeated information (obviously the ZIP and state).  So it&#39;s obvious<br>

&gt;&gt; that there is a minimization that can occur with the state at a minimum<br>

&gt;&gt; by breaking it out into a table and then relating the address book table<br>

&gt;&gt; with the state table.  Now when you store a state, you only have an<br>

&gt;&gt; integer between 0 and 49 (assuming only the 50 states) and integer<br>

&gt;&gt; storage is more efficient than a string of two characters in most<br>

&gt;&gt; database engines.<br>

&gt;&gt;<br>

&gt;&gt; A less obvious one is the ZIP which actually could be broken into a<br>

&gt;&gt; small set of foreign lookup tables.  The number of ZIPs is limited and<br>

&gt;&gt; only certain ZIPs exist in certain states because the US is divided into<br>

&gt;&gt; 10 regions.  So you could have at least one table that matched a ZIP<br>

&gt;&gt; with a state and use that in the address book instead.  Then you store<br>

&gt;&gt; the ZIP only once, the state is already matched to the ZIP and thus the<br>

&gt;&gt; state is only stored once.  The address book then only needs to point to<br>

&gt;&gt; the ZIP table and the ZIP table maps to the state so you can recreate<br>

&gt;&gt; the state information without having to have a &quot;state&quot; column in the<br>

&gt;&gt; address book.<br>

&gt;&gt;<br>

&gt;&gt; By example:<br>

&gt;&gt; ZIP_State_Table:<br>

&gt;&gt; ID  ZIP      State<br>

&gt;&gt; 1   12345    1<br>

&gt;&gt; 2   12346    1<br>

&gt;&gt; 3   23456    2<br>

&gt;&gt; 4   23457    2<br>

&gt;&gt;<br>

&gt;&gt; State_Table:<br>

&gt;&gt; ID       State<br>

&gt;&gt; 1        IL<br>

&gt;&gt; 2        OH<br>

&gt;&gt;<br>

&gt;&gt; Now the address book only needs:<br>

&gt;&gt; Name, Street Address, City, ZIP_ID<br>

&gt;&gt;<br>

&gt;&gt; Note that the state field is not required.  You get the state by<br>

&gt;&gt; chaining the three tables together.  And the ZIP doesn&#39;t have to be<br>

&gt;&gt; explicitly stored, just the ID (so that other postal codes could be used<br>

&gt;&gt; later such as Canadian).  You could extend this to a table of Cities<br>

&gt;&gt; since ZIPs encode onto cities and that would eliminate the need to store<br>

&gt;&gt; the city in the address book.  This is getting closer to Four and Five<br>

&gt;&gt; NF since you&#39;re abstracting a lot of data and really paring down the<br>

&gt;&gt; original table.  The reality is that most places like Amazon do exactly<br>

&gt;&gt; this.  You don&#39;t need to store everyone&#39;s city over and over (and waste<br>

&gt;&gt; the space doing so) because you know every city.  Just record them once<br>

&gt;&gt; and point to them.<br>

&gt;&gt;<br>

&gt;&gt; The other part of NF is where pointers should go.  This is less an<br>

&gt;&gt; atomicity of data and more of a logical structuring.  For your database,<br>

&gt;&gt; all animals need food so the animal table should really carry the food.<br>

&gt;&gt;  The food table itself is only the NF conversion to minimize data<br>

&gt;&gt; replication.  Since foods can be eaten by more than one animal, the<br>

&gt;&gt; current arrangement of your example tables (with animal ID in the food<br>

&gt;&gt; table) causes unnecessary replication of data (a food gets repeated)<br>

&gt;&gt; which goes against NF rules.  The map table I suggested goes up another<br>

&gt;&gt; level of NF because it avoids replication of data in the animal table,<br>

&gt;&gt; too, since one animal can eat more than one food.  But, of the two<br>

&gt;&gt; evils, replicating the animal was the lesser evil because it&#39;s more<br>

&gt;&gt; logical in a database sense compared to replicating food mainly because<br>

&gt;&gt; the food is more likely to be replicated with greater frequency than the<br>

&gt;&gt; animals (any individual animal might eat less than five food items but<br>

&gt;&gt; any food item can potentially be eaten by an entire zoo&#39;s population so<br>

&gt;&gt; you have many more potential repetitions of the food than the animal).[1]<br>

&gt;&gt;<br>

&gt;&gt; In the end there&#39;s no real limit to how far you should go in terms of NF<br>

&gt;&gt; other than whether it makes implementing the database much harder than<br>

&gt;&gt; needed for the task.  For Amazon, high level NF is crucial otherwise<br>

&gt;&gt; they&#39;ll burn through disk space in a hurry.  For a small DB, maybe not<br>

&gt;&gt; critical at all.  My personal address book might only be 50 entries so<br>

&gt;&gt; do I need to spend the time doing it?  Not really, there&#39;s no savings<br>

&gt;&gt; for me except that it&#39;s just good practice.<br>

&gt;&gt;<br>

&gt;&gt; NF is not hard and fast but it&#39;s very useful for making databases work<br>

&gt;&gt; better.  It also allows you to eventually slice and dice your data in<br>

&gt;&gt; ways that might otherwise be very difficult to achieve with flat or low<br>

&gt;&gt; NF databases.  It&#39;s harder to create an example of this since easy to<br>

&gt;&gt; type and follow database structures don&#39;t lend themselves to difficult<br>

&gt;&gt; analysis (by design).  But big research databases are a prime place<br>

&gt;&gt; where this happens.  A researcher may piece together data in various<br>

&gt;&gt; ways to extract trends but this would be difficult to do if the database<br>

&gt;&gt; was a monolithic table or otherwise used a very low order NF that didn&#39;t<br>

&gt;&gt; break out all the data as much as possible.<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; [1] Another way to visualize this is to switch the animals to zoo<br>

&gt;&gt; visitors (humans) and the food that the visitors eat at the zoo (which<br>

&gt;&gt; is usually a limited set of park fare like hot dogs, hamburgers, chicken<br>

&gt;&gt; wings, soda, etc.).  If you had to track every visitor and what they ate<br>

&gt;&gt; for lunch, are you more likely to put the pointer to food in the visitor<br>

&gt;&gt; table or a pointer to the visitor in the food table?  If you put the<br>

&gt;&gt; food pointer in the visitor table, then each visitor might have two or<br>

&gt;&gt; three records (one visitor bought a soda, a hot dog, and a soft pretzel<br>

&gt;&gt; so three records for John Doe).  The food table here would still only<br>

&gt;&gt; have the few items you sell  at the zoo (hot dog, hamburger, chicken,<br>

&gt;&gt; soda, water, pretzel, ice cream.)<br>

&gt;&gt;<br>

&gt;&gt; But how many of those visitors at a busy zoo bought a hot dog?  If the<br>

&gt;&gt; pointer is reversed and your food table points at a visitor then your<br>

&gt;&gt; food table might suddenly have 10,000 hot dog entries.<br>

&gt;&gt;<br>

&gt;&gt; If your zoo gets 100,000 people a day (big zoo) and every person buys<br>

&gt;&gt; three food items on average then your day&#39;s table of visitors will be<br>

&gt;&gt; 300,000 records and the food table has just 7 using the first<br>

&gt;&gt; arrangement (visitor points to food).  If you reverse the pointer (food<br>

&gt;&gt; points to visitor), your visitor table is 100,000 records (every<br>

&gt;&gt; visitor) and each item in your food table might have been purchased by a<br>

&gt;&gt; third of the visitors for a total of over 230,000 records.<br>

&gt;&gt;<br>

&gt;&gt; The final tally, visitor points to food is 300,007 records and food<br>

&gt;&gt; points to visitor is 330,000 records.  Net savings of 30,000 records.<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; On 2015-03-14 14:34, David Jackson wrote:<br>

&gt;&gt;&gt; Thanks Alex!  Wow.  Very impressive analysis!<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; This has been for an assignment, so I couldn&#39;t change the tables, but<br>

&gt;&gt; your<br>

&gt;&gt;&gt; way of gluing the info together makes sense to me, so thanks!  I think<br>

&gt;&gt;&gt; they&#39;re just trying to keep it simple for the class<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; It looks like for MySQL you want to &quot;atomize&quot; the data as much as<br>

&gt;&gt; possible<br>

&gt;&gt;&gt; to keep the structure out of the way of possible relations among the<br>

&gt;&gt;&gt; data??  I guess it&#39;s a judgment call to not get too &quot;atomic&quot; though, huh?<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Dave<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; On Fri, Mar 13, 2015 at 5:55 PM, Alex Carver &lt;<a href="mailto:agcarver%2Bale@acarver.net">agcarver+ale@acarver.net</a>&gt;<br>

&gt;&gt;&gt; wrote:<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; There are four JOINs available to you, LEFT, RIGHT, INNER (this one is<br>

&gt;&gt;&gt;&gt; usually just the unadorned JOIN keyword) and OUTER.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; To see if you have an animal with no food and given that your food table<br>

&gt;&gt;&gt;&gt; is on the right of the animal table in the query you offered you need to<br>

&gt;&gt;&gt;&gt; use a LEFT JOIN.  The LEFT JOIN ensures that the left table takes<br>

&gt;&gt;&gt;&gt; precedence during row generation such that missing matches result in<br>

&gt;&gt;&gt;&gt; NULLs in the fields from the right-side tables:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; So change your query to:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; SELECT id, name, family, feed FROM animal LEFT JOIN food ON <a href="http://animal.id" target="_blank">animal.id</a> =<br>

&gt;&gt;&gt;&gt; food.animalid;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; That gives you everything.  Then add a WHERE clause to filter out<br>

&gt;&gt;&gt;&gt; animals missing food:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; SELECT id, name, family, feed FROM animal LEFT JOIN food ON <a href="http://animal.id" target="_blank">animal.id</a> =<br>

&gt;&gt;&gt;&gt; food.animalid WHERE feed IS NULL;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; The unadorned JOIN (which is a INNER or STRAIGHT JOIN) will only show<br>

&gt;&gt;&gt;&gt; records where there are matches in all joined tables.  The RIGHT JOIN<br>

&gt;&gt;&gt;&gt; variant will reverse the precedence of the tables (LEFT and RIGHT JOINs<br>

&gt;&gt;&gt;&gt; work the same, the precedence is reversed and it&#39;s mainly used when you<br>

&gt;&gt;&gt;&gt; join many tables so that you get the precedence correct for the needed<br>

&gt;&gt;&gt;&gt; query).<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; On an aside, the table as constructed isn&#39;t quite proper according to<br>

&gt;&gt;&gt;&gt; the rules of Normal Form (how databases are laid out).  Your food table<br>

&gt;&gt;&gt;&gt; should not have an animal ID.  Instead your animal should have a food ID<br>

&gt;&gt;&gt;&gt; if it can only eat one type of food.  Alternatively there should be a<br>

&gt;&gt;&gt;&gt; third table which maps animal ID to food ID that way an animal can eat<br>

&gt;&gt;&gt;&gt; more than one food.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Two table variant:<br>

&gt;&gt;&gt;&gt; Animal: {id, name, family, weight, foodid}<br>

&gt;&gt;&gt;&gt; Food: {id, feed}<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Three table variant:<br>

&gt;&gt;&gt;&gt; Animal: {id, name, family, weight}<br>

&gt;&gt;&gt;&gt; Food: {id, feed}<br>

&gt;&gt;&gt;&gt; AnimalFoodMap: {animalid, foodid}<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; In the three table variant you join the animal and food tables using the<br>

&gt;&gt;&gt;&gt; AnimalFoodMap as &quot;glue&quot;:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; SELECT name, family, weight, feed FROM Animal LEFT JOIN AnimalFoodMap ON<br>

&gt;&gt;&gt;&gt; <a href="http://animal.id" target="_blank">animal.id</a> = AnimalFoodMap.animalid LEFT JOIN Food ON<br>

&gt;&gt;&gt;&gt; AnimalFoodMap.foodid = Food.id;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; If an animal has more than one food type, the animal is listed once in<br>

&gt;&gt;&gt;&gt; the Animal table, the foods are each listed only once in the Food table<br>

&gt;&gt;&gt;&gt; and the AnimalFoodMap table contains the appropriate pairings.  Your<br>

&gt;&gt;&gt;&gt; example has more than one food per animal and also one repeated food so<br>

&gt;&gt;&gt;&gt; let&#39;s modify it:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Animal (table):<br>

&gt;&gt;&gt;&gt; id  name    family   weight<br>

&gt;&gt;&gt;&gt; 1  Seymore  Snake    10<br>

&gt;&gt;&gt;&gt; 2  Gerard   Giraffe   120<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Food (table):<br>

&gt;&gt;&gt;&gt; id  feed<br>

&gt;&gt;&gt;&gt; 1   mice<br>

&gt;&gt;&gt;&gt; 2   leaves<br>

&gt;&gt;&gt;&gt; 3   grass<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Now the magic AnimalMapTable:<br>

&gt;&gt;&gt;&gt; animalid   foodid<br>

&gt;&gt;&gt;&gt; 1          1<br>

&gt;&gt;&gt;&gt; 1          2<br>

&gt;&gt;&gt;&gt; 2          2<br>

&gt;&gt;&gt;&gt; 2          3<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; (Note that I&#39;m skipping over some things such as keys that should be<br>

&gt;&gt;&gt;&gt; used to ensure there are no duplicates)<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; After doing this there are several additional ways to write up queries<br>

&gt;&gt;&gt;&gt; to make an output similar to what you have (single line for an animal<br>

&gt;&gt;&gt;&gt; plus all the entries for the foods) which requires some more complicated<br>

&gt;&gt;&gt;&gt; query structure.  But for now, the query I wrote above would output:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; id name      family  feed<br>

&gt;&gt;&gt;&gt; 1  Seymore   Snake   mice<br>

&gt;&gt;&gt;&gt; 1  Seymore   Snake   leaves<br>

&gt;&gt;&gt;&gt; 2  Gerard    Giraffe leaves<br>

&gt;&gt;&gt;&gt; 2  Gerard    Giraffe grass<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; On 2015-03-13 10:52, David S Jackson wrote:<br>

&gt;&gt;&gt;&gt;&gt; Hi,<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; I&#39;m a MySQL newbie.  I&#39;m just starting to use Python to talk to MySQL<br>

&gt;&gt;&gt;&gt;&gt; databases, but first I need to understand the MySQL query language!<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; So, I have two tables:  (zoo) animal and food.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; animal has the columns ID, NAME, FAMILY, WEIGHT<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; food has  ID, ANIMALID, FEED<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; So if I go: select id, name, family, feed from animal JOIN food ON<br>

&gt;&gt;&gt;&gt;&gt; <a href="http://animal.id" target="_blank">animal.id</a>=food.animalid;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; I get something like:<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; ID   NAME        FAMILY      FEED<br>

&gt;&gt;&gt;&gt;&gt; 1    Seymore   Snake        mice, leaves<br>

&gt;&gt;&gt;&gt;&gt; 2    Gerard      Giraffe       leaves, grass<br>

&gt;&gt;&gt;&gt;&gt; ...<br>

&gt;&gt;&gt;&gt;&gt; etc<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; So, if I have a situation where I want to see whether an animal has<br>

&gt;&gt; been<br>

&gt;&gt;&gt;&gt;&gt; added the animal table but may have not been added to the food table,<br>

&gt;&gt;&gt;&gt;&gt; how would I compose that query?<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; That is, the animal could have been added to the animal table and could<br>

&gt;&gt;&gt;&gt;&gt; get an id, say 10, but the food table could have 20 foods already<br>

&gt;&gt;&gt;&gt;&gt; entered and the animal-id would be used several times.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; I was thinking, is there a way I can ask, &quot;does the highest <a href="http://animal.id" target="_blank">animal.id</a><br>

&gt;&gt;&gt;&gt;&gt; number equal the highest food.animalid number&quot;?  If not, what animal is<br>

&gt;&gt;&gt;&gt;&gt; not getting fed?<br>

<br>

_______________________________________________<br>

Ale mailing list<br>

<a href="mailto:Ale@ale.org">Ale@ale.org</a><br>

<a href="http://mail.ale.org/mailman/listinfo/ale" target="_blank">http://mail.ale.org/mailman/listinfo/ale</a><br>

See JOBS, ANNOUNCE and SCHOOLS lists at<br>

<a href="http://mail.ale.org/mailman/listinfo" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>

</div></div></blockquote></div><br></div>