<div dir="ltr"><div>Ha! Well, sometimes excellence goes unrecognized and even unnoticed. Your writing and explanations deserve recognition and applause! I humbly offer mine. Too often we have to suffer through mediocre writing. Not so with you. So thanks again!<br></div>Dave<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 20, 2015 at 7:51 PM, Alex Carver <span dir="ltr"><<a href="mailto:agcarver+ale@acarver.net" target="_blank">agcarver+ale@acarver.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">You're quite welcome, I'm glad you got something out of it. As for<br>
technical writing, I wrote too much in grad school and still have to<br>
write too much in my current job to want to go into voluntary technical<br>
writing professionally. ;)<br>
<br>
Of course that changes if I ever attempt to secure a professorship at a<br>
university somewhere in which case I'd be writing lesson plans and<br>
textbooks. But that's a bit further off and the writing is still not<br>
completely voluntary but at least I'd have some advance warning. ;)<br>
<div class="HOEnZb"><div class="h5"><br>
On 2015-03-20 15:44, David Jackson wrote:<br>
> Hi Alex,<br>
><br>
> No doubt we'll get to Normal Form at some point, but I want to thank you<br>
> for your most excellent explanation here. Whatever text we read when we do<br>
> get to it would most definitely benefit from your write-up! Have you ever<br>
> considered technical writing? I would bet you've done a lot, because I<br>
> know that explanations this clear don't just happen accidentally. I wish<br>
> my texts were as clear as your explanations!<br>
><br>
> Thanks again!<br>
> Dave<br>
><br>
> On Sat, Mar 14, 2015 at 7:09 PM, Alex Carver <<a href="mailto:agcarver%2Bale@acarver.net">agcarver+ale@acarver.net</a>><br>
> wrote:<br>
><br>
>> I know it was an assignment so you're limited but it was something to<br>
>> keep in mind as the construction of that database is not the best<br>
>> example of a proper design.<br>
>><br>
>> The rule isn't for MySQL specifically, it's for database design in<br>
>> general and is called Normal Form. It applies to all relational<br>
>> databases independent of the engines that support them (MySQL, Postgres,<br>
>> etc.) The concept isn't that the data is atomic more that it is not<br>
>> repeated unnecessarily which maximizes storage (this is still a critical<br>
>> thing for hypergiant databases used by NASA, Google, Amazon, etc.) and<br>
>> also allows better query structure and operation.<br>
>><br>
>> There are multiple levels of Normal Form starting with Zero, which has<br>
>> no optimizations, going up to Five, which has the most (and sometimes<br>
>> very complicated) optimizations (there are sometimes "extra" unofficial<br>
>> levels referenced above Five but they dont' make much sense.) Where you<br>
>> stop all depends on how easy it is to visualize the database structure<br>
>> at the higher NF levels (so that it can be implemented) and whether the<br>
>> data is a natural fit. In most cases you end up with a hybrid database<br>
>> where most is at some NF level and small portions are at a lower level.<br>
>><br>
>> The most common textbook examples for addressing NF start off with basic<br>
>> address books/contacts lists. You start with an address book with a<br>
>> person's name and postal address:<br>
>><br>
>> John Smith 123 Main St, Springville, IL 12345<br>
>> Jane Doe 456 Side St, Springhills, IL 12345<br>
>> Jim Johns 789 Road St, Springvalley, IL 12345<br>
>> Jill Dale 123 High Rd, Fallglen, OH 23456<br>
>> Joe Jims 456 Low Rd, Fallcrest, OH 23456<br>
>><br>
>> In a zero NF database, these would be recorded one at a time, lots of<br>
>> repeated information (obviously the ZIP and state). So it's obvious<br>
>> that there is a minimization that can occur with the state at a minimum<br>
>> by breaking it out into a table and then relating the address book table<br>
>> with the state table. Now when you store a state, you only have an<br>
>> integer between 0 and 49 (assuming only the 50 states) and integer<br>
>> storage is more efficient than a string of two characters in most<br>
>> database engines.<br>
>><br>
>> A less obvious one is the ZIP which actually could be broken into a<br>
>> small set of foreign lookup tables. The number of ZIPs is limited and<br>
>> only certain ZIPs exist in certain states because the US is divided into<br>
>> 10 regions. So you could have at least one table that matched a ZIP<br>
>> with a state and use that in the address book instead. Then you store<br>
>> the ZIP only once, the state is already matched to the ZIP and thus the<br>
>> state is only stored once. The address book then only needs to point to<br>
>> the ZIP table and the ZIP table maps to the state so you can recreate<br>
>> the state information without having to have a "state" column in the<br>
>> address book.<br>
>><br>
>> By example:<br>
>> ZIP_State_Table:<br>
>> ID ZIP State<br>
>> 1 12345 1<br>
>> 2 12346 1<br>
>> 3 23456 2<br>
>> 4 23457 2<br>
>><br>
>> State_Table:<br>
>> ID State<br>
>> 1 IL<br>
>> 2 OH<br>
>><br>
>> Now the address book only needs:<br>
>> Name, Street Address, City, ZIP_ID<br>
>><br>
>> Note that the state field is not required. You get the state by<br>
>> chaining the three tables together. And the ZIP doesn't have to be<br>
>> explicitly stored, just the ID (so that other postal codes could be used<br>
>> later such as Canadian). You could extend this to a table of Cities<br>
>> since ZIPs encode onto cities and that would eliminate the need to store<br>
>> the city in the address book. This is getting closer to Four and Five<br>
>> NF since you're abstracting a lot of data and really paring down the<br>
>> original table. The reality is that most places like Amazon do exactly<br>
>> this. You don't need to store everyone's city over and over (and waste<br>
>> the space doing so) because you know every city. Just record them once<br>
>> and point to them.<br>
>><br>
>> The other part of NF is where pointers should go. This is less an<br>
>> atomicity of data and more of a logical structuring. For your database,<br>
>> all animals need food so the animal table should really carry the food.<br>
>> The food table itself is only the NF conversion to minimize data<br>
>> replication. Since foods can be eaten by more than one animal, the<br>
>> current arrangement of your example tables (with animal ID in the food<br>
>> table) causes unnecessary replication of data (a food gets repeated)<br>
>> which goes against NF rules. The map table I suggested goes up another<br>
>> level of NF because it avoids replication of data in the animal table,<br>
>> too, since one animal can eat more than one food. But, of the two<br>
>> evils, replicating the animal was the lesser evil because it's more<br>
>> logical in a database sense compared to replicating food mainly because<br>
>> the food is more likely to be replicated with greater frequency than the<br>
>> animals (any individual animal might eat less than five food items but<br>
>> any food item can potentially be eaten by an entire zoo's population so<br>
>> you have many more potential repetitions of the food than the animal).[1]<br>
>><br>
>> In the end there's no real limit to how far you should go in terms of NF<br>
>> other than whether it makes implementing the database much harder than<br>
>> needed for the task. For Amazon, high level NF is crucial otherwise<br>
>> they'll burn through disk space in a hurry. For a small DB, maybe not<br>
>> critical at all. My personal address book might only be 50 entries so<br>
>> do I need to spend the time doing it? Not really, there's no savings<br>
>> for me except that it's just good practice.<br>
>><br>
>> NF is not hard and fast but it's very useful for making databases work<br>
>> better. It also allows you to eventually slice and dice your data in<br>
>> ways that might otherwise be very difficult to achieve with flat or low<br>
>> NF databases. It's harder to create an example of this since easy to<br>
>> type and follow database structures don't lend themselves to difficult<br>
>> analysis (by design). But big research databases are a prime place<br>
>> where this happens. A researcher may piece together data in various<br>
>> ways to extract trends but this would be difficult to do if the database<br>
>> was a monolithic table or otherwise used a very low order NF that didn't<br>
>> break out all the data as much as possible.<br>
>><br>
>><br>
>> [1] Another way to visualize this is to switch the animals to zoo<br>
>> visitors (humans) and the food that the visitors eat at the zoo (which<br>
>> is usually a limited set of park fare like hot dogs, hamburgers, chicken<br>
>> wings, soda, etc.). If you had to track every visitor and what they ate<br>
>> for lunch, are you more likely to put the pointer to food in the visitor<br>
>> table or a pointer to the visitor in the food table? If you put the<br>
>> food pointer in the visitor table, then each visitor might have two or<br>
>> three records (one visitor bought a soda, a hot dog, and a soft pretzel<br>
>> so three records for John Doe). The food table here would still only<br>
>> have the few items you sell at the zoo (hot dog, hamburger, chicken,<br>
>> soda, water, pretzel, ice cream.)<br>
>><br>
>> But how many of those visitors at a busy zoo bought a hot dog? If the<br>
>> pointer is reversed and your food table points at a visitor then your<br>
>> food table might suddenly have 10,000 hot dog entries.<br>
>><br>
>> If your zoo gets 100,000 people a day (big zoo) and every person buys<br>
>> three food items on average then your day's table of visitors will be<br>
>> 300,000 records and the food table has just 7 using the first<br>
>> arrangement (visitor points to food). If you reverse the pointer (food<br>
>> points to visitor), your visitor table is 100,000 records (every<br>
>> visitor) and each item in your food table might have been purchased by a<br>
>> third of the visitors for a total of over 230,000 records.<br>
>><br>
>> The final tally, visitor points to food is 300,007 records and food<br>
>> points to visitor is 330,000 records. Net savings of 30,000 records.<br>
>><br>
>><br>
>> On 2015-03-14 14:34, David Jackson wrote:<br>
>>> Thanks Alex! Wow. Very impressive analysis!<br>
>>><br>
>>> This has been for an assignment, so I couldn't change the tables, but<br>
>> your<br>
>>> way of gluing the info together makes sense to me, so thanks! I think<br>
>>> they're just trying to keep it simple for the class<br>
>>><br>
>>> It looks like for MySQL you want to "atomize" the data as much as<br>
>> possible<br>
>>> to keep the structure out of the way of possible relations among the<br>
>>> data?? I guess it's a judgment call to not get too "atomic" though, huh?<br>
>>><br>
>>> Dave<br>
>>><br>
>>><br>
>>> On Fri, Mar 13, 2015 at 5:55 PM, Alex Carver <<a href="mailto:agcarver%2Bale@acarver.net">agcarver+ale@acarver.net</a>><br>
>>> wrote:<br>
>>><br>
>>>> There are four JOINs available to you, LEFT, RIGHT, INNER (this one is<br>
>>>> usually just the unadorned JOIN keyword) and OUTER.<br>
>>>><br>
>>>> To see if you have an animal with no food and given that your food table<br>
>>>> is on the right of the animal table in the query you offered you need to<br>
>>>> use a LEFT JOIN. The LEFT JOIN ensures that the left table takes<br>
>>>> precedence during row generation such that missing matches result in<br>
>>>> NULLs in the fields from the right-side tables:<br>
>>>><br>
>>>> So change your query to:<br>
>>>><br>
>>>> SELECT id, name, family, feed FROM animal LEFT JOIN food ON <a href="http://animal.id" target="_blank">animal.id</a> =<br>
>>>> food.animalid;<br>
>>>><br>
>>>> That gives you everything. Then add a WHERE clause to filter out<br>
>>>> animals missing food:<br>
>>>><br>
>>>> SELECT id, name, family, feed FROM animal LEFT JOIN food ON <a href="http://animal.id" target="_blank">animal.id</a> =<br>
>>>> food.animalid WHERE feed IS NULL;<br>
>>>><br>
>>>> The unadorned JOIN (which is a INNER or STRAIGHT JOIN) will only show<br>
>>>> records where there are matches in all joined tables. The RIGHT JOIN<br>
>>>> variant will reverse the precedence of the tables (LEFT and RIGHT JOINs<br>
>>>> work the same, the precedence is reversed and it's mainly used when you<br>
>>>> join many tables so that you get the precedence correct for the needed<br>
>>>> query).<br>
>>>><br>
>>>> On an aside, the table as constructed isn't quite proper according to<br>
>>>> the rules of Normal Form (how databases are laid out). Your food table<br>
>>>> should not have an animal ID. Instead your animal should have a food ID<br>
>>>> if it can only eat one type of food. Alternatively there should be a<br>
>>>> third table which maps animal ID to food ID that way an animal can eat<br>
>>>> more than one food.<br>
>>>><br>
>>>> Two table variant:<br>
>>>> Animal: {id, name, family, weight, foodid}<br>
>>>> Food: {id, feed}<br>
>>>><br>
>>>> Three table variant:<br>
>>>> Animal: {id, name, family, weight}<br>
>>>> Food: {id, feed}<br>
>>>> AnimalFoodMap: {animalid, foodid}<br>
>>>><br>
>>>> In the three table variant you join the animal and food tables using the<br>
>>>> AnimalFoodMap as "glue":<br>
>>>><br>
>>>> SELECT name, family, weight, feed FROM Animal LEFT JOIN AnimalFoodMap ON<br>
>>>> <a href="http://animal.id" target="_blank">animal.id</a> = AnimalFoodMap.animalid LEFT JOIN Food ON<br>
>>>> AnimalFoodMap.foodid = Food.id;<br>
>>>><br>
>>>> If an animal has more than one food type, the animal is listed once in<br>
>>>> the Animal table, the foods are each listed only once in the Food table<br>
>>>> and the AnimalFoodMap table contains the appropriate pairings. Your<br>
>>>> example has more than one food per animal and also one repeated food so<br>
>>>> let's modify it:<br>
>>>><br>
>>>> Animal (table):<br>
>>>> id name family weight<br>
>>>> 1 Seymore Snake 10<br>
>>>> 2 Gerard Giraffe 120<br>
>>>><br>
>>>> Food (table):<br>
>>>> id feed<br>
>>>> 1 mice<br>
>>>> 2 leaves<br>
>>>> 3 grass<br>
>>>><br>
>>>> Now the magic AnimalMapTable:<br>
>>>> animalid foodid<br>
>>>> 1 1<br>
>>>> 1 2<br>
>>>> 2 2<br>
>>>> 2 3<br>
>>>><br>
>>>> (Note that I'm skipping over some things such as keys that should be<br>
>>>> used to ensure there are no duplicates)<br>
>>>><br>
>>>> After doing this there are several additional ways to write up queries<br>
>>>> to make an output similar to what you have (single line for an animal<br>
>>>> plus all the entries for the foods) which requires some more complicated<br>
>>>> query structure. But for now, the query I wrote above would output:<br>
>>>><br>
>>>> id name family feed<br>
>>>> 1 Seymore Snake mice<br>
>>>> 1 Seymore Snake leaves<br>
>>>> 2 Gerard Giraffe leaves<br>
>>>> 2 Gerard Giraffe grass<br>
>>>><br>
>>>><br>
>>>> On 2015-03-13 10:52, David S Jackson wrote:<br>
>>>>> Hi,<br>
>>>>><br>
>>>>> I'm a MySQL newbie. I'm just starting to use Python to talk to MySQL<br>
>>>>> databases, but first I need to understand the MySQL query language!<br>
>>>>><br>
>>>>><br>
>>>>> So, I have two tables: (zoo) animal and food.<br>
>>>>><br>
>>>>> animal has the columns ID, NAME, FAMILY, WEIGHT<br>
>>>>><br>
>>>>> food has ID, ANIMALID, FEED<br>
>>>>><br>
>>>>><br>
>>>>> So if I go: select id, name, family, feed from animal JOIN food ON<br>
>>>>> <a href="http://animal.id" target="_blank">animal.id</a>=food.animalid;<br>
>>>>><br>
>>>>><br>
>>>>> I get something like:<br>
>>>>><br>
>>>>> ID NAME FAMILY FEED<br>
>>>>> 1 Seymore Snake mice, leaves<br>
>>>>> 2 Gerard Giraffe leaves, grass<br>
>>>>> ...<br>
>>>>> etc<br>
>>>>><br>
>>>>> So, if I have a situation where I want to see whether an animal has<br>
>> been<br>
>>>>> added the animal table but may have not been added to the food table,<br>
>>>>> how would I compose that query?<br>
>>>>><br>
>>>>> That is, the animal could have been added to the animal table and could<br>
>>>>> get an id, say 10, but the food table could have 20 foods already<br>
>>>>> entered and the animal-id would be used several times.<br>
>>>>><br>
>>>>> I was thinking, is there a way I can ask, "does the highest <a href="http://animal.id" target="_blank">animal.id</a><br>
>>>>> number equal the highest food.animalid number"? If not, what animal is<br>
>>>>> not getting fed?<br>
<br>
_______________________________________________<br>
Ale mailing list<br>
<a href="mailto:Ale@ale.org">Ale@ale.org</a><br>
<a href="http://mail.ale.org/mailman/listinfo/ale" target="_blank">http://mail.ale.org/mailman/listinfo/ale</a><br>
See JOBS, ANNOUNCE and SCHOOLS lists at<br>
<a href="http://mail.ale.org/mailman/listinfo" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>
</div></div></blockquote></div><br></div>