t+1 - Latest Comments in Some research on generic/EAV tables

Re: Some research on generic/EAV tables

Matt Wilson — Wed, 06 May 2009 15:55:50 -0000

"just denormalise all your data"

Yeah, that seems to be what Bigtable, CouchDB, and SimpleDB
are all about. Saying "just denormalize" makes it seem like
converting a database from SQL to something like a gigantic
spreadsheet is no big deal. But for a serious project, in order to
make sure that my stuff still works, I'd need to implement all sorts
of constraints and checks in my application layer. By the time I
finish, I will probably have written a crappy subset of SQL.

On the other hand, I think there are a lot of projects that can use a
simple denormalized database very well. There's room in the ecosystem
for all sorts of different critters, but I don't think one can easily
be exchanged for another.

I'll take a look at that PDF.

Re: Some research on generic/EAV tables

Mike A — Wed, 06 May 2009 15:28:59 -0000

Oops, also, 6NF joins are constant time (not linear) for a constant number of rows; bad mistake. (For more on this, see page 29 on http://www.anchormodeling.c...

I'm glad you enjoyed the post. I got interested in the EAV/relational debate after going to a Google Devfest where they explained AppEngine and BigTable. Now, AppEngine is cool, but I distinctly remember a point in the presentation where they explained BigTable and someone at the back asked, "How do we do joins?" The guy from the Google team replied, "Ah, well, ok, you need to rewrite your app so there are no joins - just denormalise all your data".

There was an audible 5 second stony silence as people exchanged anxious glances, including from the Google team.

From talking to people over the next few weeks, I'd guess that, at that moment, about 30% of the people in the room realised that the Emperor had no clothes (one guy actually walked out); the rest somehow managed to convince themselves that It's Google So It Must Be Good™.

And I just think that's not right.

Re: Some research on generic/EAV tables

Mike A — Wed, 06 May 2009 10:25:38 -0000

Sorry, I meant to say: 'Migration is also much easier, as you can drop and add tables, without having to change code.'

Re: Some research on generic/EAV tables

Matt Wilson — Wed, 06 May 2009 10:15:51 -0000

Hi Mike, You wrote a lot of interesting stuff. I've never looked at 6NF but I'm going to now. Am reading a book "SQL for Smarties" and the author spends a lot of time referring back to Codd's original work.

Thanks for the post! I'll maybe have some intelligent things to say after I do my homework.

Re: Some research on generic/EAV tables

Mike A — Wed, 06 May 2009 09:58:25 -0000

Seems to me that the problem is that SQL was never really 'relational'.

The massive wart on the face of SQL is NULLs. These things get everywhere and distort application logic into special-casing them. Hack on top of ugly hack. We are forced to live with it as an accepted religion. Every nullable column in a table represents a missing table, on average perhaps a hundred lines of hacky conditionals, weeks of programmer time already wasted, and months of maintenance issues ahead.

See, what happened after SQL was that common programmers discovered the normal forms, but of course, these being wrapped in all that academic, mathematical nonsense, the masses never really got them, and so they settled on ideas like '3NF is good enough' and 'sometimes you have to denormalise'.

The fact is that even 5NF is not good enough for any serious application. The only decent policy, and one that data warehouses are starting to realise, is 6NF throughout (not DKNF). This gives you all of the flexibility benefits of EAV (adding an attribute means just adding a new table - none of your existing code needs to change), all of the benefits of 5NF and lower (real data integrity), and as an added bonus, you never have to worry about NULLs (there are none), and you can historize your data (turning your database into a revision system for free). Migration is also much easier, as you can d.
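A minimal sketch of the "one table per attribute" idea, using Python's built-in sqlite3 module. The table and column names (person, person_name, person_birthday) are hypothetical, chosen only for illustration, and this is not taken from anchor modelling or any particular 6NF tool. It shows that a new attribute arrives as a new table, that a missing value is an absent row rather than a NULL, and that a query joins only the attributes it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# One table for the entity, one table per attribute (hypothetical names).
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE person_name (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    name      TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO person (person_id) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO person_name VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])


def names(conn):
    """Existing application query, written before any new attribute exists."""
    return conn.execute(
        "SELECT person_id, name FROM person_name ORDER BY person_id"
    ).fetchall()


before = names(conn)

# Adding an attribute means adding a table; no existing table is altered.
conn.execute("""
CREATE TABLE person_birthday (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    birthday  TEXT NOT NULL
)
""")
# Only Ann's birthday is known; Bob simply has no row here, so no NULL is stored.
conn.execute("INSERT INTO person_birthday VALUES (1, '1980-02-03')")

after = names(conn)  # the old query is untouched by the new attribute

# A query joins only the attribute tables it actually needs.
birthdays = conn.execute("""
SELECT n.name, b.birthday
FROM person_name AS n
JOIN person_birthday AS b USING (person_id)
""").fetchall()

null_count = conn.execute(
    "SELECT COUNT(*) FROM person_birthday WHERE birthday IS NULL"
).fetchone()[0]
```

The trade-off is visible even at this size: every attribute you want back costs one more join, so the approach leans heavily on the join performance argued for in this comment.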

As for JOIN performance, think about this - we know EAV sucks for performance as described above. Relational joins can also suck - but not if they are in 6NF. That's because, disregarding the number of rows, the time taken to join two tables, not to mention the sorting, filtering etc. is proportional to the number of attributes in the tables, which is a massive waste of time if you have 30 attributes in each table but only need 2 or 3 of them for the query, which is often the case.

Because of this, developers and DBAs become disillusioned with their carefully built 5NF databases, and hence they are vulnerable to EAV.

6NF, on the other hand, does linear time joins, no matter how many attributes there are in the schema (as you only join what you need). That really matters when you have massive amounts of data, and is why such snake oil as EAV databases like BigTable are doomed. The advent of SSDs and cheap RAM will only make this more pertinent.

If only people had actually listened to Codd.

Check out anchor modelling for an example (although the naming scheme is a waste of time).

Re: Some research on generic/EAV tables

GB — Wed, 22 Oct 2008 10:03:19 -0000

True.
But this is also the downfall of databases.
How have you solved your initial problem? You still cannot ask your original schema to tell you all about a person's preferences without knowing beforehand all the tables that exist. That means every time you come up with a new preference table, you have to add it to every query that needs to look at all preferences.
Databases do not seem to be good at managing information of this kind.
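The maintenance problem described here can be sketched with Python's built-in sqlite3 module. The table names (shift_preference, location_preference, lunch_preference) and the helper all_preferences are hypothetical, invented for illustration. The point is only that the list of preference tables lives in application code, so a newly created table is silently missed until someone updates that list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shift_preference    (person_id INTEGER NOT NULL, value TEXT NOT NULL);
CREATE TABLE location_preference (person_id INTEGER NOT NULL, value TEXT NOT NULL);
INSERT INTO shift_preference    VALUES (1, 'mornings');
INSERT INTO location_preference VALUES (1, 'downtown');
""")

# The schema cannot answer "which tables hold preferences?" by itself,
# so the application keeps its own hand-maintained list.
PREFERENCE_TABLES = ["shift_preference", "location_preference"]


def all_preferences(conn, person_id, tables):
    """Collect one person's preferences from every table named in `tables`.

    Table names come from a trusted in-code list, never from user input.
    """
    query = " UNION ALL ".join(
        f"SELECT '{t}' AS kind, value FROM {t} WHERE person_id = ?" for t in tables
    )
    return conn.execute(query + " ORDER BY 1", [person_id] * len(tables)).fetchall()


# Someone adds a new kind of preference as a new table.
conn.executescript("""
CREATE TABLE lunch_preference (person_id INTEGER NOT NULL, value TEXT NOT NULL);
INSERT INTO lunch_preference VALUES (1, 'late');
""")

stale = all_preferences(conn, 1, PREFERENCE_TABLES)  # misses the new table
fresh = all_preferences(conn, 1, PREFERENCE_TABLES + ["lunch_preference"])
```

Here stale contains two rows and fresh contains three; nothing in the database flags the difference.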

Re: Some research on generic/EAV tables

Matt Wilson — Wed, 15 Oct 2008 12:01:32 -0000

Hi Doug, thanks for the comment. After I wrote this up I saw this pretty good EAV implementation here:

http://permalink.gmane.org/...

It's flexible and it lets me use constraints to make sure nothing silly happens. I bet it is probably slower performance-wise than the really inflexible approach of keeping every preference in a separate table, and I don't see how I could store information about the nature of the preferences, like which preference outranks another.

I've had bad luck with relying on the UI to protect the integrity of the data. A lot of times, I end up with numerous different interfaces; a command-line deal, some web app stuff, and so forth.

I know it works for others, but I've been bitten more than once by not locking down my database against silly users.

Anyhow, I've been trying to learn CouchDB in my very scarce free time. It's a completely different take on databases -- every object has its own set of attributes, but there's no guarantee that any two objects share anything. Seems like the antithesis of the relational model. So it probably makes a much better fit for some things.

Completely unrelated -- you a programmer now too? How much does the fact that you learned stuff like SAS influence how you program now? Looking back on the Fed, I can't believe they would just scoop up complete novice programmers, and then just expect them to navigate and improve that code, and in general, it worked OK.

Re: Some research on generic/EAV tables

Doug R — Tue, 14 Oct 2008 18:08:02 -0000

I don't know, Matt, there's something to be said for Entity-Attribute-Value, if it's used judiciously. Adding a new column to a table every time you want some sort of new behavior in your application is just not a sustainable way to program.

What I like to do is have first-class properties (i.e. database columns) on my model objects if a) I know every single object of this type will need this to be valued, or b) I am concerned about data integrity, or c) I plan to query this information frequently. Then for other things that may not be defined for every instance of that type, and data integrity doesn't matter, use attributes. For example, in a standard CMS application, your Page object might have a column for "page_url" since every page needs to have a URL. But I might use an attribute record for something like "main image caption", since not every page necessarily has an image, and if the caption is missing or just some random text I don't care. Now when you add a second image to the page, and a second caption, you don't need to add a new column, you just add a new record.
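A rough sketch of this hybrid layout, using Python's built-in sqlite3 module. The page_url column and the "main image caption" attribute come from the example above; the table names page and page_attribute and the remaining column names are assumptions made for illustration. The required URL is a constrained first-class column, while captions are free-form attribute records that can be added without a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- First-class column: every page must have a URL, and the database enforces it.
CREATE TABLE page (
    page_id  INTEGER PRIMARY KEY,
    page_url TEXT NOT NULL UNIQUE
);
-- Attribute records: optional, loosely checked, any number per page.
CREATE TABLE page_attribute (
    page_id INTEGER NOT NULL REFERENCES page(page_id),
    name    TEXT NOT NULL,
    value   TEXT
);
INSERT INTO page VALUES (1, '/about');
INSERT INTO page_attribute VALUES (1, 'main image caption', 'Our office');
""")

# A second image caption is just another record, not a new column.
conn.execute(
    "INSERT INTO page_attribute VALUES (?, ?, ?)",
    (1, "second image caption", "Our team"),
)

# The first-class column is still protected by the schema.
try:
    conn.execute("INSERT INTO page (page_id, page_url) VALUES (2, NULL)")
    url_enforced = False
except sqlite3.IntegrityError:
    url_enforced = True

captions = conn.execute(
    "SELECT name, value FROM page_attribute WHERE page_id = 1 ORDER BY name"
).fetchall()
```

Nothing stops a typo in an attribute name or nonsense in a value, which is the accepted cost for data where integrity does not matter.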

Tom's example seems completely ridiculous, because everybody has first name and last name and birthday... What he advocates is adding a "<couple of flex fields for oddities>". So now you've got data in these extra fields, except you have a fixed number of them, and there's no context around what data is actually in those fields, which means you still can't query it easily, and you still can't enforce data integrity. I prefer attributes to this mess.

To address your problem of people putting in preferences for non-existent shifts, this sounds like a user-interface problem, not a database design problem. If you are going to let them type in any old thing, then of course you can't have foreign keys, and will have to do application level validation.

Re: Some research on generic/EAV tables

Mac — Mon, 13 Oct 2008 03:03:38 -0000

Matt,

It's good we have diverse databases to work with. An RDBMS gives you the needed performance (except for join queries), but no matter what Tom says, just ask a developer what it's like to redesign for a requirement change at a later stage of development.

The EAV model is definitely much more flexible and gives the maximum freedom to developers. It also removes the join query, as there are no two tables to join.

I think you should check out the schemaless semantic database that is packaged with Brainwave Platform, a complete development and deployment suite mostly built in Python.

-Mac