Subject Re: [firebird-support] Design Question: Storing and using generic text "tags"
Author Helen Borrie
At 02:47 PM 26/06/2009, you wrote:
>I'll be adding a tags system to an existing application so that users can
>enter one or more tags to help search and categorize various items, similar
>to what photo management software or blogs do.
>
>The user will be able to: a) type a comma separated tag list when
>adding/updating items, b) perform a search based on the entry of one or more
>tags (and/or), and c) generate a report detailing or summarizing items "by
>tag".
>
>Ideally the user should not be limited to the number of tags per item but a
>reasonable limit would be ok. Ideally the grunt work should be possible
>within the DATABASE and not rely on complex code in the client app. Ideally
>it should be easy for someone to add new items with tags directly using a
>database tool rather than the application.
>
>e.g.
>Item1: tags=family,holiday
>Item2: tags=holiday,skiing
>Item3: tags=skiing,simon,funny
>
>Search "family or simon" would return Item1 and Item2.
>Report by tag might list:
>family
> Item1
>funny
> Item3
>holiday
> Item1
> Item2
>simon
> Item3
>skiing
> Item2
> Item3
>
>I figure that there are three main ways to implement this:
>
>1) Have a single TAGS column on the ITEM table containing comma separated
>tags.
> Store as-typed tags in this column.
> Use LIKE for searching (complex for and/or situations)
> For the report fetch distinct TAGS, pull apart the individual tags and
>remove dups, formulate a complex query to pull back the items in the
>appropriate order, OR pull in all ITEM rows and post-process on the client.
> This might be ok if the user was forced to enter at least one and up to a
>small number of tags to limit the extent of the report.
> Perhaps the LIST function added in FB 2.1 would help pull the tags apart.
>
>2) Have a fixed number of TAGn columns (TAG1, TAG2, ...) on the ITEM table,
>each storing an individual tag.
> Restrict number of tags that can be entered. Pull apart tags on entry and
>populate in separate columns.
> Use exact matching on each TAGn column for searching (again complex for
>and/or situations)
> For the report fetch distinct list of values in the TAGn columns,
>formulate a complex query to pull back the items in the appropriate order.
>
>3) Have a separate TAG table with a many-to-many mapping table to reference
>ITEMs.
> Pull apart tags on entry and store new tags in separate TAG rows, create
>mapping row for each tag.
> Use exact matching on TAG rows for searching, join to ITEM table.
> For the report simply join ITEM to TAG through mapping, order by TAG.
>
>Design 1) is a less complex table design and would make it easy to add items
>using a database tool (simple INSERT) but the searching and reporting would
>be complex and likely slow.

Yea, Lord! Arrays, lists and such are anathematic to relational databases. As you already note, get the list in there is the easy part. Updating it? yech! Searching on it? yech!

>Design 3) makes it easy to search/report but more complex to add items.

What's complex about it? For keyword searching I use two tables: one to hold the keywords (which I pump in, all upper case, from a linked list in the application, composed from user selections, user input or some routine that dumbly goes through and picks out all words except those in an exception list. The other table holds an intersection for keywords and documents, with FKs to the two tables and a unique constraint on the keyword and the document ID. (Actually, I'm working on just such a project right now.)

I have an executable SP to which I send a keyword and a document id as arguments. It simply inserts a record in the intersection table. Done. It doesn't have to be a SP though: it's a drop-dead-simple parameterised DSQL statement.

>Perhaps a stored proc could be used to add items,
>passing all ITEM columns and a comma-separated tags string and the stored
>proc could pull apart the tags and create the TAG rows and mapping rows
>after inserting the ITEM. I don't like design 2) but it may be a reasonable
>compromise.

Train-wreck.

>Is it easy to process a comma-separated string passed as a parameter to a
>stored proc? E.g. Split into separate values, strip leading/trailing
>whitespace. Any simple non-udf solution in FB1.5.4 or is FB2.1 or other
>required?

It's not rocket science; but it will be harder to do for non-ansi character sets. But in a relational database, who needs it? It's a pain to have to parse every time you want something from it. And who wants that redundancy, anyway? And why blow away the benefit of knowing about the commonalities of owners that share tags?

>Has anyone implemented such a tag system before? How did you do it?

See above. It wins over anything else I've concocted (including parsing lists!). It's fast. It's consistent, ACID-wise. And, when your customer asks, "Is there some way I can use the tags to figure out the main interest groups?" you have a ready answer and need only write the client code, report, or whatever.

./heLen