Making a Drupal Folksonomy Tag Cloud

kentbye's picture

I created a tag cloud for my website, and I'd like to see this feature added as a dynamic Drupal module. I thought I'd briefly walk through the steps I took, to give a leg up to anyone who wants to code this up in PHP.

The hardest part is the algorithm that automatically determines the distribution of font sizes based upon the frequency distribution of tags.

Below is my distribution that I used to determine the font sizes:

[inline:ECPtags.jpg=Distribution of Tag Frequency]

Notice that my tag distribution shows some of the Power Law, "Long Tail" behavior seen across the Internet: a few tags are used very often, and most are used only a handful of times.

More technical details below...

The first step to creating a tag cloud in Drupal is to use Morbus Iff's free-tagging patch to go through and tag all of your archived blog posts. This saves your entered folksonomy free tags as regular taxonomy terms.

UPDATE: Morbus Iff writes, "My freetagging is currently in Drupal HEAD (along with AJAX support, which makes it amazingly useful) - and I backported it to Drupal 4.6 (but not 4.6.1; that's what you've linked to in your HOWTO)."

I then went into phpMyAdmin to pull data from two MySQL tables in the Drupal database: "term_data" and "term_node".

I exported all of the data as CSV so that I could import it into an Excel spreadsheet.

The data in "term_data" are used to correlate the folksonomy tag "name" with the "tid"

The "vid" variable also in "term_data" is the vocabulary id that can be used to isolate groups of terms into separate tag clouds. In my case, my "Folksonomy Tags" vocabulary "vid" was 4.

The data in "term_node" are used to count the total number of occurrences of a folksonomy tag "tid" across the entire site. I ordered the data by tid and created a counter in Excel using four columns in total.

Columns
A = "nid" = data from term_node
B = "tid" = data from term_node
C = "0" = I copied "=IF(B2=B3,1,0)" into the entire column to give a flag of 0 at the last value of each tid
D = Count = I copied "=IF(C1=0,1,D1+1)" to count the total number of occurrences of each tid

I imagine these counts could easily be coded up in PHP.
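
As a rough illustration of that counting step, here is a minimal sketch in Python (used only for illustration; the same idea would carry over to PHP). It assumes the "term_node" export has already been read into (nid, tid) pairs. The sample rows and the function name count_tags are made up for this example.

```python
from collections import Counter


def count_tags(term_node_rows):
    """Return a mapping of tid -> number of nodes tagged with that tid."""
    return Counter(tid for _nid, tid in term_node_rows)


# Hypothetical (nid, tid) rows standing in for the term_node export.
rows = [(101, 7), (102, 7), (103, 9), (104, 7), (104, 9), (105, 12)]
counts = count_tags(rows)
print(counts[7], counts[9], counts[12])  # prints: 3 2 1
```

This replaces the flag column and the running counter with a single pass over the rows, and the data does not need to be sorted by tid first.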

I copied the values of 0 (Column C), tid (Column B) & Count (Column D) into a separate set of columns and sorted them in the order of 0, tid, & Count.

This gives you the total number of occurrences of each folksonomy tag "tid."

This frequency will determine the relative font sizes of the tag "name" in the tag cloud.

The next step is to correlate the tid number with the tag "name" using the data from term_data.
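
A minimal Python sketch of that lookup might look like the following. It assumes the "term_data" export has been reduced to (tid, vid, name) rows and that the per-tid counts are already available as a dictionary. The function name name_counts and the sample rows are hypothetical; only vid 4 and the two tag counts come from the post.

```python
def name_counts(term_data_rows, counts, vid):
    """Map tag name -> occurrence count for the terms in one vocabulary."""
    return {
        name: counts.get(tid, 0)
        for tid, term_vid, name in term_data_rows
        if term_vid == vid
    }


# Hypothetical (tid, vid, name) rows standing in for the term_data export.
term_data = [(7, 4, "New Media"), (9, 4, "Folksonomy"), (3, 1, "Blog")]
tid_counts = {7: 53, 9: 15, 3: 20}

# Keep only the "Folksonomy Tags" vocabulary (vid 4 in the post).
cloud = name_counts(term_data, tid_counts, vid=4)
print(cloud)  # prints: {'New Media': 53, 'Folksonomy': 15}
```

Filtering on vid here is what would let separate vocabularies feed separate tag clouds.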

The hardest step for dynamically automating this into a Drupal module is determining how to automate the font size distribution based upon the frequency of tags.

I just plotted my tag distribution in Excel and eyeballed it.

[inline:ECPtags.jpg=Distribution of Tag Frequency]

I qualitatively determined that I could break up the distribution by dividing each tag's frequency by 10, rounding down, and then adding 1.

For example, the New Media tag occurs 53 times.
FONT SIZE = floor(53/10) + 1
FONT SIZE = 6

Collaboration = 41 times = font size 5
Transparency = 30 times = font size 4
Open Source = 23 times = font size 3
Folksonomy = 15 times = font size 2
del.icio.us = 6 times = font size 1
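
The rule above (divide by 10, round down, add 1) can be sketched in a few lines of Python. The function name font_size and the divisor parameter are illustrative choices; the tag counts are the ones listed above.

```python
def font_size(count, divisor=10):
    """Divide the tag count by 10, round down, then add 1."""
    return count // divisor + 1


tags = {
    "New Media": 53,
    "Collaboration": 41,
    "Transparency": 30,
    "Open Source": 23,
    "Folksonomy": 15,
    "del.icio.us": 6,
}
sizes = {name: font_size(n) for name, n in tags.items()}
for name, size in sizes.items():
    print(name, size)
```

Note that rounding to the nearest integer instead of rounding down would put Folksonomy (15) at size 3 and del.icio.us (6) at size 2, so the round-down step matters for matching the sizes listed above.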

This algorithm doesn't work in all cases (a tag used 70 or more times would land above font size 7, for example), but it works for now.

UPDATE: I figured out a more universal way to break up the distribution. I'll post more about it after I finish the tweaking and graphing it.

The Drupal font size range seems to be from 1 to 7, which gives 7 possible sections.

I'll have to see if any solution pops to mind, but I think I'll pass the baton to someone with a computer science background to figure it out and code it up in PHP for the whole Drupal Community.

UPDATE 6-20-05: I've come up with an algorithm for a more even distribution of font sizes. More tomorrow.

This type of dynamic tag cloud aggregator could provide a very helpful organizational and navigational tool for Drupal sites.

UPDATE: Morbus Iff pointed me towards tagadelic, and I included an update in the previous post, but not this one.

It also appears as though Bèr Kessels, who is listed as the developer of the tagadelic module, just left an anonymous comment down below pointing me towards tagadelic. I will definitely follow up with him to see if we can update the algorithm to show a dynamic range of font sizes correlated to the frequency, and to get a more even distribution by taking the logarithm of the frequency, since it seems as though most folksonomy distributions exhibit Power Law behavior.
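
To make the logarithm idea concrete, here is one possible Python sketch. It is not the algorithm referred to in the updates above (which isn't given in this post), and it is not tagadelic's implementation; it simply maps log(count) linearly onto sizes 1 through 7. The function name log_font_sizes and its parameters are assumptions made for illustration, and every count is assumed to be at least 1.

```python
import math


def log_font_sizes(tag_counts, min_size=1, max_size=7):
    """Spread font sizes over log(count), from the rarest to the most used tag."""
    logs = {name: math.log(n) for name, n in tag_counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    sizes = {}
    for name, value in logs.items():
        if span == 0:
            # Every tag has the same count, so there is nothing to spread.
            sizes[name] = min_size
        else:
            fraction = (value - lo) / span  # 0.0 for rarest, 1.0 for most used
            sizes[name] = min_size + round(fraction * (max_size - min_size))
    return sizes


tags = {
    "New Media": 53,
    "Collaboration": 41,
    "Transparency": 30,
    "Open Source": 23,
    "Folksonomy": 15,
    "del.icio.us": 6,
}
log_sizes = log_font_sizes(tags)
for name, size in log_sizes.items():
    print(name, size)
```

Because the sizes are scaled between the least and most used tags, the result always stays within the 1 to 7 range regardless of how large the counts grow.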

That is awesome Kent!

That is awesome Kent! We just did it over on our main blog too, http://www.developmentseed.org/blog/. We are using the 'tag cloud' in the side bar as our main navigation, http://www.developmentseed.org/blog/poptags. We are using term_popular.module http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/frjo/term_popular.module... it is great - Ian added the block.

kentbye's picture

Will Look into Popular Module Too

It appears as though the developer of tagadelic -- another tag cloud-like Drupal module -- also just left a comment above.

I'm going to bounce my algorithm off of him after I finish tweaking it.

I'll have to look into term_popular.module as well. It looks as though they have implemented it as a block, which is how I'd envisioned having a tag cloud implemented.

Back to work on the algorithm.

tagadelic

There already was a module that did what you want. So maybe you want to improve the algorithm for generating font sizes in there? Maybe we should collaborate to get one good module, instead of two less good ones?

tagadelic page on webschuur.com
tagadelic page on drupal.org
tagadelic example

kentbye's picture

YES! to tag cloud collaboration

Maybe we should collaborate to get one good module, instead of two less good ones?

Definitely! Although I don't know any PHP -- I can only come up with the algorithm that does it. I'll pass it along to you and see what we can come up with.

I figured out a pretty good algorithm for evenly distributing the font sizes across many more cases -- I'm going to try it out in Microsoft Excel first and then post some updated graphs.

I'm definitely willing to pass along the algorithms for you to code up into your tagadelic module.

I'll get in touch with you again after I post it.

By the way, Morbus Iff pointed me to tagadelic, and I actually updated my previous post with a link and should've updated this one as well.

I'll update the post above as well with tagadelic links.

I am the "anonymous"

Hello.

Indeed, I am the anonymous poster in the above-mentioned comment.

I just updated the APIs for tagadelic. It now gives a lot of flexibility for generating tag pages. With url schemes similar to taxonomy (/tagadelic/chunk/1,2,5) you can compile nice pages.

About term popular: Can we incorporate our code into one project? I think that if we use some of your algorithms, the block from term popular, and my module, we are all settled. Then we have one working module, instead of two modules in some pre-beta state.

kentbye's picture

Tagcloud Follow-up

Hey Bèr,
I actually didn't write any of the term popular module.
Here's the CVS link to it:
http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/frjo/#dirlist

I did however write an algorithm for the font distribution aspects of the tag cloud -- and I have a few flowcharts for making personalized tagclouds.

I'll follow-up with an e-mail to you.