-
Sunday, 31 July 2005
-
Collaborative filtering is hard
Collaborative filtering is a technique commonly used on large shopping sites to return products a user may be interested in based on their previous product ratings or purchases. There are two key approaches to doing this:
- Figure out which items are similar to other items, then present all the highly rated items that are similar to my highly rated items;
- Figure out which users are similar in taste to other users, then present all items highly rated by similar users.
There is a fascinating report discussing Amazon.com's technique. The report talks about taking the cosine of the angle between two multi-dimensional vectors in order to measure users' similarity. Amazon.com uses the first approach listed above, gauging item similarity based on which items are often purchased together. The first approach works well when a user has purchased very few items, which is of course important as all customers start out as brand new customers.
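To make the cosine idea concrete, here is a minimal sketch in Python. It is not Amazon.com's code: the item names and purchase vectors are made up, and each vector position simply stands for one hypothetical customer (1 means they bought the item).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction: call it "no similarity"
    return dot / (norm_a * norm_b)

# Hypothetical purchase data: one position per customer, 1 = bought it.
tent = [1, 1, 0, 1]
sleeping_bag = [1, 1, 0, 0]
novel = [0, 0, 1, 0]

print(cosine_similarity(tent, sleeping_bag))
print(cosine_similarity(tent, novel))
```

Items bought by the same customers point in a similar direction and score close to 1; items with no buyers in common score 0.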
I am working on a rebuild of my tramping site which uses a collaborative filtering engine based on method two (and written, I should add, in the days when there were no reports online on how to do it). But I'm planning to switch to method one. The question is how one gauges object similarity.
Well, one sneaky approach is to let users do it for you. Instead of working out some obscure formula that might need to be different for different types of object, how about simply watching users' search results? Objects that show up together in search results are by definition similar in some way. For example, I might search for "routeburn" or run an advanced search for 2-day easy tracks near Christchurch. Of course the degree of similarity depends on the number of results returned. If there are a lot then the likely similarity is low; if there are few, then the matches are likely very similar in some particular respect. And since an actual user is performing the search, the aspect of similarity must (on average) be something significant to real users. Sure, I could search for all tracks with a "K" in the name, but it's not too likely that many users will do that.
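A rough sketch of that idea in Python, under some loud assumptions: the 1/(n - 1) weighting and the cutoff of 20 results are invented for illustration, and the object ids are hypothetical. Every pair of objects that appears in the same result set earns some credit, and small result sets earn more.

```python
from collections import defaultdict
from itertools import combinations

MAX_RESULTS = 20  # assumed cutoff: larger result sets are ignored entirely

def record_search(scores, result_ids, max_results=MAX_RESULTS):
    """Give similarity credit to every pair of objects in one result set."""
    ids = sorted(set(result_ids))
    n = len(ids)
    if n < 2 or n > max_results:
        return
    weight = 1.0 / (n - 1)  # assumed weighting: fewer results, more credit
    for a, b in combinations(ids, 2):
        scores[(a, b)] += weight

scores = defaultdict(float)
record_search(scores, [3, 7])              # a narrow search: two tracks
record_search(scores, [3, 7, 12, 15, 20])  # a broader search: five tracks
```

Tracks 3 and 7 turned up together twice, once in a very narrow search, so they end up with a much higher score than any other pair.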
So in this technique we gauge object similarity based not on items that are purchased together, but on items that are found in search results together. I have no idea whether this will work or not! I hope it will.
One concern I have is that it will be biased toward old objects, as the objects will simply have been exposed to more searches. Perhaps all the objects that fail to match have their score degraded by a certain amount so that the average score remains 0. Another idea might be to expire all the scores on a regular basis (keeping the old scores for a little while to help with matches, but not updating them).
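Here is one way the "average stays at 0" idea might look, sketched in Python. This is only a guess at a workable scheme: it assumes every pair already has a score record, and the function name and the `credit` parameter are hypothetical. Pairs found together gain credit, and the same total is taken back evenly from all the pairs that were not.

```python
from itertools import combinations

def update_zero_mean(scores, result_ids, credit=1.0):
    """Reward pairs found together and penalise the rest so the total is unchanged."""
    ids = sorted(set(result_ids))
    hits = {pair for pair in combinations(ids, 2) if pair in scores}
    misses = len(scores) - len(hits)
    if not hits or misses == 0:
        return
    penalty = credit * len(hits) / misses  # spread the cost over the misses
    for pair in scores:
        scores[pair] += credit if pair in hits else -penalty

# Four hypothetical objects give six pairs, all starting at zero.
scores = {pair: 0.0 for pair in combinations(range(1, 5), 2)}
update_zero_mean(scores, [1, 2, 3])
```

The catch is that every search now touches every pair, which is exactly the kind of cost the expiry idea would avoid.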
Unfortunately with this technique there is one record for each pair of objects: I have about 700 objects, so there will be about 250000 scores. This is a concern but inevitable, I think. Generating these empty records is time consuming, and I'm wondering if I can do this on-the-fly. When a search is performed, it will only take one SQL query to update the product scores, which I'm hoping will be reasonably fast -- especially as I probably won't bother running it at all if too many search results are returned.
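A sketch of the on-the-fly approach, using Python's built-in sqlite3 module as a stand-in for whatever database the site really runs on. The table and column names are invented, and for portability it uses two statements (create any missing pair records, then bump the scores) rather than a single query.

```python
import sqlite3
from itertools import combinations

# 700 objects give 700 * 699 / 2 unordered pairs.
PAIR_COUNT = 700 * 699 // 2

def open_store():
    """Create an in-memory table holding one score per pair of objects."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pair_score ("
        " a INTEGER NOT NULL, b INTEGER NOT NULL,"
        " score REAL NOT NULL DEFAULT 0,"
        " PRIMARY KEY (a, b))"
    )
    return db

def record_search_sql(db, result_ids, credit=1.0, max_results=20):
    """Create missing pair records as needed, then add credit to each pair."""
    ids = sorted(set(result_ids))
    if len(ids) < 2 or len(ids) > max_results:
        return 0  # too few or too many results: don't bother
    pairs = list(combinations(ids, 2))
    with db:  # one transaction per search
        db.executemany(
            "INSERT OR IGNORE INTO pair_score (a, b) VALUES (?, ?)", pairs
        )
        db.executemany(
            "UPDATE pair_score SET score = score + ? WHERE a = ? AND b = ?",
            [(credit, a, b) for a, b in pairs],
        )
    return len(pairs)

db = open_store()
record_search_sql(db, [5, 9, 2])
record_search_sql(db, [9, 5])
```

Only pairs that have actually turned up together ever get a record, so the table starts empty and grows toward the full 244,650 pairs rather than being generated up front.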
-
Wednesday, 27 July 2005
-
past and future
I've lost a lot in my life recently. I won't make a list because that would be tacky. Big things and little things, let's say. (Not much of my life seeps into my weblog these days.) And even if those changes and losses are 100% the direct result of decisions I've made, and they are, it's still hard to see possibilities become impossibilities. To see potential futures turn into dust and pop out of existence like one of Buffy's vanquished vampires.
I was for no particular reason poking around in my comment quarantine yesterday, which is where comments stay until you reply to the email that verifies your email address. It was interesting to see quite a number of rather malevolent comments in there. Comments from obnoxious strangers, comments from people who have known people I've known. I'm far from perfect, and sometimes I think it is the stupidest thing in the world to write anything personal at all online where people can quote your foolishness back to you and forward it to their friends. I know it sounds like an imaginary universe that I'm the centre of, but my comment quarantine is actually a bit like that, or at least it was, plus a whole lot of comment spam. Thank goodness that in SQL there is an inverse relationship between the power of a query and its length. You can empty a database table in seconds. To delete just a few records takes longer. I guess I should set my quarantine to delete old messages automatically so I can maintain a state of blissful ignorance.
The trouble with me is that I'm a funny combination: a cautious risk-taker. I'm naturally cautious, but I compensate by daring myself to do this or that. I kinda need to or I'd never do anything. That's a gross oversimplification, but maybe tangentially related to the truth. Not the best way of dealing with the world, and perhaps a really effective way of hurting people.
Perhaps I'm just over-analysing. I don't think I'd hire me to do PR for myself, it must be said.
-
Monday, 25 July 2005
-
Yellow
Some 6000 years ago, after the last ice age, sea levels were higher. Sea cliffs were cut into the hillside on the southwest side of Banks Peninsula. Sea levels have since dropped, leaving flat land and cliffs and stacks far from the sea.
Today, I noticed one of these sea cliffs was littered with large white birds. But we don't have large white cliff birds. I pondered for a few seconds whether they were kotuku (white herons) or spoonbills but of course they weren't. Those aren't cliff birds. The crests gave them away: cockatoos. Like white spectres. Like a memorial. Perhaps fifteen or twenty scattered across the cliff. I think I'd heard that there was a flock of cockatoos living on the peninsula. How exciting to see it.
Some days are meant to be remembered, not lived. And some are best forgotten. Not sure what to think of this one. Not sure about much.
-
Tuesday, 19 July 2005
-
Chocolate orange Crème Brûlée
What is dessert if not a safe anchorage in the face of the storm? When things turn to custard, you may as well do it with a little flare, and a big blue flame.
For this recipe you need four half-cup ramekins (small, straight sided china dishes), a roasting dish, and a butane blowtorch. You can buy the blowtorch from a hardware store (don't tell them what you're planning to do with it) or you can spend more on the same thing at a cookshop. You may also need a cigarette lighter to ignite the blowtorch. The gas jet from mine blows matches out every time.
You'll also need 30g cooking chocolate, 1 cup (250mL) cream, 4 large eggs, 1/4 cup sugar, 3/4 cup milk, 1 tbsp (15mL) Grand Marnier or Cointreau, 1 tsp (5mL) orange zest, and some brown sugar.
Preheat your oven to 165°C. Heat 1/4 cup cream until just boiling. Pour over chocolate in a small bowl and allow to soften, then whisk. Whisk together 3 large egg yolks, half a whole egg, sugar, and the chocolate mixture. Heat remaining cream and milk until just boiling and whisk into egg mixture. Whisk in liqueur and orange zest. Skim any froth from the surface.
Divide the liquid between ramekins, and place ramekins in a roasting dish. Fill the dish with hot water to a level halfway up the sides of the ramekins. Bake in the middle of the oven for 40 minutes, until set but still with some movement. Be very careful moving the roasting dish as the water is boiling! Cool custards then cover and chill for several hours.
Immediately before serving, sift a generous coating of brown sugar over the custards. Assemble your guests, wave their eyebrows goodbye (or get them to stand back a little) and caramelise ruthlessly with your blowtorch (the desserts, not the guests).
Serve with a little fruit or a fruity sauce and a scoop of good ice cream.
Another simpler recipe is here. Personally, I prefer the heat to remain in the caramelised crust.
-
Thursday, 7 July 2005
-
Doctor Who
If you're an alien, how come you sound like you're from the North?
Lots of planets have a North!
Apparently the world ends next week. I hope they have the budget for it.
-
Wednesday, 6 July 2005
-
Appropriate topics for song lyrics
I've been flicking through the pages of Nick Hornby's 31 Songs recently. One chapter is devoted to a comparison of Aimee Mann's "I've had it" to Ani DiFranco's "You had time." Both are songs filled with resignation, and loosely addressing the life of a musician. Hornby says of Mann's song, "there's something a little troubling about the song's breathtaking melodic strength. Here's the thing: which came first, the tune or the words? Because if it was the tune, then that makes you wonder why Mann thought music that sublime was best served by her travails in music. Wasn't there a break-up that meant this much to her, or a parent, or a childhood memory?"
Hornby goes on to suggest that love songs are the best as themes of love largely keep out of the way of what the music is doing: they don't spoil the music's "purity". He suggests that love is a "natural metaphor for music itself. . . . you can see straight through the words to the music."
I think Hornby's right. A love song expressing the same old nonsense isn't going to be remembered for its lyrics, but for its catchy tune.
But I think he's wrong too. Lyrics aren't pollution floating along on the waves of the tune. A song is a partnership between words and music, biased one way or the other. Perhaps rock and pop is often more tune-biased, while folk is more lyric-biased (the tunes are often traditional ones with freshly rolled lyrics). And there's bias in listeners too. I think individual listeners either exalt the music or they exalt the lyrics. I don't think either is wrong. Personally, I go for lyrics, and I believe that stories hide everywhere, not just in the realms of romance. There is no divine hierarchy of story topics. I think that it is a mark of Mann's talent that her lyrics tackle such diverse themes. Even her love songs are really more about human relationships than "pure" love. I like that. It's far more interesting.
-
Submarine daydreams
I was thinking earlier, as I watched an articulated truck easing out of a driveway onto the road outside my window, how it looked just like a fat eel sliding out of its hole. Then I realised I'd already imagined the road was a seascape once before. I wonder why that is. I guess my brain is full of well-trodden paths.
-
Metaphors and pointers
Daemons, wireless headphones, electrons, Reality and metaphor
In Northern Lights, each person shares their life with an animal called a "daemon" that is in some ways an embodiment of their soul. The daemon and the person live and die together and share thoughts and feelings while maintaining semi-independent existences. The daemon and the person experience deep physical and emotional pain when they are separated by more than a few metres.
And the point of telling you all this is that I was thinking today, while I was pinching a JavaScript book from a coworker's desk, that wireless headphones are very similar. Wear them more than a couple of metres from the base unit and you experience severe audio-induced head pain. You want to be reunited with your seat, but you don't want to take them off. So you try these experimental sallies, reaching out with your limbs like half an octopus while keeping your head as close to your desk as possible, and trying to maintain a little poise (as a little is all you have) simultaneously.
I've often thought the world is really constructed out of metaphors and ideas, and the physical reality surrounding us is no more than a set of references to those ideas.
Take a sub-atomic particle as an example. We think there are countless electrons in the universe. What if there's really only one electron that is being referred to or mirrored countless times? Could we tell the difference? When we destroy an electron, are we really simply destroying the reference?
In computer science, you may copy a variable or simply create another reference to it. In the language I use most often, what happens when you "copy" a variable depends on what type of variable it is. If it's a simple one such as a number, a duplicate is made. If it's a complicated one such as a structure or an array (these are basically indexed lists of other variables) then the variable is really just a pointer to where the data is stored in memory, and the pointer, not the data, is duplicated. The implication of this is that if the data behind the original variable changes, the duplicate will change too.
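A small illustration, written in Python purely as a stand-in (it is not necessarily the language meant above, and its rules differ in the details). The track list is made-up sample data. A number behaves like an independent value, while a second name for a list is just another reference to the same data unless you explicitly ask for a duplicate.

```python
import copy

count = 3
count_copy = count   # a simple value: the two names now go their own ways
count += 1           # changes count only

tracks = ["Routeburn", "Kepler"]
alias = tracks             # a second reference to the very same list
clone = copy.copy(tracks)  # an explicit, independent duplicate
tracks.append("Heaphy")    # change the data through the original name

print(count, count_copy)   # the copy kept its old value
print(alias is tracks)     # True: one list, two names
print(len(alias), len(clone))
```

The alias sees the new track because there was only ever one list; the clone does not, because it was copied before the change.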
And so that's how my philosophy is shaped by my work.
-
Tuesday, 5 July 2005
-
Oil, accounts, and high speed crashes
I just finished my tax return. It's been a long evening of wine, olive oil, dukkah, and arithmetic. Amazingly, the tax forms made it through unscathed. Through the wonders of inebriated accounting I only owe the government a few hundred more dollars.
I find tax forms frustrating as I'm just too busy to be wasting evenings on this soul-destroying garbage. Lucky for me then I don't have a soul. I do however have a low tolerance for tediousness.
Cheer up Matthew I say to myself. Nasa just crashed a space probe into a comet at 37000km/h, and they did it just for you. I'm sure that won't make sense in the rational light of morning, which is all the more reason why it makes perfect sense now.
-
Sunday, 3 July 2005
-
Geek
-
- A person regarded as foolish, inept, or clumsy.
- A person who is single-minded or accomplished in scientific or technical pursuits but is felt to be socially inept.
- A carnival performer whose show consists of bizarre acts, such as biting the head off a live chicken. [Source]
What a remarkable language we have. Personally, I find the word offensive, not for its circus origins, but just like I would any term used to attempt to pigeonhole me. Certainly I have limitations, but I overcome a few more every day.
-
When good botanists go bad, they start denying climate change based on disreputable sources. George Monbiot examines David Bellamy's claims that the glaciers are in fact advancing, not retreating, and finds that they lead back to a 16-year-old journal paper that doesn't exist.
It is hard to convey just how selective you have to be to dismiss the evidence for climate change. You must climb over a mountain of evidence to pick up a crumb: a crumb which then disintegrates in the palm of your hand. You must ignore an entire canon of science, the statements of the world's most eminent scientific institutions, and thousands of papers published in the foremost scientific journals. You must, if you are David Bellamy, embrace instead the claims of an eccentric former architect, which are based on what appears to be a non-existent data set. And you must do all this while calling yourself a scientist.