Tuesday, November 25, 2008

Semantic Gifts

The holidays are coming right up, and for many the season brings the perennial challenge of selecting the right gift for family, loved ones, and secret Santas. Well, fret no longer, because there is a new way to find the perfect idea: Semantic Gifts (http://semanticgifts.com).

Here's how to use it:

Step 1: Provide whatever information you have about the recipient's Facebook, Twitter, FriendFeed, or other public profile.
Step 2: Get gift ideas chosen for their unique interests.

Like any good oracle, if the recommendations miss, you can always try again by selecting "More please." The app will keep returning gift ideas until it runs out of picks it feels enthusiastic about.

Gifts are chosen by an algorithm that finds matches using two approaches. First, a natural language processing filter analyzes the things your friend has said and matches the concepts they've discussed to gifts associated with similar topics. Second, a topic modeler looks at everything they've said and creates a profile, making some guesses about what they might be like. We can then recommend gifts that we've identified for people with similar overall interests. It's this combination of "what are they like" and "what interests can we determine they have" that gives us a good idea of what gift they'd enjoy.
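To make the two-approach idea concrete, here is a minimal sketch of how a concept match and a profile match might be blended into one ranking. It is an illustration only, not the actual Semantic Gifts code: the function names, the gift fields ("topics", "audience"), the cosine similarity, and the 50/50 weighting are all assumptions made for the example.

```python
import math

def concept_score(mentioned, gift_topics):
    """Share of a gift's topics that the person has actually talked about."""
    if not gift_topics:
        return 0.0
    return len(mentioned & gift_topics) / len(gift_topics)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_gifts(mentioned, profile, gifts, concept_weight=0.5):
    """Blend both signals and return gift names, best match first.

    mentioned: set of concepts pulled from what the person said
    profile:   dict of persona -> weight guessed by a topic model
    gifts:     list of dicts with hypothetical "name", "topics", "audience" keys
    """
    scored = []
    for gift in gifts:
        score = (concept_weight * concept_score(mentioned, gift["topics"])
                 + (1 - concept_weight) * cosine(profile, gift["audience"]))
        scored.append((score, gift["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored]
```

With a catalog containing trail shoes (topics: running, hiking) and an espresso maker (topic: coffee), someone who posts about running and whose profile leans "outdoorsy" would see the trail shoes ranked first.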

Semantic Gifts is the first consumer-facing app using the analytics suite created over the past few months under the Morning Set banner. The engine could be described as a concept mining/information retrieval program with a heavy focus on NLP using "dirty" sources like RSS and microblog feeds, so gift recommendation is a perfect fit for the technology. The two big challenges that Semantic Gifts posed for the engine were the tendency to overfit because of the often small input size, and the need to perform the analysis very quickly.

The first problem was solved by creating tiers of analysis intensity based on how much information our crawler is able to extract. Very small and very large inputs are worked over lightly, though for opposite reasons: it's easy to draw too many conclusions when you have little to work with, and you don't need to work very hard at all when the person has said a lot. Fortunately, the system is tuned in such a way that average blog and Twitter feed sizes fall right in between.
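The tiering can be pictured with a small sketch like the one below. The thresholds and tier names are made up for illustration; the real cutoffs are not stated here, and this is only one plausible way to express the idea.

```python
def analysis_tier(token_count, small=200, large=20000):
    """Pick how hard to analyze an input, given how much text was crawled.

    The `small` and `large` cutoffs are hypothetical placeholder values.
    """
    if token_count < small:
        # Too little text: go easy so we don't read too much into it.
        return "light-sparse"
    if token_count > large:
        # Lots of text: cheap methods already find plenty of signal.
        return "light-abundant"
    # Typical blog or Twitter feed size: run the full analysis.
    return "full"
```

Both extremes end up in a light tier, but they are kept separate because the reasons differ: one guards against overfitting, the other simply saves work.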

The second problem was speed. Once you start re-indexing statistical machine learning models over a corpus large enough to be useful, you can blow the budget of a bootstrapped startup just on EC2 processor spikes. When developing the analytics suite, this problem was avoided by doing the work asynchronously: give me your data today, then come back tomorrow for your model. For Semantic Gifts that clearly wouldn't work, so we looked for the components of the suite that would give us the most bang for the processor buck, and for ways to "pre-bake" as much as possible. Though there is a substantial laundry list of things to add, the compromise appears successful so far.
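As a rough picture of what "pre-baking" can mean, the sketch below builds a topic-to-gift lookup table ahead of time so that each request is just a few dictionary lookups. This is an assumed example of the general technique (an inverted index), not a description of what the app actually precomputes; the function names and catalog shape are hypothetical.

```python
from collections import Counter, defaultdict

def build_topic_index(catalog):
    """Offline step: map each topic to the gifts tagged with it.

    catalog: dict of gift name -> set of topics (hypothetical shape)
    """
    index = defaultdict(set)
    for name, topics in catalog.items():
        for topic in topics:
            index[topic].add(name)
    return dict(index)

def quick_candidates(index, mentioned):
    """Request-time step: count topic hits per gift using only lookups."""
    hits = Counter()
    for topic in mentioned:
        for name in index.get(topic, ()):
            hits[name] += 1
    # Gifts that share the most topics with the person come first.
    return [name for name, _ in hits.most_common()]
```

The expensive part (walking the whole catalog) happens once, offline, and the per-request cost grows with the number of concepts the person mentioned rather than with the size of the catalog.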

Anyway, I hope Semantic Gifts is as fun to use as it was to make, and I look forward to getting some good feedback.
