How to compute someone’s Whuffie
June 23rd, 2008
Imagine yourself in a world where nanotechnology has made scarcity, and the traditional form of money associated with it, a thing of the past. In this world, the only currency is the goodwill that people give electronically to one another, and everyone’s resulting overall reputation score is accessible by anyone in real time. This reputation is Whuffie; the term was coined, and the world imagined, by Cory Doctorow in his sci-fi novel, Down and Out in the Magic Kingdom.
Fast rewind to the present. We live in a world where people increasingly publish their lives digitally, i.e. they are “life streaming”: they publish pictures, blog posts, tweets, videos, wikis, etc. Other people subscribe to these life streams (RSS/FriendFeed), give attention to the ones they find most relevant, and sometimes comment positively or negatively on these life stream items. These comments are themselves life stream items, subject to views and positive/negative comments from others.
One thing is missing to get us closer to Cory’s vision: real-time computation of anyone’s Whuffie, the Web 2.0 equivalent of your FICO score. How do we compute it?
I have found only one blog post so far on the problem of the so-called Whuffie algorithm, but I was not convinced by the arbitrary number of points won/lost for specific actions, or by the difficulty of implementing the tracking of some of these actions:
Trash talk somebody: -1000
For every conference you attend: +200 (Plus bonus +5 for each #tweet)
I know that Jeff Ward wrote that he was just posting for fun on this one, but since there seemed to be interest in the comments for an actual implementation, I decided tonight on BART to take a stab at what such an algorithm would look like.
Here are the basic principles:
- The algorithm should take into account how many positive/negative comments or citations your life stream items have got from other people, weighted by the Whuffie score of each of these people.
- The use of the weight here is important, as it makes it possible to remove the arbitrary point amounts completely: for instance, instead of “For every conference you speak at: +10,000”, speaking at a conference would essentially be equivalent to posting a summary of your speaking engagement and having the conference organizers or the conference itself comment on it or cite you on their Web site, with the Whuffie value of the comment being a function of the Whuffie of the conference or of the conference organizers themselves.
- The positive/negative nature of the comment would be determined via semantic analysis or microformats votelinks or voting nanoformats (vote:for:this article, +1/-1).
- If the positive/negative nature of a comment cannot be determined, a smaller positive number of Whuffie points would be attributed, weighted by the Whuffie of the entity issuing the comment.
- If no comment is available, views should be used (# of times a video was viewed), again weighted by the Whuffie of whoever viewed it if possible. Views should contribute fewer Whuffie points than comments.
- In all cases, each published item should earn a number of points multiplied by the number of followers the person/entity has on the site where the life stream item is posted (# of subscribers to RSS feed, # of Twitter followers, # of Flickr contacts, etc.).
I don’t really have a precise idea of what these point amounts should be. Let’s say +10 for a positive comment, -10 for a negative comment, +5 for a comment, +3 for a view, and +1 for a published item.
Let’s also say that these points would be weighted by 1/100 of the Whuffie of the person commenting on, viewing or following the publisher/life stream item. So, if my Whuffie is 1,000,000 and I view someone’s image but do not comment on it, the weight is 10,000 and the view is worth +3, which gives 3 × 10,000 = 30,000 Whuffie points to the person who posted this image.
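To make the arithmetic concrete, here is a minimal sketch of the scoring rule in Python. It is only one possible reading of the principles above: the point values are the tentative ones just listed, and every name in it (`BASE_POINTS`, `Action`, `action_points`, `item_points`) is made up for illustration. In particular, it assumes that the "+1 per published item, multiplied by the number of followers" rule means one weighted point per follower.

```python
from dataclasses import dataclass

# Tentative point values from the post; the dictionary keys are hypothetical names.
BASE_POINTS = {
    "positive_comment": 10,
    "negative_comment": -10,
    "comment": 5,  # comment whose positive/negative nature is unknown
    "view": 3,
}
PUBLISHED_POINTS = 1   # per published item, counted once per follower (assumption)
WEIGHT_DIVISOR = 100   # points are weighted by 1/100 of the actor's Whuffie


@dataclass
class Action:
    kind: str             # one of the BASE_POINTS keys
    actor_whuffie: float  # Whuffie of the person commenting or viewing


def action_points(action: Action) -> float:
    """Points one comment or view contributes to the publisher."""
    return BASE_POINTS[action.kind] * action.actor_whuffie / WEIGHT_DIVISOR


def item_points(actions, follower_whuffies=()) -> float:
    """Total points earned by a single published life stream item."""
    from_actions = sum(action_points(a) for a in actions)
    from_followers = sum(
        PUBLISHED_POINTS * w / WEIGHT_DIVISOR for w in follower_whuffies
    )
    return from_actions + from_followers


if __name__ == "__main__":
    actions = [Action("positive_comment", 200), Action("negative_comment", 100)]
    print(item_points(actions, follower_whuffies=[100, 300]))
```

Since each score depends on the scores of the people acting on your items, a real system would have to recompute scores iteratively or bound the feedback somehow; this sketch deliberately leaves that question open.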
Of course this algorithm reduces the number of arbitrary constants to a few, but these are still arbitrary. So, the next question that came to my mind is whether there is a set of constant values that would be better than another, better for instance at achieving the goal of a Whuffie system.
What is that goal? Do we want a bell-curve distribution of Whuffie scores, a very spiky curve, or a very flat curve? Do we want Whuffie to last indefinitely, or to self-destruct over time (with the objective of preventing social capital from becoming too concentrated among too few people)? I think this is where I should have started, but that will be the subject of another post, hopefully. In the meantime, I hope to get good ideas/suggestions from you.
Another interesting problem is how we fight spam and reputation hacking in such a system. I think one partial answer would be to allow Internet hosts to have their own Whuffie, and to use that as an additional weighting factor. Ideas here are welcome as well.
Using hAtom for pagination of microformatted content
June 1st, 2008
An interesting debate was started by André Luís last month on the microformats-discuss mailing list on the benefits of hAtom. I didn’t have time to read it in detail at the time, but I read the discussion today and here’s my summary.
For those not familiar with hAtom, it is an XHTML microformat for RSS-like feeds.
Great, but why would someone want to do that, given that blogging platforms already generate the RSS/Atom feed for you? Are there use cases for providing hAtom in addition to the Atom feed?
Zhang Zhen pointed to the WebSlice upcoming IE8 feature, which reuses hAtom syntax and will allow users to subscribe to a portion of a webpage. This pointer is interesting, but not quite exactly hAtom.
Toby Inkster mentioned that hAtom could help avoid the use of blogging software by essentially allowing the Atom feed to be generated by a service (but as André noted, this wasn’t really his question):
<link rel="service.feed" type="application/atom+xml" href="hatom2atom.php?uri=http://example.org/page.html"/>
Brian Suda explained that hAtom could be used by Web crawlers to extract valuable metadata, and by browser plugins to provide a better user experience as a user is reading Web content.
While all these were valid benefits, the one that captured the attention of the group was the use of hAtom and rel='next' or rel='prev' for pagination of microformats, i.e. linking microformat entries listed on multiple pages together.
Let’s say you have a collection of hCalendar or hCard entries spread over several pages of your Web site. You could mark these up as hAtom and link the pages to one another with rel='next' and rel='prev', so that a microformat parser could navigate the site and generate a single collection of hCalendar and hCard entries.
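Here is a rough sketch, in Python, of what such a parser could do. It is not taken from the mailing list discussion: the page contents, the URLs and the names `PageParser` and `collect_entries` are all made up, pages are served from an in-memory dictionary rather than fetched over HTTP, and relative URLs are not resolved.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the entry titles and the rel="next" link found on one page."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.next_url = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        rels = (attrs.get("rel") or "").split()
        if "entry-title" in classes:
            self._in_title = True
            self.titles.append("")
        if tag in ("a", "link") and "next" in rels:
            self.next_url = attrs.get("href")

    def handle_endtag(self, tag):
        # Simplification: titles are assumed to contain no nested elements.
        self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def collect_entries(start_url, fetch):
    """Follow rel="next" links from start_url and merge all entry titles."""
    titles, url, seen = [], start_url, set()
    while url and url not in seen:  # 'seen' guards against link cycles
        seen.add(url)
        parser = PageParser()
        parser.feed(fetch(url))
        parser.close()
        titles.extend(parser.titles)
        url = parser.next_url
    return titles


# Hypothetical two-page site whose hCalendar events are also marked up as hAtom.
PAGES = {
    "/events?page=1": (
        '<div class="hentry vevent">'
        '<span class="entry-title summary">Launch party</span></div>'
        '<a rel="next" href="/events?page=2">Older</a>'
    ),
    "/events?page=2": (
        '<div class="hentry vevent">'
        '<span class="entry-title summary">Board meeting</span></div>'
        '<a rel="prev" href="/events?page=1">Newer</a>'
    ),
}

if __name__ == "__main__":
    print(collect_entries("/events?page=1", PAGES.__getitem__))
```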
What business model for decentralized social networks? decrypting Matt Mullenweg’s recent keynote
February 25th, 2008
Decentralized social networks seem to be the talk of the town these days (in tech circles at least). Blogger Robert Scoble has drawn attention to, and created a minor scandal around, a Facebook policy that forbids the use of scripts to extract data from Facebook Web pages (Note: Facebook just recently allowed accounts to be closed). Around the same time, the DiSo project started with the goal of building a decentralized version of Facebook based on the open source WordPress personal publishing platform, and the DataPortability.org workgroup kicked off to define best practices to make personal data easily movable, reusable, remixable, etc. across Web services. Just two days ago at his Northern Voice 2008 keynote, Matt Mullenweg, creator of WordPress, seemed to be almost hinting at what his company was up to with their recent $29.5M round of funding: a better, open-source alternative to closed social systems like Facebook that would use social filters to bring more relevant content.

As I mentioned in my previous post on business platforms of Web companies, one key aspect of these business platforms is that “they retain control over who gets to see the information and how”. Having a point of mediation is an essential part of online capitalism. Without it, there is no point of value extraction and no big business.
The natural question then is: if so many techies are excited about the inevitable advent of decentralized and portable social networks and related personal data, and if that means essentially that there is little point of control anymore for these Websites, how are businesses going to make big money out of this?
If we put aside the ad-based revenue model that Matt M. does not seem too keen on, as well as the “pro account” business model that would expand on some existing commercially available pro services, as well as the usual ways of making money with open source, here are two models that I think could work:
- Relevancy services: This would be an expansion of services such as Akismet, WordPress’s spam filtering service, which is currently free for personal use. Matt insisted strongly in his keynote that content relevancy (i.e. no spam) is really what users value, and that spam from bad users is what kills social systems. Perhaps a high-quality filtering system that would combine the Akismet filter and a social filter (a filter based on your social graph) is something people would be ready to pay for.
- The ringtone business model: This model consists in deriving transaction fees from digital goods sold on WordPress.com, such as themes and widgets. Because WordPress.com knows which blogs use which themes and widgets, this would be easily done there. It may be a bit harder for users of the WordPress open source software itself. This would be the equivalent of the ringtone business. Matt Mullenweg himself said that “People want their online presence to be an expression of themselves and in that regards, being able to customize the design is critical”. Matt even compared a blog to a locker, which people typically personalize heavily.
This list is not meant to be exhaustive; it only seeks to start a discussion on a subject that is getting more and more relevant. I would be curious to see what others think.
The meaning of vcard’s “fn”
February 2nd, 2008
Martin McEvoy recently resurrected a thread on the replacement of “fn” by “title” in the hAudio microformat. The main point is that “fn” (formatted name) is a bit cumbersome for a song’s name/title. This offered me the opportunity to give my interpretation of the meaning of “formatted name”, which I will summarize here.
A formatted name is a locale-specific (typically of the locale the name is from) serialization of a structured representation of a person’s name. It is useful for display and print, for instance on the label of an envelope, where conformance to local name ordering practices is desired for politeness reasons.
Now some explanation of why formatted names are important for people’s names.
For those who don’t know, there are different name ordering conventions in different parts of the world. Just as an example, given name first, family name second is common in Western countries, whereas family name first, given name second is common in Eastern countries.
So, computer people who want to store names of persons for different places in the world have to deal with the following problem: they want to be able to distinguish family names from given names and other names (middle, mother’s, etc.) since it helps for searching, for identification and for avoiding duplicate entries, but they don’t want to be impolite either and send a letter with a name formatting that does not comply with the locale of the person.
One solution to this problem could be to identify all the different types of name ordering conventions, for instance, by locale and locale region, then code these rules in some programming language, then keep the information about the locale of the person, or infer it from the country they were born in, or the place they live, or something else, and then compute the formatted name from the database or structured or tokenized representation.
That is obviously a lot of work, and also not completely fail-proof. For instance, a Japanese person living in the U.S. might still want their name to be printed on letters with the last name first. If you add honorific titles, prefixes, suffixes, abbreviated forms, etc. to the mix, it is even more work. Usually at this point, the computer people go back to the problem they were addressing (usually not an international name storage problem, but something else, like a customer data storage problem for a U.S. bank or an electronic business card problem) and realize that they are spending too much time on each issue (“Why are we doing this again?”), and that not much will come out of their work if they continue on this path (at least not fast enough for the next quarter). This is the exact situation that my IFX colleagues and I faced a couple of years ago, and I’m hypothesizing that this is the same problem that the vCard people faced.
The only easy solution is to store the name of the person in a structured format, but keep a copy of the preferred formatting in a separate field. This is what we did at IFX, and this is what I’m guessing the vCard people did.
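As an illustration only (this is neither the IFX nor the vCard data model, and the names `PersonName`, `sort_key` and `envelope_label` as well as the sample people are invented), the idea can be sketched in Python like this: the structured parts are used for searching and sorting, while the stored "fn" is used verbatim for display and never recomputed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonName:
    family_name: str
    given_name: str
    fn: str  # preferred formatted name, stored as the person wrote it


def sort_key(name: PersonName):
    """Search/sort on the structured parts, independent of display order."""
    return (name.family_name.casefold(), name.given_name.casefold())


def envelope_label(name: PersonName) -> str:
    """Display uses the stored formatted name; no locale rules are applied."""
    return name.fn


# Hypothetical records: one family-name-first, one given-name-first.
names = [
    PersonName(family_name="Yamada", given_name="Taro", fn="Yamada Taro"),
    PersonName(family_name="Doe", given_name="Jane", fn="Jane Doe"),
]

if __name__ == "__main__":
    for name in sorted(names, key=sort_key):
        print(envelope_label(name))
```

The cost of this design is redundancy: the two representations can drift apart if one is edited without the other, which is the price paid for not encoding every locale's ordering rules.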
All this to say that the meaning of “formatted name” is to me very specific to those names for which there is value in maintaining two representations, one structured and one serialized, because reconstructing one from the other is difficult. To go back to the original thread raised by Martin, and given the above, I don’t think that “fn” should be used for a song’s name.
Punctuation as markup
February 2nd, 2008
Like many, I spend most of my days in markup, sometimes to the point that I forget what it was invented for and what problem it was meant to address. During these times of doubt and confusion, I like to go back in time and read the works of the pioneers.
Last night, I read this great article, Markup Systems and the Future of Scholarly Text Processing, which dates from before XML, HTML, SGML, or even GML!
There is a section on the different kinds of markup, in particular punctuational markup and descriptive markup. After reading this, it occurred to me that the following three representations are strictly equivalent (markup is highlighted in bold):
- Plain text markup using the period “markup” to signify the end of a sentence that is a statement (see this section of HyperGrammar for a grammar refresher):
The teacher asked who was chewing gum.
- XML markup that specifies that a piece of text is a sentence and a statement (notice no end punctuation here):
<sentence><statement>The teacher asked who was chewing gum</statement></sentence>
- Plain old semantic HTML that specifies that a piece of text is a sentence and a statement (notice no end punctuation here):
<span class="sentence statement">The teacher asked who was chewing gum</span>
In the last case, you can use CSS code to add a period at the end of each sentence that is a statement:
.sentence.statement:after {
  content: '.';
}
This means also that if we are being strict, combining punctuation with HTML or XML markup, when descriptive markup and CSS styling suffice, is a bad practice since it is semantically redundant, or in other words, one of the two is useless.
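One way to convince yourself of the equivalence is to convert mechanically between the two forms. The Python sketch below does this for the single case discussed here, a statement ending in a period; the function names `to_descriptive` and `to_punctuational` are made up, and the regular expression only handles the exact span markup shown above.

```python
import re
from html import escape, unescape

SPAN = '<span class="sentence statement">{}</span>'


def to_descriptive(sentence: str) -> str:
    """Trade the trailing period for descriptive markup."""
    if not sentence.endswith("."):
        raise ValueError("expected a statement ending in a period")
    return SPAN.format(escape(sentence[:-1]))


def to_punctuational(markup: str) -> str:
    """Trade the descriptive markup back for a trailing period."""
    match = re.fullmatch(r'<span class="sentence statement">(.*)</span>', markup)
    if match is None:
        raise ValueError("expected a sentence/statement span")
    return unescape(match.group(1)) + "."


if __name__ == "__main__":
    text = "The teacher asked who was chewing gum."
    marked_up = to_descriptive(text)
    print(marked_up)
    print(to_punctuational(marked_up))
```

Because the round trip loses nothing, the period and the class names carry the same information, which is the redundancy argument made above.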
There are plenty of other examples to explore: for instance, quotes (") as markup that is an alternative to the <q></q> or <blockquote></blockquote> HTML markup. This may be pushed to the extreme where each space in plain text is viewed as markup that distinguishes words from one another.
This is probably an epiphany just for me, but I thought I’d post it anyway!
Property vs. subject referencing and property inheritance in Microformats
January 29th, 2008
Microformats are standardized ways to add meaning to objects such as persons, places, events published on the Web. Microformats such as hCard, a format to add meaning to persons and places, are rooted in common patterns in Web content structure and class namings, as well as in semantics and structures borrowed from existing standards such as vCard. Microformats work very well when the object tree structure maps perfectly to the content tree structure, and they don’t work so well or don’t work at all when the two trees don’t map.
One technique used by microformats is the include-pattern, which allows properties to be referenced from within an object. I call this technique “property referencing”. For instance, in the following sentence: “We have two office locations in San Francisco: 665 3rd Street and 123 Folsom Street.”, “665 3rd Street” and “123 Folsom Street” are two adr objects, which do not contain a locality property because this property is inferred from the context. Well, according to the include-pattern, this property can be referenced by including in the span of class adr an empty a element of class include that points to the “San Francisco” fragment:
"... in <span id="sf" class="locality">San Francisco</span>. <span class="adr"><span class="street-address">665 3rd Street</span><a href="#sf" class="include"/></span> ..."
The problem is that this technique typically relies on one or more empty anchor (“a”) elements, and as such it has been criticized as not accessible for non-graphical user agents such as screen readers, which get confused by these empty links.
One technique I have suggested is the use of subject referencing and property inheritance:
- subject referencing means that the property refers to the subject it qualifies, rather than the subject referring to the property that qualifies it.
- property inheritance means that if a property is attached to a container object, then all objects contained in that container object inherit this property, unless it is overridden.
Here is my example above with these two concepts in action:
Our company has office locations in <a href="#adrlist1" rev="propertyOf"><span class="locality">San Francisco</span></a>: <ul id="adrlist1"><li id="adr1" class="adr"><span class="street-address">665 3rd Street</span>, and</li><li id="adr2" class="adr"><span class="street-address">123 Folsom</span></li>
</ul>
In this markup, we have:
- “San Francisco” is a locality property of adrlist1, which is a container (ul).
- “665 3rd Street” and “123 Folsom” are contained in adrlist1 and as such inherit the “San Francisco” locality property.
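To show how a parser might resolve properties under this proposal, here is a small Python sketch of the inheritance rule alone. It does not parse HTML; the `Node` class and its `resolve` method are invented for illustration, and the third address, which overrides the locality, is a hypothetical addition that is not in the markup above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """An object (e.g. an adr) or a container (e.g. the ul) with properties."""
    properties: dict = field(default_factory=dict)
    parent: Optional["Node"] = None

    def resolve(self, name: str) -> Optional[str]:
        """Return the nearest value of a property, walking up the containers."""
        node = self
        while node is not None:
            if name in node.properties:
                return node.properties[name]  # own value overrides inherited one
            node = node.parent
        return None


# The container receives the locality through subject referencing.
adrlist1 = Node({"locality": "San Francisco"})
adr1 = Node({"street-address": "665 3rd Street"}, parent=adrlist1)
adr2 = Node({"street-address": "123 Folsom"}, parent=adrlist1)
# Hypothetical extra entry that overrides the inherited locality.
adr3 = Node({"street-address": "1 Main Street", "locality": "Oakland"}, parent=adrlist1)

if __name__ == "__main__":
    for adr in (adr1, adr2, adr3):
        print(adr.resolve("street-address"), "/", adr.resolve("locality"))
```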
Introduction to RDFa
January 7th, 2008
RDFa is the W3C’s proposal for the lowercase semantic Web. As such, it is similar in spirit to the microformats initiative, but strives to leverage existing W3C work such as RDF. Manu Sporny produced an educational video on RDFa that you can watch below.
As Manu reminded me on the uf-discuss list, this video is more for education than for evangelization. Also, some aspects that I am personally interested in are missing, such as: drawbacks/features comparison and cohabitation with microformats, or explanation of choices made, but it is nonetheless very clear and quite entertaining for a topic that many would probably find very boring.
