You are neck deep in signals. You followed a couple of them to read this. Some are handled automatically by your tools. Others are ignored. A few of them actually span from one system to the next. And the best signals probably require you to interact at least partially with the content in order to make the bigger decision: TL;DR! I’m going to continue my rantings on internet news (Skip back to Part 3 if you like) by breaking down these signals into the simplest bits I can think of: Who, What, and Where.
Who is the most important signal to most people. And most systems are built on it. Facebook has friends. G+ has circles. Twitter has your follow list. Your email program has an address book, but anyone can send you a message, and most people probably read the vast majority of email sent to their address (provided it passes a Bayesian filter). But none of these systems know anything about each other. Effort spent organizing your G+ circles is wasted in Twitter. This is a shame.
You probably group people in some logical high level ways. I have circles for friends, family and co-workers. But I also have circles for people that I simply am following. I am some sort of stalker. It probably wouldn’t be difficult for you to order these in terms of how important each group is. Or maybe for most people, simply being explicitly named in a circle might be enough to distinguish you from the unwashed masses. In other words, there is a relationship signal, where your BFF and Mom and Boss might be 1.0, but some dude who just says funny things once in a while is a 0.3.
Some content is broadcast widely: a tweet is public and sent to the world. A G+ post might be “Limited” indicating more directionality. A mailing list might have 100 subscribers or 4. Or an email might be sent directly to you. In other words, the ‘Who’ signal is diluted by being sent to more people… and amplified as it is sent to fewer people. The private email from your boss is highly directional, and probably far more important than the mailing list from a co-worker sent to a team of 10 engineers. Well… I guess that depends on what your boss is like… So you might have a “Bob” signal, but then you probably also have a signal indicating how much “Bob” intended that for you. A public tweet is probably very “Bob”, but he didn’t mean it for you in particular. What a jerk.
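To make the “Bob” signal and the “how much Bob meant it for you” signal concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the circle names, the weights, the square-root falloff and the floor for public posts are made up, not taken from any real system.

```python
import math
from typing import Optional

# Illustrative relationship weights per circle (made-up values, not measured).
CIRCLE_WEIGHTS = {"family": 1.0, "boss": 1.0, "friends": 0.8, "following": 0.3}
PUBLIC_DIRECTIONALITY = 0.05  # assumed floor for a public broadcast


def directionality(audience_size: Optional[int]) -> float:
    """How much the sender meant this for you; None means a public broadcast."""
    if audience_size is None:
        return PUBLIC_DIRECTIONALITY
    # Assumed falloff: 1 recipient -> 1.0, 100 recipients -> 0.1.
    return 1.0 / math.sqrt(max(audience_size, 1))


def who_signal(circle: str, audience_size: Optional[int]) -> float:
    """Relationship weight scaled by directionality."""
    relationship = CIRCLE_WEIGHTS.get(circle, 0.1)  # assumed default for strangers
    return relationship * directionality(audience_size)
```

With these made-up numbers, `who_signal("boss", 1)` is a private email from your boss and lands at the top of the scale, while `who_signal("following", None)` is a public tweet from the funny dude and barely registers.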
What a message actually is about is probably the signal that requires the most thought. And this information is often not being well used today. If you read “Hacker News” you are proclaiming an interest in their subject matter (Hard Core Porn & Internet Start-Ups), which is probably totally different from the news you get from TMZ (except when Paul Graham is caught in a tryst with Paris Hilton and the content ought to appear on both publications). But this is a case where a simple keyword learning system would do the job. I read a number of sites and use them exclusively for a small number of keywords. I have no interest in Lindsay Lohan, but every now and then TMZ will talk about William Shatner (probably not often enough to justify the feed in today’s world, but that doesn’t stop me from trying). There’s a keyword signal that right now is being lost. And unfortunately for me, TMZ is particularly loud… they post 30-40 items per day, and I only want 1-2 per week that hit my keywords. That feed is pretty worthless to me.
Some publications naturally segregate their content simply by the nature of what they are. A site is about Technology or Comedy or Celebrity Gossip or Sports. You choose to opt in and out of these publications in broad strokes. But once you get inside of them, you start realizing that most items are a lost cause. I bet Google Reader has a fantastic trove of information about user engagement with every feed on earth. You can interact with an item to differing degrees: A user might scroll past an item, or “Mark All As Read” a hundred items at a time… I can expand and read a paragraph… or I can click through and possibly read a thousand words.
Using this sort of data, you could start sampling out the keywords that the user is interested in. Every source on-line now has meta tags, and there is tons of interesting research that has been done on pulling out keywords for any hunk of text. Any time you positively interact with a tag, it can simply nudge up that signal. The obvious goal here is that you can subscribe to more content, knowing that the keywords that are valuable to you will signal their way up.
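The “nudge” idea could look something like the sketch below. It is a hedged illustration, not a real reader’s implementation: the `KeywordProfile` class, the interaction names and the nudge sizes are all hypothetical, and it assumes each item already arrives with a list of tags.

```python
from collections import defaultdict
from typing import Iterable

# Assumed nudge sizes per interaction type; deeper engagement nudges harder.
INTERACTION_NUDGE = {
    "scrolled_past": -0.01,
    "expanded": 0.05,
    "clicked_through": 0.15,
}


class KeywordProfile:
    """Tracks a 0.0 to 1.0 interest weight for every tag the user has seen."""

    def __init__(self) -> None:
        self.weights = defaultdict(float)

    def record(self, tags: Iterable[str], interaction: str) -> None:
        """Nudge every tag on an item up or down after one interaction."""
        nudge = INTERACTION_NUDGE[interaction]
        for tag in tags:
            key = tag.lower()
            # Clamp so a weight never leaves the 0.0 to 1.0 range.
            self.weights[key] = min(1.0, max(0.0, self.weights[key] + nudge))

    def score(self, tags: Iterable[str]) -> float:
        """Keyword signal for an item: the strongest weight among its tags."""
        return max((self.weights.get(t.lower(), 0.0) for t in tags), default=0.0)
```

Click through on a couple of William Shatner stories and that tag climbs, while tags you only ever scroll past stay pinned at zero. Taking the strongest tag rather than an average is a design choice so that one good keyword can rescue an item from a noisy feed.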
Where a piece of content came from tells you a lot too. I have MacNN as an RSS feed. I’ve told my RSS reader that I prefer that site to a dozen other practically interchangeable Mac news sites. But I’m not interested in 95% of the things that come from that feed. On the other hand, I might read every item from I Can Has Cheezburger. I mean, they take longer to load than to read, and sometimes they make me giggle. I know that a MacNN item might take me 3-5 minutes to read. But ICHC might take 2 seconds. One has a poor hit rate, but a higher priority, the other has a huge hit rate, but almost no priority.
Each publication has its own internal signals as well. The number of posts attached to a story for example… the fact that the piece of news appeared on a specific publication has a certain weight… it is now a 1.0 on the site name signal… but the interaction on that site might indicate a much less active item. If the publication typically has 100 posts, and this item has 50, then maybe it weighs in with 0.5 on the site interaction signal. So a site has a reputation, and then an activity within it.
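The arithmetic here is simple enough to write down. This is only a sketch of the ratio described above; the function name and the cap at 1.0 for unusually busy items are my own assumptions.

```python
def site_interaction_signal(item_posts: int, typical_posts: float, cap: float = 1.0) -> float:
    """Activity on one item relative to what is typical for that publication."""
    if typical_posts <= 0:
        return 0.0  # no baseline yet, so no signal
    # Assumed cap: an item with more activity than usual still tops out at `cap`.
    return min(item_posts / typical_posts, cap)
```

An item with 50 posts on a publication that typically gets 100 comes out at 0.5, matching the example above.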
Quantity factors in as well. A site that publishes 100-200 items per day (Digg, Reddit) is less valuable to me than one that posts 25 (Slashdot). Call it Dilution. Call it Focus. I don’t know. But the Penny Arcade RSS feed posts only 10-15 items per week, and I click through on 3 items per week. By posting fewer items, the value of the signal “We just posted an Item” is FAR higher than the site that posts so many items per day that it becomes a part time job just to keep up.
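One rough way to put a number on Dilution (or Focus) is a per-feed hit rate: items you clicked divided by items published. The sketch below plugs in the Penny Arcade and TMZ figures from this post; using the midpoints of the ranges (12.5 items per week, 35 items per day, 1.5 wanted per week) is my assumption.

```python
def hit_rate(clicked_per_week: float, published_per_week: float) -> float:
    """Fraction of a feed's items that the reader actually wants."""
    if published_per_week <= 0:
        return 0.0
    return min(clicked_per_week / published_per_week, 1.0)


# Penny Arcade: 10-15 items per week, 3 click-throughs per week.
penny_arcade = hit_rate(3, 12.5)

# TMZ: 30-40 items per day, only 1-2 wanted per week.
tmz = hit_rate(1.5, 35 * 7)
```

Penny Arcade comes out around one in four, and TMZ at well under one in a hundred, which is the difference between a signal worth reading and a part time job.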
The other part of the ‘where’ component opens another Pandora’s box: are you an active consumer, marking items as read (like email) or a passive consumer, scrolling through items (like perhaps Twitter)? That’s a subject for another day.
Finally you can start putting it all together. An email sent directly to you by a family member that mentions the name of your child is going to register a lot of signals and show up very high on the priority list. As you work through the queue you will find items that are mentioning important keywords… or tweeted by members of your most exciting circles… eventually you’ll start moving out of the good stuff, and start finding items that are in your feeds only because they exist in your RSS feeds. They don’t mention keywords that appeal to you generally. They might come from a publication that has a 5% hit rate for you. In other words, the signal is getting awfully weak here. At some point, the system ought to abandon this content and start recommending things to you based on other user behavior.
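Putting it together could be as plain as a weighted sum and a sort. This is a sketch under loud assumptions: the `Item` fields, the weights and the cutoff are invented for illustration, and each of the three signals is assumed to be already scaled to the 0.0 to 1.0 range.

```python
from dataclasses import dataclass
from typing import List

# Assumed weights for the three signals; they sum to 1.0.
WEIGHTS = {"who": 0.5, "what": 0.3, "where": 0.2}
CUTOFF = 0.15  # assumed point below which the signal is too weak to show


@dataclass
class Item:
    title: str
    who: float    # relationship and directionality
    what: float   # keyword match
    where: float  # publication reputation and activity


def priority(item: Item) -> float:
    """Weighted sum of the Who, What and Where signals."""
    return (
        WEIGHTS["who"] * item.who
        + WEIGHTS["what"] * item.what
        + WEIGHTS["where"] * item.where
    )


def build_queue(items: List[Item]) -> List[Item]:
    """Strongest items first; drop anything under the cutoff."""
    ranked = sorted(items, key=priority, reverse=True)
    return [item for item in ranked if priority(item) >= CUTOFF]
```

The direct email from a family member that mentions your kid scores near 1.0 on everything and floats to the top. The item that is only there because it exists in your RSS feeds falls under the cutoff, which is where a system like this would hand over to recommendations based on other user behavior.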

Imagine you had an algorithm to help with signal farming – how would you know if it went wrong?
What integrity tests could you build in?
http://culturedigitally.org/2011/10/can-an-algorithm-be-wrong/
With Google and other search engines talking about personalising your results, a supposedly wide-ranging query (e.g. a Google News alert as email or RSS feed) could just become another echo chamber.