A more neutral topic curation algorithm
SarahE
May 16, 2016

 

On Monday, Gizmodo published an article about the curation practices behind Facebook’s Trending module, and to what extent a curator’s personal biases affect what’s shown to Facebook’s billion-plus active users. (As you can imagine, this has caused some controversy.) Although Gizmodo focused on the role of human curators, those of us who work with algorithms and machine learning have had to confront the fact that biases can end up deeply encoded into supposedly objective systems, including Klout’s own topic classifier -- the system that identifies your Expert topics, as well as the relevant articles for the Explore tab. Here’s a quick roundup of how we on Klout’s Data Science team think about keeping the topic system as unbiased as possible.

 

Klout’s topic classifier in 30 seconds or less

The major inputs to our topic system are:

  1. The classifications we’re applying:
    1. An ontology of nearly 10,000 human-curated topics
    2. An underlying dictionary of over a million named entities and concepts
  2. The data being classified, which includes:
    1. Social media profiles (to be analyzed for topic expertise and interest)
    2. URLs published on social media and elsewhere (to be analyzed for topical content and served in the Explore tab)
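To make the two kinds of input concrete, here is a minimal sketch of how a list of curated topics and an entity dictionary might be combined to score a piece of text. Everything in it is an assumption for illustration: the topic names, the surface forms, and the `classify_text` helper are hypothetical, and naive substring counting stands in for whatever matching Klout's pipeline actually does.

```python
from collections import Counter

# Hypothetical, tiny stand-ins for the two classification inputs:
# a set of curated topics and a dictionary of entity surface forms.
ONTOLOGY = {"Basketball", "Golden State Warriors", "Molecular Biology"}

# Lowercase surface form -> topics that a mention of it counts as evidence for.
ENTITY_DICTIONARY = {
    "stephen curry": ["Golden State Warriors", "Basketball"],
    "warriors": ["Golden State Warriors"],
    "crispr": ["Molecular Biology"],
}


def classify_text(text: str) -> Counter:
    """Score topics by counting dictionary surface forms found in the text.

    Naive substring matching is used purely for illustration; a real system
    would need tokenization and disambiguation.
    """
    lowered = text.lower()
    scores: Counter = Counter()
    for surface_form, topics in ENTITY_DICTIONARY.items():
        hits = lowered.count(surface_form)
        if hits == 0:
            continue
        for topic in topics:
            if topic in ONTOLOGY:  # only emit topics the ontology knows about
                scores[topic] += hits
    return scores


scores = classify_text("Stephen Curry led the Warriors again")
```

For that sentence, "Golden State Warriors" scores 2 (once via each surface form) and "Basketball" scores 1. Even this toy shows where bias can enter: a topic with no dictionary entries can never be assigned, however relevant the text is.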

 

Even without getting into the weeds of the pipeline that brings those two types of input together -- more on that below -- critical readers will already be able to spot a few areas where we’re vulnerable to bias. Let’s walk through them.

 

Any classification system contains value choices

Should “Autism” be placed under “Diseases” or “Neurology”? Is “the Tea Party movement” distinct from “conservative politics”? Is “Wizard Rock” really a thing? How we answer these questions shapes the experience of our users, and inevitably gives our ontology a point of view.

 

Longtime users may remember the early days of Klout, when topics were a, shall we say, messy combination of user-submitted tags and data-mined concepts. As the Data Science team has worked on regularizing and improving the ontology, we’ve relied on the following principles:

 

  1. The ontology is a living thing; we should always have tools in place for updates
  2. The ontology should always have up-to-date guidelines defining:
    1. Scope -- what portion of the world we’re describing. (In our case, as much of it as possible.)
    2. Granularity -- at what level of detail we’re describing it. (Do we need to include every actor? Television show? Heavy metal subgenre?)
    3. Voice -- the tone we use in describing it. (Do we use scientific names? Full legal names for persons? Slang?)
  3. A little redundancy won’t hurt -- our system can support topics with some conceptual overlap, so err on the side of inclusion. (For example, both “Gun Rights” and “Gun Control” are topics in the Klout ontology.)
  4. Users are the best source of feedback -- our users have a broader range of perspectives than we do; make it easy for them to alert us to problems
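One way to accommodate both the Autism question and the "a little redundancy" principle is a polyhierarchy, in which a topic may have several parents. The sketch below assumes that design (the post doesn't say how Klout's ontology is actually stored); the `PARENTS` table and both helpers are hypothetical. `add_parent` gestures at principle 1: an update tool for a living ontology that refuses edits which would create a cycle.

```python
# Hypothetical polyhierarchical ontology: each topic lists its parents, so a
# topic like "Autism" can sit under both "Diseases" and "Neurology", and
# overlapping topics like "Gun Rights" and "Gun Control" can coexist.
PARENTS: dict[str, list[str]] = {
    "Health": [],
    "Science": [],
    "Diseases": ["Health"],
    "Neurology": ["Science"],
    "Autism": ["Diseases", "Neurology"],
    "Politics": [],
    "Gun Rights": ["Politics"],
    "Gun Control": ["Politics"],
}


def ancestors(topic: str) -> set[str]:
    """Return every topic reachable by following parent links upward."""
    seen: set[str] = set()
    stack = list(PARENTS.get(topic, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def add_parent(topic: str, parent: str) -> None:
    """Add a parent link (creating either topic if needed), refusing cycles."""
    if topic == parent or topic in ancestors(parent):
        raise ValueError(f"linking {topic!r} under {parent!r} would create a cycle")
    PARENTS.setdefault(parent, [])
    parents = PARENTS.setdefault(topic, [])
    if parent not in parents:
        parents.append(parent)
```

Here `ancestors("Autism")` returns Diseases, Neurology, Health, and Science, so content tagged Autism can surface under either branch without forcing a single answer to the placement question.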

 

Even so, any time your application or audience changes, it’s important to reassess your classification scheme. One major flaw in Klout’s topic ontology is that it was developed for a U.S. audience, and still needs significant work for other countries and languages.

 

Staying alert for sins of omission

In addition to the human-curated topics in the ontology, we also use a dictionary of concepts and entities derived from Freebase. Freebase is a widely used resource in the data science world, but “widely used” is not the same thing as “perfect”, by any means. The biggest issue with Freebase is what it leaves out. Like Wikipedia, it was collectively sourced, and like Wikipedia, it’s biased toward the interests of its editors: it’s sparse in some areas, such as cosmetic products and fashion terms, so we’ve had to develop ways to supplement the dictionary. The moral of the story: it pays to look critically at any pre-packaged data set you plan to use.

 

Boosting inclusivity

Next, let’s consider the URLs we collect for the Explore tab. The majority are URLs that have been shared on social media, which means they are dominated by the topics most discussed on social media: politics, celebrity news, music, etc. What we sometimes call “niche topics”, like molecular biology, or Wicca, or wheelchairs, naturally are present in fewer URLs. Does that count as a bias? It’s unclear, but thin coverage makes for a poor end-user experience and risks making some users feel marginalized. As a result, we’ve had to develop backup strategies to increase coverage for less common topics.
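The post doesn't describe Klout's backup strategies, but one simple pattern is a per-topic floor: keep everything social media supplies, and top up only the topics that fall short from a secondary source. In this sketch the `explore_candidates` helper, the `min_items` floor, and the ranked fallback list (for example, curated feeds) are assumptions.

```python
def explore_candidates(social_urls: list[str],
                       fallback_urls: list[str],
                       min_items: int = 5) -> list[str]:
    """Keep all socially shared URLs for a topic; top up from a fallback
    source only while the topic has fewer than `min_items` URLs."""
    result = list(dict.fromkeys(social_urls))  # de-duplicate, preserving order
    seen = set(result)
    for url in fallback_urls:
        if len(result) >= min_items:
            break
        if url not in seen:
            result.append(url)
            seen.add(url)
    return result


# A niche topic with two shared URLs gets topped up to four.
niche = explore_candidates(["a", "b"], ["b", "c", "d", "e", "f"], min_items=4)
# A popular topic already above the floor is left untouched.
popular = explore_candidates(["a", "b", "c", "d", "e", "f"], ["x"], min_items=5)
```

Because popular topics are never trimmed, the floor raises coverage for niche topics without reducing what the social stream already provides.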

 

The fuzzy line between human bias and business logic

One of the more ironic tidbits in Gizmodo’s article was that Facebook’s curators were told to suppress news about Facebook -- that is, to interfere with the Trending algorithm to avoid the appearance that Facebook was interfering with the Trending algorithm. But that kind of decision is probably familiar to the product managers in the audience, whose goal it is to preserve the user experience. Similarly, a discovery feed like our Explore tab might recommend porn, or spam, or hate speech, and need to be tuned or overridden. To make it even more complicated, the definition of porn, or spam, or hate speech may change from region to region. Keeping those decisions from being made inconsistently or thoughtlessly is really difficult, but our approach has been to define a single owner who both documents the rules and is accessible to discuss individual cases. As others have pointed out, Facebook’s mistake may not have been having curatorial tools, but isolating the employees using them.
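Centralizing override rules under a single owner is largely an organizational choice, but it can be mirrored in code so that every suppression traces back to a documented rule. A minimal sketch, assuming an upstream step has already labeled each URL (for example as `spam`); the `OverridePolicy` class, the label names, and the region keys are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OverridePolicy:
    """Hypothetical override rules, owned and documented by one named person."""
    owner: str
    default_blocked: set[str]
    regional_blocked: dict[str, set[str]] = field(default_factory=dict)

    def blocked_for(self, region: str) -> set[str]:
        # Regional rules add to the defaults; they never silently remove them.
        return self.default_blocked | self.regional_blocked.get(region, set())


def apply_overrides(items: list[tuple[str, set[str]]], region: str,
                    policy: OverridePolicy) -> tuple[list[str], list[tuple[str, list[str]]]]:
    """Split (url, labels) pairs into kept URLs and removed URLs, recording
    which blocked labels triggered each removal."""
    blocked = policy.blocked_for(region)
    kept: list[str] = []
    removed: list[tuple[str, list[str]]] = []
    for url, labels in items:
        hits = blocked & labels
        if hits:
            removed.append((url, sorted(hits)))
        else:
            kept.append(url)
    return kept, removed


policy = OverridePolicy(
    owner="content-policy-owner",
    default_blocked={"porn", "spam", "hate_speech"},
    regional_blocked={"region_a": {"gambling"}},
)
items = [("u1", {"news"}), ("u2", {"spam"}), ("u3", {"gambling"})]
print(apply_overrides(items, "region_a", policy))
# (['u1'], [('u2', ['spam']), ('u3', ['gambling'])])
print(apply_overrides(items, "region_b", policy))
# (['u1', 'u3'], [('u2', ['spam'])])
```

Keeping the rules as data rather than scattered conditionals gives the owner one place to document and revise decisions, and the removal log makes individual cases easy to discuss.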

 

Fine, but what about the actual topic algorithm?

Eagle-eyed readers will have noticed that we haven’t touched on the nuts and bolts of how Klout’s topic system actually assigns topics. The challenges of data modeling and debugging machine learning algorithms are pretty well surveyed elsewhere, and how we handle those challenges at Klout would require a dedicated blog post. However, there’s less discussion of how to handle human biases when collecting training or validation data -- how people’s points of view get encoded into the data a given algorithm is trying to approximate. The two approaches often recommended could be described as micromanaging versus crowdsourcing: either (a) have an in-house process that includes well-defined guidelines, trained judges, and a reconciliation process for disagreements, or (b) have lightweight guidelines but a large number of judges, in the hope that individual biases will be muted. There are tradeoffs to either approach; our team has recently been relying mostly on in-house validation data, largely because it fits our development schedule. But what’s more important, in our experience, is that the potential weaknesses of the training/validation data are known, discussed, and documented ahead of time, so that they can be distinguished from problems with the model itself.
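Both validation strategies leave measurable traces that are worth documenting. A sketch, assuming one categorical label per item: Cohen's kappa (a standard chance-corrected agreement statistic) for two in-house judges, and a plurality vote for crowdsourced labels that sends ties to reconciliation instead of guessing. Neither function is described in the post as Klout's method; the helper names are hypothetical.

```python
from collections import Counter
from typing import Optional


def cohens_kappa(judge_a: list[str], judge_b: list[str]) -> float:
    """Chance-corrected agreement between two judges who labeled the same items."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    # Agreement expected if each judge labeled at random with their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both judges used one identical label throughout; kappa is undefined,
        # so treat it as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def plurality_label(votes: list[str]) -> Optional[str]:
    """Most common crowd label, or None on a tie (route the item to reconciliation)."""
    ranked = Counter(votes).most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
```

In that example the judges agree on 75% of items, but chance alone would produce 50% agreement given their label frequencies, so kappa is 0.5. A low kappa on the validation set is exactly the kind of data weakness worth recording before blaming the model.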


No system is perfect, and keeping out bias takes continual work. Although a focus on documentation, consistency, and validation will take you a long way, the very best defense against unintentional bias is a diverse team, who can bring multiple points of view. Want to come work with us?

 

 


Sarah Ellinger is the Lead Data Analyst for Klout/Lithium’s Data Science team. She is responsible for overseeing the content of the topic ontology, as well as monitoring the performance of the topic classification system. Sarah attended U.C. Berkeley’s School of Information and has over a decade of experience in taxonomy and web content classification at tech companies large and small. She can be found on Twitter discussing information science and Game of Thrones spoilers @sarahellinger.
