What IT recruiters know about you -- whether you're looking or not

What recruiters know about you is about to get a whole lot deeper than what you put on your resume. An emerging class of search engines is taking a big data approach to recruiting by crawling the Web for every bit of data about you, assembling it into a master profile, rating your knowledge, skill levels and interests, and serving it up to recruiters who can filter it by location, skill, the school you attended and a range of other criteria.

Today the technology is mainly used as a tool for finding scarce software development talent. But that could broaden into more types of jobs, including high-tech, legal, medical and engineering, according to the vendors and an analyst. (Read about some of the privacy implications of this technology.)

What Gild's algorithms tell recruiters about you

Here are two examples of Gild's algorithms for evaluating skills and knowledge based on what the company finds out about a subject on the Web. The company has created more than 50,000 of these "features," or rules, for its Gild Source service.

Using Bayesian analysis, Gild claims it can predict how skilled a subject might be even when little data is available, such as when a person has no open-source code available for evaluation.

Raw data: In an online profile you describe yourself as proficient in C/C++.

Conclusion: You're not very good at either.

Logic: C and C++ are completely different languages. So why would you lump them together? Listing them together indicates that you may have just put them in as checklist items.

Raw data: On Twitter you recently said that Celery sucks.

Conclusion: You have knowledge of Python, Django and Celery.

Logic: The fact that you dislike the asynchronous processing toolkit, written in Python and used extensively in Python Web development, means you're not only familiar with Celery but almost certainly are knowledgeable about Python and Django, with which Celery is commonly used.
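The two rules above can be pictured as small functions that turn a piece of raw text into an inference. The Python sketch below is illustrative only: the function names, the regular expression and the tool-to-skill table are assumptions made for this example, not Gild's actual feature code.

```python
import re

def lumps_c_and_cpp(profile_text: str) -> bool:
    """Return True when a profile lists C and C++ as a single 'C/C++' item."""
    return re.search(r"\bC\s*/\s*C\+\+", profile_text) is not None

# Hypothetical table: mentioning a tool implies familiarity with these skills.
IMPLIED_SKILLS = {"celery": {"Celery", "Python", "Django"}}

def skills_implied_by_mention(text: str) -> set[str]:
    """Collect skills implied by any tool named in the text, whatever the sentiment."""
    words = set(re.findall(r"[a-z0-9+#]+", text.lower()))
    implied: set[str] = set()
    for tool, skills in IMPLIED_SKILLS.items():
        if tool in words:
            implied |= skills
    return implied
```

Under these assumptions, a profile line such as "Proficient in C/C++" trips the first rule, and a tweet reading "Celery sucks" yields Celery, Python and Django from the second, because the rule looks at the mention and ignores the sentiment.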

Last year Red Hat hired more than 1,000 people. But it wasn't easy to find the software development and engineering talent needed to fill many of those seats. "We use LinkedIn Recruiter extensively," says CIO Lee Congdon.

But the top-notch talent that the open-source software developer is looking for doesn't always bother keeping an updated resume on LinkedIn or elsewhere, and many of the best software engineers don't need to look on job boards for a better position.

So this year, Red Hat decided to be more proactive. It began using a cloud-based service from Gild that takes a big data approach, mining the social Web to identify and evaluate qualified talent.

Working with Gild, Red Hat was able to quickly come up with a ranked list of prospective software engineering candidates, complete with contact information that in some cases Gild harvests from the prospect's source code. "We're very satisfied with the early results," Congdon says.

Red Hat tested the tool by scoring some known quantities: people who had been previously hired. In each case Gild Source's report accurately scored them as a good fit for the job. While Congdon declined to discuss specific hires, he says the correlation between traditional recruiting methods and Gild Source, as Gild's service is called, has been "notable."

Gild Source gave Red Hat a longer list of prospects than its traditional recruiting tools had produced, but it also correctly identified candidates that Red Hat had already found and considered qualified. The fact that it included the same people in its list validated the tool, in Congdon's eyes.

Comparing services

Gild, along with competitors RemarkableHire, TalentBin and Entelo, is part of an emerging niche of companies that mine social activity on the Web to help recruiters discover and evaluate skilled technical talent quickly -- without waiting for qualified potential candidates to self-identify by building and updating a profile on online job boards or LinkedIn.

How RemarkableHire processes your info

Raw data: Your Ruby repositories on GitHub have a large number of reputable followers.

Conclusion: You are skilled in Ruby development.

Logic: You are making contributions that the community deems valuable. If those followers are highly rated by RemarkableHire's algorithms, they carry even more weight, resulting in an even higher aptitude score.

Raw data: You tweet about Java frequently.

Conclusion: None.

Logic: One-way social contributions that lack a response from the community are meaningless. Some talented Java developers tweet about Java, but so do poor Java developers and recruiters looking to fill Java developer roles.
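One way to picture the weighting described in these two examples is a score in which every follower counts, highly rated followers count for more, and activity that draws no community response counts for nothing. The Python sketch below is a guess at the general shape, not RemarkableHire's algorithm; the `Follower` type, the 0-to-1 `rating` scale and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Follower:
    name: str
    rating: float  # hypothetical 0.0-1.0 rating of the follower's own standing

def aptitude_from_followers(followers: list[Follower], base_weight: float = 1.0) -> float:
    """Each follower adds a base weight plus a bonus for their own rating."""
    return sum(base_weight + f.rating for f in followers)

def signal_from_posts(post_count: int, community_responses: int) -> float:
    """One-way posts with no community response contribute no signal."""
    if community_responses == 0:
        return 0.0
    return float(community_responses)
```

With this shape, two followers rated 0.9 produce a higher score than two rated 0.1, and any number of unanswered tweets about Java still scores zero.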

Gild has 6 million profiles. TalentBin claims to have "tens of millions," while RemarkableHire says "we are in the single-digit millions of complete/matched/merged profiles." But as with other types of search engines, says Scott Rothrock, president and co-founder of RemarkableHire, what matters is the ability to put the best possible matches on the first few pages of results.

It's best to take those numbers with a grain of salt, says Peter Kazanjy, CEO at TalentBin, because everyone defines a profile differently. Profiles may be incomplete, or information from different sources may not be matched up into a single profile.

Content from some sources, such as GitHub, may be crawled and fully indexed while other data simply establishes that, for example, the user has a Twitter profile without indexing or analyzing the subject's tweets. The profile record might include a link to the Twitter account but not know that the person has been tweeting extensively about Ruby.

So it's important to understand when comparing services not just which sites the service includes in its search results, but what gets indexed and analyzed from those sites and what doesn't.

How TalentBin processes your info

Raw data: You are the sender of a number of email messages on an Objective-C mailing list that reference Core Audio, Core Data and Core Animation in the text of the email.

Conclusion: You have familiarity with iOS and Mac OS X development, especially as regards the audio, data processing and UI animation parts of the language. As such, your experience would be relevant in rich iOS apps that deal with audio and stored user state.

Logic: Core Audio is the library in iOS that is used for audio processing, Core Data is used for storing user data and synching it with iTunes, and Core Animation is the toolkit that allows for rich animations.

Raw data: You are a member of both the Quantified Self Meetup and Cassandra Users Group Meetup on Meetup.com, and have frequently RSVP'd to their events.

Conclusion: You would be an interesting candidate for a Fitbit, NikeFuel, RunKeeper or Jawbone-type wearable computing software engineering role.

Logic: As a member of the "Quantified Self" meetup, you have demonstrated an interest in the instrumentation of the human body, and as a member of the Cassandra User Group, you have shown an interest in a key tool used for the management and analysis of the "big data" that these various wearable computing companies create.

Raw data: You are listed as an inventor, with five others, on a patent filed in 2012, regarding VMware virtual machine memory handling and moving virtual machines across wide area networks.

Conclusion: You have experience with virtual machine memory, high-performance networking and virtualization, and worked at VMware recently.

Logic: As one of five listed inventors, you were likely a key contributor on the project, and thus have familiarity with the underlying technology in the patent and, more largely, at VMware as an organization.

These services are available by subscription; you pay for use of the tool, not by the search or according to the number of names returned.

Prices range from $6,000 per year per seat for TalentBin, to $349/month for RemarkableHire, to $8,400/year ($700/month) for Gild Source. Gild also offers a 90-day license for $2,700 or $900/month.
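Because the quoted prices mix monthly, annual and 90-day terms, it can help to put them on a common annual basis. The short Python calculation below does that using only the figures quoted above. It assumes the RemarkableHire monthly rate is paid for 12 months and that Gild's 90-day license is renewed back to back for a year; whether the RemarkableHire and Gild prices are per seat is not stated, so the comparison is rough.

```python
MONTHS_PER_YEAR = 12

# Prices quoted in the article, in US dollars, converted to a yearly figure.
annual_cost = {
    "TalentBin (per seat)": 6_000,
    "RemarkableHire": 349 * MONTHS_PER_YEAR,
    "Gild Source (annual)": 8_400,
    "Gild Source (90-day licenses, back to back)": 900 * MONTHS_PER_YEAR,
}

# List the plans from cheapest to most expensive.
for plan, cost in sorted(annual_cost.items(), key=lambda item: item[1]):
    print(f"{plan}: ${cost:,} per year")
```

On that basis RemarkableHire comes to $4,188 a year, and running Gild Source on consecutive 90-day licenses comes to $10,800 a year, compared with $8,400 for its annual license.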

The startups are benefitting from a growing trend in recruiting. In response to the high demand for high-tech talent, many large organizations have assembled sourcing teams. These are specialized recruiting groups that look for highly qualified people, which include "passive candidates" who aren't necessarily looking for a job, says Sarah White, principal strategist with Sarah White Associates, an analysis firm that specializes in recruiting technology.

She thinks the idea could spread well beyond just recruiting software engineers. "Two years ago these products didn't even exist, but we are already seeing it go beyond the developer and software engineering area" to other technical disciplines and even sales and marketing, she says.

While people in other positions tend to have a smaller online footprint than open source software developers, there's still plenty to mine, these vendors argue, both in social media and in other areas, such as patent databases for engineering roles and PubMed in the healthcare field.

Congdon is a believer. "It will be interesting to watch the dynamics in the marketplace," he says. "In the future, your online body of work will speak more loudly in the recruiting process than will your resume and interviewing skills."

Different approaches

At one level, all of the vendors in this space do the same thing. "The base technical approach is not dissimilar to that of a public search engine," says RemarkableHire's Rothrock. But their approaches vary, as do the online sites that each crawls. And the tools are evolving on a monthly basis, both in terms of features and the number of sites on the Web that each monitors.

To identify potential candidates who are about to start looking for a new job, Entelo looks for "social insights" ranging from layoff announcements to changes to a person's social profile.

Gild Source's stock in trade lies in its rankings of developers' code stored on open source sites. "We predictively pull data only on developers," says Dr. Vivienne Ming, chief scientist.

Gild Source's service continuously crawls 65 social sites on the Web, including GitHub and Stack Overflow, where developers might hang out, answer questions and contribute code. It pulls in all of the data it finds, processes it, stores the results in a 20-plus gigabyte Mongo database, and assembles the far-flung data into more than eight million individual profiles that include both structured and unstructured data. Users of the service -- companies looking to fill jobs -- can filter results by categories such as location, degree or school, and can link back to code examples.

The results Gild Source offers up take into account how other people in online forums rank each person's expertise as well as Gild Source's evaluation of the code they've written for open source projects. It then issues an overall knowledge score as well as a ranking for specific skills and influence in the open source community.

For developers who don't contribute code to open source projects, Gild Source has developed predictive algorithms using Bayesian analysis. "We are deeply machine learning-driven," says Ming. "We can predict someone's skill level from the surrounding information. It's highly effective."
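Gild does not describe its models in detail, but the general idea of Bayesian prediction from "surrounding information" can be sketched as a prior belief that gets updated by each piece of indirect evidence. The Python function below is a textbook naive Bayes update in odds form, offered as an illustration of the technique rather than Gild's method; the function name and the example numbers are assumptions.

```python
def posterior_skilled(prior: float, likelihood_ratios: list[float]) -> float:
    """Update P(skilled) with independent pieces of evidence.

    Each ratio is P(evidence | skilled) / P(evidence | not skilled).
    Treating the evidence as independent is the 'naive' assumption.
    """
    odds = prior / (1.0 - prior)      # convert probability to odds
    for ratio in likelihood_ratios:
        odds *= ratio                 # each signal scales the odds
    return odds / (1.0 + odds)        # convert back to a probability
```

For example, starting from a prior of 0.2 and observing two signals that are three times and twice as likely among skilled developers, `posterior_skilled(0.2, [3.0, 2.0])` gives roughly 0.6. With no evidence at all, the function returns the prior unchanged.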

RemarkableHire uses what it calls "social evidence" that people are knowledgeable in a particular skill by looking, among other things, for recognition by their peers and indications that they've provided the best answers to questions posted online. "We look for signals within the content that someone has expertise in a particular skill," says Rothrock. The company then provides skills proficiency ratings of one to four stars for each subject.

Joy Garlock, manager of professional recruiting at Gannet Digital Division, has used RemarkableHire for the last few months to find and interview multiple candidates, and she extended two offers in the first month after signing up. (She declined to talk about the outcome of the offers.)

The candidates "weren't even looking," Garlock says. "This is an opportunity for us to be in their world as opposed to them coming to us."

TalentBin focuses on discovering talent rather than qualifying it, but the company does offer a "level of intensity" score in particular skills (such as the Ruby programming language) that correlate with the prospect's interest level in a given area, says CEO Kazanjy. He hopes to expand beyond TalentBin's core software engineering jobs to positions in engineering and healthcare by mining some 40 different online sources, including social media, vertical communities, online publications such as PubMed, mailing lists and patent databases. "This approach is extensible to any sort of knowledge worker," he argues.

The fledgling businesses have been successful enough to get the attention of at least one online job board. Dice.com, a tech-focused site, recently launched a similar service, called OpenWeb. That tool excels at the complex process of assembling the bits and pieces of data it gathers from across the Web into a master profile for each individual, analyst White says.

Building a profile

"It's an algorithmic challenge" to correctly match up data from various sources and assemble those into accurate master profiles with a high degree of confidence -- and that's part of the secret sauce, says Ming at Gild Source. It uses a three-stage process that ranges from evaluating common email addresses and user IDs to computer-based photo matching. While most matching can be done algorithmically, about 2% of the profiles still need manual attention, she says.

Megan Hopkins, director of human resources and talent at VigLink, a website link monetization service, used TalentBin to find Ruby engineers and quickly came up with a targeted list of potential candidates. "There's no weeding through crap to find good people," she says.
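The simplest part of the matching described in this section, linking records that share an email address or user ID, can be sketched with a union-find structure. The Python example below is a minimal illustration under that assumption. The record fields (`source`, `email`, `user_id`) and the function name are hypothetical, and the photo-matching and manual-review stages are not modeled.

```python
from collections import defaultdict

def merge_profiles(records: list[dict]) -> list[list[str]]:
    """Group records that share an email address or user ID, transitively."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen: dict[tuple[str, str], int] = {}
    for idx, rec in enumerate(records):
        for key in ("email", "user_id"):
            value = rec.get(key)
            if not value:
                continue
            ident = (key, value.lower())  # compare case-insensitively
            if ident in first_seen:
                union(idx, first_seen[ident])
            else:
                first_seen[ident] = idx

    groups: dict[int, list[str]] = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[find(idx)].append(rec["source"])
    return sorted(groups.values())
```

Given a GitHub record and a Stack Overflow record with the same user ID, plus a Twitter record that shares the GitHub email address, all three end up in one group even though the Twitter and Stack Overflow records have nothing in common directly. Real matching is harder than this, since common user IDs can belong to different people.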

Privacy issues

Do people who aren't actively looking for a job really want to be contacted? "I worried about that," says Joy Garlock, manager of professional recruiting at Gannet Digital Division. But so far, she says, the people she has contacted have been responsive. "We've received a lot of good feedback," she says. But most people probably don't know how the recruiter got their name or how much the recruiter knows about them.

Red Hat CIO Lee Congdon worries that, as these services become more popular, there may be legal, ethical or privacy implications associated with using data mining of the public Web to assemble dossiers on prospective candidates. But the vendors stress that all online data is fair game, and consultant Sarah White, of Sarah White Associates, agrees. "Anything you put out publicly is public information," she says.

On the other hand, a programmer who embeds his Skype ID and email address in source code likely did that to reach other programmers, not to provide a back-door contact mechanism for recruiters.

With the exception of RemarkableHire, most HR data-mining services don't have a clearly articulated mechanism for subjects to opt out, and subjects don't have the opportunity to review the profiles for accuracy or to correct erroneous data. White argues that this should be less of a problem if recruiters use the tools to find potential candidates, rather than as a way to exclude prospects.

But some recruiters do use the rankings from at least one product -- Gild Source -- in an "exclusionary sense" to pull up information on active job candidates and vet them, says Dr. Vivienne Ming, chief scientist at Gild Source.

Peter Kazanjy, CEO at TalentBin, says his firm focuses on discovering talent, rather than qualifying it. "We've stayed away from making claims that this person is more talented at this skill than this other person, because you get into this interesting situation where one person could be chosen over another based on these algorithms."

Then there's the question of how data gets interpreted and processed. "What if a couple of hot-headed software engineers get into a debate on Twitter and in the forums and some of that gets wrapped into the analysis in [a negative] way?" posits Ron Hanscome, research director for human capital management at Gartner Inc.

The implication is that part of the analysis for, say, the fit within a company might be negatively affected by an assessment of personality as hot-headed. Or it might be included or linked to in the profile in some way, creating a negative impression of the person.

While that's not the kind of data the services say they're looking for today, those are the kinds of things that could happen in the future, particularly if usage broadens to a larger portion of the workforce, Hanscome says.

Congdon wonders if privacy concerns could lead to regulatory issues, particularly in Europe. "An interpretation of the data may be erroneous. There must be a mechanism to redress that," he says. And if use of the technology expands, he argues, either vendors or the government will need to come up with guidelines to ensure fairness.

TalentBin scours many of the same sites as Gild Source but does not evaluate actual code attributable to each person. "There isn't [just] one good way to write something," Hopkins argues, and she worries that by focusing too much on such rankings she might exclude perfectly qualified candidates.

Hopkins has now expanded use of the service to include sales, product management, social media and analyst titles. She acknowledges that some titles may have a smaller social media footprint than does your typical software engineer, but TalentBin "still pulls enough information on people to get a good idea of who they are and what they like to do."

Hopkins also makes good use of the profiles of prospective candidates, following links back to their postings in places like GitHub, both to match up technically qualified candidates with her company's culture and to pitch them in a way that lines up with their interests. "Our response rate has been dramatically higher" than it was with LinkedIn Recruiter, the company's previous recruiting tool, she says.

"We get more emails and messages back," Hopkins says. "There's nothing more frustrating than reaching out to people all day and then getting only one response back," she says.

There are some looming privacy issues. (See related story, above.) For now, however, those don't seem to be stopping anyone. The technology continues to gain traction with recruiters, and the sweet spot remains tech hiring, although some are already laying the groundwork to broaden into other roles and disciplines. Ron Hanscome, research director for human capital management at Gartner, thinks any expansion may be limited to other fields such as electrical engineering or law.

Consultant White is more bullish. "I see a shift toward using the technology across the board for every type of job opening," she says.

While the tools won't do all of the work for hiring managers, they can provide a good starting point -- a qualified list of potential candidates. But the tools do vary in approach. So, says Hopkins, "If you're going to invest in this you have to try them all."

White agrees. "If you do demos with three or four vendors you'll get a better understanding of what they do, how this works, and which works best for the jobs for which you're hiring."

Not only will this trend continue, but data itself will eventually become a commodity, says TalentBin's Kazanjy. "Eventually, all of this information will be available to everyone." But for now, it's an arms race between vendors as to who has a bigger index and whose algorithms are better.

This story, "What IT recruiters know about you -- whether you're looking or not" was originally published by Computerworld.
