Role of Decision Tree Classifier in Your Hiring Process

It can be chaotic for the HR team to find the right candidate to hire for an open job position, because organizations receive hundreds, thousands, or even hundreds of thousands of applications in the firm’s resume database.

It becomes almost impossible, and certainly inefficient, to go through all the applications, resumes, and CVs one by one.

So what if there were a way to take historical data on who actually got hired and map that to things found on their resumes? We could construct a decision tree that lets us walk through an individual resume and tell us whether the candidate has a high likelihood of getting hired.

But What is a Decision Tree Classifier?

A decision tree classifier is an algorithm that essentially gives you a flow chart for making a decision.

For example, say you want to go outside and play tennis, but many different aspects of the weather might influence your decision: humidity, temperature, whether it’s sunny or not. When a decision like that depends on multiple attributes, multiple variables, a decision tree can be a good choice.
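Hand-written, that flow chart is just nested conditions. Here is a minimal sketch of the tennis decision as plain Python; the attributes and thresholds are made up for illustration, not learned from data:

```python
# A decision tree is essentially a learned flow chart. Hand-coded,
# the tennis decision might look like this (thresholds are invented):
def play_tennis(outlook: str, humidity: float, windy: bool) -> bool:
    """Return True if conditions favor playing tennis."""
    if outlook == "sunny":
        return humidity < 70          # sunny is fine unless it's humid
    elif outlook == "overcast":
        return True                   # overcast is always a yes
    else:                             # rainy
        return not windy              # rain is playable if it's calm

print(play_tennis("sunny", humidity=55, windy=False))   # True
```

A decision tree classifier learns this kind of branching structure automatically, instead of a human writing the rules by hand.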

How does the Decision Tree help with Hiring?

So let’s make some totally fabricated hiring data. Candidates are identified only by numerical IDs, and we’ll pick some attributes that might be interesting for predicting whether or not they’re a good hire:

  • How many years of work experience do they have?
  • Are they currently employed?
  • How many employers have they had previously?
  • What’s their level of education?
  • What degree do they have?
  • Did they go to what we classify as a top-tier school?
  • Did they do an internship while they were in college?

We can take a look at this historical data, where the dependent variable is Hired: did this person actually get a job offer or not, based on that information?
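To make that concrete, here is a hypothetical few rows of such data built by hand with pandas. The values are invented; the column names match the ones used in the code later in the post:

```python
import pandas as pd

# Fabricated examples of historical hiring data. Each row is one
# candidate; "Hired" is the dependent variable we want to predict.
df = pd.DataFrame({
    "Years Experience":   [10, 0, 7],
    "Employed?":          ["Y", "N", "N"],
    "Previous employers": [4, 0, 6],
    "Level of Education": ["BS", "BS", "MS"],
    "Top-tier school":    ["N", "Y", "N"],
    "Interned":           ["N", "Y", "Y"],
    "Hired":              ["Y", "Y", "N"],
})
print(df.shape)  # (3, 7)
```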

When to hire a candidate?

Internships are actually a pretty good predictor of how good a person is. If candidates have the initiative to go out, do an internship, and actually learn something at that internship, that’s a good sign.

When not to hire?

If the candidate has never held a job and never did an internship either, and those are things required for the job, then it's probably best to leave the candidate out of the hiring process.

Obviously, the decisions vary from position to position, and the employer can set the parameters for what a given job position requires.

What happens behind the decision tree algorithm?

( You can skip this section if you don’t have a tech background )

The Decision Tree algorithm can be used for solving both regression and classification problems. The goal of using a Decision Tree is to create a training model that can predict the class or value of the target variable by learning simple decision rules inferred from prior (training) data.

Decision trees use various criteria to decide how to split a node into two or more sub-nodes.

Each split aims to increase the homogeneity of the resulting sub-nodes. In other words, the purity of each node increases with respect to the target variable. The decision tree tries splits on all available variables and then selects the split which results in the most homogeneous sub-nodes.
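"Purity" is usually measured with entropy: a 50/50 mix of classes is maximally impure, a single-class node is perfectly pure, and a split is scored by how much it reduces entropy (the information gain). A minimal sketch from scratch, using invented labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A perfectly mixed node is maximally impure...
print(entropy(["Y", "N", "Y", "N"]))        # 1.0
# ...while a pure node has zero entropy.
print(entropy(["Y", "Y", "Y"]) == 0)        # True

# Information gain of a split = parent entropy minus the
# size-weighted entropy of the child nodes.
parent = ["Y", "Y", "N", "N"]
left, right = ["Y", "Y"], ["N", "N"]
gain = entropy(parent) - (len(left) / 4 * entropy(left)
                          + len(right) / 4 * entropy(right))
print(gain)  # 1.0 — this split separates the classes perfectly
```

The algorithm greedily picks the attribute and threshold with the highest gain at each node.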

Let us look at some algorithms used in Decision Trees:

  1. ID3: It uses a greedy strategy, selecting the locally best attribute to split the dataset on at each iteration.
  2. C4.5: It builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy, but with the gain ratio as its splitting criterion.
  3. CART: It can perform both classification and regression tasks, and it creates decision points using the Gini index, unlike ID3 or C4.5.
  4. CHAID, or Chi-square Automatic Interaction Detector, is a process that can deal with any type of variable, be it nominal, ordinal, or continuous. In regression trees it uses the F-test, and in classification trees the Chi-square test.
  5. MARS, or Multivariate Adaptive Regression Splines, is an analysis implemented mainly for regression problems where the data is mostly nonlinear in nature.
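For comparison with the entropy criterion, CART's Gini index is even simpler to compute; it measures the chance of mislabeling a random sample drawn from the node. A small sketch with made-up labels:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability of misclassifying a randomly drawn
    sample if it is labeled according to the node's class mix."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["Y", "N", "Y", "N"]))  # 0.5 — maximally mixed binary node
print(gini(["Y", "Y", "Y"]))       # 0.0 — pure node
```

Like entropy, Gini is zero for a pure node and largest for an even class mix, which is why scikit-learn (which implements CART) offers either as its `criterion` parameter.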

Let's Implement a Decision Tree

We will load some fake data on past hires. We use pandas to convert a CSV file into a DataFrame:

import numpy as np
import pandas as pd
from sklearn import tree

input_file = "PastHires.csv"
df = pd.read_csv(input_file, header=0)
df.head()

Output:

Scikit-learn needs everything to be numerical for decision trees to work. So, we’ll map Y and N to 1 and 0, and map the levels of education to a scale of 0–2. In the real world, you’d need to think about how to deal with unexpected or missing data! By using map(), we know we’ll get NaN for unexpected values.

d = {'Y': 1, 'N': 0}
df['Hired'] = df['Hired'].map(d)
df['Employed?'] = df['Employed?'].map(d)
df['Top-tier school'] = df['Top-tier school'].map(d)
df['Interned'] = df['Interned'].map(d)
d = {'BS': 0, 'MS': 1, 'PhD': 2}
df['Level of Education'] = df['Level of Education'].map(d)
df.head()

Output:

Next, we need to separate the features from the target column that we’re trying to build a decision tree for.

features = list(df.columns[:6])
features

Output

Now actually construct the decision tree:

y = df["Hired"]
X = df[features]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X,y)

To read this decision tree, each condition branches left for “true” and right for “false”. When you end up at a leaf, the value array shows how many samples of each target class exist at that point. So value = [0. 5.] means there are 0 “no hires” and 5 “hires” by the time we get to that node, and value = [3. 0.] means 3 no-hires and 0 hires.

from io import StringIO  # sklearn.externals.six was removed from newer scikit-learn
from IPython.display import Image
import pydotplus

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data,
                     feature_names=features)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

Output:
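Beyond visualizing the tree, the fitted classifier can score new resumes directly with predict(). Since PastHires.csv isn't reproduced here, this sketch fits on a tiny made-up, already-numeric dataset with the same six feature columns:

```python
from sklearn import tree

# Stand-in for the mapped hiring data. Columns: [years experience,
# employed?, previous employers, education level, top-tier school,
# interned]; y is the Hired column (1 = hired).
X = [
    [10, 1, 4, 0, 0, 0],
    [0,  0, 0, 0, 1, 1],
    [7,  0, 6, 0, 0, 0],
    [2,  1, 1, 1, 1, 0],
]
y = [1, 1, 0, 1]

clf = tree.DecisionTreeClassifier().fit(X, y)

# Score a never-employed candidate who did an internship.
print(clf.predict([[0, 0, 0, 0, 1, 1]]))   # [1] — predicted "hire"
```

This is how the tree would actually be used in a screening pipeline: each incoming resume is mapped to the same numeric features and run through predict().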

Thank you for reading this post. I hope you enjoyed it and learned something new today. Feel free to contact me through my blog if you have questions; I will be more than happy to help.

Stay safe and Happy learning!