We introduced Attrition Analysis in Human Resources Management using the Kaplan and Meier Estimation in a previous post of this blog. In summary, the Kaplan and Meier Estimation (a non-parametric method) defines a survival function \(S(t)\) that gives the probability that an employee is still in the organization at time \(t\), that is, that the employee leaves later than \(t\).
In the previous post, we explained how to calculate the survival function for all employees, interpret the results, and plot it.
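As a quick reminder of how the estimate is built, here is a minimal base R sketch of the product-limit calculation behind \(S(t)\). The vectors `tenure` and `left` are made-up toy values for illustration only, not data from the previous post.

```r
# Hypothetical toy data: months of tenure and whether the employee left (1)
# or was still employed when observation ended (0, i.e. censored)
tenure <- c(3, 5, 5, 8, 12, 12)
left   <- c(1, 1, 0, 1, 0, 0)

# Distinct times at which at least one employee left
event_times <- sort(unique(tenure[left == 1]))

# Employees still at risk just before each event time
at_risk <- sapply(event_times, function(t) sum(tenure >= t))

# Employees who left exactly at each event time
events <- sapply(event_times, function(t) sum(tenure == t & left == 1))

# Kaplan-Meier estimate: running product of (1 - events / at_risk)
km <- data.frame(
  time     = event_times,
  at_risk  = at_risk,
  events   = events,
  survival = cumprod(1 - events / at_risk)
)
print(km)
```

In practice, the same estimate is usually obtained with `survfit(Surv(tenure, left) ~ 1)` from the survival package; the manual version above only makes the arithmetic visible.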

Attrition Analysis is one of the most common analyses that companies carry out. Losing talented employees, and then finding and training replacements, can take a long time, erode competitive advantage, and generate high costs.
Attrition Analysis belongs to the Survival Analysis category, and there are three basic approaches to it:
- Parametric methods
- Semi-parametric methods
- Non-parametric methods
Today, we will see the most popular non-parametric method for attrition analysis: the Kaplan and Meier Estimation.

Your colleagues may decide to work with other data analysis software, such as SPSS, Stata, or SAS. Each software package has its own format for saving and sharing data sets. For example, in R, we have the RData format. The basic information in all these formats is the same. Still, how they organize data sets, the different types of classes they use, and how they manage some kinds of information (e.

When we want to analyze text data, it’s highly recommended to convert the data set to a Tidy Data Structure, which has three characteristics:
- Each variable is a column
- Each observation (token) is a row
- Each type of observational unit is a table
The name of this process is tokenization, and there are different approaches. To tokenize our text data, we can use the function unnest_tokens() from the package tidytext.
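To make the one-token-per-row idea concrete, here is a minimal sketch. The data frame `docs` and its two sentences are invented for illustration. The first part builds the tidy structure by hand in base R with a simple letters-only split (an assumption that is cruder than what tidytext does); the second part calls unnest_tokens() only if tidytext is installed.

```r
# Hypothetical toy corpus: one row per document
docs <- data.frame(
  doc_id = c(1, 2),
  text   = c("Employees value flexible work",
             "Flexible work reduces attrition"),
  stringsAsFactors = FALSE
)

# Base R sketch: lowercase the text and split on anything that is not a letter
words_per_doc <- strsplit(tolower(docs$text), "[^a-z]+")

# One row per token, keeping track of the document each token came from
tidy_tokens <- data.frame(
  doc_id = rep(docs$doc_id, lengths(words_per_doc)),
  word   = unlist(words_per_doc),
  stringsAsFactors = FALSE
)
print(tidy_tokens)

# The tidytext way, run only when the package is available
if (requireNamespace("tidytext", quietly = TRUE)) {
  tidy_tokens_tt <- tidytext::unnest_tokens(docs, output = word, input = text)
  print(tidy_tokens_tt)
}
```

Both versions return a table with one word per row, which is the shape that counting, filtering, and joining operations expect.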

One of the most common statistical hypothesis tests when analyzing populations is the Student’s t-test. We mainly use this test to evaluate the value of a characteristic in a population based on a sample. It’s important to highlight that, to carry out the Student’s t-test, the sample needs to follow a normal distribution (an assumption called normality). Let’s see three examples.
First example
First, we load the data set employees_for_ttest, which we can find in the package HRdatsets.
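The general workflow of a normality check followed by a one-sample t-test can be sketched in base R as follows. Because the contents of employees_for_ttest are not shown here, the vector `satisfaction`, its simulated values, and the hypothesized mean of 6.5 are all assumptions standing in for a real column of that data set.

```r
set.seed(42)

# Hypothetical stand-in for a numeric employee characteristic
satisfaction <- rnorm(30, mean = 7, sd = 1.5)

# Step 1: check the normality assumption with the Shapiro-Wilk test.
# A large p-value means we have no evidence against normality.
normality <- shapiro.test(satisfaction)
print(normality$p.value)

# Step 2: one-sample Student's t-test of H0: population mean equals 6.5
result <- t.test(satisfaction, mu = 6.5)
print(result)
```

With the real data, the simulated vector would be replaced by the relevant column, and the value passed to `mu` by the population value being tested.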