You can convert a DataFrame column containing integers (like categorical class IDs or day numbers) into one-hot encoding easily using either pandas or scikit-learn.
Here are the main methods:
pd.get_dummies() (most common and simple)
import pandas as pd
df = pd.DataFrame({
'DayNum': [0, 1, 2, 0, 3]
})
# One-hot encode (dtype=int gives 0/1 columns; pandas 2.x defaults to True/False)
one_hot = pd.get_dummies(df['DayNum'], prefix='Day', dtype=int)
# Combine back with original dataframe if needed
df_encoded = pd.concat([df, one_hot], axis=1)
print(df_encoded)
Output:
   DayNum  Day_0  Day_1  Day_2  Day_3
0       0      1      0      0      0
1       1      0      1      0      0
2       2      0      0      1      0
3       0      1      0      0      0
4       3      0      0      0      1
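One caveat with pd.get_dummies() is that it only creates columns for values that actually occur in the data. The sketch below shows one way around that, assuming the full set of valid IDs is known in advance. The range 0 to 6 and the names all_days and one_hot_fixed are illustrative assumptions, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({'DayNum': [0, 1, 2, 0, 3]})

# Assumption: valid values are 0..6 (e.g. days of the week)
all_days = list(range(7))

# Declaring the categories up front makes get_dummies emit one column
# per category, including categories absent from this particular data
day_cat = df['DayNum'].astype(pd.CategoricalDtype(categories=all_days))
one_hot_fixed = pd.get_dummies(day_cat, prefix='Day', dtype=int)

print(one_hot_fixed.columns.tolist())
```

With this approach, two DataFrames that contain different subsets of the IDs still end up with the same columns in the same order.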
sklearn.preprocessing.OneHotEncoder
This is better when you need to apply the same transformation to train/test sets consistently.
from sklearn.preprocessing import OneHotEncoder
import pandas as pd
df = pd.DataFrame({'DayNum': [0, 1, 2, 0, 3]})
# sparse_output requires scikit-learn >= 1.2 (older versions use sparse=False)
encoder = OneHotEncoder(sparse_output=False)
one_hot = encoder.fit_transform(df[['DayNum']])
# Create DataFrame with readable column names; reuse df's index so concat aligns rows
one_hot_df = pd.DataFrame(one_hot, columns=encoder.get_feature_names_out(['DayNum']), index=df.index)
df_encoded = pd.concat([df, one_hot_df], axis=1)
print(df_encoded)
Output:
   DayNum  DayNum_0  DayNum_1  DayNum_2  DayNum_3
0       0       1.0       0.0       0.0       0.0
1       1       0.0       1.0       0.0       0.0
2       2       0.0       0.0       1.0       0.0
3       0       1.0       0.0       0.0       0.0
4       3       0.0       0.0       0.0       1.0
np.eye() (manual approach with NumPy)
import numpy as np
import pandas as pd
df = pd.DataFrame({'DayNum': [0, 1, 2, 0, 3]})
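# Size the identity matrix by the largest ID rather than the number of distinct IDs,
# so gaps in the values (e.g. 0, 2, 5) do not raise an IndexError.
# Assumes the IDs are non-negative integers.
n_classes = df['DayNum'].max() + 1
one_hot = np.eye(n_classes)[df['DayNum'].to_numpy()]
one_hot_df = pd.DataFrame(one_hot, columns=[f'Day_{i}' for i in range(n_classes)], index=df.index)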
df_encoded = pd.concat([df, one_hot_df], axis=1)
print(df_encoded)
If you plan to use this column as input to an LSTM, you might not need one-hot encoding at all. Instead, you can use an embedding layer to learn a continuous representation of your integer categories.
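As a rough illustration of what an embedding layer does, the sketch below uses plain NumPy with a random table standing in for learned weights. In a real model the table would be a trainable parameter of a framework layer such as torch.nn.Embedding or the Keras Embedding layer. The embedding_dim value here is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes = 4       # number of distinct integer categories
embedding_dim = 3   # hypothetical size of each learned vector

# In a real model this table is a trainable weight; here it is random
embedding_table = rng.normal(size=(n_classes, embedding_dim))

day_nums = np.array([0, 1, 2, 0, 3])

# An embedding lookup is just row indexing into the table
embedded = embedding_table[day_nums]

# It is equivalent to one-hot encoding followed by a matrix product
one_hot = np.eye(n_classes)[day_nums]
assert np.allclose(embedded, one_hot @ embedding_table)

print(embedded.shape)  # (5, 3)
```

The lookup yields the same vectors as multiplying the one-hot matrix by the table, which is why an embedding layer can take the integer IDs directly and skip the one-hot step.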