user
How to convert a DataFrame's column containing integers into one-hot encoding?
alphonsio

You can convert a DataFrame column containing integers (like categorical class IDs or day numbers) into one-hot encoding easily using either pandas or scikit-learn.

Here are the main methods:


Method 1 — Using pd.get_dummies() (most common and simple)

import pandas as pd

df = pd.DataFrame({
    'DayNum': [0, 1, 2, 0, 3]
})

# One-hot encode; dtype=int gives 0/1 columns
# (pandas >= 2.0 returns True/False by default)
one_hot = pd.get_dummies(df['DayNum'], prefix='Day', dtype=int)

# Combine back with original dataframe if needed
df_encoded = pd.concat([df, one_hot], axis=1)

print(df_encoded)

Output:

   DayNum  Day_0  Day_1  Day_2  Day_3
0       0      1      0      0      0
1       1      0      1      0      0
2       2      0      0      1      0
3       0      1      0      0      0
4       3      0      0      0      1
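
Two variations on Method 1 are often useful; the following is a minimal sketch rather than the only way to do it. The first passes the whole DataFrame with `columns=`, which replaces the original column with its indicator columns so no `concat` step is needed. The second declares the full set of categories up front, so that values missing from the data still get a column. The range 0 to 6 is an assumption made for illustration (seven day numbers).

```python
import pandas as pd

df = pd.DataFrame({'DayNum': [0, 1, 2, 0, 3]})

# Variation A: encode in place; 'DayNum' is replaced by Day_0 ... Day_3
encoded = pd.get_dummies(df, columns=['DayNum'], prefix='Day', dtype=int)

# Variation B: fix the category set first (0..6 is an assumed range),
# so every category gets a column even if it never occurs in the data
all_days = pd.CategoricalDtype(categories=list(range(7)))
encoded_full = pd.get_dummies(df['DayNum'].astype(all_days), prefix='Day', dtype=int)

print(encoded)
print(encoded_full)
```

Variation B is one way to keep the column layout stable across different DataFrames when using pandas alone.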

Method 2 — Using sklearn.preprocessing.OneHotEncoder

This is better when you need to apply the same transformation to train/test sets consistently.

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

df = pd.DataFrame({'DayNum': [0, 1, 2, 0, 3]})

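# sparse_output=False returns a dense NumPy array
# (scikit-learn >= 1.2; older releases call this parameter `sparse`)
encoder = OneHotEncoder(sparse_output=False)
one_hot = encoder.fit_transform(df[['DayNum']])

# Create DataFrame with readable column names; reuse df's index so concat aligns rows
one_hot_df = pd.DataFrame(one_hot, columns=encoder.get_feature_names_out(['DayNum']), index=df.index)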

df_encoded = pd.concat([df, one_hot_df], axis=1)
print(df_encoded)

Output:

   DayNum  DayNum_0  DayNum_1  DayNum_2  DayNum_3
0       0       1.0       0.0       0.0       0.0
1       1       0.0       1.0       0.0       0.0
2       2       0.0       0.0       1.0       0.0
3       0       1.0       0.0       0.0       0.0
4       3       0.0       0.0       0.0       1.0

Method 3 — Manually with NumPy (if you want full control)

import numpy as np
import pandas as pd

df = pd.DataFrame({'DayNum': [0, 1, 2, 0, 3]})
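# Size the identity matrix from the largest ID, not the number of distinct
# values: with gaps in the IDs (e.g. 0, 2, 5), nunique() would be too small
# and the indexing below would raise an IndexError.
# This assumes the IDs are non-negative integers.
n_classes = df['DayNum'].max() + 1

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes)[df['DayNum'].to_numpy()]
one_hot_df = pd.DataFrame(one_hot, columns=[f'Day_{i}' for i in range(n_classes)], index=df.index)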

df_encoded = pd.concat([df, one_hot_df], axis=1)
print(df_encoded)

Tip

If you plan to use this column for an LSTM, you might not need one-hot encoding — instead, you can use an embedding layer to learn a continuous representation of your integer categories.
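
To see why an embedding layer can stand in for one-hot encoding, note that multiplying a one-hot row by a weight matrix simply selects one row of that matrix. The NumPy sketch below illustrates this; it is not a trainable layer. The matrix `W` is random and stands in for learned weights, and the embedding size of 2 is an arbitrary assumption.

```python
import numpy as np

ids = np.array([0, 1, 2, 0, 3])   # integer categories
n_classes, embed_dim = 4, 2       # embed_dim is an arbitrary choice

rng = np.random.default_rng(0)
W = rng.normal(size=(n_classes, embed_dim))  # stand-in for learned weights

one_hot = np.eye(n_classes)[ids]
via_one_hot = one_hot @ W   # dense layer applied to one-hot input
via_lookup = W[ids]         # row lookup, which is what an embedding layer does

# Both routes give the same vectors; the lookup skips building the one-hot matrix
print(np.allclose(via_one_hot, via_lookup))
```

Deep learning frameworks provide this lookup as a trainable layer, so the integer column can be fed to the model directly.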