Feature engineering with Pandas

preprocessing, eda, pd
import pandas as pd

# `train` and `dataset` are Titanic-style DataFrames loaded elsewhere.

# Length of a string column
train['Name_length'] = train['Name'].apply(len)

# Create a binary variable: 0 if Cabin is missing (NaN), 1 otherwise
train['Has_Cabin'] = train['Cabin'].apply(lambda x: 0 if pd.isna(x) else 1)

# Create a binary variable using a filter
dataset['IsAlone'] = 0
dataset.loc[dataset['FamilySize'] == 1, 'IsAlone'] = 1

# Fill NaNs with the median or another statistic
dataset['Fare'] = dataset['Fare'].fillna(train['Fare'].median())

# Divide a variable into quantiles (4 bins with roughly equal counts)
train['CategoricalFare'] = pd.qcut(train['Fare'], 4)

# Divide a variable into equal-range bins (5 bins of equal width)
train['CategoricalAge'] = pd.cut(train['Age'], 5)

# Apply a function that receives a value from a series and returns the value
# of a new feature (get_title is defined elsewhere)
dataset['Title'] = dataset['Name'].apply(get_title)

# Replace problematic values
dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')

# Replace in batch with a mapping
dataset['Sex'] = dataset['Sex'].map({'female': 0, 'male': 1}).astype(int)

# Assign a value to a whole filtered selection
dataset.loc[(dataset['Fare'] > 7.91) & (dataset['Fare'] <= 14.454), 'Fare'] = 1

# Batch replacing in a column (weirdTitlesList is a list of titles defined elsewhere)
dataset['Title'] = dataset['Title'].replace(weirdTitlesList, 'Rare')

# Fill NaNs with specific guesses for subgroups
# (i, j and myGuess come from an enclosing loop over Sex and Pclass values)
dataset.loc[(dataset.Age.isnull()) & (dataset.Sex == i) & (dataset.Pclass == j), 'Age'] = myGuess
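The snippet above uses `get_title`, `i`, `j` and `myGuess` without defining them. The following is a minimal, self-contained sketch of how those pieces might fit together. Everything in it is an assumption for illustration: the toy `dataset` rows are made up, the regular expression in `get_title` is one plausible way to pull an honorific out of a "Surname, Title. Given" name, and the subgroup median is only one possible choice of guess.

```python
import re

import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the Titanic-style `dataset` (hypothetical rows).
dataset = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Heikkinen, Miss. Laina",
             "Smith, Ms. Anna", "Doe, Mr. John"],
    "Sex": [1, 0, 0, 1],          # already mapped: female -> 0, male -> 1
    "Pclass": [3, 3, 1, 3],
    "Age": [22.0, 26.0, np.nan, np.nan],
})


def get_title(name):
    """Hypothetical helper: return the word between the comma and the period."""
    match = re.search(r",\s*([A-Za-z]+)\.", name)
    return match.group(1) if match else ""


# Derive the title feature, then normalise a problematic value.
dataset["Title"] = dataset["Name"].apply(get_title).replace("Ms", "Miss")

# Fill missing ages with the median age of each (Sex, Pclass) subgroup.
for i in dataset["Sex"].unique():
    for j in dataset["Pclass"].unique():
        in_group = (dataset["Sex"] == i) & (dataset["Pclass"] == j)
        my_guess = dataset.loc[in_group, "Age"].median()
        # Skip subgroups with no known ages: their median is NaN.
        if pd.notna(my_guess):
            dataset.loc[in_group & dataset["Age"].isnull(), "Age"] = my_guess
```

Subgroups with no known ages are left as NaN here, so a fallback such as the overall median may still be needed afterwards.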