Why pandas-select?
pandas-select is a collection of DataFrame selectors that facilitates indexing
and selecting data. The main goal is to bring the power of tidyselect
to pandas.
Emphasise readability and conciseness by cutting boilerplate:
# pandas-select
df[AllNumeric()]
# vanilla
df[df.select_dtypes("number").columns]
# pandas-select
df[StartsWith("Type") | "Legendary"]
# vanilla
df.loc[:, df.columns.str.startswith("Type") | (df.columns == "Legendary")]
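To make the comparison concrete, here is a minimal runnable sketch of the vanilla expressions on a toy DataFrame. The data and column names are made up for illustration, and only plain pandas is used, so it runs without pandas-select installed.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
df = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Mewtwo"],
        "Type 1": ["Grass", "Psychic"],
        "Type 2": ["Poison", None],
        "HP": [45, 106],
        "Legendary": [False, True],
    }
)

# Vanilla counterpart of df[AllNumeric()]: find the numeric column
# labels first, then index the DataFrame with them.
numeric_cols = df.select_dtypes("number").columns
numeric = df[numeric_cols]

# Vanilla counterpart of df[StartsWith("Type") | "Legendary"]:
# build a boolean mask over the column labels, then select with .loc.
mask = df.columns.str.startswith("Type") | (df.columns == "Legendary")
typed = df.loc[:, mask]

print(list(numeric.columns))  # ['HP']
print(list(typed.columns))    # ['Type 1', 'Type 2', 'Legendary']
```

Note that the boolean column is not treated as numeric by `select_dtypes("number")`, and that the mask keeps the columns in their original order.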
Ease the challenges of indexing with a hierarchical index and offer an alternative to slicers when the labels cannot be listed manually.
# pandas-select
df_mi.loc[Contains("Jeff", axis="index", level="Name")]
# vanilla
df_mi.loc[df_mi.index.get_level_values("Name").str.contains("Jeff")]
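The vanilla expression above can be tried on a small hierarchical index. This is a sketch only: the level names, labels, and values are hypothetical, and it uses plain pandas rather than pandas-select.

```python
import pandas as pd

# Hypothetical two-level index; names and values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("Sales", "Jeff Smith"), ("Sales", "Ann Lee"), ("IT", "Jeffrey Wu")],
    names=["Dept", "Name"],
)
df_mi = pd.DataFrame({"Salary": [50, 60, 70]}, index=index)

# Extract the labels of the "Name" level, test each one for the
# substring, and use the resulting boolean mask to filter the rows.
mask = df_mi.index.get_level_values("Name").str.contains("Jeff")
selected = df_mi.loc[mask]

print(selected)
```

Both "Jeff Smith" and "Jeffrey Wu" are kept because the match is a substring test on a single level, which is the case where listing every label by hand becomes impractical.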
Play well with machine learning applications:
Respect the column order.
Allow deferred selection when the DataFrame’s columns are not known in advance, for example in automated machine learning applications.
Offer integration with scikit-learn.
from pandas_select import AnyOf, AllBool, AllNominal, AllNumeric, ColumnSelector
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

ct = make_column_transformer(
    (StandardScaler(), ColumnSelector(AllNumeric() & ~AnyOf("Generation"))),
    (OneHotEncoder(), ColumnSelector(AllNominal() | AllBool() | "Generation")),
)
ct.fit_transform(df)
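The ideas of deferred selection and respecting column order can be illustrated with a small pure-Python sketch. This is not how pandas-select is implemented: the `Selector` class, its `resolve` method, and the `starts_with` and `any_of` helpers are hypothetical names invented for this example. The point is only that a selector can store a rule, be combined with `|`, `&` and `~`, and be resolved later against whatever columns turn up.

```python
from typing import Callable, List, Sequence


class Selector:
    """Hypothetical deferred selector: stores a predicate, not column names."""

    def __init__(self, predicate: Callable[[str], bool]) -> None:
        self.predicate = predicate

    def __or__(self, other: "Selector") -> "Selector":
        return Selector(lambda c: self.predicate(c) or other.predicate(c))

    def __and__(self, other: "Selector") -> "Selector":
        return Selector(lambda c: self.predicate(c) and other.predicate(c))

    def __invert__(self) -> "Selector":
        return Selector(lambda c: not self.predicate(c))

    def resolve(self, columns: Sequence[str]) -> List[str]:
        # Walk the columns in their given order, so the result follows
        # the DataFrame's column order, not the order of composition.
        return [c for c in columns if self.predicate(c)]


def starts_with(prefix: str) -> Selector:
    return Selector(lambda c: c.startswith(prefix))


def any_of(*labels: str) -> Selector:
    return Selector(lambda c: c in labels)


# The selectors are built before any columns are known.
wanted = any_of("Legendary") | starts_with("Type")
rest = ~wanted

# Resolution happens later, once the actual columns are available.
columns = ["Name", "Type 1", "Type 2", "HP", "Legendary"]
print(wanted.resolve(columns))  # ['Type 1', 'Type 2', 'Legendary']
print(rest.resolve(columns))    # ['Name', 'HP']
```

Although `"Legendary"` comes first in the composed expression, it appears last in the result because resolution follows the order of `columns`.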