Why pandas-select?

pandas-select is a collection of DataFrame selectors that facilitates indexing and selecting data. The main goal is to bring the power of tidyselect to pandas.

Fully compatible with the DataFrame [] operator and the loc[] accessor.

Emphasise readability and conciseness by cutting boilerplate:

# pandas-select
df[AllNumeric()]
# vanilla
df[df.select_dtypes("number").columns]

# pandas-select
df[StartsWith("Type") | "Legendary"]
# vanilla
df.loc[:, df.columns.str.startswith("Type") | (df.columns == "Legendary")]
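
To try the vanilla side of the comparison, here is a minimal sketch on a made-up frame. The column names and values are illustrative assumptions loosely modelled on the snippets above, and only plain pandas calls are used.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the snippets above.
df = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Mewtwo"],
        "Type 1": ["Grass", "Psychic"],
        "Type 2": ["Poison", None],
        "HP": [45, 106],
        "Legendary": [False, True],
    }
)

# Numeric columns only; boolean columns are not matched by "number".
numeric = df.select_dtypes("number")

# Columns whose label starts with "Type", plus the "Legendary" column.
mask = df.columns.str.startswith("Type") | (df.columns == "Legendary")
typed = df.loc[:, mask]

print(list(numeric.columns))
print(list(typed.columns))
```

The boolean mask has to be rebuilt by hand for every new condition, which is the boilerplate the selectors are meant to remove.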

Ease the challenges of indexing with a hierarchical index and offer an alternative to slicers when the labels cannot be listed manually.

# pandas-select
df_mi.loc[Contains("Jeff", axis="index", level="Name")]

# vanilla
df_mi.loc[df_mi.index.get_level_values("Name").str.contains("Jeff")]
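
The vanilla form above can be run end to end on a small hierarchical index. This is only a sketch: the level names, labels and salaries are invented for illustration, and `df_mi` is built here because the original data is not shown.

```python
import pandas as pd

# Hypothetical two-level index; names and values are made up.
index = pd.MultiIndex.from_tuples(
    [("Sales", "Jeff Smith"), ("Sales", "Ann Lee"), ("IT", "Jeffrey Wu")],
    names=["Dept", "Name"],
)
df_mi = pd.DataFrame({"Salary": [100, 120, 90]}, index=index)

# Build a boolean mask from a single level of the index,
# then use it to filter the rows.
mask = df_mi.index.get_level_values("Name").str.contains("Jeff")
jeffs = df_mi.loc[mask]

print(jeffs)
```

A slicer would need every matching label spelled out, whereas the mask matches any label containing "Jeff" without listing them.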

Play well with machine learning applications.

Respect the column order.
Allow deferred selection when the DataFrame’s columns are not known in advance, for example in automated machine learning applications.

Offer integration with scikit-learn.

from pandas_select import AnyOf, AllBool, AllNominal, AllNumeric, ColumnSelector
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

ct = make_column_transformer(
    (StandardScaler(), ColumnSelector(AllNumeric() & ~AnyOf("Generation"))),
    (OneHotEncoder(), ColumnSelector(AllNominal() | AllBool() | "Generation")),
)
ct.fit_transform(df)
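
The idea of deferred selection mentioned earlier can be illustrated with plain pandas, which accepts a callable in loc[] and invokes it with the DataFrame at indexing time. The sketch below is not how pandas-select implements its selectors; `numeric_except` is a hypothetical helper and the data is made up.

```python
import pandas as pd


def numeric_except(*excluded):
    """Return a callable that picks numeric columns, minus `excluded`, when called."""

    def select(frame):
        cols = frame.select_dtypes("number").columns
        # Iterating over the frame's own columns keeps their original order.
        return [c for c in cols if c not in excluded]

    return select


# The selector is defined before any DataFrame exists.
selector = numeric_except("Generation")

# Hypothetical data that only becomes available later.
df = pd.DataFrame(
    {
        "HP": [45, 106],
        "Generation": [1, 1],
        "Name": ["Bulbasaur", "Mewtwo"],
        "Speed": [45, 130],
    }
)

# pandas calls `selector(df)` and uses the returned labels.
selected = df.loc[:, selector]

print(list(selected.columns))
```

Because the columns are resolved only when the frame is indexed, the same selector can be reused on frames whose schema is not known in advance.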