Why pandas-select?

pandas-select is a collection of DataFrame selectors that facilitates indexing and selecting data. The main goal is to bring the power of tidyselect to pandas.

Fully compatible with the DataFrame [] operator and the loc[] accessor.

Emphasise readability and conciseness by cutting boilerplate:

# pandas-select
df[AllNumeric()]
# vanilla
df[df.select_dtypes("number").columns]

# pandas-select
df[StartsWith("Type") | "Legendary"]
# vanilla
df.loc[:, df.columns.str.startswith("Type") | (df.columns == "Legendary")]
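To make the comparison concrete, the vanilla expressions can be run on a small frame. The sketch below uses plain pandas only; the DataFrame and its column names are assumptions made up for illustration, not data shipped with this project.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumed for illustration only.
df = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Mewtwo"],
        "Type 1": ["Grass", "Psychic"],
        "Type 2": ["Poison", None],
        "HP": [45, 106],
        "Legendary": [False, True],
    }
)

# Numeric columns only; select_dtypes("number") leaves out bool and object columns.
numeric = df.select_dtypes("number")

# Columns whose label starts with "Type", plus the "Legendary" column.
mask = df.columns.str.startswith("Type") | (df.columns == "Legendary")
typed = df.loc[:, mask]

print(list(numeric.columns))  # ['HP']
print(list(typed.columns))    # ['Type 1', 'Type 2', 'Legendary']
```

The boolean mask keeps the columns in their original order, which is the behaviour the selector version is meant to match with less typing.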

Ease the challenges of indexing with a hierarchical index and offer an alternative to slicers when the labels cannot be listed manually:

# pandas-select
df_mi.loc[Contains("Jeff", axis="index", level="Name")]

# vanilla
df_mi.loc[df_mi.index.get_level_values("Name").str.contains("Jeff")]
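The vanilla expression can be tried on a small MultiIndex frame. This is a sketch in plain pandas; the level names, labels, and the "Score" column are hypothetical and chosen only to make the example runnable.

```python
import pandas as pd

# Hypothetical two-level index; level names and labels are assumed for illustration.
index = pd.MultiIndex.from_tuples(
    [("A", "Jeff Smith"), ("A", "Ann Lee"), ("B", "Jeffrey Wu")],
    names=["Team", "Name"],
)
df_mi = pd.DataFrame({"Score": [1, 2, 3]}, index=index)

# Build a boolean mask from a single index level, then pass it to loc.
mask = df_mi.index.get_level_values("Name").str.contains("Jeff")
jeffs = df_mi.loc[mask]

print(jeffs.index.get_level_values("Name").tolist())  # ['Jeff Smith', 'Jeffrey Wu']
```

A slicer would require listing every matching label by hand, whereas the mask is computed from the labels themselves.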

Play well with machine learning applications.

  • Respect the column order.

  • Allow deferred selection when the DataFrame’s columns are not known in advance, for example in automated machine learning applications.

  • Offer integration with Scikit-learn.

    from pandas_select import AnyOf, AllBool, AllNominal, AllNumeric, ColumnSelector
    from sklearn.compose import make_column_transformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    
    ct = make_column_transformer(
        (StandardScaler(), ColumnSelector(AllNumeric() & ~AnyOf("Generation"))),
        (OneHotEncoder(), ColumnSelector(AllNominal() | AllBool() | "Generation")),
    )
    ct.fit_transform(df)
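The idea behind deferred selection can be illustrated with plain pandas, which accepts a callable as an indexer and evaluates it only when a DataFrame is supplied. The sketch below is not how pandas-select is implemented; `starts_with` is a hypothetical helper and both frames are made-up data.

```python
import pandas as pd


def starts_with(prefix):
    """Hypothetical helper: build a selector that is resolved later, per frame."""

    def select(frame):
        # Evaluated only when pandas calls the selector with an actual frame.
        return frame.columns[frame.columns.str.startswith(prefix)].tolist()

    return select


# Defined once, before any DataFrame exists.
type_columns = starts_with("Type")

# Two made-up frames with different columns.
df_a = pd.DataFrame({"Type 1": ["Grass"], "HP": [45], "Type 2": ["Poison"]})
df_b = pd.DataFrame({"Speed": [90], "Type": ["Fire"]})

print(df_a.loc[:, type_columns].columns.tolist())  # ['Type 1', 'Type 2']
print(df_b.loc[:, type_columns].columns.tolist())  # ['Type']
```

The same selector resolves to different columns for each frame, which is what makes this style useful when the schema is only known at fit time. Scikit-learn's column transformer likewise accepts a callable in place of an explicit column list.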