Selection by labels

Core

Exact(values[, axis, level])

Select labels from a list, in the order in which they appear in the list.

Everything([axis, level])

Select all labels.

LabelMask(cond[, axis, level])

Select labels where the condition is True.
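To make the difference between the three core selectors concrete, here is a minimal plain-Python model. The functions exact, everything and label_mask are hypothetical stand-ins, not pandas_select's implementation. It assumes a selector can be thought of as a function from the frame's labels to a list of selected labels, and it models cond as a sequence of booleans aligned with the labels. The real selector's accepted forms of cond, and its handling of missing labels, are not covered here.

```python
from typing import Hashable, List, Sequence

# Hypothetical plain-Python model of the core selectors (not pandas_select's
# code): each function takes the frame's labels and returns the selection.


def exact(labels: Sequence[Hashable], values: Sequence[Hashable]) -> List[Hashable]:
    # Result order follows `values`, not `labels`. Missing labels are not
    # modeled; this sketch assumes every value exists in `labels`.
    return list(values)


def everything(labels: Sequence[Hashable]) -> List[Hashable]:
    # Every label, in the frame's own order.
    return list(labels)


def label_mask(labels: Sequence[Hashable], cond: Sequence[bool]) -> List[Hashable]:
    # `cond` is modeled as booleans aligned with `labels` (an assumption).
    if len(cond) != len(labels):
        raise ValueError("cond must have one entry per label")
    return [lab for lab, keep in zip(labels, cond) if keep]


columns = ["A", "B", "C"]
print(exact(columns, ["C", "A"]))                # ['C', 'A']
print(everything(columns))                       # ['A', 'B', 'C']
print(label_mask(columns, [True, False, True]))  # ['A', 'C']
```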

List selectors

AnyOf(values[, axis, level])

Select labels from a list.

AllOf(values[, axis, level])

Same as AnyOf, except that a KeyError is raised for labels that don’t exist.
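The only documented difference between AnyOf and AllOf is the KeyError for missing labels. The sketch below models that difference in plain Python with two hypothetical functions, any_of and all_of. It also assumes that AnyOf treats its values as a set, so the result follows the frame's label order rather than the order of values; that ordering is an assumption, not a documented guarantee.

```python
from typing import Hashable, Iterable, List, Sequence

# Hypothetical model of the list selectors (not pandas_select's code).


def any_of(labels: Sequence[Hashable], values: Iterable[Hashable]) -> List[Hashable]:
    # Values are treated as a set, so the result follows the frame's order;
    # values that are not labels are silently ignored.
    wanted = set(values)
    return [lab for lab in labels if lab in wanted]


def all_of(labels: Sequence[Hashable], values: Iterable[Hashable]) -> List[Hashable]:
    # Same as any_of, but every requested value must be an existing label.
    values = list(values)
    missing = [v for v in values if v not in labels]
    if missing:
        raise KeyError(missing)
    return any_of(labels, values)


columns = ["A", "B", "C"]
print(any_of(columns, ["C", "A", "Z"]))  # ['A', 'C']
try:
    all_of(columns, ["C", "A", "Z"])
except KeyError as e:
    print("KeyError:", e)  # KeyError: ['Z']
```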

String selectors

StartsWith(pat[, case, axis, level])

Select labels that start with a prefix.

EndsWith(pat[, case, axis, level])

Select labels that end with a suffix.

Contains(pat[, case, flags, regex, axis, level])

Select labels that contain a pattern or regular expression.

Match(pat[, flags, axis, level])

Select labels that match a regular expression.
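The string selectors differ mainly in where the pattern must occur and whether it is a regular expression. The following plain-Python sketch uses the hypothetical functions starts_with, ends_with, contains and match. It assumes that case=False means case-insensitive matching, that Contains treats pat as a regular expression by default (as pandas' own str.contains does), and that Match anchors the pattern at the start of the label like Python's re.match. None of these defaults are confirmed by this page.

```python
import re
from typing import List, Sequence

# Hypothetical model of the string selectors (not pandas_select's code).


def _norm(s: str, case: bool) -> str:
    # Case-insensitive comparisons use casefold().
    return s if case else s.casefold()


def starts_with(labels: Sequence[str], pat: str, case: bool = True) -> List[str]:
    return [lab for lab in labels if _norm(lab, case).startswith(_norm(pat, case))]


def ends_with(labels: Sequence[str], pat: str, case: bool = True) -> List[str]:
    return [lab for lab in labels if _norm(lab, case).endswith(_norm(pat, case))]


def contains(labels: Sequence[str], pat: str, case: bool = True,
             flags: int = 0, regex: bool = True) -> List[str]:
    if regex:
        if not case:
            flags |= re.IGNORECASE
        # re.search finds the pattern anywhere in the label.
        return [lab for lab in labels if re.search(pat, lab, flags)]
    # Plain substring test when regex=False.
    return [lab for lab in labels if _norm(pat, case) in _norm(lab, case)]


def match(labels: Sequence[str], pat: str, flags: int = 0) -> List[str]:
    # re.match anchors the pattern at the start of the label.
    return [lab for lab in labels if re.match(pat, lab, flags)]


cols = ["price_usd", "Price_eur", "qty", "unit_price"]
print(starts_with(cols, "price"))              # ['price_usd']
print(starts_with(cols, "price", case=False))  # ['price_usd', 'Price_eur']
print(ends_with(cols, "usd"))                  # ['price_usd']
print(contains(cols, "price"))                 # ['price_usd', 'unit_price']
print(match(cols, r"[a-z]+_"))                 # ['price_usd', 'unit_price']
```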

Data type selectors

HasDtype([include, exclude])

Select columns based on the column dtypes.

AllBool()

Select boolean columns.

AllNumeric()

Select numeric columns.

AllCat(*[, ordered])

Select categorical columns.

AllStr(*[, strict])

Select columns with dtype object, and also columns with dtype string if the pandas version is 1.0.0 or later.

AllNominal(*[, strict])

Select nominal columns.
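HasDtype takes include and exclude arguments with the same names as pandas' own DataFrame.select_dtypes. Assuming the semantics are similar (this page does not say so), the plain-pandas calls below show what dtype-based column selection returns on a small sample frame. The frame and its column names are made up for illustration, and a string column is left out because its default dtype differs between pandas versions.

```python
import pandas as pd

# Sample frame with one column per dtype family (values are arbitrary).
df = pd.DataFrame(
    {
        "flag": [True, False],
        "count": [1, 2],
        "ratio": [0.5, 1.5],
        "grade": pd.Categorical(["x", "y"]),
    }
)

# Plain pandas: select_dtypes keeps or drops columns by dtype family.
print(df.select_dtypes(include="bool").columns.tolist())      # ['flag']
print(df.select_dtypes(include="number").columns.tolist())    # ['count', 'ratio']
print(df.select_dtypes(include="category").columns.tolist())  # ['grade']
print(df.select_dtypes(exclude=["number"]).columns.tolist())  # ['flag', 'grade']
```

Note that bool columns are not counted as "number" by select_dtypes, which is one reason a dedicated boolean selector such as AllBool is useful.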

Logical operators

All label selectors implement the following operators:

Operator    Description
~s          Invert the selection.
s & t       Select elements in both selectors.
s | t       Select elements in either selector.
s ^ t       Select elements in either selector, but not in both.
s - t       Select elements in the left side but not in the right side.

For all operators, if one operand is not a label selector (for example a single label or a list of labels), it is first wrapped with Exact. In that case, the axis and level arguments are inferred from the other operand.

In [1]: from pandas_select import AnyOf

In [2]: AnyOf("A", axis="index", level=2) & "B"
Out[2]: AnyOf(values={'A'}, axis='index', level=2) & Exact(values=['B'], axis='index', level=2)

In [3]: ["A", "B"] | AnyOf("B")
Out[3]: Exact(values=['A', 'B'], axis='columns', level=None) | AnyOf(values={'B'}, axis='columns', level=None)
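The operators follow ordinary set semantics. The plain-Python sketch below models each one on explicit label lists with hypothetical functions (invert, intersection, union, sym_diff, difference). It assumes the complement for ~s is taken over all labels of the axis, and it returns results in the frame's label order; the ordering pandas_select actually produces for combined selectors is not specified on this page.

```python
from typing import Collection, Hashable, List, Sequence

# Hypothetical model of the selector operators (not pandas_select's code).
# `labels` is every label on the axis; `s` and `t` are two selections.


def invert(labels: Sequence[Hashable], s: Collection[Hashable]) -> List[Hashable]:
    # ~s: every label not in s
    return [lab for lab in labels if lab not in s]


def intersection(labels, s, t):
    # s & t: labels in both selections
    return [lab for lab in labels if lab in s and lab in t]


def union(labels, s, t):
    # s | t: labels in either selection
    return [lab for lab in labels if lab in s or lab in t]


def sym_diff(labels, s, t):
    # s ^ t: labels in exactly one of the selections
    return [lab for lab in labels if (lab in s) != (lab in t)]


def difference(labels, s, t):
    # s - t: labels in s but not in t
    return [lab for lab in labels if lab in s and lab not in t]


cols = ["A", "B", "C", "D"]
s, t = {"A", "B"}, {"B", "C"}
print(invert(cols, s))           # ['C', 'D']
print(intersection(cols, s, t))  # ['B']
print(union(cols, s, t))         # ['A', 'B', 'C']
print(sym_diff(cols, s, t))      # ['A', 'C']
print(difference(cols, s, t))    # ['A']
```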

Duplicates

Label selectors return a pandas.Index, which DataFrame [] and loc interpret as a sequence of labels.

Warning

pandas_select raises a RuntimeError when the selection contains duplicates, because selecting duplicated labels is rarely what you want: pandas returns every column (or row) with a given name once for each time that name appears in the selection.

In [4]: import pandas as pd

In [5]: df = pd.DataFrame([[2, 1], [1, 2]], columns=["A", "A"], index=["a", "a"])

In [6]: df
Out[6]: 
   A  A
a  2  1
a  1  2

In [7]: df[["A", "A"]]
Out[7]: 
   A  A  A  A
a  2  1  2  1
a  1  2  1  2

In [8]: df.loc[["a", "a"]]
Out[8]: 
   A  A
a  2  1
a  1  2
a  2  1
a  1  2

In [9]: from pandas_select import AnyOf

In [10]: try:
   ....:     df[AnyOf("A")]
   ....: except RuntimeError as e:
   ....:     print(e)
   ....: 
Found duplicated values in selection
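A guard of this kind can be expressed with the pandas Index methods has_duplicates and duplicated. The sketch below is a hypothetical helper, check_no_duplicates, not pandas_select's actual check; it only mirrors the documented behaviour of raising a RuntimeError. It reuses the frame from the session above and, as one plain-pandas workaround, keeps only the first column of each duplicated name.

```python
import pandas as pd


def check_no_duplicates(selection: pd.Index) -> pd.Index:
    # Hypothetical guard, not pandas_select's code: reject any selection
    # that contains the same label more than once.
    if selection.has_duplicates:
        dups = selection[selection.duplicated()].unique().tolist()
        raise RuntimeError(f"Found duplicated values in selection: {dups}")
    return selection


df = pd.DataFrame([[2, 1], [1, 2]], columns=["A", "A"], index=["a", "a"])

# Selecting "A" by membership matches both columns, so the guard fires.
selection = df.columns[df.columns.isin(["A"])]
try:
    check_no_duplicates(selection)
except RuntimeError as e:
    print(e)  # Found duplicated values in selection: ['A']

# Plain-pandas workaround: keep only the first column of each name.
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped.columns.tolist())  # ['A']
```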