Selection by labels

Core

`Exact`(values[, axis, level])	Select labels from a list, in the order they appear in the list.
`Everything`([axis, level])	Select all labels.
`LabelMask`(cond[, axis, level])	Select labels where the condition is True.

List selectors

`AnyOf`(values[, axis, level])	Select labels from a list.
`AllOf`(values[, axis, level])	Same as `AnyOf`, except that a `KeyError` is raised for labels that don’t exist.
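
The difference between the two list selectors can be sketched in plain Python. The helpers `any_of_sketch` and `all_of_sketch` are hypothetical names used only for illustration and are not part of pandas_select. Returning labels in the frame's own order is an assumption.

```python
def any_of_sketch(columns, values):
    """Lenient selection: silently ignore requested labels that do not exist."""
    wanted = set(values)
    # Assumption: results follow the order of the existing labels.
    return [label for label in columns if label in wanted]


def all_of_sketch(columns, values):
    """Strict selection: raise KeyError if any requested label is missing."""
    existing = set(columns)
    missing = [label for label in values if label not in existing]
    if missing:
        raise KeyError(missing)
    return any_of_sketch(columns, values)


columns = ["A", "B", "C"]
print(any_of_sketch(columns, ["C", "A", "Z"]))  # ['A', 'C']

try:
    all_of_sketch(columns, ["A", "Z"])
except KeyError as err:
    print("KeyError:", err)
```

The strict form is useful when a missing label signals a bug, such as a renamed column. The lenient form suits optional columns.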

String selectors

`StartsWith`(pat[, case, axis, level])	Select labels that start with a prefix.
`EndsWith`(pat[, case, axis, level])	Select labels that end with a suffix.
`Contains`(pat[, case, flags, regex, axis, level])	Select labels that contain a pattern or regular expression.
`Match`(pat[, flags, axis, level])	Select labels that match a regular expression.
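
The string selectors behave much like the vectorized string methods that pandas exposes on `Index.str`. The sketch below uses only plain pandas to show the kind of label filtering each selector describes. Whether pandas_select uses these methods internally is an assumption, and the column names are made up for illustration.

```python
import pandas as pd

columns = pd.Index(["Type 1", "Type 2", "HP", "Sp. Atk", "Sp. Def"])

# Prefix test, comparable to StartsWith("Type")
starts = columns[columns.str.startswith("Type")]

# Suffix test, comparable to EndsWith("Def")
ends = columns[columns.str.endswith("Def")]

# Pattern found anywhere in the label, comparable to Contains(r"Atk|Def")
contains = columns[columns.str.contains(r"Atk|Def", regex=True)]

# Regular expression anchored at the start of the label, comparable to Match(r"Sp\.")
match = columns[columns.str.match(r"Sp\.")]

print(list(starts))    # ['Type 1', 'Type 2']
print(list(ends))      # ['Sp. Def']
print(list(contains))  # ['Sp. Atk', 'Sp. Def']
print(list(match))     # ['Sp. Atk', 'Sp. Def']
```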

Data type selectors

`HasDtype`([include, exclude])	Select columns based on the column dtypes.
`AllBool`()	Select boolean columns.
`AllNumeric`()	Select numeric columns.
`AllCat`(*[, ordered])	Select categorical columns.
`AllStr`(*[, strict])	Select columns with dtype `object`, and also `string` on pandas >= 1.0.0.
`AllNominal`(*[, strict])	Select nominal columns.
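
The `include` and `exclude` arguments of `HasDtype` mirror those of pandas' own `DataFrame.select_dtypes`. As a rough sketch of the kind of column filtering involved, the plain-pandas example below selects columns by dtype. The frame is invented for illustration, and the mapping from each selector to a `select_dtypes` call is an assumption rather than a description of the library's implementation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "hp": [45, 60],
        "speed": [1.5, 2.0],
        "legendary": [False, True],
        "kind": pd.Categorical(["x", "y"]),
    }
)

# Comparable to AllNumeric(): integer and float columns (booleans are not included)
numeric = df.select_dtypes(include="number").columns

# Comparable to AllBool()
boolean = df.select_dtypes(include="bool").columns

# Comparable to AllCat()
categorical = df.select_dtypes(include="category").columns

# Comparable to HasDtype(exclude="number")
not_numeric = df.select_dtypes(exclude="number").columns

print(list(numeric))      # ['hp', 'speed']
print(list(boolean))      # ['legendary']
print(list(categorical))  # ['kind']
print(list(not_numeric))  # ['name', 'legendary', 'kind']
```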

Logical operators

All label selectors implement the following operators:

Operator	Description
`~s`	Invert the selection.
`s & t`	Select elements in both selectors.
`s | t`	Select elements in either selector.
`s ^ t`	Select elements in exactly one of the two selectors.
`s - t`	Select elements in the left side but not in the right side.

For all operators, an operand that is not a selector (for example a string or a list of labels) is first wrapped in `Exact`. In that case, its `axis` and `level` arguments are inferred from the other operand.

In [1]: from pandas_select import AnyOf

In [2]: AnyOf("A", axis="index", level=2) & "B"
Out[2]: AnyOf(values={'A'}, axis='index', level=2) & Exact(values=['B'], axis='index', level=2)

In [3]: ["A", "B"] | AnyOf("B")
Out[3]: Exact(values=['A', 'B'], axis='columns', level=None) | AnyOf(values={'B'}, axis='columns', level=None)
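
The effect of each operator can be illustrated with plain Python sets, assuming the operators follow the usual set semantics their symbols suggest. The labels and the `ordered` helper are hypothetical. The order in which pandas_select returns labels may differ from this sketch.

```python
labels = ["A", "B", "C", "D"]  # all labels on the axis
s = {"A", "B"}                 # labels picked by selector s
t = {"B", "C"}                 # labels picked by selector t


def ordered(selection):
    """Return the selected labels in the order they appear on the axis."""
    return [label for label in labels if label in selection]


inverse = ordered(set(labels) - s)  # ~s
both = ordered(s & t)               # s & t
either = ordered(s | t)             # s | t
only_one = ordered(s ^ t)           # s ^ t
left_only = ordered(s - t)          # s - t

print(inverse)    # ['C', 'D']
print(both)       # ['B']
print(either)     # ['A', 'B', 'C']
print(only_one)   # ['A', 'C']
print(left_only)  # ['A']
```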

Duplicates

Label selectors return a `pandas.Index`, which `DataFrame[]` and `loc` interpret as a sequence of labels.

Warning

pandas_select raises a `RuntimeError` when the selection contains duplicates, because selecting duplicated labels is probably not what you want: for each label you select, pandas returns every column (or row) that carries that label, as the example below shows.

In [4]: import pandas as pd

In [5]: df = pd.DataFrame([[2, 1], [1, 2]], columns=["A", "A"], index=["a", "a"])

In [6]: df
Out[6]: 
   A  A
a  2  1
a  1  2

In [7]: df[["A", "A"]]
Out[7]: 
   A  A  A  A
a  2  1  2  1
a  1  2  1  2

In [8]: df.loc[["a", "a"]]
Out[8]: 
   A  A
a  2  1
a  1  2
a  2  1
a  1  2

In [9]: from pandas_select import AnyOf

In [10]: try:
   ....:     df[AnyOf("A")]
   ....: except RuntimeError as e:
   ....:     print(e)
   ....: 
Found duplicated values in selection
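
One way to avoid the error is to remove duplicated labels before selecting. The sketch below uses only plain pandas (`Index.has_duplicates` and `Index.duplicated`). Keeping the first occurrence of each label is an assumption about what you want. Renaming the columns may be the better fix when the duplicates hold different data.

```python
import pandas as pd

df = pd.DataFrame([[2, 1], [1, 2]], columns=["A", "A"], index=["a", "a"])

# Detect the problem up front
print(df.columns.has_duplicates)  # True

# Keep only the first column for each duplicated label
deduped = df.loc[:, ~df.columns.duplicated()]

print(list(deduped.columns))          # ['A']
print(deduped.columns.has_duplicates)  # False
```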