Commit 55589a8

(Slightly less) initial commit

Committed Nov 26, 2014; 1 parent: cb2754d

16 files changed, +2714 -2 lines changed

.gitignore (new file, +7 lines)

*.pyc
*.egg-info
*.egg
/MANIFEST
/dist/
/docs/_build
/build/

LICENSE (new file, +13 lines)

Copyright 2014 Coursera Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

MANIFEST.in (new file, +1 line)

include README.rst LICENSE

README.md (-2 lines)

This file was deleted.

README.rst (new file, +105 lines)

**pandas-ply**: functional data manipulation for pandas
=======================================================

**pandas-ply** is a thin layer which makes it easier to manipulate data with `pandas <http://pandas.pydata.org/>`_. In particular, it provides elegant, functional, chainable syntax in cases where **pandas** would require mutation, saved intermediate values, or other awkward constructions. In this way, it aims to move **pandas** closer to the "grammar of data manipulation" provided by the `dplyr <http://cran.r-project.org/web/packages/dplyr/index.html>`_ package for R.

For example, take the **dplyr** code below:

.. code:: r

    flights %>%
      group_by(year, month, day) %>%
      summarise(
        arr = mean(arr_delay, na.rm = TRUE),
        dep = mean(dep_delay, na.rm = TRUE)
      ) %>%
      filter(arr > 30 & dep > 30)

The most common way to express this in **pandas** is probably:

.. code:: python

    grouped_flights = flights.groupby(['year', 'month', 'day'])
    output = pd.DataFrame()
    output['arr'] = grouped_flights.arr_delay.mean()
    output['dep'] = grouped_flights.dep_delay.mean()
    filtered_output = output[(output.arr > 30) & (output.dep > 30)]

**pandas-ply** lets you instead write:

.. code:: python

    (flights
      .groupby(['year', 'month', 'day'])
      .ply_select(
        arr = X.arr_delay.mean(),
        dep = X.dep_delay.mean())
      .ply_where(X.arr > 30, X.dep > 30))

In our opinion, this **pandas-ply** code is cleaner, more expressive, more readable, more concise, and less error-prone than the original **pandas** code.

Explanatory notes on the **pandas-ply** code sample above:

* **pandas-ply**'s methods (like ``ply_select`` and ``ply_where`` above) are attached directly to **pandas** objects and can be used immediately, without any wrapping or redirection. They start with a ``ply_`` prefix to distinguish them from built-in **pandas** methods.
* **pandas-ply**'s methods are named for (and modelled after) SQL's operators. (But keep in mind that these operators will not always appear in the same order as they do in a SQL statement: ``SELECT a FROM b WHERE c GROUP BY d`` probably maps to ``b.ply_where(c).groupby(d).ply_select(a)``.)
* **pandas-ply** includes a simple system for building "symbolic expressions" to provide as arguments to its methods. ``X`` above is an instance of ``ply.symbolic.Symbol``. Operations on this symbol produce larger compound symbolic expressions. When **pandas-ply** receives a symbolic expression as an argument, it converts it into a function. So, for instance, ``X.arr > 30`` in the above code could have instead been provided as ``lambda x: x.arr > 30``. Using symbolic expressions lets you leave off the ``lambda x:``, resulting in less cluttered code.

Warning
-------

**pandas-ply** is new and in an experimental stage of its development. The API is not yet stable. Expect the unexpected.

(Pull requests are welcome. Feel free to contact us at pandas-ply@coursera.org.)

Using **pandas-ply**
--------------------

Install **pandas-ply** with:

::

    $ pip install pandas-ply

Typical use of **pandas-ply** starts with:

.. code:: python

    import pandas as pd
    from ply import install_ply, X, sym_call

    install_ply(pd)

After calling ``install_ply``, all **pandas** objects have **pandas-ply**'s methods attached.

API reference
-------------

Full API reference is available at `<http://pythonhosted.org/pandas-ply/>`_.

Possible TODOs
--------------

* Extend ``pandas``' native ``groupby`` to support symbolic expressions?
* Extend ``pandas``' native ``apply`` to support symbolic expressions?
* Add ``.ply_call`` to ``pandas`` objects to extend chainability?
* Version of ``ply_select`` which supports later computed columns relying on earlier computed columns?
* Version of ``ply_select`` which supports careful column ordering?
* Better handling of indices?

License
-------

Copyright 2014 Coursera Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
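The README says ``ply_where(X.arr > 30, X.dep > 30)`` keeps only the rows that satisfy *every* condition, like dplyr's ``filter(arr > 30 & dep > 30)``. Below is a rough, pandas-free sketch of that semantics. It evaluates each condition into a boolean mask and ANDs the masks together with ``functools.reduce``, which is similar to the approach in ``ply/methods.py``. The ``where`` helper and the list-of-dicts "table" are hypothetical stand-ins for a DataFrame and are not part of **pandas-ply**.

```python
from functools import reduce


def where(rows, *conditions):
    """Hypothetical pandas-free analogue of ply_where (illustration only).

    `rows` is a list of dicts standing in for a DataFrame; each condition is a
    function that takes the whole table and returns one boolean per row.
    """
    if not conditions:
        return rows
    # Evaluate every condition against the full table: one mask per condition.
    masks = [condition(rows) for condition in conditions]
    # AND the masks element-wise, as `x & y` does for boolean Series.
    combined = reduce(lambda x, y: [a and b for a, b in zip(x, y)], masks)
    return [row for row, keep in zip(rows, combined) if keep]


# A tiny, made-up summary table (one row per day).
summary = [
    {'day': 1, 'arr': 45.0, 'dep': 40.0},
    {'day': 2, 'arr': 35.0, 'dep': 10.0},
    {'day': 3, 'arr': 5.0, 'dep': 50.0},
]

late = where(summary,
             lambda t: [r['arr'] > 30 for r in t],
             lambda t: [r['dep'] > 30 for r in t])
print([r['day'] for r in late])  # [1]
```

Only day 1 passes both conditions. Days 2 and 3 each fail one, which matches the AND-of-all-conditions behaviour the README describes.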

docs/Makefile (new file, +153 lines)

# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
PAPER         =
BUILDDIR      = _build

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	-rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/pandas-ply.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/pandas-ply.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/pandas-ply"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/pandas-ply"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

docs/conf.py (new file, +261 lines)

# -*- coding: utf-8 -*-
#
# pandas-ply documentation build configuration file, created by
# sphinx-quickstart on Tue Nov 18 19:40:12 2014.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import sys, os
import sphinx_rtd_theme

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath('..'))

# -- General configuration -----------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.doctest',
    'sphinx.ext.coverage',
    'sphinxcontrib.napoleon'
]

# Napoleon settings
napoleon_google_docstring = True
napoleon_numpy_docstring = True
napoleon_include_private_with_doc = False
napoleon_include_special_with_doc = True
napoleon_use_admonition_for_examples = False
napoleon_use_admonition_for_notes = False
napoleon_use_admonition_for_references = False
napoleon_use_ivar = False
napoleon_use_param = True
napoleon_use_rtype = True
autodoc_member_order = 'bysource'

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
#source_encoding = 'utf-8-sig'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'pandas-ply'
copyright = u'2014, Coursera'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.1.0'
# The full version, including alpha/beta/rc tags.
release = '0.1.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']

# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []


# -- Options for HTML output ---------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'sphinx_rtd_theme'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None

# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None

# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
#html_static_path = ['_static']

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True

# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}

# If false, no module index is generated.
#html_domain_indices = True

# If false, no index is generated.
#html_use_index = True

# If true, the index is split into individual pages for each letter.
#html_split_index = False

# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True

# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True

# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True

# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''

# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None

# Output file base name for HTML help builder.
#htmlhelp_basename = 'pandas-plydoc'


# -- Options for LaTeX output --------------------------------------------------

latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',

# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',

# Additional stuff for the LaTeX preamble.
#'preamble': '',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
  ('index', 'pandas-ply.tex', u'pandas-ply Documentation',
   u'Coursera', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None

# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False

# If true, show page references after internal links.
#latex_show_pagerefs = False

# If true, show URL addresses after external links.
#latex_show_urls = False

# Documents to append as an appendix to all manuals.
#latex_appendices = []

# If false, no module index is generated.
#latex_domain_indices = True


# -- Options for manual page output --------------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    ('index', 'pandas-ply', u'pandas-ply Documentation',
     [u'Coursera'], 1)
]

# If true, show URL addresses after external links.
#man_show_urls = False


# -- Options for Texinfo output ------------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
  ('index', 'pandas-ply', u'pandas-ply Documentation',
   u'Coursera', 'pandas-ply', 'functional data manipulation for pandas',
   'Miscellaneous'),
]

# Documents to append as an appendix to all manuals.
#texinfo_appendices = []

# If false, no module index is generated.
#texinfo_domain_indices = True

# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'

docs/index.rst (new file, +93 lines)

**pandas-ply**: functional data manipulation for pandas
=======================================================

**pandas-ply** is a thin layer which makes it easier to manipulate data with `pandas <http://pandas.pydata.org/>`_. In particular, it provides elegant, functional, chainable syntax in cases where **pandas** would require mutation, saved intermediate values, or other awkward constructions. In this way, it aims to move **pandas** closer to the "grammar of data manipulation" provided by the `dplyr <http://cran.r-project.org/web/packages/dplyr/index.html>`_ package for R.

For example, take the **dplyr** code below:

.. code:: r

    flights %>%
      group_by(year, month, day) %>%
      summarise(
        arr = mean(arr_delay, na.rm = TRUE),
        dep = mean(dep_delay, na.rm = TRUE)
      ) %>%
      filter(arr > 30 & dep > 30)

The most common way to express this in **pandas** is probably:

.. code:: python

    grouped_flights = flights.groupby(['year', 'month', 'day'])
    output = pd.DataFrame()
    output['arr'] = grouped_flights.arr_delay.mean()
    output['dep'] = grouped_flights.dep_delay.mean()
    filtered_output = output[(output.arr > 30) & (output.dep > 30)]

**pandas-ply** lets you instead write:

.. code:: python

    (flights
      .groupby(['year', 'month', 'day'])
      .ply_select(
        arr = X.arr_delay.mean(),
        dep = X.dep_delay.mean())
      .ply_where(X.arr > 30, X.dep > 30))

In our opinion, this **pandas-ply** code is cleaner, more expressive, more readable, more concise, and less error-prone than the original **pandas** code.

Explanatory notes on the **pandas-ply** code sample above:

* **pandas-ply**'s methods (like ``ply_select`` and ``ply_where`` above) are attached directly to **pandas** objects and can be used immediately, without any wrapping or redirection. They start with a ``ply_`` prefix to distinguish them from built-in **pandas** methods.
* **pandas-ply**'s methods are named for (and modelled after) SQL's operators. (But keep in mind that these operators will not always appear in the same order as they do in a SQL statement: ``SELECT a FROM b WHERE c GROUP BY d`` probably maps to ``b.ply_where(c).groupby(d).ply_select(a)``.)
* **pandas-ply** includes a simple system for building "symbolic expressions" to provide as arguments to its methods. ``X`` above is an instance of ``ply.symbolic.Symbol``. Operations on this symbol produce larger compound symbolic expressions. When **pandas-ply** receives a symbolic expression as an argument, it converts it into a function. So, for instance, ``X.arr > 30`` in the above code could have instead been provided as ``lambda x: x.arr > 30``. Using symbolic expressions lets you leave off the ``lambda x:``, resulting in less cluttered code.

Warning
-------

**pandas-ply** is new and in an experimental stage of its development. The API is not yet stable. Expect the unexpected.

(Pull requests are welcome. Feel free to contact us at pandas-ply@coursera.org.)

Using **pandas-ply**
--------------------

Install **pandas-ply** with:

::

    $ pip install pandas-ply

Typical use of **pandas-ply** starts with:

.. code:: python

    import pandas as pd
    from ply import install_ply, X, sym_call

    install_ply(pd)

After calling ``install_ply``, all **pandas** objects have **pandas-ply**'s methods attached.

API reference
-------------

pandas extensions
~~~~~~~~~~~~~~~~~

.. automodule:: ply.methods
    :members:
    :undoc-members:
    :show-inheritance:

`ply.symbolic`
~~~~~~~~~~~~~~

.. automodule:: ply.symbolic
    :members:
    :undoc-members:
    :private-members:
    :show-inheritance:

dplyr-comparison.html (new file, +1,209 lines)

Large diffs are not rendered by default.

ply/__init__.py (new file, +2 lines)

from .methods import install_ply
from .symbolic import X, sym_call

ply/methods.py (new file, +206 lines)

"""This module contains the **pandas-ply** methods which are designed to be
added to pandas objects. The methods in this module should not be used directly.
Instead, the function `install_ply` should be used to attach them to the pandas
classes."""

from functools import reduce

from . import symbolic

pandas = None

def install_ply(pandas_to_use):
    """Install `pandas-ply` onto the objects in a copy of `pandas`."""

    global pandas
    pandas = pandas_to_use

    pandas.DataFrame.ply_where = _ply_where
    pandas.DataFrame.ply_select = _ply_select

    pandas.Series.ply_where = _ply_where

    pandas.core.groupby.DataFrameGroupBy.ply_select = _ply_select_for_groups

    pandas.core.groupby.SeriesGroupBy.ply_select = _ply_select_for_groups


def _ply_where(self, *conditions):
    """Filter a dataframe/series to only include rows/entries satisfying a
    given set of conditions.

    Analogous to SQL's ``WHERE``, or dplyr's ``filter``.

    Args:
        `*conditions`: Each should be a dataframe/series of booleans, a
            function returning such an object when run on the input dataframe,
            or a symbolic expression yielding such an object when evaluated
            with Symbol(0) mapped to the input dataframe. The input dataframe
            will be filtered by the AND of all the conditions.

    Example:
        >>> flights.ply_where(X.month == 1, X.day == 1)
        [ same result as `flights[(flights.month == 1) & (flights.day == 1)]` ]
    """

    if not conditions:
        return self

    evalled_conditions = [symbolic.to_callable(condition)(self)
                          for condition in conditions]
    anded_evalled_conditions = reduce(
        lambda x, y: x & y, evalled_conditions)
    return self[anded_evalled_conditions]


def _ply_select(self, *args, **kwargs):
    """Transform a dataframe by selecting old columns and new (computed)
    columns.

    Analogous to SQL's ``SELECT``, or dplyr's ``select`` / ``rename`` /
    ``mutate`` / ``transmute``.

    Args:
        `*args`: Each should be one of:

            ``'*'``
                says that all columns in the input dataframe should be
                included
            ``'column_name'``
                says that `column_name` in the input dataframe should be
                included
            ``'-column_name'``
                says that `column_name` in the input dataframe should be
                excluded.

            If any `'-column_name'` is present, then `'*'` should be
            present, and if `'*'` is present, no 'column_name' should be
            present. Column-includes and column-excludes should not overlap.
        `**kwargs`: Each argument name will be the name of a new column in the
            output dataframe, with the column's contents determined by the
            argument contents. These contents can be given as a dataframe, a
            function (taking the input dataframe as its single argument), or a
            symbolic expression (taking the input dataframe as ``Symbol(0)``).
            kwarg-provided columns override arg-provided columns.

    Example:
        >>> flights.ply_select('*',
        ...     gain = X.arr_delay - X.dep_delay,
        ...     speed = X.distance / X.air_time * 60)
        [ original dataframe, with two new computed columns added ]
    """

    input_columns = set(self.columns)

    has_star = False
    include_columns = []
    exclude_columns = []
    for arg in args:
        if arg == '*':
            if has_star:
                raise ValueError('ply_select received repeated stars')
            has_star = True
        elif arg in input_columns:
            if arg in include_columns:
                raise ValueError(
                    'ply_select received a repeated column-include (%s)' %
                    arg)
            include_columns.append(arg)
        elif arg[0] == '-' and arg[1:] in input_columns:
            # exclude_columns stores bare names, so compare against arg[1:].
            if arg[1:] in exclude_columns:
                raise ValueError(
                    'ply_select received a repeated column-exclude (%s)' %
                    arg[1:])
            exclude_columns.append(arg[1:])
        else:
            raise ValueError(
                'ply_select received a strange argument (%s)' %
                arg)
    if exclude_columns and not has_star:
        raise ValueError(
            'ply_select received column-excludes without a star')
    if has_star and include_columns:
        raise ValueError(
            'ply_select received both a star and column-includes')
    if set(include_columns) & set(exclude_columns):
        raise ValueError(
            'ply_select received overlapping column-includes and ' +
            'column-excludes')

    include_columns_inc_star = self.columns if has_star else include_columns

    output_columns = [col for col in include_columns_inc_star
                      if col not in exclude_columns]

    # Note: This maintains self's index even if output_columns is [].
    to_return = self[output_columns]

    # Temporarily disable SettingWithCopyWarning, as setting columns on a
    # copy (`to_return`) is intended here. The `finally` restores the
    # caller's setting even if evaluating a column raises.
    old_chained_assignment = pandas.options.mode.chained_assignment
    pandas.options.mode.chained_assignment = None
    try:
        for column_name, column_value in kwargs.items():
            evaluated_value = symbolic.to_callable(column_value)(self)
            # TODO: verify that evaluated_value is a series!
            if column_name == 'index':
                to_return.index = evaluated_value
            else:
                to_return[column_name] = evaluated_value
    finally:
        pandas.options.mode.chained_assignment = old_chained_assignment

    return to_return


# TODO: Ensure that an empty ply_select on a groupby returns a large dataframe
def _ply_select_for_groups(self, **kwargs):
    """Summarize a grouped dataframe or series.

    Analogous to SQL's ``SELECT`` (when a ``GROUP BY`` is present), or dplyr's
    ``summarise``.

    Args:
        `**kwargs`: Each argument name will be the name of a new column in the
            output dataframe, with the column's contents determined by the
            argument contents. These contents can be given as a dataframe, a
            function (taking the input grouped dataframe as its single
            argument), or a symbolic expression (taking the input grouped
            dataframe as `Symbol(0)`).
    """

    to_return = pandas.DataFrame()

    for column_name, column_value in kwargs.items():
        evaluated_value = symbolic.to_callable(column_value)(self)
        if column_name == 'index':
            to_return.index = evaluated_value
        else:
            to_return[column_name] = evaluated_value

    return to_return


class PlyDataFrame:
    """The following methods are added to `pandas.DataFrame`:"""

    ply_where = _ply_where
    ply_select = _ply_select


class PlySeries:
    """The following methods are added to `pandas.Series`:"""

    ply_where = _ply_where


class PlyDataFrameGroupBy:
    """The following methods are added to
    `pandas.core.groupby.DataFrameGroupBy`:"""

    ply_select = _ply_select_for_groups


class PlySeriesGroupBy:
    """The following methods are added to
    `pandas.core.groupby.SeriesGroupBy`:"""

    ply_select = _ply_select_for_groups

ply/symbolic.py

+202
@@ -0,0 +1,202 @@
1+
"""`ply.symbolic` is a simple system for building "symbolic expressions" to
2+
provide as arguments to **pandas-ply**'s methods (in place of lambda
3+
expressions)."""
4+
5+
6+
class Expression:
7+
"""`Expression` is the (abstract) base class for symbolic expressions.
8+
Symbolic expressions are encoded representations of Python expressions,
9+
kept on ice until you are ready to evaluate them. Operations on
10+
symbolic expressions (like `my_expr.some_attr` or `my_expr(some_arg)` or
11+
`my_expr + 7`) are automatically turned into symbolic representations
12+
thereof -- nothing is actually done until the special evaluation method
13+
`_eval` is called.
14+
"""
15+
16+
def _eval(self, context, **options):
17+
"""Evaluate a symbolic expression.
18+
19+
Args:
20+
context: The context object for evaluation. Currently, this is a
21+
dictionary mapping symbol names to values.
22+
`**options`: Options for evaluation. Currently, the only option is
23+
`log`, which results in some debug output during evaluation if
24+
it is set to `True`.
25+
26+
Returns:
27+
anything
28+
"""
29+
raise NotImplementedError
30+
31+
def __repr__(self):
32+
raise NotImplementedError
33+
34+
def __coerce__(self, other):
35+
return None
36+
37+
def __getattr__(self, name):
38+
"""Construct a symbolic representation of `getattr(self, name)`."""
39+
return GetAttr(self, name)
40+
41+
def __call__(self, *args, **kwargs):
42+
"""Construct a symbolic representation of `self(*args, **kwargs)`."""
43+
return Call(self, args=args, kwargs=kwargs)
44+
45+
46+
# Here are the varieties of atomic / compound Expression.
47+
48+
49+
class Symbol(Expression):
50+
"""`Symbol(name)` is an atomic symbolic expression, labelled with an
51+
arbitrary `name`."""
52+
53+
def __init__(self, name):
54+
self._name = name
55+
56+
def _eval(self, context, **options):
57+
if options.get('log'):
58+
print 'Symbol._eval', repr(self)
59+
result = context[self._name]
60+
if options.get('log'):
61+
print 'Returning', repr(self), '=>', repr(result)
62+
return result
63+
64+
def __repr__(self):
65+
return 'Symbol(%s)' % repr(self._name)
66+
67+
68+
class GetAttr(Expression):
69+
"""`GetAttr(obj, name)` is a symbolic expression representing the result of
70+
`getattr(obj, name)`. (`obj` and `name` can themselves be symbolic.)"""
71+
72+
def __init__(self, obj, name):
73+
self._obj = obj
74+
self._name = name
75+
76+
def _eval(self, context, **options):
77+
if options.get('log'):
78+
print 'GetAttr._eval', repr(self)
79+
evaled_obj = eval_if_symbolic(self._obj, context, **options)
80+
result = getattr(evaled_obj, self._name)
81+
if options.get('log'):
82+
print 'Returning', repr(self), '=>', repr(result)
83+
return result
84+
85+
def __repr__(self):
86+
return 'getattr(%s, %s)' % (repr(self._obj), repr(self._name))
87+
88+
89+
class Call(Expression):
90+
"""`Call(func, args, kwargs)` is a symbolic expression representing the
91+
result of `func(*args, **kwargs)`. (`func`, each member of the `args`
92+
iterable, and each value in the `kwargs` dictionary can themselves be
93+
symbolic.)"""
94+
95+
def __init__(self, func, args=[], kwargs={}):
96+
self._func = func
97+
self._args = args
98+
self._kwargs = kwargs
99+
100+
def _eval(self, context, **options):
101+
if options.get('log'):
102+
print 'Call._eval', repr(self)
103+
evaled_func = eval_if_symbolic(self._func, context, **options)
104+
evaled_args = [eval_if_symbolic(v, context, **options)
105+
for v in self._args]
106+
evaled_kwargs = {k: eval_if_symbolic(v, context, **options)
107+
for k, v in self._kwargs.iteritems()}
108+
result = evaled_func(*evaled_args, **evaled_kwargs)
109+
if options.get('log'):
110+
print 'Returning', repr(self), '=>', repr(result)
111+
return result
112+
113+
def __repr__(self):
114+
return '{func}(*{args}, **{kwargs})'.format(
115+
func=repr(self._func),
116+
args=repr(self._args),
117+
kwargs=repr(self._kwargs))
118+
119+
120+
def eval_if_symbolic(obj, context, **options):
121+
"""Evaluate an object if it is a symbolic expression, or otherwise just
122+
return it unchanged.
123+
124+
Args:
125+
obj: Either a symbolic expression, or anything else (in which case this
126+
is a noop).
127+
context: Passed as an argument to `obj._eval` if `obj` is symbolic.
128+
`**options`: Passed as arguments to `obj._eval` if `obj` is symbolic.
129+
130+
Returns:
131+
anything
132+
133+
Examples:
134+
>>> eval_if_symbolic(Symbol('x'), {'x': 10})
135+
10
136+
>>> eval_if_symbolic(7, {'x': 10})
137+
7
138+
"""
139+
return obj._eval(context, **options) if hasattr(obj, '_eval') else obj
140+
141+
142+
def to_callable(obj):
143+
"""Turn an object into a callable.
144+
145+
Args:
146+
obj: This can be
147+
148+
* **a symbolic expression**, in which case the output callable
149+
evaluates the expression with symbols taking values from the
150+
callable's arguments (positional arguments named according to their
151+
numerical index, keyword arguments named according to their
152+
string keys),
153+
* **a callable**, in which case the output callable is just the
154+
input object, or
155+
* **anything else**, in which case the output callable is a
156+
constant function which always returns the input object.
157+
158+
Returns:
159+
callable
160+
161+
Examples:
162+
>>> to_callable(Symbol(0) + Symbol('x'))(3, x=4)
163+
7
164+
>>> to_callable(lambda x: x + 1)(10)
165+
11
166+
>>> to_callable(12)(3, x=4)
167+
12
168+
"""
169+
if hasattr(obj, '_eval'):
170+
return lambda *args, **kwargs: obj._eval(dict(enumerate(args), **kwargs))
171+
elif callable(obj):
172+
return obj
173+
else:
174+
return lambda *args, **kwargs: obj
175+
176+
177+
def sym_call(func, *args, **kwargs):
178+
"""Construct a symbolic representation of `func(*args, **kwargs)`.
179+
180+
This is necessary because `func(symbolic)` will not (ordinarily) know to
181+
construct a symbolic expression when it receives the symbolic
182+
expression `symbolic` as a parameter (if `func` is not itself symbolic).
183+
So instead, we write `sym_call(func, symbolic)`.
184+
185+
Args:
186+
func: Function to call on evaluation (can be symbolic).
187+
`*args`: Arguments to provide to `func` on evaluation (can be symbolic).
188+
`**kwargs`: Keyword arguments to provide to `func` on evaluation (can be
189+
symbolic).
190+
191+
Returns:
192+
`ply.symbolic.Expression`
193+
194+
Example:
195+
>>> sym_call(math.sqrt, Symbol('x'))._eval({'x': 16})
196+
4.0
197+
"""
198+
199+
return Call(func, args=args, kwargs=kwargs)
200+
201+
X = Symbol(0)
202+
"""A Symbol for "the first argument" (for convenience)."""
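`Expression` above is an old-style class (`class Expression:` under Python 2). For old-style instances, the interpreter looks up operator methods such as `__add__` and `__radd__` on the instance, so they fall through to `__getattr__`. That is how `Symbol('some_symbol') + 1` becomes `getattr(Symbol('some_symbol'), '__add__')(*(1,), **{})` in `test_ops`, and presumably why `__coerce__` is defined to return `None`. Python 3 looks operator methods up on the type and skips `__getattr__`, so a port has to declare them explicitly. The following is a minimal Python 3 sketch under that assumption. `_deferred_operator` is a hypothetical helper, the operator list is not exhaustive, and the `log` option is dropped.

```python
import math


class Expression:
    """Base class: operating on an Expression records the operation."""

    def _eval(self, context):
        raise NotImplementedError

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. `X.x` or `X.count`.
        return GetAttr(self, name)

    def __call__(self, *args, **kwargs):
        return Call(self, args, kwargs)


def _deferred_operator(name):
    # Python 3 finds operator methods on the type, never via __getattr__,
    # so each one is defined explicitly and records the same tree the
    # Python 2 code builds implicitly: Call(GetAttr(self, name), (other,), {}).
    def method(self, other):
        return Call(GetAttr(self, name), (other,), {})
    method.__name__ = name
    return method


# Not exhaustive: add further dunders (e.g. __and__, __or__) as needed.
for _name in ('__add__', '__radd__', '__sub__', '__rsub__', '__mul__',
              '__rmul__', '__truediv__', '__rtruediv__', '__pow__',
              '__lt__', '__le__', '__gt__', '__ge__', '__getitem__'):
    setattr(Expression, _name, _deferred_operator(_name))


class Symbol(Expression):
    def __init__(self, name):
        self._name = name

    def _eval(self, context):
        return context[self._name]

    def __repr__(self):
        return 'Symbol(%r)' % (self._name,)


class GetAttr(Expression):
    def __init__(self, obj, name):
        self._obj = obj
        self._name = name

    def _eval(self, context):
        return getattr(eval_if_symbolic(self._obj, context), self._name)

    def __repr__(self):
        return 'getattr(%r, %r)' % (self._obj, self._name)


class Call(Expression):
    def __init__(self, func, args=(), kwargs=None):
        self._func = func
        self._args = tuple(args)
        self._kwargs = dict(kwargs or {})

    def _eval(self, context):
        func = eval_if_symbolic(self._func, context)
        args = [eval_if_symbolic(a, context) for a in self._args]
        kwargs = {k: eval_if_symbolic(v, context)
                  for k, v in self._kwargs.items()}
        return func(*args, **kwargs)

    def __repr__(self):
        return '%r(*%r, **%r)' % (self._func, self._args, self._kwargs)


def eval_if_symbolic(obj, context):
    return obj._eval(context) if hasattr(obj, '_eval') else obj


def to_callable(obj):
    if hasattr(obj, '_eval'):
        return lambda *args, **kwargs: obj._eval(dict(enumerate(args), **kwargs))
    if callable(obj):
        return obj
    return lambda *args, **kwargs: obj


def sym_call(func, *args, **kwargs):
    return Call(func, args, kwargs)


X = Symbol(0)

x, y = Symbol('x'), Symbol('y')
hypot = to_callable(sym_call(math.sqrt, x ** 2 + y ** 2))
print(hypot(x=3, y=4))         # 5.0
print(repr(Symbol('s') + 1))   # getattr(Symbol('s'), '__add__')(*(1,), **{})
```

Because evaluation replays operators as direct dunder calls, `to_callable(X + 2.5)(3)` returns `NotImplemented` rather than `5.5`, since `int.__add__` does not accept a float. With pandas Series, whose own dunders handle mixed-type arithmetic, this does not arise.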

setup.py

+15
@@ -0,0 +1,15 @@
1+
from distutils.core import setup
2+
setup(
3+
name = 'pandas-ply',
4+
version = '0.1.0',
5+
author = 'Coursera Inc.',
6+
author_email = 'pandas-ply@coursera.org',
7+
packages = [
8+
'ply',
9+
],
10+
description = 'functional data manipulation for pandas',
11+
long_description = open('README.rst').read(),
12+
license = 'Apache License 2.0',
13+
url = 'https://github.com/coursera/pandas-ply',
14+
classifiers = [],
15+
)

tests/test_all.sh

+3
@@ -0,0 +1,3 @@
1+
#!/bin/bash
2+
3+
ls test_*.py | xargs -n 1 python

tests/test_methods.py

+196
@@ -0,0 +1,196 @@
1+
import unittest
2+
3+
from pandas.util.testing import assert_frame_equal
4+
from pandas.util.testing import assert_series_equal
5+
from ply.methods import install_ply
6+
from ply.symbolic import X
7+
import pandas as pd
8+
9+
install_ply(pd)
10+
11+
12+
def assert_frame_equiv(df1, df2, **kwargs):
13+
""" Assert that two dataframes are equal, ignoring ordering of columns.
14+
15+
See http://stackoverflow.com/questions/14224172/equality-in-pandas-
16+
dataframes-column-order-matters
17+
"""
18+
return assert_frame_equal(
19+
df1.sort(axis=1),
20+
df2.sort(axis=1),
21+
check_names=True, **kwargs)
22+
23+
test_df = pd.DataFrame(
24+
{'x': [1, 2, 3, 4], 'y': [4, 3, 2, 1]},
25+
columns=['x', 'y'])
26+
test_series = pd.Series([1, 2, 3, 4])
27+
28+
test_dfsq = pd.DataFrame(
29+
{'x': [-2, -1, 0, 1, 2], 'xsq': [4, 1, 0, 1, 4]},
30+
columns=['x', 'xsq'])
31+
32+
33+
class PlyWhereTest(unittest.TestCase):
34+
35+
def test_no_conditions(self):
36+
assert_frame_equal(test_df.ply_where(), test_df)
37+
38+
def test_single_condition(self):
39+
expected = pd.DataFrame(
40+
{'x': [3, 4], 'y': [2, 1]},
41+
index=[2, 3],
42+
columns=['x', 'y'])
43+
44+
assert_frame_equal(test_df.ply_where(test_df.x > 2.5), expected)
45+
assert_frame_equal(test_df.ply_where(lambda df: df.x > 2.5), expected)
46+
assert_frame_equal(test_df.ply_where(X.x > 2.5), expected)
47+
48+
def test_multiple_conditions(self):
49+
expected = pd.DataFrame(
50+
{'x': [2, 3], 'y': [3, 2]},
51+
index=[1, 2],
52+
columns=['x', 'y'])
53+
54+
lo_df = test_df.x > 1.5
55+
hi_df = test_df.x < 3.5
56+
lo_func = lambda df: df.x > 1.5
57+
hi_func = lambda df: df.x < 3.5
58+
lo_sym = X.x > 1.5
59+
hi_sym = X.x < 3.5
60+
61+
for lo in [lo_df, lo_func, lo_sym]:
62+
for hi in [hi_df, hi_func, hi_sym]:
63+
assert_frame_equal(test_df.ply_where(lo, hi), expected)
64+
65+
66+
class PlyWhereForSeriesTest(unittest.TestCase):
67+
68+
def test_no_conditions(self):
69+
assert_series_equal(test_series.ply_where(), test_series)
70+
71+
def test_single_condition(self):
72+
expected = pd.Series([3, 4], index=[2, 3])
73+
74+
assert_series_equal(test_series.ply_where(test_series > 2.5), expected)
75+
assert_series_equal(test_series.ply_where(lambda s: s > 2.5), expected)
76+
assert_series_equal(test_series.ply_where(X > 2.5), expected)
77+
78+
def test_multiple_conditions(self):
79+
expected = pd.Series([2, 3], index=[1, 2])
80+
81+
assert_series_equal(
82+
test_series.ply_where(test_series < 3.5, test_series > 1.5), expected)
83+
assert_series_equal(
84+
test_series.ply_where(test_series < 3.5, lambda s: s > 1.5), expected)
85+
assert_series_equal(
86+
test_series.ply_where(test_series < 3.5, X > 1.5), expected)
87+
assert_series_equal(
88+
test_series.ply_where(lambda s: s < 3.5, lambda s: s > 1.5), expected)
89+
assert_series_equal(
90+
test_series.ply_where(lambda s: s < 3.5, X > 1.5), expected)
91+
assert_series_equal(
92+
test_series.ply_where(X < 3.5, X > 1.5), expected)
93+
94+
95+
class PlySelectTest(unittest.TestCase):
96+
97+
def test_bad_arguments(self):
98+
# Nonexistent column, include or exclude
99+
with self.assertRaises(ValueError):
100+
test_df.ply_select('z')
101+
with self.assertRaises(ValueError):
102+
test_df.ply_select('-z')
103+
104+
# Exclude without asterisk
105+
with self.assertRaises(ValueError):
106+
test_df.ply_select('-x')
107+
108+
# Include with asterisk
109+
with self.assertRaises(ValueError):
110+
test_df.ply_select('*', 'x')
111+
112+
def test_noops(self):
113+
assert_frame_equal(test_df.ply_select('*'), test_df)
114+
assert_frame_equal(test_df.ply_select('x', 'y'), test_df)
115+
assert_frame_equiv(test_df.ply_select(x=X.x, y=X.y), test_df)
116+
117+
def test_reorder(self):
118+
reordered = test_df.ply_select('y', 'x')
119+
assert_frame_equiv(reordered, test_df[['y', 'x']])
120+
self.assertEqual(list(reordered.columns), ['y', 'x'])
121+
122+
def test_subset_via_includes(self):
123+
assert_frame_equal(test_df.ply_select('x'), test_df[['x']])
124+
assert_frame_equal(test_df.ply_select('y'), test_df[['y']])
125+
126+
def test_subset_via_excludes(self):
127+
assert_frame_equal(test_df.ply_select('*', '-y'), test_df[['x']])
128+
assert_frame_equal(test_df.ply_select('*', '-x'), test_df[['y']])
129+
130+
def test_empty(self):
131+
assert_frame_equal(test_df.ply_select(), test_df[[]])
132+
assert_frame_equal(test_df.ply_select('*', '-x', '-y'), test_df[[]])
133+
134+
def test_ways_of_providing_new_columns(self):
135+
# Value
136+
assert_frame_equal(
137+
test_df.ply_select(new=5),
138+
pd.DataFrame({'new': [5, 5, 5, 5]}))
139+
140+
# Dataframe-like
141+
assert_frame_equal(
142+
test_df.ply_select(new=[5, 6, 7, 8]),
143+
pd.DataFrame({'new': [5, 6, 7, 8]}))
144+
145+
# Function
146+
assert_frame_equal(
147+
test_df.ply_select(new=lambda df: df.x),
148+
pd.DataFrame({'new': [1, 2, 3, 4]}))
149+
150+
# Symbolic expression
151+
assert_frame_equal(
152+
test_df.ply_select(new=X.x),
153+
pd.DataFrame({'new': [1, 2, 3, 4]}))
154+
155+
def test_old_and_new_together(self):
156+
assert_frame_equal(
157+
test_df.ply_select('x', total=X.x + X.y),
158+
pd.DataFrame(
159+
{'x': [1, 2, 3, 4], 'total': [5, 5, 5, 5]},
160+
columns=['x', 'total']))
161+
162+
def test_kwarg_overrides_asterisk(self):
163+
assert_frame_equal(
164+
test_df.ply_select('*', y=X.x),
165+
pd.DataFrame({'x': [1, 2, 3, 4], 'y': [1, 2, 3, 4]}))
166+
167+
def test_kwarg_overrides_column_include(self):
168+
assert_frame_equal(
169+
test_df.ply_select('x', 'y', y=X.x),
170+
pd.DataFrame({'x': [1, 2, 3, 4], 'y': [1, 2, 3, 4]}))
171+
172+
def test_new_index(self):
173+
assert_frame_equal(
174+
test_df.ply_select('x', index=X.y),
175+
pd.DataFrame(
176+
{'x': [1, 2, 3, 4]},
177+
index=pd.Index([4, 3, 2, 1], name='y')))
178+
179+
180+
class PlySelectForGroupsTest(unittest.TestCase):
181+
182+
def test_simple(self):
183+
grp = test_dfsq.groupby('xsq')
184+
assert_frame_equal(
185+
grp.ply_select(count=X.x.count()),
186+
pd.DataFrame(
187+
{'count': [1, 2, 2]},
188+
index=pd.Index([0, 1, 4], name='xsq')))
189+
190+
191+
if __name__ == '__main__':
192+
try:
193+
from colour_runner.runner import ColourTextTestRunner
194+
unittest.main(verbosity=2, testRunner=ColourTextTestRunner)
195+
except ImportError:
196+
unittest.main(verbosity=2)

tests/test_symbolic.py

+248
@@ -0,0 +1,248 @@
1+
import unittest
2+
import mock
3+
4+
from ply.symbolic import Call
5+
from ply.symbolic import GetAttr
6+
from ply.symbolic import Symbol
7+
from ply.symbolic import eval_if_symbolic
8+
from ply.symbolic import sym_call
9+
from ply.symbolic import to_callable
10+
11+
12+
class ExpressionTest(unittest.TestCase):
13+
14+
# These test whether operations on symbolic expressions correctly construct
15+
# compound symbolic expressions:
16+
17+
def test_getattr(self):
18+
expr = Symbol('some_symbol').some_attr
19+
self.assertEqual(
20+
repr(expr),
21+
"getattr(Symbol('some_symbol'), 'some_attr')")
22+
23+
def test_call(self):
24+
expr = Symbol('some_symbol')('arg1', 'arg2', kwarg_name='kwarg value')
25+
self.assertEqual(
26+
repr(expr),
27+
"Symbol('some_symbol')(*('arg1', 'arg2'), " +
28+
"**{'kwarg_name': 'kwarg value'})")
29+
30+
def test_ops(self):
31+
expr = Symbol('some_symbol') + 1
32+
self.assertEqual(
33+
repr(expr),
34+
"getattr(Symbol('some_symbol'), '__add__')(*(1,), **{})")
35+
36+
expr = 1 + Symbol('some_symbol')
37+
self.assertEqual(
38+
repr(expr),
39+
"getattr(Symbol('some_symbol'), '__radd__')(*(1,), **{})")
40+
41+
expr = Symbol('some_symbol')['key']
42+
self.assertEqual(
43+
repr(expr),
44+
"getattr(Symbol('some_symbol'), '__getitem__')(*('key',), **{})")
45+
46+
47+
class SymbolTest(unittest.TestCase):
48+
49+
def test_eval(self):
50+
self.assertEqual(
51+
Symbol('some_symbol')._eval({'some_symbol': 'value'}),
52+
'value')
53+
self.assertEqual(
54+
Symbol('some_symbol')._eval(
55+
{'some_symbol': 'value', 'other_symbol': 'irrelevant'}),
56+
'value')
57+
with self.assertRaises(KeyError):
58+
Symbol('some_symbol')._eval({'other_symbol': 'irrelevant'})
59+
60+
def test_repr(self):
61+
self.assertEqual(repr(Symbol('some_symbol')), "Symbol('some_symbol')")
62+
63+
64+
class GetAttrTest(unittest.TestCase):
65+
66+
def test_eval_with_nonsymbolic_object(self):
67+
some_obj = mock.Mock()
68+
del some_obj._eval
69+
# Ensure constructing the expression does not access `.some_attr`.
70+
del some_obj.some_attr
71+
72+
with self.assertRaises(AttributeError):
73+
some_obj.some_attr
74+
expr = GetAttr(some_obj, 'some_attr')
75+
76+
some_obj.some_attr = 'attribute value'
77+
78+
self.assertEqual(expr._eval({}), 'attribute value')
79+
80+
def test_eval_with_symbolic_object(self):
81+
some_obj = mock.Mock()
82+
del some_obj._eval
83+
some_obj.some_attr = 'attribute value'
84+
85+
expr = GetAttr(Symbol('some_symbol'), 'some_attr')
86+
87+
self.assertEqual(
88+
expr._eval({'some_symbol': some_obj}),
89+
'attribute value')
90+
91+
def test_repr(self):
92+
self.assertEqual(
93+
repr(GetAttr('object', 'attrname')),
94+
"getattr('object', 'attrname')")
95+
96+
97+
class CallTest(unittest.TestCase):
98+
99+
def test_eval_with_nonsymbolic_func(self):
100+
func = mock.Mock(return_value='return value')
101+
del func._eval # So it doesn't pretend to be symbolic
102+
103+
expr = Call(func, ('arg1', 'arg2'), {'kwarg_name': 'kwarg value'})
104+
105+
# Ensure constructing the expression does not call the function
106+
self.assertFalse(func.called)
107+
108+
result = expr._eval({})
109+
110+
func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
111+
self.assertEqual(result, 'return value')
112+
113+
def test_eval_with_symbolic_func(self):
114+
func = mock.Mock(return_value='return value')
115+
del func._eval # So it doesn't pretend to be symbolic
116+
117+
expr = Call(
118+
Symbol('some_symbol'),
119+
('arg1', 'arg2'),
120+
{'kwarg_name': 'kwarg value'})
121+
122+
result = expr._eval({'some_symbol': func})
123+
124+
func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
125+
self.assertEqual(result, 'return value')
126+
127+
def test_eval_with_symbolic_arg(self):
128+
func = mock.Mock(return_value='return value')
129+
del func._eval # So it doesn't pretend to be symbolic
130+
131+
expr = Call(
132+
func,
133+
(Symbol('some_symbol'), 'arg2'),
134+
{'kwarg_name': 'kwarg value'})
135+
136+
result = expr._eval({'some_symbol': 'arg1'})
137+
138+
func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
139+
self.assertEqual(result, 'return value')
140+
141+
def test_eval_with_symbol_kwarg(self):
142+
func = mock.Mock(return_value='return value')
143+
del func._eval # So it doesn't pretend to be symbolic
144+
145+
expr = Call(
146+
func,
147+
('arg1', 'arg2'),
148+
{'kwarg_name': Symbol('some_symbol')})
149+
150+
result = expr._eval({'some_symbol': 'kwarg value'})
151+
152+
func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
153+
self.assertEqual(result, 'return value')
154+
155+
def test_repr(self):
156+
# One arg
157+
self.assertEqual(
158+
repr(Call('func', ('arg1',), {'kwarg_name': 'kwarg value'})),
159+
"'func'(*('arg1',), **{'kwarg_name': 'kwarg value'})")
160+
161+
# Two args
162+
self.assertEqual(
163+
repr(Call(
164+
'func',
165+
('arg1', 'arg2'),
166+
{'kwarg_name': 'kwarg value'})),
167+
"'func'(*('arg1', 'arg2'), **{'kwarg_name': 'kwarg value'})")
168+
169+
170+
class FunctionsTest(unittest.TestCase):
171+
172+
def test_eval_if_symbolic(self):
173+
self.assertEqual(
174+
eval_if_symbolic(
175+
'nonsymbolic',
176+
{'some_symbol': 'symbol_value'}),
177+
'nonsymbolic')
178+
self.assertEqual(
179+
eval_if_symbolic(
180+
Symbol('some_symbol'),
181+
{'some_symbol': 'symbol_value'}),
182+
'symbol_value')
183+
184+
def test_to_callable_from_nonsymbolic_noncallable(self):
185+
test_callable = to_callable('nonsymbolic')
186+
self.assertEqual(
187+
test_callable('arg1', 'arg2', kwarg_name='kwarg value'),
188+
'nonsymbolic')
189+
190+
def test_to_callable_from_nonsymbolic_callable(self):
191+
func = mock.Mock(return_value='return value')
192+
del func._eval # So it doesn't pretend to be symbolic
193+
194+
test_callable = to_callable(func)
195+
196+
# Ensure running to_callable does not call the function
197+
self.assertFalse(func.called)
198+
199+
result = test_callable('arg1', 'arg2', kwarg_name='kwarg value')
200+
201+
func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
202+
self.assertEqual(result, 'return value')
203+
204+
def test_to_callable_from_symbolic(self):
205+
mock_expr = mock.Mock()
206+
mock_expr._eval.return_value = 'eval return value'
207+
208+
test_callable = to_callable(mock_expr)
209+
210+
# Ensure running to_callable does not evaluate the expression
211+
self.assertFalse(mock_expr._eval.called)
212+
213+
result = test_callable('arg1', 'arg2', kwarg_name='kwarg value')
214+
215+
mock_expr._eval.assert_called_once_with(
216+
{0: 'arg1', 1: 'arg2', 'kwarg_name': 'kwarg value'})
217+
self.assertEqual(result, 'eval return value')
218+
219+
def test_sym_call(self):
220+
expr = sym_call(
221+
'func', Symbol('some_symbol'), 'arg1', 'arg2',
222+
kwarg_name='kwarg value')
223+
self.assertEqual(
224+
repr(expr),
225+
"'func'(*(Symbol('some_symbol'), 'arg1', 'arg2'), " +
226+
"**{'kwarg_name': 'kwarg value'})")
227+
228+
229+
class IntegrationTest(unittest.TestCase):
230+
231+
def test_pythagoras(self):
232+
from math import sqrt
233+
234+
X = Symbol('X')
235+
Y = Symbol('Y')
236+
237+
expr = sym_call(sqrt, X ** 2 + Y ** 2)
238+
func = to_callable(expr)
239+
240+
self.assertEqual(func(X=3, Y=4), 5)
241+
242+
243+
if __name__ == '__main__':
244+
try:
245+
from colour_runner.runner import ColourTextTestRunner
246+
unittest.main(verbosity=2, testRunner=ColourTextTestRunner)
247+
except ImportError:
248+
unittest.main(verbosity=2)
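Several tests above delete an attribute that was never set, for example `del func._eval  # So it doesn't pretend to be symbolic`. The reason is that `mock.Mock` creates child attributes on first access. As a result `hasattr(mock_obj, '_eval')` is true, and the duck-typed check in `to_callable` and `eval_if_symbolic` would route the mock to `_eval`. Deleting the attribute makes later access raise `AttributeError`. The sketch below shows this with Python 3's `unittest.mock` (the tests import the standalone `mock` package) and a Python 3 copy of the dispatch in `to_callable`.

```python
from unittest import mock  # Python 3 stdlib counterpart of the `mock` package


def to_callable(obj):
    """Python 3 copy of the duck-typed dispatch in ply.symbolic.to_callable."""
    if hasattr(obj, '_eval'):
        return lambda *args, **kwargs: obj._eval(dict(enumerate(args), **kwargs))
    elif callable(obj):
        return obj
    else:
        return lambda *args, **kwargs: obj


# A bare Mock invents `_eval` on first access, so it is routed to _eval().
looks_symbolic = mock.Mock()
looks_symbolic._eval.return_value = 'eval return value'
print(to_callable(looks_symbolic)('arg1'))    # eval return value

# After `del`, hasattr() is False and the mock is treated as a plain callable.
plain_func = mock.Mock(return_value='return value')
del plain_func._eval
print(to_callable(plain_func) is plain_func)  # True
print(to_callable(plain_func)('arg1'))        # return value
```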
