Posts

Python data pipelines similar to R's '%>%'

Pipelines (via the %>% operator from the magrittr package) have been popular in R for a few years now, and the “tidyverse” ecosystem that has grown since is built around them. Having tried both the pandas syntax (e.g. chaining like df.groupby().mean() or plain function2(function1(input))) and R’s pipeline syntax, I have to admit that I like the pipeline syntax a lot more. In my opinion the strengths of R’s pipeline syntax are: The same verbs can be used for different inputs (there are SQL backends for dplyr), thanks to R’s single-dispatch mechanism (the S3 object system).
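To make the idea concrete, here is a minimal sketch of what such a pipeline could look like in Python. It is not the approach from the full post: the names `Verb`, `head` and `_first_n` are invented for illustration, `>>` stands in for `%>%`, and `functools.singledispatch` plays the role of R's S3 dispatch, so the same verb works on different input types.

```python
from functools import singledispatch


class Verb:
    """Hypothetical wrapper: `data >> verb(args)` calls `func(data, *args)`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        # Calling a verb only binds its extra arguments; the data arrives via >>.
        return Verb(self.func, *args, **kwargs)

    def __rrshift__(self, data):
        # Python falls back to this when the left operand does not implement >>.
        return self.func(data, *self.args, **self.kwargs)


@singledispatch
def _first_n(data, n):
    # Fallback for input types without a registered implementation.
    raise NotImplementedError(f"head() is not implemented for {type(data).__name__}")


@_first_n.register(list)
def _(data, n):
    return data[:n]


@_first_n.register(dict)
def _(data, n):
    return dict(list(data.items())[:n])


head = Verb(_first_n)

# The same verb, dispatched on the type of the piped-in data.
print([1, 2, 3, 4] >> head(2))
print({"a": 1, "b": 2, "c": 3} >> head(2))
```

One caveat of this design: it only works when the left-hand object does not define `>>` itself, which holds for lists and dicts but not for every data type.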

Python development on Windows: making it comfortable

Recently someone was surprised that I use Windows as my main dev machine, as other operating systems are usually more developer friendly. Out of the box, this is true. But to make yourself at home as a developer, you usually change a lot of things, no matter whether you are using OS X, Linux or Windows. So here is what I use: a proper command line (cmder with git); PyCharm + Notepad++ as editors; Python from miniconda with multiple envs; Jupyter notebook with a conda env kernel manager. Not all of this is Windows specific… I actually suspect that a lot of it is OS agnostic and I would use a similar setup on a different OS…

How to refresh conda patches

Conda recipes can contain patches which are applied on top of the source for the package. When updating the package to a new upstream version, these patches need to be checked to see whether they still apply (or are still needed). This is the way I do it currently (be aware that I work on Windows, so you might need to change some slashes…)… Preparation # makes the "patch" command available... set "PATH=%path%;C:\Program Files\Git\usr\bin\" # Update the latest source for matplotlib.

Demo mode for IPython (works in the notebook)

R has a demo mode, which lets you execute a demo of a function or a package. See e.g. demo(lm.glm) for such a thing. A PR in IPython-extensions lets you do much the same: it gets some demo code (which can be a function in a package or the matplotlib examples on GitHub) and lets you execute that code yourself. Specially formatted comments in the function are turned into formatted text, if the frontend supports it.

Automatic building of python wheels and conda packages

Recently I found the conda-forge project on GitHub, which makes it easy to automatically build and upload your Python project as a (native) conda package. Conda-forge introduces the concept of a “smithy” (a repository on GitHub) which builds the conda packages for the main repository. A smithy connects to three different CI services to get builds for all three major platforms: Travis for Mac OS X, CircleCI for Linux and AppVeyor for Windows.

More functions for working with JSON data / nested structures

I updated the functions in my last blog post (renamed the functions and added a few corner cases) and added a new convert_to_dataframe_input function: # can be a dict or a list of structures data = {"ID1":{"result":{"name":"Jan Schulz"}}, "ID2":{"result": {"name":"Another name", "bday":"1.1.2000"}}} converter_dict = dict( names = "result.name", bday = "result.bday" ) import pandas as pd print(pd.DataFrame(convert_to_dataframe_input(data, converter_dict))) ## _index bday names ## 0 ID1 NaN Jan Schulz ## 1 ID2 1.
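The excerpt does not show the implementation, so the following is only a sketch of how such a conversion might work, under stated assumptions: the function name `rows_from_nested` and its `index_name` parameter are hypothetical, only the dict-of-structures case is handled, and missing values are filled with `None`. Each top-level key becomes one row, and each dotted path in the converter dict is followed through the nested dicts.

```python
def rows_from_nested(data, columns, index_name="_index"):
    """Hypothetical helper: flatten a dict of nested dicts into a list of row dicts."""
    rows = []
    for key, entry in data.items():
        row = {index_name: key}
        for column, path in columns.items():
            value = entry
            # Walk the dotted path; give up with None as soon as a level is missing.
            for part in path.split("."):
                value = value.get(part) if isinstance(value, dict) else None
            row[column] = value
        rows.append(row)
    return rows


data = {"ID1": {"result": {"name": "Jan Schulz"}},
        "ID2": {"result": {"name": "Another name", "bday": "1.1.2000"}}}
converter_dict = dict(names="result.name", bday="result.bday")

rows = rows_from_nested(data, converter_dict)
for row in rows:
    print(row)
```

A list of dicts like `rows` is one of the inputs the `pd.DataFrame` constructor accepts, which is why this shape is convenient as an intermediate step.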

Two functions for working with JSON/dicts

I recently had to explore a JSON API and came up with the following three functions to make working with the returned JSON/dict easier: [Update 2015-11-10: you might like dripper, which does much of what this code snippet does…] [Update 2015-09-26: updates to code and new convert_to_dataframe_input function: see here for a post about it] _null = object() def get_from_structure(data, name, default=_null): """Return the element with the given name. `data` is a structure containing lists, dicts or scalar values.
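Since the excerpt cuts off before the function body, here is a minimal sketch of the general technique rather than the post's actual code. It assumes that names are dotted paths such as "result.name" and that list elements are addressed by integer position; the names `lookup` and `_MISSING` are hypothetical. The sentinel default mirrors the `_null = object()` pattern above: it lets a caller pass `None` as a legitimate default while still raising an error when no default is given.

```python
_MISSING = object()  # sentinel, so that None can be used as a real default


def lookup(data, path, default=_MISSING):
    """Hypothetical helper: follow a dotted path through nested dicts and lists."""
    current = data
    for key in path.split("."):
        try:
            if isinstance(current, list):
                current = current[int(key)]  # list elements by position
            else:
                current = current[key]  # dict elements by key
        except (KeyError, IndexError, ValueError, TypeError):
            # Missing key, bad index, non-numeric list index, or a scalar in the way.
            if default is _MISSING:
                raise KeyError(path) from None
            return default
    return current


data = {"ID1": {"result": {"name": "Jan Schulz", "tags": ["a", "b"]}}}
print(lookup(data, "ID1.result.name"))
print(lookup(data, "ID1.result.tags.1"))
print(lookup(data, "ID1.result.bday", default=None))
```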