Setting Up a Geospatial Python Toolbox with Conda

Anaconda

I’m using Ubuntu 15.04, which comes with Python 2.7 and Python 3.4 pre-installed. Installing packages into this system Python requires root access, so it is usually better to install with pip inside a virtual environment, which can be a little confusing. On a Windows machine, installing packages can be more complicated still.

To me, the easier way to set up a geospatial environment is to use Conda, an open source, cross-platform package manager that installs, runs, and updates packages and their dependencies. It doesn’t require administrator privileges to install anything.

To install Conda, download either Anaconda or Miniconda (both are free). Anaconda is a Python distribution that includes the most popular Python packages for science, maths, engineering, and data analysis, while Miniconda contains only conda and Python.

  • Miniconda: a minimal installation that only includes conda and its dependencies. The final installation is about 400 MB. You need to install your packages manually.
  • Anaconda: includes 150 scientific packages, all installed at once. It needs a minimum of 3 GB of disk space to download and install.

Once you have the installer for your OS, just follow the instructions. In my case I’ll install Miniconda 3 for Linux. This installs Python 3, but you can later create an environment with Python 2 if you need to.

# install Miniconda
$ bash Miniconda3-latest-Linux-x86_64.sh

# update conda
$ conda update conda

This will create a new folder with the Miniconda installation in: /home/username/miniconda3

Let’s create a new environment called geospatial with the most important packages in it (NumPy, Shapely, Matplotlib, SciPy, Pandas…). Later I’ll explain a little more about why we need these packages.

$ conda create --name geospatial numpy shapely matplotlib rasterio fiona pandas ipython pysal scipy pyproj

Then you can check the environments that you have installed on your computer:

$ conda info --envs
geospatial               /home/username/miniconda3/envs/geospatial
root                  *  /home/username/miniconda3

The * indicates the active environment. To activate another one, you only need to type:

# On Linux and Mac OS X 
$ source activate geospatial 

# On Windows 
> activate geospatial
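
If you ever want to read the environment listing from a script, the text printed by conda info --envs can be parsed with a few lines of Python. This is only a minimal sketch: the parse_envs helper is hypothetical (it is not part of conda), and it assumes every line has a name followed by a path without spaces, as in the sample above.

```python
# Sample text copied from the `conda info --envs` listing above.
SAMPLE = """\
geospatial               /home/username/miniconda3/envs/geospatial
root                  *  /home/username/miniconda3
"""


def parse_envs(text):
    """Return ({name: path}, active_name) from `conda info --envs` output.

    Hypothetical helper: assumes each line looks like `name [*] path`.
    """
    envs = {}
    active = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and any comment header lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, path = parts[0], parts[-1]
        envs[name] = path
        # The asterisk, when present, sits between the name and the path.
        if "*" in parts[1:-1]:
            active = name
    return envs, active


envs, active = parse_envs(SAMPLE)
print("active environment:", active)
print("geospatial lives in:", envs["geospatial"])
```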

To add a new package (for example PIL), you can simply install it with conda:

$ conda install pil

To check the installed packages just use:

$ conda list

To remove a package:

$ conda remove pil

Some packages are not available through conda install, but we can look for them on Anaconda.org (a package management service for both public and private package repositories). For example, GDAL can be found there: https://anaconda.org/osgeo/gdal

To install this package with conda:

$ conda install -c https://conda.anaconda.org/osgeo gdal

If a package is not available from conda or Anaconda.org, you may still be able to find and install it with another package manager such as pip.

The key thing about conda is that it manages package versions and compatibility: it will install dependencies, and upgrade or downgrade specific packages in your environment if needed.

Conda Environment Files

One of the most powerful things about conda is the environment file. When you are working on a project with your team, you may want to share your environment with another person so they can re-create something you have done. The environment.yml file lets them quickly reproduce your environment, with all of its packages and versions. It plays the same role as requirements.txt does when you use virtualenvs.

To create an environment file for the active environment, just run:

$ conda env export > environment.yml

The file will contain the name and the dependencies of that specific environment (including the packages and the Python version):

name: geospatial
dependencies:
- affine=1.1.0=py27_0
- armadillo=5.200.2=1
- cairo=1.12.18=6
- click=6.3=py27_0
- cligj=0.2.0=py27_0
- clyent=1.2.1=py27_0
- curl=7.45.0=0
- cycler=0.10.0=py27_0
- decorator=4.0.9=py27_0
...
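
Each dependency line in this listing has the form name=version=build. As a rough illustration, the sketch below splits such lines into their three parts. The parse_dependencies helper is hypothetical, and it is a line-based reader for this simple layout only, not a general YAML parser.

```python
# Sample lines copied from the environment.yml excerpt above.
SAMPLE = """\
name: geospatial
dependencies:
- affine=1.1.0=py27_0
- armadillo=5.200.2=1
- cairo=1.12.18=6
"""


def parse_dependencies(text):
    """Return a list of (name, version, build) tuples.

    Hypothetical helper: only handles `- name=version=build` lines.
    """
    deps = []
    for line in text.splitlines():
        line = line.strip()
        # Dependency entries are the list items that start with "- ".
        if not line.startswith("- "):
            continue
        parts = line[2:].strip().split("=")
        # Ignore anything that is not exactly name=version=build.
        if len(parts) != 3:
            continue
        deps.append(tuple(parts))
    return deps


for name, version, build in parse_dependencies(SAMPLE):
    print(name, version, build)
```

In this excerpt, a build string such as py27_0 marks a package built for Python 2.7, while builds like 1 or 6 belong to libraries that are not tied to a Python version.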

To recreate this environment on another computer, just run this from the folder where the yml file is:

$ conda env create -f environment.yml

You will find more information about managing environments and other things about conda here: http://conda.pydata.org/docs/test-drive.html

Geospatial Python packages

Now we must decide which packages to install based on our needs. As said before, installing with conda means we don’t need to worry about dependencies before installing a specific package: conda will take care of them and install the necessary packages together. The most important are:

NumPy (http://www.numpy.org/)

NumPy is the fundamental package for scientific computing with Python. It provides support for matrices, multidimensional arrays, and math functions. Many of the other libraries listed here need NumPy to function properly.

$ conda install numpy

Pandas

Pandas is a high-performance Python data analysis library, which can handle
large tabular datasets.

$ conda install pandas

IPython (http://ipython.org/)

IPython is an enhanced interactive Python shell that replaces the normal Python console with some extra features: tab-completion, object introspection, system shell access, command history retrieval etc.

$ conda install ipython

SciPy

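SciPy is the name of a Python-based ecosystem of open-source software for mathematics, science, and engineering, whose core packages include Pandas, NumPy, Matplotlib, SymPy, and IPython. It is also the name of the SciPy library itself, which is what the command below installs. The library provides numerical routines for tasks such as optimization, integration, and interpolation.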

$ conda install scipy

PySAL

PySAL is an open source Python library of spatial analysis functions, such as spatial weights and spatial autocorrelation.

$ conda install pysal

PyProj

PyProj is the Python interface to the PROJ4 library for cartographic transformations and projections.

$ conda install pyproj
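
To give an idea of what a cartographic projection does, here is a small plain-Python sketch of the spherical Web Mercator formula, which turns longitude and latitude in degrees into metres. It does not use PyProj at all; the function name is made up for this example, and PyProj, through PROJ4, covers many more projections and handles datums properly.

```python
import math

# Sphere radius used by Web Mercator (equal to the WGS84 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Illustrative coordinates (roughly Madrid).
x, y = lonlat_to_web_mercator(-3.70, 40.42)
print(round(x), round(y))
```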

GDAL

The Geospatial Data Abstraction Library (GDAL) is a translator library for raster and vector geospatial data formats, released by the Open Source Geospatial Foundation.

The GDAL library has two parts: the GDAL component, which supports reading, writing, and translating raster formats, and the OGR component, which supports reading, writing, and translating vector data.

$ conda install -c https://conda.anaconda.org/osgeo gdal

Shapely

Shapely is a Python package for manipulation and analysis of geometric objects in the Cartesian plane.

$ conda install shapely
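
As a rough idea of what "analysis of geometric objects in the Cartesian plane" means, the sketch below computes the area of a polygon from its vertices with the shoelace formula. It is plain Python written for this example, not Shapely code; Shapely offers this kind of measurement, and much more, on its own geometry objects.

```python
def polygon_area(coords):
    """Area of a simple polygon given as [(x, y), ...] (shoelace formula).

    Illustrative helper: assumes the vertices are listed in order around
    the ring and that the polygon does not cross itself.
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        # Wrap around so the last vertex connects back to the first.
        x2, y2 = coords[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (4, 0), (0, 3)]
print(polygon_area(square))    # 1.0
print(polygon_area(triangle))  # 6.0
```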

matplotlib

matplotlib is a Python 2D plotting library that produces publication-quality figures. Graphs can be saved in a variety of formats, including scalable vector graphics (SVG).

$ conda install matplotlib

Descartes

The descartes library provides a better integration of Shapely geometry objects within Matplotlib. Requires: matplotlib, numpy, and optionally Shapely 1.2+.

$ conda install -c https://conda.anaconda.org/ioos descartes

 

Pyshp

The Python Shapefile Library (pyshp) provides read and write support for the Esri Shapefile format.

$ conda install pyshp

geojson

Python bindings and utilities for GeoJSON-formatted data.

$ conda install geojson
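
GeoJSON itself is plain JSON with a fixed structure, so it helps to see what such a document looks like. The sketch below builds a FeatureCollection containing one point using only the standard library json module, not the geojson package; the point_feature helper and the sample coordinates are made up for illustration.

```python
import json


def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature dict for a point (longitude first, then latitude)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [point_feature(-3.70, 40.42, {"name": "Madrid"})],
}

# Serialize to text, as you would before writing a .geojson file.
text = json.dumps(collection, indent=2)
print(text)

# Reading it back gives the same structure.
loaded = json.loads(text)
```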

GeoPandas

GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by Shapely. GeoPandas further depends on Fiona for file access, and on Descartes and matplotlib for plotting.

$ conda install -c http://conda.anaconda.org/rsignell geopandas

Rasterio

Rasterio reads and writes geospatial raster datasets. Rasterio employs GDAL under the hood for file I/O and raster formatting. Its functions typically accept and return NumPy N-dimensional arrays.

$ conda install rasterio

Fiona

Fiona reads and writes spatial data files. It is a simple Python API around the OGR library for data access.

$ conda install fiona

PIL

Python Imaging Library (PIL). This library supports many file formats (bmp, gif, jpeg, pdf, tiff, eps…), and provides powerful image processing and graphics capabilities.

$ conda install pil

A tutorial is available here http://infohost.nmt.edu/tcc/help/pubs/pil/index.html

Pillow (the friendly PIL fork)

Unfortunately, the last PIL release was in 2009. The good news is that there’s an active fork of PIL called Pillow, which supports Python 3.

$ conda install pillow

Pillow tutorial: http://pillow.readthedocs.org/en/3.0.x/handbook/tutorial.html

Spectral Python

Spectral Python (SPy) is a pure Python module for processing hyperspectral image data. It has functions for reading, displaying, manipulating, and classifying hyperspectral imagery. SPy is a very advanced Python package for remote sensing.

$ conda install -c http://conda.anaconda.org/rbacher spectral
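
Once everything is installed, a quick way to confirm that the environment is complete is to check whether each package can be found by Python. Note that the name you install is not always the name you import: GDAL is imported as osgeo, Pillow as PIL, and pyshp as shapefile. The sketch below is one possible check using only the standard library; the check_imports helper and the IMPORT_NAMES table are written for this example.

```python
import importlib.util

# conda package name -> top-level module name used in an import statement
IMPORT_NAMES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "ipython": "IPython",
    "scipy": "scipy",
    "pysal": "pysal",
    "pyproj": "pyproj",
    "gdal": "osgeo",
    "shapely": "shapely",
    "matplotlib": "matplotlib",
    "descartes": "descartes",
    "pyshp": "shapefile",
    "geojson": "geojson",
    "geopandas": "geopandas",
    "rasterio": "rasterio",
    "fiona": "fiona",
    "pillow": "PIL",
    "spectral": "spectral",
}


def check_imports(import_names):
    """Return {package: True/False} depending on whether its module is found.

    find_spec only locates the module; it does not import it, so a package
    that is present but broken would still be reported as found.
    """
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in import_names.items()
    }


for package, found in sorted(check_imports(IMPORT_NAMES).items()):
    print(f"{package:12s} {'ok' if found else 'missing'}")
```

Run it with the geospatial environment active; any package reported as missing can then be installed with the matching conda install command from the list above.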