I’m using Ubuntu 15.04, which comes with Python 2.7 and Python 3.4 pre-installed.
Installing packages into this system Python requires root access, so it is usually better to install packages with pip inside a virtual environment, which can be a little confusing. On a Windows machine, installing packages can be even more complicated.
To me, the easiest way to set up a geospatial environment is to use Conda (an open source, cross-platform package manager that installs, runs, and updates packages and their dependencies), which doesn’t require administrator privileges to install anything.
To get Conda, download either Anaconda or Miniconda (both are free). Anaconda is a Python distribution that includes the most popular Python packages for science, maths, engineering, and data analysis, while Miniconda includes only conda and Python.
- Miniconda: a minimal installation that only includes conda and its dependencies. The final installation is 400 MB, and you install your packages manually.
- Anaconda: includes 150 scientific packages, all installed at once. It needs a minimum of 3 GB of disk space to download and install.
Once you have the installer for your OS, just follow the instructions. In my case I’ll install Miniconda 3 for Linux. This installs Python 3, but later you can create an environment with Python 2 if you need to.
# install Miniconda
$ bash Miniconda3-latest-Linux-x86_64.sh
# update conda
$ conda update conda
This creates a new folder with the Miniconda installation at /home/username/miniconda3.
Let’s create a new environment called geospatial with the most important packages in it (NumPy, Shapely, Matplotlib, SciPy, pandas…). Later I’ll explain a little more about why we need these packages.
$ conda create --name geospatial numpy shapely matplotlib rasterio fiona pandas ipython pysal scipy pyproj
Then you can check the environments that you have installed on your computer:
$ conda info --envs
geospatial             /home/username/miniconda3/envs/geospatial
root                *  /home/username/miniconda3
The * marks the active environment. To activate another one, type:
# On Linux and Mac OS X
$ source activate geospatial
# On Windows
> activate geospatial
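To confirm from inside Python which environment is active, you can look at `sys.prefix`, the root directory of the running interpreter. The helper `active_env_name` below is a hypothetical sketch that assumes the default Miniconda layout shown by `conda info --envs` (named environments live under `miniconda3/envs/<name>`). It only inspects the path, so it cannot tell the conda root environment apart from a non-conda Python.

```python
import os
import sys


def active_env_name(prefix=None):
    """Guess the conda environment name from an interpreter prefix.

    Assumes the Miniconda layout: <miniconda>/envs/<name> for named
    environments, and <miniconda> itself for the root environment.
    Any prefix that is not inside an 'envs' folder is reported as 'root'.
    """
    prefix = os.path.normpath(prefix or sys.prefix)
    parent = os.path.basename(os.path.dirname(prefix))
    if parent == "envs":
        return os.path.basename(prefix)
    return "root"


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Environment:", active_env_name())
```

After `source activate geospatial`, running this with `python` should report `geospatial`.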
To add a new package (for example, PIL), install it with conda:
$ conda install pil
To check the installed packages just use:
$ conda list
To remove a package:
$ conda remove pil
Some packages are not available through conda install from the default channels, but we can look for them on Anaconda.org (a package management service for both public and private package repositories). For example, GDAL can be found there: https://anaconda.org/osgeo/gdal
To install this package with conda:
$ conda install -c https://conda.anaconda.org/osgeo gdal
If a package is not available from conda or Anaconda.org, you may still be able to find and install it with another package manager such as pip.
The key advantage of conda is that it manages package versions and compatibility: it installs dependencies, and upgrades or downgrades specific packages in your environment when needed.
Conda environment files
One of the most powerful things about conda is the environment file. When you are working on a project with your team, you may want to share your environment so that others can re-create what you have done. To let them quickly reproduce your environment, with all of its packages and versions, you use an environment.yml file. This file is roughly the equivalent of requirements.txt when you use virtualenv.
To export the active environment to an environment file, run:
$ conda env export > environment.yml
The file contains the name and the dependencies of that specific environment (including the packages and the Python version):
name: geospatial
dependencies:
- affine=1.1.0=py27_0
- armadillo=5.200.2=1
- cairo=1.12.18=6
- click=6.3=py27_0
- cligj=0.2.0=py27_0
- clyent=1.2.1=py27_0
- curl=7.45.0=0
- cycler=0.10.0=py27_0
- decorator=4.0.9=py27_0
...
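Each exported dependency is pinned as `name=version=build`. If you want to inspect or compare these pins from a script, here is a minimal sketch using only the standard library. The function `read_pins` is hypothetical, and it is a line-based parser for the simple layout shown above, not a general YAML parser (a real YAML library such as PyYAML would be more robust).

```python
import re

# Matches an exported conda pin such as "affine=1.1.0=py27_0" (name=version=build).
PIN = re.compile(r"^([A-Za-z0-9_.\-]+)=([^=\s]+)=([^=\s]+)$")


def read_pins(text):
    """Return {name: (version, build)} for the pinned conda dependencies.

    Anything that is not a name=version=build pin is ignored, for example
    nested pip entries, which use '=='.
    """
    pins = {}
    in_deps = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "dependencies:":
            in_deps = True
            continue
        if in_deps and stripped and not stripped.startswith("-"):
            in_deps = False  # a new top-level key such as "prefix:" ends the list
        if in_deps and stripped.startswith("- "):
            match = PIN.match(stripped[2:].strip())
            if match:
                name, version, build = match.groups()
                pins[name] = (version, build)
    return pins


example = """name: geospatial
dependencies:
- affine=1.1.0=py27_0
- click=6.3=py27_0
- curl=7.45.0=0
"""

for name, (version, build) in sorted(read_pins(example).items()):
    print(f"{name:10} {version:8} {build}")
```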
To recreate this environment on another computer, run this from the folder that contains the .yml file:
$ conda env create -f environment.yml
You will find more information about managing environments and other things about conda here: http://conda.pydata.org/docs/test-drive.html
Geospatial Python packages
Now we must decide which packages to install based on our needs. As mentioned before, installing with conda means we don’t need to worry about dependencies before installing a specific package: conda takes care of them and installs the necessary packages together. The most important ones are:
NumPy is the fundamental package for scientific computing with Python. It provides multidimensional arrays and matrices, along with mathematical functions to operate on them. Many of the other libraries below require NumPy to work.
$ conda install numpy
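As a quick illustration of what NumPy arrays give you, here is a small sketch on a made-up 3x3 elevation grid: whole-array arithmetic, masks, and a finite-difference slope. The 30 m cell size is an assumption chosen for the example.

```python
import numpy as np

# A tiny 3x3 grid of elevation values in metres (made-up sample data).
elevation = np.array([
    [120.0, 125.0, 130.0],
    [118.0, 122.0, 128.0],
    [115.0, 119.0, 124.0],
])

# Vectorised arithmetic applies to every cell at once, no Python loops needed.
feet = elevation * 3.28084

# Aggregations and boolean masks work over the whole array.
print("mean elevation (m):", elevation.mean())
print("cells above 120 m:", int((elevation > 120).sum()))

# Finite-difference gradient along rows (y) and columns (x), assuming 30 m cells.
dz_dy, dz_dx = np.gradient(elevation, 30.0)
print("max |dz/dx|:", np.abs(dz_dx).max())
```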
Pandas is a high-performance Python data analysis library, which can handle large tabular datasets.
$ conda install pandas
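A minimal pandas sketch on a hypothetical table of station readings (all values are made up): filtering rows with a condition and aggregating per group. The named-aggregation syntax in `.agg(...)` assumes a reasonably recent pandas (0.25 or later).

```python
import pandas as pd

# Hypothetical weather-station readings (made-up values).
readings = pd.DataFrame({
    "station": ["A", "A", "B", "B", "C"],
    "lon": [-3.70, -3.70, 2.17, 2.17, -0.38],
    "lat": [40.42, 40.42, 41.39, 41.39, 39.47],
    "temp_c": [21.5, 23.0, 19.0, 20.5, 25.0],
})

# Filter rows with a boolean condition.
warm = readings[readings["temp_c"] > 20]

# Group and aggregate: mean temperature per station, keeping its coordinates.
summary = (
    readings.groupby("station")
    .agg(lon=("lon", "first"), lat=("lat", "first"), mean_temp=("temp_c", "mean"))
    .reset_index()
)
print(summary)
```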
IPython is an enhanced interactive Python shell that replaces the normal Python console with some extra features: tab-completion, object introspection, system shell access, command history retrieval etc.
$ conda install ipython
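SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. Its core packages include NumPy, the SciPy library, Matplotlib, SymPy, IPython, and pandas. Note that the command below installs the SciPy library itself (with NumPy as a dependency); the other core packages are installed separately.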
$ conda install scipy
PySAL is an open source Python library of spatial analysis functions, such as spatial weights and spatial autocorrelation.
$ conda install pysal
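To make "spatial autocorrelation" concrete, here is a from-scratch sketch of the global Moran's I statistic with binary contiguity weights: I = (n / S0) · Σi Σj wij zi zj / Σi zi², where zi = xi − mean and S0 is the sum of all weights. This is not PySAL’s API (PySAL also handles other weighting schemes and significance tests); the function `morans_i` and the four-region example are hypothetical.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary (0/1) contiguity weights.

    values:    list of observations, one per region
    neighbors: dict mapping region index -> iterable of neighbouring indices
               (assumed symmetric: if j neighbours i, then i neighbours j)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]

    s0 = 0.0     # sum of all weights (each neighbour link has weight 1)
    cross = 0.0  # sum over neighbouring pairs of z_i * z_j
    for i, js in neighbors.items():
        for j in js:
            s0 += 1.0
            cross += z[i] * z[j]

    variance_term = sum(zi * zi for zi in z)
    return (n / s0) * (cross / variance_term)


# Four regions in a row, 0-1-2-3, each adjacent to the next (rook contiguity).
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1, 2, 3, 4], line))  # smooth trend: positive autocorrelation
print(morans_i([1, 0, 1, 0], line))  # alternating: negative autocorrelation
```

Values near +1 mean similar values cluster together, values near -1 mean neighbours tend to differ.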
pyproj is the Python interface to the PROJ.4 library for cartographic transformations and projections.
$ conda install pyproj
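To show what a projection actually does, here is a hand-rolled sketch of the spherical ("Web") Mercator formulas (EPSG:3857), using only the `math` module. It is not pyproj’s code and only covers this one projection; in practice you would let pyproj/PROJ handle it, for example with `Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)` in pyproj 2.x and later.

```python
import math

# Sphere radius used by Web Mercator (EPSG:3857): the WGS84 semi-major axis.
R = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project WGS84 longitude/latitude (degrees) to Web Mercator metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse projection: Web Mercator metres back to degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


x, y = lonlat_to_web_mercator(-3.7038, 40.4168)  # roughly Madrid
print(x, y)
print(web_mercator_to_lonlat(x, y))
```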
The Geospatial Data Abstraction Library (GDAL) is a translator library for raster and vector geospatial data formats released by the Open Source Geospatial Foundation.
The GDAL library has two parts: the GDAL component, which supports reading, writing, and translating raster formats, and the OGR component, which supports reading, writing, and translating vector data.
$ conda install -c https://conda.anaconda.org/osgeo gdal
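One GDAL concept you meet right away with rasters is the geotransform: with the GDAL Python bindings, `dataset.GetGeoTransform()` returns six coefficients that map pixel (column, row) positions to map coordinates. The sketch below applies that affine mapping with plain Python; the helper names and the example raster origin and pixel size are hypothetical.

```python
def pixel_to_geo(geotransform, col, row):
    """Map a pixel (column, row) to georeferenced (x, y) using GDAL's
    six-coefficient affine geotransform convention:
        x = gt[0] + col * gt[1] + row * gt[2]
        y = gt[3] + col * gt[4] + row * gt[5]
    (col, row) = (0, 0) is the top-left corner of the top-left pixel.
    """
    gt = geotransform
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def geo_to_pixel(geotransform, x, y):
    """Inverse mapping for the common north-up case (no rotation terms)."""
    gt = geotransform
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    col = (x - gt[0]) / gt[1]
    row = (y - gt[3]) / gt[5]
    return col, row


# Hypothetical north-up raster: top-left corner at (440720, 3751320), 30 m pixels.
# The pixel height is negative because rows increase downwards (southwards).
gt = (440720.0, 30.0, 0.0, 3751320.0, 0.0, -30.0)
print(pixel_to_geo(gt, 0.5, 0.5))  # centre of the top-left pixel
print(geo_to_pixel(gt, 441000.0, 3751000.0))
```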
Shapely is a Python package for manipulation and analysis of geometric objects in the Cartesian plane.
$ conda install shapely
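With Shapely, `Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]).area` gives you a polygon’s area directly. To show what "geometry in the Cartesian plane" implies, here is a standard-library sketch of the shoelace formula for a simple polygon (the helper `polygon_area` is hypothetical, not Shapely’s code). The result is in the units of the coordinates, so raw longitude/latitude should be projected first.

```python
def polygon_area(coords):
    """Planar area of a simple (non-self-intersecting) polygon, shoelace formula.

    coords: list of (x, y) vertices in order; the ring may be open or closed.
    """
    if coords[0] == coords[-1]:
        coords = coords[:-1]  # drop the repeated closing vertex
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(polygon_area(square))
print(polygon_area([(0, 0), (4, 0), (0, 3)]))
```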
matplotlib is a Python 2D plotting library that produces publication-quality figures. Plots can be saved in a variety of formats, including Scalable Vector Graphics (SVG).
$ conda install matplotlib
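A small matplotlib sketch that draws a made-up polygon outline and some points, then saves the figure as SVG. It selects the non-interactive "Agg" backend so it also runs on a machine without a display; saving to an in-memory buffer is just for the example, and a file name such as "map.svg" works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, works on a headless server
import matplotlib.pyplot as plt

# Made-up outline of a polygon (closed ring) and a few sample points.
ring_x = [0, 4, 4, 0, 0]
ring_y = [0, 0, 3, 3, 0]
pts_x = [1, 2, 3]
pts_y = [1, 2, 1]

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(ring_x, ring_y, color="black", linewidth=1)
ax.scatter(pts_x, pts_y, color="tab:red", zorder=3)
ax.set_aspect("equal")  # same scale on both axes, as for projected map coordinates
ax.set_title("Polygon and points")

# Save as SVG into a buffer, then close the figure to free memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="svg")
plt.close(fig)
svg_bytes = buffer.getvalue()
print(len(svg_bytes), "bytes of SVG")
```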
The descartes library provides better integration of Shapely geometry objects with Matplotlib. It requires matplotlib, numpy, and optionally Shapely 1.2+.
$ conda install -c https://conda.anaconda.org/ioos descartes
The Python Shapefile Library (pyshp) provides read and write support for the Esri Shapefile format.
$ conda install pyshp
The geojson package provides Python bindings and utilities for GeoJSON-formatted data.
$ conda install geojson
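GeoJSON itself is plain JSON with a fixed structure (note that coordinates are written longitude first). The sketch below builds and reads a small FeatureCollection using only the standard `json` module; the geojson package wraps the same structures in classes such as `geojson.Point` and `geojson.Feature`. The helper `point_feature` and the city coordinates are illustrative.

```python
import json


def point_feature(lon, lat, **properties):
    """Build a GeoJSON Feature for a point; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(-3.7038, 40.4168, name="Madrid"),
        point_feature(2.1734, 41.3851, name="Barcelona"),
    ],
}

text = json.dumps(collection, indent=2)
print(text)

# Reading it back is plain json.loads.
parsed = json.loads(text)
names = [f["properties"]["name"] for f in parsed["features"]]
print(names)
```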
GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by Shapely. GeoPandas further depends on Fiona for file access and on descartes and matplotlib for plotting.
$ conda install -c http://conda.anaconda.org/rsignell geopandas
Rasterio reads and writes geospatial raster datasets. Rasterio uses GDAL under the hood for file I/O and raster formatting. Its functions typically accept and return NumPy ndarrays.
$ conda install rasterio
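Because rasterio hands you bands as NumPy arrays (for example, `rasterio.open(path).read(1)` returns band 1 as a 2-D array), raster analysis becomes array arithmetic. Here is a sketch of an NDVI calculation where two small made-up arrays stand in for the red and near-infrared bands you would read from a real file.

```python
import numpy as np

# Stand-ins for two bands; with rasterio these would come from dataset.read(i).
red = np.array([[0.10, 0.20], [0.30, 0.00]])
nir = np.array([[0.50, 0.40], [0.30, 0.00]])

# NDVI = (NIR - Red) / (NIR + Red), computed element-wise over the whole raster.
# Cells where both bands are 0 give 0/0, which becomes NaN; the warning is silenced.
with np.errstate(divide="ignore", invalid="ignore"):
    ndvi = (nir - red) / (nir + red)

print(ndvi)
```

Real bands are often stored as integers, so it is worth converting them with `.astype("float64")` before dividing.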
Fiona reads and writes spatial data files; it is a simple Python API around the OGR library for data access.
$ conda install fiona
The Python Imaging Library (PIL) supports many file formats (bmp, gif, jpeg, pdf, tiff, eps…) and provides powerful image processing and graphics capabilities.
$ conda install pil
A tutorial is available here: http://infohost.nmt.edu/tcc/help/pubs/pil/index.html
Unfortunately, PIL’s last release was in 2009. The good news is that there’s an active fork of PIL called Pillow that supports Python 3:
$ conda install pillow
Pillow tutorial: http://pillow.readthedocs.org/en/3.0.x/handbook/tutorial.html
Spectral Python (SPy) is a pure Python module for processing hyperspectral image data. It has functions for reading, displaying, manipulating, and classifying hyperspectral imagery. SPy is a very advanced Python package for remote sensing.
$ conda install -c http://conda.anaconda.org/rbacher spectral
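One classic technique for classifying hyperspectral pixels is the spectral angle mapper: each pixel spectrum is compared with reference spectra by the angle between them, and the smallest angle wins. SPy ships its own routines for this; the sketch below is an independent NumPy implementation of the idea, with a hypothetical helper name and a made-up 3-band image and reference spectra.

```python
import numpy as np


def spectral_angles(image, references):
    """Angle (radians) between every pixel spectrum and each reference spectrum.

    image:      array of shape (rows, cols, bands)
    references: array of shape (n_classes, bands)
    returns:    array of shape (rows, cols, n_classes); smaller angle = more similar
    """
    pixels = image.reshape(-1, image.shape[-1])  # (pixels, bands)
    dots = pixels @ references.T  # (pixels, n_classes)
    norms = (np.linalg.norm(pixels, axis=1)[:, None]
             * np.linalg.norm(references, axis=1)[None, :])
    cosines = np.clip(dots / norms, -1.0, 1.0)  # guard against rounding past +/-1
    return np.arccos(cosines).reshape(image.shape[0], image.shape[1], -1)


# Made-up 2x2 image with 3 bands, and two reference spectra.
image = np.array([
    [[0.05, 0.08, 0.45], [0.10, 0.06, 0.02]],
    [[0.04, 0.09, 0.40], [0.12, 0.07, 0.03]],
])
references = np.array([
    [0.05, 0.08, 0.50],  # vegetation-like: high near-infrared
    [0.10, 0.05, 0.02],  # water-like: low near-infrared
])

angles = spectral_angles(image, references)
classes = angles.argmin(axis=-1)  # index of the closest reference for each pixel
print(classes)
```

Because the angle ignores vector length, this comparison is fairly insensitive to overall brightness differences such as illumination.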