Infotheory, written in C++, is a software package for information-theoretic analysis, especially of high-dimensional data. When inferring the data distribution from samples, the straightforward method of binning the entire data space makes the number of bins grow exponentially with the number of dimensions. This implementation instead keeps track of only the non-empty bins in a list (a sparse representation), which avoids that explosion. It also enables better distribution estimation by employing averaged shifted histograms [1].
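To make the sparse-binning idea concrete, here is a minimal Python sketch. It is not the package's implementation: the function names (`bin_key`, `sparse_histogram`, `ash_density`) and the `shift` parameter are made up for illustration, and the shifted grids move by the same fraction of a bin width along every axis, which is a simplification of averaged shifted histograms [1].

```python
import random
from collections import Counter


def bin_key(point, nbins, lo=0.0, hi=1.0, shift=0.0):
    """Return the tuple of per-dimension bin indices for one point.

    `shift` (hypothetical parameter) moves the bin grid by that fraction
    of one bin width. Assumes every coordinate is >= lo.
    """
    width = (hi - lo) / nbins
    return tuple(int((x - lo) / width + shift) for x in point)


def sparse_histogram(points, nbins, lo=0.0, hi=1.0, shift=0.0):
    """Count points per bin, storing only bins that receive a point."""
    counts = Counter()
    for p in points:
        counts[bin_key(p, nbins, lo, hi, shift)] += 1
    return counts


def ash_density(query, points, nbins, nshifts, lo=0.0, hi=1.0):
    """Average the histogram density at `query` over `nshifts` shifted grids."""
    width = (hi - lo) / nbins
    hits = 0
    for r in range(nshifts):
        shift = r / nshifts
        hist = sparse_histogram(points, nbins, lo, hi, shift)
        hits += hist.get(bin_key(query, nbins, lo, hi, shift), 0)
    return hits / (nshifts * len(points) * width ** len(query))


random.seed(0)
dims, nbins, npoints = 10, 10, 1000
data = [[random.random() for _ in range(dims)] for _ in range(npoints)]
hist = sparse_histogram(data, nbins)
print("dense bins:", nbins ** dims)   # 10 billion cells in a full grid
print("stored bins:", len(hist))      # never more than the number of points
```

The dictionary can never hold more entries than there are data points, so memory scales with the sample size rather than with the number of cells in the full grid.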

The following information-theoretic quantities can currently be estimated with this tool. Follow this repo for more to come.

  1. Entropy [2]
  2. Mutual Information [3]
  3. Partial Information Decomposition measures [4]
    • Unique Information
    • Redundant Information
    • Synergistic Information

The package can be used from Python or C++. While the C++ headers should function well on all platforms, the Python package has so far been tested only on macOS and Linux.



From your terminal
pip install --upgrade infotheory

On macOS, you might have to set two environment variables first:
export CXXFLAGS="-mmacosx-version-min=10.9"
export LDFLAGS="-mmacosx-version-min=10.9"


Simply download Infotools.h and VectorMatrix.h, place them in your source directory, and include Infotools in your source like any other header:
#include "Infotools.h"
Since Infotools accepts TVector arrays as arguments, you will also need to include VectorMatrix.h, which contains the TVector class.
#include "VectorMatrix.h"
See the demo below for a better understanding.

API Documentation

C++ Python


In both C++ and Python, using this package involves the following steps:

  1. Create the object
  2. Specify how the data should be binned
  3. Add data
  4. Invoke an information-theoretic tool (say, mutual information)
  5. Invoke another information-theoretic tool (say, synergy)
  6. (Repeat to call as many tools of analysis on this data as required)
  7. Start over for a new dataset
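The steps above can be sketched with a small stand-in class. Everything here is hypothetical: `ToyAnalyzer` and its method names are invented for illustration and are not the package's API (see the API documentation for the real calls). The sketch covers only entropy and mutual information, estimated from plain bin counts.

```python
import math
from collections import Counter


class ToyAnalyzer:
    """Hypothetical stand-in that mirrors the workflow, not the real API."""

    def __init__(self, dims):                       # step 1: create the object
        self.dims = dims
        self.counts = Counter()
        self.n = 0

    def set_binning(self, nbins, mins, maxs):       # step 2: specify binning
        self.nbins, self.mins, self.maxs = nbins, mins, maxs

    def add_point(self, point):                     # step 3: add data
        key = tuple(
            min(int((x - lo) / (hi - lo) * nb), nb - 1)
            for x, nb, lo, hi in zip(point, self.nbins, self.mins, self.maxs)
        )
        self.counts[key] += 1
        self.n += 1

    def _entropy(self, dim_ids):
        # Marginalize the sparse counts onto the requested dimensions.
        marg = Counter()
        for key, c in self.counts.items():
            marg[tuple(key[d] for d in dim_ids)] += c
        return -sum(c / self.n * math.log2(c / self.n) for c in marg.values())

    def mutual_info(self, x_dims, y_dims):          # step 4: invoke a tool
        return (self._entropy(x_dims) + self._entropy(y_dims)
                - self._entropy(x_dims + y_dims))

    def clear(self):                                # step 7: start over
        self.counts.clear()
        self.n = 0


# Two 2D variables: X occupies dimensions 0-1, Y occupies dimensions 2-3.
tool = ToyAnalyzer(dims=4)
tool.set_binning(nbins=[2] * 4, mins=[0.0] * 4, maxs=[1.0] * 4)
corners = [(a, b) for a in (0.25, 0.75) for b in (0.25, 0.75)]

for x in corners:                 # Y is an exact copy of X
    tool.add_point(x + x)
mi_identical = tool.mutual_info([0, 1], [2, 3])

tool.clear()                      # new dataset: every X paired with every Y
for x in corners:
    for y in corners:
        tool.add_point(x + y)
mi_independent = tool.mutual_info([0, 1], [2, 3])

print(mi_identical, mi_independent)
```

With identical variables the mutual information equals the entropy of X (2 bits for four equally likely bins), and with every X paired with every Y it drops to zero.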

Below are demonstrations, in Python and in C++, of using this package to estimate mutual information between two 2D random variables. More demos are available here.

A similar demo on how to use this package with C++ is available here.


This section demonstrates some baseline information-theoretic results obtained with this package. See the baselines directory in the repo for code to reproduce these results.


Across the range of probabilities p of HEADS in a coin flip, entropy should be maximal at p = 0.5 and have the general shape of an inverted U.
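The expected curve can be checked against the closed-form binary entropy, H(p) = -p log2(p) - (1 - p) log2(1 - p). The sketch below evaluates that formula directly. It does not use the package, which estimates entropy from samples, and the function name `binary_entropy` is ours.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a coin that lands HEADS with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


probs = [i / 10 for i in range(11)]
entropies = [binary_entropy(p) for p in probs]
for p, h in zip(probs, entropies):
    print(f"p={p:.1f}  H={h:.4f}")
```

The values rise from 0 at p = 0 to 1 bit at p = 0.5 and fall back symmetrically to 0 at p = 1.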

Mutual Information

Mutual information should be high for identical variables, as shown in the first panel. It should also be high for variables where, even though there is no direct correlation between x and y, knowledge of x is highly indicative of y. In the illustration in the second panel, the data is binned so that identifying the bin in which x falls determines the bin in which y falls. This illustrates mutual information as a generalized measure of correlation. Furthermore, mutual information should be slightly lower for noisy identical variables and low for random variables. It is normalized to be in [0,1] for this baseline.
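A small discrete example makes the "generalized correlation" point concrete: with y = x² and x symmetric about zero, the Pearson correlation is exactly zero, yet x fully determines y. This sketch is independent of the package; the helper names are ours, and the normalization by min(H(X), H(Y)) is only one possible choice, assumed here because the baseline's normalization is not specified above.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in entropy in bits of a list of discrete samples."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


xs = [-2, -1, 0, 1, 2] * 20
ys = [x * x for x in xs]          # y is determined by x, but not linearly

r = pearson(xs, ys)
mi = mutual_info(xs, ys)
# Assumed normalization: divide by the smaller of the two entropies.
mi_normalized = mi / min(entropy(xs), entropy(ys))
print(f"correlation={r:.3f}  MI={mi:.3f} bits  normalized={mi_normalized:.3f}")
```

Here the correlation is zero while the mutual information equals the full entropy of y, so the normalized value is 1.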

Partial Information Decomposition

For an XOR gate, neither input on its own provides any information about the output, so there is no unique information and no redundant information. The output of an XOR gate can only be inferred through the synergistic information of both inputs combined. This should show up in the partial information decomposition of the total mutual information between inputs and output. In an AND gate, however, the output can be determined to be 0 if either input is 0, so the other input is redundant in that case, while an output of 1 can only be inferred when both inputs are known to be 1. This can also be brought out by partial information decomposition. See Table 1 in [5] for corroboration of the numbers shown below.

2-input XOR
total_mi = 1.0
redundant_info = 0.0
unique_1 = 0.0
unique_2 = 0.0
synergy = 1.0

2-input AND
total_mi = 0.8112781244591328
redundant_info = 0.31127812445913283
unique_1 = 0.0
unique_2 = 0.0
synergy = 0.5
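The numbers above can be reproduced from the exact truth tables with the redundancy measure I_min of Williams and Beer [4]. The sketch below works on known probabilities rather than sampled data, does not use the package, and all function names are ours. It assumes uniformly distributed inputs.

```python
import math
from collections import defaultdict
from itertools import product


def marginal(joint, idx):
    """Marginalize a joint {(x1, x2, s): prob} onto the positions in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out


def mutual_info(joint, a_idx, b_idx):
    """I(A;B) in bits between the variables at positions a_idx and b_idx."""
    pa, pb = marginal(joint, a_idx), marginal(joint, b_idx)
    pab = marginal(joint, a_idx + b_idx)
    k = len(a_idx)
    return sum(p * math.log2(p / (pa[key[:k]] * pb[key[k:]]))
               for key, p in pab.items() if p > 0)


def specific_info(joint, src_idx, s):
    """I(S=s; A) = sum_a p(a|s) [log2 p(s|a) - log2 p(s)], S at position 2."""
    ps = marginal(joint, [2])[(s,)]
    pa = marginal(joint, src_idx)
    pas = marginal(joint, src_idx + [2])
    return sum((p / ps) * (math.log2(p / pa[key[:-1]]) - math.log2(ps))
               for key, p in pas.items() if key[-1] == s and p > 0)


def pid(gate):
    """Decompose I(X1,X2;S) for S = gate(X1, X2) with uniform binary inputs."""
    joint = {(a, b, gate(a, b)): 0.25 for a, b in product((0, 1), repeat=2)}
    redundant = sum(
        p * min(specific_info(joint, [0], s), specific_info(joint, [1], s))
        for (s,), p in marginal(joint, [2]).items())
    total = mutual_info(joint, [0, 1], [2])
    unique_1 = mutual_info(joint, [0], [2]) - redundant
    unique_2 = mutual_info(joint, [1], [2]) - redundant
    synergy = total - redundant - unique_1 - unique_2
    return {"total_mi": total, "redundant_info": redundant,
            "unique_1": unique_1, "unique_2": unique_2, "synergy": synergy}


xor_pid = pid(lambda a, b: a ^ b)
and_pid = pid(lambda a, b: a & b)
print("2-input XOR", xor_pid)
print("2-input AND", and_pid)
```

Synergy is obtained as the remainder after subtracting the redundant and unique terms from the total mutual information, which is how the four quantities sum back to total_mi in both tables.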


MIT License

A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.


Having trouble with Infotheory? Want to contribute? Contact Madhavun at madvncv [at]


Thanks to Randall Beer for VectorMatrix.h


  1. Scott, D. W. (1985). Averaged shifted histograms: effective nonparametric density estimators in several dimensions. The Annals of Statistics, 1024-1040.
  4. Williams, P. L., & Beer, R. D. (2010). Nonnegative decomposition of multivariate information. arXiv preprint arXiv:1004.2515.
  5. Timme, N., Alford, W., Flecker, B., & Beggs, J. M. (2014). Synergy, redundancy, and multivariate information measures: an experimentalist’s perspective. Journal of Computational Neuroscience, 36(2), 119-140.