A new histogram-based mutual information estimator using data-driven tree-structured partitions (TSP) is presented in this paper. The derived TSP is a solution to a complexity regularized empirical information maximization, with the objective of finding a good tradeoff between the known estimation and approximation errors. A distribution-free concentration inequality for this tree-structured learning problem as well as finite sample performance bounds for the proposed histogram-based solution is derived. It is shown that this solution is density-free strongly consistent and that it provides, with an arbitrary high probability, an optimal balance between the mentioned estimation and approximation errors. Finally, for the emblematic scenario of independence, $I(X;Y)=0$,it is shown that the TSP estimate converges to zero with $O(e^{-n^{1⁄3}+\log \log n})$.