Mars

Mars is a tensor-based unified framework for large-scale data computation.

Mars tensor

Mars tensor provides a familiar NumPy-like interface.

NumPy:

import numpy as np
a = np.random.rand(1000, 2000)
(a + 1).sum(axis=1)

Mars tensor:

import mars.tensor as mt
a = mt.random.rand(1000, 2000)
(a + 1).sum(axis=1).execute()
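The Mars snippet differs from the NumPy one mainly in the trailing `.execute()`. Mars builds expressions such as `a + 1` and `.sum(axis=1)` lazily and computes them only when they are executed. The toy model below illustrates that deferred-execution idea in plain Python. The classes `LazyArray` and `LazyResult` are hypothetical and do not reflect Mars's internals.

```python
# Toy model of deferred execution. This is NOT Mars's implementation.
# Each operation only wraps a function describing the work; no arithmetic
# runs until execute() is called.


class LazyArray:
    """Hypothetical stand-in for a lazily evaluated 2-D array."""

    def __init__(self, compute):
        # `compute` is a zero-argument function that returns a list of rows.
        self._compute = compute

    @staticmethod
    def from_rows(rows):
        return LazyArray(lambda: [list(r) for r in rows])

    def __add__(self, scalar):
        # Record the addition without performing it yet.
        return LazyArray(lambda: [[x + scalar for x in row]
                                  for row in self._compute()])

    def sum(self, axis):
        if axis != 1:
            raise NotImplementedError("toy model only supports axis=1")
        # Row sums give a 1-D result, represented here as a flat list.
        return LazyResult(lambda: [sum(row) for row in self._compute()])


class LazyResult:
    def __init__(self, compute):
        self._compute = compute

    def execute(self):
        # Only at this point does the recorded chain of work actually run.
        return self._compute()


a = LazyArray.from_rows([[1, 2, 3], [4, 5, 6]])
expr = (a + 1).sum(axis=1)   # builds the expression; computes nothing yet
print(expr.execute())        # [9, 18]
```

Deferring work this way lets a framework inspect the whole expression before running it. This is what makes it possible to split the work up and schedule it across workers.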

Mars DataFrame

Mars DataFrame provides a familiar pandas-like interface.

pandas:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(100000000, 4),
                  columns=list('abcd'))
print(df.sum())

Mars DataFrame:

import mars.tensor as mt
import mars.dataframe as md
df = md.DataFrame(mt.random.rand(100000000, 4),
                  columns=list('abcd'))
print(df.sum().execute())
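At 100,000,000 rows by 4 columns, this table holds about 3.2 GB of float64 values. Distributed frameworks such as Mars handle data at this scale by splitting it into chunks, computing a partial result for each chunk, and combining the partials. The sketch below shows that split, partial, and combine pattern for a column-wise `sum()` on a small list-of-rows table. The helper names (`split_rows`, `partial_column_sums`, `combine`, `chunked_sum`) are hypothetical, and this is not Mars's actual tiling logic.

```python
# Toy sketch of a chunked column-wise sum, assuming the table is split into
# row chunks. It illustrates the pattern only; it is not how Mars tiles or
# schedules a DataFrame.


def split_rows(rows, chunk_size):
    """Yield consecutive row chunks of at most `chunk_size` rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def partial_column_sums(chunk, columns):
    """Per-chunk step: sum each column within a single chunk."""
    sums = dict.fromkeys(columns, 0.0)
    for row in chunk:
        for name, value in zip(columns, row):
            sums[name] += value
    return sums


def combine(partials, columns):
    """Combine step: add the per-chunk partial sums together."""
    total = dict.fromkeys(columns, 0.0)
    for part in partials:
        for name in columns:
            total[name] += part[name]
    return total


def chunked_sum(rows, columns, chunk_size):
    partials = [partial_column_sums(chunk, columns)
                for chunk in split_rows(rows, chunk_size)]
    return combine(partials, columns)


columns = list('abcd')
rows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(chunked_sum(rows, columns, chunk_size=2))
# {'a': 15.0, 'b': 18.0, 'c': 21.0, 'd': 24.0}
```

Column sums are a good fit for this pattern because addition is associative. The per-chunk results can be computed independently, on different workers, and added in any grouping.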

Easy to scale in and scale out

Mars can scale in to a single machine and scale out to a cluster of hundreds of machines. Because the local and distributed versions share the same code, migrating from a single machine to a cluster as your data grows is fairly simple.
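Running the same code locally or on a cluster amounts to keeping the per-chunk task fixed and swapping only the component that runs it. The analogy below does this with the standard library's `concurrent.futures`, using a thread pool as a stand-in for a pool of workers. The names `chunk_sum` and `run` are hypothetical. Mars relies on its own scheduler and workers, not on this API.

```python
# Analogy for "same code, local or distributed", using only the standard
# library. The task function never changes; only the executor does.
# This is not Mars's API.
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(chunk):
    """Per-chunk task: the unit of work that gets scheduled."""
    return sum(chunk)


def run(chunks, executor=None):
    """Sum all chunks, either sequentially or through a supplied executor."""
    if executor is None:
        partials = map(chunk_sum, chunks)           # "scale in": run in-process
    else:
        partials = executor.map(chunk_sum, chunks)  # stand-in for remote workers
    return sum(partials)


data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

local = run(chunks)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = run(chunks, pool)
print(local, parallel)  # 5050 5050
```

Keeping the task separate from the executor is what lets the calling code stay unchanged. In a real cluster, the executor would ship each chunk to another machine instead of another thread.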