Welcome to mixed-naive-bayes’s documentation!

Getting started

Install using pip:

pip install mixed-naive-bayes
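
To check that the installation succeeded, try importing the classifier (a quick sanity check; any standard Python environment should do):

python -c "from mixed_naive_bayes import MixedNB"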

Example 1: Discrete and continuous data

Below is an example of a dataset with discrete data in the first two columns and continuous data in the last two. We assume that the discrete features follow a categorical distribution and that the continuous features follow a Gaussian distribution. Specify categorical_features=[0,1], then fit and predict as usual.

>>> from mixed_naive_bayes import MixedNB
>>> X = [[0, 0, 180.9, 75.0],
...      [1, 1, 165.2, 61.5],
...      [2, 1, 166.3, 60.3],
...      [1, 1, 173.0, 68.2],
...      [0, 2, 178.4, 71.0]]
>>> y = [0, 0, 1, 1, 0]
>>> clf = MixedNB(categorical_features=[0,1])
>>> clf.fit(X,y)
>>> clf.predict(X)
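
To evaluate the fit, the predictions can be compared against the labels with scikit-learn's accuracy_score (scikit-learn is used here only as a convenience and is not required by this library):

>>> from sklearn.metrics import accuracy_score
>>> y_pred = clf.predict(X)
>>> accuracy_score(y, y_pred)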

Note

The module expects the categorical data to be label-encoded. See the next example for how to do this.

Example 2: Discrete and continuous data with label encoding

Below is a similar dataset. However, for this dataset we assume a categorical distribution for the first 3 features and a Gaussian distribution for the last one. The third feature, however, has not been label-encoded. We can use sklearn’s [LabelEncoder()](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) preprocessing module to fix this.

>>> import numpy as np
>>> from sklearn.preprocessing import LabelEncoder
>>> X = [[0, 0, 180, 75],
...      [1, 1, 165, 61],
...      [2, 1, 166, 60],
...      [1, 1, 173, 68],
...      [0, 2, 178, 71]]
>>> y = [0, 0, 1, 1, 0]
>>> X = np.array(X)
>>> y = np.array(y)
>>> label_encoder = LabelEncoder()
>>> X[:,2] = label_encoder.fit_transform(X[:,2])
>>> X
array([[ 0,  0,  4, 75],
       [ 1,  1,  0, 61],
       [ 2,  1,  1, 60],
       [ 1,  1,  2, 68],
       [ 0,  2,  3, 71]])

Then fit and predict as usual, specifying categorical_features=[0,1,2] as the indices of the features that we assume follow a categorical distribution.

>>> from mixed_naive_bayes import MixedNB
>>> clf = MixedNB(categorical_features=[0,1,2])
>>> clf.fit(X,y)
>>> clf.predict(X)
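
As an aside, if several columns need to be label-encoded at once, scikit-learn's OrdinalEncoder (an alternative to LabelEncoder, not part of this library) encodes them in a single call:

>>> from sklearn.preprocessing import OrdinalEncoder
>>> encoder = OrdinalEncoder()
>>> X[:, :3] = encoder.fit_transform(X[:, :3])  # encode the first 3 columns together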

Example 3: Discrete data only

If all columns are to be treated as discrete, specify categorical_features='all'.

>>> from mixed_naive_bayes import MixedNB
>>> X = [[0, 0],
...      [1, 1],
...      [1, 0],
...      [0, 1],
...      [1, 1]]
>>> y = [0, 0, 1, 0, 1]
>>> clf = MixedNB(categorical_features='all')
>>> clf.fit(X,y)
>>> clf.predict(X)
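
If the number of categories per feature is known in advance, it can be supplied through the max_categories parameter (see the API section below); a sketch, assuming it accepts one count per categorical feature as documented:

>>> clf = MixedNB(categorical_features='all', max_categories=[2, 2])
>>> clf.fit(X, y)
>>> clf.predict(X)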

Note

The module expects the categorical data to be label-encoded. See Example 2 above for how to do this.

Example 4: Continuous data only

If all features are assumed to follow a Gaussian distribution, call the constructor with no arguments.

>>> from mixed_naive_bayes import MixedNB
>>> X = [[0, 0],
...      [1, 1],
...      [1, 0],
...      [0, 1],
...      [1, 1]]
>>> y = [0, 0, 1, 0, 1]
>>> clf = MixedNB()
>>> clf.fit(X,y)
>>> clf.predict(X)
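
With no categorical features, every feature is modelled as Gaussian, so the result can be sanity-checked against scikit-learn's GaussianNB (shown here only for comparison):

>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()
>>> gnb.fit(X, y)
>>> gnb.predict(X)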

API

class mixed_naive_bayes.MixedNB(categorical_features=None, max_categories=None, alpha=0.5, priors=None, var_smoothing=1e-09)

Naive Bayes classifier for Categorical and Gaussian models.

Note: When using categorical_features, MixedNB expects that, for each categorical feature, all possible categories are present in the training data X passed to the fit method. This is to ensure numerical stability.

Parameters
  • categorical_features (array-like of shape (num_categorical_features,) or 'all', default=None) – Columns which have categorical distributions.

  • max_categories (array-like, shape (num_categorical_features,), default=None) – The maximum number of categories for each categorical feature. If not specified, they are inferred from the training data.

  • alpha (non-negative float, optional (default=0.5)) – Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing). This applies to features with a categorical distribution.

  • priors (array-like, size (num_classes,), optional (default=None)) – Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

  • var_smoothing (float, optional (default=1e-9)) – Portion of the largest variance of all features that is added to variances for calculation stability.
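
For example, to strengthen the additive smoothing and fix the class priors instead of estimating them from the data (a sketch, assuming the keyword arguments behave as documented above):

>>> from mixed_naive_bayes import MixedNB
>>> clf = MixedNB(categorical_features=[0, 1],
...               alpha=1.0,          # stronger Laplace smoothing for categorical features
...               priors=[0.6, 0.4])  # fixed class priors, not adjusted to the data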

Attributes
  • priors (array, shape (num_classes,)) – Probability of each class.

  • epsilon (float) – Absolute additive value to variances.

  • num_samples (int) – Number of training samples.

  • num_classes (int) – Number of classes (number of labels of y).

  • gaussian_features (array, shape (num_classes,)) – The distribution for every feature and class.

  • categorical_posteriors (array) – The distribution of the categorical features.

  • theta (array) – The mean of the Gaussian features.

  • sigma (array) – The variance of the Gaussian features.

References

https://scikit-learn.org/stable/modules/classes.html#module-sklearn.naive_bayes

Example

>>> from mixed_naive_bayes import MixedNB
>>> X = [[0, 0, 180, 75],
...      [1, 1, 165, 61],
...      [2, 1, 166, 60],
...      [1, 1, 173, 68],
...      [0, 2, 178, 71]]
>>> y = [0, 0, 1, 1, 0]
>>> clf = MixedNB(categorical_features=[0,1])
>>> clf.fit(X,y)
>>> clf.predict(X)
