Quick Start Example

This section provides a quick example of how to use the InsightSolver API client. Before running the example script, please ensure that:

You have completed the steps in the Installation Guide.
You have obtained a valid service key.
You have credits available.
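
The first two prerequisites can be checked locally before any API call is made. The sketch below is only an illustration: `check_prerequisites` is a hypothetical helper, not part of the InsightSolver client, and it assumes the service key is a local JSON file, as the file name used later in this example suggests. Available credits cannot be verified this way.

```python
from importlib.util import find_spec
from pathlib import Path


def check_prerequisites(service_key_path, package="insightsolver"):
    """Hypothetical helper: return a list of problems (empty if none were found)."""
    problems = []
    # The client package must be importable (see the Installation Guide).
    if find_spec(package) is None:
        problems.append(f"Package '{package}' is not installed.")
    # Assumption: the service key is stored as a local file.
    if not Path(service_key_path).is_file():
        problems.append(f"Service key file '{service_key_path}' was not found.")
    return problems


# Report anything that would prevent the example script from running.
for problem in check_prerequisites("name_of_your_service_key.json"):
    print(problem)
```

An empty result only means that the package and the key file were found; it says nothing about whether the key is valid.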

The following example demonstrates the basic usage of the InsightSolver API client, showing how to initialize the solver and generate insights.

# Import Pandas
import pandas as pd

# Import some data: https://www.kaggle.com/competitions/titanic/data
df = pd.read_csv('kaggle_titanic_train.csv')

# Specify the name of the target variable
target_name = 'Survived' # We are interested in whether the passengers survived or not

# Specify the target goal
target_goal = 1 # We are searching rules that describe survivors

# Choose how features should be interpreted
columns_types = {
   'Survived' : 'binary',
   'Pclass'   : 'continuous', # 'multiclass' (i.e. unordered) or 'continuous' (i.e. ordered)
   'Name'     : 'ignore',
   'Sex'      : 'binary',
   'Age'      : 'continuous',
   'SibSp'    : 'continuous', # 'multiclass' (i.e. unordered) or 'continuous' (i.e. ordered)
   'Parch'    : 'continuous', # 'multiclass' (i.e. unordered) or 'continuous' (i.e. ordered)
   'Ticket'   : 'ignore',
   'Fare'     : 'continuous',
   'Cabin'    : 'ignore',
   'Embarked' : 'multiclass',
}

# Import the class InsightSolver from the module insightsolver
from insightsolver import InsightSolver

# Create an instance of the class InsightSolver
solver = InsightSolver(
   df            = df,            # A dataset
   target_name   = target_name,   # Name of the target variable
   target_goal   = target_goal,   # Target goal
   columns_types = columns_types, # Columns types
)

# Specify the service key
service_key = 'name_of_your_service_key.json'

# Fit the solver
solver.fit(
   service_key  = service_key, # Use your API service key here
)

# Print the rule mining results
solver.print(mode='dense')
"""
                                    contribution variable               rule     nans
i p_value coverage gain     cohen_d

0 2e-67   19.1%    +146.73% 84.76
                                           86.2%      Sex             female
                                           13.8%   Pclass             [1, 2]
1 3e-20   12.2%    +105.55% 19.80
                                           81.4%   Pclass             [1, 1]
                                           11.3%     Fare  [7.925, 512.3292]
                                            7.3%      Age        [4.0, 42.0]  exclude
2 7e-12    7.9%    +100.98% 8.37
                                           57.4%    Parch             [1, 6]
                                           31.4%    SibSp             [0, 1]
                                           11.2%      Age       [0.42, 25.0]  exclude
"""

In this specific example, the InsightSolver API returns three rules, each describing a group of passengers with a higher-than-average survival rate:

(i=0) : Women in 1st or 2nd class. This group covers 19.1% of the passengers and has a survival gain of +146.73% compared to the population of the Titanic.
(i=1) : First-class passengers who are neither very young nor old (ages 4 to 42; passengers of unknown age are excluded) and who paid a fare of at least 7.925. This group covers 12.2% of the passengers and has a survival gain of +105.55% compared to the population of the Titanic.
(i=2) : Young passengers (age 25 or under; passengers of unknown age are excluded) travelling with at least one parent or child and at most one sibling or spouse. This group covers 7.9% of the passengers and has a survival gain of +100.98% compared to the population of the Titanic.
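
The coverage and gain figures quoted above can be recomputed by hand with pandas. The sketch below assumes that coverage is the share of rows matched by a rule and that gain is the relative increase of the target rate inside the rule compared to the whole dataset; these definitions are an assumption made for illustration, not taken from the API documentation. `rule_stats` is a hypothetical helper, and the data is a toy table rather than the Titanic dataset.

```python
import pandas as pd


def rule_stats(df, mask, target_name, target_goal):
    """Hypothetical helper: return (coverage, gain) for the rows selected by mask."""
    hits = df[target_name] == target_goal
    coverage = mask.mean()                    # share of rows matched by the rule
    rate_in_rule = hits[mask].mean()          # target rate inside the rule
    rate_overall = hits.mean()                # target rate in the whole dataset
    gain = rate_in_rule / rate_overall - 1.0  # assumed definition of the gain
    return coverage, gain


# Toy data (not the Titanic dataset): 10 passengers, 4 of whom survived.
toy = pd.DataFrame({
    "Sex":      ["female"] * 4 + ["male"] * 6,
    "Pclass":   [1, 2, 3, 1, 1, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
})

# Same shape as rule i=0: Sex is female and Pclass lies in [1, 2] (inclusive).
mask = (toy["Sex"] == "female") & toy["Pclass"].between(1, 2)
coverage, gain = rule_stats(toy, mask, "Survived", 1)
print(f"coverage={coverage:.1%}, gain={gain:+.2%}")
```

Applying the same mask to the real DataFrame is a simple way to cross-check a rule before acting on it.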

Note that there could be a survivorship bias in rules i=1 and i=2, because age is recorded more often for survivors than for non-survivors of the Titanic. We can also set target_goal = 0 to look for rules that describe passengers who did not survive:

# Specify the target goal
target_goal = 0 # We are searching rules that describe non-survivors

# Create an instance of the class InsightSolver
solver = InsightSolver(
   df            = df,            # A dataset
   target_name   = target_name,   # Name of the target variable
   target_goal   = target_goal,   # Target goal
   columns_types = columns_types, # Columns types
)

# Fit the solver
solver.fit(
   service_key = service_key,
)

# Print the rule mining results
solver.print(mode='dense')

"""
                                    contribution variable            rule     nans
i p_value coverage gain    cohen_d

0 7e-55   42.6%    +45.64% 60.04
                                           80.1%      Sex            male
                                           11.1%     Fare    [0.0, 26.25]
                                            8.8%    Parch          [0, 0]
1 2e-14    9.2%    +56.36% 10.32
                                           41.1%     Fare  [7.8875, 14.5]
                                           35.0%      Age    [19.0, 25.0]  include
                                           23.9%   Pclass          [3, 3]
"""

In this specific example, the InsightSolver API returns two rules, each describing a group of passengers with a higher-than-average rate of non-survival:

(i=0) : Men who paid a low fare (at most 26.25) and travelled without parents or children. This group covers 42.6% of the passengers and has a non-survival gain of +45.64% compared to the population of the Titanic.
(i=1) : Young adults in third class (ages 19 to 25, passengers of unknown age included) who paid a low fare (between 7.8875 and 14.5). This group covers 9.2% of the passengers and has a non-survival gain of +56.36% compared to the population of the Titanic.
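
Several of the rules in this example constrain Age and either include or exclude passengers whose age is missing, so it can be worth checking whether missing ages are spread evenly across the target. The sketch below uses plain pandas on a toy table (not the Titanic dataset); `missing_rate_by_target` is a hypothetical helper, not part of the InsightSolver client.

```python
import numpy as np
import pandas as pd


def missing_rate_by_target(df, column, target_name):
    """Hypothetical helper: share of missing values in column per target value."""
    return df[column].isna().groupby(df[target_name]).mean()


# Toy data (not the Titanic dataset): ages are missing more often when Survived is 0.
toy = pd.DataFrame({
    "Age":      [22.0, np.nan, 38.0, np.nan, np.nan, 4.0, 30.0, np.nan],
    "Survived": [1,    0,      1,    0,      0,      1,   0,    1],
})

rates = missing_rate_by_target(toy, "Age", "Survived")
print(rates)
```

A large difference between the two rates suggests that rules depending on whether Age is known may partly reflect how the data was recorded rather than who survived.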

Note that there could be a survivorship bias in rule i=1, because age is recorded less often for non-survivors than for survivors of the Titanic. In conclusion, the rules found with the InsightSolver API suggest that Rose DeWitt Bukater (a young, wealthy woman travelling in 1st class with her family) had a better chance of surviving the Titanic than Jack Dawson (a young man travelling in 3rd class without family). For more technical details about the API, please refer to the detailed documentation.
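
When adapting this example to another dataset, the columns_types dictionary has to be written again. A first draft can be generated from simple heuristics and then corrected by hand. The sketch below is only a starting point: `suggest_columns_types` and its `max_classes` threshold are hypothetical and not part of the InsightSolver client; only the four type labels ('binary', 'continuous', 'multiclass', 'ignore') come from the example above.

```python
import pandas as pd


def suggest_columns_types(df, max_classes=10):
    """Hypothetical helper: propose a columns_types draft from simple heuristics."""
    suggestions = {}
    for name in df.columns:
        column = df[name].dropna()
        n_unique = column.nunique()
        if n_unique == 2:
            suggestions[name] = "binary"
        elif pd.api.types.is_numeric_dtype(column):
            suggestions[name] = "continuous"
        elif n_unique <= max_classes:
            suggestions[name] = "multiclass"
        else:
            suggestions[name] = "ignore"  # e.g. free text such as names
    return suggestions


# Toy data (not the Titanic dataset) with one column of each kind.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "Sex":      ["male", "female", "female", "male"],
    "Age":      [22.0, None, 38.0, 4.0],
    "Embarked": ["S", "C", "Q", "S"],
    "Name":     ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
})

draft = suggest_columns_types(toy, max_classes=3)
print(draft)
```

The draft should always be reviewed: an ordered integer column such as Pclass can reasonably be declared 'continuous' or 'multiclass', and that choice is a modelling decision that no heuristic can make.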