api_utilities module#

  • Organization: InsightSolver Solutions Inc.

  • Project Name: InsightSolver

  • Module Name: insightsolver

  • File Name: api_utilities.py

  • Author: Noé Aubin-Cadot

  • Email: noe.aubin-cadot@insightsolver.com

Description#

This file provides essential utility functions to secure and streamline client-server communication within the API. It includes functions for data compression, encryption, decryption, and transformations of data structures, all designed to facilitate efficient and protected message exchange between the client and server.

Although all communications are secured via HTTPS, this file adds a further layer of encryption: RSA-4096 and ECDSA-SECP521R1 for secure key exchange, and AES-256 for data encryption. These functions are useful in scenarios that require enhanced data privacy and integrity.

Functions provided#

  • hash_string: Computes the hash of a string.

  • convert_bytes_to_base64_string: Convert bytes to a base64 string.

  • convert_base64_string_to_bytes: Convert a base64 string to bytes.

  • compress_string: Compress a string using gzip.

  • decompress_string: Decompress a gzip-compressed string.

  • compress_and_encrypt_string: Compress and encrypt a string for secure transmission.

  • decrypt_and_decompress_string: Decrypt and decompress an encrypted string.

  • encode_obj: Encode an object into a new object compatible with JSON serialization.

  • convert_dict_to_json_string: Convert a dict to a json string.

  • decode_obj: Inverse operation from encode_obj.

  • convert_json_string_to_dict: Convert a json string to a dict.

  • transform_dict: Convert a dictionary for easier client-server communication.

  • untransform_dict: Reverse the dictionary transformation to restore the original data format.

  • generate_keys: Generate RSA and ECDSA private and public keys.

  • compute_credits_from_df: Compute the amount of credits consumed for a given DataFrame.

  • request_cloud_credits_infos: Request information from the server about the available credits.

  • request_cloud_public_keys: Request the server for public keys.

  • request_cloud_computation: Request the server for computation.

  • search_best_ruleset_from_API_dict: Search for the best ruleset by making the API call to the server.

License#

Exclusive Use License - see LICENSE for details.


insightsolver.api_utilities.hash_string(string)#

A function to compute the hash of a string using hashlib.
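
The description above does not say which hashlib algorithm is used. As a minimal sketch only, assuming SHA-256 over the UTF-8 encoding of the string (both are assumptions, and hash_string_sketch is a hypothetical name), such a helper could look like this:

```python
import hashlib

def hash_string_sketch(string: str) -> str:
    """Return the hexadecimal SHA-256 digest of a UTF-8 encoded string."""
    # Assumption: SHA-256 and UTF-8. The real hash_string may use another algorithm.
    return hashlib.sha256(string.encode("utf-8")).hexdigest()

digest = hash_string_sketch("This is a test string")
print(len(digest))  # 64 hexadecimal characters for SHA-256
```

The same input always yields the same digest, which is what makes a hash usable as a fingerprint of the string.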

insightsolver.api_utilities.convert_bytes_to_base64_string(data: bytes) -> str#

Convert a bytes object to a base64-encoded string.

Parameters#

data: bytes

The byte data to encode.

Returns#

str

The base64-encoded string.

insightsolver.api_utilities.convert_base64_string_to_bytes(string: str) -> bytes#

Convert a base64-encoded string to a bytes object.

Parameters#

string: str

The base64-encoded string.

Returns#

bytes

The decoded byte data.
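
The two conversions above are inverses of each other. A minimal sketch of the round trip with the standard base64 module is shown here; the function names are hypothetical stand-ins, not the module's actual implementation:

```python
import base64

def bytes_to_base64_string_sketch(data: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")

def base64_string_to_bytes_sketch(string: str) -> bytes:
    """Decode a base64 string back to the raw bytes."""
    return base64.b64decode(string)

encoded = bytes_to_base64_string_sketch(b"hello")
print(encoded)  # aGVsbG8=
decoded = base64_string_to_bytes_sketch(encoded)
print(decoded)  # b'hello'
```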

insightsolver.api_utilities.compress_string(original_string: str) -> str#

Compress a string using gzip and then encode it to base64.

Parameters#

original_string: str

The original string to be compressed.

Returns#

str

The compressed string.

Example#

original_string = "This is a test string"
compressed_string = compress_string(original_string)
print(compressed_string)  # Example output: 'H4sIAA01/2YC/wvJyCxWAKJEhZLU4hKF4pKizLx0AG3zTmsVAAAA'
insightsolver.api_utilities.decompress_string(compressed_string: str) -> str#

Decompress a base64-encoded string that was previously compressed using gzip.

This function takes a base64-encoded string, decodes it, and then decompresses the resulting data using gzip to return the original string.

Parameters#

compressed_string: str

The base64-encoded string that contains the compressed data.

Returns#

str

The original uncompressed string.

Example#

compressed_string = 'H4sIAA01/2YC/wvJyCxWAKJEhZLU4hKF4pKizLx0AG3zTmsVAAAA'
original_string = decompress_string(compressed_string)
print(original_string) # 'This is a test string'
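
The gzip-then-base64 round trip described for compress_string and decompress_string can be sketched with the standard library alone. The function names below are hypothetical and the UTF-8 encoding is an assumption; this is an illustration, not the module's code:

```python
import base64
import gzip

def compress_sketch(original_string: str) -> str:
    """gzip-compress a string, then base64-encode the compressed bytes."""
    compressed = gzip.compress(original_string.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")

def decompress_sketch(compressed_string: str) -> str:
    """base64-decode a string, then gzip-decompress it back to text."""
    compressed = base64.b64decode(compressed_string)
    return gzip.decompress(compressed).decode("utf-8")

text = "This is a test string"
compressed_text = compress_sketch(text)
round_tripped = decompress_sketch(compressed_text)
print(round_tripped == text)  # True
```

The gzip header can embed a modification time, so the compressed string may differ from one run to the next even for identical input. Only the round trip is stable, which is why the compressed value in the example is labelled as an example output.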
insightsolver.api_utilities.compress_and_encrypt_string(original_string: str, symmetric_key: bytes) -> tuple[str, str]#

Compress and encrypt a string using AES-256-GCM.

This function compresses the given string using gzip and then encrypts it using AES-256 in GCM mode. A nonce is used in the encryption process for AES-GCM, and the result is base64-encoded for easy transfer over networks.

Security:

  • AES-256 encryption

  • GCM (Galois/Counter Mode) with authentication

Parameters#

original_string: str

The original string to be compressed and encrypted.

symmetric_key: bytes

The 32-byte symmetric key used for encryption.

Returns#

tuple[str, str]

A tuple containing the base64-encoded encrypted compressed string and the base64-encoded nonce used.

Example#

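from secrets import token_bytes

symmetric_key = token_bytes(32) # 32-byte key for AES-256; keep it, it is needed for decryption
transformed_string, nonce_string = compress_and_encrypt_string(
        original_string = "Secret data",
        symmetric_key   = symmetric_key,
)
print(transformed_string, nonce_string) # base64-encoded ciphertext, base64-encoded nonce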
insightsolver.api_utilities.decrypt_and_decompress_string(transformed_string: str, symmetric_key: bytes, nonce: bytes) -> str#

Decrypt and decompress a string using AES-256-GCM.

This function takes a base64-encoded encrypted string, decrypts it using AES-256 in GCM mode with the provided symmetric key and nonce, and then decompresses the result using gzip.

Security:

  • AES-256 encryption

  • GCM (Galois/Counter Mode) with authentication

Parameters#

transformed_string: str

The base64-encoded string that contains the encrypted and compressed data.

symmetric_key: bytes

The 32-byte symmetric key used for decryption.

nonce: bytes

The nonce used for AES-GCM during encryption.

Returns#

str

The original uncompressed and decrypted string.

Raises#

Exception

If the decryption fails.

Example#

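original_string = decrypt_and_decompress_string(
        transformed_string = encrypted_compressed_string,
        symmetric_key      = symmetric_key, # must be the same 32-byte key that was used for encryption
        nonce              = nonce,         # the nonce used during encryption, as bytes
)
print(original_string) # 'Secret data'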
insightsolver.api_utilities.encode_obj(obj)#

Encode an object into a new object that is compatible with JSON serialization.

insightsolver.api_utilities.convert_dict_to_json_string(d: dict) -> str#

Convert a dict to a JSON string.

insightsolver.api_utilities.decode_obj(obj)#

Decode an object produced by encode_obj; this is the inverse operation.

insightsolver.api_utilities.convert_json_string_to_dict(string: str) -> dict#

Convert a JSON string to a dict.
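
The source does not list which object types encode_obj handles or how it represents them. Purely as an illustration of the encode/decode idea, the sketch below makes bytes values JSON-compatible by wrapping them in a tagged dict. The "__bytes__" tag and the function names are hypothetical and are not the module's actual scheme:

```python
import base64
import json

def encode_obj_sketch(obj):
    """Recursively replace bytes values with a JSON-compatible tagged dict."""
    if isinstance(obj, bytes):
        # Hypothetical tag; the real encode_obj may use a different representation.
        return {"__bytes__": base64.b64encode(obj).decode("ascii")}
    if isinstance(obj, dict):
        return {key: encode_obj_sketch(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [encode_obj_sketch(item) for item in obj]
    return obj

def decode_obj_sketch(obj):
    """Inverse of encode_obj_sketch: restore bytes from the tagged dicts."""
    if isinstance(obj, dict):
        if set(obj) == {"__bytes__"}:
            return base64.b64decode(obj["__bytes__"])
        return {key: decode_obj_sketch(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [decode_obj_sketch(item) for item in obj]
    return obj

original = {"A": 1, "payload": b"\x00\x01"}
json_string = json.dumps(encode_obj_sketch(original))
restored = decode_obj_sketch(json.loads(json_string))
print(restored == original)  # True
```

In this sketch tuples come back as lists, because JSON has no tuple type; a real implementation would need an extra tag to preserve them.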

insightsolver.api_utilities.transform_dict(d_original: dict, do_compress_data: bool = False, symmetric_key: bytes | None = None, json_format: str = 'json') -> dict#

Transform the contents of a dictionary by optionally compressing and encrypting its data.

This function takes a dictionary and converts it to a string. Depending on the options provided, it can compress the data using gzip, encrypt it using AES-256, or both. The resulting string is returned in a transformed dictionary format for easier transmission or storage.

Parameters#

d_original: dict

The original dictionary that needs to be transformed.

do_compress_data: bool, optional

Whether or not to compress the dictionary data (default is False).

symmetric_key: bytes, optional

A symmetric key, typically generated with from secrets import token_bytes; symmetric_key = token_bytes(32). If provided, the data will be encrypted (default is None).

json_format: str, optional

The format used to convert the dictionary to a string. Can be ‘json’ or ‘json_extended’ (default is ‘json’).

Returns#

dict

A dictionary containing the transformed string, the transformations applied, and the json format.

Example#

d_original = {'A':1, 'B':2, 'C':3}
from secrets import token_bytes
symmetric_key = token_bytes(32) # b'\x1a\xef&\x0bR\xe1\x95\xfa\x90\x10r\x93\x1a\xaeN\xc2\xba\x80\xf1\x1a\x0fG\xf4(\x0e#\xd4\xaf`\x81q\xf4'
d_transformed = transform_dict(
        d_original       = d_original,
        do_compress_data = True,
        symmetric_key    = symmetric_key,
        json_format      = 'json',
)
print(d_transformed)
# {
#       'transformations': 'encrypted_gzip_base64',
#       'json_format': 'json',
#       'transformed_string': 'q30qPkK19Z3sENnfk77t4CnpzWKV+gdHLLSpNNgU3DjdmEbLcZWj+AjZyFmUquuUmh6obZmTh8k=',
#       'nonce_string': '7PpTvoc0Ksx8whRy',
# }
insightsolver.api_utilities.untransform_dict(d_transformed: dict, symmetric_key: bytes | None = None, verbose: bool = False) -> dict#

Decompress and decrypt the contents of a transformed dictionary.

This function takes a dictionary that has been transformed (e.g., compressed, encrypted), and restores its original contents by reversing the transformations. Depending on the transformation type, it may decrypt and/or decompress the data.

Parameters#

d_transformed: dict

The transformed dictionary containing the compressed/encrypted string, the transformations applied, and the json format used.

symmetric_key: bytes, optional

A symmetric key, typically generated with from secrets import token_bytes; symmetric_key = token_bytes(32). If provided, the data will be decrypted using this key (default is None).

verbose: bool, optional

If True, additional debug information will be printed (default is False).

Returns#

dict

The original dictionary with its content restored.

Raises#

Exception

If an invalid transformation type or JSON format is provided.

Example#

d_transformed = {
        'transformations'    : 'encrypted_gzip_base64',
        'json_format'        : 'json',
        'transformed_string' : 'q30qPkK19Z3sENnfk77t4CnpzWKV+gdHLLSpNNgU3DjdmEbLcZWj+AjZyFmUquuUmh6obZmTh8k=',
        'nonce_string'       : '7PpTvoc0Ksx8whRy',
}
d_untransformed = untransform_dict(
        d_transformed = d_transformed,
        symmetric_key = symmetric_key, # b'\x1a\xef&\x0bR\xe1\x95\xfa\x90\x10r\x93\x1a\xaeN\xc2\xba\x80\xf1\x1a\x0fG\xf4(\x0e#\xd4\xaf`\x81q\xf4'
)
print(d_untransformed) # {'A': 1, 'B': 2, 'C': 3}
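
To make the shape of the transformed dictionary concrete, here is a standard-library sketch of the compression-only path, with no encryption. The keys 'transformations', 'json_format' and 'transformed_string' come from the examples above. The labels 'none' and 'gzip_base64' and the function names are hypothetical, since the source only shows 'encrypted_gzip_base64':

```python
import base64
import gzip
import json

def transform_dict_sketch(d_original: dict, do_compress_data: bool = False) -> dict:
    """Serialize a dict to JSON and optionally gzip + base64 it (no encryption)."""
    string = json.dumps(d_original)
    transformations = "none"  # hypothetical label for "no transformation"
    if do_compress_data:
        compressed = gzip.compress(string.encode("utf-8"))
        string = base64.b64encode(compressed).decode("ascii")
        transformations = "gzip_base64"  # hypothetical label
    return {
        "transformations": transformations,
        "json_format": "json",
        "transformed_string": string,
    }

def untransform_dict_sketch(d_transformed: dict) -> dict:
    """Reverse transform_dict_sketch, rejecting unknown transformation labels."""
    string = d_transformed["transformed_string"]
    transformations = d_transformed["transformations"]
    if transformations == "gzip_base64":
        string = gzip.decompress(base64.b64decode(string)).decode("utf-8")
    elif transformations != "none":
        raise ValueError(f"Unknown transformation: {transformations!r}")
    return json.loads(string)

d_original = {"A": 1, "B": 2, "C": 3}
d_transformed = transform_dict_sketch(d_original, do_compress_data=True)
print(untransform_dict_sketch(d_transformed) == d_original)  # True
```

Recording the applied transformations inside the dictionary lets the receiver reverse them without any out-of-band agreement, and lets it fail loudly on a label it does not recognize.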
insightsolver.api_utilities.generate_keys()#

This function generates RSA and ECDSA private and public keys. The generated keys:

  • rsa_private_key

  • ecdsa_private_key

  • rsa_public_key_pem_bytes

  • ecdsa_public_key_pem_bytes

Returns#

tuple

A tuple containing four elements:

  • rsa_private_key: The generated RSA private key.

  • ecdsa_private_key: The generated ECDSA private key.

  • rsa_public_key_pem_bytes: The RSA public key serialized in PEM format.

  • ecdsa_public_key_pem_bytes: The ECDSA public key serialized in PEM format.

insightsolver.api_utilities.generate_url_headers(computing_source: str, input_file_service_key: str | None = None) -> Tuple[str, Dict[str, Any] | None]#

This function generates the URL and the headers for the POST request.

Parameters#

computing_source: str

Where the server is.

input_file_service_key: str, optional

The client’s service key, needed if the server is remote. Default is None.

insightsolver.api_utilities.compute_credits_from_df(df: DataFrame, columns_names_to_btypes: dict = {}) -> int#

This function computes the number of credits consumed by a rule mining run via the API. This number is based on the size of the DataFrame sent to the API.

Remark: The number of credits debited is m*n, where:

  • m is the number of rows of df (excluding the header).

  • n is the number of features to explore (i.e. the number of columns, less the index, less the target variable, less the ignored features).

Parameters#

df: pd.DataFrame

Input DataFrame whose size is used to compute credits.

columns_names_to_btypes: dict

The dict that specifies how to handle the variables.

Returns#

int

The computed number of credits consumed.
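
The m*n rule from the remark above can be illustrated without a DataFrame. The sketch below takes the row count and column names directly; the function name, its arguments, and the way ignored features are passed in are all hypothetical, since the source does not describe how columns_names_to_btypes marks a feature as ignored:

```python
def compute_credits_sketch(n_rows, column_names, target_name, ignored_names=()):
    """Return m*n: rows times the features left after removing target and ignored columns."""
    # Assumption: column_names does not include the index column.
    excluded = {target_name, *ignored_names}
    n_features = sum(1 for name in column_names if name not in excluded)
    return n_rows * n_features

credits = compute_credits_sketch(
    n_rows=1000,
    column_names=["age", "income", "city", "churn"],
    target_name="churn",
    ignored_names=["city"],
)
print(credits)  # 2000, i.e. 1000 rows * 2 explored features
```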

insightsolver.api_utilities.request_cloud_credits_infos(computing_source: str, d_out_credits_infos: dict, input_file_service_key: str | None = None, user_email: str | None = None, timeout: int = 60) -> dict#

Send the server a dict that specifies which information about the credits is requested.

Parameters#

computing_source: str

Where the server is.

d_out_credits_infos: dict

A dictionary specifying which information about the credits is requested. The dictionary format is:

  • private_key_id: private_key_id of the service_key.

  • user_email: Email of the user.

  • do_compute_credits_available: A boolean that specifies whether the number of credits available is requested.

  • do_compute_df_credits_infos: A boolean that specifies whether a DataFrame containing all credit transactions is requested.

input_file_service_key: str, optional

The client’s service key, needed if the server is remote. Default is None.

timeout: int, optional

The timeout duration for the request, in seconds. Default is 60 seconds, as this operation is typically fast and does not involve computation.
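
As an illustration of the d_out_credits_infos format listed above, the request dictionary could be assembled as follows. The four keys are the ones documented here; every value is a placeholder chosen for the example, and no request is sent:

```python
# Placeholder values only; replace them with the real service key id and email.
d_out_credits_infos = {
    "private_key_id": "<private_key_id of the service key>",
    "user_email": "user@example.com",
    "do_compute_credits_available": True,   # ask for the number of credits available
    "do_compute_df_credits_infos": False,   # skip the DataFrame of credit transactions
}

print(sorted(d_out_credits_infos))
```

This dictionary would then be passed as the d_out_credits_infos argument, together with computing_source and, for a remote server, input_file_service_key.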

insightsolver.api_utilities.request_cloud_public_keys(computing_source: str, d_client_public_keys: dict, input_file_service_key: str | None = None, timeout: int = 60) -> dict#

Send the client’s public keys to the server and receive the server’s public keys in response.

This function establishes a secure connection to the specified server (computing_source) and sends the client’s public keys (d_client_public_keys). The server responds with its own set of public keys, which are returned in a dictionary format.

Parameters#

computing_source: str

Where the server is.

d_client_public_keys: dict

A dictionary containing the client’s public keys to be sent to the server. The dictionary format is:

  • alice_rsa_public_key_pem_base64: Client’s RSA public key, encoded in base64.

  • alice_ecdsa_public_key_pem_base64: Client’s ECDSA public key, encoded in base64.

input_file_service_key: str, optional

The client’s service key, needed if the server is remote. Default is None.

timeout: int, optional

The timeout duration for the request, in seconds. Default is 60 seconds, as this operation is typically fast and does not involve computation.

Returns#

dict

A dictionary containing the server’s public keys and a unique session identifier. The dictionary format is as follows:

  • session_id: A unique identifier for the session.

  • bob_rsa_public_key_pem_base64: Server’s RSA public key, encoded in base64.

  • bob_ecdsa_public_key_pem_base64: Server’s ECDSA public key, encoded in base64.

Example#

# Client's public keys
d_client_public_keys = {
        'alice_rsa_public_key_pem_base64': '<base64-encoded RSA public key>',
        'alice_ecdsa_public_key_pem_base64': '<base64-encoded ECDSA public key>',
}

# Request server public keys
d_server_public_keys = request_cloud_public_keys(
        computing_source='https://server-address.com',
        d_client_public_keys=d_client_public_keys,
        input_file_service_key='client_service_key'
)

# Access the session ID and server's public keys
session_id = d_server_public_keys['session_id']
bob_rsa_public_key = d_server_public_keys['bob_rsa_public_key_pem_base64']
bob_ecdsa_public_key = d_server_public_keys['bob_ecdsa_public_key_pem_base64']

Raises#

Exception

If the request fails or the server does not return the expected keys.

insightsolver.api_utilities.request_cloud_computation(computing_source: str, d_out_transformed: dict, input_file_service_key: str | None = None, timeout: int = 600, verbose: bool = False) -> dict#

Send the transformed dict to the server, which performs the rule mining.

Parameters#

computing_source: str

The computing source.

d_out_transformed: dict

The transformed dict to send to the server.

input_file_service_key: str, optional

The client’s service key, needed if the server is remote. Default is None.

timeout: int, optional

Timeout for the request, in seconds. Default is 600 seconds, as computation may take longer.

Returns#

dict

The dict that contains the rule mining results.

insightsolver.api_utilities.search_best_ruleset_from_API_dict(d_out_original: dict, input_file_service_key: str | None = None, user_email: str | None = None, computing_source: str = 'remote_cloud_function', do_compress_data: bool = True, do_compute_memory_usage: bool = True, verbose: bool = False) -> dict#

Search for the best ruleset, with the computation performed on the server.

Parameters#

d_out_original: dict

The original dict, pre-transformation, that contains the necessary data for the server to do rule mining.

input_file_service_key: str, optional

The service key of the client.

user_email: str, optional

Email of the user (only for use inside a Google Cloud Run container).

computing_source: str, optional

The computing source.

do_compress_data: bool, optional

Whether to compress the data to reduce transmission size.

do_compute_memory_usage: bool, optional

Whether to compute the memory usage.

verbose: bool, optional

Verbosity.

Returns#

dict

The dict that contains the output of the rule mining from the server.