api_utilities module#
Organization: InsightSolver Solutions Inc.
Project Name: InsightSolver
Module Name: insightsolver
File Name: api_utilities.py
Author: Noé Aubin-Cadot
Description#
This file provides essential utility functions to secure and streamline client-server communication within the API. It includes functions for data compression, encryption, decryption, and transformations of data structures, all designed to facilitate efficient and protected message exchange between the client and server.
All communication is already secured via HTTPS; this file adds a further layer of encryption, using RSA-4096 and ECDSA-SECP521R1 for secure key exchange and AES-256 for data encryption. These functions are particularly useful in scenarios that require enhanced data privacy and integrity.
Functions provided#
- hash_string: Compute the hash of a string.
- convert_bytes_to_base64_string: Convert bytes to a base64 string.
- convert_base64_string_to_bytes: Convert a base64 string to bytes.
- compress_string: Compress a string using gzip.
- decompress_string: Decompress a gzip-compressed string.
- compress_and_encrypt_string: Compress and encrypt a string for secure transmission.
- decrypt_and_decompress_string: Decrypt and decompress an encrypted string.
- encode_obj: Encode an object into a new object compatible with JSON serialization.
- convert_dict_to_json_string: Convert a dict to a JSON string.
- decode_obj: Inverse operation of encode_obj.
- convert_json_string_to_dict: Convert a JSON string to a dict.
- transform_dict: Convert a dictionary for easier client-server communication.
- untransform_dict: Reverse the dictionary transformation to restore the original data format.
- generate_keys: Generate RSA and ECDSA private and public keys.
- compute_credits_from_df: Compute the number of credits consumed for a given DataFrame.
- request_cloud_credits_infos: Request information from the server about the credits available.
- request_cloud_public_keys: Request the server's public keys.
- request_cloud_computation: Request a computation from the server.
- search_best_ruleset_from_API_dict: Make the API call.
License#
Exclusive Use License - see LICENSE for details.
- insightsolver.api_utilities.hash_string(string)#
A function to compute the hash of a string using hashlib.
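The entry above does not say which hashlib algorithm is used. As a minimal sketch, assuming SHA-256 over the UTF-8 encoding of the string (both are assumptions, and `sha256_hex` is a hypothetical name, not part of the module), hashing a string with hashlib could look like this:

```python
import hashlib


def sha256_hex(string: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoding of `string`.

    SHA-256 and UTF-8 are assumptions; hash_string may use another algorithm.
    """
    return hashlib.sha256(string.encode("utf-8")).hexdigest()


digest = sha256_hex("This is a test string")
print(len(digest))  # 64 hexadecimal characters for a SHA-256 digest
```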
- insightsolver.api_utilities.convert_bytes_to_base64_string(data: bytes) -> str#
Convert a bytes object to a base64-encoded string.
Parameters#
- data: bytes
The byte data to encode.
Returns#
- str
The base64-encoded string.
- insightsolver.api_utilities.convert_base64_string_to_bytes(string: str) -> bytes#
Convert a base64-encoded string to a bytes object.
Parameters#
- string: str
The base64-encoded string.
Returns#
- bytes
The decoded byte data.
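The two conversions above are inverses of each other. A minimal sketch of that round trip with the standard base64 module, using the hypothetical helper names `bytes_to_b64` and `b64_to_bytes` (the module's own implementation may differ):

```python
import base64


def bytes_to_b64(data: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def b64_to_bytes(string: str) -> bytes:
    """Decode a base64 string back to the raw bytes."""
    return base64.b64decode(string)


encoded = bytes_to_b64(b"hello")
print(encoded)                # aGVsbG8=
print(b64_to_bytes(encoded))  # b'hello'
```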
- insightsolver.api_utilities.compress_string(original_string: str) -> str#
Compress a string using gzip and then encode it to base64.
Parameters#
- original_string: str
The original string to be compressed.
Returns#
- str
The compressed string.
Example#
    original_string = "This is a test string"
    compressed_string = compress_string(original_string)
    print(compressed_string)
    # Example output: 'H4sIAA01/2YC/wvJyCxWAKJEhZLU4hKF4pKizLx0AG3zTmsVAAAA'
- insightsolver.api_utilities.decompress_string(compressed_string: str) -> str#
Decompress a base64-encoded string that was previously compressed using gzip.
This function takes a base64-encoded string, decodes it, and then decompresses the resulting data using gzip to return the original string.
Parameters#
- compressed_string: str
The base64-encoded string that contains the compressed data.
Returns#
- str
The original uncompressed string.
Example#
    compressed_string = 'H4sIAA01/2YC/wvJyCxWAKJEhZLU4hKF4pKizLx0AG3zTmsVAAAA'
    original_string = decompress_string(compressed_string)
    print(original_string)
    # 'This is a test string'
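The gzip-then-base64 pipeline described for compress_string and decompress_string can be sketched with the standard library alone. The helper names `gzip_b64` and `gunzip_b64` are hypothetical, and UTF-8 is assumed as the text encoding:

```python
import base64
import gzip


def gzip_b64(original_string: str) -> str:
    """Gzip the UTF-8 bytes of a string, then base64-encode the result."""
    compressed = gzip.compress(original_string.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def gunzip_b64(compressed_string: str) -> str:
    """Reverse gzip_b64: base64-decode, then gunzip back to text."""
    compressed = base64.b64decode(compressed_string)
    return gzip.decompress(compressed).decode("utf-8")


compressed_string = gzip_b64("This is a test string")
print(compressed_string[:4])          # H4sI (the gzip magic bytes in base64)
print(gunzip_b64(compressed_string))  # This is a test string
```

By default the gzip header embeds a timestamp, so the compressed string generally differs from one run to the next even for identical input, which is why the outputs above are labelled as examples.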
- insightsolver.api_utilities.compress_and_encrypt_string(original_string: str, symmetric_key: bytes) -> tuple[str, str]#
Compress and encrypt a string using AES-256-GCM.
This function compresses the given string using gzip and then encrypts it using AES-256 in GCM mode. A nonce is used in the encryption process for AES-GCM, and the result is base64-encoded for easy transfer over networks.
Security:
  - AES-256 encryption
  - GCM (Galois/Counter Mode) with authentication
Parameters#
- original_string: str
The original string to be compressed and encrypted.
- symmetric_key: bytes
The 32-byte symmetric key used for encryption.
Returns#
- tuple[str, str]
A tuple containing the base64-encoded encrypted compressed string and the base64-encoded nonce used.
Example#
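    from secrets import token_bytes

    symmetric_key = token_bytes(32)  # Keep this key: it is needed again for decryption
    transformed_string, nonce_string = compress_and_encrypt_string(
        original_string = "Secret data",
        symmetric_key   = symmetric_key,
    )
    print(transformed_string, nonce_string)
    # Prints the base64-encoded ciphertext followed by the base64-encoded nonce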
- insightsolver.api_utilities.decrypt_and_decompress_string(transformed_string: str, symmetric_key: bytes, nonce: bytes) -> str#
Decrypt and decompress a string using AES-256-GCM.
This function takes a base64-encoded encrypted string, decrypts it using AES-256 in GCM mode with the provided symmetric key and nonce, and then decompresses the result using gzip.
Security:
  - AES-256 encryption
  - GCM (Galois/Counter Mode) with authentication
Parameters#
- transformed_string: str
The base64-encoded string that contains the encrypted and compressed data.
- symmetric_key: bytes
The 32-byte symmetric key used for decryption.
- nonce: bytes
The nonce used for AES-GCM during encryption.
Returns#
- str
The original uncompressed and decrypted string.
Raises#
- Exception
If the decryption fails.
Example#
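    # symmetric_key must be the same key that was used for encryption
    # (a freshly generated key would make decryption fail), and nonce
    # must be the nonce produced at encryption time, as bytes.
    original_string = decrypt_and_decompress_string(
        transformed_string = encrypted_compressed_string,
        symmetric_key      = symmetric_key,
        nonce              = nonce,
    )
    print(original_string)
    # 'Secret data'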
- insightsolver.api_utilities.encode_obj(obj)#
This function takes an object and encodes it into a new object compatible with JSON serialization.
- insightsolver.api_utilities.convert_dict_to_json_string(d: dict) -> str#
This function converts a dict to a json string.
- insightsolver.api_utilities.decode_obj(obj)#
This function does the inverse operation of the function encode_obj.
- insightsolver.api_utilities.convert_json_string_to_dict(string: str) -> dict#
This function takes a json string and converts it to a dict.
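The entries above do not list which types encode_obj handles. As an illustration of the general idea only, the sketch below tags two types that JSON cannot represent natively (bytes and set) and restores them on the way back. The tag names `__bytes__` and `__set__` and the `_sketch` function names are hypothetical, not the module's actual encoding:

```python
import base64
import json


def encode_obj_sketch(obj):
    """Recursively replace non-JSON types with tagged, JSON-friendly dicts."""
    if isinstance(obj, bytes):
        return {"__bytes__": base64.b64encode(obj).decode("ascii")}
    if isinstance(obj, (set, frozenset)):
        return {"__set__": [encode_obj_sketch(x) for x in obj]}
    if isinstance(obj, dict):
        return {k: encode_obj_sketch(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [encode_obj_sketch(x) for x in obj]
    return obj


def decode_obj_sketch(obj):
    """Inverse of encode_obj_sketch: turn tagged dicts back into Python types."""
    if isinstance(obj, dict):
        if set(obj) == {"__bytes__"}:
            return base64.b64decode(obj["__bytes__"])
        if set(obj) == {"__set__"}:
            return {decode_obj_sketch(x) for x in obj["__set__"]}
        return {k: decode_obj_sketch(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode_obj_sketch(x) for x in obj]
    return obj


original = {"key": b"\x00\x01", "tags": {"a", "b"}, "n": 3}
json_string = json.dumps(encode_obj_sketch(original))
restored = decode_obj_sketch(json.loads(json_string))
print(restored == original)  # True
```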
- insightsolver.api_utilities.transform_dict(d_original: dict, do_compress_data: bool = False, symmetric_key: bytes | None = None, json_format: str = 'json') -> dict#
Transform the contents of a dictionary by optionally compressing and encrypting its data.
This function takes a dictionary and converts it to a string. Depending on the options provided, it can compress the data using gzip, encrypt it using AES-256, or both. The resulting string is returned in a transformed dictionary format for easier transmission or storage.
Parameters#
- d_original: dict
The original dictionary that needs to be transformed.
- do_compress_data: bool, optional
Whether or not to compress the dictionary data (default is False).
- symmetric_key: bytes, optional
A symmetric key, typically generated using from secrets import token_bytes; symmetric_key = token_bytes(32). If provided, the data will be encrypted (default is None).
- json_format: str, optional
The format to convert the dictionary to a string. Can be ‘json’ or ‘json_extended’ (default is ‘json’).
Returns#
- dict
A dictionary containing the transformed string, the transformations applied, and the json format.
Example#
    d_original = {'A':1, 'B':2, 'C':3}
    from secrets import token_bytes
    symmetric_key = token_bytes(32) # b'\x1a\xef&\x0bR\xe1\x95\xfa\x90\x10r\x93\x1a\xaeN\xc2\xba\x80\xf1\x1a\x0fG\xf4(\x0e#\xd4\xaf`\x81q\xf4'
    d_transformed = transform_dict(
        d_original       = d_original,
        do_compress_data = True,
        symmetric_key    = symmetric_key,
        json_format      = 'json',
    )
    print(d_transformed)
    # {
    #     'transformations': 'encrypted_gzip_base64',
    #     'json_format': 'json',
    #     'transformed_string': 'q30qPkK19Z3sENnfk77t4CnpzWKV+gdHLLSpNNgU3DjdmEbLcZWj+AjZyFmUquuUmh6obZmTh8k=',
    #     'nonce_string': '7PpTvoc0Ksx8whRy',
    # }
- insightsolver.api_utilities.untransform_dict(d_transformed: dict, symmetric_key: bytes | None = None, verbose: bool = False) -> dict#
Decompress and decrypt the contents of a transformed dictionary.
This function takes a dictionary that has been transformed (e.g., compressed, encrypted), and restores its original contents by reversing the transformations. Depending on the transformation type, it may decrypt and/or decompress the data.
Parameters#
- d_transformed: dict
The transformed dictionary containing the compressed/encrypted string, the transformations applied, and the json format used.
- symmetric_key: bytes, optional
A symmetric key, typically generated using from secrets import token_bytes; symmetric_key = token_bytes(32). If provided, the data will be decrypted using this key (default is None).
- verbose: bool, optional
If True, additional debug information will be printed (default is False).
Returns#
- dict
The original dictionary with its content restored.
Raises#
- Exception
If an invalid transformation type or JSON format is provided.
Example#
    d_transformed = {
        'transformations'    : 'encrypted_gzip_base64',
        'json_format'        : 'json',
        'transformed_string' : 'q30qPkK19Z3sENnfk77t4CnpzWKV+gdHLLSpNNgU3DjdmEbLcZWj+AjZyFmUquuUmh6obZmTh8k=',
        'nonce_string'       : '7PpTvoc0Ksx8whRy',
    }
    d_untransformed = untransform_dict(
        d_transformed = d_transformed,
        symmetric_key = symmetric_key, # b'\x1a\xef&\x0bR\xe1\x95\xfa\x90\x10r\x93\x1a\xaeN\xc2\xba\x80\xf1\x1a\x0fG\xf4(\x0e#\xd4\xaf`\x81q\xf4'
    )
    print(d_untransformed)
    # {'A': 1, 'B': 2, 'C': 3}
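To make the shape of the transformed dictionary concrete, here is a minimal sketch of the compression-only path (no encryption), written with the standard library. The label 'gzip_base64' is an assumption modelled on the documented 'encrypted_gzip_base64' value, and the `_sketch` function names are hypothetical:

```python
import base64
import gzip
import json


def transform_sketch(d_original: dict) -> dict:
    """Serialize a dict to JSON, gzip it, and wrap it in a transport dict."""
    json_string = json.dumps(d_original)
    compressed = gzip.compress(json_string.encode("utf-8"))
    return {
        "transformations": "gzip_base64",  # assumed label, not confirmed by the docs
        "json_format": "json",
        "transformed_string": base64.b64encode(compressed).decode("ascii"),
    }


def untransform_sketch(d_transformed: dict) -> dict:
    """Reverse transform_sketch, rejecting transformation types it does not know."""
    if d_transformed["transformations"] != "gzip_base64":
        raise ValueError("Unsupported transformation type")
    compressed = base64.b64decode(d_transformed["transformed_string"])
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


d_sketch = transform_sketch({"A": 1, "B": 2, "C": 3})
print(untransform_sketch(d_sketch))  # {'A': 1, 'B': 2, 'C': 3}
```

Carrying the transformation label inside the dictionary lets the receiving side choose the matching reverse steps without any out-of-band agreement.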
- insightsolver.api_utilities.generate_keys()#
This function generates RSA and ECDSA private and public keys. The generated keys are rsa_private_key, ecdsa_private_key, rsa_public_key_pem_bytes, and ecdsa_public_key_pem_bytes.
Returns#
- tuple
A tuple containing four elements:
rsa_private_key: The generated RSA private key.
ecdsa_private_key: The generated ECDSA private key.
rsa_public_key_pem_bytes: The RSA public key serialized in PEM format.
ecdsa_public_key_pem_bytes: The ECDSA public key serialized in PEM format.
- insightsolver.api_utilities.generate_url_headers(computing_source: str, input_file_service_key: str | None = None) -> Tuple[str, Dict[str, Any] | None]#
This function generates the URL and the headers for the POST request.
Parameters#
- computing_source: str
Where the server is.
- input_file_service_key: str, optional
The client’s service key, needed if the server is remote. Default is None.
- insightsolver.api_utilities.compute_credits_from_df(df: DataFrame, columns_names_to_btypes: dict = {}) -> int#
This function computes the number of credits consumed by a rule mining run via the API. This number is based on the size of the DataFrame sent to the API.
Remark: The number of credits debited is m*n, where:
  - m is the number of rows of df (excluding the header).
  - n is the number of features to explore (i.e. the number of columns minus the index, the target variable, and the ignored features).
Parameters#
- df: pd.DataFrame
Input DataFrame whose size is used to compute credits.
- columns_names_to_btypes: dict
The dict that specifies how to handle the variables.
Returns#
- int
The computed number of credits consumed.
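The m*n rule in the remark can be illustrated without pandas. In this sketch the function name, the column names, and the way excluded columns are passed in are all hypothetical; how the module derives the excluded columns from columns_names_to_btypes is not documented here:

```python
from typing import Iterable, Set


def compute_credits_sketch(n_rows: int, column_names: Iterable[str], excluded: Set[str]) -> int:
    """Return m*n: rows times the columns that remain after exclusions.

    `excluded` stands for the index, the target variable and any ignored
    features; passing it in explicitly is an assumption made for this sketch.
    """
    n_features = sum(1 for name in column_names if name not in excluded)
    return n_rows * n_features


columns = ["id", "age", "income", "city", "churned"]
excluded = {"id", "churned", "city"}  # index, target, one ignored feature
print(compute_credits_sketch(1000, columns, excluded))  # 2000
```

With 1000 rows and 2 remaining features (age and income), the sketch yields 1000 * 2 = 2000 credits.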
- insightsolver.api_utilities.request_cloud_credits_infos(computing_source: str, d_out_credits_infos: dict, input_file_service_key: str | None = None, user_email: str | None = None, timeout: int = 60) -> dict#
Send a dict that specifies which information about the credits is requested.
Parameters#
- computing_source: str
Where the server is.
- d_out_credits_infos: dict
A dictionary specifying which information about the credits is requested. The dictionary format is:
  - private_key_id: private_key_id of the service_key.
  - user_email: Email of the user.
  - do_compute_credits_available: A boolean that specifies whether the number of credits available is requested.
  - do_compute_df_credits_infos: A boolean that specifies whether a DataFrame containing all credit transactions is requested.
- input_file_service_key: str, optional
The client’s service key, needed if the server is remote. Default is None.
- timeout: int, optional
The timeout duration for the request, in seconds. Default is 60 seconds, as this operation is typically fast and does not involve computation.
- insightsolver.api_utilities.request_cloud_public_keys(computing_source: str, d_client_public_keys: dict, input_file_service_key: str | None = None, timeout: int = 60) -> dict#
Send the client’s public keys to the server and receive the server’s public keys in response.
This function establishes a secure connection to the specified server (computing_source) and sends the client’s public keys (d_client_public_keys). The server responds with its own set of public keys, which are returned in a dictionary format.
Parameters#
- computing_source: str
Where the server is.
- d_client_public_keys: dict
A dictionary containing the client’s public keys to be sent to the server. The dictionary format is:
  - alice_rsa_public_key_pem_base64: Client’s RSA public key, encoded in base64.
  - alice_ecdsa_public_key_pem_base64: Client’s ECDSA public key, encoded in base64.
- input_file_service_key: str, optional
The client’s service key, needed if the server is remote. Default is None.
- timeout: int, optional
The timeout duration for the request, in seconds. Default is 60 seconds, as this operation is typically fast and does not involve computation.
Returns#
- dict
A dictionary containing the server’s public keys and a unique session identifier. The dictionary format is as follows:
  - session_id: A unique identifier for the session.
  - bob_rsa_public_key_pem_base64: Server’s RSA public key, encoded in base64.
  - bob_ecdsa_public_key_pem_base64: Server’s ECDSA public key, encoded in base64.
Example#
    # Client's public keys
    d_client_public_keys = {
        'alice_rsa_public_key_pem_base64': '<base64-encoded RSA public key>',
        'alice_ecdsa_public_key_pem_base64': '<base64-encoded ECDSA public key>',
    }
    # Request server public keys
    d_server_public_keys = request_cloud_public_keys(
        computing_source='https://server-address.com',
        d_client_public_keys=d_client_public_keys,
        input_file_service_key='client_service_key'
    )
    # Access the session ID and server's public keys
    session_id = d_server_public_keys['session_id']
    bob_rsa_public_key = d_server_public_keys['bob_rsa_public_key_pem_base64']
    bob_ecdsa_public_key = d_server_public_keys['bob_ecdsa_public_key_pem_base64']
Raises#
- Exception
If the request fails or the server does not return the expected keys.
- insightsolver.api_utilities.request_cloud_computation(computing_source: str, d_out_transformed: dict, input_file_service_key: str | None = None, timeout: int = 600, verbose: bool = False) -> dict#
Send the transformed dict to the server so that it can compute the rule mining.
Parameters#
- computing_source: str
The computing source.
- d_out_transformed: dict
The transformed dict to send to the server.
- input_file_service_key: str, optional
The client’s service key, needed if the server is remote. Default is None.
- timeout: int, optional
Timeout for the request, in seconds. Default is 600 seconds, as computation may take longer.
Returns#
- dict
The dict that contains the rule mining results.
- insightsolver.api_utilities.search_best_ruleset_from_API_dict(d_out_original: dict, input_file_service_key: str | None = None, user_email: str | None = None, computing_source: str = 'remote_cloud_function', do_compress_data: bool = True, do_compute_memory_usage: bool = True, verbose: bool = False) -> dict#
Search for the best ruleset, with the computation done on the server.
Parameters#
- d_out_original: dict
The original dict, pre-transformation, that contains the necessary data for the server to do rule mining.
- input_file_service_key: str, optional
The service key of the client.
- user_email: str, optional
Email of the user (only for use inside a Google Cloud Run container).
- computing_source: str, optional
The computing source.
- do_compress_data: bool, optional
Whether to compress the data to reduce transmission size.
- do_compute_memory_usage: bool, optional
Whether to compute the memory usage.
- verbose: bool, optional
Verbosity.
Returns#
- dict
The dict that contains the output of the rule mining from the server.