sparknlp_jsl.transpiler.python_2_scala#

This module provides a set of functions for transpiling Python code to Scala, focusing on the adaptations required for the Spark and SparkNLP libraries. It covers converting Python dictionaries to Scala Maps, converting class names, generating import sections for Spark-related libraries, and applying various other code transformations.

Module Contents#

Functions#

anonymize_script(script)

Anonymize string values in the script.

break_line_after_backslash(script)

convert(py_code)

Convert Python code to Scala code.

convert_class_name(py_code, classes)

Converts Python class names to Scala class names based on the provided mapping.

convert_python_dict_to_scala_script(py_code)

Converts a Python dictionary in the provided code to a Scala Map.

find_key_values(text, dictionary)

Finds key-value pairs from the given text that match a provided dictionary.

get_data_dict()

Load the data dictionary from external sources.

get_import_section(py_code, data_dict)

Generates the import section for Spark-related libraries based on the provided Python code.

prepare_scala_code(is_spark_initialized, ...)

Prepare Scala code for build by adding import sections and, if specified, Spark session configuration.

remove_blank_lines(script)

Remove blank lines from the script.

restore_sensitive_values(anonymized_script, ...)

Restore sensitive values in the anonymized script.

run_transpiler(py_code)

Execute all conversion and processing steps.

anonymize_script(script)#

Anonymize string values in the script.

Parameters:

script (str) – The input script.

Returns:

Anonymized script and a list of sensitive values.

Return type:

Tuple[str, List[str]]
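
The idea of masking string values can be illustrated with a minimal sketch. It is not the module's implementation: the function name `anonymize_sketch`, the `__STR_n__` placeholder format, and the regular expression are all assumptions made for illustration. It only mirrors the documented contract of returning the masked script together with the list of extracted values.

```python
import re
from typing import List, Tuple

# Hypothetical placeholder format; the real module may use a different scheme.
PLACEHOLDER = "__STR_{}__"

# Matches double- or single-quoted string literals, allowing backslash escapes.
_STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"' + "|" + r"'(?:\\.|[^'\\])*'")


def anonymize_sketch(script: str) -> Tuple[str, List[str]]:
    """Replace each string literal with an indexed placeholder."""
    sensitive_values: List[str] = []

    def _swap(match):
        # Remember the literal (quotes included) and emit its index.
        sensitive_values.append(match.group(0))
        return PLACEHOLDER.format(len(sensitive_values) - 1)

    return _STRING_LITERAL.sub(_swap, script), sensitive_values


masked, values = anonymize_sketch('doc.setInputCol("text").setOutputCol("document")')
```

Keeping the quotes inside each stored value means a later restore step can put the literals back without knowing which quote style was used.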

break_line_after_backslash(script)#

convert(py_code)#

Convert Python code to Scala code.

Parameters:

py_code (str) – The input Python code.

Returns:

The converted Scala code.

Return type:

str

convert_class_name(py_code, classes)#

Converts Python class names to Scala class names based on the provided mapping.

Parameters:
  • py_code (str) – Python code containing class names.

  • classes (dict) – Mapping of Python class names to Scala class names.

Returns:

Scala code with class names converted.

Return type:

str
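
A mapping-driven rename of this kind could look like the sketch below. It assumes whole-word matching, which the source does not confirm, and the names `convert_class_name_sketch`, `MyPyAnnotator`, and `MyScalaAnnotator` are made up for the example.

```python
import re
from typing import Dict


def convert_class_name_sketch(py_code: str, classes: Dict[str, str]) -> str:
    """Replace whole-word occurrences of each key with its mapped value."""
    if not classes:
        return py_code
    # One alternation over all escaped names, anchored on word boundaries
    # so that a name embedded in a longer identifier is left alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, classes)) + r")\b")
    return pattern.sub(lambda m: classes[m.group(1)], py_code)


mapping = {"MyPyAnnotator": "MyScalaAnnotator"}  # hypothetical class names
source = "stage = MyPyAnnotator()\nother = MyPyAnnotatorModel()"
result = convert_class_name_sketch(source, mapping)
```

Here `MyPyAnnotatorModel` is untouched because the word boundary prevents a partial match.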

convert_python_dict_to_scala_script(py_code)#

Converts a Python dictionary in the provided code to a Scala Map.

Parameters:

py_code (str) – Python code containing a dictionary.

Returns:

Scala code with the dictionary converted to a Map.

Return type:

str
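
To make the dictionary-to-Map conversion concrete, here is a small sketch that handles a single dictionary literal. It is an illustration under stated assumptions rather than the module's algorithm: the helper names are hypothetical, the input is assumed to be a bare literal that `ast.literal_eval` can parse, and lists are rendered as `Seq(...)` by choice.

```python
import ast
import json


def scala_literal(value) -> str:
    """Render a Python literal value as Scala source text."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return json.dumps(value)  # double-quoted, with escapes
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, dict):
        pairs = ", ".join(
            f"{scala_literal(k)} -> {scala_literal(v)}" for k, v in value.items()
        )
        return f"Map({pairs})"
    if isinstance(value, (list, tuple)):
        return "Seq(" + ", ".join(scala_literal(v) for v in value) + ")"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def dict_source_to_scala_map(dict_source: str) -> str:
    """Parse a Python dict literal and emit the equivalent Scala Map expression."""
    value = ast.literal_eval(dict_source)
    if not isinstance(value, dict):
        raise ValueError("expected a dictionary literal")
    return scala_literal(value)


scala_map = dict_source_to_scala_map("{'lang': 'en', 'threshold': 0.5, 'lazy': True}")
```

Parsing with `ast.literal_eval` avoids executing arbitrary code, at the cost of rejecting dictionaries that reference variables or function calls.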

find_key_values(text, dictionary)#

Finds key-value pairs from the given text that match a provided dictionary.

Parameters:
  • text (str) – Text to search for key-value pairs.

  • dictionary (dict) – Dictionary to match key-value pairs.

Returns:

Key-value pairs found in the text.

Return type:

dict
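
One plausible reading of this lookup is "return the dictionary entries whose keys occur in the text". The sketch below implements that reading only. The matching rule (case-sensitive, whole word), the function name, and the `FooAnnotator` entry are assumptions, not behavior confirmed by the source.

```python
import re
from typing import Dict


def find_key_values_sketch(text: str, dictionary: Dict[str, str]) -> Dict[str, str]:
    """Keep only the entries whose key appears in the text as a whole word."""
    return {
        key: value
        for key, value in dictionary.items()
        if re.search(r"\b" + re.escape(key) + r"\b", text)
    }


lookup = {
    "Pipeline": "org.apache.spark.ml.Pipeline",
    "FooAnnotator": "com.example.FooAnnotator",  # hypothetical entry
}
found = find_key_values_sketch("pipeline = Pipeline(stages=[])", lookup)
```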

get_data_dict()#

Load the data dictionary from external sources.

Returns:

The data dictionary.

Return type:

dict

get_import_section(py_code, data_dict)#

Generates the import section for Spark-related libraries based on the provided Python code.

Parameters:
  • py_code (str) – Python code.

  • data_dict (dict) – Dictionary containing data for import sections.

Returns:

Import section for Spark-related libraries.

Return type:

str

prepare_scala_code(is_spark_initialized, import_section, scala_code)#

Prepare Scala code for build by adding import sections and, if specified, Spark session configuration.

Parameters:
  • is_spark_initialized (bool) – Flag indicating whether Spark session configuration should be added.

  • import_section (bool) – Flag indicating whether import sections should be added.

  • scala_code (str) – Scala code to be prepared for build.

Returns:

Scala code ready for build.

Return type:

str

Notes

If ‘is_spark_initialized’ is True, the function adds Spark session configuration to the Scala code. If ‘import_section’ is True, the function adds import sections to the Scala code.

remove_blank_lines(script)#

Remove blank lines from the script.

Parameters:

script (str) – The input script.

Returns:

The cleaned script.

Return type:

str
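
A cleanup step like this can be sketched in a few lines. The version below assumes that whitespace-only lines count as blank, which the source does not specify, and `remove_blank_lines_sketch` is a hypothetical name.

```python
def remove_blank_lines_sketch(script: str) -> str:
    """Drop empty and whitespace-only lines, keeping the rest in order."""
    return "\n".join(line for line in script.splitlines() if line.strip())


cleaned = remove_blank_lines_sketch("val a = 1\n\n   \nval b = 2\n")
```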

restore_sensitive_values(anonymized_script, sensitive_values)#

Restore sensitive values in the anonymized script.

Parameters:
  • anonymized_script (str) – The anonymized script.

  • sensitive_values (List[str]) – List of sensitive values.

Returns:

The script with restored sensitive values.

Return type:

str
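
Restoring is the inverse of masking: each placeholder is swapped back for the value stored at its index. The sketch below assumes a hypothetical `__STR_n__` placeholder format, since the module's actual scheme is not documented here, and `restore_sketch` is an illustrative name.

```python
import re
from typing import List

# Hypothetical placeholder format; the captured group is the index into the list.
_PLACEHOLDER = re.compile(r"__STR_(\d+)__")


def restore_sketch(anonymized_script: str, sensitive_values: List[str]) -> str:
    """Substitute every indexed placeholder with its original value."""
    # A function replacement is used so the restored text is inserted literally,
    # without re.sub interpreting backslashes inside it.
    return _PLACEHOLDER.sub(
        lambda m: sensitive_values[int(m.group(1))], anonymized_script
    )


restored = restore_sketch(
    "val doc = assembler.setInputCol(__STR_0__).setOutputCol(__STR_1__)",
    ['"text"', '"document"'],
)
```

Masking before conversion and restoring afterwards keeps string contents from being altered by the code transformations in between.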

run_transpiler(py_code)#

Execute all conversion and processing steps.

Parameters:

py_code (str) – The input Python code.

Returns:

The final processed Scala code.

Return type:

str