# Spark Log Parser

The Spark Log Parser processes unmodified Apache Spark history server event logs, extracting their information into a compact format that can more readily be used to generate Sync predictions. See the user guides for information on where to find event logs. Related tools and their documentation, such as client_tools, may also be helpful.

Parsed logs contain metadata about your Apache Spark application's execution: the run time of each task, the amount of data read and written, the amount of memory used, and so on. These logs do not contain sensitive information such as the data your Apache Spark application is processing. Below is an example of the log parser's output.

[Image: Output of Log Parser]
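To illustrate the kind of task-level metadata described above, here is a simplified sketch, not the parser's actual implementation. The function name and output shape are assumptions for illustration; the JSON keys, however, are the ones Apache Spark's JSON event logs actually use for task-end events.

```python
import json

def extract_task_metrics(event_log_lines):
    """Hypothetical sketch: keep only non-sensitive task metrics
    (run time, bytes read/written, peak memory) from a Spark
    history server event log, one JSON event per line."""
    tasks = []
    for line in event_log_lines:
        event = json.loads(line)
        # Only task-completion events carry task metrics.
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        info = event.get("Task Info", {})
        metrics = event.get("Task Metrics", {})
        tasks.append({
            "task_id": info.get("Task ID"),
            "run_time_ms": metrics.get("Executor Run Time"),
            "bytes_read": metrics.get("Input Metrics", {}).get("Bytes Read"),
            "bytes_written": metrics.get("Output Metrics", {}).get("Bytes Written"),
            "peak_memory": metrics.get("Peak Execution Memory"),
        })
    return tasks
```

Note that nothing from the application's actual data passes through: only timing, I/O sizes, and memory figures are retained.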

## Installation

Install the package in this repository into your Python 3 environment, e.g.

```shell
pip3 install https://github.com/synccomputingcode/spark_log_parser/archive/main.tar.gz
```

## Parsing your Spark logs

### Step 0: Generate the appropriate Apache Spark history server event log

If you have not already done so, follow the instructions to download the Apache Spark event log.

### Step 1: Parse the log to strip away sensitive information

1. To process a log file, run the `spark-log-parser` command with the path to the log file and a directory in which to store the result:

   ```shell
   spark-log-parser -l <log file location> -r <result directory>
   ```

   The parsed file, `parsed-<log file name>`, will appear in the result directory.

2. Send the parsed log to Sync Computing

   Email the parsed event log to Sync Computing, or upload it to the Sync Auto-tuner.