- 27th Jul 2024
- 21:11
- Admin
In this assignment you need to complete the following task: option quote parsing.
As is standard across the industry, quotes for products are sent via real-time feeds and files. In fixed income and the OTC market especially, these quotes are not standardized and vary drastically among brokers. We would like you to write a parsing function in Python that reads each message in the files named ‘hycdx_option_quotes_N.txt’ in the enclosed zip file and extracts the option prices on the High Yield Credit Default Swap Index (HYCDX), together with the timestamp and the index reference price, for each option’s expiry and strike. Note the different ways each sender organizes the pricing information; this is indicative of how various brokers send the quotes that we need to process.
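A minimal sketch of how the quote files could be picked out by name. It matches the file-name pattern with a regular expression, so N may have more than one digit. The helper name `quote_file_numbers` is hypothetical, and the assumption that N is a non-negative integer is ours, not part of the task statement.

```python
import re

# Assumed naming convention: hycdx_option_quotes_<N>.txt with N an integer.
QUOTE_FILE_RE = re.compile(r"^hycdx_option_quotes_(\d+)\.txt$")

def quote_file_numbers(filenames):
    """Return the sorted file numbers N found in an iterable of file names."""
    numbers = []
    for name in filenames:
        match = QUOTE_FILE_RE.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)
```

In practice the input could come from `glob.glob("*.txt")` or `os.listdir(".")` after the zip file has been extracted.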
Notice that the term identifying the option expiration differs from file to file: ‘Expiry’, ‘EXPIRY’ or ‘Exp’. Similarly, both ‘REF’ and ‘Ref’ identify the underlying reference price that the options are priced off. We suggest you use regular expressions to account for these differences. Most senders put all prices for the options on a single strike on one line. The broker in ‘hycdx_option_quotes_4.txt’ does not follow this format: there, each line in the message has prices for a put or a call option on a single strike.
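A hedged sketch of the regular-expression approach suggested above. The two patterns are case-insensitive, so ‘Expiry’, ‘EXPIRY’ and ‘Exp’ (and ‘REF’ and ‘Ref’) are all matched by one expression each. The function names are hypothetical, and the optional colon and spacing after each keyword are assumptions about the layout rather than facts taken from the broker files.

```python
import re

# Keyword, then optional colon/whitespace, then the value token.
EXPIRY_RE = re.compile(r"\bexp(?:iry)?\b[:\s]*(\S+)", re.IGNORECASE)
REF_RE = re.compile(r"\bref\b[:\s]*([-+]?(?:\d*\.\d+|\d+))", re.IGNORECASE)

def find_expiry_token(line):
    """Return the raw token following the expiry keyword, or None."""
    match = EXPIRY_RE.search(line)
    return match.group(1) if match else None

def find_ref_price(line):
    """Return the reference price following the ref keyword as a float, or None."""
    match = REF_RE.search(line)
    return float(match.group(1)) if match else None
```

The expiry token is returned as text so that a date parser such as `dateutil.parser.parse` can be applied to it afterwards. The word boundaries keep words like "reference" or "expected" from matching.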
Prices are not normalized: some brokers quote them in cents and others in dollars. Please normalize all prices to cents so that we can adjust them all to a single reference underlying price and compare across brokers. Also note that if you want to use Call / Put terminology, PAY = Put and RCV = Call.
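The normalization and the PAY/RCV mapping can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the quoting unit is passed in explicitly, because how to detect a broker's unit is not specified in the task and would have to be decided per file format.

```python
# Mapping given in the task statement: PAY = Put, RCV = Call.
OPTION_TYPE = {"PAY": "P", "RCV": "C"}

def option_type(label):
    """Translate a payer/receiver label to 'P' or 'C' (None if unknown)."""
    return OPTION_TYPE.get(label.strip().upper())

def to_cents(price, unit):
    """Convert a price quoted in 'cents' or 'dollars' to cents.

    Missing prices (None) are passed through unchanged.
    """
    if price is None:
        return None
    if unit == "cents":
        return float(price)
    if unit == "dollars":
        # 1 dollar = 100 cents; round to suppress floating-point noise.
        return round(float(price) * 100.0, 6)
    raise ValueError("unknown unit: %r" % (unit,))
```

Keeping the unit as an explicit argument makes the conversion easy to test, whereas a magnitude-based guess can misclassify small cent prices or large dollar prices.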
Ideally, we are looking for an elegant, generic solution that can handle the multiple file formats. As you are probably aware, columns may shift, or new formats may be introduced by additional brokers, so a parser that can easily adapt to handle these quotes will be the most flexible. Please comment your code as you develop it so we can follow your logic.
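One possible way to keep the parser adaptable is an alias table that maps each broker's column label to a canonical field, so that a new broker only requires new aliases rather than new parsing code. The sketch below is a simplified illustration: the aliases are a subset of those used in the notebook solution further down, and the helper names and the case-insensitive lookup are assumptions of this sketch.

```python
# Canonical field -> labels seen in broker headers (subset, for illustration).
ALIASES = {
    "Strike Px": ["K", "Stk"],
    "Delta": ["Delta", "Dlt", "Del"],
    "Put": ["Pay", "Puts"],
    "Call": ["Rec", "Rcv", "Calls"],
}

def build_lookup(aliases):
    """Invert the alias table into a lowercase label -> canonical field dict."""
    return {
        label.lower(): field
        for field, labels in aliases.items()
        for label in labels
    }

def canonical_headers(tokens, lookup):
    """Map header tokens to canonical fields; unknown tokens become None."""
    return [lookup.get(token.lower()) for token in tokens]
```

Because column positions are derived from the header of each message rather than hard-coded, a shifted column is handled without code changes as long as its label is in the table.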
Store the fields below from each message. You do not need to perform any calculation or validation on the fields; that would be an extension in the real world, but you may skip it for this exercise. Some fields are not present for a given quote, so you can simply leave them blank in your structure. For example, some quotes have only a bid/ask, or only a quote for the call or the put. The desired output is also attached in ‘task1_output.xlsx’. We understand this is an academic and interview exercise, so exact matches and results may differ slightly. We are most focused on how you explain your approach and design the solution.
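A small sketch of a record structure in which missing fields are left blank. The field names are the column names used in the notebook solution further down; the helper `make_record` is hypothetical.

```python
FIELDS = [
    "Date", "Time", "Firm", "Expiration", "Option Type",
    "Strike Px", "Strike Spd", "Bid Price", "Ask Price", "Delta",
    "Implied Vol Spd", "Implied Vol bps", "Implied Vol Px", "Ref Px",
]

def make_record(values):
    """Build one output row; fields absent from `values` are left blank."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise KeyError("unknown fields: %s" % sorted(unknown))
    return {field: values.get(field, "") for field in FIELDS}
```

A list of such records can be passed directly to `pandas.DataFrame`, which keeps the columns in the order given by `FIELDS`.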
Free Assignment Solution - Python Parsing Function for HYCDX Option Quotes
{
"cells": [
{
"cell_type": "code",
"source": [
"data = {\"Date\": [],\n",
"        \"Time\": [],\n",
"        \"Firm\": [],\n",
"        \"Expiration\": [],\n",
"        \"Option Type\": [],\n",
"        \"Strike Px\": [],\n",
"        \"Strike Spd\": [],\n",
"        \"Bid Price\": [],\n",
"        \"Ask Price\": [],\n",
"        \"Delta\": [],\n",
"        \"Implied Vol Spd\": [],\n",
"        \"Implied Vol bps\": [],\n",
"        \"Implied Vol Px\": [],\n",
"        \"Ref Px\": []}"
],
"metadata": {
"id": "aojTnraqJuqh"
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"source": [
"header_map = {\"Strike Px\": [\"K\", \"Stk\"],\n",
"              \"Strike Spd\": [\"Sprd\"],\n",
"              \"Put\": [\"Pay\", \"PAY\", \"Puts\", \"DEC PAY\"],\n",
"              \"Call\": [\"Rec\", \"RCV\", \"Calls\", \"DEC RCV\"],\n",
"              \"Implied Vol Spd\": [\"Vol\", \"SprdVol\"],\n",
"              \"Implied Vol bps\": [\"Vol Bpd\", \"be\", \"BE\"],\n",
"              \"Delta\": [\"Delta\", \"Dlt\", \"Del\"],\n",
"              \"Implied Vol Px\": [\"Prc Vol\"]}\n",
"\n",
"rev_header_map = {}\n",
"for key in header_map:\n",
"    for val in header_map[key]:\n",
"        rev_header_map[val] = key"
],
"metadata": {
"id": "fjSC88uKKpeH"
},
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import dateutil\n",
"import dateutil.parser\n",
"import re\n",
"import pandas as pd\n",
"import glob\n",
"\n",
"def DTF(s):\n",
"    '''\n",
"    s is the first line in the txt file (string data type)\n",
"    This function is used to extract Date, Time and Firm name\n",
"    '''\n",
"    parts = s.split()\n",
"    return parts[3], parts[4], parts[1]\n",
"\n",
"def exp_date(lines, idx=0, ex=1):\n",
"    '''\n",
"    lines is the entire dataset (list of lines)\n",
"    This function extracts Expiry Date\n",
"    Action:\n",
"    It finds the index of the first occurrence of \"|\" character in lines\n",
"    Then the line (say L) just before \"|\" shall contain Expiry date\n",
"    Then using dateutil.parser, we extract the date from line L.\n",
"    If ex is falsy, the search is skipped and lines[idx] is parsed directly.\n",
"    Returns (expiry date or None, idx+1).\n",
"    '''\n",
"    if ex:\n",
"        for i, s in enumerate(lines):\n",
"            if '|' in s:\n",
"                idx = i-1\n",
"                break\n",
"    ex_dt = None\n",
"    for c in lines[idx].split():\n",
"        try:\n",
"            ex_dt = dateutil.parser.parse(c)\n",
"            return ex_dt, idx+1\n",
"        except (ValueError, OverflowError):\n",
"            # Token is not a date, try the next one\n",
"            pass\n",
"    return ex_dt, idx+1\n",
"\n",
"def check_hdr(hdr_list, hdrL, tp):\n",
"    # Return 1 if every alias in hdr_list matches some cell in hdrL, else 0.\n",
"    # For tp != 4 the alias must be a substring of the cell;\n",
"    # for tp == 4 the cell must be a substring of the alias.\n",
"    flag = 1\n",
"    for h in hdr_list:\n",
"        fg = 1\n",
"        for h0 in hdrL:\n",
"            if tp!=4:\n",
"                if h in h0:\n",
"                    fg = 0\n",
"                    break\n",
"            else:\n",
"                if h0 in h:\n",
"                    fg = 0\n",
"                    break\n",
"        if fg:\n",
"            flag = 0\n",
"            break\n",
"    return flag\n",
"\n",
"def find_hdr(lines, idx, header_map, tp=0):\n",
"    # Split the header line at lines[idx] into column labels\n",
"    # Keep letters, drop digits and '/', turn everything else into spaces\n",
"    hdr = ''.join([c if c.isalpha() else '' if c.isdigit() else '' if c=='/' else ' ' for c in lines[idx]]).strip()\n",
"    hdr_list = []\n",
"    for key in list(header_map.keys())[::-1]:\n",
"        for val in header_map[key]:\n",
"            if val in hdr:\n",
"                hdr_list.append(val)\n",
"\n",
"    # First try splitting on runs of 2+ spaces so multi-word labels survive\n",
"    hdrL = re.split(r'\\s{2,}', hdr)\n",
"    flag = check_hdr(hdr_list, hdrL, tp)\n",
"    if flag==0:\n",
"        # Fall back to splitting on any whitespace\n",
"        hdrL = re.split(r'\\s{1,}', hdr)\n",
"    if hdrL and hdrL[-1]=='':\n",
"        del hdrL[-1]\n",
"    if hdrL and hdrL[0]=='':\n",
"        del hdrL[0]\n",
"    return hdrL\n",
"\n",
"def count_times(hdrL, hdr):\n",
"    # Return 1 if hdr occurs more than once in hdrL (one column per option type),\n",
"    # else 2 (a single shared column whose value is copied to both rows)\n",
"    count = 0\n",
"    for hdr_ in hdrL:\n",
"        if hdr_ == hdr:\n",
"            count += 1\n",
"\n",
"    return 1 if count > 1 else 2\n",
"\n",
"def process1(hdrL, row, rev_header_map, data, D, T, F, ex_dt, ref_px, N):\n",
"    # Process the data extracted from txt into the one provided in CSV\n",
"    # Put and Call will make 2 different rows\n",
"    for i in range(2):\n",
"        data[\"Date\"].append(D.strftime('%d/%m/%Y'))\n",
"        data[\"Time\"].append(T)\n",
"        data[\"Firm\"].append(F)\n",
"        data[\"Ref Px\"].append(ref_px)\n",
"        data[\"Expiration\"].append(ex_dt.strftime('%d/%m/%Y'))\n",
"    data[\"Option Type\"].append(\"P\")\n",
"    data[\"Option Type\"].append(\"C\")\n",
"    visited = {\"Date\", \"Time\", \"Firm\", \"Expiration\", \"Option Type\", \"Ref Px\"}\n",
"    ct = 0\n",
"    if N==4:\n",
"        # File 4 uses a different column order, so rotate headers and values\n",
"        hdrL = hdrL[6:] + hdrL[:6]\n",
"        row = row[7:] + row[:7]\n",
"    # Heuristic: large values are assumed to be quoted in cents and are\n",
"    # divided by 100; once triggered, sc stays 100 for the rest of the row\n",
"    sc = 1\n",
"    for idx, hdr in enumerate(hdrL):\n",
"        if hdr in rev_header_map:\n",
"            if rev_header_map[hdr]==\"Put\" or rev_header_map[hdr]==\"Call\":\n",
"                visited.add(\"Bid Price\")\n",
"                visited.add(\"Ask Price\")\n",
"                try:\n",
"                    float(row[ct])\n",
"                    if float(row[ct])>=10:\n",
"                        sc = 100\n",
"                    data[\"Bid Price\"].append(float(row[ct])/sc)\n",
"                    ct += 1\n",
"                except (ValueError, IndexError):\n",
"                    data[\"Bid Price\"].append('')\n",
"                    ct += 1\n",
"                try:\n",
"                    float(row[ct])\n",
"                    if float(row[ct])>10:\n",
"                        sc = 100\n",
"                    data[\"Ask Price\"].append(float(row[ct])/sc)\n",
"                    ct += 1\n",
"                except (ValueError, IndexError):\n",
"                    data[\"Ask Price\"].append('')\n",
"                    ct += 1\n",
"            else:\n",
"                visited.add(rev_header_map[hdr])\n",
"                rep = count_times(hdrL, hdr)\n",
"                for i in range(rep):\n",
"                    try:\n",
"                        float(row[ct])\n",
"                        data[rev_header_map[hdr]].append(row[ct])\n",
"                    except (ValueError, IndexError):\n",
"                        data[rev_header_map[hdr]].append('')\n",
"                ct += 1\n",
"        else:\n",
"            ct += 1\n",
"    # Fields that this broker does not quote are left blank for both rows\n",
"    for key in data:\n",
"        if key not in visited:\n",
"            for i in range(2):\n",
"                data[key].append('')\n",
"    return data\n",
"\n",
"# Collect the file numbers N of all hycdx_option_quotes_N.txt files\n",
"N_list = []\n",
"for file in glob.glob(\"*.txt\"):\n",
"    if file[:-5]==\"hycdx_option_quotes_\":\n",
"        N_list.append(int(file[-5]))\n",
"N_list = sorted(N_list)\n",
"\n",
"for N in N_list:\n",
"    with open(\"hycdx_option_quotes_\" + str(N) + \".txt\") as fh:\n",
"        lines = [line.rstrip().replace(u'\\xa0', u' ') for line in fh]\n",
"    lines = [w for w in lines if w!='']\n",
"    lines_str = \"\\n\".join(lines)\n",
"    D, T, F = DTF(lines[0])\n",
"    D = dateutil.parser.parse(D)\n",
"\n",
"    # Using Regex to find the Ref Px variable\n",
"\n",
"    ref_px = re.findall(r'ref[:]?\\ ([-+]?(?:\\d*\\.\\d+|\\d+))', lines_str.lower())[0]\n",
"\n",
"    ex_dt, idx = exp_date(lines)\n",
"    hdrL = find_hdr(lines, idx, header_map, N)\n",
"    idx += 1\n",
"\n",
"    while idx < len(lines):\n",
"        if \"|\" in lines[idx]:\n",
"            # Quote row: strip separators so only the value tokens remain\n",
"            s = lines[idx].replace(\"|\", \"\")\n",
"            s = s.replace(\"/\", \" \").replace(\"[\", \" \").replace(\"]\", \" \").replace(\"{\", \" \").replace(\"}\", \" \")\n",
"            s = s.replace(\"%\", \" \")\n",
"            # Splitting into tokens using regex\n",
"            row = re.split(r'\\s{1,}', s)\n",
"            if row[-1]=='':\n",
"                del row[-1]\n",
"            if row[0]=='':\n",
"                del row[0]\n",
"            data = process1(hdrL, row, rev_header_map, data, D, T, F, ex_dt, ref_px, N)\n",
"            idx += 1\n",
"            continue\n",
"        else:\n",
"            # Not a quote row: expect the next expiry line, then skip its header\n",
"            ex_dt, idx = exp_date(lines, idx=idx, ex=0)\n",
"            if ex_dt is None:\n",
"                break\n",
"            idx += 1"
],
"metadata": {
"id": "SmjCz5vaiw-c"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "code",
"source": [
"df = pd.DataFrame(data)\n",
"df.to_csv('output.csv', index=False)"
],
"metadata": {
"id": "eeBAfMfFbmNS"
},
"execution_count": 8,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"colab": {
"name": "MonthlyExpenses.ipynb",
"provenance": [],
"collapsed_sections": []
}
},
"nbformat": 4,
"nbformat_minor": 0
}
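The row-cleaning step in the notebook above (removing separators, then splitting on whitespace) can be condensed into a single substitution. The sketch below is an illustration only; the helper name `split_row` and the sample row in the usage note are made up and are not taken from the broker files. Unlike the notebook, which deletes `|` outright, this sketch replaces it with a space, so values that touch a pipe without surrounding spaces are not merged.

```python
import re

# Separator characters that the notebook strips from a quote row.
SEPARATORS_RE = re.compile(r"[|/\[\]{}%]")

def split_row(line):
    """Replace separator characters with spaces and split into tokens."""
    return SEPARATORS_RE.sub(" ", line).split()
```

For a hypothetical row such as `104.5 | 0.12/0.18 | 25% | [1.2]`, the function yields the five value tokens in order. `str.split()` with no arguments already discards leading and trailing empty strings, so no trimming of the result is needed.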
This sample Python assignment solution was completed by our team of Python programmers. The solutions provided are intended exclusively for research and reference purposes.
About The Author - Jordan Lee
Jordan Lee is a data analyst specializing in financial data processing and analysis. For this assignment, Jordan tackled the challenge of parsing option quotes for the High Yield Credit Default Swap Index (HYCDX). The task involved developing a Python function to handle and normalize data from various file formats.