Connection Conundrums - Part 1
Recently, I had the fortune of working on a complex deep water well that required many casing liners (including expandable liners) to deliver the well objectives. With such tight clearances, finding suitable casing connections was a challenge and many hours were spent searching and downloading connection specification data sheets and updating WellCat model casing and Connection Utilization Envelope (CUE) data.
There must be a better way
Frustrated, I turned to Python, my go to problem solving companion. With the help of a couple of colleagues who already had the casing and connection data in a convenient format, I was able to generate all of the casing permutations that were feasible with the client’s approved list of casing connections. Investing a couple of hours writing some code saved me many hours of laborious work checking dimensions, calculating clearances and manually inputting data into well design software.
In this post, I’ll step through a similar work flow and demonstrate how truly amazing Wells and Drilling Engineers in the Oil and Gas industry are for being able to manage such large option spaces in their heads.
Acquiring some data
The data I used previously was proprietary, so the first step is getting some suitable new data that I can use for this exercise.
casing connections manufacturers, please allow us to download your data in a more useable format. You’ll thank us for it.
Most connections manufacturers have the facility for downloading connection data from their websites, but this is a laborious process. I started writing some code to automate this, but automating selections from drop down menus is error prone and slow. I gave up.
Instead, I wrote some code to scrape data from the the Tenaris Premium Connections Catalogue using the camelot-py library. Unfortunately, the data tables in the catalogue can’t be read without a degree of fettling and rather than spend hours learning camelot’s attributes, I manually cleaned up the data.
Let’s start with importing the data. For this workflow we’ll first need some libraries imported and while we’re at it let’s set the URL to the online Tenaris catalogue and download it.
import camelot
import wget
import numpy as np
# the URL for the Tenaris catalogue so I can claim I'm not copying anything
url = 'https://www.tenaris.com/media/isfmuyog/tenarishydril-premium-connections-catalog.pdf'
file = wget.download(url)
Importing some data
We now have the catalogue downloaded and accessible through the file variable. Next we’ll write a function for extracting tables from specific pages in a pdf document. This is written as a function as we may wish to call it several times for different connections, especially if the tables are not consistent throughout the document.
def import_tables(file, pages):
"""
Function for extracting table data from a pdf file.
Parameters
----------
file: string
The variable name or the path and filename of the pdf.
pages: string
The pages in the pdf where the tables are located, e.g. '1, 2, 3' or
'1-3'.
Returns
-------
tables: TableList object
"""
tables = camelot.read_pdf(
file, pages=pages,
flavor='stream',
)
return tables
Processing the data
It’s time to process the data in the tables. To do this, we’ll write a function that reads each line from the table, cleans it up into a common length array and appends the array to a list. Because this is the oilfield, we have to deal with fractions and need to write a function for interpreting a fraction and converting it to a float.
First let’s write that function for handling fractions, since this is a helper function to our data extraction function.
def convert_to_float(frac_str):
"""
It wouldn't be the oilfield without a healthy mix of units and fractions.
This function attempts to deal with fractions care of someone on stack
overflow (thank you).
Parameters
----------
frac_str: string
Returns
-------
result: float
Exceptions
----------
ValueError: if the string is not a number or a fraction
"""
try:
return float(frac_str)
except ValueError:
num, denom = frac_str.split('/')
try:
leading, num = num.split(' ')
whole = float(leading)
except ValueError:
whole = 0
frac = float(num) / float(denom)
result = whole - frac if whole < 0 else whole + frac
return result
And now the data extraction function.
def extract_data(table, start_row=1, end_row=None):
"""
This is where the real fun starts. I'm not proud of this processing
workflow, but it works and cost me less time than arseing about with
camelot's settings, especially since there's subtle differences between
the tables in Tenaris' catalogue.
Parameters
----------
table: TableList object
start_row: int
If you're confident that the table header is a specific number of rows
then enter that here and skip these rows.
end_row: int or None
If you only want to read a finite number of rows. The default 'None'
reads them all.
Returns
-------
datas: list of (n, m) arrays
"""
# Define a list of characters in the table that we want to ignore
characters = ['(', ')', '*']
datas = []
skip_header = True
for i, row in enumerate(table.data[start_row:end_row]):
if skip_header:
# Read the first item of each row to determine if we're passed the
# header and into the table data. We know the first row of data
# begins with a number (or a fraction).
try:
convert_to_float(row[0])
except ValueError:
continue
skip_header = False
data = []
for j, item in enumerate(row):
# We don't want new line characters
if '\n' in item:
item = item.split('\n')[-1]
# Skip and unwanted characters
if any(c in item for c in characters):
continue
# Manage blank data - there's rogue columns in the read data
if item == '':
if j == 0:
data.append(datas[-1][j])
else:
continue
# If it's a number add to the data, otherwise add a NaN
else:
try:
data.append(
convert_to_float(item)
)
except ValueError:
data.append(np.nan)
# Remove and rows of NaNs
if np.all(np.isnan(np.array(data)[1:])):
continue
else:
datas.append(data)
return datas
Applying a template
Before we write our main function, we’re going to create a dictionary of table headers. This isn’t ideal, but having reviewed the connections catalogue, the table headers have differences between the connections and I’d rather state these explicitly than rely on code to interpret them. Sometimes you just have to pull your sleeves up and get dirty.
We’ll structure the dictionary with the connection name as the main key followed by the page numbers in the catalogue where the associated tables are located. We’ll then add a headers
key where we’ll list, in the correct order, the table headers we will assign to our imported data.
Creating the dictionary in this way means we can have an entry for each connection and loop over this dictionary in our main function.
Here’s the dictionary, called template
with an entry for the Wedge 521
connection.
# I'm not proud of manually typing these headers - and if this must
# be done it'd be better to place all this inside a yaml file and import it
# but I struggled to get camelot to interpret these
template = {
'Wedge 521': {
'pages': '49, 50',
'connection_type': 'threaded',
'headers': [
'size', 'nominal_weight',
'pipe_body_wall_thickness', 'pipe_body_inside_diameter',
'pipe_body_drift', 'box_outside_diameter',
'connection_inside_diameter', 'make_up_loss',
'critical_section_area', 'tensile_efficiency',
'compression_efficiency',
'joint_yield_55', 'joint_yield_80', 'joint_yield_90',
'joint_yield_95', 'joint_yield_110', 'joint_yield_125',
]
}
}
So now it’s time for our main function. It’s good practice to write your main code inside a function since it can help with debugging and you can then do some performance checking easier.
Tying together our code
def main():
# initiate our catalogue dictionary
catalogue = {}
# loop over our template dictionary we made with the table headers
for k, v in template.items():
# import the data
data = []
temp = import_tables(file, pages=v['pages'])
# process the data
for t in temp:
data.extend(extract_data(t))
# some of the data might not import correctly - for now we're just
# going to ignore these rows (which may be longer or shorter)
length = [len(l) for l in data]
filtered_data = [
d for d in data
if len(d) == max(set(length), key=length.count)
]
# just in case some rows with NaNs imported, we'll mask them off and
# filter them out of the data
mask = np.all(np.isnan(filtered_data), axis=1)
catalogue[k] = np.vstack(np.array(filtered_data)[~mask])
return catalogue
We’re now ready to run it. This is done adding the following lines at the bottom of your script. I commonly add a simple print
statement more to help me with debugging as it gives me somewhere to place a breakpoint in my editor.
Running the code
if __name__ == '__main__':
catalogue = main()
print("Done")
Congratulations, we’ve just imported the data from the Tenaris Premium Connections catalogue for the Wedge 521 connection.
Results
If you want to QAQC the data you can do so with the help of pandas, creating a DataFrame
with the following line of code:
df = pd.DataFrame(catalogue['Wedge 521'], columns=template['Wedge 521']['headers'])
Below is the first five rows of our freshly imported data, created with the help of tabulate, a dependency that assists pandas
to nicely format the table in markdown:
print(df.head().to_markdown())
size | nominal_weight | pipe_body_wall_thickness | pipe_body_inside_diameter | pipe_body_drift | box_outside_diameter | connection_inside_diameter | make_up_loss | critical_section_area | tensile_efficiency | compression_efficiency | joint_yield_55 | joint_yield_80 | joint_yield_90 | joint_yield_95 | joint_yield_110 | joint_yield_125 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | 9.5 | 0.226 | 3.548 | 3.423 | 4.103 | 3.457 | 3.62 | 1.646 | 61.4 | 83.6 | 91 | 132 | 148 | 156 | 181 | 206 |
1 | 4 | 11 | 0.262 | 3.476 | 3.351 | 4.162 | 3.401 | 3.62 | 2.029 | 66 | 85.9 | 112 | 162 | 183 | 193 | 223 | 254 |
2 | 4 | 11.6 | 0.286 | 3.428 | 3.303 | 4.2 | 3.353 | 3.62 | 2.284 | 68.4 | 86.9 | 126 | 183 | 205 | 217 | 251 | 285 |
3 | 4.5 | 9.5 | 0.205 | 4.09 | 3.965 | 4.629 | 4.04 | 2.74 | 1.678 | 60.7 | 76.8 | 92 | 134 | 151 | 160 | 185 | 210 |
4 | 4.5 | 10.5 | 0.224 | 4.052 | 3.927 | 4.651 | 3.976 | 3.62 | 1.821 | 60.5 | 82.6 | 100 | 146 | 164 | 173 | 200 | 228 |
Now that we have some data, in the next post we’ll use it to determine how many permutations of casing design exist!
Feel free to download the code.