Connection Conundrums

Recently, I had the fortune of working on a complex deep water well that required many casing liners (including expandable liners) to deliver the well objectives. With such tight clearances, finding suitable casing connections was a challenge and many hours were spent searching and downloading connection specification data sheets and updating WellCat model casing and Connection Utilization Envelope (CUE) data.

There must be a better way

Frustrated, I turned to Python, my go to problem solving companion. With the help of a couple of colleagues who already had the casing and connection data in a convenient format, I was able to generate all of the casing permutations that were feasible with the client’s approved list of casing connections. Investing a couple of hours writing some code saved me many hours of laborious work checking dimensions, calculating clearances and manually inputting data into well design software.

In this post, I’ll step through a similar work flow and demonstrate how truly amazing Wells and Drilling Engineers in the Oil and Gas industry are for being able to manage such large option spaces in their heads.

Acquiring some data

The data I used previously was proprietary, so the first step is getting some suitable new data that I can use for this exercise.

casing connections manufacturers, please allow us to download your data in a more useable format. You’ll thank us for it.

Most connections manufacturers have the facility for downloading connection data from their websites, but this is a laborious process. I started writing some code to automate this, but automating selections from drop down menus is error prone and slow. I gave up.

Instead, I wrote some code to scrape data from the the Tenaris Premium Connections Catalogue using the camelot-py library. Unfortunately, the data tables in the catalogue can’t be read without a degree of fettling and rather than spend hours learning camelot’s attributes, I manually cleaned up the data.

Let’s start with importing the data. For this workflow we’ll first need some libraries imported and while we’re at it let’s set the URL to the online Tenaris catalogue and download it.

import camelot
import wget
import numpy as np

# the URL for the Tenaris catalogue so I can claim I'm not copying anything
url = 'https://www.tenaris.com/media/isfmuyog/tenarishydril-premium-connections-catalog.pdf'
file = wget.download(url)

Importing some data

We now have the catalogue downloaded and accessible through the file variable. Next we’ll write a function for extracting tables from specific pages in a pdf document. This is written as a function as we may wish to call it several times for different connections, especially if the tables are not consistent throughout the document.

def import_tables(file, pages):
    """
    Function for extracting table data from a pdf file.

    Parameters
    ----------
    file: string
        The variable name or the path and filename of the pdf.
    pages: string
        The pages in the pdf where the tables are located, e.g. '1, 2, 3' or
        '1-3'.
    
    Returns
    -------
    tables: TableList object
    
    """
    tables = camelot.read_pdf(
        file, pages=pages,
        flavor='stream',
    )
    return tables

Processing the data

It’s time to process the data in the tables. To do this, we’ll write a function that reads each line from the table, cleans it up into a common length array and appends the array to a list. Because this is the oilfield, we have to deal with fractions and need to write a function for interpreting a fraction and converting it to a float.

First let’s write that function for handling fractions, since this is a helper function to our data extraction function.

def convert_to_float(frac_str):
    """
    It wouldn't be the oilfield without a healthy mix of units and fractions.
    This function attempts to deal with fractions care of someone on stack
    overflow (thank you).

    Parameters
    ----------
    frac_str: string

    Returns
    -------
    result: float

    Exceptions
    ----------
    ValueError: if the string is not a number or a fraction
    """
    try:
        return float(frac_str)
    except ValueError:
        num, denom = frac_str.split('/')
        try:
            leading, num = num.split(' ')
            whole = float(leading)
        except ValueError:
            whole = 0
        frac = float(num) / float(denom)
        result = whole - frac if whole < 0 else whole + frac

        return result

And now the data extraction function.

def extract_data(table, start_row=1, end_row=None):
    """
    This is where the real fun starts. I'm not proud of this processing
    workflow, but it works and cost me less time than arseing about with
    camelot's settings, especially since there's subtle differences between
    the tables in Tenaris' catalogue.

    Parameters
    ----------
    table: TableList object
    start_row: int
        If you're confident that the table header is a specific number of rows
        then enter that here and skip these rows.
    end_row: int or None
        If you only want to read a finite number of rows. The default 'None'
        reads them all.
    
    Returns
    -------
    datas: list of (n, m) arrays

    """
    # Define a list of characters in the table that we want to ignore
    characters = ['(', ')', '*']

    datas = []
    skip_header = True
    for i, row in enumerate(table.data[start_row:end_row]):
        if skip_header:
            # Read the first item of each row to determine if we're passed the
            # header and into the table data. We know the first row of data
            # begins with a number (or a fraction).
            try:
                convert_to_float(row[0])
            except ValueError:
                continue
        skip_header = False
        data = []
        for j, item in enumerate(row):
            # We don't want new line characters
            if '\n' in item:
                item = item.split('\n')[-1]
            # Skip and unwanted characters
            if any(c in item for c in characters):
                continue
            # Manage blank data - there's rogue columns in the read data
            if item == '':
                if j == 0:
                    data.append(datas[-1][j])
                else:
                    continue
            # If it's a number add to the data, otherwise add a NaN
            else:
                try:
                    data.append(
                        convert_to_float(item)
                    )
                except ValueError:
                    data.append(np.nan)
        # Remove and rows of NaNs
        if np.all(np.isnan(np.array(data)[1:])):
            continue
        else:
            datas.append(data)
    return datas

Applying a template

Before we write our main function, we’re going to create a dictionary of table headers. This isn’t ideal, but having reviewed the connections catalogue, the table headers have differences between the connections and I’d rather state these explicitly than rely on code to interpret them. Sometimes you just have to pull your sleeves up and get dirty.

We’ll structure the dictionary with the connection name as the main key followed by the page numbers in the catalogue where the associated tables are located. We’ll then add a headers key where we’ll list, in the correct order, the table headers we will assign to our imported data.

Creating the dictionary in this way means we can have an entry for each connection and loop over this dictionary in our main function.

Here’s the dictionary, called template with an entry for the Wedge 521 connection.

# I'm not proud of manually typing these headers - and if this must
# be done it'd be better to place all this inside a yaml file and import it
# but I struggled to get camelot to interpret these
template = {
    'Wedge 521': {
        'pages': '49, 50',
        'connection_type': 'threaded',
        'headers': [
            'size', 'nominal_weight',
            'pipe_body_wall_thickness', 'pipe_body_inside_diameter',
            'pipe_body_drift', 'box_outside_diameter',
            'connection_inside_diameter', 'make_up_loss',
            'critical_section_area', 'tensile_efficiency',
            'compression_efficiency',
            'joint_yield_55', 'joint_yield_80', 'joint_yield_90',
            'joint_yield_95', 'joint_yield_110', 'joint_yield_125',
        ]
    }
}

So now it’s time for our main function. It’s good practice to write your main code inside a function since it can help with debugging and you can then do some performance checking easier.

Tying together our code

def main():
    # initiate our catalogue dictionary
    catalogue = {}

    # loop over our template dictionary we made with the table headers
    for k, v in template.items():
        # import the data
        data = []
        temp = import_tables(file, pages=v['pages'])
        # process the data
        for t in temp:
            data.extend(extract_data(t))
        # some of the data might not import correctly - for now we're just
        # going to ignore these rows (which may be longer or shorter)
        length = [len(l) for l in data]
        filtered_data = [
            d for d in data
            if len(d) == max(set(length), key=length.count)
        ]
        # just in case some rows with NaNs imported, we'll mask them off and
        # filter them out of the data
        mask = np.all(np.isnan(filtered_data), axis=1)
        catalogue[k] = np.vstack(np.array(filtered_data)[~mask])

    return catalogue

We’re now ready to run it. This is done adding the following lines at the bottom of your script. I commonly add a simple print statement more to help me with debugging as it gives me somewhere to place a breakpoint in my editor.

Running the code

if __name__ == '__main__':
    catalogue = main()

print("Done")

Congratulations, we’ve just imported the data from the Tenaris Premium Connections catalogue for the Wedge 521 connection.

Results

If you want to QAQC the data you can do so with the help of pandas, creating a DataFrame with the following line of code:

df = pd.DataFrame(catalogue['Wedge 521'], columns=template['Wedge 521']['headers'])

Below is the first five rows of our freshly imported data, created with the help of tabulate, a dependency that assists pandas to nicely format the table in markdown:

print(df.head().to_markdown())

	size	nominal_weight	pipe_body_wall_thickness	pipe_body_inside_diameter	pipe_body_drift	box_outside_diameter	connection_inside_diameter	make_up_loss	critical_section_area	tensile_efficiency	compression_efficiency	joint_yield_55	joint_yield_80	joint_yield_90	joint_yield_95	joint_yield_110	joint_yield_125
0	4	9.5	0.226	3.548	3.423	4.103	3.457	3.62	1.646	61.4	83.6	91	132	148	156	181	206
1	4	11	0.262	3.476	3.351	4.162	3.401	3.62	2.029	66	85.9	112	162	183	193	223	254
2	4	11.6	0.286	3.428	3.303	4.2	3.353	3.62	2.284	68.4	86.9	126	183	205	217	251	285
3	4.5	9.5	0.205	4.09	3.965	4.629	4.04	2.74	1.678	60.7	76.8	92	134	151	160	185	210
4	4.5	10.5	0.224	4.052	3.927	4.651	3.976	3.62	1.821	60.5	82.6	100	146	164	173	200	228

Now that we have some data, in the next post we’ll use it to determine how many permutations of casing design exist!

Feel free to download the code.