My intention for this post is simple: if you use a no-code tool like Bubble.io to build your site, make sure you’re not unintentionally exposing your users’ emails to the client. That was seemingly the case for First Round Capital’s Angel directory, a site where angel investors can sign up and showcase their profiles to founders looking for an angel investment. When I reached out to First Round Capital with my discovery, they informed me that what I found was in fact intentional. I find that odd: if they did intend to share their angel investors’ emails, I would expect them to do so explicitly on each user’s profile rather than make you go digging for them. In any event, here is how I wrote a script to get all of the personal emails of investors from the First Round Angel directory.

First Round Angel Directory

When I first stumbled onto the First Round Angel directory, I thought it was a somewhat useless site, considering it doesn’t give you any way to connect with the investors. It’s basically just a list of names. So I did what most developers would do and opened up my browser developer tools to see what I could find. To my amusement, the personal emails the investors used when signing up to the directory were being exposed in one of the network calls that Bubble.io uses to fetch the results of a search query.

image

Getting the Data with Python

Once I learned which POST request the site was making, I copied the request payload for each page. The payloads were all different because the results were paginated. I stored all of them in a JSON file and loaded them programmatically in my script. Before you cringe at my Python, just know that this was my first time building anything with it and I figured it was a good project to learn with.

import json

import requests
from xlsxwriter import Workbook

url = "https://angels.firstround.com/elasticsearch/msearch"
cookies = { ... }
headers = { ... }

def aggregate_contacts(data, cookies, headers, seen=None):
    # Avoid a mutable default argument: a shared dict would persist across calls.
    if seen is None:
        seen = {}
    response = requests.post(url, data=data, cookies=cookies, headers=headers)
    responses = response.json()['responses']
    contacts = []
    for result in responses:
        entries = result['hits']['hits']

        for val in entries:
            source = val['_source']
            email1 = source.get('profile_2___email_text')
            email2 = source.get('authentication')
            full_name = source.get('first_last_text')
            img = source.get('picture_image')
            # Missing text fields default to an empty string.
            company = source.get('profile___company_text') or ''
            title = source.get('job_title_text') or ''
            bio = source.get('profile___bio_text') or ''
            linkedin = source.get('profile_2___linkedin_text') or ''

            # Prefix scheme-less URLs (e.g. '//host/path') so the links work.
            if img is not None and not img.startswith('http'):
                img = 'https:' + img
            if linkedin and not linkedin.startswith('http'):
                linkedin = 'https://' + linkedin

            # Prefer the nested authentication email, else the profile email.
            email = email1
            if email2:
                email = (email2.get('email') or {}).get('email') or email1

            # Skip entries without any email; keep the first entry per name.
            if email:
                if full_name not in seen:
                    contacts.append({'name': full_name, 'email': email,
                                     'img': img, 'company': company, 'title': title,
                                     'bio': bio, 'linkedin': linkedin})

                seen[full_name] = email
    return contacts
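
To try the extraction logic without hitting the live endpoint, it can help to run it against a hand-made response. The sketch below is only an illustration: the field names are the ones the script reads, but the sample values, `extract_email`, and `emails_by_name` are hypothetical names and data made up for this example.

```python
# Hypothetical sample shaped like the msearch response the script reads.
# The field names come from the script; every value here is invented.
sample_response = {
    "responses": [
        {
            "hits": {
                "hits": [
                    {"_source": {
                        "first_last_text": "Ada Example",
                        "profile_2___email_text": "ada@example.com",
                    }},
                    {"_source": {
                        "first_last_text": "Bob Example",
                        "authentication": {"email": {"email": "bob@example.com"}},
                    }},
                    {"_source": {"first_last_text": "No Email"}},
                ]
            }
        }
    ]
}


def extract_email(source):
    """Return the nested authentication email if present, else the profile email."""
    auth = source.get("authentication") or {}
    auth_email = (auth.get("email") or {}).get("email")
    return auth_email or source.get("profile_2___email_text")


def emails_by_name(response_json):
    """Map each investor name to an email, skipping entries that have none."""
    found = {}
    for result in response_json["responses"]:
        for hit in result["hits"]["hits"]:
            source = hit["_source"]
            email = extract_email(source)
            if email:
                found[source.get("first_last_text")] = email
    return found


print(emails_by_name(sample_response))
```

Entries with neither field set are dropped, which is the case that is easiest to get wrong when the two email sources are checked in a single condition.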

Outputting the Data to Excel

I then used the xlsxwriter package to output the results neatly into an Excel spreadsheet.

def create_xlsx_file(file_path: str, headers: dict, items: list):
    with Workbook(file_path) as workbook:
        worksheet = workbook.add_worksheet()
        worksheet.write_row(row=0, col=0, data=headers.values())
        header_keys = list(headers.keys())
        for index, item in enumerate(items):
            row = map(lambda field_id: item.get(field_id, ''), header_keys)
            worksheet.write_row(row=index + 1, col=0, data=row)

xl_headers = {
    'name': 'Full Name',
    'email': 'Email',
    'img': 'Picture',
    'company': 'Company',
    'title': 'Title',
    'bio': 'Bio',
    'linkedin': 'LinkedIn'
}

with open('payloads.json') as json_file:
    result = []
    seen = {}
    pages = json.load(json_file)
    for page in pages:
        data = pages[page]
        items = aggregate_contacts(data, cookies, headers, seen)
        result.extend(items)

    print(result)
    create_xlsx_file("first_round_angels.xlsx", xl_headers, result)
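
Because the loop reads `pages[page]`, the script expects payloads.json to be a JSON object that maps a page key to that page's request payload. The sketch below shows one possible shape. The key names and the placeholder payload strings are assumptions made up for illustration; the real payloads are whatever the browser sent for each page.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical payloads: stand-ins for the bodies copied from the browser.
example_payloads = {
    "page_1": '{"placeholder": "payload copied for page 1"}',
    "page_2": '{"placeholder": "payload copied for page 2"}',
}

with tempfile.TemporaryDirectory() as tmp_dir:
    payload_path = Path(tmp_dir) / "payloads.json"
    payload_path.write_text(json.dumps(example_payloads, indent=2))

    with open(payload_path) as json_file:
        pages = json.load(json_file)

    # Same access pattern as the script: the key names the page,
    # the value is the request body to POST for that page.
    loaded = [(page, pages[page]) for page in pages]

print(loaded)
```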

Results

The final result was a formatted file of all the angel investors and their publicly available information. I haven’t sent emails to any of them…yet.

image

Conclusion

If you use a no-code site builder, just make sure you don’t accidentally expose sensitive information on the client. These tools are great for getting something up and running quickly, but an inversion of control means there are decisions made for you that you may not be fully aware of.
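
If you want to check your own site, one rough approach is to save a response from the browser's network tab and scan it for anything that looks like an email. The sketch below is a minimal version of that idea, not a complete audit: `find_emails`, the regular expression, and the sample data are all hypothetical, and the pattern will miss unusual addresses.

```python
import re

# Deliberately simple pattern; it is an approximation, not a full email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(node, path="$"):
    """Yield (path, email) for every email-like string in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_emails(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_emails(value, f"{path}[{index}]")
    elif isinstance(node, str):
        for match in EMAIL_RE.findall(node):
            yield path, match


# Invented sample standing in for a response saved from the network tab.
sample = {
    "hits": [
        {"_source": {"name": "Ada Example", "contact": "ada@example.com"}},
        {"_source": {"name": "Bob Example", "bio": "No contact details here"}},
    ]
}

for location, email in find_emails(sample):
    print(location, email)
```

The path in each result tells you which field carries the address, which is the part you would then restrict in your builder's privacy settings.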