
Test

A couple of housekeeping things: I’ve been pretty busy lately!

I gave a keynote at RStudio’s 2021 Conference. You can find all the slides here and it’ll be up on RStudio shortly! https://twitter.com/vboykis/status/1352630279387676673
I gave an interview for Coralie’s Parenting in Tech project: https://twitter.com/cco_app/status/1352184845104009217

On to the newsletter!

Portrait of a Young Man, Antonello da Messina, 1478

It’s a cloudy winter day, and you’re looking out from your loft in TriBeCa, finger languidly hovering over the “ORDER” button on Postmates. You’re between two Zoom calls, one that’s the final work call for the day, and one that’s a birthday party for your aunt in Westchester. You are tired to death of Zoom calls and of 2020, but you’re even more tired of the grind.

You wish you’d taken the ride on the options train at that startup you worked at just out of college, back in the day. You’d be a millionaire by now. You could buy that house up in Saugerties, fix it up. You could have a vegetable garden. You wouldn’t need to do Postmates every day, and you damn sure wouldn’t need to write code. Maybe you’d take up woodworking.

You idly google your ex-co-founders. What are they up to now? Looks like one had to testify in front of Congress for mishandling customer data in that data breach last year. One got in hot water last week because he suggested on a supposedly off-the-record Clubhouse chat that he doesn’t care about ethics in AI. The other one doesn’t leave much of a trace online (he was your ops guy), but from what you do see, it looks like he’s been caught up in some Eastern European anonymous user content farm manipulation.

Is it even worth it? Not for less than $50 mil all on the table, you decide. If you could go back in time, now, knowing what you know, what would you do?

All of a sudden, the Zoom conference call’s sound warps around you, the room spins, and you’re travelling through a time wormhole. You come out of the wormhole, your head spinning. The odors of the worn-down ranch-style house in Palo Alto assault you. You are, somehow, back in 2011. You walk, incredulous, through the house, the same three bedrooms, gruesomely partitioned into halves with makeshift mattresses on the floor. The same Matrix posters, the same soft California evening light reflecting off the neglected pool in the back yard.

You walk into the kitchen, which was also the conference room, and find yourself sitting down to the same MDF Ikea table, littered with half-full sodas and Hot Pocket wrappers. Awolnation blasts from an unseen speaker on an unseen laptop you are sure is open to the Rails codebase. The three boys at the table look up at you.

“It’s time to build,” says Chad, taking a bite of the pizza stuck to the side of the table.

You orient yourself, trying to remember. “What are we building?” you ask.

They all look at you like you have two heads. “Bassbook,” Paul says. “We’re going to build a social network that connects people based on the music they listen to. You know, the thing we’ve been building for the past five months.”

You remember this. This was all you lived and breathed for a year. Bassbook. Free music. Free knowledge. And maybe, hookups. At least, that was the dream.

But now, you have actually travelled back in time from 2021 and, from your vantage point, a social network looks like a Bad Idea. You are armed with the gift of Knowledge. You know about Cambridge Analytica, about the Pinterest lawsuit, about Project Dragonfly, about Google and the Nothing, the $1 billion Rubik’s Cube, the Experian Data Breach, the Google AI controversy, the WhatsApp thing, the other WhatsApp thing, It’s Time to Build, and the 10k people in San Francisco.

You know where social media will lead, and you know about the dangers of storing user data and manipulating user data. You know about the controversies, the fines, and, now, the looming government regulatory oversight programs.

You don’t want any of that smoke whatsoever.

“It’s gonna be off the hook. We just spin up this Rails app and watch the users come in,” says Paul.

“You see the numbers Facebook is doing? We can do better,” says Preston, hyping him up.

Uneasily, you tug on your Vans cap. “I don’t think that’s such a great idea.”

Chad, Paul, and Preston turn to look at you, really seeing you for the first time. “What’s wrong, bro?” Paul reaches under the table for a stale, open can of Red Bull and mixes it with his Gatorade, taking a minute to sniff it the way a sommelier might before gulping it down.

“I just don’t think it’s going to go to a good place,” you say hesitantly, not sure where even to begin. Should you tell them about humans being Web Scale?

“Are you afraid of taking money?” Preston says. “Because my dad can easily float us $50 grand as a starting point.” He picks a stale Dorito off the table and throws it in his mouth. “I also know a great lawyer.”

“No,” you say, taking a swig of the Red Bull that Paul passes you. It tastes like death. “It’s not about the money. I just don’t know about the societal impact of all this stuff.”

They look at you like you’ve now grown a third head. “Societal impact? What the hell are you on?”

“I mean, we should probably think about some of this stuff before we get going. How are we going to track the data? What moderation guidelines are we going to put in place? How about the ethics of all of it?”

Chad, Paul, and Preston give each other a glance that you can’t read. The moment turns into a lifetime. Chad turns to you. “If you want out, just say so. We’re on a rocket ship. We don’t have time for this.” This meaning you.

They frog march you out of the kitchen, out past the TV forever looping Breaking Bad reruns, and then, suddenly, open the screen door and throw you, clothes on, into the pool. They go back to the whiteboard laughing. “Let us know when you’re ready to commit, bro,” they say, shutting the sliding door against the late-night insects.

As soon as you hit the water, you’re jerked forward through a tunnel, back to the present day, and you’re back in front of your Zoom meeting once again. Everyone is silent, watching you in the tiny hell squares. “Sorry,” you murmur. “I was on mute. Where were we?”

I was so stupid, you think. I could have phrased it differently, I could have stopped them. If I get another chance, I’ll go back and stop the creation of WhatsBook.

As you’re thinking about this, your phone rings, bringing you back to the present.

“Hey man, is it you?” It’s Preston. He’s a millionaire from WhatsBook. “I’m starting this new thing. I thought you might be interested.”

“What is it?” you say warily.

“It’s totally above board. It’s this AI thing. Only not like that,” he says, immediately hearing the hesitation on your side of the line. “There’s no facial recognition or anything. This is all above board.”

“No more music,” you laugh.

“Nah man, too much studio red tape. This is much easier. What we’re doing is creating a totally new holistic online experience. Everyone and everything is strictly online now. So we have a club that we’re running on Zoom. It’s very exclusive, and you can only get in if you pay. You pay for everything in the club with Bitcoin, like you can pay to listen to music on Spotify, and if you pay more than anyone else, your song plays. You can pay for private rooms, for drinks to be delivered to your apartment in real life, for everything. And everything is anonymous. That’s the crazy part about it. You could be partying with millionaires. We call it Studio 256.”

You don’t know much about Bitcoin, but you probably know enough. “This sounds like fraud,” you say, hesitantly.

“No way. It’s just a way for people to meet each other during the pandemic. And it’s a cool way for society to level people, since you usually only interact with people in the same social class as you. Here, if you earn enough Bitcoin, you can participate. And you can earn Bitcoin in the club, too, by playing popular songs, or by dancing, or by doing stuff that other people give you Bitcoin for. It’s all totally fun, all totally above board.”

He says above board too many times for your liking.

“And the data?”

“We don’t keep it ourselves. We have no idea who you are. That’s the beauty of it, man! No need to deal with user data!”

“I’m in, 100%,” you say, the relief palpable in your voice. You finally get the chance to contribute to something of value, something good, something pure. Just helping people get through the pandemic.

Thirteen months later, you and Preston are on the front page of Hacker News for Studio 256 housing an alleged harasser who won’t stop playing “Never Gonna Give You Up” in the club at an obnoxiously high volume, over and over again, creating a hostile environment, someone who you can never get off the platform since all the users are anonymous.

Studio 256 immediately empties and you are left $100k short.

You sigh and start practicing your LeetCode so you can get back to the safety of corporate Zooms and zero liability.

****

What would you do if you had a second chance to recreate the internet? That’s the question I find myself asking all the time. Let’s say I had a ground floor shot at being at Facebook. What would I have done differently? It’s obvious that I wouldn’t make it about advertising.

But if there were no advertising, there would be no money for Facebook, which means it would never have survived as a company. Which means all the good it’s done, like connecting people with rare diseases to share information, forming parenting support groups, and connecting small businesses with the community, would be gone, too. It wouldn’t have connected me with some of my best friends, or allowed me to find after-school events for my daughter. (Back in the days when there were events.) Facebook is more than a misinformation machine; it’s also a marketplace, a place to share pictures. I can tell how vaccination is going in my area by checking Facebook.

We live, as always, in the age of the great cross-hatch. Social networks are largely this big, horrible thing, but they also have very real positives.

How can you tell, when you’re starting a company, which of these things the company will be? Will your notetaking company, Evernote, be a place to store all of humanity’s knowledge, or will it be an awful keeper of secrets great and small, one slip away from accidentally exposing everyone’s notes? Will your work communication platform, Slack, be a positive boost to productivity, allowing people to work remotely, or will it be a huge octopus, its tentacles reaching through all your devices to make you work off the clock?

We don’t know until we file the articles of incorporation and set forth.

Sure, we can guess. We can set up guardrails. None of this is an excuse to skip thinking these things through as we work, or to throw up our hands and say, “It’s hopeless, so we shouldn’t even try.” But there is a compound tragedy in these wonderful, terrible last fifteen years in technology. The first tragedy is this: we humans are terrible at predicting things. (Don’t tell the data scientists.)

We can asymptotically approach reality, to an extent. But all models are wrong, as George Box said. We don’t even fully understand numbers. And we certainly don’t know what the future will hold. What’s worse, we are terrible at understanding how systems will interact. If we decide to track users, what will happen and what won’t? What if we decide to track only some users? We are, in a way, always trying to understand the beginning and end of this meme:

How do you get from Zuck trying to start a social network to pick up co-eds to the fall of democracy? And what happens five years from now? We don’t have a big enough neural net to model all the possibilities.

And the second tragedy is that all companies must fight the market to survive. Companies start as babies, when the odds are entirely stacked against them, and, at the beginning, they don’t have the luxury of making decisions based on ethics or guidelines or frameworks. They are hungry and blind, and they must go after the money.

If you don’t believe me, read Shoe Dog or The Making of Prince of Persia or The Great Beanie Baby Bubble or Skunk Works or Super Pumped or any one of the foundational books about how companies grow and thrive. Some companies take the very hard path of pursuing ethics first, but at the cost of growth later. Almost no one, especially anyone working on money borrowed against them, can afford to do this.

This is not an excuse for big companies acting like marauders. It’s just an observation that most companies don’t start the way we want them to, with balanced deliberation. They start in a desperate panic to survive, with a bunch of compromises that they forget, with unarchitected data, with “we’ll fix this later,” because if they laid all of this out at the beginning, they’d never get off the ground.

What do we do about this, these two fundamental laws of physics and capital? Knowing what we know now, how can we fix the internet going forward?

I’m not a prophet, just a rando with a newsletter.

But it’s 2021, you’re sitting at your desk, Preston is on the other end of the line. “It’s time to build,” he says again.

What will we say?

What I’m reading lately:

No meetings. This is very similar to how Automattic works.
Interview with Max Levchin. (Previously in Normcore on him)
From Russia with free Shipping
Artificial intelligence is a house divided (AI previously in Normcore)
The high price of mistrust
All about Wattpad
Ghost knowledge
Congress basically says YouTube needs to shut down its recsys (YouTube recsys previously in Normcore)
ML tools 2021
Michael Scott
The Overstory (it’s good! There are trees!)

January 27, 2021

This is a sample post


I am excited to say I finally published "models for integrating data science teams within organizations" https://t.co/3dwVDvzWph
— Pardis Noorzad 🖨 (@djpardis) July 31, 2019

October 11, 2020


In one of my last newsletters, I linked to a Reddit thread, “How does data science work in the consulting space?” and said that if there was enough interest, I’d cover some aspects of data science consulting in the newsletter from time to time. This is the first of those pieces.

Last week, Pardis published a great piece on how data science teams can be organized in various companies:

Pardis Noorzad @djpardis

I am excited to say I finally published "models for integrating data science teams within organizations"

July 31st 2019


She writes,

Designing and building a data science team is a complex problem; so is determining the nature of interactions between data scientists and the rest of the organization.

and covers several ways that data science work can take place in an organization. She ultimately comes to the conclusion that the best model is a hybrid model, where data scientists are part of product teams and report into a centralized data science function.

This debate around what a data scientist is and where data scientists belong has been going on ever since the job title was defined. It’s hard enough to figure out where and how data science belongs in-house.

But how about consulting? How do data scientists work in a consulting capacity? To understand that, it makes sense to first understand what consulting is.

Some years ago, I put together an insanely awesome data analysis. If I told you the SQL gymnastics I had to go through to get the data (window functions, cursors, multiple databases on different servers), you would give me a standing ovation.

I then made the most beautiful PowerPoint you’ve ever seen in your life. Bubble charts, sparklines, a master slide deck theme, the works.

I still think about that presentation some days.

I handed it to my boss gingerly, like a newborn baby. “I made this per your request,” I said with bated breath. He skimmed it. “This looks great! Want to present it at our exec meeting?”

Me? Present? To executives? The people that wear ties and Bluetooth earpieces and say things like, “Sharon, hold my 2 o’clock, I have a meeting at 1:30 and I’m coming in hot,” unironically? He might as well have asked me to present to the UN Security Council.

“I’ll help you tweak it a bit,” my manager said encouragingly. “Ok,” I said. “First,” he said, “take out all the slides where you talk about how you did the analysis. Put those in the back. Move the charts forward. Delete these numbers. Make the headlines bigger.”

I gulped. But that was all my work. What were we doing to it? We were rewriting everything! I wouldn’t even get a chance to talk about my confidence intervals! My churn visualizations! My beautiful window queries!

When I got to the executive meeting, I was a nervous wreck. All of the Bluetooth earpieces turned in my direction. “Ok, so I’m going to run through this analysis,” I said. “This is the first slide,” and I proceeded to describe everything on the slide, word for word. I saw their attention waning. I went through the second set of figures, laid out the percentages, and I started seeing people looking at their phones.

By the end of the presentation, I could tell that I’d lost them.

“Don’t get discouraged,” my boss said. But I understood something intuitively then that I wasn’t able to put into words until much later:

Half of your job, regardless of what that job is, is being able to sell your work.

This principle also forms the basis of consulting. Consulting, at its core, is about two things:

1) Being an expert at something and

2) Convincing other people that you know something well enough that they need to pay you for your expertise, in either money or time.

Last week, I tweeted:

Vicki Boykis @vboykis

Ever since I saw the low-key classy way Joel did this (calendar invite + hourly rate), I’ve been sending people who want to "pick my brain for a bit" an invite to my Calendly, also with an hourly consulting fee. So far no one’s taken me up on it. Your time is worth $. 💪

Joel 🌧 @jhooks

I set up a calendly link for an hour consultation. Didn’t expect anybody to actually use it tbh. Somebody did! Looking forward digging i to their business and offering my thoughts. https://t.co/96rysZqJDL

July 29th 2019


which was an extension of this principle.

It’s something I didn’t understand at all until I joined a consulting company, but the ability to convince someone of something is probably the strongest superpower you can ever have in your career.

In an in-house data science position like the ones Pardis discusses, the ability to convince people that what you do is valuable is mostly separated from the actual money of the business, unless you work very hard at it.

For example, let’s say you create a model that predicts when your customers are unhappy with your SaaS application, so that customer service can call them and offer them coupons. They aren’t so unhappy that they’ve cancelled yet, but they might. How do you calculate how much money you’ve saved the company? How does that figure into a company’s bottom line for the year?
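One common back-of-envelope answer, sketched here under loud assumptions: hold out a random group of at-risk customers who don’t get the call, compare their retention with the group that did, and put a dollar value on the difference minus what the coupons cost. The function name, the randomized-holdout design, and every number below are hypothetical illustrations, not figures from any real company.

```python
def estimated_churn_savings(
    treated_customers: int,
    treated_retention: float,
    holdout_retention: float,
    annual_revenue_per_customer: float,
    coupon_cost_per_customer: float,
) -> float:
    """Rough dollars saved by a call-and-coupon intervention.

    Assumes (hypothetically) a randomized holdout group of at-risk customers
    who were NOT contacted, so the retention difference can be credited to
    the intervention.
    """
    # Extra customers kept that we attribute to the intervention
    extra_retained = treated_customers * (treated_retention - holdout_retention)
    # Revenue those customers bring in over a year
    revenue_saved = extra_retained * annual_revenue_per_customer
    # Every contacted customer got a coupon, whether or not they would have stayed
    coupon_cost = treated_customers * coupon_cost_per_customer
    return revenue_saved - coupon_cost


# Hypothetical inputs, for illustration only
print(round(estimated_churn_savings(1_000, 0.80, 0.70, 1_200.0, 50.0), 2))
```

Even this simple version hides hard questions, such as how long a "retained" customer actually stays and whether the holdout group is truly comparable, which is part of why the number is so hard to defend in a budget meeting.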

Or, let’s say you build a machine learning model that predicts code autocompletion and saves developers two seconds per command when they’re writing their code. How do you calculate that time savings in a way that makes sense for managers?
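A naive conversion shows how quickly two seconds turns into a big-sounding number. This is only a sketch: the headcount, completions per developer per day, 230 working days a year, and $100 fully loaded hourly cost are all hypothetical assumptions, and the function names are made up for illustration.

```python
def hours_saved_per_year(
    developers: int,
    completions_per_dev_per_day: int,
    seconds_saved_per_completion: float = 2.0,
    working_days_per_year: int = 230,  # hypothetical assumption
) -> float:
    """Total developer-hours saved per year, assuming every saved second counts."""
    total_seconds = (
        developers
        * completions_per_dev_per_day
        * seconds_saved_per_completion
        * working_days_per_year
    )
    return total_seconds / 3600


def dollar_value(hours: float, loaded_hourly_cost: float) -> float:
    """Convert hours to dollars at a (hypothetical) fully loaded hourly cost."""
    return hours * loaded_hourly_cost


hours = hours_saved_per_year(developers=200, completions_per_dev_per_day=300)
print(f"{hours:,.0f} developer-hours/year, ~${dollar_value(hours, 100.0):,.0f}")
```

The catch is the assumption buried in the multiplication: two saved seconds only become money if developers actually spend them on other productive work, which is exactly the part a manager will push back on.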

Tracing money within a company can be really hard, which is why, usually, data science in companies is rewarded instead with attention. A good data science (or engineering, really) team will get more:

Seats in important meetings
Time to present specific analyses
Headcount
Shout-outs in emails from higher-ups
More complex analyses entrusted to you
Promotions and bonuses
Budget for software and hardware

If you do a good job as an in-house data scientist, and convince people that you know what you’re doing, that usually manifests itself in additional attention for you.

How do you know you’re a bad data scientist? You never get to present to executives. Your analyses go unused. People don’t send you emails with questions about your numbers. A good in-house data scientist does a good job by putting together good data, and, just as importantly, commanding attention to that data.

The same goes for consulting, but with an extra step. As an external party, you’re not shielded by the largess of the company’s budget. A company can’t survive on attention and emails alone – it needs revenue.

As a consultant, you have to constantly be selling actual work – otherwise your consultancy dies. An army marches on its stomach. Any given consultancy, whether it’s one person as an LLC, or a company with thousands of employees, marches on sales. If you’re not selling contracts, your company won’t exist, because you don’t have work.

One of the reasons I decided to go into consulting was that I wanted to see how the money moved, because in a company, it’s not as visible.

In consulting, you can very clearly see how budget gets allocated to projects, moves through the company, and how decisions are made based on that money.

And the way money moves from companies to consultancies is consultants convincing companies that they have the expertise to fix problems: that they know something the company doesn’t, something it’s willing to pay for.

But what happens once you actually sell some work? I’ll cover that in a future newsletter.

Art: Portrait of a Businessman K. Artsybushev, Mikhail Vrubel 1896

What I’m reading lately:

Books I’ve finished recently:
- I Heart Logs by Jay Kreps – Does this count as a book? It was literally 30 pages.
- Dad is Fat by Jim Gaffigan – Awesome maternity leave read. Hilarious
On how to tackle loneliness
From 2012, how writing a novel is like having a child
Alex Stamos, formerly of Facebook, is on the move:

Alex Stamos @alexstamos

Our team at Stanford is looking for data engineers and analysts to power research into disinformation, harassment, bullying, self-harm and other misuses of the internet. Give us a couple of years and I guarantee you will be proud of your impact.

July 31st 2019


About the Author and Newsletter

I’m a data scientist in Philadelphia. This newsletter is about tech and everything around tech. Most of my free time is spent wrangling a preschooler and a newborn, reading, and writing bad tweets. I also have longer opinions on things. Find out more here or follow me on Twitter.

If you like this newsletter, forward it to friends!

October 11, 2020

Art: A Walk at Sunset, Victor Borisov-Musatov, 1903

For the past three months, I’ve been doing a lot of walking. Usually, the two kids and I take a morning walk while my husband works, and then we go for another walk in the afternoon all together, if it’s not too hot.

We’ve been through our neighborhood countless times, and, when we got tired of that, we went to local parks that weren’t crowded. We’ve walked on gravel pathways, through streams, and alongside gullies filled with ferns. We’ve walked in the heat, in the early morning, the late afternoon, and, once, through a summer rainstorm.

During these walks, I’ve told my daughter hundreds of stories. Some of them are real stories – Harry Potter, Lord of the Rings, Matilda. Some are made-up stories about a girl called Ilana who doesn’t want to do anything and who usually learns a lesson.

We must have walked over 50 miles over these past 3 months.

But you wouldn’t notice this from my pedometer app, because my pedometer app is on my iPhone, and the iPhone’s gyroscope, accelerometer, and GPS don’t track steps when you put it in a stroller. This isn’t the only device that’s had problems tracking stroller steps: so have the Fitbit and the Apple Watch. The main problem here is the same one I’ve written about before:

It’s typically women who have always been the world’s invisible architects, supporting the infrastructure to care for small humans so the rest of the world’s work can get done, standing in the background, doing the day-to-day work of feeding them, washing their clothes, and wiping the last of the organic avocado-pear off their chins while the men built skyscrapers and got all the praise. And it’s always women who are expected to step back by society.
…
The infrastructure is there, but it’s always invisible to the outside world.

And so it is with my steps: the hours I spend walking with the children, talking to them, guiding them to play in the creek, packing their creek shoes and then unpacking them again, preparing lunches, and doing the gruntwork of parenting are all glaringly absent from anything that tracks and analyzes activity.

Vicki Boykis @vboykis

Terraform, but for getting two kids ready for a walk by and in the creek.

I couldn’t track all of this activity on my phone, but I did notice something interesting outside of my screen during our neighborhood walks. The first month, we walked in empty streets. We would, on occasion, come upon either another family, also dragging two kids, a scooter, and a kite, or a lone person walking their dog. The streets of our town were empty. The shops were shuttered up, people would walk by, skittish, keeping their distance, passing us like ships in the night.

Over the past few weeks, since Pennsylvania went from “red” to “yellow” (essential businesses open, restaurants for outdoor dining), people have started emerging. At first, it was lone pedestrians in face masks. Then, families. Now, it looks like groups of friends. The restaurants are wide open. In my daily walk the past couple days, I’ve seen several social distancing picnics in backyards and on front lawns, and neighborhood kids running together. Our local brunch place’s outdoor tables are packed.

We are, whether we like it or not, returning to “normal” life.

And our small family, our insular pear, is emerging, too.

Our daycare opened last week, and since my daughter is already exhibiting the signs of stress that most kids under lockdown are, we’ve, with great hesitation, reluctance, fear, and not a small amount of relief, decided to send her.

Once our daughter is in daycare, our nanny comes back to be with the baby, and our circle, which for the past three months was just the four of us, becomes infinitely larger, infinitely scarier, and almost impossible to control.

And another thing: I’m starting a new job tomorrow (more on that later!) which makes all of this even scarier as I try to figure out the rhythm of my work – remote, distributed, and asynchronous, but still – with the new rhythm of life under COVID.

For example, what happens if our daughter has the sniffles? Do we keep her at home for a week? More? What happens if someone in our nanny’s family is sick? Does ordering groceries online matter anymore in this case? Can we still see people outside?

I’m still the CEO, but I don’t know anything, except that, like my pedometer, all the metrics of the normal world have failed me under Covid. I’m flying a little blind, but the unknown is here, so into it I go.

What I’m reading lately:

What happens to all the unhugged hugs?
What’s a less expensive purchase that’s made your life better?
Karen K. Ho @karenkho
what is an item under $100 you purchased that made your life noticeably better? current contenders include a white noise machine, acupressure mat, and a neck and shoulder massager thingy
June 14th 2020
That feeling when there’s a lot of ML infra tools
The cell phone is probably the worst thing that’s happened to us, but also the best
Joanna Stern @JoannaStern
2009 – The killing of Oscar Grant is recorded in 240p with a flip phone. 2020 – The killing of George Floyd is recorded in 1080p with an iPhone 11. My story on how the smartphone became ‘a weapon that tells the story.’
June 13th 2020
Private equity in medicine
Isaac Arnsdorf @iarnsdorf
When Blackstone-owned TeamHealth charges higher prices for ER care, who benefits: doctors or investors? Little-noticed court records offer a rare glimpse inside the country’s largest ER staffing firm.
June 12th 2020
IT’S TIME TO READ THIS TWEET
Sriram Krishnan @sriramk
🚨👋 Thrilled to announce a passion project I’ve been working on for a while. Here’s “The Observer Effect” with our very first interview w/ the one and only Marc Andreessen. theobservereffect.org/marc.html
June 13th 2020

The Newsletter:

This newsletter’s M.O. is takes on tech news that are rooted in humanism, nuance, context, rationality, and a little fun. It goes out once a week to free subscribers, and once more to paid subscribers. If you like it, forward it to friends and tell them to subscribe!

Swag: Stickers. Mug. Notepad.

The Author:
I’m a data scientist. Most of my free time is spent wrangling a preschooler and a baby, reading, and writing bad tweets. Find out more here or follow me on Twitter.

October 11, 2020

Art: Chaos nr. 2, Hilma af Klint, 1906

Over the weekend, I was catching up on the piece in New York Magazine by model and actress Emily Ratajkowski, about how much it cost for her to buy her own image back. It’s a very good, very long piece with a lot of room for reflection, but what struck me most is that I would have never assumed she was going through something like this.

In the headlines and her photos, she is always glamorous and in-control. In the piece, she is frustrated, angry, and, mostly, defeated. She writes,

I thought about something that had happened a couple of years prior, when I was 22. I’d been lying next to a pool under the white Los Angeles sun when a friend sent me a link to a website called 4chan. Private photos of me — along with those of hundreds of other women hacked in an iCloud phishing scam — were expected to leak onto the internet. A post on 4chan had compiled a list of actresses and models whose nudes would be published, and my name was on it. The pool’s surface sparkled in the sunlight, nearly blinding me as I squinted to scroll through the list of ten, 20, 50 women’s names until I landed on mine. There it was, in plain text, the way I’d seen it listed before on class roll calls: so simple, like it meant nothing.

It’s so surreal to think of someone as removed from mere mortals as Emily doing something as ordinary as checking her iPhone and panicking. After reading this piece, for me, she has turned from a glossy magazine cover into a living, breathing human being.

Abstract You, Abstract Me

What I had in my head was an abstraction of Emily as portrayed to me by the media. Humans have been creating abstractions for hundreds of thousands of years, because there is simply no way to hold everything in our brains at once. A single human can’t know all of the inner lives of every single celebrity, how a car works from the engine up, what makes the weather happen, how pandemics work, how Nutella is made from scratch, and how the tax code works, all at once. There’s a reason we have generalists and specialists, and that generalists only know a little bit of everything: as an individual, it’s impossible to go deep on the entire universe.

One of the most basic examples of this is the invention of writing. Writing is hard because we think in many dimensions: our minds connect different parts of different concepts, fragments of thoughts, feelings, colors, and smells, while writing is a one-dimensional medium in which we need to construct a reasoned argument or narrative.

Vicki Boykis @vboykis

Why is it always so hard to get thoughts down on paper exactly the way they are in your mind?

Here is another fantastic example of an abstraction, one that a lot of working moms perform all the time:

Gretchen Goldman, PhD @GretchenTG

Just so I’m being honest. #SciMomJourneys

This is the truest depiction of being a working parent and having small children that I have ever seen. In my house, the floor is always covered with toys, I never have time to do anything more than brush my teeth, and my professional life exists in bursts between doctors’ visits, covid quarantines, sleepless nights, and the endless, endless task of washing the dishes and doing the laundry, every single day. But thanks to the wonder of technology, when I’m chatting on Slack or submitting pull requests, the messiness of my life is abstracted into the background. The digital realm is a room of my own. My work comes into focus.

Abstractions also exist in software, where an abstraction is any piece of written code that hides the complexity of the underlying code so that you can use it more easily and not get tripped up in the details. A good software abstraction, like a map, doesn’t tell you every single thing about the land, but it gives you generally a good enough idea to get by.

Abstractions are neither bad nor good in and of themselves. They’re just ways we humans make sense of the immensely complex worlds around us. But you have to understand the tradeoffs of what you’re abstracting away.
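Here is a tiny illustration of both the convenience and the tradeoff, using Python’s built-in `sum` as the abstraction. It hides how each floating-point addition rounds, which is usually exactly what you want, until that hidden detail matters. `math.fsum` is the standard library’s more careful alternative. The list of ten-cent prices is made up purely for illustration.

```python
import math

# Ten hypothetical prices of 10 cents each; the intended total is 1.0.
prices = [0.1] * 10

# sum() is a convenient abstraction: you never think about how each addition rounds.
naive_total = sum(prices)

# math.fsum() tracks intermediate partial sums to avoid that accumulated rounding loss.
careful_total = math.fsum(prices)

print(naive_total)    # 0.9999999999999999
print(careful_total)  # 1.0
```

Neither function is wrong; they just abstract away different amounts of detail, and you only notice the difference if you know what is being hidden.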

It’s here that I think we’re in a pretty dangerous spot these days, because there are some abstractions we deal with without ever realizing that they’re abstractions at all. The online world often does this, not only with celebrities but with all of us: it flattens us into abstractions.

Flattening Facebook and Twitter

There are a couple of recent examples I can think of. The first is the Buzzfeed story about the data scientist at Facebook who wrote an in-depth memo about monitoring politically-influenced bot activity. She was a member of Facebook’s Site Integrity Team who had been fired, ostensibly after writing about all the issues she’d dealt with and how heavily it weighed on her. Instantly, the narrative about her became that she was a whistleblower, yet another one of the heroic voices raising concerns against Facebook. And then, after her memo was leaked, her name vanished into the ether.

The story got major clicks for Buzzfeed and credit to the writers. But I wanted to know so much more. What did she actually do for Facebook? The story said she was a data scientist, but that can mean many things across an organization as large as Facebook. The story said that she was in charge of deleting content, but then said that she only reported it. What was the true story? The story then said,

[she] said she turned down a $64,000 severance package from the company to avoid signing a nondisparagement agreement. Doing so allowed her to speak out internally, and she used that freedom to reckon with the power that she had to police political speech.

This also doesn’t make any sense, since severance packages are usually signed upon being fired or leaving the company and have nothing to do with speaking on internal message boards.

There were a lot of loose ends that just didn’t add up for me. I wanted to know more – about why her org structure didn’t care about the reports, exactly what kind of reports they were, what else she worked on, what the internal politics of her organization looked like and how those decisions got made, but the story marched on, very eager to get to the part of the memo where the data scientist said she felt like she “had blood on her hands,” ultimately painting her in a single, flat light, a spark of sensation, once and gone.

What she illuminated was, to me, a very hierarchical organization focused on PR perception at all costs. This is important, because it means that PR is absolutely a lever to get Facebook to make specific decisions. How can we use this information to better understand how to influence Facebook? The article doesn’t get into that.

I haven’t seen her name in the news for a few weeks now, but she’s out there, a real human person who now has to somehow get another job in the tech industry with her name (involuntarily) attached to this big, huge controversy, used only for clickbait, once and gone.

The second example is the very recent controversy over Twitter’s photo cropping algorithm. Over the weekend, Twitter blew up: a user posted an image of himself and his Black coworker to show how Zoom’s virtual background was erasing his coworker’s head, and Twitter’s preview crop cut the coworker out entirely, leading many to conclude that Twitter’s photo cropping algorithm was racist.

Someone else was able to replicate these results with an image juxtaposing Barack Obama and Mitch McConnell, and immediately the internet was off to the races trying to figure out whom to blame at Twitter. There were lots of very angry threads and very little context. People, frustrated and incensed by the results of the crops, tested out cropping on all kinds of posts to see what the algorithm would select. A lot of them were serious. Some were funny. None looked good.

But then, an interesting thing happened: the creators of the algorithm weighed in. They first provided some context by linking to the original post where they announced the algorithm, and then talked about how they did it.

Then, the developers and data scientists who worked on the algorithm, as well as Twitter’s chief design officer, responded on Twitter:

Dantley 🔥✊🏾💙 @dantley

@colinmadland @Twitter Based on some experiments I tried, I think @colinmadland‘s facial hair is affecting the model because of the contrast with his skin. I removed his facial hair and the Black man shows in the preview for me. Our team did test for racial bias before shipping the model.

Zehan Wang @ZehanWang

We’ll look into this. The algorithm does not do face detection at all (it actually replaced a previous algorithm which did). We conducted some bias studies before release back in 2017. At the time we found that there was no significant bias between ethnicities (or genders).

Zehan Wang @ZehanWang

This is actually similar to what we originally did to test it. Thanks to @vinayprabhu for very quickly doing a more systematic analysis and also sharing code and results!

Vinay Prabhu @vinayprabhu

(Results update) White-to-Black ratio: 40:52 (92 images) Code used: https://t.co/qkd9WpTxbK Final annotation: https://t.co/OviLl80Eye (I’ve created @cropping_bias to run the complete the experiment. Waiting for @Twitter to approve Dev credentials) https://t.co/qN0APvUY5f

And, one of the original researchers also responded:

Ferenc Huszár🇪🇺 @fhuszar

Over the weekend reports of racial/gender bias in Twitter’s AI-based image cropping have started blowing up. I wanted to add some context from my perspective as an ex-employee and as a contributor to the research the product is based on.

And then, finally, Twitter comms weighed in:

liz kelley @lizkelley

thanks to everyone who raised this. we tested for bias before shipping the model and didn’t find evidence of racial or gender bias in our testing, but it’s clear that we’ve got more analysis to do. we’ll open source our work so others can review and replicate.

Tony “Abolish (Pol)ICE” Arcieri 🦀 @bascule

Trying a horrible experiment… Which will the Twitter algorithm pick: Mitch McConnell or Barack Obama? https://t.co/bR1GRyCkia

If you had only been looking at a couple tweets, which was entirely possible because they dominated the conversation, it was easy to conclude that Twitter had implemented an algorithm that was ignorantly biased and which it had no intent to fix.

But if you were (somehow, miraculously) able to link all of the tweets together, what came out was this:

Twitter implemented an algorithm to do automatic cropping in 2018.
It replaced a previous algorithm that actually looked at faces but had not been successful
The new algorithm used saliency:
- A region having high saliency means that a person is likely to look at it when freely viewing the image. Academics have studied and measured saliency by using eye trackers, which record the pixels people fixated with their eyes.
Saliency is used across the industry, including at Apple and other companies, but it’s not necessarily a great way to go.
It was actually tested for bias, but unfortunately, it looks like a number of things slipped through the cracks, which the researchers acknowledged and said they would work to address.
Both the original researchers and the CDO of Twitter weighed in multiple times in the conversation, confirming what they did originally and what they would do now to re-examine the algorithm.
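To make the saliency idea from the list above a bit more concrete, here is a toy sketch of cropping around the most salient point. It rests on loud assumptions: the saliency map is a hand-made NumPy array standing in for a trained model’s predictions, and the rule (center a fixed-size window on the single highest-scoring pixel, then clamp it to the image borders) is a simplification for illustration, not Twitter’s code. The function name is hypothetical.

```python
import numpy as np


def crop_around_most_salient_point(saliency, crop_h, crop_w):
    """Hypothetical helper: return (top, left) for a crop_h x crop_w window
    centered on the highest-scoring pixel of a saliency map, shifted so the
    window stays inside the image."""
    img_h, img_w = saliency.shape
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("crop window must fit inside the image")
    # Row and column of the single most salient pixel.
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    # Center the window on that pixel, then clamp it to the image borders.
    top = int(np.clip(row - crop_h // 2, 0, img_h - crop_h))
    left = int(np.clip(col - crop_w // 2, 0, img_w - crop_w))
    return top, left


# Toy 6 x 10 "saliency map" with two regions of interest, one scored slightly higher.
saliency = np.zeros((6, 10))
saliency[2, 1] = 0.8  # region on the left
saliency[3, 8] = 0.9  # region on the right scores higher, so it wins the crop
top, left = crop_around_most_salient_point(saliency, crop_h=4, crop_w=4)
print(top, left)
```

Even this toy version shows why small score differences matter: whichever region scores marginally higher keeps its place in the preview, and everything outside the window simply disappears.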

It took about 48 hours for the cycle to go from outrage, to people manually trying out the algorithm, to people performing their own experiments, to full-on explanation, to the former researcher getting involved, to comms finally closing the loop, as much as it can be closed at this point.

In that time, the context was abstracted to a single idea – Twitter was maliciously ignorant. This was unsurprising to me, but disappointing: Once the whole story came out, it was obvious that in theory, they’d done everything right (at least, as far as the external conversation indicated. As with the Facebook story, it’s impossible to read into exactly what happened internally inside the black box) – they had vetted the algorithm, checked for racial bias, and, when the controversy arose, engaged consistently on Twitter with lots of very angry people who didn’t always come to the reply box with the best of intentions.

The Twitter employees turned from individuals into platforms – conduits for all the rage against all of these social media networks, against all the massively messed-up things they’ve been doing since their foundational days. They were abstracted away – both the responsible parties and the platform as a whole.

The larger story here, of course, is two-fold. First, in this case, the right thing to do is to test the algorithm even more rigorously and show some follow-up results in public.

Even better, as the CDO says in one of his tweets, would be to revert to manual cropping. But that points to the second, bigger part of the story: Twitter can’t say, and can’t promise, that it will switch to manual cropping, because manual cropping would probably cause a drop-off in the engagement it so craves, including the engagement brought about by this very controversy.

So ultimately, what’s the bigger harm here? Is it a single algorithm (it could be!), or is it the entire structure of the ad-driven revenue model, which will always push towards less visibility across all the dimensions of an issue, less consideration of people as individuals, and a higher volume of engagement, rather than letting users exercise creative control and allowing for nuance in conversations?

We are all humans, online and off

The internet has always made us flat abstractions: text bubbles, DMs, Slack chats, blog posts, Buzzfeed articles, without any context around what we are, who we are, what we believe. We are all large, we all contain multitudes, but the more I live and work online, the more I realize that what we gain in being able to communicate across time and space, en masse, we lose in context, in gesture, in understanding a single person in the sea of humanity. It’s this, combined with the global scope of outrage, that’s a dangerous form of abstraction today.

What I’m reading:

The Newsletter:

This newsletter’s M.O. is takes on tech news that are rooted in humanism, nuance, context, rationality, and a little fun. It goes out once or twice a week. If you like it, forward it to friends and tell them to subscribe!

Swag: Stickers. Mug. Notepad.

The Author:

I’m a machine learning engineer. Most of my free time is spent wrangling a kindergartner and a toddler, reading, and writing bad tweets. Find out more here or follow me on Twitter.

September 29, 2020

The Phoenix Project

A book I read a while ago was The Phoenix Project, which is the
canonical book on DevOps (the fancy IT name for the process of
tricking your developers into doing both software development and server
admin work).

::: {.tweet attrs=”{“url”:”https://twitter.com/vboykis/status/1009402512489762818″,”full_text”:”Well, I’ve finished The Phoenix Project and have a ton of thoughts that I’ll write about, but first I want to suggest the authors work on a follow-up book that’s just Twilight but with developers versus ops people based mostly on the stereotypes in this passage. “,”username”:”vboykis”,”name”:”Vicki Boykis”,”date”:”Wed Jun 20 11:48:07 +0000 2018″,”photos”:[{“img_url”:”https://pbs.substack.com/media/DgIeOwRX0AIb_6S.jpg”,”link_url”:”https://t.co/sFQ1UYXGfo”}],”quoted_tweet”:{},”retweet_count”:0,”like_count”:12,”expanded_url“:{}}”}

::: {.tweet-header}
[Vicki Boykis ]{.tweet-author-name}[@vboykis]{.tweet-author}
:::

Well, I’ve finished The Phoenix Project and have a ton of thoughts that
I’ll write about, but first I want to suggest the authors work on a
follow-up book that’s just Twilight but with developers versus ops
people based mostly on the stereotypes in this passage. ![][5]

::: {.tweet-footer}
June 20th 2018

[[12]{.like-count} Likes]{.likes}
:::
:::

I wrote this review a while ago, but didn’t have a good place to publish
it. But the newsletter is the perfect place. So here it is!

According to a blurb on the back cover from Tim O’Reilly, “every person
involved in failed IT project should be forced to read [The Phoenix
Project].”

Hopefully this review will convince you otherwise.

In a nutshell, The Phoenix Project is written as a case study of DevOps,
for an audience of IT managers who are no longer technical. It covers
the case of “Parts Unlimited”, a creatively-named struggling auto parts
retailer with an IT department that can’t get its deal together enough
to make payroll.

The book is meant to make the reader appreciate how much agile practices
such as kanban, and closely integrating operations and development work,
can speed up a company’s workflow. Instead, what it really does is help
you understand why your boss is always stressed out and why American
work culture is awful.

We meet the main character of the book, Bill Palmer, coming out of the
doctor’s office, where he spent the morning with his sick toddler,
“trying to keep the other toddlers from coughing on us, constantly
being interrupted by my vibrating phone.”

Why is his doctor’s waiting room not separated into parts for kids that
are sick and kids getting check-ups? Why not wait the half hour until
after the appointment is over to take a call? These questions remain
unanswered, but even in the first page, I’m already stressed out for
Bill and everyone that was at the doctor’s office with him.

By way of introduction, he says he works “in the technology
backwaters” at Parts Unlimited, and it’s not entirely clear whether he
means that the company is small, that the tech stack is bad, or
that he considers IT to be a backwater with a wink and nod to the
C-suite target audience reading the book. We never actually learn what
the company’s tech stack is, probably because the authors are trying
not to focus the reader on the technology, but rather the business
problem, but my deep dark suspicion is that PartsUnlimited.com is
probably written in PHP, or maybe even Java applets.

We find out within the first couple pages that Steve Masters, the CEO,
has picked Bill to replace the VP of IT Operations, whom he fired. Bill
is not pleased. He says,

I’ve figured out that the trick to a long career in IT Operations
management is to get enough seniority to get good things done but to
keep your head low enough to avoid the political battles that make you
inherently vulnerable. I have absolutely no interest in becoming one
of the VPs who just give each other PowerPoints all day long.

Bill spends the rest of the book making PowerPoints to make his case for
IT modernization.

Right after taking on the new role, he walks into an emergency. The
emergency is that payroll is not working because of some glitch in The
System, and if it’s not fixed by 5 PM that day, employees won’t get
paid. Bill reflects,

Suddenly, I realize that my family’s mortgage payment is due in four
days, and we could be one of the families affected. A late mortgage
payment could screw up our credit rating even more, which we spent
years repairing after we put Paige’s student loans on my credit card.

Do you have ulcers yet?

Bill gets together with Wes and Patty, two senior managers in the IT
Operations organization, to try and fix the situation, but he finds them
in the middle of a Sev 1 incident related to SAN data.

After much, much arguing and troubleshooting, they finally get the issue
resolved, with the help of Brent, a developer who is so valuable that
he’s always on several projects at once, and is always working until
very, very late, because people like Bill, Wes, and Patty constantly
keep coming by to ask him things on their way to PowerPoint meetings.

So far, we have:

- a stressed executive who needs the job because he can’t pay for his
  mortgage
- a CEO who loves firing people like it’s nobody’s business
- two middle managers who don’t know what’s going on, and
- a star developer who is constantly interrupted in his work and has
  to stay late as a reward for his competence.

We haven’t even gotten to the part yet where we meet Bill’s spiritual and
career advisor (yes), who takes him to a physical factory and extols the
virtues of running a software department like an assembly line.

The book only unravels from there. After Bill manages to solve the
payroll crisis, even more pressing projects are slapped on his plate. He
is constantly undermined by the marketing department, deals with failed
website launches, and spends nights and weekends on never-ending
burn-out work related to departmental politics.

Can’t wait to be a senior manager!

Ultimately (huge spoiler), he overcomes all these problems and gets a
hefty promotion once he Implements DevOps and has everyone working in a
factory line like they’re orphans in a Dickens novel. So everyone wins
(but mostly Bill and the CEO.)

The book is meant to be a warning against seat-of-your-pants management,
and an argument for closer coupling between operations and development
in an IT department. Mostly, though, I ended up feeling sorry for how
Bill got into a situation he couldn’t back out of. Sometime midway
through the book, he writes, “Even though I can’t take the entire day
off, I take Paige out for breakfast.” As he thinks about how stressed
he is, he writes,

First and foremost, my most important responsibility is to be the
provider for my family. My pay raise will help us get our debt paid
down, and we can start saving money again for our children’s college
education like we always wanted to….with my promotion, we can pay
off our second mortgage sooner.

There is nothing more depressing than this part of the book, and, to me,
this was the crux of it. The real reason Bill is dealing with the stress
of management is not because he wants to and enjoys tackling tactical
problems, but because he’s forced to – he has no other financial
choice.

And here is the heart of the matter – many times, we can try to improve
our work environment. But in an IT market that’s booming, for both
developers and managers alike, it seems like insanity for Bill to stay
where he is, and not seek out a role at a smaller company where he’ll
be more appreciated and can set up DevOps to his heart’s content.

In Antifragile, which, unlike The Phoenix Project, is a book I truly
believe everyone should read, Taleb argues that the only way we
can truly survive an “unpredictable capitalistic climate” is to be
anti-fragile, or flexible, in the face of negative stresses.

The example he gives in the book is that of a college professor versus a
taxi driver. The taxi driver is more anti-fragile because he always has
to be on the lookout for his next job, and as a result has to hustle, to
understand where the market is going – does he need to try Lyft or Uber?
Should he learn different skills and quit being a taxi driver
altogether? The college professor, by contrast, never has to worry about
this, and therefore stagnates, meaning that if he somehow manages to
lose tenure, he’s completely out of luck in the market economy.
Obviously this is a very broad generalization (because I know many
people from academia who have moved to industry when they were stuck in
their careers), but the idea stuck with me.

The same is the case for developers and managers. Each should always be
growing, networking, interviewing, and keeping their resume fresh. A
rolling stone gathers no moss. But, because Bill has been at Parts
Unlimited for so long, or maybe because he lives in a geographical area
with few alternative opportunities, or for whatever other reason, he’s
now stuck in this ulcer-inducing job, paying off a second mortgage he
hates.

How many people live and work like this? Probably enough that the
authors of Phoenix felt it was important to have a character that
reflects reality in this way. But instead of writing a book pushing
DevOps, why not talk about how people can become more antifragile and
reduce stress at work, regardless of the operating environment there?

Just as importantly, why not talk about the increasing amount of debt
Americans are forced to accumulate through systemic problems: flatlining
wages, increasing student loan costs (I genuinely believe this problem
will be strongly related to the next market bubble), rising healthcare
costs, and economic agglomeration and regulation that force Americans to
work in high-cost markets like San Francisco and New York, for
diminishing returns?

(Good thread on this here, by the way: )

::: {.tweet attrs=”{“url”:”https://twitter.com/patrickc/status/1148428751732006913″,”full_text”:”A very good book about why houses are expensive.nn”,”username”:”patrickc”,”name”:”Patrick Collison”,”date”:”Tue Jul 09 03:08:45 +0000 2019″,”photos”:[],”quoted_tweet”:{},”retweet_count”:65,”like_count”:725,”expanded_url”:{“url”:”https://www.amazon.com/Zoning-Rules-Economics-Land-Regulation/dp/155844288X”,”image”:”https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/fb1abde3-95e6-4c1b-904d-b0d491fd7fc5_1x1.gif”,”title”:”Zoning Rules!: The Economics of Land Use Regulation: William A. Fischel: 9781558442887: Amazon.com: Books”,”description”:”Zoning Rules!: The Economics of Land Use Regulation [William A. Fischel] on Amazon.com. FREE shipping on qualifying offers. This best-selling book describes how zoning has been overused by local communities to block new housing development in ways that exacerbate sprawl and social inequity. It lay…”,”domain”:”amazon.com”}}”}

::: {.tweet-header}
[Patrick Collison ]{.tweet-author-name}[@patrickc]{.tweet-author}
:::

A very good book about why houses are expensive. [Zoning Rules!:
The Economics of Land Use Regulation: William A. Fischel: 9781558442887:
Amazon.com: Books]{.expanded-link-title}[Zoning Rules!: The Economics of
Land Use Regulation [William A. Fischel] on Amazon.com. *FREE*
shipping on qualifying offers. This best-selling book describes how
zoning has been overused by local communities to block new housing
development in ways that exacerbate sprawl and social inequity. It
lay…]{.expanded-link-description}[amazon.com]{.expanded-link-domain}

::: {.tweet-footer}
July 9th 2019

[[65]{.rt-count} Retweets]{.retweets}[[725]{.like-count} Likes]{.likes}
:::
:::

Why not discuss the growing FIRE movement? Or some of the efforts to
work around the traditional education system? More importantly, why not
discuss whether, in a theoretically DevOps environment, we need middle
managers like Bill at all? (The book/essay Bullshit Jobs does just
that).

The reason, as usual, is that digging around for root causes that are
deeper than the root causes you’re focused on is hard. Humans are very
much short-term view, small-picture animals, and looking at the larger
picture is hard and inconvenient, particularly if you’re trying to sell
a book about DevOps.

But you know what else is hard and inconvenient? Reading Phoenix
Project. And being Bill.

Art: Bird Phoenix, Nina Tokhtaman Valetova, 2011

What I’m reading lately:

I just finished How to Do Nothing and I didn’t like it …but
maybe you will
What people don’t get about going to Russia is that it will most
likely kill you.
Erik is on-target, as usual:
::: {.tweet attrs=”{“url”:”https://twitter.com/fulhack/status/1149044954922201094?s=12″,”full_text”:”Management consultants are underrated by engineers. Especially data scientists can learn a ton about the craft of making a great presentation or answering a super open ended business question (like “what should our refund policy be” or “what market should we launch in next”)”,”username”:”fulhack”,”name”:”Erik Bernhardsson”,”date”:”Wed Jul 10 19:57:20 +0000 2019″,”photos”:[],”quoted_tweet”:{},”retweet_count”:9,”like_count”:94,”expanded_url”:{}}”}

::: {.tweet-header}
[Erik Bernhardsson
]{.tweet-author-name}[@fulhack]{.tweet-author}
:::

Management consultants are underrated by engineers. Especially data
scientists can learn a ton about the craft of making a great
presentation or answering a super open ended business question (like
“what should our refund policy be” or “what market should we launch
in next”)
::: {.tweet-footer}
July 10th 2019

[[9]{.rt-count} Retweets]{.retweets}[[94]{.like-count}
Likes]{.likes}
:::
:::
I’ve only read a little of this piece so far, but it’s beautiful
::: {.tweet attrs=”{“url”:”https://twitter.com/errrica/status/1149125465799573504″,”full_text”:”I’m late to reading this piece today even though I read the rest of this great project earlier. Such a lovely piece by @JasonNark about the river that defines Philadelphia. “,”username”:”errrica”,”name”:”Erica Palan”,”date”:”Thu Jul 11 01:17:15 +0000 2019″,”photos”:[],”quoted_tweet”:{},”retweet_count”:1,”like_count”:10,”expanded_url”:{“url”:”https://www.inquirer.com/science/inq/delaware-river-philadelphia-pennsylvania-20190710.html”,”image”:”https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/cbb34424-e585-4e23-92d7-7b43510abab8_1200x800.jpeg”,”title”:”The Delaware River: The river that made Philadelphia”,”description”:”The Delaware River was born before words, flowing namelessly through an unmapped world without factories or fishermen, to a sea no ship ever sailed upon. Over the next year, Inquirer journalists will explore the river and its watershed, focusing on its challenges and its promise.”,”domain”:”inquirer.com”}}”}

::: {.tweet-header}
[Erica Palan ]{.tweet-author-name}[@errrica]{.tweet-author}
:::

I’m late to reading this piece today even though I read the rest of
this great project earlier. Such a lovely piece by
[@JasonNark]{.tweet-fake-link} about the river that defines
Philadelphia. [The Delaware River: The river that made
Philadelphia]{.expanded-link-title}[The Delaware River was born
before words, flowing namelessly through an unmapped world without
factories or fishermen, to a sea no ship ever sailed upon. Over the
next year, Inquirer journalists will explore the river and its
watershed, focusing on its challenges and its
promise.]{.expanded-link-description}[inquirer.com]{.expanded-link-domain}
::: {.tweet-footer}
July 11th 2019

[[1]{.rt-count} Retweet]{.retweets}[[10]{.like-count} Likes]{.likes}
:::
:::
DAGster looks pretty cool
::: {.tweet attrs=”{“url”:”https://twitter.com/schrockn/status/1148305587400138752″,”full_text”:”1/ Today we at Elementl are excited to launch an early release of Dagster, an open-source Python library for building data applications. Here’s a post about what Dagster is, why I moved to data infra, why data is hard, and why we need a new system. “,”username”:”schrockn”,”name”:”Nick Schrock”,”date”:”Mon Jul 08 18:59:21 +0000 2019″,”photos”:[],”quoted_tweet”:{},”retweet_count”:89,”like_count”:357,”expanded_url”:{“url”:”https://medium.com/p/dbd28442b2b7″,”image”:”https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/83193008-b553-459c-86a7-4aa06049b2db_420x344.png”,”title”:”Introducing Dagster – Nick Schrock – Medium”,”description”:”A open-source Python library for building data applications”,”domain”:”medium.com”}}”}

::: {.tweet-header}
[Nick Schrock
]{.tweet-author-name}[@schrockn]{.tweet-author}
:::

1/ Today we at Elementl are excited to launch an early release of
Dagster, an open-source Python library for building data
applications. Here’s a post about what Dagster is, why I moved to
data infra, why data is hard, and why we need a new system.
[[Introducing Dagster – Nick Schrock –
Medium]{.expanded-link-title}[A open-source Python library for
building data
applications]{.expanded-link-description}[medium.com]{.expanded-link-domain}][21]
::: {.tweet-footer}
July 8th 2019

[[89]{.rt-count} Retweets]{.retweets}[[357]{.like-count}
Likes]{.likes}
:::
:::

————————————————————————

About the Author

I’m a data scientist in Philadelphia. Most of my free time is spent
kid-wrangling, reading, and writing bad tweets. I also have [longer
opinions] on things. Find out more here or follow me on Twitter.
This newsletter includes warm takes about data, tech, and everything
around those two. It goes out twice-ish a week. Paid subscribers get
more warm takes and the warm feeling of supporting an author they
*LOVE* (right?, right?).

If you like this newsletter, support it and get friends to subscribe!

[5]: https://pbs.substack.com/media/DgIeOwRX0AIb_6S.jpg {.tweet-photo}

[21]: https://medium.com/p/dbd28442b2b7 {.expanded-link}

September 29, 2020

Commoditize the clicks

The Factory at Asnières, Van Gogh, 1887

One of my favorite subreddits is (sorry in advance) r/programmingcirclejerk, because it offers a place to call out some of the more ridiculous, lofty statements that people make about various programming languages. Sometimes it can be mean-spirited, but often it’s right on the mark. One of the recent posts was a quote from a blog post that said,

I often think Python is too easy. Can you really call it “programming” if you can generate classification predictions with only 6 lines of code? Especially if 3 of those lines are a dependency and your training data, I would argue that someone else did the real programming.

The discussion centered on how ridiculous it is to rebuild programming APIs from scratch when a full set of them has already been built by a community that specializes in that particular problem.

This cut to the heart of a trend I’ve been thinking about recently: how the process of data science itself is becoming a commodity.

To be clear, I don’t mean analysis. Data analysis can never be fully automated, because it involves too much business logic, trial and error, and human involvement. But data science models and their underlying algorithms, the pieces of code that go something like this:

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes = datasets.load_diabetes()

# Use only one feature (column 2, body mass index)
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination (R^2): 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot the test data and the fitted line
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)
plt.show()

are already being automated.

This is your run-of-the-mill linear regression model, straight from the documentation for scikit-learn (a commonly used Python machine learning library). It uses scikit-learn’s diabetes dataset, which records factors like age, body mass index, blood pressure, and blood serum measurements for diabetes patients, to predict a quantitative measure of how far the disease has progressed one year later.

It’s also the least interesting part of data science, because it assumes

1) clean data, and

2) running the code interactively on a single machine (i.e., you can’t hook it up to a web app or ask it about new patients).

Neither of these conditions ever holds in the wild-west environments that data scientists and engineers work in.
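To make the clean-data assumption concrete, here is a minimal sketch, assuming scikit-learn is installed, that knocks out some values in the same diabetes data used in the example above. A bare `LinearRegression` refuses to fit when values are missing, so even a commodity model needs some plumbing (here, a `SimpleImputer` chained in a pipeline) before it can run on realistic input. The 5% missing rate and the mean-imputation strategy are arbitrary choices for illustration, not a recommendation.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Load the same diabetes data used in the scikit-learn example
diabetes = datasets.load_diabetes()
X = diabetes.data.copy()
y = diabetes.target

# Simulate messy, real-world input: blank out roughly 5% of values at random
# (the 5% rate is an arbitrary assumption for illustration)
rng = np.random.default_rng(0)
mask = rng.random(X.shape) < 0.05
X[mask] = np.nan

# The bare model cannot handle missing values and raises a ValueError
try:
    linear_model.LinearRegression().fit(X, y)
    bare_model_failed = False
except ValueError as err:
    bare_model_failed = True
    print("Bare model failed:", err)

# Wrapping the same model in a pipeline that fills gaps with column means lets it train
model = make_pipeline(SimpleImputer(strategy="mean"), linear_model.LinearRegression())
model.fit(X, y)
predictions = model.predict(X[:5])
print(predictions)
```

The model line itself never changes; all of the new work is in the code around it.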

A few days ago, I read a really interesting post by Vincent Warmerdam called “The Future of Data Science is Past,” where he hypothesizes that the data science algorithms we’ve all come to know and love, like this one, are merely small parts of very complicated systems, and that what’s becoming more and more important is not the model code itself but those systems and how they talk to each other.

In short, the hype around algorithms forgot to mention that:

algorithms are merely cogs in a system

algorithms don’t understand problems

algorithms aren’t creative solutions

algorithms live in a mess

His main argument is that algorithms such as decision trees and neural nets are blunt tools that can only be used with people shepherding them. Systems are complicated and take a long time to build, and the value lies in the entire system working end to end rather than in a single algorithm making a prediction:

It is unrealistic that a self learning algorithm is able to pick up these rules as hard constraints so it makes sense to handle this elsewhere in the system. With that in mind; notice how the algorithm is only a small cog in the entire system. The jupyter notebook that contains the recommender algorithm is not the investment that needs to be made by the team but rather it is all the stuff around the algorithm that requires practically all the work.

From my perspective, this is increasingly true. On any given data science project, except for specialized, very industry-specific ones, it takes much more time to implement an algorithm than to choose one.
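Warmerdam’s point about hard constraints can be sketched in a few lines. Everything in the example below is hypothetical: the item IDs, the scores standing in for a recommender’s output, and the `in_stock` and `age_restricted` rules are made up for illustration. The model only ranks; the business rules live in a separate, plain layer of code that can be read, tested, and changed without retraining anything.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    in_stock: bool
    age_restricted: bool


def recommend(scores, catalog, user_is_minor, k=3):
    """Rank items by model score, then enforce hard business rules outside the model.

    `scores` is a hypothetical stand-in for the output of any recommender algorithm.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    allowed = []
    for item_id in ranked:
        item = catalog[item_id]
        # Hard constraints the model is not trusted to learn on its own
        if not item.in_stock:
            continue
        if user_is_minor and item.age_restricted:
            continue
        allowed.append(item_id)
        if len(allowed) == k:
            break
    return allowed


# Hypothetical catalog and model scores, for illustration only
catalog = {
    "a": Item("a", in_stock=True, age_restricted=False),
    "b": Item("b", in_stock=False, age_restricted=False),
    "c": Item("c", in_stock=True, age_restricted=True),
    "d": Item("d", in_stock=True, age_restricted=False),
}
model_scores = {"a": 0.4, "b": 0.9, "c": 0.8, "d": 0.1}

print(recommend(model_scores, catalog, user_is_minor=True))  # → ['a', 'd']
```

The two highest-scoring items never reach a minor: one is out of stock and the other is age-restricted, and neither decision depends on the model.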

There has been a lot of talk about how machines are replacing humans with the advent of <scary CNN anchor voice> AI and machine learning </anchor voice>. Ironically, what’s actually happening is that the data scientists who started working on these algorithms are getting crowded out of that space.

As I’ve written before, it’s my opinion that data scientists will now need to become much more like developers than statisticians. We’re seeing that borne out in industry, particularly as the big shops (Amazon, Google, and Microsoft) build out products that do some (not all!) of the heavy lifting of machine learning, like SageMaker, Azure ML, and Google’s AI tools, and as products like H2O.ai and Yellowbrick take on feature selection and other parts of the machine learning process.

What does this mean? Not that data science will become obsolete. Analysis and model selection will never be fully automated. And now more than ever, in an age of extremely large corporate goofs (both intentional and not) in the way algorithms and data are used, humans need to be in the loop.

As I’ve written before,

I realized that there are three fundamental parts of any data project:

data

creating code that moves or analyzes data

human judgment to interpret the results of parts one and two

If any one of the three are de-prioritized, a data project goes awry.

But there will be a shift in where the value of a data scientist’s work lies. It used to be all about exploration, finding correlations, and modeling.

Now, it’s about putting those commoditized models into production, where other people can use and see them.
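What “production” means varies a lot by team, but a common first step is separating training from serving. Here is a minimal sketch, assuming scikit-learn: the model is trained once, serialized with Python’s standard `pickle` module, and loaded by a separate prediction function that accepts the kind of JSON-like payload a web app might send. The `predict_progression` function and its `{"bmi": [...]}` payload format are hypothetical; a real deployment would add input validation, model versioning, and an actual web framework or managed service around it.

```python
import pickle

import numpy as np
from sklearn import datasets, linear_model

# --- Training side: fit once, then serialize the fitted model ---
diabetes = datasets.load_diabetes()
bmi = diabetes.data[:, np.newaxis, 2]  # the single BMI feature (already scaled in this dataset)
trained = linear_model.LinearRegression().fit(bmi, diabetes.target)
model_bytes = pickle.dumps(trained)  # in practice, written to a file or model store

# --- Serving side: load the artifact and answer requests ---
# Only unpickle artifacts from sources you trust; pickle can execute code on load.
served_model = pickle.loads(model_bytes)


def predict_progression(payload):
    """Hypothetical request handler: takes {"bmi": [...]} and returns predictions."""
    features = np.asarray(payload["bmi"], dtype=float).reshape(-1, 1)
    return {"predictions": served_model.predict(features).tolist()}


# BMI values here are on the dataset's scaled units, not raw kg/m^2
response = predict_progression({"bmi": [0.05, -0.02]})
print(response)
```

The model code is the same three lines as before; the handler boundary, serialization, and the trust question around loading artifacts are the parts that take real engineering.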

Links

My coworker wants the company to pay for a week-long sex romp with his fired girlfriend
There is only one way to do things in Python
Roller derby in Russia
A lot of interviews with engineers at big companies, like Dan Abramov of React fame (podcasts)
Scala with Cats


August 19, 2020

Hello world!

Paul Graham, one of the most respected voices in Silicon Valley on both the business and technology sides, recently wrote:

His replies soon became a chorus of people (very reasonably, for Twitter!) pleading with him to reconsider his opinion on whether the founders of Airbnb were poor, and what kinds of difficulties founders who haven’t gone to top schools might encounter while working on startups.

He refused to acknowledge this was an issue, even going so far as to insult people and, as is his usual M.O., block them. This was extremely disappointing to see.

Everyone in tech who has been online for a while has their own pg origin story. I first stumbled upon his essays probably sometime around 2012, when I was just getting into the industry. Everything he wrote seemed so smart! How did he do it? My favorite pieces of his are probably “The Submarine” and “Maker’s Schedule, Manager’s Schedule,” which I’ve sent to countless people by now.

He had a fantastic way of taking the common realities of working in tech and turning them from a frustrating jumble of noise into clear, elegant writing that describes exactly what people are going through. His earlier posts said, “I know what it’s like,” and reading them was kind of like reading letters from an older, smarter brother.

It’s hard to pinpoint exactly when the tide turned, but it’s very much tied up in his success at Y Combinator. In 2009, the company moved from the East Coast to the West Coast. In 2011, Yuri Milner, a Russian-Israeli investor with stakes in the top internet companies of the past fifteen years, got involved. Sometime between then and 2014, when Sam Altman (who once tried to change US politics with a dashboard) stepped in, Paul Graham stopped writing from outside the establishment. He was no longer a startup cofounder writing Arc and waxing eloquent about Lisp.

Paul Graham was now the reason everyone wanted to be in Silicon Valley, and as a result, he stopped hearing contradictory advice and grew complacent, to the point where he now refuses to consider anyone else’s opinion or experience.

That’s very sad.

In one of his own essays, he writes,

When experts are wrong, it’s often because they’re experts on an earlier version of the world.

He’s also written, recently,

When I ask myself what I’ve found life is too short for, the word that pops into my head is “bullshit.” I realize that answer is somewhat tautological. It’s almost the definition of bullshit that it’s the stuff that life is too short for. And yet bullshit does have a distinctive character. There’s something fake about it. It’s the junk food of experience.

It seems that he now believes something very dangerous: not only that he is still an expert on the current version of the world, but also that what other people are saying is bullshit.

Links:

Who comes up with brand names?
Replacing Google Analytics with simple logs
The Dollar Store backlash


About the Author

I’m a data scientist in Philadelphia. Most of my free time is spent kid-wrangling, reading, and watching foreign soaps. Find out more here or follow me on Twitter. Please send me any feedback (suggestions for links, thoughts, etc.) at [email protected].

June 15, 2020