No, ChatGPT Code Interpreter Cannot Replace Data Scientists… Yet

Open Data Analytics
6 min read · Jul 10, 2023

Everyone’s singing praises for the new kid on the block, ChatGPT Code Interpreter: OpenAI’s latest masterpiece, the AI sensation that’s supposedly going to “make Data Scientists as a profession obsolete”. So I bought into the hype, rushed to try ChatGPT Code Interpreter first-hand, and gave it a quick spin, hoping it would be a bang rather than a whimper.

Here’s what I found. It’s not all sunshine and rainbows:

Everything Looks Good in the Beginning

  • (Spoiler alert): ChatGPT Code Interpreter lacks visual perception, a crucial skill that data scientists rely on to discover insights.

To get started, I put ChatGPT Code Interpreter to the test with an Electric Vehicles dataset to see if it could outdo a data scientist.

In the typical ChatGPT way, ChatGPT Code Interpreter gives me a very lengthy response with:

  • A description of each column in the dataset
  • Basic information about the dataset, such as the number of rows and columns, missing values, and the data type of each column
  • Descriptive statistics for the numerical and categorical columns
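For readers who want to see what that first overview amounts to, here is a minimal sketch using only the Python standard library. It is not the code ChatGPT generated; the sample rows and every column name except `chargestatus` are made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical sample rows; the real Electric Vehicles dataset is not reproduced here.
SAMPLE_CSV = """speed,soc,chargestatus
0,80,parked_charging
42.5,79,not_charging
,77,not_charging
61.0,75,driving_charging
"""


def summarize(csv_text):
    """Return row/column counts plus per-column missing values, type, and basic stats."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())
    summary = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}
    for name in columns:
        values = [row[name] for row in rows]
        present = [v for v in values if v != ""]  # empty cells count as missing
        info = {"missing": len(values) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            # At least one value is not a number: treat the column as categorical.
            info["dtype"] = "categorical"
            info["unique"] = len(set(present))
        else:
            info["dtype"] = "numeric"
            info["mean"] = statistics.mean(numbers)
            info["min"] = min(numbers)
            info["max"] = max(numbers)
        summary["columns"][name] = info
    return summary


overview = summarize(SAMPLE_CSV)
```

In practice a data scientist would likely reach for a dataframe library instead; the point is only that this step is mechanical bookkeeping, not insight.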

All right, good start. So I proceed to the next step:

ChatGPT Code Interpreter returns a heatmap showing the correlation matrix of the numerical variables in the dataset. It also reads the graph and generates a summary of the correlation matrix.
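The numbers behind such a heatmap are plain Pearson correlations between every pair of numerical columns. A rough sketch, with made-up column names and values (the heatmap drawing itself is left out):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical numeric columns, chosen so the correlations are easy to check by eye.
data = {
    "speed": [10.0, 20.0, 30.0, 40.0],
    "current": [2.0, 4.0, 6.0, 8.0],   # rises with speed
    "soc": [90.0, 85.0, 80.0, 75.0],   # falls as speed rises
}
matrix = correlation_matrix(data)
```

A summary written from this matrix needs no eyesight at all, which matters for what comes later.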

It also gives me a regular bar chart showing the distribution of the categorical variables, together with a box plot of the relationship between categorical and numerical variables, with basic descriptions and insights.

So far so good. ChatGPT Code Interpreter does quite well at processing basic tasks written in natural language for not-that-complicated datasets. However, I need to be very specific in the prompt and break the task down into multiple steps with enough detail for ChatGPT Code Interpreter to digest.

At the end of the message, it also says:

I am glad that ChatGPT Code Interpreter does not talk like an over-confident intern. It does have some level of self-awareness.

But… Does ChatGPT Code Interpreter Really “Get” It?

That is the crucial question for every data scientist. Does ChatGPT Code Interpreter truly understand the data, or is it merely mimicking?

To investigate, I continue asking the following question:

It discovers the hidden curve in the dataset. Very nice.

So I push ChatGPT Code Interpreter further, asking for its reasoning:

It “tries to” fit the curve:

For a moment, it genuinely “blows my mind” (I know, this cliche word has been overused a million times by Twitter thread bros). ChatGPT Code Interpreter actually “understands” there is a curve. Until… I checked the code:

That’s anticlimactic. It seems that ChatGPT Code Interpreter is merely curve fitting. It takes all the data points and fits a regression curve to them, without understanding the original curve in the scatter plot. This means Code Interpreter cannot understand the visual patterns in a visualization. It’s still a language model.
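To see why fitting everything at once is not the same as seeing the curve, consider this toy sketch. The data is made up, and a straight line stands in for the regression curve to keep it short: one group of points rises, another stays flat, and the pooled fit describes neither.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data: two hidden groups sharing the same x range.
rising = [(x, 2.0 * x) for x in range(5)]  # follows y = 2x
flat = [(x, 0.0) for x in range(5)]        # follows y = 0

# Fit once over all points, then once per group.
pooled_slope, pooled_intercept = fit_line(rising + flat)
rising_slope, _ = fit_line(rising)
flat_slope, _ = fit_line(flat)
```

The pooled slope lands halfway between the two group slopes. A human glancing at the scatter plot would spot two separate trends; a regression over all points just averages them.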

ChatGPT Code Interpreter can generate runnable Python code and report simple statistics as text. It cannot actually understand a visualization without using a visual perception library to extract text from the chart. It seems that ChatGPT Code Interpreter is still held back by the limitations of Large Language Models. It’s not a Multimodal Model, as many had expected.

Eventually, I try to ask ChatGPT Code Interpreter for Data Clustering:

The output is plainly… incorrect, even on the second try.

Actually, one of the potential explanations behind the curve is:

The deciding factor is chargestatus.

Blue = Charging while parked
Cyan = Charging while driving
Gray = Not charged
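If that explanation holds, the useful move is to split the points by `chargestatus` before fitting or clustering anything. A minimal sketch, where the status labels, coordinates, and helper name are all hypothetical and only the colour legend comes from the chart above:

```python
from collections import defaultdict
from statistics import mean

# Colour legend from the chart; the keys are hypothetical labels for the three states.
LEGEND = {
    "charging_parked": "blue",
    "charging_driving": "cyan",
    "not_charged": "gray",
}

# Made-up (x, y, chargestatus) records standing in for the scatter plot's points.
records = [
    (10.0, 1.0, "charging_parked"),
    (20.0, 2.0, "charging_parked"),
    (10.0, 5.0, "charging_driving"),
    (20.0, 7.0, "charging_driving"),
    (15.0, 0.0, "not_charged"),
]


def split_by_status(rows):
    """Group (x, y) points by their chargestatus label."""
    groups = defaultdict(list)
    for x, y, status in rows:
        groups[status].append((x, y))
    return dict(groups)


groups = split_by_status(records)
# Average y per status: a quick check of whether the groups really differ.
mean_y = {status: mean(y for _, y in points) for status, points in groups.items()}
```

Nothing here is sophisticated. The hard part is knowing that this column is the one to split on, and that is where domain knowledge comes in.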

Give More Credit to the Packages/Dependencies

ChatGPT Code Interpreter serves as a User Interface for corresponding packages and dependencies, which do most of the dirty work.

Take the OCR feature as an example. On paper, it seems fantastic. In fact, it is powered by a Python library:

But what happens when there is no available library?

Oops, it hits a wall.
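You can run the same reality check in any Python environment: ask which packages can actually be imported before promising a feature. A small sketch, where the helper name and the deliberately fake package name are mine:

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is a fake that should not exist anywhere.
missing = missing_packages(["json", "a_package_that_does_not_exist_123"])
```

Whatever ends up in `missing` is a capability the environment simply does not have, no matter how confidently the chat interface talks about it.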

Image to Video! … Not Really

There is one last thing I must try. Everybody has been hyped by the buzzword: Images into video with ChatGPT! Who can resist that?

Well, this escalated quickly. But no worries, I can tell Code Interpreter this is a picture of a cute cat. Surely it can create a video, right?

Hmm, seems like something isn’t right here. No worries, I can be more specific about the video size:

Here’s the downloaded video:

Hey, it’s not even moving.

As a comparison, this is what Stable Diffusion can generate:

Hey, this isn’t what I was sold by the Twitter thread bros. I need a refund.

Final Thoughts

What’s my take after playing around with ChatGPT Code Interpreter? It’s clear that the Code Interpreter can’t fully replicate or replace a human data scientist’s skills, yet. Data science isn’t just code — it’s complex reasoning, expertise, and a bit of healthy skepticism. ChatGPT Code Interpreter hasn’t reached that level, for now.

To those fearing AI will take over data science, I’d say: don’t rush. ChatGPT Code Interpreter is impressive, but it is far from putting every Data Scientist, Programmer, Artist… out of a job. The more likely future is human-AI collaboration rather than “AI will replace humans in X years”, a claim you have probably had to endure numerous times on your Twitter feed.

And I would like to end this article with the ChatGPT Code Interpreter system prompt:
