on January 4, 2022 under blog


Earlier in the year, I became aware of a new GitHub product named CoPilot. The concept, while not new, is an interesting one: CoPilot uses machine learning to provide real-time code completions and suggestions. When I was approved for the beta program, I tested it out with random coding ideas. However, it is often helpful to have some structure around what you’re trying to code. As I’ve participated in Advent of Code previously, and a new calendar was about to begin, that seemed like an excellent opportunity to test out CoPilot.

Please note, there are certain aspects of CoPilot, such as current legal challenges, on which I will not comment. Those issues need to be resolved through the proper channels, and I am not a legal expert. This write-up aims to discuss the technology, its uses, and pitfalls.

Advent of Code

Advent of Code is an annual coding advent calendar. The calendar starts easy and progressively becomes more difficult. Since it’s an advent calendar, there is usually a holiday theme of sorts and a quasi-storyline to progress through. It’s often a good opportunity to learn a new language or brush up on your language of choice. You’re not bound to any specific language, and people have chosen to do AoC with things as crazy as Excel programming. It has been a while since I wrote C++, so I decided it would be good to try some of the challenges with that language.

Setup

Installing and running CoPilot was relatively straightforward. The tool has plugins for Visual Studio Code, Neovim, and JetBrains IDEs like PyCharm and IntelliJ IDEA. There is also apparently a plugin for Jupyter Notebook. I chose to use the integration with Visual Studio Code. I installed it through the VS Code extensions menu. After installing it, you’re required to sign in to your GitHub account. That’s all there was to it.

Thoughts

Languages

While CoPilot claims to support almost any language, its primary languages are C, C++, Python, JavaScript, TypeScript, Ruby, Java, and Go. I attempted to use the Python, JavaScript, C, and C++ completions. In doing so, I have concluded that CoPilot performs better with higher-level languages like Python and Go than with lower-level languages like C. This is likely because the higher-level languages naturally support higher-level actions and ideas, so the AI has less work to do. I found that CoPilot did not support Rust at all in my testing.

CoPilot gives recommendations based on the source code on which GitHub trained it. This approach tends to produce the most common way of solving a task, which isn’t necessarily the best or the most correct. For instance, I found that creating encryption-related code in C and C++ led to non-functional code, likely due to changes in library APIs through the years.

As an aside, it was interesting how CoPilot ingested some code for somewhat obscure libraries. I was an early adopter of the angr framework and was pleased to see that CoPilot completed some prompts correctly. Take the following prompt:

import angr

# Open the binary "test.bin"

The above prompt is correctly completed with:

import angr

# Open the binary "test.bin"
proj = angr.Project('test.bin')

To extend the example, let’s say we want to hook printf with a class-based symbolic procedure:

# Create class hook for printf function, then hook the symbol

This gives us:

# Create class hook for printf function, then hook the symbol
class printf_hook(angr.SimProcedure):
    def run(self, fmt_ptr):
        fmt_str = self.state.memory.load(fmt_ptr, 100)
        print "printf: %s" % fmt_str


proj.hook_symbol('printf', printf_hook())

Note that it incorrectly uses a Python 2-style print statement, which is not valid for the current version of angr since it no longer supports Python 2. It also assumes a fixed length for the format string by loading exactly 100 bytes from memory. If you’re unclear on what your recommended code does, this could be problematic.

Inline Completions

It took me a while to understand how to interact with CoPilot most efficiently. I started with completions while I typed. For instance, I created a Python function called “print_running_sum” that would take in an array and print the running total after each entry.

def print_running_sum(array):

The completion I obtained was:

def print_running_sum(array):
    running_sum = 0
    for i in range(len(array)):
        running_sum += array[i]
        print running_sum

The first note here is that, again, I’m getting Python 2 by default, likely because it makes up the majority of the code indexed by CoPilot. Secondly, notice that CoPilot decided to use range(len(array)). This completion is an example of how the suggested code isn’t necessarily the best. In Python, that type of iteration is considered unpythonic, as you’re encouraged to iterate over the elements directly (for value in array) instead. These completions are unfortunate because they produce code that works but is poorly written.

Full Function Completions

Next, I tested writing functions based on comments. Writing code from comments became my primary way of interacting with CoPilot. It allowed more lifting to be done by the AI, so long as I scoped the intended functionality correctly. Take the following example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

// Function: main
// Description: Reads file indicated by argument 1, and prints the contents of
// the file to stdout.
// Input: int argc - number of arguments
//        char *argv[] - array of arguments
// Output: int - 0 if successful, 1 if not

The above includes and descriptions led to the following code suggestion:

int main(int argc, char *argv[]) {
    // Check for correct number of arguments
    if (argc != 2) {
        printf("Usage: ./test <filename>\n");
        return 1;
    }

    // Open file
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        printf("Error opening file\n");
        return 1;
    }

    // Read file
    char buf[1024];
    int bytes_read = read(fd, buf, 1024);
    if (bytes_read == -1) {
        printf("Error reading file\n");
        return 1;
    }

    // Print file contents
    printf("%s", buf);

    // Close file
    close(fd);

    return 0;
}

Note that the suggestion performs some sanity checks, such as checking the number of arguments provided and checking for errors when opening and reading the file. It also assumes the size of the file and only reads in the first 1024 bytes. This example is another case where understanding what the code does is essential. If I expect a file to be larger than 1024 bytes, I need to adjust the generated code. I could also re-write the comments to push CoPilot in a different direction. Note what happens when I add another comment line:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

// Function: main
// Description: Reads file indicated by argument 1, and prints the contents of
// the file to stdout.
//              The file can be arbitrary size.
// Input: int argc - number of arguments
//        char *argv[] - array of arguments
// Output: int - 0 if successful, 1 if not
int main(int argc, char *argv[]) {
    // Check for correct number of arguments
    if (argc != 2) {
        printf("Usage: ./test <filename>\n");
        return 1;
    }

    // Open file
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        printf("Error opening file\n");
        return 1;
    }

    // Read file
    char buf[1024];
    int bytes_read = 0;
    while ((bytes_read = read(fd, buf, 1024)) > 0) {
        write(STDOUT_FILENO, buf, bytes_read);
    }

    // Close file
    close(fd);

    return 0;
}

Now, the output includes a loop used to ensure all of the file contents are output. CoPilot also decided to change the result to utilize the write call instead of printf.
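Two details are easy to miss in the suggestions above. The first one passes buf to printf with %s even though read does not NUL-terminate the buffer, and the second one ignores the return value of write, which is allowed to write fewer bytes than requested. Below is a minimal sketch of a more defensive copy loop. The helper names write_all and copy_fd are my own for illustration and were not produced by CoPilot:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

// Write exactly len bytes from buf to fd, retrying on short writes and EINTR.
// Returns 0 on success, -1 on error (errno is left as set by write).
int write_all(int fd, const char *buf, size_t len) {
    size_t written = 0;
    while (written < len) {
        ssize_t n = write(fd, buf + written, len - written);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted before anything was written; retry
            }
            return -1;
        }
        written += (size_t)n;
    }
    return 0;
}

// Copy everything readable from in_fd to out_fd in 1024-byte chunks.
// Returns 0 once end of file is reached, -1 if a read or write fails.
int copy_fd(int in_fd, int out_fd) {
    char buf[1024];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0) {
            return 0;  // end of file
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) != 0) {
            return -1;
        }
    }
}
```

In the main function above, the read loop would then become a single call such as copy_fd(fd, STDOUT_FILENO), with a non-zero result treated as an error.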

One additional note on full function completions: if the function requires too many characters, CoPilot only returns the first portion of it (I’m unsure exactly what the limit is). The recommendation you get back will frequently stop mid-word or mid-command. To get around this issue, I’ve discovered you can delete one character, then re-type it to nudge CoPilot into giving you more of the function. I have found myself repeating this trick often, which is why, when using CoPilot, it seems better to build small functions than large ones.

Cross-Language Completions

Comments can contain data structures that are not native to the code you’re writing. For instance, it can be counterintuitive to define your C structures based on JSON, but since CoPilot understands the structures, it can translate what you’re asking for. Take the following example:

// Creates a structure to hold the data defined by this example json
// {"name":"John Doe","age":30,"city":"New York","favorite_foods":["pizza","icecream","salad"]}
typedef struct {
    char *name;
    int age;
    char *city;
    char **favorite_foods;
} Person;

In this example, my comment told CoPilot that I wanted a C structure. Note that I only gave it an example of the data in JSON, and it translated that into a matching C data structure.
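One thing the generated structure leaves open is how many entries favorite_foods holds, since a char ** carries no length. The following is a rough sketch of how the structure might be made usable. The favorite_food_count field and the dup_string, person_init, and person_free helpers are my own additions for illustration, not CoPilot output:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *name;
    int age;
    char *city;
    char **favorite_foods;
    size_t favorite_food_count;  // added: number of entries in favorite_foods
} Person;

// Duplicate a string with malloc; returns NULL on allocation failure.
char *dup_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

// Release everything owned by p and reset it to an empty state.
void person_free(Person *p) {
    for (size_t i = 0; i < p->favorite_food_count; i++) {
        free(p->favorite_foods[i]);
    }
    free(p->favorite_foods);
    free(p->city);
    free(p->name);
    memset(p, 0, sizeof *p);
}

// Fill p with copies of the given values.
// Returns 0 on success, -1 if any allocation fails (p is left empty).
int person_init(Person *p, const char *name, int age, const char *city,
                const char *const *foods, size_t food_count) {
    memset(p, 0, sizeof *p);
    p->age = age;
    p->name = dup_string(name);
    p->city = dup_string(city);
    p->favorite_foods =
        calloc(food_count ? food_count : 1, sizeof *p->favorite_foods);
    if (p->name == NULL || p->city == NULL || p->favorite_foods == NULL) {
        person_free(p);
        return -1;
    }
    for (size_t i = 0; i < food_count; i++) {
        p->favorite_foods[i] = dup_string(foods[i]);
        if (p->favorite_foods[i] == NULL) {
            person_free(p);
            return -1;
        }
        p->favorite_food_count = i + 1;  // count only fully copied entries
    }
    return 0;
}
```

Keeping the count next to the array means person_free and any code that prints the list never has to guess where the array ends.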

Comment Completions

Another bizarre but helpful completion I found was creating compile commands. CoPilot will fill in recommendations for your comments as you type, which means that you can ask CoPilot how you should compile your program. For instance, if your program contains libraries that need to be linked (such as SQLite), CoPilot will recommend how to link them. Here’s an example. Again, all code was generated by CoPilot:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sqlite3.h>

// Function: main
// Purpose: Opens the database and prints the names of all the tables in the
// database.
int main(int argc, char *argv[])
{
    sqlite3 *db;
    char *zErrMsg = 0;
    int rc;
    char *sql;
    char **result;
    int nrow;
    int ncolumn;
    int i;
    int j;

    rc = sqlite3_open("test.db", &db);

    if( rc )
    {
        fprintf(stderr, "Can't open database: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return(1);
    }
    else
    {
        fprintf(stderr, "Opened database successfully\n");
    }

    sql = "SELECT name FROM sqlite_master WHERE type='table'";

    rc = sqlite3_get_table(db, sql, &result, &nrow, &ncolumn, &zErrMsg);

    if( rc != SQLITE_OK )
    {
        fprintf(stderr, "SQL error: %s\n", zErrMsg);
        sqlite3_free(zErrMsg);
    }
    else
    {
        fprintf(stdout, "Table names:\n");
        for(i=1; i<=nrow; i++)
        {
            for(j=0; j<ncolumn; j++)
            {
                fprintf(stdout, "%s\n", result[i]);
            }
        }
    }

    sqlite3_free_table(result);
    sqlite3_close(db);
    return 0;
}

When I then prompt CoPilot for compile commands, it knows to link against the SQLite library and also suggests builds with several sanitizers:

// Compile with:
// gcc -o test test.c -lsqlite3
// For ASAN:
// clang -fsanitize=address -o test test.c -lsqlite3
// For TSAN:
// clang -fsanitize=thread -o test test.c -lsqlite3
// For UBSAN:
// clang -fsanitize=undefined -o test test.c -lsqlite3

Code Styling

Another exciting feature of CoPilot is its ability to mimic your code style. Instead of receiving back boilerplate code, it will attempt to format its suggested code to fit your styling, for instance, spacing, phrasing, variable naming conventions, etc. I don’t have a clear example of this working since, when it works, it simply looks and feels like your own code.

Code Describe

As of writing, the CoPilot nightly build provides the ability to ask the AI questions about the code snippet. The reverse query ability inherently flips the concept on its head and allows for general questions about the code. As is commonplace for modern AI, the developers don’t know what questions you can ask, which they freely discuss on their feedback page. Here are the current default prompts:

Explain Code
- Explain code tends to give a line-by-line literal explanation of what is happening
Code does the following / Code does the following (English)
- Both of these appear to try to coax the AI into summarizing code
Show Example Code
- This will ask the AI to create workable example code for calling the given function or utilizing the given code

As of writing, this capability is very new and often is just wrong or returns nonsense. Here’s an example:

def test(i: int):
    if i > 5:
        name = input("What is your name?")
        print("Hello, " + name)
    else:
        raise Exception("i is not greater than 5")

When I ask CoPilot to “Explain Code,” I get the following:

""" Here is the explanation for the code above: 1. Test is a function that
takes in an integer as a parameter. 2. If the integer is greater than 5, the
following happens: * The input function is used to ask the user to input their
name. * The name variable is then assigned to the inputted name. * The print
function is used to print the name. 3. If the integer is less than or equal to
5, the following happens: * An exception is raised. """

""" Here is the explanation for the code above: 1. The exception is raised if i
is not greater than 5. 2. The code will then ask the user to enter a name. 3.
The exception will be raised again if the user doesn't enter a name. 4. The
code will then print a greeting to the user. """

""" Here is the explanation for the code above: 1. We are using a try block,
which is similar to a catch block in Java. 2. We are using a raise Exception
statement, which throws an exception. 3. We are using a finally block, which is
similar to a finally block in Java. 4. We are using a finally block to print
out a message when the exception is raised. 5. We are using a finally block to
print out a message when the exception is not raised. """

Note that I re-ran the same query and received drastically different answers. Sometimes the answers are wholly incorrect, sometimes partially incorrect, and sometimes they make sense. Because of this variability, and because it’s new technology that is still undergoing work, I am leaving that part of the review here.

Unexpected Completions

CoPilot attempts to complete what you’re typing even when you don’t expect it. It seems to do worse when given less context. For instance, here are two opening comment blocks that CoPilot suggested in otherwise empty files:

// Copyright (c) 2015, 2017, 2019, The Regents of the University of California.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
// * Redistributions of source code must retain the above copyright notice, this
//   list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above copyright notice,
//   this list of conditions and the following disclaimer in the documentation
//   and/or other materials provided with the distribution.
// * Neither the name of the copyright holder nor the names of its contributors
//   may be used to endorse or promote products derived from this software
//   without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# This file is part of the Metasploit Framework and may be subject to
# redistribution and commercial restrictions. Please see the Metasploit
# Framework web site for more information on licensing and terms of use.
# http://metasploit.com/framework/

These appear to be from source code directly ingested by the model. While these specific examples are not an issue, they point to concerns about ownership of the code in AI-generated recommendations. Similar concerns have been raised regarding CoPilot generating secure keys, logins, etc. The official CoPilot response has been to acknowledge that it’s possible while maintaining that it’s unlikely to occur. My take on this is that if it’s possible but unlikely, someone will create a way to make it more likely. With that said, it’s not clear to me what the fix would be. Perhaps they need to block specific privacy-related terms or check the output they send for indicators.

Common Errors

While CoPilot returned accurate code recommendations most of the time, about 10% of the time I would get code that had issues. As previously noted, sometimes the library had changed, and the code wouldn’t run. The more frustrating errors were where the code would run but not do so correctly. For instance, I found on multiple occasions that the bug was due to CoPilot confusing “less than” with “greater than,” even when the comparison was explicitly written in the comments. I noticed a few off-by-one errors. I also saw a strange propensity for CoPilot to confuse multiplication with addition, again even when I explicitly mentioned multiplication.

CoPilot also tended to leverage the existing code in the file, even when it was not what I wanted. For instance, I wrote a simple split function to help break apart lines by a delimiter. However, I wanted to use regular expressions instead in some cases, as it would be a more straightforward and elegant way to parse. In those cases, I often found that it took extra work to force CoPilot to use the regular expression rather than the existing function. This led me to believe that there’s a sweet spot for the size of the codebase. With too much code, you end up getting overly specific and potentially incorrect suggestions. With too little code, your suggestions can be a bit vague.

The errors mentioned above, combined with the amount of code generated automatically, meant I ended up spending a fair amount of my time debugging code I didn’t write rather than writing it correctly the first time.

Security Implications

In my experience, CoPilot has issues regarding bounds checks in C, as well as off-by-one and signedness confusions. Because CoPilot learned from public code, anything it generates can potentially contain any of the known vulnerability classes found in that code. Here’s an example where I only provided the comments and let CoPilot create the code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Structure to hold a name, and age
typedef struct {
    char name[20];
    int age;
} Person;

// Function: main
// Description: Prompts the user for a name and age, and then creates the
// Person and prints it out.
int main(int argc, char *argv[]) {
    Person p;
    printf("Enter a name: ");
    scanf("%s", p.name);
    printf("Enter an age: ");
    scanf("%d", &p.age);
    printf("You entered: %s, %d\n", p.name, p.age);
    return 0;
}

Note that when I create the Person structure, CoPilot decides the name only needs 20 characters. However, when I ask CoPilot to fill the structure from input, it builds a classic buffer overflow by using scanf with a bare %s conversion, which reads an arbitrary amount of data into the fixed-size buffer. It should also be noted that it didn’t zero-initialize the Person structure on the stack. You can sometimes avoid cases like this by using global variables or preprocessor #define constants, or by making your comments more explicit. Here’s a re-write of the same code when I use a #define for the buffer size:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SIZE 64

// Structure to hold a name, and age
typedef struct {
    char name[NAME_SIZE];
    int age;
} Person;

// Function: main
// Description: Prompts the user for a name of max length NAME_SIZE and age,
// and then creates the Person and prints it out.
int main(void) {
    Person person;
    printf("Enter a name: ");
    scanf("%s", person.name);
    printf("Enter an age: ");
    scanf("%d", &person.age);
    printf("You entered: %s, %d\n", person.name, person.age);
    return EXIT_SUCCESS;
}

Ironically, in this case, CoPilot still didn’t understand the issue with scanf, even after prompting. Here I pushed CoPilot to use fgets instead:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SIZE 64

// Structure to hold a name, and age
typedef struct {
    char name[NAME_SIZE];
    int age;
} Person;

// Function: main
// Description: Uses fgets to prompt the user for a name and age,
// and then creates the Person and prints it out.
int main(void) {
    Person p;
    printf("Enter a name: ");
    fgets(p.name, NAME_SIZE, stdin);
    printf("Enter an age: ");
    scanf("%d", &p.age);
    printf("You entered: %s%d\n", p.name, p.age);
    return 0;
}

These types of issues arise a lot, so it’s vital that whoever is utilizing CoPilot is aware of potential security issues. I provided feedback to the development team on this and was informed that this is expected, and they are generally trying to make CoPilot produce safer code. These security issues are one reason I am glad CoPilot is not out of preview yet.
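For comparison, here is a minimal sketch of bounded input handling that avoids scanf entirely. The read_line and parse_age helpers are hypothetical names of my own, not something CoPilot produced, and the sketch assumes line-oriented input and a non-negative decimal age:

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Read one line into buf (at most size - 1 characters) and strip the newline.
// If the line is longer than the buffer, the remainder is discarded so it does
// not leak into the next read. Returns 0 on success, -1 on EOF or error.
int read_line(FILE *in, char *buf, size_t size) {
    if (size == 0 || size > INT_MAX || fgets(buf, (int)size, in) == NULL) {
        return -1;
    }
    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';
    } else {
        // No newline stored: the line was truncated or the input ended.
        int c;
        while ((c = fgetc(in)) != '\n' && c != EOF) {
            // skip the rest of the over-long line
        }
    }
    return 0;
}

// Parse a non-negative decimal integer from text into *age.
// Returns 0 on success, -1 if text is empty, has trailing characters,
// is negative, or does not fit in an int.
int parse_age(const char *text, int *age) {
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE ||
        value < 0 || value > INT_MAX) {
        return -1;
    }
    *age = (int)value;
    return 0;
}
```

With these helpers, the name would be read with read_line(stdin, p.name, sizeof p.name), and the age would be read as a line and then passed through parse_age, so malformed or over-long input is rejected or truncated instead of overflowing the buffer.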

Development Time

In my experience solving Advent of Code challenges, along with a random assortment of other things I attempted to code, it seems CoPilot sped up development by a substantial amount. The caveats to this are:

- You need to get familiar with how to interact with CoPilot.
- You need to have at least a baseline understanding of the language.
- The codebase needs to be a moderate or smaller size.
- You need to have a good idea of how you want to solve the given challenge.

If I had to put a number to it, I’d say CoPilot generally reduced my development time by 60% or greater.

Wishlist

CoPilot needs to support more than single-file projects. In discussions with the developers, they generally agree. However, the issue appears to be computation power and the expense of having the AI process that much more code. Whole-codebase support may be on the development list at some point, but it is not likely to be high on the list.

Re-structuring code would be fantastic. This could be from the perspective of code cleanup but could also be from additional or changed functionality. One such situation I found myself in a lot was when I wanted to make a minor tweak to a function but could only accomplish this by modifying the code myself. Here’s an example switch statement:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Function: menu
// Purpose:  Prompts the user for an integer and performs the corresponding
//           action
//           1 == print "Hello"
//           2 == print "Goodbye"
void menu(void)
{
    int choice;
    printf("\n\n");
    printf("1. Print \"Hello\"\n");
    printf("2. Print \"Goodbye\"\n");
    printf("3. Exit\n");
    printf("\n");
    printf("Enter your choice: ");
    scanf("%d", &choice);
    printf("\n");
    switch (choice)
    {
        case 1:
            printf("Hello\n");
            break;
        case 2:
            printf("Goodbye\n");
            break;
        case 3:
            printf("Exiting...\n");
            exit(0);
        default:
            printf("Invalid choice\n");
            break;
    }
}

Now, I want to add a third option that prints “Aloha.” Of course, I can do this trivially myself in this case, but I would like CoPilot to do it instead. Here’s a full re-write with the updated option:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Function: menu
// Purpose:  Prompts the user for an integer and performs the corresponding
//           action
//           1 == print "Hello"
//           2 == print "Goodbye"
//           3 == print "Aloha"
//           4 == exit
void menu(void)
{
    int choice;
    printf("\n\n");
    printf("1. Print \"Hello\"\n");
    printf("2. Print \"Goodbye\"\n");
    printf("3. Print \"Aloha\"\n");
    printf("4. Exit\n");
    printf("\n");
    printf("Enter your choice: ");
    scanf("%d", &choice);
    printf("\n");
    switch (choice)
    {
    case 1:
        printf("Hello\n");
        break;
    case 2:
        printf("Goodbye\n");
        break;
    case 3:
        printf("Aloha\n");
        break;
    case 4:
        printf("Exiting...\n");
        exit(0);
    default:
        printf("Invalid choice\n");
        break;
    }
}

While this case is trivial, it’s easy to see where this type of statement may be in the middle of a function somewhere, and to have it re-written without touching the rest of the function would be beneficial.

Another wish list item would include the AI model updating regularly. Regular model updates would help with libraries that change and more modern code suggestions.

One thing that became very frustrating while learning and using CoPilot was the inability to report bad suggestions. Being able to flag a bad suggestion would help both the end-users and the developers better understand what was going on.

One final wish list item is a security focus for CoPilot. Security improvements would include not providing insecure code by default, by performing bounds checking, size checks, etc. They would also include the ability to ask the AI about potential security concerns in existing code. Finally, there should be a proactive focus on ensuring no sensitive data can be leaked from the models.

Takeaway

First, I want to say CoPilot is a remarkable technical feat. Much of the coverage I’ve read on the technology is incredibly critical and substantially negative. While I think it’s important to be critical of any new technology, it’s worth acknowledging the accomplishment. I didn’t think I would see a day where I could interact with an AI in this manner. The fact that I can has me excited to see this type of technology applied to vulnerability analysis and reverse engineering.

That said, this technology is understandably designated as a preview. It is rough around the edges, and if allowed to be used generally, would likely lead to worse code overall and more security concerns. To me, this appears to be part of the normal progression of technology, and at some point, we can expect a level of maturity that will allow this technology’s use more broadly.

GitHub CoPilot certainly has its advantages and disadvantages. The learning curve was steeper than I expected, and I didn’t know what I didn’t know when attempting to use it. Most of my frustrations with the tool revolve around specific poor recommendations (such as crypto libraries and utilities that weren’t well supported, like Frida) and debugging my code to find bugs that I didn’t write.

Will I continue to use it? In some cases, yes. If I’m building something from scratch, it would probably be helpful. However, if I’m trying to add to an existing codebase, especially a larger one, CoPilot currently feels more cumbersome than it’s worth.

I’m posting my code from Advent of Code below. At least 90% of the code was generated by CoPilot via comments. Also, I program as a hobby rather than as a profession, so please be kind.

For fun, I asked CoPilot to write its conclusion. Here’s what it gave me (note, I’m unaware of who Stefan Wehrmeyer is or why he’s credited with CoPilot):

In conclusion, CoPilot is a very easy to use, flexible and powerful tool.

I would like to thank the following people for their help and support:

- Stefan Wehrmeyer (CoPilot's author)
- Stefan Wehrmeyer (CoPilot's maintainer)

Advent of Copilot