Buffer overflow in python

Contents

Do high level languages allow for buffer / heap overflow?
Python TCP buffer overflow
buffer overflow in python script

Do high level languages allow for buffer / heap overflow?

I’m learning about basic x86 overflows in C, but normally I code in Python. Is there any way that programs written in higher-level languages can suffer from buffer/heap overflows?

Is this question specific to Python? Every language implements its own behaviors and syntactic sugar. This specific answer is not a problem for PHP nor JS, and will simply behave as desired.

@CaffeineAddiction That’s an answer, but if you had posted it I would have voted it down, because a memory leak or simply allocating too much memory is not the same thing as accessing memory out of bounds, which is the general meaning of a heap / buffer overflow.

6 Answers

Overflows don’t occur in a language, they occur in a process. Specifically, a «buffer overflow» occurs when memory is allocated on the stack and the program writes outside that memory and into following memory.

Even in a language like Python or C#, such things could happen in theory. However, the runtimes those languages are based on will ensure that most of these scenarios don’t happen. Consider the following Python code:

    cars = ["Ford", "Volvo", "BMW"]
    cars[3] = "Mazda"

This will print the following error:

    Traceback (most recent call last):
      File "main.py", line 2, in <module>
        cars[3] = "Mazda"
    IndexError: list assignment index out of range

So instead of just overwriting some memory, the runtime caught that cars only had three elements and writing to a fourth element is therefore forbidden.
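The same check can be observed from inside a program rather than as an uncaught traceback. This is only a minimal sketch; the helper name safe_assign is made up for the illustration and is not part of the original answer.

```python
def safe_assign(items, index, value):
    """Try items[index] = value and report whether the runtime allowed it."""
    try:
        items[index] = value
        return True
    except IndexError:
        # The interpreter compared the index with len(items) before writing,
        # so no memory outside the list was touched.
        return False


cars = ["Ford", "Volvo", "BMW"]
in_bounds = safe_assign(cars, 2, "Audi")       # valid index, the list is modified
out_of_bounds = safe_assign(cars, 3, "Mazda")  # rejected, the list is unchanged
```

The out-of-range write fails cleanly and the list keeps its three elements, which is the behaviour described above.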


That makes it seem like overflows are impossible, right? Well, not exactly. The Python runtime itself is just a process and thus susceptible to all kinds of vulnerabilities, including buffer overflows.

For example, CVE-2021-3177 was disclosed in early 2021 and has the following summary:

Python 3.x through 3.9.1 has a buffer overflow in PyCArg_repr in _ctypes/callproc.c, which may lead to remote code execution in certain Python applications that accept floating-point numbers as untrusted input, as demonstrated by a 1e300 argument to c_double.from_param. This occurs because sprintf is used unsafely.

Now, how to interpret this is a matter of semantics. One could say Python is vulnerable to overflows, because you could write a Python program that causes a buffer overflow. Or you could say it’s not vulnerable to overflows, because the overflow itself actually occurred in a C program, which just happens to be interpreting Python code.
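For reference, the call named in the CVE summary looks like the sketch below. On a patched interpreter it is harmless and just produces a short repr string. The version guard is a conservative assumption added for this illustration (every 3.10+ release shipped after the fix), so the call is simply skipped on older interpreters that might be unpatched.

```python
import sys
from ctypes import c_double


def repr_of_double_param(value):
    """Return repr(c_double.from_param(value)), or None on older interpreters."""
    if sys.version_info < (3, 10):
        # Assumed cutoff: skip interpreters that could predate the fix.
        return None
    # Building the repr is the step that reached the unsafe sprintf
    # in vulnerable versions.
    return repr(c_double.from_param(value))


result = repr_of_double_param(1e300)
```

Note that the Python code itself contains nothing unusual; the overflow lived entirely in the C code that formatted the value.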

High-level languages generally guard against such vulnerabilities, but the underlying runtimes of these programs are still vulnerable.

TL;DR usually appears at the start of the answer; most people scanning long answers look for a summary at the end.

Your TL;DR defines «buffer overflow» too narrowly. Not all buffer overruns have to be stack-smashing attacks. Overrunning into global variables or heap allocations is also possible, and while this is less useful for remote code execution, it may create other vulnerabilities.

It’s worth understanding that Python is a Turing-complete language. As such, you can use Python to simulate an entire computer, simulate C running on it, and so simulate a buffer overflow. The Python language protects you from buffer overflows when you use its abstractions that are designed to prevent them. There will always be a way to reach past those abstractions and shoot yourself in the foot, mostly by creating your own abstractions.
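A toy version of that idea, as a sketch only: a bytearray stands in for a machine's flat memory, and a hand-written copy routine ignores the size of the simulated buffer. The layout (an 8-byte buffer followed by a one-byte flag) and all names are invented for the illustration.

```python
# A toy "machine": 16 bytes of flat memory with no notion of object boundaries.
memory = bytearray(16)
BUFFER_START, BUFFER_SIZE = 0, 8   # simulated 8-byte buffer
FLAG_ADDR = 8                      # simulated variable stored right after it


def unchecked_copy(mem, dest, data):
    """Copy data into mem starting at dest without checking BUFFER_SIZE."""
    for offset, byte in enumerate(data):
        mem[dest + offset] = byte


memory[FLAG_ADDR] = 0                            # the flag starts cleared
unchecked_copy(memory, BUFFER_START, b"A" * 9)   # 9 bytes into an 8-byte buffer
flag_after = memory[FLAG_ADDR]                   # 0x41: the flag was overwritten
```

The real bytearray is still bounds-checked by Python (writing at index 16 would raise IndexError); only the simulated buffer inside it overflows, because the home-made abstraction has no checks of its own.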

I think that there are some issues with the terminology in the answer. Not sure if you can call an implementation of a language the same thing as a language. Buffer overflows can happen on a machine, e.g. in the kernel, before a higher level concept such as a «process» is defined. Similarly, I would not say that a library implemented in C is a «program». Of course I do agree with the general gist, so enough for an upvote 🙂

@MaartenBodewes Ultimately, the question boils down to semantics. But I think the answer as is gives a «good enough» insight to be helpful.

The «level» of a programming language is not a particularly well-defined concept.

C++, for example, would generally be regarded as a higher-level language than C, but it still leaves the user open to the same memory safety problems, including buffer overflows, that C suffers from.

Python, on the other hand, does try to protect its users from such mistakes. The regular Python programmer never sees a raw pointer, only reference-counted object references, and the standard Python collection objects are protected against overflows.

Still, there are ways to create buffer overflows in Python. In particular, there is a module in the standard library called «ctypes». The intended use of this module is to allow interoperability with C code, but in order to do so it must provide mechanisms to work with C-style raw pointers, which cannot have bounds checking applied.

For example, I was able to produce a segmentation fault with the following Python code.

    from ctypes import *
    pointer(c_char(b'a'))[10000000]

That is a contrived example, but because Python is a relatively slow language, most Python code calls into code written in other languages, most commonly C and C++, to do the «heavy lifting». Buffer overflows can happen either in the C and C++ libraries themselves or in the glue code (which may be written in either C or Python) that interfaces between Python and C.

In an extreme case, hastily written glue code could even return something like a ctypes pointer object to the end user’s Python code.
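The difference between a bounds-checked ctypes array and a raw pointer to the same memory can be sketched without crashing the interpreter. The buffer contents and variable names below are arbitrary choices for the illustration, and the out-of-bounds pointer access is deliberately left as a comment.

```python
from ctypes import POINTER, c_char, cast, create_string_buffer

buf = create_string_buffer(b"abc", 4)   # 4-byte array: b"abc" plus a NUL byte

# Indexing the ctypes array object is bounds-checked by ctypes itself.
try:
    buf[10]
    array_checked = False
except IndexError:
    array_checked = True

# A raw pointer to the same memory carries no length information.
ptr = cast(buf, POINTER(c_char))
in_bounds_byte = ptr[2]                 # still inside the 4-byte allocation
# ptr[10] would read past the allocation with no check at all,
# so it is intentionally not executed here.
```

Glue code that hands out the pointer rather than the array gives up the only check that stood between the caller and the raw memory.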

Even then, C used to be considered a High-Level language as opposed to writing code in assembler directly (which was not uncommon back then when C was developed).

tl;dr: (most) high-level languages specifically protect you from that, but a very rare bug could make those protections fail.

High-level languages are (generally) designed not to allow intentional or unintentional buffer/heap overflows (among other things that could represent vulnerabilities or could lead a programmer to introduce a bug). But in the end there is always some low-level language involved, so the only thing protecting you is the high-level language’s design.

This is because these high-level languages end up compiling, transpiling or translating your code to a lower-level language that does allow buffer/heap overflows. Alternatively, your code (as is, or compiled, transpiled or translated to another high-level language that also doesn’t allow buffer/heap overflows) ends up running in another program, such as the JVM for Java, that is written in a language that does allow them. Either way, at some point something is running at a low enough level to allow this.

We typically use very well-tested tools for 99.9% of what we do with 99.9% of high-level languages, but nobody can guarantee that there is not a 0-day vulnerability in one of these tools that could allow you or a malicious actor to create a buffer/heap overflow, against the language’s specific design and intent. This risk increases if you use less-tested tools (for whatever reason, though in my experience this is VERY uncommon), but it shouldn’t be significant.

Note: I wrote this as if tools like the JVM or the Java compiler were a single tool that’s part of the Java language (and analogously for other high-level languages). This is technically not true: there are many distributions of these tools, and none of them is part of the language itself. They are, however, part of the language ecosystem and are (almost always) needed for the language to be used and useful. A vulnerability in the Java compiler and/or the JVM could lead to a buffer/heap overflow, and technically it wouldn’t be in the Java language itself, but I think you were asking in general, not caring about the specific distinction between the language and the tools that make it useful.


Python TCP buffer overflow

I have a client-server setup and wrote the following server to handle incoming messages, but if a message is bigger than the buffer, it is lost. How can I receive whole messages when they are bigger than the buffer size? Is that possible, or do I have to force the client to keep each message within the buffer size (for example by sending a message at the beginning with the maximum buffer size)?

    msg = ''
    while True:
        msg += server.recv(20480)
        aSplit = msg.partition("")
        # We received the full message
        while aSplit[1] == "":
            messagehandler(aSplit[0] + "")
            msg = aSplit[2]
            aSplit = msg.partition("")

1 Answer

When dealing with any kind of packetised message format, you only really have two choices:

Make sure your buffer is big enough to deal with the whole message.
Write your code so that it can parse partial messages.

When I say «buffer», though, I don’t mean the parameter for recv() — you can make that as small as you like, and just go around your while loop multiple times until you have a whole message.

So, to take the buffering approach you could do something like this:

    msg = ''
    while True:
        msg += server.recv(8192)
        while True:
            aSplit = msg.partition("")
            if not aSplit[1]:
                break
            messagehandler(aSplit[0] + "")
            msg = aSplit[2]

This works because if the separator isn’t found, partition() still returns a 3-tuple where the first item is the entire string and the other two are empty. So, as long as partition() returns a non-empty string for the separator, a packet has been found. As soon as that’s empty, there’s a partial packet in msg (or it’s empty), so we go back to reading from the network until we get an entire packet again.
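The two cases of partition() are easy to check in isolation. The "|" separator here is purely hypothetical, chosen for the illustration; it is not the separator used in the original code.

```python
SEPARATOR = "|"   # hypothetical separator, only for this illustration

complete = "first|second|par".partition(SEPARATOR)
partial = "par".partition(SEPARATOR)

# complete == ("first", "|", "second|par"): one message found, the rest stays buffered
# partial  == ("par", "", ""): the separator slot is empty, so more data is needed
```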

This does involve buffering up the whole message in the msg string, but that’s fine unless you expect these messages to get very large (multiple megabytes) — this could happen if the messages include large files, for example. In this case you need to be more clever and do something like swap data out to disk or process the data as you receive it.

Let me know if I wasn’t clear about any of that.

EDIT: I should add, it’s generally a good idea to make sure the buffer (i.e. msg) doesn’t get too large; if it does, you need to close the connection because something’s gone wrong. This stops something feeding the application endless data until the memory runs out on the system, either accidentally or maliciously. Also, you need to be very sure that the separator string can’t actually occur inside the message, since that would split the message in half incorrectly.
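Putting the buffering loop and the size limit together might look like the sketch below. It assumes Python 3 (where recv() returns bytes), a newline as the message terminator and a 1 MiB cap; DELIMITER, MAX_BUFFER, feed and serve are all names and values invented for the illustration, not taken from the original post.

```python
DELIMITER = b"\n"           # assumed message terminator
MAX_BUFFER = 1024 * 1024    # assumed 1 MiB cap on unparsed data


def feed(buffer, chunk):
    """Append chunk to buffer; return (complete_messages, remaining_buffer).

    Raises ValueError if the unparsed remainder grows beyond MAX_BUFFER.
    """
    buffer += chunk
    messages = []
    while True:
        head, sep, tail = buffer.partition(DELIMITER)
        if not sep:             # no delimiter left: head is a partial message
            break
        messages.append(head)
        buffer = tail
    if len(buffer) > MAX_BUFFER:
        raise ValueError("peer sent too much data without a delimiter")
    return messages, buffer


def serve(conn, handler):
    """Read from a connected socket until the peer closes, dispatching messages."""
    buffer = b""
    while True:
        chunk = conn.recv(8192)
        if not chunk:           # empty bytes object: the peer closed the connection
            break
        try:
            messages, buffer = feed(buffer, chunk)
        except ValueError:
            conn.close()        # something has gone wrong, drop the connection
            break
        for message in messages:
            handler(message)
```

Keeping the parsing in feed(), separate from the socket loop, makes the splitting logic testable without a network connection; for example, feed(b"", b"one\ntw") yields the message b"one" and leaves b"tw" buffered.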


buffer overflow in python script

I am developing a script to perform a buffer overflow for an assignment in school. However, I am stuck at a point where my payload works when injected through the command line, but not when injected through my Python script. When I inject my payload from the command line:

    user@ubuntu:~/Documents/$ /home/user/Documents/easy $(python -c 'print "AAAAAAAAAAAAAA\xa0\xf4\xff\xbf"')
    $ exit    // I get the shell.

The trailing bytes \xa0\xf4\xff\xbf are the address of my NOP sled in an environment variable. Now, I run exactly the same command through my Python script:

    import os

    path = "/home/dvddaver/Documents/easy AAAAAAAAAAAAAA\xa0\xf4\xff\xbf"
    os.system(path)

    user@ubuntu:~/Documents$ python bruteforcer.py
    Segmentation fault (core dumped)
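For readers unfamiliar with the escaped bytes at the end of the payload: they are a 32-bit address written in little-endian byte order, as used on x86. A small sketch with the struct module shows the conversion in both directions; it assumes Python 3 bytes objects, and the variable names are made up for the illustration.

```python
import struct

address_bytes = b"\xa0\xf4\xff\xbf"

# "<I" means little-endian unsigned 32-bit integer.
(address,) = struct.unpack("<I", address_bytes)

# Rebuild the argument from the question: filler bytes followed by the address.
filler = b"AAAAAAAAAAAAAA"
payload = filler + struct.pack("<I", address)
```

Here hex(address) gives 0xbffff4a0, a typical stack-region address on 32-bit Linux. Building the payload as bytes keeps the four address bytes exactly as written, independent of any text encoding.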

2 Answers

I understand it is a bit late for your assignment 😉 but for other students who may be groping in the dark with a similar problem, here goes.

Python is written in C, and the C executable is throwing the segmentation fault error. To understand the segmentation fault you need to run Python itself in gdb (the GNU Debugger, assuming you are on Linux/Unix), pass in your script as the parameter, and then step through the C code written for Python.

It is quite possible that you have caused a buffer overflow within the Python interpreter, and that this caused the segmentation fault (so that’s good!), though I cannot say for sure in the case you are executing here.

I have studied one of the vulnerabilities of Python in detail and blogged about it. It affects older versions of Python 2 and 3. I downloaded the Python source and built the source with debugging on.

Next I figured out how Python works and executed Python scripts on the built interpreter and stepped through them in GDB.

I have uploaded my work on my blog (http://yazadk.wordpress.com/2014/12/10/remotely-exploitable-buffer-overflow-in-python/). Mind you, we were specifically looking for C programs with a buffer overflow available in the wild, and I happened to choose Python, which is written in C.

I hope this gives you some basic insight in what you or anyone else is trying to achieve.
