Decode Encoded Input

Now, we'll do our actual decoding and write the decoded data into our pointer.

use std::{
    alloc::{alloc, Layout},
    str::from_utf8_unchecked,
};

pub fn decode(mut encoded_input: &[u8]) -> Vec<u8> {
    /* ... Code from above ... */
    loop {
        let mut parts = encoded_input.splitn(2, |&c| c == b'%');
        let not_escaped_part = parts.next().expect("Did not get any unescaped data");
        let rest = parts.next();

The first line here, loop {, starts a loop statement, which runs the block inside it forever, until either a break or a return happens.

Each time we run our loop, we split our encoded input on the first '%' character, splitting it into at most 2 parts (hence splitn). The first part is any data at the beginning of the encoded input before the first '%', the second part is the rest of the encoded input after the first '%', if there is any. parts is an iterator over sub-slices of type &[u8].
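To make the behavior of splitn concrete, here is a minimal sketch that isolates the split into a helper. The name split_at_percent is hypothetical and not part of the decoder; it only mirrors the two next() calls shown above.

```rust
/// Hypothetical helper (not part of the decoder): split `input` at the
/// first b'%', mirroring the `splitn(2, ...)` call in `decode`.
fn split_at_percent(input: &[u8]) -> (&[u8], Option<&[u8]>) {
    let mut parts = input.splitn(2, |&c| c == b'%');
    // With n >= 1, `splitn` always yields at least one (possibly empty) slice.
    let not_escaped_part = parts.next().expect("splitn yields at least one part");
    // `None` here means the input contained no b'%' at all.
    let rest = parts.next();
    (not_escaped_part, rest)
}
```

For example, split_at_percent(b"ab%41cd%42") gives the parts b"ab" and Some(b"41cd%42"), because only the first '%' is used as a separator. An input with no '%' gives None for the second part, while an input ending in '%' gives Some of an empty slice.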

Iterators over an Item type (in this case, &[u8]) are types that have a next() method, which returns the next Item if there is one, and None if there are no more items. The return value is an Option type, with semantics similar to Python's Optional[] and C++'s std::optional. Rust's iterators are extremely similar to Python's, and somewhat similar to C++'s in theory (they both allow you to iterate over entries of some collection), although not in implementation.
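As a small illustration of the next() protocol, the sketch below drains an iterator by hand, which is roughly what a for loop does for us. The function name collect_by_hand is made up for this example.

```rust
/// Hypothetical helper: collect every item of an iterator by calling
/// `next()` explicitly until it returns `None`.
fn collect_by_hand<'a>(mut iter: impl Iterator<Item = &'a [u8]>) -> Vec<&'a [u8]> {
    let mut items = Vec::new();
    // `next()` returns `Some(item)` while items remain, then `None`.
    while let Some(item) = iter.next() {
        items.push(item);
    }
    items
}
```

Calling collect_by_hand(b"a%b%c".split(|&c| c == b'%')) collects the three sub-slices b"a", b"b" and b"c".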

        if rest.is_none() && decoded == decoded_ptr {
            return encoded_input.to_vec();
        } else {

Next, we check to see if we have anything in rest. If not, there was no '%' left in the input. (A '%' at the very end of the input still produces a second, empty part, so rest is not None in that case.) If rest is None and we haven't written any decoded data to our output, we can return the encoded input as the result, converting the slice to a Vec (which performs an allocation).

            for not_escaped_byte in not_escaped_part {
                unsafe { *decoded_ptr = *not_escaped_byte };
                unsafe { decoded_ptr = decoded_ptr.add(1) };
            }

If we didn't return early, we always write our "not escaped" data to our output. This operation is exactly equivalent to *ptr = b; ptr++; in C; we simply need to wrap it in unsafe, because we are dereferencing a raw pointer. ++ and += are also not defined on raw pointers, but we can use the .add() method, which is itself an unsafe function and advances the pointer by the given number of elements.
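The write-then-advance pattern can be tried in isolation. The sketch below is not the decoder's code: the allocation step is an assumption on our part (the real allocation lives in the elided "Code from above"), and the name copy_with_raw_pointer is hypothetical.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};

/// Hypothetical helper: copy `src` into a freshly allocated buffer by
/// writing through a raw pointer, one byte at a time.
fn copy_with_raw_pointer(src: &[u8]) -> Vec<u8> {
    // Requesting a zero-sized allocation from `alloc` is undefined
    // behavior, so handle the empty case up front.
    if src.is_empty() {
        return Vec::new();
    }
    // Assumed allocation: exactly `src.len()` bytes with the alignment of `u8`.
    let layout = Layout::array::<u8>(src.len()).expect("layout overflow");
    let start = unsafe { alloc(layout) };
    if start.is_null() {
        handle_alloc_error(layout);
    }
    let mut cursor = start;
    for byte in src {
        // Equivalent to `*cursor = byte; cursor++;` in C.
        unsafe { *cursor = *byte };
        unsafe { cursor = cursor.add(1) };
    }
    // The buffer came from the global allocator with this exact size, and
    // all `src.len()` bytes were initialized by the loop above.
    unsafe { Vec::from_raw_parts(start, src.len(), src.len()) }
}
```

Here every write stays in bounds because the buffer is exactly as long as the input. Nothing in the language checks that for us, which is why each pointer operation sits in an unsafe block.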

            if let Some(rest) = rest {
                // A '%' was found: try to read the two bytes that follow it.
                if let Some(&[first, second]) = rest.get(0..2) {
                    if let Ok(first_val) =
                        u8::from_str_radix(unsafe { from_utf8_unchecked(&[first]) }, 16)
                    {
                        if let Ok(second_val) =
                            u8::from_str_radix(unsafe { from_utf8_unchecked(&[second]) }, 16)
                        {
                            // Both bytes parsed as hex: emit the decoded byte.
                            unsafe { *decoded_ptr = (first_val << 4) | second_val };
                            unsafe { decoded_ptr = decoded_ptr.add(1) };
                            encoded_input = &rest[2..];
                        } else {
                            // Only the first byte parsed: emit '%' and that byte literally.
                            unsafe { *decoded_ptr = b'%' };
                            unsafe { decoded_ptr = decoded_ptr.add(1) };
                            unsafe { *decoded_ptr = first };
                            unsafe { decoded_ptr = decoded_ptr.add(1) };
                            encoded_input = &rest[1..];
                        }
                    } else {
                        // The first byte did not parse: emit '%' literally and continue after it.
                        unsafe { *decoded_ptr = b'%' };
                        unsafe { decoded_ptr = decoded_ptr.add(1) };
                        encoded_input = rest;
                    }
                } else {
                    // Fewer than two bytes follow the '%': emit '%' and the remainder literally.
                    unsafe { *decoded_ptr = b'%' };
                    unsafe { decoded_ptr = decoded_ptr.add(1) };

                    for rest_byte in rest {
                        unsafe { *decoded_ptr = *rest_byte };
                        unsafe { decoded_ptr = decoded_ptr.add(1) };
                    }

                    break;
                }
            } else {
                // No '%' remained, so all of the input has been consumed.
                break;
            }
        }
    }

    // Take ownership of the output buffer as a Vec.
    unsafe { Vec::from_raw_parts(decoded, decoded_len, decoded_len) }
}

The rest of the code is somewhat difficult to break up coherently. There are only a few important new concepts, however.
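Two of those concepts, the if let with a slice pattern and the nibble arithmetic, can be seen in a small safe sketch. The helper decode_hex_pair is hypothetical, and it swaps the from_str_radix and from_utf8_unchecked calls used above for char::to_digit, so it illustrates the idea rather than reproducing the decoder.

```rust
/// Hypothetical safe helper: decode the two hex digits that follow a b'%'.
fn decode_hex_pair(rest: &[u8]) -> Option<u8> {
    // `get(0..2)` returns `None` when fewer than two bytes remain; the
    // slice pattern then binds the two bytes by value.
    if let Some(&[first, second]) = rest.get(0..2) {
        // `to_digit(16)` returns `None` for anything that is not a hex digit.
        let high = (first as char).to_digit(16)?;
        let low = (second as char).to_digit(16)?;
        // The first digit is the high nibble, the second is the low nibble.
        Some(((high << 4) | low) as u8)
    } else {
        None
    }
}
```

For example, decode_hex_pair(b"41cd") returns Some(b'A'), since 0x4 shifted left by four bits and combined with 0x1 is 0x41. Inputs such as b"4" or b"zz" return None.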

