Tor Internals
modified: 2015-05-11 17:58:43

This is my scratch pad, for the moment, about the internals of Tor.


The topology of the source:


Each circuit maintains the encryption keys, therefore I assume all streams under that circuit are encrypted with the same key.

A cell is a fixed-size message used to communicate between nodes. Its payload is always CELL_PAYLOAD_SIZE (509) bytes long. The entire cell payload is encrypted, which makes it impossible to trim it to the actual size of the data it carries, and the encryption should render compression useless.
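To make the fixed size concrete, here is a minimal Python sketch of packing a cell. It is a model, not Tor's C code. It assumes the older link-protocol layout of a 2-byte circuit ID, a 1-byte command and a 509-byte payload; `pack_cell` and `unpack_cell` are hypothetical helper names.

```python
import struct

CELL_PAYLOAD_SIZE = 509              # payload bytes carried by every cell
CELL_HEADER = struct.Struct("!HB")   # assumed layout: 2-byte circuit ID, 1-byte command
CELL_RELAY = 3                       # command value for relay cells


def pack_cell(circ_id: int, command: int, data: bytes) -> bytes:
    """Pack data into a fixed-size cell, zero-padding the payload."""
    if len(data) > CELL_PAYLOAD_SIZE:
        raise ValueError("data does not fit in one cell")
    payload = data.ljust(CELL_PAYLOAD_SIZE, b"\x00")
    return CELL_HEADER.pack(circ_id, command) + payload


def unpack_cell(cell: bytes):
    """Split a cell into (circ_id, command, payload)."""
    circ_id, command = CELL_HEADER.unpack_from(cell)
    return circ_id, command, cell[CELL_HEADER.size:]


# A 2-byte message and a full payload produce cells of the same length.
short = pack_cell(7, CELL_RELAY, b"hi")
full = pack_cell(7, CELL_RELAY, b"x" * CELL_PAYLOAD_SIZE)
```

An observer on the wire sees the same length for `short` and `full`, which is the point of the padding.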

A relay cell is a cell that contains data. The data is from a stream which can be coming from an application (connected to Tor) or from a destination headed back to the application. This stream is associated with a circuit. The circuit is associated with a channel, which resides on a connection.

When talking about direction as in or out, we mean from the application proxy's point of view (CONN_TYPE_AP). If the data's direction is out then it is headed towards the destination, and in means it is headed back towards the application. You may also find the word exitward, which means towards the exit and away from the AP (application proxy).

The AP is just the connection to the application, although it likely involves SOCKS or some other kind of interface. From our perspective it still results in a stream of bytes that gets routed to and out of an exit relay.

The words relay and node are used interchangeably.

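The encryption is applied in layers, one per hop. For data headed from the AP towards the exit, all layers are applied at the origin and one layer is peeled off at each hop as the cell moves towards the exit. From the exit towards the AP, a layer is laid back on at each hop and the origin removes them all.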
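The layering can be sketched in Python as below. This is a toy model under loud assumptions: the keystream is built from SHA-256 as a stand-in for the real stream cipher, the per-hop keys are made up, and `keystream` and `crypt` are hypothetical helpers, not Tor functions.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter (stand-in for a real stream cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice with the same key undoes it."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Hypothetical per-hop keys shared between the origin and each relay.
hop_keys = [b"guard-key", b"middle-key", b"exit-key"]
payload = b"GET / HTTP/1.0"

# Outbound: the origin adds one layer per hop (the exit's layer is innermost),
# then each relay removes its own layer as the cell travels exitward.
cell = payload
for key in reversed(hop_keys):
    cell = crypt(key, cell)
wrapped = cell
for key in hop_keys:
    cell = crypt(key, cell)
outbound_result = cell

# Inbound: each relay adds its layer on the way back (exit first),
# then the origin removes all of them.
cell = payload
for key in reversed(hop_keys):
    cell = crypt(key, cell)
for key in hop_keys:
    cell = crypt(key, cell)
inbound_result = cell
```

Because XOR layers commute, the order of the loops here is illustrative only; it shows which party adds or removes each layer, not a property the toy cipher enforces.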

The origin refers to the first relay in the path of the data. The origin should (needs verification) be the AP (application proxy) connection.


    p_chan              - headed towards AP
    n_chan              - headed towards EXIT
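As a small illustration of the two channel fields, the Python sketch below picks the outgoing channel from a cell's direction. The `Circuit` class, the `Direction` enum and `channel_for` are hypothetical names used only for this model; the channel values are plain strings rather than real channel objects.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    IN = "in"    # toward the AP (origin)
    OUT = "out"  # toward the exit


@dataclass
class Circuit:
    p_chan: str  # previous hop: toward the AP
    n_chan: str  # next hop: toward the exit


def channel_for(circ: Circuit, direction: Direction) -> str:
    """Return the channel a cell travelling in `direction` is sent on."""
    return circ.n_chan if direction is Direction.OUT else circ.p_chan


# A circuit as seen by a middle relay.
middle = Circuit(p_chan="chan-to-guard", n_chan="chan-to-exit")
```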

The types of connection are:

    CONN_TYPE_OR        - onion router
    CONN_TYPE_EXT_OR    - extended onion router?
    CONN_TYPE_EXIT      - exit (destination)
    CONN_TYPE_AP        - application (source)
    CONN_TYPE_DIR       - directory?
    CONN_TYPE_CONTROL   - control?

There exist two major execution paths for data arrival. The first is data arriving from the onion router network and the second is data arriving from the application proxy or exit. Each is handled a little differently but both paths converge at specific points of functionality.

The path of data from a source or destination external to Tor (AP or EXIT). This is the execution path for data that arrives from an application (through SOCKS) or from an exit connection. An exit connection is the actual connection from the exit node to the destination specified by the application. The execution path for data arriving from the onion router is only outlined here:

    connection_handle_read_impl (connection.c)
        # not a whole lot happens here; we handle linked connections directly; handle listeners;
        # and edge connections
        connection_handle_listener_read (connection.c)
            # handles all the listeners (OR, AP, CONTROL, ...)
            connection_add (main.h -- macro indirection)
            connection_init_accepted_conn (connection.c)
                connection_tls_start_handshake (connection_or.c)
                    # this can handle either end of the handshake (but in this case we are handling
                    # a connection receive)
                    connection_tls_continue_handshake (connection_or.c)
                        tor_tls_handshake ()

        connection_process_inbuf (connection.c)
            [CONN_TYPE_OR || ???]
                # this is data coming off the Tor network, which may go back onto it, or it
                # may go out an AP or EXIT connection; it needs to be decrypted
                connection_or_process_inbuf (connection_or.c)
                    channel_tls_handle_cell (channeltls.c)
                        channel_queue_cell (channel.c)
                            [indirection to handler]
            [CONN_TYPE_AP || CONN_TYPE_EXIT]
                # this is data coming from the AP or EXIT connection; it needs to be encrypted
                connection_edge_process_inbuf (connection_edge.c)
                    connection_edge_package_raw_inbuf (relay.c)
                        connection_edge_send_command (relay.c)
                            relay_send_command_from_edge (relay.c)
                                [packages command and payload into cell]
                                circuit_package_relay_cell (relay.c)
                                    [encrypt it and append it to a cell queue]
                                    append_cell_to_circuit_queue (relay.c)
                                        [cell is packed and placed on appropriate channel queue for circuit]
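The branch on connection type in the listing above can be summarized with a short Python sketch. It only maps a type name to the handler named in the listing; `process_inbuf` is a hypothetical stand-in, and the other connection types are deliberately left out.

```python
def process_inbuf(conn_type: str) -> str:
    """Return the name of the handler the listing reaches for a connection type."""
    if conn_type == "CONN_TYPE_OR":
        # Cells coming off the Tor network; they need to be decrypted.
        return "connection_or_process_inbuf"
    if conn_type in ("CONN_TYPE_AP", "CONN_TYPE_EXIT"):
        # Raw bytes from an edge; they need to be packaged and encrypted.
        return "connection_edge_process_inbuf"
    # Other connection types are not covered by this sketch.
    raise ValueError("connection type not modeled: " + conn_type)
```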

The path of data from Tor's internal network. This is data that is destined to exit from the exit node and reach its final destination, or data that has just reached the guard and is about to be piped back to the application using the Tor network:

        connection_process_inbuf (connection.c)
            connection_or_process_inbuf (connection_or.c)
                connection_or_process_cells_from_inbuf (connection_or.c)
                    channel_tls_handle_cell (channeltls.c)
                        channel_queue_cell ()
                            [indirection to handler]

A handler execution path. This is the cell (data) coming off the Tor network, which may be going back into the Tor network (to the next relay), out of the exit, or out to the AP (application proxy):

        command_process_cell (command.c)
            command_process_relay_cell (command.c)
                circuit_receive_relay_cell (relay.c)
                    relay_crypt (relay.c)
                        # at this point we decrypt the cell and check whether we recognize
                        # it as fully decrypted; depending on the direction of the cell
                        # (if it is headed towards or away from AP/EXIT) we may undo multiple
                        # layers of the encryption, but ultimately we determine if it is
                        # destined for us, if we need to drop it, or forward it along the
                        # circuit
                    [IT IS RECOGNIZED (decryption revealed a valid digest and recognized field)]
                        connection_edge_process_relay_cell ()
                            # the payload will be transmitted out the EXIT or AP
                    [IT IS NOT RECOGNIZED (decryption revealed an invalid digest or recognized field)]
                        append_cell_to_circuit_queue ()
                            # the cell will continue onward to the next hop
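The recognized/not-recognized branch can be sketched in Python as follows. This is a simplified model: it assumes the relay header layout of command (1 byte), recognized (2), stream ID (2), digest (4) and length (2), and it computes the digest as a one-shot SHA-1 over a seed plus the payload, whereas Tor keeps a running hash. `toy_digest`, `build_relay_payload`, `is_recognized` and `handle` are hypothetical helpers.

```python
import hashlib
import struct

# Assumed header layout: command, recognized, stream_id, digest, length.
RELAY_HEADER = struct.Struct("!BHH4sH")
RELAY_PAYLOAD_SIZE = 509
RELAY_DATA = 2  # relay command carrying stream data


def toy_digest(seed: bytes, payload: bytes) -> bytes:
    """First 4 bytes of SHA-1 over seed + payload (simplified, not a running hash)."""
    return hashlib.sha1(seed + payload).digest()[:4]


def build_relay_payload(seed: bytes, command: int, stream_id: int, data: bytes) -> bytes:
    """Build a plaintext relay payload with recognized == 0 and a valid digest."""
    body = data.ljust(RELAY_PAYLOAD_SIZE - RELAY_HEADER.size, b"\x00")
    zeroed = RELAY_HEADER.pack(command, 0, stream_id, b"\x00" * 4, len(data)) + body
    digest = toy_digest(seed, zeroed)
    return RELAY_HEADER.pack(command, 0, stream_id, digest, len(data)) + body


def is_recognized(seed: bytes, payload: bytes) -> bool:
    """True if the payload looks fully decrypted and addressed to this hop."""
    command, recognized, stream_id, digest, length = RELAY_HEADER.unpack_from(payload)
    if recognized != 0:
        return False
    # Recompute the digest with the digest field zeroed, as when it was built.
    zeroed = (RELAY_HEADER.pack(command, 0, stream_id, b"\x00" * 4, length)
              + payload[RELAY_HEADER.size:])
    return toy_digest(seed, zeroed) == digest


def handle(seed: bytes, payload: bytes) -> str:
    """Mirror the branch in the listing: deliver locally or forward."""
    return "deliver to edge" if is_recognized(seed, payload) else "forward to next hop"


seed = b"hop-digest-seed"
plain = build_relay_payload(seed, RELAY_DATA, 1, b"hello")
# Stand-in for a cell that still has encryption layers on it.
still_encrypted = bytes(b ^ 0xFF for b in plain)
```

A payload that still carries a layer of encryption fails the check because its recognized field is not zero, so it is forwarded rather than delivered.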


What is an edge connection, exactly? (edge_connection_t)

What is a linked connection for? (see connection_handle_read_impl)

How does circuit setup happen?

Cell Commands:

    CELL_RELAY      - should hold data to and from the AP and EXIT

A handler:

    command_process_cell (command.c)
        command_process_relay_cell (command.c)
            circuit_receive_relay_cell (relay.c)
                connection_edge_process_relay_cell (relay.c)

Some terminology:

    channel         - ???
    cell            - fixed size packet of sorts between nodes; fixed size
                      helps prevent attacker from gaining knowledge about the
                      size of packets of the application layer protocol
    connection      - represents a TCP/IP connection
    stream          - stream of data over a circuit (HTTP/SMTP/.. connection)
    circuit         - multiple streams are multiplexed over a circuit
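The relationship between streams and a circuit can be illustrated with a toy Python demultiplexer. It assumes each piece of relay data carries a stream ID; the `Circuit` class and its `receive` method are hypothetical and model only the bookkeeping, not encryption or flow control.

```python
from collections import defaultdict


class Circuit:
    """Toy circuit that demultiplexes relay data by stream ID."""

    def __init__(self):
        # One reassembly buffer per stream on this circuit.
        self.streams = defaultdict(bytearray)

    def receive(self, stream_id: int, data: bytes) -> None:
        """Append data to the buffer of the stream it belongs to."""
        self.streams[stream_id] += data


circ = Circuit()
# Interleaved data from two application connections sharing one circuit.
for stream_id, chunk in [(1, b"GET "), (2, b"HELO"), (1, b"/ HTTP/1.0")]:
    circ.receive(stream_id, chunk)
```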

The token bucket (bandwidth rate limiting) implementation seems to reside in connection.c.