Both input and frame-rate are mirrored. Basically, the system first sets up a network connection, with the players' devices connected as clients to my server (I will eventually incorporate a switch to P2P). One device is the initiator (the one that created the game session).
Both devices start the core and immediately pause after the first call to flip buffers, each storing the frame number and timestamp at which the pause completed. They then do a "handshake" of sorts: the initiating device says "I'm ready!", and the other device (when ready) responds with "I'm resuming in 10 seconds from timestamp X at frame number Y". The initiating device uses this information to calculate when to resume. The two devices then enter an asynchronous communication loop, in which both send repeated messages with the following information:
1) N64 digital button states (queue of states if more than one change between messages)
2) N64 analog state (or a queue of states if the time between messages exceeds the minimum delay between snapshots)
3) Video frame number (counted as a 1-up integer) with a timestamp when that frame occurred
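To make the message contents concrete, here is a rough sketch of what one such message might look like. It is Python purely for illustration, not the actual code: the names `SyncMessage`, `encode`, and `decode`, the field layout, and the JSON wire format are all assumptions made up for this example.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Tuple


@dataclass
class SyncMessage:
    """One message in the sync loop (hypothetical layout)."""
    # 1) Queue of digital button bitmasks, oldest first (one entry per change).
    button_states: List[int] = field(default_factory=list)
    # 2) Queue of analog stick snapshots as (x, y) pairs, oldest first.
    analog_states: List[Tuple[int, int]] = field(default_factory=list)
    # 3) 1-up video frame counter and the time (in seconds) that frame occurred.
    frame_number: int = 0
    frame_timestamp: float = 0.0


def encode(msg: SyncMessage) -> bytes:
    """Serialize a message to bytes (JSON is an assumption, chosen for readability)."""
    return json.dumps(asdict(msg)).encode("utf-8")


def decode(data: bytes) -> SyncMessage:
    """Rebuild a message from bytes produced by encode()."""
    raw = json.loads(data.decode("utf-8"))
    # JSON has no tuple type, so convert the [x, y] lists back to tuples.
    raw["analog_states"] = [tuple(pair) for pair in raw["analog_states"]]
    return SyncMessage(**raw)


# Usage: two button changes and one analog snapshot since the last message.
outgoing = SyncMessage(
    button_states=[0x8000, 0x0000],
    analog_states=[(10, -20)],
    frame_number=120,
    frame_timestamp=2.0,
)
incoming = decode(encode(outgoing))
```

Sending queues rather than a single state means no input change is lost if several happen between two messages.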
Upon receiving a message, each device applies the queue of button and analog states (I still need to add a touchscreen option for picking the player number), then compares the video frame number and timestamp with its own to determine whether it is running faster or slower than the other device. If faster, emulation speed is decreased (minimum 0%). If slower, emulation speed is increased (maximum 100%). In theory, this causes a fast device to slow down when playing with a slow device (though this is not really working correctly yet).
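The faster/slower comparison could be sketched roughly as follows. Again, this is Python for illustration only and not the real implementation. It assumes both timestamps are on a shared time base (for example, seconds since the agreed resume moment), a nominal 60 frames per second, and a fixed adjustment step of 5 percentage points. The function name `adjust_speed` and all of its parameters are hypothetical.

```python
def adjust_speed(speed, local_frame, local_ts, peer_frame, peer_ts,
                 nominal_fps=60.0, step=5.0):
    """Return the new emulation speed in percent, clamped to [0, 100].

    Assumes local_ts and peer_ts are in seconds on a shared time base.
    """
    # Project the peer's frame counter forward (or back) to our own timestamp,
    # assuming it runs at the nominal frame rate in between.
    peer_frame_now = peer_frame + (local_ts - peer_ts) * nominal_fps
    # Positive lead: we are ahead of the peer. Negative: we are behind.
    lead = local_frame - peer_frame_now
    if lead > 0:
        speed -= step   # running faster than the peer, so slow down
    elif lead < 0:
        speed += step   # running slower than the peer, so speed up
    return max(0.0, min(100.0, speed))


# Usage: we are at frame 130 while the peer reported frame 120 at the same time,
# so a device running at full speed backs off by one step.
new_speed = adjust_speed(100.0, 130, 2.0, 120, 2.0)
```

A fixed step is the simplest choice. Scaling the step by the size of the lead would probably converge faster, but that is a tuning decision I haven't settled on here.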