;-*- Mode:Text -*-

Architectural Specification
Programmer's Reference

This should produce two documents: an Architectural Specification that
unambiguously defines the hardware, and a Programmer's Reference that
defines the "virtual machine" as seen from user-level compiled code.

The Architectural Specification describes every detail of the
hardware in somewhat flat terms.



Right now, everything is piled in here.

;;;;;;;;;;;;;;;;

OVERVIEW:
	29332 ALU; dual-port register ram; function-call hardware;
	LISP compiled directly into 64-bit machine instructions;
	three-address / three-opcode instructions;
	instruction and data caches; 2^26 word virtual memory space;
	hardware type-checking and GC assist; 16MB local memory;
	nubus interface.

;;;;;;;;;;;;;;;;

SUMMARY OF HARDWARE FEATURES:

ALU
	29332
	LDB and non-LDB
	instruction, control, status inputs
	status outputs
	passaround
register ram
	dual port, 4K x 33
	control and use of 33rd bit
	12-bit address from call-hardware;
	8/12-bit from indirect
call hardware
	frame select registers O, A, R, plus 4-bit immediate G
		form 12-bit register ram address
		for sources and destination.
	automatic frame management
	trap when full
	sources / destinations
		2 bits select O, A, R, G
		4 bit offset
		1 bit func / reg (right and dest only)
	value of O, A, R, G for sources is before the instruction's call-hardware
		operation; value for dest is after.
	Legal instruction sequences
		e.g. return-return is not legal. (must use tail-recursion to avoid
		  call followed by return)
virtual memory
	main memory
		16MB: 4M x 32
	addressed by VMA
		normal, early start
	data through MD
		read / write
		control logic stops or traps on access
	trap protocol for MD / VMA access
	map
		26 bit virtual address
		map to local main memory or nubus
		one level, 4K clusters
		64K x 28
		bits:
			addr (20), access, write, volatility (2), local/nubus
instruction memory
	64-bit instructions; 23- or 24-bit PC
	instruction cache
		two cache sets, 2K x 64 each; 4 word blocks
		first 4K words of instruction space is fast memory
	instruction cache is read-only
		write access through main memory; requires cache flush
	cache fill is through virtual memory map
		instruction space appears in top of virtual memory space
		filled from alternating 32-bit words
exceptions
	traps (execution of interrupted instruction is modified)
		(trap routine computes new value to be stored in dest cycle
		 of re-executed instruction)
		alu overflow
		data type
	interrupts (interrupted instruction is re-executed exactly)
		page-fault
		GC
		transport
		nubus error
		nubus interrupts
		local interrupts
		call hardware full / error
		reset
	machine control register has trap-enable bit(s).
	all exceptions vector to PC zero.
	active trap and interrupt requests read in unencoded form from func source.
	when traps are enabled, any trap request causes a trap and disables
	  traps; cause of trap request must be reset before traps are re-enabled.
functional sources
	status
	md
	trap requests
	call-hardware
	microsecond / stat counters
functional destinations
	control
	vma
	md
	map
	call-hardware
	Icache control
	microsecond / stat counters
nubus interface
	nubus access is indicated by map
software single-step trace
	execute one instruction and then trap
debug interface
	single-step execution
	test register on MFO, on M board
	IR
	VMA
	MD
	PC
booting
	Execute from boot prom.
	Prom is on main-memory bus; fills IR / Icache with same
	timing as running from main memory.
	Control-reg bit set on RESET, forces instructions to fetch
	from boot prom instead of main memory.
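
A rough sketch of the one-level map lookup summarized under "virtual memory"
above.  The 16-bit index / 10-bit offset split is inferred here from the
26-bit virtual address and the 64K-entry map; it is not stated in these
notes.  The bit positions inside the map entry, the meaning of the
local/nubus bit, and all function names are assumptions for illustration.

```python
# Sketch only: field positions within the 28-bit map entry are assumed.
VA_BITS = 26
MAP_ENTRIES = 64 * 1024                        # map is 64K x 28
INDEX_BITS = MAP_ENTRIES.bit_length() - 1      # 16
OFFSET_BITS = VA_BITS - INDEX_BITS             # 10, i.e. 1K words per entry


def split_va(va):
    """Split a 26-bit virtual word address into (map index, offset)."""
    va &= (1 << VA_BITS) - 1
    return va >> OFFSET_BITS, va & ((1 << OFFSET_BITS) - 1)


def decode_map_entry(entry):
    """Unpack the fields listed under "bits:"; this layout is hypothetical."""
    return {
        "addr": entry & 0xFFFFF,               # addr (20)
        "access": (entry >> 20) & 1,
        "write": (entry >> 21) & 1,
        "volatility": (entry >> 22) & 3,       # volatility (2)
        "local": (entry >> 24) & 1,            # assumed: 1 = local, 0 = nubus
    }


def translate(va, map_ram):
    """Return (physical word address, decoded entry) for a virtual address."""
    index, offset = split_va(va)
    fields = decode_map_entry(map_ram[index])
    return (fields["addr"] << OFFSET_BITS) | offset, fields
```

Under this reading each map entry covers 1K 32-bit words, which would match
"4K clusters" if the cluster size is counted in bytes.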

;;;;;;;;;;;;;;;;

Instructions:
	all instructions have three separate opcode fields:
		Instruction Category (ICAT): indicates how other fields are used.
		PC source (NEXTPC): selects PC for next instruction.
		Continuation (CHOP): call-hardware operation.
		Illegal combinations may cause the machine to halt or trap,
		  and must not cause damage.
	ICAT: (3 bits)
		ALU	all possible ALU chip inputs are available
		ALUI	8, 16 or 24-bit immediate data combined with ALU operation
		LOADI	32-bit immediate data; ALU op is "Y<-R"
		ADDR	23-bit address for jump or call; ALU op is "Y<-R"
		ALUX	same as ALU but with different set of ALU ops.
		ALUIX	same as ALUI but with different set of ALU ops.
	NEXTPC: (2 bits)
		IR:	All or part of PC comes from Instruction Register
		Disp:	Dispatch; PC taken from ALU output reg at end of current instruction
			(result of instruction fetched two cycles back)
			If JCOND bit 0 is 1, the low 4 bits are forced to zero.
		Ret:	Return PC from call-hardware
		PC+1:	All or part of PC is current PC + 1.
	CHOP: (3 bits)
		no-op	no call hardware operation
		open	allocate new register frame; address in open frame
		call	activate open frame and do function-call protocol
		open-call: open and call together
		t-open	tail-recursive open
		t-call	tail-recursive call
		return	function return
		;; cancel-open: undo an open or t-open.
	Specially decoded combinations:
		CALLZ	8-bit address of 16-word multiple for call to cluster zero.
		BRANCH	12-bit addr within current 4096-word page, for local jump.

Specially decoded operations:
	Certain combinations of the Instruction Opcode, PC-source and Continuation
	modify the combination of bits selected for the PC-source.

	Opcode	PC	Cont	Effect
	======	==	====	======
	any	Disp	any	Dispatch: if Jump-condition-select bit zero is 1,
				  PC bits 0-3 are forced to zero.
	CALLZ	IR	any	CallZ: PC bits 0-3 and 12-22 are forced to zero.
	BRANCH	IR	any	Branch: PC bits 8-22 are forced to select PC+1;
				  if the selected jump-status is false, PC bits
				  0-7 are also forced to PC+1.  I.E., if the jump
				  condition is true, it is branch-within-256-word-page.
				  Jump status is the condition selected by the PRECEDING
				  instruction; it is always conditional, but condition
				  "always" is available.
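
The forcing rules in the table can be sketched as follows.  This assumes a
23-bit PC (bits 0-22), the 8-bit BRANCH field in bits 0-7 and the 8-bit
CALLZ field in bits 4-11, and takes the CallZ zeroing as bits 0-3 and 12-22
(as in the PC-mux notes further down).  The function and argument names are
made up for this sketch; it is not a description of the actual PC mux logic.

```python
PC_BITS = 23
PC_MASK = (1 << PC_BITS) - 1


def next_pc(kind, pc, ir_addr=0, alu_out=0, jump_status=False, jcond_bit0=0):
    """Return the next PC for a few of the cases in the table above."""
    pc_plus_1 = (pc + 1) & PC_MASK
    if kind == "PC+1":
        return pc_plus_1
    if kind == "DISPATCH":
        target = alu_out & PC_MASK
        if jcond_bit0:
            target &= ~0xF                  # PC bits 0-3 forced to zero
        return target
    if kind == "CALLZ":
        # 8-bit address lands in PC bits 4-11; bits 0-3 and 12-22 are zero.
        return (ir_addr & 0xFF) << 4
    if kind == "BRANCH":
        if not jump_status:
            return pc_plus_1                # every bit comes from PC+1
        # Bits 8-22 come from PC+1, bits 0-7 from the instruction.
        return (pc_plus_1 & ~0xFF) | (ir_addr & 0xFF)
    raise ValueError(kind)
```

One consequence of taking bits 8-22 from PC+1 rather than PC: in this model
a taken BRANCH in the last word of a 256-word page lands in the following
page.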

Misc: (fixed)
	register-ram boxed bit control: (2 bits)
		select what value is used for boxed bit of register-ram destination
		0: 0
		1: 1
		2: boxed bit from left source
		3: boxed bit from right source
	Type-checking control: (3 bits)
		Controls data-type trap for this instruction.
		Selects what values of left and right ALU source data types and
		boxed bits will cause the instruction to be aborted (trap).
		List combinations ...
	Statistics: (1 bit)
		Bit may be selected for statistics counter, trap and/or halt
		(VERIFY that we still want it ...)
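
The boxed-bit control encoding above amounts to a four-way select.  A
minimal sketch, with a hypothetical function name:

```python
def dest_boxed_bit(control, left_boxed, right_boxed):
    """Pick the boxed bit written to the register-ram destination.

    control is the 2-bit field: 0 -> 0, 1 -> 1, 2 -> left source's
    boxed bit, 3 -> right source's boxed bit.
    """
    choices = (0, 1, left_boxed & 1, right_boxed & 1)
    return choices[control & 3]
```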

Source / Dest specification:
Destination: (7 bits) (always present)
	select functional or register-ram destination.
Left source: (6 bits)
	select ALU left-side source; always from register-ram
Right source: (7 bits)
	select ALU right-side source; functional-source or register-ram
Return-destination: (7 bits)
	function-return destination field saved by call-hardware as part
	of function-call protocol.  Selecting functional-destination D-RET as
	the destination causes the current return-destination from the
	call-hardware to be used as the instruction's destination field, in
	place of the indicated destination.
4-bit-immediate: (4 bits)
	Source / dest fields select a register-ram location or functional source
	or destination.  The 7 bit fields have a bit that selects functional vs.
	register ram; the 6 bit field always selects the register ram.
	Used as register-ram address:
		2 bits: select one of Open, Active, Return or Global register frame addresses.
		4 bits: offset within frame
		O, A and R select 8 bit registers that are combined with the 4 bit
		  offset to form a 12-bit register ram address.
		G combines the 4-bit immediate field from the instruction with the
		  4 bit offset within frame to form an 8 bit address that selects one
		  of the first 256 register ram locations.
	Use as functional source / dest:
		2 bits of frame select and 4 bits of offset combine to form
		  6-bit functional source / dest address.
	Destination value of functional-destination all-ones (3F) is used as the
		"garbage" location for instructions that don't write a normal
		dest location.
	4-bit-immediate:
		used only when G is selected in the register frame field.
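
A sketch of the address formation described above.  The notes say only that
the 8-bit frame register (or the 4-bit immediate, for G) is "combined" with
the 4-bit offset; placing the offset in the low 4 bits, the numeric encoding
of the 2-bit frame select, and all names here are assumptions.

```python
FRAME_O, FRAME_A, FRAME_R, FRAME_G = range(4)    # assumed 2-bit encoding


def register_ram_address(frame_sel, offset, frames, imm4=0):
    """Form a register-ram address from frame select and 4-bit offset.

    frames maps FRAME_O / FRAME_A / FRAME_R to the current 8-bit frame
    register values held by the call hardware.
    """
    offset &= 0xF
    if frame_sel == FRAME_G:
        # 4-bit immediate + 4-bit offset: one of the first 256 locations.
        return ((imm4 & 0xF) << 4) | offset
    # 8-bit frame register + 4-bit offset: 12-bit address.
    return ((frames[frame_sel] & 0xFF) << 4) | offset


def functional_address(frame_sel, offset):
    """Form the 6-bit functional source / dest address."""
    return ((frame_sel & 3) << 4) | (offset & 0xF)
```

With this layout, `functional_address(3, 0xF)` gives 3F, the all-ones
"garbage" destination.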

ALU operation: (9 bits)
	9 bits of 29332 ALU instruction including 2-bit byte-width select.
	For machine instructions that don't specify the ALU operation,
	the operation is forced to copy the right (MFO) source to the dest.
ALU position/width (11 bits)
	Select bit-field position and width for certain ALU operations.

Jump address: (23 bits)
	8- or 23-bit jump or call address.  If an 8-bit field is used, it is
	in bits 0-7 for BRANCH and bits 4-11 for CALLZ.

Jump condition select: (3 bits)
	Instructions that don't include a valid jump condition select
	  may not be followed by a BRANCH; the status is garbage.
	Select one of:
	Indir	From machine control register Jump-Status bit (indirect)
	Always	always true; required for unconditional jump
	C	alu status "carry"
	C-	carry inverted
	Z	alu status "zero"
	Z-	zero inverted
	C+Z	carry OR zero
	(C+Z)-	C+Z inverted
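
The eight selections can be modelled as a small lookup.  The numeric
encoding (0 = Indir through 7 = (C+Z)-, in the order listed) is an
assumption, as is the function name.

```python
def jump_status(select, carry, zero, indirect=False):
    """Evaluate the 3-bit jump condition select.

    carry / zero are the ALU status bits; indirect is the Jump-Status
    bit from the machine control register.
    """
    carry, zero = bool(carry), bool(zero)
    table = [
        bool(indirect),         # Indir
        True,                   # Always
        carry,                  # C
        not carry,              # C-
        zero,                   # Z
        not zero,               # Z-
        carry or zero,          # C+Z
        not (carry or zero),    # (C+Z)-
    ]
    return table[select & 7]
```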

;;;;;;;;;;;;;;;;


All jumps and calls select IR as the IR-specified PC-mux select.
Jump-local forces PC:8-22 to select PC+1.
Call-zero forces PC:0-3,12-22 to zero.
Dispatch-x16 selects Dispatch, and forces PC:0-3 to zero.
Trap forces zero for all bits.

conditional jump timing:
 IR0: compute result that will be used as jump condition
 IR1: select jump condition
 IR2: cond-jump jump-addr
 IR3: new code at jump-addr


MD / VMA boxed bits:

(destinations)

vma-start-read (all cases)
	set MD_BOXED from IR:55, VMA_BOXED from IR:54.
vma-start-write
	set VMA_BOXED from IR:54, preserve MD_BOXED.
md-start-write
	set MD_BOXED from IR:55, VMA_BOXED from IR:54.
md
	set MD_BOXED from IR:55.
vma
	set VMA_BOXED from IR:54.
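
The table above, written out as a sketch.  It assumes that "all cases" of
vma-start-read means every destination whose name starts with
vma-start-read, and that a bit not mentioned for a destination is preserved
(stated explicitly only for vma-start-write).  Names are hypothetical.

```python
def update_boxed(dest, ir, md_boxed, vma_boxed):
    """Return the new (MD_BOXED, VMA_BOXED) after a write to dest.

    ir is the 64-bit instruction word; IR:55 feeds MD_BOXED and IR:54
    feeds VMA_BOXED.
    """
    ir55 = (ir >> 55) & 1
    ir54 = (ir >> 54) & 1
    if dest.startswith("vma-start-read") or dest == "md-start-write":
        return ir55, ir54
    if dest in ("vma-start-write", "vma"):
        return md_boxed, ir54           # MD_BOXED preserved
    if dest == "md":
        return ir55, vma_boxed          # VMA_BOXED assumed preserved
    return md_boxed, vma_boxed          # other destinations: no change
```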

If a DEST to MD or VMA is aborted because the previous mem cycle was a write
and it hasn't reached the point at which it knows if it will trap or not,
the machine must be frozen (cache-miss style) at or before the DEST cycle
that would clobber MD or VMA, such that if the trap is not taken, the DEST is
still asserted and the instruction can be completed as if nothing happened.

The decision to freeze the machine because a write-in-progress is followed
by a DEST to VMA or MD may be as simple as decoding a write to any functional
destination.  It is required that if the trap is taken, the DEST to VMA or MD
is inhibited, so that the trap routine can read the old values of them.


The timing for writing MD_BOXED and VMA_BOXED can be the same as for
writing MD and VMA.

It is not necessary for MD_BOXED and VMA_BOXED to be writable
indirectly; they can be restored by writing "md" and "vma".
They should be readable through some kind of status register.

Note that the bits cannot be written indirectly with the output
of the P-board boxed-bit mux.

;;;;;;;;;;;;;;;;

example:

MD <- foo
VMA-start-write <- bar
no-op
VMA-start-read-early <- bletch

The -start-write cannot be immediately followed by the -start-read-early
because both would be driving the DEST during the same cycle.

Assume that the VMA-start-write causes a page-fault or GC trap.
The VMA-start-write instruction completes normally.  The
VMA-start-read-early must be frozen at or before its DEST cycle,
until it is known whether or not the VMA-start-write will cause a trap.
If it does cause a trap, the VMA-start-read-early must be aborted
(trapped before its commit point, with the early DEST inhibited)
so that it can be reexecuted normally after the trap is handled.
It is not necessary for the trap-on-write to abort the instruction
that caused it.
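
The freeze / abort rule in this example can be summarized as a small
decision function.  This uses the simple form of the decision mentioned
earlier (freeze on a write to any functional destination while a write is
unresolved); the state names and the function itself are hypothetical.

```python
def dest_cycle_action(prev_write, dest_is_functional):
    """Decide what happens at the DEST cycle of the following instruction.

    prev_write is the state of the preceding memory write:
      None      - no write in progress
      "pending" - not yet known whether it will trap
      "trap"    - it trapped (page-fault, GC, ...)
      "ok"      - it completed without trapping
    """
    if not dest_is_functional or prev_write in (None, "ok"):
        return "proceed"
    if prev_write == "pending":
        return "freeze"     # hold at or before DEST, cache-miss style
    if prev_write == "trap":
        return "abort"      # inhibit DEST so the trap routine sees old MD / VMA
    raise ValueError(prev_write)
```

In the sequence above, the VMA-start-read-early would see "freeze" until
the VMA-start-write resolves, then either "proceed" or "abort".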

;;;;;;;;;;;;;;;;

If an instruction that did VMA-start-read-early is trapped for
any reason (in particular, a case in which the early-DEST write was
NOT inhibited), the memory (local or Nubus) cycle that is started
by the -start-read-early must be aborted before it affects any
nubus device, and also, the MD and MD_BOXED must not be clobbered.