Non-Von 1


I’ve always wanted my own supercomputer. Let’s be honest, what self-respecting geek doesn’t? Unfortunately, I’m usually poor, and I live in a space that’s ~300 ft^2 (that I share with someone else), so actually owning anything considered a supercomputer is out of the question. Fortunately, “supercomputers” from the 1980s weren’t actually all that complicated, and cheap FPGA boards have gotten pretty good. And thus, I give you the Non-Von 1.

What is the Non-Von 1?

For those out there who love both retro computing and weird computer architectures, this one is for you. The “Non-Von” was a “non-Von Neumann” computer that came out of Columbia University in the early 1980s. Most computers are “Von Neumann” machines: a single unified memory holds both instructions and data, and the processor repeatedly fetches from that memory and processes what it fetched. The Non-Von instead works like a content-addressable memory, with lots of very simple processors, each having its own local memory. It is a single-instruction/multiple-data (SIMD) machine: each instruction is broadcast simultaneously to all of the processing elements (PEs) in the machine. The PEs are arranged in a binary tree, with each PE connected to one parent and two child nodes. The root of the tree connects to a conventional host computer, which issues the instructions that the Non-Von cluster executes.
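To make the tree layout concrete, here is a small Python sketch of one way to number the PEs. It assumes heap-style numbering (root is PE 1, PE i has children 2i and 2i+1); the post doesn’t say how the original machine or this FPGA version numbers its PEs, so treat the numbering and the helper names as hypothetical.

```python
from typing import Optional, Tuple

# ASSUMPTION: heap-style numbering. Root is PE 1; PE i has children 2i and 2i+1.
NUM_LEVELS = 5                      # a complete tree with 5 levels...
NUM_PES = 2 ** NUM_LEVELS - 1       # ...has 31 PEs, the size of this build


def parent(i: int) -> Optional[int]:
    """Parent of PE i, or None for the root (PE 1)."""
    return i // 2 if i > 1 else None


def children(i: int) -> Optional[Tuple[int, int]]:
    """(left, right) children of PE i, or None if PE i is a leaf."""
    left, right = 2 * i, 2 * i + 1
    return (left, right) if right <= NUM_PES else None


def level(i: int) -> int:
    """Depth of PE i in the tree; the root is level 0."""
    return i.bit_length() - 1


# children(1) == (2, 3); parent(7) == 3
# children(16) is None: PEs 16..31 are the leaves
# level(31) == 4: a leaf is four links away from the root
```

With this numbering, “lowest-numbered PE” (which the RESOLVE instruction relies on) simply means the PE closest to the root and, within a level, furthest to the left.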


The Processing Element

Each PE is a very simple 8-bit microprocessor. It has 64 bytes of RAM, eight 8-bit registers, eight 1-bit registers, an 8-bit comparator, and a bit-serial ALU. It also has three network connections: one going upstream to a “parent” node, and left and right downstream links to “child” nodes.
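Because the ALU is bit-serial, adding two 8-bit values means repeating a 1-bit add (the ADD1 instruction, {C1,A1} <= A1 + B1 + C1) eight times, least-significant bit first, with the carry held in C1 between steps. The Python model below sketches only that arithmetic; it is not the author’s Verilog, and the register moves a real PE would need between steps are left abstract.

```python
from typing import Tuple


def add1(a1: int, b1: int, c1: int) -> Tuple[int, int]:
    """Model of one ADD1 step, {C1, A1} <= A1 + B1 + C1, on single bits."""
    total = a1 + b1 + c1
    return total >> 1, total & 1        # (new C1, new A1)


def bit_serial_add8(x: int, y: int) -> Tuple[int, int]:
    """Add two 8-bit values one bit per step, least-significant bit first.

    Returns (8-bit sum, final carry). Whatever loads and rotates a real PE
    uses to line up the next bit between steps are abstracted away here.
    """
    c1 = 0                              # assumes C1 was cleared beforehand
    total = 0
    for bit in range(8):
        a1 = (x >> bit) & 1             # current bit of the first operand
        b1 = (y >> bit) & 1             # current bit of the second operand
        c1, s = add1(a1, b1, c1)
        total |= s << bit               # place the sum bit in the result
    return total, c1


# bit_serial_add8(200, 100) == (44, 1): 300 overflows 8 bits, so the carry is set.
```

The trade-off is the classic one for SIMD machines of that era: a 1-bit ALU is tiny, so many more PEs fit on a chip, at the cost of taking several cycles per byte-wide operation.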

My Non-Von 1

After reading a paper about this wonderfully weird machine, I decided that the architecture was simple enough that I could probably actually build one myself. I also needed an excuse to practice Verilog, and just sitting down and programming is the best way to do that. And I wanted to be able to say that I owned my own 1980’s supercomputer =)

The final implementation is a 31-node Non-Von 1 on a Spartan-3E 1200 FPGA board (a Digilent Nexys2). It talks to a host computer through a 19.2 kbps serial link and a simple finite-state machine (FSM). It implements the full instruction set of the original Non-Von 1, although I guessed at the actual binary encoding of the instructions, so it is likely not binary-compatible with the original. It also leaves out some of the reconfigurable-switch features that let PEs communicate directly with their “neighbors” rather than through a common parent node. If I get around to it, I’ll add these later.
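For a sense of scale, the serial link, not the FPGA, is the bottleneck when talking to the machine. The back-of-the-envelope arithmetic below assumes standard 8N1 UART framing (1 start bit, 8 data bits, 1 stop bit); the post doesn’t state the framing, so that number is an assumption.

```python
BAUD_RATE = 19_200              # bits per second on the serial link (from the post)
BITS_PER_BYTE_ON_WIRE = 10      # ASSUMPTION: 8N1 framing = 1 start + 8 data + 1 stop
NUM_PES = 31

bytes_per_second = BAUD_RATE / BITS_PER_BYTE_ON_WIRE   # 1920 bytes/s
ms_per_byte = 1000 / bytes_per_second                   # roughly 0.52 ms per byte

# Lower bound for reading one byte back from every PE: REPORT8 returns one
# PE's byte at a time, so at least 31 bytes must cross the link, not
# counting the instruction bytes sent to select each PE.
min_readout_ms = NUM_PES * ms_per_byte                  # roughly 16.1 ms
```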


The Nexys2, the FPGA board used for this project

How do you talk to it?

As mentioned earlier, the 31-node cluster communicates with the host over a 19.2 kbps serial link. I wanted a way to actually program it without typing in lots of 1s and 0s, so I also wrote a Python library for it. It lets me type in the actual instruction mnemonics; Python translates them into binary and handles the communication. This was invaluable for debugging: it not only makes it easy to write code for the machine, but also lets me interact with it in real time from a command line.
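Here is a minimal sketch of what the “mnemonic to binary” half of such a library could look like. The encoding is entirely hypothetical (one byte per instruction: a 5-bit opcode in the high bits and a 3-bit operand in the low bits, with BROADCAST8 followed by a data byte); it is not the encoding nv.py actually uses, which isn’t described here. Three operand bits happen to be enough for a register index (eight registers), a link (LC, RC, P), or one of the eight LOGIC1 sub-operations.

```python
from typing import Dict, List

# HYPOTHETICAL opcode numbering: just the instruction list in order.
MNEMONICS = [
    "ENABLE", "RECV8", "RECV1", "LOADA8", "LOADB8", "LOADA1", "LOADB1",
    "STOREA8", "STOREB8", "STOREA1", "STOREB1", "READRAM", "WRITERAM",
    "ADD1", "SUB1", "ROTRA", "ROTLA", "ROTRB", "ROTLB", "LOGIC1", "LOGIC2",
    "BROADCAST8", "BROADCAST1", "REPORT8", "REPORT1", "SEND8", "SEND1",
    "COMPARE", "RESOLVE",
]
OPCODES: Dict[str, int] = {name: i for i, name in enumerate(MNEMONICS)}
assert len(OPCODES) <= 32           # everything must fit in a 5-bit opcode field

RECV_LINKS = {"LC": 0, "RC": 1, "P": 2}
SEND_LINKS = {"LC": 0, "RC": 1}
LOGIC1_OPS = {op: i for i, op in enumerate(
    ["CLEAR", "SET", "NEGATE", "AND", "OR", "XOR", "EQUAL", "NAND"])}
LOGIC2_OPS = {"NOR": 0}

# Instructions whose 3-bit operand is written as a name rather than a number.
SYMBOLIC_OPERANDS = {
    "RECV8": RECV_LINKS, "RECV1": RECV_LINKS,
    "SEND8": SEND_LINKS, "SEND1": SEND_LINKS,
    "LOGIC1": LOGIC1_OPS, "LOGIC2": LOGIC2_OPS,
}


def assemble(line: str) -> bytes:
    """Encode one instruction as (opcode << 3) | operand, plus any data byte."""
    parts = line.split()
    mnemonic, args = parts[0].upper(), parts[1:]
    opcode = OPCODES[mnemonic]                  # KeyError for unknown mnemonics
    operand, data = 0, b""
    if mnemonic in SYMBOLIC_OPERANDS:
        operand = SYMBOLIC_OPERANDS[mnemonic][args[0].upper()]
    elif mnemonic == "BROADCAST8":
        data = bytes([int(args[0], 0) & 0xFF])  # data byte follows the opcode
    elif args:
        operand = int(args[0], 0)               # register index, or the BROADCAST1 bit
    if not 0 <= operand <= 7:
        raise ValueError(f"operand out of range in {line!r}")
    return bytes([(opcode << 3) | operand]) + data


def assemble_program(source: str) -> bytes:
    """Assemble a multi-line program; ';' starts a comment."""
    out: List[bytes] = []
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()
        if line:
            out.append(assemble(line))
    return b"".join(out)


program = assemble_program("""
    ENABLE
    BROADCAST8 0x41   ; send 'A' to every enabled PE
    RECV8 RC          ; A8 <= IO8 from the right child
""")
# program.hex() == "00a84109" under this made-up encoding
```

The other half, the communication, could be as simple as writing those bytes to the serial port, for example with pyserial’s `serial.Serial(port, 19200).write(program)`; whether the original library works this way is not stated in the post.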

What are you going to do with it?

Beats me. It was originally designed to be a super-fast database machine, with each “record” having its own processor, and planned full-scale versions using up to a million nodes (man, I wish I had a bigger FPGA!). Mine only has 31 nodes, so I could use it to store roughly half of the phonebook in my cell phone. I think I could also code up a pretty sweet game of Asteroids for it, but I haven’t played around with it much yet. At the moment, all I have is a bit of test code I wrote to verify that everything works more or less as it should. I can read from and write to all 31 nodes, and the ALU and RAM are all working properly. I also really want to build a nice-looking case for it and set it out on my desk to look pretty.

How about programming it?

Here is the instruction set.

  • ENABLE – Enable all of the PEs
  • RECV8 [LC,RC,P] – A8 <= IO8 from LC, RC or P
  • RECV1 [LC,RC,P] – A1 <= IO1 from LC, RC or P
  • LOADA8 – A8 <= [other reg]
  • LOADB8 – B8 <= [other reg]
  • LOADA1 – A1 <= [other reg]
  • LOADB1 – B1 <= [other reg]
  • STOREA8 – [other reg] <= A8
  • STOREB8 – [other reg] <= B8
  • STOREA1 – [other reg] <= A1
  • STOREB1 – [other reg] <= B1
  • READRAM – A8 <= RAM[MAR[5:0]]
  • WRITERAM – RAM[MAR[5:0]] <= A8
  • ADD1 – {C1,A1} <= A1 + B1 + C1
  • SUB1 – {C1,A1} <= A1 - B1 - C1
  • ROTRA – Rotate-right A8
  • ROTLA – Rotate-left A8
  • ROTRB – Rotate-right B8
  • ROTLB – Rotate-left B8
  • LOGIC1 – class of logic instructions (occupies one 5-bit opcode)
    • CLEAR
    • SET
    • NEGATE
    • AND
    • OR
    • XOR
    • EQUAL
    • NAND
  • LOGIC2 – class of logic instructions (occupies one 5-bit opcode)
    • NOR
  • BROADCAST8 – A8 <= incoming byte from master computer
  • BROADCAST1 – A1 <= incoming bit from master computer
  • REPORT8 – Master computer receives A8 of the single enabled PE
  • REPORT1 – Master computer receives A1 of the single enabled PE
  • SEND8 – [LC,RC] IO8 <= A8 of Parent node
  • SEND1 – [LC,RC] IO1 <= A1 of Parent node
  • COMPARE – A1 <= (A8==B8), C1 <= (A8 > B8)
  • RESOLVE – This is one of the most important instructions. If the PEs are numbered like a binary tree, then among all PEs that are still enabled AND have A1 == 1'b1, the lowest-numbered PE keeps A1 == 1'b1 and all higher-numbered PEs get A1 <= 1'b0. This is how you can make sure that only a single PE is still enabled when you issue a “REPORT” instruction.
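To see how COMPARE, RESOLVE and REPORT combine into the “database machine” use case, here is a Python simulation of a keyed lookup across PEs. It is a sketch of the instruction semantics listed above, not the author’s code. Two things are assumptions: each PE’s record is modeled as a hypothetical `key`/`value` pair of registers, and PEs whose A1 is 0 after RESOLVE get disabled, since the list above doesn’t spell out how the enable state is cleared.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PE:
    key: int            # HYPOTHETICAL register holding this record's key byte
    value: int          # HYPOTHETICAL register holding this record's payload byte
    a8: int = 0
    b8: int = 0
    a1: int = 0
    c1: int = 0
    enabled: bool = True


def lookup(pes: List[PE], search_key: int) -> Optional[int]:
    """Payload of the lowest-numbered PE whose key matches, or None."""
    # ENABLE: every PE takes part in the search.
    for pe in pes:
        pe.enabled = True
    # BROADCAST8 puts the search key in every A8; LOADB8 copies each PE's key into B8.
    for pe in pes:
        pe.a8 = search_key & 0xFF
        pe.b8 = pe.key
    # COMPARE: A1 <= (A8 == B8), C1 <= (A8 > B8), in all PEs at once.
    for pe in pes:
        pe.a1 = int(pe.a8 == pe.b8)
        pe.c1 = int(pe.a8 > pe.b8)
    # RESOLVE: only the lowest-numbered enabled PE with A1 == 1 keeps A1 == 1.
    seen_match = False
    for pe in pes:                      # list order stands in for PE number order
        if pe.enabled and pe.a1:
            if seen_match:
                pe.a1 = 0
            seen_match = True
    # ASSUMED step: disable every PE whose A1 is now 0.
    for pe in pes:
        pe.enabled = pe.enabled and pe.a1 == 1
    survivors = [pe for pe in pes if pe.enabled]
    if not survivors:
        return None
    # LOADA8 + REPORT8: the single survivor moves its payload to A8 and reports it.
    survivors[0].a8 = survivors[0].value
    return survivors[0].a8


book = [PE(key=k, value=v) for k, v in [(7, 70), (3, 30), (7, 71), (9, 90)]]
# lookup(book, 7) == 70: two PEs match, RESOLVE keeps the lower-numbered one
# lookup(book, 5) is None: no PE matches
```

On the real machine each loop body is a single broadcast instruction executed by every PE in parallel; RESOLVE is the only step that needs information from other PEs, which is presumably why it travels through the tree.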

What if I want to build my own?

Yay open-source! The code isn’t exactly polished, but in the interest of promoting weird retro computer architectures, I’ve provided the Python library I wrote for it and the Verilog code for the processing elements. Wire together as many as you’d like! Use it to catalog all of your Whitesnake and Duran Duran tapes!

Python Library

nv.py

Non-Von 1 Processing Element

nonvontop.v