The question is whether this approach still works when it is scaled to thousands or even millions of qubits. The team is optimistic that it will, but we will see.
In some quantum error-correcting codes, there is a large set of operators with the following property: when no errors are present, measuring them does not change the state (assuming the measurement itself is made without error), but when an error is present, the outcomes reveal some information about what kind of error it is. That information can then be used to choose which operations to apply to correct the error.
For a number of such schemes, there’s a choice of strategy: on what schedule to perform which of these measurements, and how to correct the errors they reveal.
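To make that property concrete, here is a minimal sketch using the 3-qubit bit-flip code as a toy stand-in (it is not the code used in the paper). The state is stored as a plain dictionary from bitstrings to amplitudes, and the helper names are made up for illustration. The check operators are Z0Z1 and Z1Z2: on an error-free encoded state they act as the identity, and after a single bit flip at least one of them reports -1.

```python
# Toy model: 3-qubit bit-flip code, state stored as {bitstring: amplitude}.
# The code choice and helper names are illustrative assumptions.

def apply_x(state, q):
    """Flip qubit q (a Pauli X error) in every basis string."""
    out = {}
    for bits, amp in state.items():
        flipped = bits[:q] + ("1" if bits[q] == "0" else "0") + bits[q + 1:]
        out[flipped] = amp
    return out

def apply_zz(state, i, j):
    """Apply Z_i Z_j: each basis string picks up the sign (-1)^(b_i xor b_j)."""
    return {bits: (-amp if bits[i] != bits[j] else amp) for bits, amp in state.items()}

def stabilizer_outcome(state, i, j):
    """Return +1 or -1 if the state is an eigenstate of Z_i Z_j, else None."""
    after = apply_zz(state, i, j)
    if all(abs(after[b] - state[b]) < 1e-12 for b in state):
        return +1
    if all(abs(after[b] + state[b]) < 1e-12 for b in state):
        return -1
    return None

# An arbitrary encoded state a|000> + b|111>.
logical = {"000": 0.6, "111": 0.8}

# No error: both checks are +1 and the state is untouched.
clean = (stabilizer_outcome(logical, 0, 1), stabilizer_outcome(logical, 1, 2))

# Bit flip on qubit 0: the first check flips sign, the second does not.
flipped = apply_x(logical, 0)
dirty = (stabilizer_outcome(flipped, 0, 1), stabilizer_outcome(flipped, 1, 2))

print(clean, dirty)
```

Note that neither outcome depends on the amplitudes 0.6 and 0.8, which is why the measurement tells you about the error without telling you (or disturbing) the encoded information.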
Disclaimer: am one of the authors, but not a main contributor. I wrote the simulator they used and made some useful suggestions on how to use it to extract information they wanted for training the models more efficiently, but know nothing of transformers.
In a quantum computer, your logical quantum state is encoded in lots of physical qubits (called data qubits) in some special way. The errors that occur on these qubits are indeed arbitrary, and for enough physical qubits are indeed not practically classically simulatable.
To tackle these errors, we do "syndrome measurement", i.e. we interact the data qubits with another set of physical qubits (called syndrome qubits) in a special way, and then measure the syndrome qubits. The quantum magic that happens is that the arbitrary errors get projected down to a discrete, finite set of classical errors on the data and syndrome qubits!!! Without this magic result we would have no hope for quantum computers.
Anyway, this is where a decoder - a classical algorithm running on a classical computer - comes in. OP is a decoder. It takes the syndrome qubit measurements and tries to figure out what classical errors occurred and what sort of correction, if any, is needed on the data qubits.
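A rough sketch of both steps, assuming the 3-qubit bit-flip code as a toy example (the real thing uses the surface code and a far more capable decoder). A small continuous rotation is applied to one data qubit; grouping the state by syndrome plays the role of the syndrome measurement, and a lookup table plays the role of the decoder. All function names are hypothetical.

```python
import math

# Toy 3-qubit bit-flip code; states are {bitstring: amplitude}.
# All names below are hypothetical helpers for illustration.

def flip(bits, q):
    return bits[:q] + ("1" if bits[q] == "0" else "0") + bits[q + 1:]

def rotate_x(state, q, theta):
    """Coherent error exp(-i*theta*X_q) = cos(theta) I - i sin(theta) X_q."""
    out = {}
    for bits, amp in state.items():
        out[bits] = out.get(bits, 0) + math.cos(theta) * amp
        f = flip(bits, q)
        out[f] = out.get(f, 0) - 1j * math.sin(theta) * amp
    return out

def syndrome_of(bits):
    """Parities of neighbouring pairs: what the two syndrome qubits report."""
    return (int(bits[0] != bits[1]), int(bits[1] != bits[2]))

def measure_syndrome(state):
    """Project onto each syndrome sector: {syndrome: (probability, normalized branch)}."""
    branches = {}
    for bits, amp in state.items():
        branches.setdefault(syndrome_of(bits), {})[bits] = amp
    result = {}
    for s, branch in branches.items():
        p = sum(abs(a) ** 2 for a in branch.values())
        if p > 1e-15:
            result[s] = (p, {b: a / math.sqrt(p) for b, a in branch.items()})
    return result

# The decoder: syndrome -> which qubit to flip back (None = do nothing).
DECODER = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def correct(branch, s):
    q = DECODER[s]
    return branch if q is None else {flip(b, q): a for b, a in branch.items()}

logical = {"000": 0.6, "111": 0.8}
noisy = rotate_x(logical, 1, 0.3)   # a small, continuous rotation on qubit 1
outcomes = measure_syndrome(noisy)
for s, (p, branch) in sorted(outcomes.items()):
    print(s, round(p, 4), correct(branch, s))
```

Only two syndromes can occur: "nothing happened" with probability cos²(0.3), or "qubit 1 flipped" with probability sin²(0.3). In the second case the lookup table undoes the flip and the encoded state comes back intact, up to an overall phase.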
>One error-correction round in the surface code. The X and Z stabilizer information updates the decoder’s internal state, encoded by a vector for each stabilizer. The internal state is then modified by multiple layers of a syndrome transformer neural network containing attention and convolutions.
I can't seem to find a detailed description of the architecture beyond this bit in the paper and the figure it references. Gone are the days when Google handed out ML methodologies like candy... (note: not criticizing them for being protective of their IP, just pointing out how much things have changed since 2017)
These measurements are classical data, and a computation is required in order to infer the most likely error that led to the measured syndrome. This process is known as decoding.
This work is a model that acts as a decoding algorithm for a very common quantum code -- the surface code. The surface code is somewhat like the quantum analog of a repetition code in a sense.
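For intuition on the repetition-code analogy, here is a purely classical sketch: copy a bit d times, decode by majority vote, and compute how often the vote comes out wrong. It assumes independent flips with probability p per copy, which is a simplification; the surface code has to handle both bit-flip and phase-flip errors and is decoded from syndromes rather than by reading the data directly.

```python
from math import comb

def majority_vote(bits):
    """Decode a repetition codeword: return the bit that appears most often."""
    return int(sum(bits) > len(bits) / 2)

def logical_error_rate(d, p):
    """Probability that more than half of d copies flip (d odd),
    with each copy flipping independently with probability p."""
    return sum(comb(d, k) * p**k * (1 - p) ** (d - k)
               for k in range(d // 2 + 1, d + 1))

# Larger distance d suppresses the logical error rate when p is small.
for d in (3, 5, 7):
    print(d, logical_error_rate(d, 0.1))
```

The takeaway carries over to the quantum case: below some error rate, adding more physical qubits makes the encoded information more reliable, provided the decoder does its job well.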
The Steane code is the simplest triangular color code, i.e. you can arrange all the qubits on a 2D triangular lattice and only do nearest neighbor interactions [1]. The surface code is a similar quantum code, in which the qubits can also be placed on a 2D lattice, except that lattice is made up of squares.
Why do we care about 2D surfaces and nearest neighbor interactions? Because it makes building quantum hardware easier.
EDIT:
[1] The Steane code's picture is shown here. https://errorcorrectionzoo.org/c/steane Seven data qubits are on the vertices of the triangles. 2 syndrome qubits on each of the faces.
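A small sketch of how the syndrome works for the Steane code: its checks come from the parity-check matrix of the classical [7,4] Hamming code, used once for bit flips and once for phase flips (three checks each, six in total). The qubit numbering below is one conventional choice and is an assumption; it is not meant to match the labels in the linked picture.

```python
# Parity-check matrix of the [7,4] Hamming code.
# Column j (0-indexed) is the number j+1 written in binary, top row = most significant bit.
# The qubit ordering is an assumed convention for illustration.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(error):
    """Three parity bits for a 7-bit error pattern (1 = that qubit was flipped)."""
    return tuple(sum(h * e for h, e in zip(row, error)) % 2 for row in H)

def locate_single_error(s):
    """Read the syndrome as a binary number: 0 means no error, k means qubit k-1 flipped."""
    pos = s[0] * 4 + s[1] * 2 + s[2]
    return None if pos == 0 else pos - 1

# Every single-qubit flip produces a distinct, non-zero syndrome.
for q in range(7):
    error = [0] * 7
    error[q] = 1
    print(q, syndrome(error), locate_single_error(syndrome(error)))
```

The same table is used twice in the quantum code, once for X errors and once for Z errors, which is why any single-qubit error can be identified and corrected.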
Instead, think of it more like a completely different set of operations than classical computers have, which, if you were to try and replicate/simulate them using a classical computer, would leave you no choice but to try all possible combinations. Even that is oversimplifying, but I find it at least doesn't hint at "like computers, but faster", and it is as close to making the parallelism POV "correct" as you're going to get.
What these operations do is pretty exotic and doesn't really map onto any straightforward classical computing primitives, which puts a pretty harsh limit on what you can ask them to do. If you are clever enough, you can mix and match them in order to do some useful stuff really quickly, much faster than you ever could with classical computers. But that only goes for the stuff you can make them do in the first place.
That's pretty much the extent I believe someone can "understand" quantum computing without delving into the actual math of it.
"If you take nothing else from this blog: quantum computers won't solve hard problems instantly by just trying all solutions in parallel." - Scott Aaronson
This short comic he helped author actually summarizes the core idea fairly well https://www.smbc-comics.com/comic/the-talk-3
A large violin provides little answers.
This video is the simplest explanation that I have found for Quantum Computing which doesn't do the whole pop-sciency "is both zero and one at the same time" nonsense.
For a lot of technology, most really, the best way to study how to improve it is to make the best thing you know how to and then work on trying to make it better. That's what's been done with all the current quantum computing attempts. Pretty much all of the industry labs with general purpose quantum computers can in fact run programs on them, they just haven't reached the point where they're running programs that are useful beyond proving out and testing the system.
Quantum computing may or may not get industrial results in the next N years, but those folks do theory and often, if not usually, (in)validate it by experiment: it’s science.
So it remains for you to show that AI.ε ~= QC.ε since JvN proved the case for a system made of similar parts, that is vacuum tubes, with the same error probability.
(p.s. thanks for the link)
Has he remarked on it and my search-fu failed?
Quantum computer parts list:
- Everything you need
- A bunch of GPUs