decouple
The decouple gear breaks combinatorial paths on both the data and control signals of a DTI interface without sacrificing throughput. It adds one clock cycle of latency to the path, but throughput is unaffected regardless of the pattern in which data is written to and read from it. It is functionally transparent to the design.
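
The latency and throughput behaviour described above can be illustrated with a small cycle-based model in plain Python. This is only a sketch, not the PyGears implementation: the function name `simulate_buffer` is hypothetical, and the model assumes that the producer always has data, the consumer is always ready, and both handshake decisions use only the occupancy registered at the start of the cycle (which is what removes the combinatorial path between the two sides).

```python
from collections import deque


def simulate_buffer(depth, items, cycles):
    """Return [(cycle, value), ...] for values leaving a registered buffer.

    Assumption of this sketch: push and pop decisions depend only on the
    occupancy at the start of the cycle, never on each other.
    """
    fifo = deque()
    pending = deque(items)
    out = []
    for cycle in range(cycles):
        # Both decisions are taken from the start-of-cycle state.
        can_pop = len(fifo) > 0
        can_push = len(fifo) < depth and len(pending) > 0
        if can_pop:
            out.append((cycle, fifo.popleft()))
        if can_push:
            fifo.append(pending.popleft())
    return out


# Two storage slots: one cycle of latency, then one value every cycle.
print(simulate_buffer(2, [1, 2, 3, 4], 8))  # [(1, 1), (2, 2), (3, 3), (4, 4)]
# One slot under the same assumptions: a value only every second cycle.
print(simulate_buffer(1, [1, 2, 3, 4], 8))  # [(1, 1), (3, 2), (5, 3), (7, 4)]
```

In this model, two slots are the smallest buffer that sustains one value per cycle while keeping both handshakes registered. That is a property of the sketch and is not a statement about how the gear is built internally.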

decouple(din, *, depth=2) → din

The decouple() gear is basically a FIFO with no combinatorial loops between its input and output. It is used in the following cases:

When there is a loop in the data path, where it is used to prevent combinatorial loops.

For pipelining:
The example design features an rng() generator whose output values are fed to an incrementer. Both the value generation and the addition are performed in a single clock cycle.

    rng_vals = drv(t=Uint[4], seq=[6]) | rng | flatten
    rng_vals_incr = rng_vals + 1
    rng_vals_incr | check(ref=[1, 2, 3, 4, 5, 6])
To shorten the combinatorial paths in the design, these two operations can be split across two clock cycles. The decouple() gear cuts the combinatorial paths on both the data and control interface signals. It does not impact the design throughput, but adds a single clock cycle of latency:

    rng_vals = drv(t=Uint[4], seq=[6]) | rng | flatten
    rng_vals_incr = (rng_vals | decouple) + 1
    rng_vals_incr | check(ref=[1, 2, 3, 4, 5, 6])
For balancing latencies on the datapath branches:
Consider a datapath consisting of two branches whose outputs are later concatenated. The first branch performs some arithmetic operations, with registers added for pipelining, and has a latency of two clock cycles. The second branch does nothing to the data and has zero latency. Due to the mismatch in the pipeline depths of the two branches, the resulting throughput is 1 data value per 3 clock cycles.

    inp = drv(t=Uint[4], seq=[1, 2, 3])
    branch1 = dreg(dreg(inp + 1) * 3)
    branch2 = inp
    ccat(branch1, branch2) | check(ref=[(6, 1), (9, 2), (12, 3)])
Introducing a decoupler on the second branch (the default depth of two is enough here) achieves maximum throughput after the initial latency.

    inp = drv(t=Uint[4], seq=[1, 2, 3])
    branch1 = dreg(dreg(inp + 1) * 3)
    branch2 = inp | decouple
    ccat(branch1, branch2) | check(ref=[(6, 1), (9, 2), (12, 3)])
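
The effect of balancing can be reproduced with a simplified plain-Python model of the two-branch datapath. It is a sketch under stated assumptions, not a cycle-accurate model of the PyGears gears: the function `simulate` and its handshake rules are hypothetical. The model assumes that the input is held until both branches have accepted it, that each register stage on the first branch can be refilled in the cycle it is emptied, that the concatenation fires only when both branches present data, and that the buffer on the second branch decides from its start-of-cycle occupancy. Stall behaviour of the real gears on longer sequences may differ.

```python
from collections import deque


def simulate(items, depth, max_cycles=50):
    """Return [(cycle, (branch1_value, branch2_value)), ...].

    depth == 0 models branch 2 as a plain wire; depth > 0 models a buffer
    of that many slots (an assumed stand-in for a decoupler).
    """
    pending = deque(items)
    cur = None            # item currently presented at the input
    ack1 = ack2 = False   # has branch 1 / branch 2 accepted `cur`?
    s1 = s2 = None        # the two register stages on branch 1
    buf = deque()         # storage on branch 2 (unused when depth == 0)
    out = []

    for cycle in range(max_cycles):
        if cur is None and pending:
            cur, ack1, ack2 = pending.popleft(), False, False

        # Start-of-cycle view of branch 2.
        b2_valid = bool(buf) if depth else (cur is not None and not ack2)
        can_push = depth > 0 and len(buf) < depth

        # The concatenation fires only when both branches have data.
        if s2 is not None and b2_valid:
            b2 = buf.popleft() if depth else cur
            out.append((cycle, (s2, b2)))
            s2 = None
            if depth == 0:
                ack2 = True   # a wire hands over data only when the join fires
        if s1 is not None and s2 is None:
            s2, s1 = s1 * 3, None          # second stage stores (inp + 1) * 3
        if cur is not None and not ack1 and s1 is None:
            s1, ack1 = cur + 1, True       # first stage stores inp + 1
        if cur is not None and not ack2 and can_push:
            buf.append(cur)
            ack2 = True
        if ack1 and ack2:
            cur = None                     # both branches took the item
    return out


print(simulate([1, 2, 3], depth=0))  # [(2, (6, 1)), (5, (9, 2)), (8, (12, 3))]
print(simulate([1, 2, 3], depth=2))  # [(2, (6, 1)), (3, (9, 2)), (4, (12, 3))]
```

With no storage on the second branch, the model produces one result every three cycles, because each input stays blocked until its partner has travelled through both register stages. With two slots on the second branch, the three results of the example sequence appear on consecutive cycles after the initial two-cycle latency.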