Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Phi nodes #58

Open
cernec1999 opened this issue Feb 1, 2025 · 5 comments
Open

feat: Phi nodes #58

cernec1999 opened this issue Feb 1, 2025 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@cernec1999
Copy link
Collaborator

Introduce phi nodes using the following algorithm:

  • Traverse the predecessor blocks when processing a new block (and we should be traversing using DfsPostOrder, so any BasicBlock that is a parent of a child block should already be traversed)
  • If the predecessor block has any AstNode left on the stack, introduce phi nodes, with their arguments as the predecessor RegionId as merge variabes.
  • If there is only one predecessor for the phi node we created, we can possibly simply resolve the phi node immediately, although this might not be robust for constructs like loops.

We may also have to consider weird edge cases like the following:

temp.arr = {1, temp.foo == 2 ? temp.bar : temp.baz, 3};

Sample algorithm:

            for pred in self.function.get_predecessors(block_id).map_err(|e| {
                FunctionDecompilerError::FunctionError {
                    source: e,
                    backtrace: Backtrace::capture(),
                    context: ctx.get_error_context(),
                }
            })? {
                let exec = ctx
                    .block_ast_node_stack
                    .get(&pred)
                    .expect("Critical error: stack should always be set for each basic block");

                // iterate the stack and introduce phi nodes
                for frame in exec.iter() {
                    match frame {
                        ExecutionFrame::StandaloneNode(node) => {
                            let current_region_id = self
                                .block_to_region
                                .get(&block_id)
                                .expect("[Bug] The region should exist.");

                            // For now, print debug
                            println!(
                                "Introducing Phi node for block {} to block {}. The value resolves to: {}",
                                pred,
                                block_id,
                                emit(node.clone())
                            );
                        }
                        _ => {
                            return Err(FunctionDecompilerError::Other {
                                message: "Expected StandaloneNode".to_string(),
                                context: ctx.get_error_context(),
                                backtrace: Backtrace::capture(),
                            });
                        }
                    }
                }
            }
@cernec1999 cernec1999 self-assigned this Feb 1, 2025
@cernec1999 cernec1999 added the enhancement New feature or request label Feb 1, 2025
@cernec1999 cernec1999 mentioned this issue Feb 2, 2025
3 tasks
@cernec1999
Copy link
Collaborator Author

For a region with more than 2 predecessors, it might be worthwhile to change the ControlFlowType logic to be something different. We might see this scenario in switch statements.

@cernec1999
Copy link
Collaborator Author

The output is a little bit wonky here:

    // RegionId(7)
    if (<temp.ciphertype>#45 == "RC4") 
    {
            }
    // RegionId(6)
    else
    {
        // RegionId(9)
        if (<temp.ciphertype>#45 == "AES") 
        {
                    }
        // RegionId(8)
        else
        {
                    }
    }
    phi<idx=0, regions=(RegionId(9), RegionId(8), RegionId(6))> = phi<idx=1, regions=(RegionId(9), RegionId(8), RegionId(6))>;

Specifically, when we have "empty" if/else blocks (not actually empty, because AstNode still exists on the stacks in that BasicBlock) the formatting will mess up.

  • Maybe we can introduce comments in those blocks with the remaining AST nodes in that region?
  • Does this even matter? It could just be helpful for debugging purposes.

@cernec1999
Copy link
Collaborator Author

Sample output for short-circuit.gs2bc:

function onCreated()
{
    <temp.thingOne>#8 = true;
    <temp.thingTwo>#9 = true;
    <temp.thingThree>#10 = false;
    // RegionId(1)
    if (<temp.thingOne>#7)
    {
    }
    phi<idx=1, regions=(RegionId(2))> = phi<idx=2, regions=(RegionId(2))>;
    // RegionId(4)
    if (<temp.thingOne>#7)
    {
        // RegionId(5)
        if (<temp.thingTwo>#4)
        {
        }
    }
    phi<idx=3, regions=(RegionId(6))> = phi<idx=4, regions=(RegionId(6))>;
    // RegionId(8)
    if (<temp.thingOne>#7)
    {
    }
    phi<idx=4, regions=(RegionId(9))> = phi<idx=5, regions=(RegionId(9))>;
    // RegionId(11)
    if (<temp.thingOne>#7)
    {
        // RegionId(12)
        if (<temp.thingTwo>#4)
        {
        }
    }
    phi<idx=6, regions=(RegionId(13))> = phi<idx=7, regions=(RegionId(13))>;
    // RegionId(15)
    if (<temp.thingOne>#7)
    {
    }
    // RegionId(17)
    if (phi<idx=8, regions=(RegionId(15))>)
    {
    }
    phi<idx=8, regions=(RegionId(18))> = phi<idx=9, regions=(RegionId(18))>;
    // RegionId(20)
    if (<temp.thingOne>#7)
    {
    }
    // RegionId(22)
    if (phi<idx=10, regions=(RegionId(20))>)
    {
        // RegionId(23)
        if (<temp.thingThree>#2)
        {
        }
    }
    phi<idx=11, regions=(RegionId(24))> = phi<idx=12, regions=(RegionId(24))>;
    // RegionId(26)
    if (<temp.thingOne>#7)
    {
    }
    // RegionId(28)
    if (phi<idx=13, regions=(RegionId(26))>)
    {
        // RegionId(29)
        if (<temp.thingThree>#2)
        {
            // RegionId(30)
            if (<temp.thingOne>#7)
            {
                fn_call#32 = returnAndShortCircuit(true, false);
            }
        }
    }
    phi<idx=15, regions=(RegionId(31))> = phi<idx=16, regions=(RegionId(31))>;
    fn_call#34 = echo("simpleAndTwo: " @ <temp.simpleAndTwo>#12);
    fn_call#35 = echo("simpleAndThree: " @ <temp.simpleAndThree>#15);
    fn_call#36 = echo("simpleOrTwo: " @ <temp.simpleOrTwo>#18);
    fn_call#37 = echo("simpleOrThree: " @ <temp.simpleOrThree>#21);
    fn_call#38 = echo("complex: " @ <temp.complex>#24);
    fn_call#39 = echo("complexTwo: " @ <temp.complexTwo>#27);
    return 0x0;
}

Some thoughts:

  • For any unprocessed nodes in each BasicBlock, perhaps we can move them over to Region. Then, perform phi injection at a later step so that regions are aware of any unprocessed nodes.
  • Instead of storing RegionId in the phi nodes, we should instead store the AST node in the phi node so that later passes can have an easier time with phi node elimination.

@cernec1999
Copy link
Collaborator Author

When we iterate the predecessor blocks, we shouldn't rely on StandaloneNode to be the only possible frame type left on the stack. Current code:

for (i, frame) in exec.iter().rev().enumerate() {
    match frame {
        ExecutionFrame::StandaloneNode(n) => {
            // Ensure we have a slot in predecessor_regions for this index.
            if predecessor_regions.len() <= i {
                // If not, extend the vector so that index i is available.
                predecessor_regions.resize(i + 1, Vec::new());
            }
            // Record the region info (e.g., pred.1 and pred.2) for this phi candidate.
            predecessor_regions[i].push((pred.1, pred.2, n.clone()));
        }
        // TODO: Bug. BuildingArray is another possible frame type.
        _ => {
            return Err(FunctionDecompilerError::Other {
                message: "Expected StandaloneNode".to_string(),
                context: ctx.get_error_context(),
                backtrace: Backtrace::capture(),
            });
        }
    }
}

Therefore, it may be wise to store the ExecutionFrame in the PhiNode instead of the AstNode, since we may be building an array when we introduce the Phi.

@cernec1999
Copy link
Collaborator Author

Consider the following enhanced for loop:

function enhancedForLoop()
{
    lit#8 = "baz";
    lit#9 = "bar";
    lit#10 = "foo";
    <temp.arr>#11 = {lit#10, lit#9, lit#8};
    lit#12 = 0x0;
    <temp.i>#13 = lit#12;
    lit#14 = 0x0;
    // RegionId(2)
    if (<temp.elem>#2 : <temp.arr>#5)
    {
        fn_call#15 = echo#3(<temp.elem>#2);
        <temp.i>#16++;
        lit#17 = lit#14 + 0x1;
        goto RegionId(1);
    }
    // RegionId(3)
    else
    {
        lit#18 = "done enhanced for";
        fn_call#19 = echo#3(lit#18);
        lit#20 = "num elems: ";
        fn_call#21 = echo#3(lit#20 @ <temp.i>#7);
        lit#22 = 0x0;
        return lit#22;
    }
}

If we want to conform to SSA, we need to introduce a phi node within the for loop to merge lit#17 and lit#14 at each iteration. We should figure out how and where to exactly insert the phi node.

Instead, it should look something like this:

lit#14 = 0; // initial value outside the loop
loop_header:
    lit#23 = φ(lit#14, lit#17) // lit#23 is the "current" value, merging the initial value and the back edge
    // ... use lit#23 in the loop ...
    lit#17 = lit#23 + 1; // new value computed in the loop body
    goto loop_header;

The above case is indicative that we should re-think our phi node resolution strategy. Here's a proposal for a new strategy:

  • Do not introduce phi nodes while resolving BasicBlocks. Instead, allow the program to freely analyze the AST and whenever it encounters a value that doesn't exist in its current block stack, insert an UnresolvedPhi AST Node.
  • A later pass will analyze the UnresolvedPhi nodes in the AST, and turn them into regular phi instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant