can't deserialize nested field with same name + other fields #76

scottlamb · 2020-05-02T05:22:32Z

This onvif-rs crate was unable to parse the message below, with an error "unknown event EndDocument". Note that crate is using yaserde 137af01.

#[test]
fn security_capabilities_extension() {
    let ser = r#"
        <?xml version="1.0" encoding="UTF-8"?>
        <tt:Extension xmlns:tt="http://www.onvif.org/ver10/schema">
          <tt:TLS1.0>false</tt:TLS1.0>
          <tt:Extension>
            <tt:Dot1X>false</tt:Dot1X>
            <tt:SupportedEAPMethod>0</tt:SupportedEAPMethod>
            <tt:RemoteUserHandling>false</tt:RemoteUserHandling>
          </tt:Extension>
        </tt:Extension>
    "#;
    let des: tt::SecurityCapabilitiesExtension = yaserde::de::from_str(&ser).unwrap();
}

I got it down to a failing yaserde test. It's quite similar to the existing de_same_field_name_sub_sub test. Looks like an onvif-rs owner/contributor @DmitrySamoylov added this as part of #51 talking about a related problem. But it seems to still be broken when there are these other fields present.

#[test]
fn de_same_field_name_but_some_other_fields_or_something() {
  #[derive(Default, PartialEq, Debug, YaDeserialize)]
  pub struct FooOuter {
      pub other: bool,
      pub foo: Option<FooInner>,
  }

  #[derive(Default, PartialEq, Debug, YaDeserialize)]
  pub struct FooInner {
      pub blah: bool,
  }

  convert_and_validate!(
    r#"
    <foo>
      <other>false</other>
      <foo>
        <blah>false</blah>
      </foo>
    </foo>
    "#,
    FooOuter,
    FooOuter {
      other: false,
      foo: Some(FooInner {
        blah: false,
      }),
    }
  );
}

With yaserde at that revision (137af01), I see the same error as in my "real" message.

With yaserde at current master (2726de5), it still doesn't work, but the symptom is different: it goes into a busy-loop during this test.

I haven't diagnosed further than this.

The text was updated successfully, but these errors were encountered:

MarcAntoine-Arnaud · 2020-05-05T07:59:16Z

Hi @scottlamb,

Thank you for the issue.
I have tested a little bit and that's true it not works.

That works:

#[test]
fn de_same_field_name_but_some_other_fields_or_something() {
  #[derive(Default, PartialEq, Debug, YaDeserialize, YaSerialize)]
  #[yaserde(rename="foo")]
  pub struct FooOuter {
      pub other: bool,
      #[yaserde(rename="faa")]
      pub foo: Option<FooInner>,
  }

  #[derive(Default, PartialEq, Debug, YaDeserialize, YaSerialize)]
  pub struct FooInner {
      pub blah: bool,
  }

  let content = r#"
    <foo>
      <other>false</other>
      <faa>
        <blah>false</blah>
      </faa>
    </foo>
  "#;

  let model = FooOuter {
    other: false,
    foo: Some(FooInner {
      blah: false,
    }),
  };

  serialize_and_validate!(model, content);
  deserialize_and_validate!(content, model, FooOuter);
}

Changing faa with foo will not be able to deserialise.

scottlamb · 2020-05-05T13:50:08Z

Thanks for confirming!

In yaserde_derive/src/de/expand_struct.rs, it seems suspicious to me that it's matching on named_element. Couldn't it recognize its own element by position? Ie, its StartElement should come first, and the name is only for validation / error-checking the caller. The EndElement should come at the same depth as the StartElement, and again the name is only for validation. I don't think its element name in any other position should be treated specially.

scottlamb · 2020-06-20T03:06:43Z

I've been looking into this issue a bit.

First, I wrote something incorrect here:

In yaserde_derive/src/de/expand_struct.rs, it seems suspicious to me that it's matching on named_element.

I was referring to the line below. It's not using named_element; it's shadowing it. That's confusing but not actually problematic.

https://github.com/media-io/yaserde/blob/9a2aec0abee8c6969bedd54ac98e5f6a7ca55414/yaserde_derive/src/de/expand_struct.rs#L345

But this one seems wrong:

https://github.com/media-io/yaserde/blob/9a2aec0abee8c6969bedd54ac98e5f6a7ca55414/yaserde_derive/src/de/expand_struct.rs#L366

What I'd like to do is to (as I mentioned before) not treat the same element name specially. Either it's at the opening depth (and we can call Deserializer::expect_end_element) or it's not (and we shouldn't have any special treatment).

More generally, I'd like to set a more precise contract around YaDeserialize::deserialize about the state of the Deserializer before and afterward. I'd like to say something like this:

The caller guarantees that the next StartElement refers to this data structure (so the callee can examine its attributes). The callee shouldn't make any assumptions about its OwnedName. In practice, I think XML Schema decouples the notion of element name and type so this would be consistent with that.
On success, the callee guarantees that all events relating to this data structure, including the EndElement, have been consumed.
On failure, the callee makes no guarantees about the state of the Deserializer.

but this is a change in a few ways that I think requires a major version bump. Your thoughts? In particular:

it looks like structs and enums don't consume their EndElements today. Changing the contract in this way lets me call Deserializer::expect_end_element from within them, which I think will catch problems sooner and more easily. But unless the versions of yaserde and yaserde_derived are expected to exactly match, this is an incompatible change.
from looking at de_custom in yaserde/tests/deserializer.rs, it looks like it's expected for folks to implement YaDeserialize themselves, and the expectations are...very loose. Not only does it not consume the EndElement, but it doesn't even consume the stuff within it. The calling struct/enum could always consume elements until the depth matches its entry point. But that's extra code bloat (and onvif-rs's schema stuff is huge now). And I think it can conceal bugs. So I'd like to tighten this instead.
I don't see any reasonable way to keep flattened structs working when called like root_flatten_struct and like flatten_attribute today. It either needs to support only one mode (and if attributes are supported, it'd need to be the one where the parent's start and end are included) or it needs to know what mode it's called in (so a new trait method or something). Is the root one an important use case, or could we just drop it?

In any case, I'll send a PR to improve debuggability a bit. Initialize logging in each of the tests as described here (with a new env_logger dev dependency). Add/tweak some log calls to make it more obvious what's going on. These helped me quite a bit in understanding how the code works.

scottlamb · 2020-06-20T04:32:23Z

I don't see any reasonable way to keep flattened structs working when called like root_flatten_struct and like flatten_attribute today. It either needs to support only one mode ... or it needs to know what mode it's called in

My preference would be to not support the root_flatten_struct style, but if it is necessary: it just occurred to me that flattened structs could go into "root mode" iff the first element they see is the a StartDocument rather than a StartElement. It's confusing to me anyway that Deserializer discards the StartDocument but includes the EndDocument. This too would be a contract change though. Would StartDocument be included just when sent to a flattened struct (so the deserializer needs to be set up in a different way for that) or also for structs/enums (and so they need logic to swallow it)?

Anyway, I'll send you some hopefully-uncontroversial cleanup/debugging stuff and await your opinion on contract changes.

I'll likely have more contract questions for you later, but this seems like a good first round to address this immediate problem.

DmitrySamoylov · 2020-06-20T06:02:37Z

@MarcAntoine-Arnaud is there any reason why using pull parser? Is it for performance reasons? I'm just thinking that code could be much cleaner if using xml-tree 🤔 . There are places where you need tree structure but there is not one and you have to preserve some state while iterating over XmlEvents.

* Log the depth of elements as they're fetched * Log the starting depth of structs/enums and their Rust symbol names (not just XML element names, which may differ significantly) * Log every element in the struct/enum match loop at trace level. * Log file/line numbers at a few key points in the tests. This is helpful in finding failures happen in some of the longer tests. This logging helps me understand the data flow as I play with changes for luminvent#76.

MarcAntoine-Arnaud · 2020-06-20T14:13:23Z

Hello @scottlamb and @DmitrySamoylov !

Very interesting discussion in this thread.
I'm quite open to big changes in this library, so feel free to suggest the ideal mode !
It was my first library using proc_macro 2 years ago, and I have already think about restart from scratch with a more stable design.

I have used xml-rs because basically the serde-xml was based on it, and helped me to start the project.

Do you have time next week for a meeting to discuss on all needed points ?

Best
Marc-Antoine

scottlamb · 2020-06-20T15:18:12Z

Hmm, meetings are a little hard for me. I'm working on this for my Moonfire NVR project. It's my personal project, and my employer's copyright release agreement specifies I can't use their resources / time for it. I generally interpret that as not during the standard workday unless I take a vacation day. I'm a little too behind at work for that right now, so a meeting would have to be on the weekend or early morning / late evening Pacific time.

scottlamb · 2020-06-24T03:55:32Z

Would you still like to meet (and I apologize for my inconvenient time requirements) or should we converse over email/issues instead?

I find it useful to have a more extended example that works easily out of the box. In particular, one that: * supports authentication for everything. My cameras require it for almost all operations, so eg get_stream_uri didn't work. * uses real argument parsing so usage is more obvious. * automatically finds the right URIs for the different SOAP services. Also unlike the README examples, my cameras don't let you send everything to the root path. * supports a couple longer operations. They're admittedly a little arbitrary. Not everything works as of this commit, due to luminvent/yaserde#76 and at least one other problem.

scottlamb · 2020-07-04T06:46:36Z

I just started a much more ambitious proposal in #84; I'm eager to hear your thoughts on it.

`Extension` inside `Extension` causes infinite loop at deserialization: luminvent/yaserde#76 #64

MarcAntoine-Arnaud · 2021-05-31T10:24:56Z

Is it possible to check if the #112 fixes it ?

DerNoob24 · 2021-08-02T13:43:47Z

I use the version 0.7.1, which includes this commit, and the problem is still not fixed when using such a example

<?xml version="1.0" encoding="utf-8"?>
<railml xmlns="https://www.railml.org/schemas/2018"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.4">
    <ocp name="ocp_name">
        <xsi:name value="name_value" long="name_long" />
    </ocp>
</railml>

MarcAntoine-Arnaud added the bug Something isn't working label May 5, 2020

scottlamb mentioned this issue Jun 20, 2020

debugging and cleanups #83

Merged

scottlamb mentioned this issue Jun 24, 2020

extended camera example lumeohq/onvif-rs#47

Merged

scottlamb mentioned this issue Jul 4, 2020

RFC: new YaDeserialize interface #84

Open

DmitrySamoylov added a commit to lumeohq/onvif-rs that referenced this issue Mar 11, 2021

Comment out the second layer of extensions

14ba9b3

`Extension` inside `Extension` causes infinite loop at deserialization: luminvent/yaserde#76 #64

scottlamb mentioned this issue Apr 30, 2021

Nested XML with the same element names is not deserialized correctly. #110

Closed

ephraimkunz mentioned this issue May 5, 2021

Fix bug where nested structs with the same element names don't deserialize properly. #112

Merged

seimonw mentioned this issue Oct 6, 2022

GetAudioEncoderConfigurationOptionsResponse isn't decoded lumeohq/onvif-rs#99

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

can't deserialize nested field with same name + other fields #76

can't deserialize nested field with same name + other fields #76

scottlamb commented May 2, 2020

MarcAntoine-Arnaud commented May 5, 2020

scottlamb commented May 5, 2020

scottlamb commented Jun 20, 2020

scottlamb commented Jun 20, 2020

DmitrySamoylov commented Jun 20, 2020

MarcAntoine-Arnaud commented Jun 20, 2020

scottlamb commented Jun 20, 2020

scottlamb commented Jun 24, 2020

scottlamb commented Jul 4, 2020

MarcAntoine-Arnaud commented May 31, 2021

DerNoob24 commented Aug 2, 2021

can't deserialize nested field with same name + other fields #76

can't deserialize nested field with same name + other fields #76

Comments

scottlamb commented May 2, 2020

MarcAntoine-Arnaud commented May 5, 2020

scottlamb commented May 5, 2020

scottlamb commented Jun 20, 2020

scottlamb commented Jun 20, 2020

DmitrySamoylov commented Jun 20, 2020

MarcAntoine-Arnaud commented Jun 20, 2020

scottlamb commented Jun 20, 2020

scottlamb commented Jun 24, 2020

scottlamb commented Jul 4, 2020

MarcAntoine-Arnaud commented May 31, 2021

DerNoob24 commented Aug 2, 2021