Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can't deserialize nested field with same name + other fields #76

Open
scottlamb opened this issue May 2, 2020 · 11 comments
Open

can't deserialize nested field with same name + other fields #76

scottlamb opened this issue May 2, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@scottlamb
Copy link
Contributor

This onvif-rs crate was unable to parse the message below, with an error "unknown event EndDocument". Note that crate is using yaserde 137af01.

#[test]
fn security_capabilities_extension() {
    let ser = r#"
        <?xml version="1.0" encoding="UTF-8"?>
        <tt:Extension xmlns:tt="http://www.onvif.org/ver10/schema">
          <tt:TLS1.0>false</tt:TLS1.0>
          <tt:Extension>
            <tt:Dot1X>false</tt:Dot1X>
            <tt:SupportedEAPMethod>0</tt:SupportedEAPMethod>
            <tt:RemoteUserHandling>false</tt:RemoteUserHandling>
          </tt:Extension>
        </tt:Extension>
    "#;
    let des: tt::SecurityCapabilitiesExtension = yaserde::de::from_str(&ser).unwrap();
}

I got it down to a failing yaserde test. It's quite similar to the existing de_same_field_name_sub_sub test. Looks like an onvif-rs owner/contributor @DmitrySamoylov added this as part of #51 talking about a related problem. But it seems to still be broken when there are these other fields present.

#[test]
fn de_same_field_name_but_some_other_fields_or_something() {
  #[derive(Default, PartialEq, Debug, YaDeserialize)]
  pub struct FooOuter {
      pub other: bool,
      pub foo: Option<FooInner>,
  }

  #[derive(Default, PartialEq, Debug, YaDeserialize)]
  pub struct FooInner {
      pub blah: bool,
  }

  convert_and_validate!(
    r#"
    <foo>
      <other>false</other>
      <foo>
        <blah>false</blah>
      </foo>
    </foo>
    "#,
    FooOuter,
    FooOuter {
      other: false,
      foo: Some(FooInner {
        blah: false,
      }),
    }
  );
}

With yaserde at that revision (137af01), I see the same error as in my "real" message.

With yaserde at current master (2726de5), it still doesn't work, but the symptom is different: it goes into a busy-loop during this test.

I haven't diagnosed further than this.

@MarcAntoine-Arnaud
Copy link
Contributor

Hi @scottlamb,

Thank you for the issue.
I have tested a little bit and that's true it not works.

That works:

#[test]
fn de_same_field_name_but_some_other_fields_or_something() {
  #[derive(Default, PartialEq, Debug, YaDeserialize, YaSerialize)]
  #[yaserde(rename="foo")]
  pub struct FooOuter {
      pub other: bool,
      #[yaserde(rename="faa")]
      pub foo: Option<FooInner>,
  }

  #[derive(Default, PartialEq, Debug, YaDeserialize, YaSerialize)]
  pub struct FooInner {
      pub blah: bool,
  }

  let content = r#"
    <foo>
      <other>false</other>
      <faa>
        <blah>false</blah>
      </faa>
    </foo>
  "#;

  let model = FooOuter {
    other: false,
    foo: Some(FooInner {
      blah: false,
    }),
  };

  serialize_and_validate!(model, content);
  deserialize_and_validate!(content, model, FooOuter);
}

Changing faa with foo will not be able to deserialise.

@MarcAntoine-Arnaud MarcAntoine-Arnaud added the bug Something isn't working label May 5, 2020
@scottlamb
Copy link
Contributor Author

Thanks for confirming!

In yaserde_derive/src/de/expand_struct.rs, it seems suspicious to me that it's matching on named_element. Couldn't it recognize its own element by position? Ie, its StartElement should come first, and the name is only for validation / error-checking the caller. The EndElement should come at the same depth as the StartElement, and again the name is only for validation. I don't think its element name in any other position should be treated specially.

@scottlamb
Copy link
Contributor Author

I've been looking into this issue a bit.

First, I wrote something incorrect here:

In yaserde_derive/src/de/expand_struct.rs, it seems suspicious to me that it's matching on named_element.

I was referring to the line below. It's not using named_element; it's shadowing it. That's confusing but not actually problematic.

https://github.com/media-io/yaserde/blob/9a2aec0abee8c6969bedd54ac98e5f6a7ca55414/yaserde_derive/src/de/expand_struct.rs#L345

But this one seems wrong:

https://github.com/media-io/yaserde/blob/9a2aec0abee8c6969bedd54ac98e5f6a7ca55414/yaserde_derive/src/de/expand_struct.rs#L366

What I'd like to do is to (as I mentioned before) not treat the same element name specially. Either it's at the opening depth (and we can call Deserializer::expect_end_element) or it's not (and we shouldn't have any special treatment).

More generally, I'd like to set a more precise contract around YaDeserialize::deserialize about the state of the Deserializer before and afterward. I'd like to say something like this:

  • The caller guarantees that the next StartElement refers to this data structure (so the callee can examine its attributes). The callee shouldn't make any assumptions about its OwnedName. In practice, I think XML Schema decouples the notion of element name and type so this would be consistent with that.
  • On success, the callee guarantees that all events relating to this data structure, including the EndElement, have been consumed.
  • On failure, the callee makes no guarantees about the state of the Deserializer.

but this is a change in a few ways that I think requires a major version bump. Your thoughts? In particular:

  • it looks like structs and enums don't consume their EndElements today. Changing the contract in this way lets me call Deserializer::expect_end_element from within them, which I think will catch problems sooner and more easily. But unless the versions of yaserde and yaserde_derived are expected to exactly match, this is an incompatible change.
  • from looking at de_custom in yaserde/tests/deserializer.rs, it looks like it's expected for folks to implement YaDeserialize themselves, and the expectations are...very loose. Not only does it not consume the EndElement, but it doesn't even consume the stuff within it. The calling struct/enum could always consume elements until the depth matches its entry point. But that's extra code bloat (and onvif-rs's schema stuff is huge now). And I think it can conceal bugs. So I'd like to tighten this instead.
  • I don't see any reasonable way to keep flattened structs working when called like root_flatten_struct and like flatten_attribute today. It either needs to support only one mode (and if attributes are supported, it'd need to be the one where the parent's start and end are included) or it needs to know what mode it's called in (so a new trait method or something). Is the root one an important use case, or could we just drop it?

In any case, I'll send a PR to improve debuggability a bit. Initialize logging in each of the tests as described here (with a new env_logger dev dependency). Add/tweak some log calls to make it more obvious what's going on. These helped me quite a bit in understanding how the code works.

@scottlamb
Copy link
Contributor Author

I don't see any reasonable way to keep flattened structs working when called like root_flatten_struct and like flatten_attribute today. It either needs to support only one mode ... or it needs to know what mode it's called in

My preference would be to not support the root_flatten_struct style, but if it is necessary: it just occurred to me that flattened structs could go into "root mode" iff the first element they see is the a StartDocument rather than a StartElement. It's confusing to me anyway that Deserializer discards the StartDocument but includes the EndDocument. This too would be a contract change though. Would StartDocument be included just when sent to a flattened struct (so the deserializer needs to be set up in a different way for that) or also for structs/enums (and so they need logic to swallow it)?

Anyway, I'll send you some hopefully-uncontroversial cleanup/debugging stuff and await your opinion on contract changes.

I'll likely have more contract questions for you later, but this seems like a good first round to address this immediate problem.

@DmitrySamoylov
Copy link
Contributor

@MarcAntoine-Arnaud is there any reason why using pull parser? Is it for performance reasons? I'm just thinking that code could be much cleaner if using xml-tree 🤔 . There are places where you need tree structure but there is not one and you have to preserve some state while iterating over XmlEvents.

scottlamb added a commit to scottlamb/yaserde that referenced this issue Jun 20, 2020
* Log the depth of elements as they're fetched
* Log the starting depth of structs/enums and their Rust symbol names
  (not just XML element names, which may differ significantly)
* Log every element in the struct/enum match loop at trace level.
* Log file/line numbers at a few key points in the tests.
  This is helpful in finding failures happen in some of the longer
  tests.

This logging helps me understand the data flow as I play with changes
for luminvent#76.
scottlamb added a commit to scottlamb/yaserde that referenced this issue Jun 20, 2020
* Log the depth of elements as they're fetched
* Log the starting depth of structs/enums and their Rust symbol names
  (not just XML element names, which may differ significantly)
* Log every element in the struct/enum match loop at trace level.
* Log file/line numbers at a few key points in the tests.
  This is helpful in finding failures happen in some of the longer
  tests.

This logging helps me understand the data flow as I play with changes
for luminvent#76.
@MarcAntoine-Arnaud
Copy link
Contributor

Hello @scottlamb and @DmitrySamoylov !

Very interesting discussion in this thread.
I'm quite open to big changes in this library, so feel free to suggest the ideal mode !
It was my first library using proc_macro 2 years ago, and I have already think about restart from scratch with a more stable design.

I have used xml-rs because basically the serde-xml was based on it, and helped me to start the project.

Do you have time next week for a meeting to discuss on all needed points ?

Best
Marc-Antoine

@scottlamb
Copy link
Contributor Author

Hmm, meetings are a little hard for me. I'm working on this for my Moonfire NVR project. It's my personal project, and my employer's copyright release agreement specifies I can't use their resources / time for it. I generally interpret that as not during the standard workday unless I take a vacation day. I'm a little too behind at work for that right now, so a meeting would have to be on the weekend or early morning / late evening Pacific time.

@scottlamb
Copy link
Contributor Author

Would you still like to meet (and I apologize for my inconvenient time requirements) or should we converse over email/issues instead?

scottlamb added a commit to scottlamb/onvif-rs that referenced this issue Jun 24, 2020
I find it useful to have a more extended example that works easily
out of the box. In particular, one that:

* supports authentication for everything. My cameras require it for
  almost all operations, so eg get_stream_uri didn't work.
* uses real argument parsing so usage is more obvious.
* automatically finds the right URIs for the different SOAP services.
  Also unlike the README examples, my cameras don't let you send
  everything to the root path.
* supports a couple longer operations. They're admittedly a little
  arbitrary.

Not everything works as of this commit, due to luminvent/yaserde#76
and at least one other problem.
scottlamb added a commit to scottlamb/onvif-rs that referenced this issue Jun 25, 2020
I find it useful to have a more extended example that works easily
out of the box. In particular, one that:

* supports authentication for everything. My cameras require it for
  almost all operations, so eg get_stream_uri didn't work.
* uses real argument parsing so usage is more obvious.
* automatically finds the right URIs for the different SOAP services.
  Also unlike the README examples, my cameras don't let you send
  everything to the root path.
* supports a couple longer operations. They're admittedly a little
  arbitrary.

Not everything works as of this commit, due to luminvent/yaserde#76
and at least one other problem.
@scottlamb
Copy link
Contributor Author

I just started a much more ambitious proposal in #84; I'm eager to hear your thoughts on it.

@MarcAntoine-Arnaud
Copy link
Contributor

Is it possible to check if the #112 fixes it ?

@DerNoob24
Copy link

I use the version 0.7.1, which includes this commit, and the problem is still not fixed when using such a example

<?xml version="1.0" encoding="utf-8"?>
<railml xmlns="https://www.railml.org/schemas/2018"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.4">
    <ocp name="ocp_name">
        <xsi:name value="name_value" long="name_long" />
    </ocp>
</railml>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants