Does the Cobrix handle the Easytrieve layout.? #516

Closed
AnveshAeturi opened this issue Sep 9, 2022 · 20 comments · Fixed by #546
Labels: accepted (Accepted for implementation), enhancement (New feature or request)

Comments

@AnveshAeturi

Background [Optional]

I have an Easytrieve layout that contains unsigned packed fields (data type U in Easytrieve), unsigned binary fields (data type B in Easytrieve), and alphanumeric fields (data type A in Easytrieve, storing Hexbit). The data file we are trying to convert is EBCDIC.

Question

Is there a way to convert this data with Cobrix by providing the Easytrieve layout described above? @yruslan

@AnveshAeturi added the question (Further information is requested) label Sep 9, 2022
@AnveshAeturi changed the title from "Does the Cobrix handle the Binary unsigned fields" to "Does the Cobrix handle the Easytrieve layout.?" Sep 9, 2022
@yruslan
Collaborator

yruslan commented Sep 12, 2022

Hi, could you attach an example copybook and a link to the documentation for the data type, please?

@AnveshAeturi
Author

The COBOL copybook says X(2), but the data itself comes from Easytrieve with a data type of U (Packed Unsigned).
For example: VARIABLE PIC X(2). The data stored is actually an unsigned packed field (the definition of U in Easytrieve).

Data Type Link: https://www.mvsforums.com/manuals/EZT_PL_APP_63_MASTER.pdf
See page 35 (the footer on that page reads "Library 2-11").

@AnveshAeturi
Author

Easyterieve_Layout_sample.xlsx

Hi @yruslan, this is the Excel file that we created from the Easytrieve layout. Only sample fields are included.

@yruslan
Collaborator

yruslan commented Sep 22, 2022

I see. The data types look parsable at first glance. The only thing is that you need a proper copybook that matches the data in order to parse records like that. And for that you would need a mapping between Easytrieve data types and COBOL data types.
For instance, an Easytrieve type U with length 4 can have a PIC 9(4) (or PIC 9(9) if the encoding is binary).

Do I understand correctly that the fields specified in the Excel file are not all the fields of the record? Field 'CRSCON' with length 1 at offset 10 is followed by CRADTR at offset 20. That means there are other fields between CRSCON and CRADTR that fill the remaining 9 bytes.
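
The gap check described above can be automated before a copybook is written. The following is a minimal sketch, not part of Cobrix: the function name `find_gaps`, the tuple layout, and the length given for CRADTR are all assumptions made for illustration.

```python
from typing import List, Tuple

# Each field is (name, offset, length). The offsets only need to share one base.
Field = Tuple[str, int, int]


def find_gaps(fields: List[Field]) -> List[Tuple[str, str, int]]:
    """Return (previous_field, next_field, missing_bytes) for every gap."""
    gaps = []
    ordered = sorted(fields, key=lambda f: f[1])
    for (name, offset, length), (next_name, next_offset, _) in zip(ordered, ordered[1:]):
        missing = next_offset - (offset + length)
        if missing > 0:
            gaps.append((name, next_name, missing))
    return gaps


# CRSCON comes from the thread; the length of CRADTR is made up for the example.
sample = [("CRSCON", 10, 1), ("CRADTR", 20, 4)]
print(find_gaps(sample))  # [('CRSCON', 'CRADTR', 9)]
```

Every reported gap would need either the missing fields or a FILLER of that size in the copybook so that later fields stay at the right offsets.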

@mike-childs

Hello. I am adding a comment because I also need to request this same support for Unsigned Packed fields in the mainframe records.
Here is what is meant by "Unsigned Packed" :
An Easytrieve U (Unsigned Packed) field is the same as a normal Packed field, but without the sign-nibble on the end.

For example, let's say we have an account date value of '20220425'.
As a Packed number, that field would be defined in COBOL like this:
ACCT-DATE PIC 9(8) COMP-3.
. . .and in memory, that field would contain this:
X'020220425F'

As a U (Unsigned Packed) number, that field would be defined in COBOL like this:
ACCT-DATE PIC X(4).
. . .and in memory, that field would contain this:
X'20220425'

Unsigned Packed (U) fields must be defined in COBOL as PIC X fields because COBOL does not support the Unsigned Packed format.
It is invalid data to COBOL.
Therefore, when COBOL programmers encounter Unsigned Packed fields in their data, they have to write special code to convert it to a normal Packed value by inserting the sign nibble at the end, then processing it as a Packed field.
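
The conversion described above (append a sign nibble, then treat the result as a normal packed field) can be sketched in a few lines. This is an illustration only; the function name `add_sign_nibble` and the default sign nibble F are assumptions, not code from Cobrix or from any COBOL program.

```python
def add_sign_nibble(unsigned_packed: bytes, sign: str = "f") -> bytes:
    """Turn unsigned packed bytes into a normal packed (COMP-3 style) field."""
    digits = unsigned_packed.hex()
    if not digits.isdigit():
        raise ValueError("unsigned packed data may only contain nibbles 0-9")
    # An even number of digit nibbles plus one sign nibble is odd,
    # so a leading zero nibble is needed to fill whole bytes.
    return bytes.fromhex("0" + digits + sign)


print(add_sign_nibble(bytes.fromhex("20220425")).hex().upper())  # 020220425F
```

The result is one byte longer than the input, which matches the 5-byte X'020220425F' versus 4-byte X'20220425' example above.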

The Unsigned Packed field cannot be declared as a COBOL BINARY (COMP) field because it does not contain a binary value. It contains a Packed value without the sign nibble.
If you took our example data above and defined it as Binary in COBOL . . . 'PIC 9(8) COMP', the X'20220425' value is now treated as a Binary value, which is 539,100,197.
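
The difference between the two interpretations is easy to reproduce. This short sketch reads the same four bytes once as a big-endian binary integer and once as unsigned packed digits; the variable names are illustrative.

```python
raw = bytes.fromhex("20220425")

# Interpreted as a big-endian binary integer (what PIC 9(8) COMP would do).
as_binary = int.from_bytes(raw, byteorder="big")

# Interpreted as unsigned packed: every nibble is one decimal digit.
as_unsigned_packed = int(raw.hex())

print(as_binary)           # 539100197
print(as_unsigned_packed)  # 20220425
```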

Adding support for Unsigned Packed fields would be pretty simple in Cobrix. You could add an "Unsigned Packed" flag to the 'decodeBCDIntegralNumber' function that handles Packed (COMP-3) values, and skip the sign nibble handling when the field is in the Unsigned Packed format.
You could add a Cobrix special parm, like COMP-UP, (similar to what you did for COMP-9).
Then, users could code this in their COBOL copybook for the Unsigned Packed field:
ACCT-DATE PIC X(4) COMP-UP.
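
To make the proposal concrete, here is a minimal Python sketch of a packed-decimal decoder with such a flag. It is not the Cobrix implementation (which is written in Scala); the function name `decode_bcd_integral`, the `unsigned_packed` parameter, and the decision to accept only C, D, and F as sign nibbles are assumptions for illustration.

```python
from typing import Optional


def decode_bcd_integral(raw: bytes, unsigned_packed: bool = False) -> Optional[int]:
    """Decode packed decimal bytes; return None for malformed input."""
    if not raw:
        return None

    # Split every byte into its high and low nibble.
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)

    sign = 1
    if not unsigned_packed:
        # Normal packed format: the last nibble carries the sign.
        sign_nibble = nibbles.pop()
        if sign_nibble == 0x0D:
            sign = -1
        elif sign_nibble not in (0x0C, 0x0F):
            return None

    # All remaining nibbles must be decimal digits.
    if any(n > 9 for n in nibbles):
        return None

    value = 0
    for n in nibbles:
        value = value * 10 + n
    return sign * value


print(decode_bcd_integral(bytes.fromhex("020220425F")))                      # 20220425
print(decode_bcd_integral(bytes.fromhex("20220425"), unsigned_packed=True))  # 20220425
```

With the flag off, the unsigned bytes X'20220425' are rejected because the last nibble (5) is not a valid sign, which is why the format needs its own USAGE rather than plain COMP-3.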

Please let me know if you'd like to chat more about this. Thank you very much.

@yruslan
Collaborator

yruslan commented Dec 8, 2022

Hi @mike-childs,

Makes sense. I might ask a couple more questions as we go.

The first one:

When you have ACCT-DATE PIC X(4). in unsigned packed format, does this mean that the maximum number of digits of the packed number is 4, or does it mean that the field occupies 4 bytes, so it can have 8 digits?

@yruslan
Collaborator

yruslan commented Dec 8, 2022

I see the answer to the question in your description. Sorry.
I think adding a special USAGE like COMP-UP would indeed be the best way to do it.
Or maybe COMP-3U (since it is like COMP-3, just without the sign nibble).

@yruslan added the enhancement (New feature or request) and accepted (Accepted for implementation) labels and removed the question (Further information is requested) label Dec 8, 2022
@mike-childs

Hi @yruslan,
Yes, COMP-3U would also be excellent. And yes, the 'X(4)' length refers to 4 bytes in memory (8 digits). Please do feel free to ask questions. I have experience with this topic.
Thank you very much for accepting this request. It will be extremely helpful for us (and others).
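
The byte and digit arithmetic behind that answer can be written down as a small sketch. The helper names are hypothetical, and the code only describes the storage formats discussed in this thread; it makes no claim about how Cobrix derives a field size from a PIC clause.

```python
def comp3_size_bytes(digits: int) -> int:
    """Bytes used by a normal packed field: digit nibbles plus one sign nibble."""
    return digits // 2 + 1


def unsigned_packed_size_bytes(digits: int) -> int:
    """Bytes used by an unsigned packed field: digit nibbles only."""
    return (digits + 1) // 2


def unsigned_packed_max_digits(size_bytes: int) -> int:
    """Every byte of an unsigned packed field holds two decimal digits."""
    return size_bytes * 2


print(comp3_size_bytes(8))            # 5
print(unsigned_packed_size_bytes(8))  # 4
print(unsigned_packed_max_digits(4))  # 8
```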

@yruslan
Collaborator

yruslan commented Dec 8, 2022

Great, thanks for the answer and for such a detailed description!

Will implement it soon.

@yruslan
Collaborator

yruslan commented Dec 8, 2022

One more question. Would it be okay if the PIC for unsigned packed numerics were required to be

PIC 9(4) COMP-3U.

not

PIC X(4) COMP-3U.

?
This is because the parser relies heavily on the use of '9' in PIC to recognize numeric data types.

@mike-childs

Yes, requiring the '9' (as in 'PIC 9(4) COMP-3U') makes perfect sense because the field should contain only numeric data. The field would have all the same rules as a normal Packed field, other than the lack of a sign nibble.
Thank you.

yruslan added a commit that referenced this issue Dec 8, 2022
This parses 'unsigned packed' format, that is BCD without the sign nibble.
yruslan added a commit that referenced this issue Dec 15, 2022
This parses 'unsigned packed' format, that is BCD without the sign nibble.
@yruslan
Collaborator

yruslan commented Dec 15, 2022

This is added. You can try building spark-cobol from master. Let me know if it works as expected.

@mike-childs

Thank you very much @yruslan! We have a story in our backlog to pull in the latest Cobrix version and do thorough testing with the new COMP-3U type parm. I will add an update here once we have done that work. We really appreciate you adding this functionality.

@mike-childs

Hello @yruslan. We have finished our testing with the new COMP-3U parm, and it correctly converted the Unsigned Packed fields. I have attached a screenshot showing my input, output, and test results. Please let me know if you need any further information. Thank you very much.

CobrixTestResults

@yruslan
Collaborator

yruslan commented Jan 5, 2023

Hi @mike-childs, thanks a lot for confirming! Glad it works as expected.

@diddyp20

@yruslan I am getting the below error when updating the copybook to COMP-3U.

za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 28: Invalid input 'COMP-3U' at position 28:64

@yruslan
Collaborator

yruslan commented Mar 22, 2023

Use spark-cobol 2.6.4.
If you are already using the latest Cobrix, let me know what your copybook statement for that field looks like.

@diddyp20

diddyp20 commented Mar 22, 2023

@yruslan I have upgraded to spark-cobol 2.6.4 and am now getting this error:

java.lang.NoClassDefFoundError: scala/$less$colon$less

Here is the command:

class_poc_df = (
    spark.read.format("cobol")
    .option("copybook", class_copybook)
    .option("record_format", "D")
    .option("schema_retention_policy", "collapse_root")
    .option("drop_value_fillers", "false")
    .load(class_data)
)

@yruslan
Collaborator

yruslan commented Mar 22, 2023

The error suggests that you are using a spark-cobol build for a Scala version different from the one in your Spark environment.

Use the artifact that matches your Scala version:

  • spark-cobol_2.11
  • spark-cobol_2.12
  • spark-cobol_2.13

or build the one that matches your environment exactly using 'sbt assembly' (the full command is in the README).
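
For context, the missing class `scala/$less$colon$less` is the encoded name of `scala.<:<`, which is a top-level class only from Scala 2.13 onward, so this error typically means a `_2.13` artifact was loaded on a Scala 2.11 or 2.12 runtime. The sketch below builds the Maven coordinate for the matching artifact. The helper name `spark_cobol_coordinate` is hypothetical, and the Scala version is assumed to be known already (for example, from the "Using Scala version" line that `spark-submit --version` prints).

```python
def spark_cobol_coordinate(scala_version: str, cobrix_version: str = "2.6.4") -> str:
    """Build the Maven coordinate of the spark-cobol artifact for a Scala version."""
    parts = scala_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"cannot parse Scala version: {scala_version!r}")
    # Artifacts are published per Scala binary version (major.minor).
    binary_version = ".".join(parts[:2])
    if binary_version not in ("2.11", "2.12", "2.13"):
        raise ValueError(f"no spark-cobol artifact listed for Scala {binary_version}")
    return f"za.co.absa.cobrix:spark-cobol_{binary_version}:{cobrix_version}"


print(spark_cobol_coordinate("2.12.15"))  # za.co.absa.cobrix:spark-cobol_2.12:2.6.4
```

The resulting string can be passed to `--packages` when starting `pyspark` or `spark-submit`.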

@AnveshAeturi
Author

Hi @diddyp20, I have faced a similar error (copybook at line 28: Invalid input 'COMP-3U' at position 28:64) with my copybooks in the past. Check the field alignment in the copybook; it should be aligned properly with respect to the data inside the file. Hopefully that solves the issue.
