segment filter not working in 2.6.2 version #560

saikumare-a · 2023-01-03T15:08:38Z

Background [Optional]

Hi,
We are receiving a multisegment ASCII file and would like to filter the data for a particular segment based on a column.

As per the documentation, we tried using option("segment_filter").

Even with this filter set, no filtering of the data happens. Could you help look into this?

saikumare-a · 2023-01-03T15:53:56Z

Hi @yruslan,

I just tested this in 2.6.1 and it works fine, but it does not work in 2.6.2. Could you look into this?

yruslan · 2023-01-03T18:51:15Z

Hi,
What's your full spark.read code snippet?

saikumare-a · 2023-01-04T03:23:31Z

Hi @yruslan,

Below are the options used:

final_options = {'copybook': '<copybook_path>', 'generate_record_id': 'false', 'drop_value_fillers': 'false', 'drop_group_fillers': 'false', 'pedantic': 'true', 'encoding': 'ascii', 'variable_size_occurs': 'true', 'record_format': 'D', 'segment_field': 'BASE_RCRD_ID', 'segment_filter': 'ABC'}

df=spark.read.format("cobol").options(**final_options).load()

yruslan · 2023-01-04T15:17:24Z

Yeah, I can see why it is happening. For now, you can work around it by filtering your data frame using .filter(col("BASE_RCRD_ID") === "ABC").
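
The workaround above uses Scala's === operator, while the snippet in this thread is PySpark, where column equality is written with ==. A minimal sketch of the same filter in Python follows. The helper name filter_segment is hypothetical, and the sketch assumes BASE_RCRD_ID ends up as a top-level string column in the resulting data frame, which may not hold for every copybook or schema retention setting.

```python
def filter_segment(df, segment_field, segment_id):
    """Keep only the rows whose segment field equals segment_id.

    Stand-in for the 'segment_filter' option: the filtering happens in
    Spark after the read instead of inside the reader.
    """
    # df[segment_field] is a Column; '==' builds a Spark equality expression
    # (PySpark's counterpart of Scala's '===').
    return df.filter(df[segment_field] == segment_id)


# Usage with the options from this thread (needs a Spark session with the
# Cobrix package available, so it is left commented out):
# df = spark.read.format("cobol").options(**final_options).load()
# abc_only = filter_segment(df, "BASE_RCRD_ID", "ABC")
```

Because the filter runs after the read, all segments are still parsed first, so this is probably slower than a working segment_filter on large files.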

saikumare-a · 2023-01-11T11:32:06Z

Hi @yruslan ,

Is this issue fixed, or is there a timeline for when it could be fixed?

saikumare-a · 2023-01-20T06:27:02Z

Hi @yruslan,

Any luck with looking into this? Thanks in advance!

yruslan · 2023-01-20T15:28:30Z

Not yet. Please use the workaround for now.

yruslan · 2023-02-02T08:29:38Z

This should be fixed in 2.6.3, released yesterday.

saikumare-a added the "question" (Further information is requested) label Jan 3, 2023

saikumare-a changed the title ~~segment filter not working in "D" record_format~~ segment filter not working in 2.6.2 version Jan 4, 2023

yruslan added the "bug" (Something isn't working) and "accepted" (Accepted for implementation) labels and removed the "question" (Further information is requested) label Jan 4, 2023

yruslan added a commit that referenced this issue Jan 24, 2023

#560 Fix 'segment_filter' for ASCII text files parallelized by Spark.

29761dc

yruslan added a commit that referenced this issue Jan 25, 2023

#560 Fix 'segment_filter' for ASCII text files parallelized by Spark.

9fae505

yruslan closed this as completed Feb 2, 2023
