[ale] Regex Assistance
Alex Carver
agcarver+ale at acarver.net
Mon May 13 21:20:51 EDT 2019
The alternative, if you happen to know how many GS fields there are and
it's a constant number, is to just write out the long capture:

/RS ([0-9A-Z]+?) GS ([0-9A-Z]+?) GS ...... RS/
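Here is a minimal Python sketch of that fixed-count approach. It assumes every field is one or more of [0-9A-Z] and that the count never changes. The helper name fixed_field_pattern and the three-field sample record are hypothetical, and Python's re module stands in for whatever engine you're actually using.

```python
import re

RS, GS = "\x1e", "\x1d"  # ASCII Record Separator and Group Separator

def fixed_field_pattern(n_fields):
    """Hypothetical helper: build RS(f1)GS(f2)GS...(fn)RS with n capture groups.

    Assumes each field is one or more of [0-9A-Z] and that the field
    count is known in advance and constant.
    """
    field = r"([0-9A-Z]+)"
    body = re.escape(GS).join([field] * n_fields)
    return re.compile(re.escape(RS) + body + re.escape(RS))

# Hypothetical three-field record, no whitespace, as it would arrive.
record = RS + "06" + GS + "S100078" + GS + "PCXMG29N04D" + RS
m = fixed_field_pattern(3).search(record)
print(m.groups())  # ('06', 'S100078', 'PCXMG29N04D')
```

If a record ever shows up with a different number of fields, the pattern simply fails to match, which doubles as a cheap sanity check.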
On 2019-05-13 17:41, Alex Carver via Ale wrote:
> It's going to depend on the regex engine.
>
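> If you search on the topic of multiple group pattern matching, you'll
> find that some engines can do it and return every iteration of a
> repeated capture group (.NET, for example, keeps them in each Group's
> Captures collection). Perl keeps only the last iteration of a repeated
> group, but a global match (m//g in list context) returns all of them.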
>
> qr/(?:(pattern).*?)+/;
>
> Other engines can't do it and require explicit group notation to extract.
>
> So, in theory, with the right engine you could do this on the substring
> you've already extracted (and possibly on the whole record, too).
>
> qr/(?:([0-9A-Z]+)\x1D?)+/;
>
>
> All of this is untested.
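A quick Python check makes the engine difference concrete. Python's re module stands in for the engines discussed here, and the sample fields are taken from the record in the original post. Like Perl, re remembers only the last iteration of a quantified group, so the repeated-group trick doesn't hand back every field there, while matching one field at a time with findall does. The third-party regex package on PyPI is one engine that keeps every iteration, via Match.captures().

```python
import re

GS = "\x1d"  # ASCII Group Separator
# Sample substring, as if already extracted from between the two RS bytes.
inner = GS.join(["06", "Y7130700000000Y", "P84469826", "S100078"])

# A quantified capture group: Python's re keeps only the LAST iteration.
repeated = re.fullmatch(r"(?:([0-9A-Z]+)\x1d?)+", inner)
print(repeated.group(1))  # S100078

# Matching the single-field pattern globally returns every field instead.
print(re.findall(r"[0-9A-Z]+", inner))
# ['06', 'Y7130700000000Y', 'P84469826', 'S100078']
```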
>
>
> On 2019-05-13 17:02, Calvin Harrigan via Ale wrote:
>> On 5/13/2019 7:14 PM, Byron Jeff wrote:
>>> sed -e 's//Actual question about regex?/'
>>>
>>> BAJ
>>>
>>> On Mon, May 13, 2019 at 07:01:12PM -0400, Calvin Harrigan via Ale wrote:
>>>> _______________________________________________
>>>> Ale mailing list
>>>> Ale at ale.org
>>>> https://mail.ale.org/mailman/listinfo/ale
>>>> See JOBS, ANNOUNCE and SCHOOLS lists at
>>>> http://mail.ale.org/mailman/listinfo
>>
>> I know, right? Sorry...
>>
>> The source string seems to be getting sanitized by my email client.
>> There are some special characters in it, so I've improvised: RS =
>> Record Separator (0x1E), GS = Group Separator (0x1D), EOT = End of
>> Transmission (0x04), CR = Carriage Return (0x0D), and LF = Line Feed
>> (0x0A). There is no whitespace in the actual data; I've only added it
>> for readability.
>>
>> Some assembly required... I've also attached a file with the correct
>> contents.
>>
>> "[)>" can be considered a start marker. I want to capture everything
>> else into separate groups.
>>
>> [)> RS
>> 06 GS
>> Y7130700000000Y GS
>> P84469826 GS
>> 12V654663145 GS
>> T1118360000100078 GS
>> S100078 GS
>> 2D1226181 GS
>> PCXMG29N04D RS EOT CR LF
>>
>> So far I've been able to create a group that contains everything between
>> the two RS bytes. After that extraction I can split it on the GS
>> boundaries, but I would like to do it all in one expression.
>>
>> Group set extraction = ^\[\)\>\x1e(.+)\x1e\x04$
>>
>> SubGroup Split =
>>
>> OneExpressionToRuleThemAll =
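One possible answer, sketched in Python under a few assumptions: fields are [0-9A-Z]+, the sample below is a shortened version of the record from the post, and Python's re stands in for whatever engine you're using. It puts the two-step version next to a single expression applied globally, where each field is anchored to whatever precedes it (the "[)>" plus RS header, or a GS byte).

```python
import re

RS, GS, EOT = "\x1e", "\x1d", "\x04"  # ASCII control characters in the record

# Shortened sample built from fields in the post; no whitespace on the wire.
fields = ["06", "Y7130700000000Y", "P84469826", "12V654663145", "PCXMG29N04D"]
record = "[)>" + RS + GS.join(fields) + RS + EOT + "\r\n"

# Two steps: grab everything between the RS bytes, then split on GS.
# \r?\n? tolerates the trailing CR LF, which a bare \x04$ would not.
outer = re.match(r"^\[\)>\x1e(.+)\x1e\x04\r?\n?$", record)
two_step = outer.group(1).split(GS)

# One expression, applied globally: each field is preceded either by
# the "[)>" + RS header or by a GS byte.
one_expr = re.findall(r"(?:\[\)>\x1e|\x1d)([0-9A-Z]+)", record)

print(two_step == one_expr == fields)  # True
```

The single expression doesn't check the RS EOT trailer, so if you also need validation, keep the outer match and run the findall only when it succeeds. Also note that $ matches before a final LF but not before CR LF, so a bare \x04$ fails if the CR LF is still on the string.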