[ale] Regex Assistance

Alex Carver agcarver+ale at acarver.net
Mon May 13 21:20:51 EDT 2019


The alternative, if you happen to know how many GS fields there are and
it's a constant number, is to just write out the long capture:

/RS ([0-9A-Z]+?) GS ([0-9A-Z]+?) GS ...... RS/
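Here is a minimal sketch of that idea. It uses Python's `re` rather than Perl and assumes a known, fixed field count. The helper `fixed_field_pattern` (a hypothetical name) writes out one `([0-9A-Z]+?)` group per field between the RS bytes, joined by GS. The record is a shortened, hypothetical one built from the first four fields of Calvin's sample.

```python
import re

RS, GS, EOT = "\x1e", "\x1d", "\x04"  # record separator, group separator, end of transmission

def fixed_field_pattern(n_fields):
    """Hypothetical helper: RS, exactly n_fields GS-separated captures, RS.
    Each field gets its own explicit capture group."""
    field = r"([0-9A-Z]+?)"
    return re.compile(r"\x1e" + r"\x1d".join([field] * n_fields) + r"\x1e")

# Shortened, hypothetical record: the first four fields of the sample.
record = "[)>" + RS + GS.join(["06", "Y7130700000000Y", "P84469826", "12V654663145"]) + RS + EOT + "\r\n"

m = fixed_field_pattern(4).search(record)
print(m.groups())
# ('06', 'Y7130700000000Y', 'P84469826', '12V654663145')
print(fixed_field_pattern(3).search(record))  # None: the field count must match exactly
```

The trade-off is visible in the last line. If the record has more or fewer fields than the pattern expects, the match simply fails. Lazy versus greedy quantifiers make no difference here, because `[0-9A-Z]` can never consume a separator byte.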

On 2019-05-13 17:41, Alex Carver via Ale wrote:
> It's going to depend on the regex engine.
> 
> If you search on the topic of multiple group pattern matching you'll
> find that some engines (like perl) can do it and automatically return
> multiple group references.
> 
> qr/(?:(pattern).*?)+/;
> 
> Other engines can't do it and require explicit group notation to extract.
> 
> So, in theory, with the right engine you could do this with your
> subprocessed string (and possibly without it, too).
> 
> qr/(?:([0-9A-Z]+\x1D).*?)+/;
> 
> 
> All of this is untested.
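This is a quick check of the second pattern in Python's standard `re` module. It is a translation, so Perl itself is not exercised here. The test string is a shortened sample of the subprocessed string. The literal space before `\x1D` is dropped because the real data has no whitespace. In `re`, a quantified capture group keeps only its last repetition, so the single match yields one field. A global search with `re.findall` returns every field instead.

```python
import re

# Shortened sample of the "subprocessed" string: the text between the two
# RS bytes, fields separated by GS (0x1D), no whitespace.
inner = "\x1d".join(["06", "Y7130700000000Y", "P84469826", "12V654663145"])

# Repeated capture group, with the space before \x1D removed.
repeated = re.compile(r"(?:([0-9A-Z]+\x1D).*?)+")
m = repeated.match(inner)
print(repr(m.group(1)))  # 'P84469826\x1d': only the last repetition is kept

# A global search returns every field rather than one.
print(re.findall(r"[0-9A-Z]+", inner))
# ['06', 'Y7130700000000Y', 'P84469826', '12V654663145']
```

The last field is missing from the repeated-group match, because it has no trailing GS for `\x1D` to match.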
> 
> 
> On 2019-05-13 17:02, Calvin Harrigan via Ale wrote:
>> On 5/13/2019 7:14 PM, Byron Jeff wrote:
>>> sed -e 's//Actual question about regex?/'
>>>
>>> BAJ
>>>
>>> On Mon, May 13, 2019 at 07:01:12PM -0400, Calvin Harrigan via Ale wrote:
>>>> _______________________________________________
>>>> Ale mailing list
>>>> Ale at ale.org
>>>> https://mail.ale.org/mailman/listinfo/ale
>>>> See JOBS, ANNOUNCE and SCHOOLS lists at
>>>> http://mail.ale.org/mailman/listinfo
>>
>> I know right?  Sorry...
>>
>> The source string seems to be getting sanitized by my email client.
>> There are some special characters in it, so I've improvised: RS =
>> Record Separator (0x1E), GS = Group Separator (0x1D), EOT = End of
>> Transmission (0x04), CR = Carriage Return (0x0D), LF = Line Feed
>> (0x0A). There is no whitespace in the real data; I've only included
>> it for readability.
>>
>> Some assembly required...  I've also attached a file with the correct
>> contents.
>>
>> "[)>" can be considered a start marker. Everything else I want to
>> capture into separate groups.
>>
>> [)> RS
>>
>> 06 GS
>>
>> Y7130700000000Y GS
>>
>> P84469826 GS
>>
>> 12V654663145 GS
>>
>> T1118360000100078 GS
>>
>> S100078 GS
>>
>> 2D1226181 GS
>>
>> PCXMG29N04D RS EOT CR LF
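The mail client mangles the control characters, so it may help to build a test string in code rather than paste it. The sketch below is a minimal Python version. It assumes the layout described above: "[)>", RS, the GS-separated fields, RS, EOT, CR LF, with no whitespace. `frame_record` is a hypothetical helper name, and only the first few sample fields are used.

```python
RS, GS, EOT, CRLF = "\x1e", "\x1d", "\x04", "\r\n"

def frame_record(fields):
    """Hypothetical helper: "[)>", RS, fields joined by GS, RS, EOT, CR LF.
    No whitespace anywhere, matching the description of the real data."""
    return "[)>" + RS + GS.join(fields) + RS + EOT + CRLF

sample = frame_record(["06", "Y7130700000000Y", "P84469826"])
print(repr(sample))
# '[)>\x1e06\x1dY7130700000000Y\x1dP84469826\x1e\x04\r\n'
```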
>>
>> So far I've been able to create a group that contains everything between
>> the two RS tags/bytes/chars.  After that extraction I can split it on
>> the GS boundaries, but I would like to be able to do it all in one
>> expression.
>>
>> Group set extraction = ^\[\)\>\x1e(.+)\x1e\x04$
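The two-step approach above can be sketched in Python's `re`: extract the text between the RS bytes, then split it on GS. There is one engine-specific catch. In Python, `$` matches only at the end of the string or just before a final "\n". With CR LF still attached, the `\x04$` anchor therefore fails, so this sketch strips CR LF first. `split_fields` is a hypothetical name, and the record is a shortened sample.

```python
import re

RS, GS, EOT = "\x1e", "\x1d", "\x04"

# The group set extraction expression, as written in the mail.
group_set = re.compile(r"^\[\)\>\x1e(.+)\x1e\x04$")

def split_fields(line):
    """Hypothetical helper: step one extracts the text between the RS
    bytes, step two splits it on GS. Returns None if the frame doesn't match."""
    # Python's "$" matches only at the very end or just before a final
    # "\n", so drop the trailing CR LF before matching.
    m = group_set.match(line.rstrip("\r\n"))
    if m is None:
        return None
    return m.group(1).split(GS)

record = "[)>" + RS + GS.join(["06", "Y7130700000000Y", "P84469826"]) + RS + EOT + "\r\n"
print(split_fields(record))     # ['06', 'Y7130700000000Y', 'P84469826']
print(group_set.match(record))  # None: CR LF still attached
```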
>>
>> SubGroup Split =
>>
>> OneExpressionToRuleThemAll =
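One option for the "OneExpressionToRuleThemAll" slot works in Python's `re`, which is a swapped-in engine and not necessarily what the target tool supports. Instead of asking one match for many groups, it applies a single lookaround pattern as a global search. Each match is one field: a run of `[0-9A-Z]` preceded by RS or GS and followed by GS or RS. The pattern does not check the "[)>" marker or the trailing EOT, so it assumes the input is already known to be one well-formed record.

```python
import re

RS, GS, EOT = "\x1e", "\x1d", "\x04"

# A field is a run of [0-9A-Z] preceded by RS or GS (lookbehind) and
# followed by GS or RS (lookahead); the separators are not consumed.
FIELD = re.compile(r"(?<=[\x1e\x1d])[0-9A-Z]+(?=[\x1d\x1e])")

# Shortened sample record.
record = "[)>" + RS + GS.join(["06", "Y7130700000000Y", "P84469826", "12V654663145"]) + RS + EOT + "\r\n"
print(FIELD.findall(record))
# ['06', 'Y7130700000000Y', 'P84469826', '12V654663145']
```

The lookarounds are the point of the design. Adjacent fields share the GS between them, and if one match consumed that byte, the next field would have no separator left to anchor on.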