OK, so I think I understand the situation now.<br><br>I've run into similar situations where a filesystem went read-only because its I/O could not complete in time (the LUN was shared with other servers that were hitting it with intense I/O).<br>
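<br>As a rough way to spot which volumes were affected, the sketch below parses mount-table text in the /proc/mounts format and lists the ext3 filesystems that currently carry the ro option. The function name and the sample device paths are made up for illustration; on a real host you would feed it the contents of /proc/mounts.<br>

```python
def read_only_mounts(mounts_text, fstype="ext3"):
    """Return (device, mountpoint) pairs of the given type mounted read-only."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) != 6:
            continue
        if fields[2] == fstype and "ro" in fields[3].split(","):
            hits.append((fields[0], fields[1]))
    return hits


# Hypothetical sample in /proc/mounts format
SAMPLE = """\
/dev/mapper/vg0-data /data ext3 ro,data=ordered 0 0
/dev/mapper/vg0-logs /logs ext3 rw,data=ordered 0 0
/dev/sda1 /boot ext2 ro 0 0
"""

if __name__ == "__main__":
    # On a live system, replace SAMPLE with open("/proc/mounts").read()
    for device, mountpoint in read_only_mounts(SAMPLE):
        print(device, mountpoint)
```

<br>A volume that shows up here after a storage hiccup is one the kernel has remounted read-only, which is a hint that its journal needs attention.<br>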
<br>There is a kernel setting that adjusts the timeout for a SCSI disk device (installing VMware Tools raises the SCSI timeout from the Linux default of around 30 seconds to 180 seconds).<br><br><a href="http://communities.vmware.com/thread/257251">http://communities.vmware.com/thread/257251</a><br>
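<br>A minimal sketch of reading and changing that timeout, assuming the disk exposes it through sysfs at /sys/block/DEVICE/device/timeout (value in seconds). The device name sdb and the helper names are hypothetical, and the sysfs_root argument exists only so the functions can be tried against a scratch directory. Writing the real file needs root, and the change does not survive a reboot.<br>

```python
import os


def timeout_path(device, sysfs_root="/sys"):
    """Build the sysfs path holding the SCSI command timeout for a disk."""
    return os.path.join(sysfs_root, "block", device, "device", "timeout")


def get_scsi_timeout(device, sysfs_root="/sys"):
    """Return the current command timeout in seconds."""
    with open(timeout_path(device, sysfs_root)) as fh:
        return int(fh.read().strip())


def set_scsi_timeout(device, seconds, sysfs_root="/sys"):
    """Write a new timeout in seconds (root only on a real system)."""
    with open(timeout_path(device, sysfs_root), "w") as fh:
        fh.write(str(int(seconds)))


if __name__ == "__main__":
    disk = "sdb"  # hypothetical device name; substitute your iSCSI disk
    if os.path.exists(timeout_path(disk)):
        print(disk, "timeout:", get_scsi_timeout(disk), "seconds")
```

<br>To make a value stick across reboots you would need something like a udev rule or a boot script, which is the kind of thing VMware Tools sets up for its own guests.<br>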
<br><br>So if you often hit this problem in your iSCSI SAN setup, you can look at raising the timeout value for that particular iSCSI disk (I know you aren't using ESX in this case).<br><br><br>As for the journal abort error, I'd stick with fsck. fsck will read the journal and replay whatever hasn't been committed. The abort makes me think that some bad data was written to the journal, and then maybe the iSCSI disk was unavailable to complete its I/O request, leaving an incomplete journal write.<br>
<br>fsck will ask whether you want to ignore/remove this bad, half-written journal entry.<br><br>As for your other volumes, they may have just timed out against their iSCSI backend device without having any I/O writes pending during the timeout.<br>
<br>In that case, when you remounted the logical volume, ext3 did not see any incomplete journal writes and treated the filesystem as fine.<br><br><br><div class="gmail_quote">On Tue, Jan 11, 2011 at 2:16 PM, Brian Pitts <span dir="ltr">&lt;<a href="mailto:brian@polibyte.com">brian@polibyte.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im">On 01/11/2011 01:29 PM, Drew Wade wrote:<br>
> Jim,<br>
><br>
> You really need to fsck that volume to correct the problem.<br>
><br>
> Since it is in read only mode, you need to umount it or umount -l it if<br>
> it doesn't respond to umount.<br>
><br>
> Then fsck the logical volume. Then once that completes you need to<br>
> remount it. Even if the application is happily running, if it tries to<br>
> retrieve a piece of data on an errored filesystem you risk feeding it bad<br>
> data which it could send downstream (to a DB or another post processing<br>
> application) and run into even more problems.<br>
><br>
> I'd suggest telling the customer that you need to fix the filesystem and<br>
> get some outage time from the app owners, then umount and fsck it. If<br>
> that fails, you'll have to go to single user mode and fsck it (just make<br>
> sure you unmount it even in single user mode before the fsck).<br>
<br>
</div>FYI it was me asking the question, not Jim.<br>
<br>
On three other volumes that suffered the same problem, I already<br>
unmounted and remounted without an fsck. These are ext3 filesystems with<br>
the journal in ordered mode.<br>
<br>
I am not sure if I should be treating this situation (someone unplugging<br>
the connection to the storage array) any differently than I would treat<br>
someone unplugging the power cables from the server itself. In that<br>
case, I would expect the journal to keep filesystem metadata consistent.<br>
There might be corrupt data if a program was writing inside an existing<br>
file (instead of extending), but this isn't something fsck can fix.<br>
<br>
<a href="http://www.ibm.com/developerworks/library/l-fs7.html" target="_blank">http://www.ibm.com/developerworks/library/l-fs7.html</a><br>
<a href="http://www.redhat.com/support/wpapers/redhat/ext3/tuning.html" target="_blank">http://www.redhat.com/support/wpapers/redhat/ext3/tuning.html</a><br>
<div><div></div><div class="h5"><br>
--<br>
All the best,<br>
Brian Pitts<br>
_______________________________________________<br>
Ale mailing list<br>
<a href="mailto:Ale@ale.org">Ale@ale.org</a><br>
<a href="http://mail.ale.org/mailman/listinfo/ale" target="_blank">http://mail.ale.org/mailman/listinfo/ale</a><br>
See JOBS, ANNOUNCE and SCHOOLS lists at<br>
<a href="http://mail.ale.org/mailman/listinfo" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>
</div></div></blockquote></div><br>