Sunday, March 11, 2012

Error 605

I'm currently helping to error-check a problematic SQL Server 2000
installation.
They've started to get this error in the log, usually when the reindex job is
running on the database, but sometimes at other times too:
DBCC CHECKDB (ProductionDB) executed by DOMAIN\user01 found 0 errors and
repaired 0 errors.
Getpage: bstat=0x9, sstat=0, cache
pageno is/should be: objid is/should be:
(1:1772011)/(1:1772011) 0/930818378
... IAM indicates that page is allocated to this object
Getpage: bstat=0x9, sstat=0, cache
pageno is/should be: objid is/should be:
(1:1772011)/(1:1772011) 0/930818378
... IAM indicates that page is allocated to this object
Getpage: bstat=0x9, sstat=0, cache
pageno is/should be: objid is/should be:
(1:1772011)/(1:1772011) 0/930818378
... IAM indicates that page is allocated to this object
Getpage: bstat=0x9, sstat=0, cache
pageno is/should be: objid is/should be:
(1:1772011)/(1:1772011) 0/930818378
... IAM indicates that page is allocated to this object
Error: 605, Severity: 21, State: 1
Attempt to fetch logical page (1:1772011) in database 'ProductionDB' belongs
to object '0', not to object 'Table1'.
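For reference, a manual consistency check along the lines of the job mentioned above can be run like this (the ALL_ERRORMSGS and NO_INFOMSGS options are my assumption; the original job isn't shown):
-- Sketch: full consistency check of the database named in the log,
-- reporting every error and suppressing informational messages
DBCC CHECKDB ('ProductionDB') WITH ALL_ERRORMSGS, NO_INFOMSGS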
The server is a failover-clustered server running SQL Server 2000 on Windows
2000. Before this problem they had a failing HD, which they replaced. The
problem seemed to be gone after the replacement but resurfaced a little
later.
I started to think that the problem was related to the write cache, so I
checked the write cache on the RAID controller and it was turned off. Wanting
to know whether the write cache on the HDs themselves could be the problem, I
got the answer that there is no cache on the disks, or at least no
possibility to turn the disk cache off on them.
I also get this message at the beginning of the sqliostress test log:
"*** WARNING: Write caching ALLOWED"
I still had a feeling that the problem was hardware related, so I ran
sqliostress.exe and it came through fine. Thinking that the problem only
appears under load, I ran 5 simultaneous instances of sqliostress and got a
problem in one of the logs:
Pattern for page 1 is A
ERROR: LSN not found for page 1. Currently at slot 1 page 1 in log
searching for LSN 3
---
ERROR: Did not find expected pattern in file for page 1.
Bytes read = 8192
Potential torn write
---
Sector: 0 LSN: 3 Page: 0
[AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...[sic]
ERROR: Appears to be a premature truncation of file. Pages expected: 32000,
Pages Read: 1, Last Error: 0
Current LSN is 32001
Verifing complete.
OK, what next? Is the problem the HD cache, or is it something else in the
RAID system that could create this, or should I look somewhere
else? Anyone?|||Hi John, your assumptions are the same as the ones I would have
probably jumped to. Let's assume that the DB is inconsistent (perhaps
down to that faulty drive)...
You say you are running the DBCC CHECKDB command; however, have you
tried actually repairing the db using one of the repair parameters? I
would probably start off with the REPAIR_FAST option (if it is a
production db), but you might have to bite the bullet and use
REPAIR_ALLOW_DATA_LOSS. It might be an idea to do it WITH ALL_ERRORMSGS.
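A minimal sketch of what that repair pass might look like on SQL Server 2000 (the single-user step and the database name are assumptions; the repair options require exclusive access):
-- Sketch only: repair options need single-user mode, and
-- REPAIR_ALLOW_DATA_LOSS can discard data, so back up first.
ALTER DATABASE ProductionDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DBCC CHECKDB ('ProductionDB', REPAIR_FAST) WITH ALL_ERRORMSGS
-- If that is not enough:
-- DBCC CHECKDB ('ProductionDB', REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS
ALTER DATABASE ProductionDB SET MULTI_USER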
Br,
Mark Broadbent
mcdba , mcse+i
=============|||Mark Broadbent wrote:
> ...you might have to bite the bullet and use REPAIR_ALLOW_DATA_LOSS.
and I should add.. make sure you have a current backup to rely upon
before you start this process.
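A quick sketch of that backup step (the backup path is just a placeholder):
-- Sketch: take a full backup to a fresh file before any repair attempt
BACKUP DATABASE ProductionDB
TO DISK = 'D:\Backup\ProductionDB_before_repair.bak'
WITH INIT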
--
Br,
Mark Broadbent
mcdba , mcse+i
=============|||Thanks for your answer, Mark!
Yes, I've tried running DBCC with the repair options and it showed no problems.
After the error is reported I always also run a DBCC CHECKDB, which
doesn't indicate any problems.
The problem is that the nightly reindex job is failing about half the time
with this error (it runs three times a week).
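The thread doesn't show the job itself; on SQL Server 2000 a reindex step of this kind would typically look something like this (table name taken from the 605 message, fill factor assumed):
-- Sketch of a typical SQL 2000 reindex step; '' means all indexes
-- on the table, 90 is an assumed fill factor
DBCC DBREINDEX ('Table1', '', 90)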
I started to believe that it was an erroneous error message and that the
data wasn't damaged, because the checks and repairs never reported any
problems.
However, two weeks ago the same problem appeared and this time there was some
damage to one of the tables... after a repair with data loss the problem was
fixed, though (luckily it wasn't a table that contained unique data this
time...).
I have also tried creating a new empty database and transferring the data
there, in case the disk problems that have since been fixed had caused some
sort of "invisible" damage to the database that the checks couldn't find. It
didn't fix the problem, however.
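The transfer itself could be done in several ways; a minimal sketch of a per-table copy (the new database name is a placeholder, and identity columns would need SET IDENTITY_INSERT handling):
-- Sketch: copy one table into a freshly created database
INSERT INTO NewProductionDB.dbo.Table1
SELECT * FROM ProductionDB.dbo.Table1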
The breakthrough, as I see it, is that I think I've managed to reproduce the
problem by running sqliostress in five parallel instances and got a possible
torn page in one of the logs... that should indicate a problem in the
disk subsystem, right?
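One thing that might be worth checking here (this is only a suggestion, and the database name is assumed) is whether torn page detection is enabled, so SQL Server 2000 flags torn writes itself:
-- Sketch: check whether torn page detection is on (1 = enabled),
-- and the SQL 2000 command to enable it if it is not
SELECT DATABASEPROPERTYEX('ProductionDB', 'IsTornPageDetectionEnabled')
-- EXEC sp_dboption 'ProductionDB', 'torn page detection', 'true'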
"Mark Broadbent" <no-spam-please_mark.broadbent@.virgin.net> wrote in message
news:#kMJelDqDHA.644@.TK2MSFTNGP11.phx.gbl...
> Hi John, your assumptions are the same as the ones I would have
> probably jumped to. Lets assume that the DB is inconsistent (perhaps
> down to that faulty drive) ....
> You say you are running the DBCC checkdb command, however have you
> tried actually repairing the db using on of the repair parameters. I
> would probably start off with the REPAIR_FAST option (if it is a
> production db), but you might have to bite the bullet and use
> REPAIR_ALLOW_DATA_LOSS might be an idea to do it with WITH ALL_ERRORMSGS
>
> --
> Br,
> Mark Broadbent
> mcdba , mcse+i
> =============|||John Horn wrote:
> Thanks for your answer Mark!
> Yes, I've tried running dbcc repair and it showed no problems.
> After these error is reported I always also run a dbcc checkdb which
> doesen't indicate any problems.
> The problem is that the nightly reindex job is failing about half the
> times with this error (it runs 3 times a week).
> I started to belive that there was an "erranous errormessage" and
> that the data wasn't damaged becaus the checks and repairs never
> reported any problems.
> However two weeks ago the same problem appeared and this time there
> was some damage to one of the tables...after a repair with dataloss
> the problem was fixed though (luckily it wasn't any table that
> contained uniqe data this time...).
> I also have tried to create a new empty database and transfer the data
> there, in the case that the disk problems that has been fixed had
> made some sort of "invisible" damage to the database that the checks
> couldn't fix. It didn't fix the problem however.
> The breakthrough as I see it is that I think that I've managed to
> reproduced the problem by running sqliostress in five parallel
> instances and got one a possible torn page in one of the logs...that
> should indicate a problem in the disk-subsystem...right?
>
>
> "Mark Broadbent" <no-spam-please_mark.broadbent@.virgin.net> wrote in
> message news:#kMJelDqDHA.644@.TK2MSFTNGP11.phx.gbl...
> > Hi John, your assumptions are the same as the ones I would have
> > probably jumped to. Lets assume that the DB is inconsistent (perhaps
> > down to that faulty drive) ....
> > You say you are running the DBCC checkdb command, however have you
> > tried actually repairing the db using on of the repair parameters. I
> > would probably start off with the REPAIR_FAST option (if it is a
> > production db), but you might have to bite the bullet and use
> > REPAIR_ALLOW_DATA_LOSS might be an idea to do it with WITH
> > ALL_ERRORMSGS
> >
> >
> > --
> >
> > Br,
> > Mark Broadbent
> > mcdba , mcse+i
> > =============
Yes, I certainly think you are looking in the right place. I would
expect it to be either the disks or the controller; the former you would
expect to eventually fail in the array if there was a problem.
I once had a problem with a server (non-SQL) where every now and again
a disk would fail OR the server would crash. Every time we would chkdsk
/f it and get the thing going again. All the hardware looked fine (Compaq
Insight didn't report problems). We software-monitored it for yonks - fine...
then another crash would happen. Failed disks were replaced left, right and
centre... then eventually we replaced the external disk array and
the RAID controller, and to be honest I can't remember which one
resolved the problem, but the problem WAS resolved. Your best bet is
to swap out the controller first and then take it from there.
Good Luck with troubleshooting!
--
Br,
Mark Broadbent
mcdba , mcse+i
=============
