ARIES questions
chris
2011-04-09 01:31:29 UTC
After going over the slides on ARIES (and reading wiki's article on it),
there are a few things I don't understand:

1) Why is the analysis pass necessary? Couldn't we just get redoLSN from
the last checkpoint and start the REDO pass right away, figuring out the
losers while doing the REDO pass?

2) Why do we need to figure out the most recent LSN for each loser entry
during the analysis pass? Why is this information useful?

3) Why does the checkpoint save a list of active transactions? Since all
actions from redoLSN to the tail of the log must be redone (unless
already present on disk), why do we need to differentiate between active
and done transactions? Why does the checkpoint also save the LSN of the
most recent log record for each active transaction?

It seems to me that all that is necessary is a list of all dirty buffer
pages and their recLSNs. Then the system can just do a REDO pass from
the redoLSN, in which it figures out the loser transactions whose
changes must be later compensated; then the UNDO pass to compensate
these changes. Thus, only two passes and less information checkpointed
are needed. Am I missing something?

4) Why does each log entry record a reference to the transaction's
previous entry? (and why does a CLR include a reference to the previous
entry?) Are these just for convenience? Couldn't this information be
determined while scanning through the log?

Thanks,
Chris
Ken Salem
2011-04-10 00:31:05 UTC
> 1) Why is the analysis pass necessary? Couldn't we just get redoLSN from the last checkpoint and start the REDO pass right away, figuring out the
> losers while doing the REDO pass?
Yes, it is possible to combine the analysis and redo passes.
> 2) Why do we need to figure out the most recent LSN for each loser entry during the analysis pass? Why is this information useful?
Because that is the first log record that will need to be looked at when undoing the loser transaction. And that log record will
point to the next (earlier) log record for that transaction, and so on.
> 3) Why does the checkpoint save a list of active transactions? Since all actions from redoLSN to the tail of the log must be redone (unless already
> present on disk), why do we need to differentiate between active and done transactions? Why does the checkpoint also save the LSN of the most recent
> log record for each active transaction?
You need the list of active transactions because some of them might need to be undone. If such a transaction has no log
entries after the checkpoint, you would otherwise not be aware of it.

The LSNs are saved because you need them (as in question #2) if the transaction turns out to be a loser.
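
To illustrate the point about the checkpointed transaction list, here is a minimal Python sketch of an analysis pass. The record layout (`LogRecord`, `kind`, `prev_lsn`) and the function name `analysis` are hypothetical, chosen for this example rather than taken from the slides or from any real system. It assumes a commit record is enough to make a transaction a winner, and it leaves out the dirty page table that a real analysis pass also rebuilds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    lsn: int
    txn: str
    kind: str                 # "update" or "commit" (hypothetical record kinds)
    prev_lsn: Optional[int]   # previous record of the same transaction


def analysis(log_after_checkpoint: List[LogRecord],
             checkpoint_txn_table: Dict[str, int]) -> Dict[str, int]:
    """Return {loser transaction: its most recent LSN}, scanning forward."""
    txn_table = dict(checkpoint_txn_table)  # start from the checkpointed state
    for rec in log_after_checkpoint:
        if rec.kind == "commit":
            txn_table.pop(rec.txn, None)    # committed, so not a loser
        else:
            txn_table[rec.txn] = rec.lsn    # remember the most recent LSN
    return txn_table


# T1 was active at the checkpoint (last record at LSN 10) and logged nothing after it.
# T2 started after the checkpoint and committed; T3 started and never finished.
tail = [
    LogRecord(30, "T2", "update", None),
    LogRecord(40, "T3", "update", None),
    LogRecord(50, "T2", "commit", 30),
]

with_table = analysis(tail, {"T1": 10})
without_table = analysis(tail, {})
print(with_table)     # {'T1': 10, 'T3': 40}
print(without_table)  # {'T3': 40}
```

Without the checkpointed table, T1 never shows up as a loser, so its changes would not be undone. With it, the pass also ends up holding the LSN at which the undo of each loser has to start.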
> It seems to me that all that is necessary is a list of all dirty buffer pages and their recLSNs. Then the system can just do a REDO pass from the
> redoLSN, in which it figures out the loser transactions whose changes must be later compensated; then the UNDO pass to compensate these changes. Thus,
> only two passes and less information checkpointed are needed. Am I missing something?
> 4) Why does each log entry record a reference to the transaction's previous entry? (and why does a CLR include a reference to the previous entry?) Are
> these just for convenience? Couldn't this information be determined while scanning through the log?
That info identifies which log records need to be viewed to undo a transaction. Yes, you could also learn
this by scanning backwards through the log and retrieving and looking at every log entry.
But by having this information, you can avoid retrieving and looking at log blocks that don't contain any
information that is needed for undo - which, in practice, can be most of the log blocks.
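
The saving can be made concrete with a small Python sketch. It is an illustration only: the log is modelled as a dictionary keyed by LSN, `prev_lsn` stands for the back pointer discussed above, and the function names `undo_via_chain` and `undo_via_scan` are hypothetical. CLRs and their pointers are left out to keep it short.

```python
from typing import Dict, List, Optional, Tuple

# lsn -> (transaction id, prev_lsn of the same transaction)
Log = Dict[int, Tuple[str, Optional[int]]]


def undo_via_chain(log: Log, last_lsn: int) -> Tuple[List[int], int]:
    """Follow prev_lsn pointers; return (LSNs to undo, records fetched)."""
    to_undo: List[int] = []
    fetched = 0
    lsn: Optional[int] = last_lsn
    while lsn is not None:
        _txn, prev = log[lsn]
        fetched += 1
        to_undo.append(lsn)
        lsn = prev
    return to_undo, fetched


def undo_via_scan(log: Log, loser: str) -> Tuple[List[int], int]:
    """Scan the whole log backwards; return (LSNs to undo, records fetched)."""
    to_undo: List[int] = []
    fetched = 0
    for lsn in sorted(log, reverse=True):
        txn, _prev = log[lsn]
        fetched += 1
        if txn == loser:
            to_undo.append(lsn)
    return to_undo, fetched


# Loser T9 wrote records 100, 550 and 990; all other records belong to other transactions.
log: Log = {lsn: ("other", None) for lsn in range(1, 1001)}
log[100] = ("T9", None)
log[550] = ("T9", 100)
log[990] = ("T9", 550)

chain_result = undo_via_chain(log, last_lsn=990)
scan_result = undo_via_scan(log, loser="T9")
print(chain_result)  # ([990, 550, 100], 3)
print(scan_result)   # ([990, 550, 100], 1000)
```

Both approaches find the same three records, but the chain fetches only those three. The scan here goes all the way back to the start of the log; a real system could stop at the loser's first record, yet it would still have to fetch every record in between.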

-KMS
