Saturday, January 22, 2005

Computing. Or is that biology?

The structure of an RNA molecule can be described by a pair (S, P) where
S = s1 s2 ... sn and P = p1 p2 ... pn for some positive integer n ? 450.
Each element si of S is one of the letters A, C, G, U.
Each element pi of P is an asterisk, a left parenthesis, or a right parenthesis.
Furthermore, the sequence P is a sequence of balanced parentheses. This means, after ignoring the asterisks (if any), each left parenthesis can be matched by a unique succeeding right parenthesis and each right parentheses can be matched by a unique preceding left parenthesis.
For example, below shows two RNA structures: A1 = (S1, P1) (on the left) and A2 = (S2, P2) (on the right).
ACCCGAACUU AAUAUCCCGAAU ((**)*(*)) *(**)(**)*()
Given two RNA structures A = (S,P) and A' = (S',P'), A' is called a substructure of A if there exist integer i ? j such that S' = si ... sj and P' = pi ... pj. (Note that the parentheses of P' should be balanced since A' = (S',P') is an RNA structure.) For example, A3 = (S3, P3) which is
CCCG (**)
is a substructure of A1, but A4 = (S4, P4) which is
ACCCGAACU ((**)*(*)
is not a substructure of A1 because P4 is not a sequence of balanced parentheses. For two RNA structures X and Y, if Z is a substructure of both of them, Z is called a common substructure of X and Y, A longest common substructure is a common substructure with maximum length among all common substructures. For example, for the above A1 and A2, their longest common substructure has length 5 and is
CCCGA (**)*
Note that the following example is not a common substructure of A1 and A2, because the parentheses are not balanced.
CCCGAA (**)*(
In this task, given two RNA structures of length at most 450, you are required to report the length of a longest common substructure. Input The input file RNA.IN consists of 4 lines. The first two lines are the RNA sequence (first line) and the parenthesis sequence (second line) for the first RNA. The last two lines are the RNA sequence (third line) and the parenthesis sequence (fourth line) for the second RNA. Remember the length of each sequence is at most 450. Output The output file RNA.OUT contains a single integer which represents the length of a longest common substructure. Input/Output Example For the above example, the input and output files contain Sample Input
ACCCGAACUU
((**)*(*))
AAUAUCCCGAAU
*(**)(**)*()
Sample Output
5

0 Comments:

Post a Comment

<< Home