Faculty of Engineering, Environment and Computing 7071CEM Assignment Brief Jan-May 2021
Task:
Develop a vertical search engine similar to Google Scholar that only retrieves papers/books
published by a member of Coventry University. That is, at least one of the co-authors must be
from CU. To that end, you crawl Google Scholar profiles of academic staff at CU and index their
papers in their profiles. The seed page for your crawler, i.e. the first page to crawl, is the Google
Scholar page for Coventry University:
https://scholar.google.co.uk/citations?view_op=view_org&hl=en&org=9117984065169182779
Your system crawls this page and follows the link provided for each member of staff to access
their Google Scholar profile. Then, for each profile, it goes through the publications and
constructs the inverted index using the information about those publications. Because this
information changes slowly, your crawler may be scheduled to look for new
information, say, once per week, but it should ideally be able to do so automatically, as a
scheduled task.
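As an illustration of the first crawling step described above, the following minimal Python sketch extracts staff profile links from the HTML of the seed page. It is not a reference solution. It assumes that profile links point to the path /citations and carry a "user" query parameter; this should be checked against the live pages. The names ProfileLinkParser and extract_profile_urls, and the sample HTML, are hypothetical. Fetching pages, rate limiting, following pagination and the weekly schedule are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, parse_qs

# Seed page given in this brief.
SEED_URL = ("https://scholar.google.co.uk/citations"
            "?view_op=view_org&hl=en&org=9117984065169182779")


class ProfileLinkParser(HTMLParser):
    """Collect absolute URLs of links that look like Scholar profile pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.profile_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        parts = urlparse(url)
        # Assumption: a profile link has path /citations and a "user" parameter.
        if parts.path == "/citations" and "user" in parse_qs(parts.query):
            if url not in self.profile_urls:  # skip duplicates, keep order
                self.profile_urls.append(url)


def extract_profile_urls(html, base_url=SEED_URL):
    """Return the profile URLs found in one page of HTML."""
    parser = ProfileLinkParser(base_url)
    parser.feed(html)
    return parser.profile_urls


if __name__ == "__main__":
    # Hypothetical sample markup, not copied from Google Scholar.
    sample = (
        '<a href="/citations?hl=en&amp;user=ABC123">Staff member</a>'
        '<a href="/citations?view_op=view_org&amp;org=1">Next</a>'
    )
    print(extract_profile_urls(sample))
```

The same parser class could be pointed at each profile page to collect publication links, provided the matching rule is adapted to the markup found there.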
From the user’s point of view, your system has an interface that is similar to the Google Scholar
main page, where the user can type in their queries/keywords about the resources they want
to find. Then, your system will display the results, sorted by relevance, in a similar way to Google
Scholar. However, only publications with at least one co-author from CU are retrieved.
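One possible way to return results sorted by relevance is sketched below: an in-memory inverted index scored with a smoothed TF-IDF weighting. This is only one of several ranking schemes and is not prescribed by this brief. The function names, the weighting formula and the sample titles are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Return (doc_id, score) pairs, highest score first."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Smoothed inverse document frequency (an assumed variant).
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Sort by descending score; break ties by doc_id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical publication titles used only as test data.
    docs = {
        1: "Deep learning for traffic prediction",
        2: "Traffic flow modelling in smart cities",
        3: "Learning to rank scholarly documents",
    }
    index = build_index(docs)
    for doc_id, score in search("traffic learning", index, len(docs)):
        print(doc_id, round(score, 3), docs[doc_id])
```

In this sample, document 1 matches both query terms and is therefore ranked first. A real system would index more fields per publication (for example authors and year) and persist the index between crawls.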
You may further specialise your search engine to a specific field, e.g., computer science,
mechanical engineering, bioinformatics or whatever you would like.
You can use any general-purpose programming language of your choice, although Python is
recommended because of its rich libraries and the sample code developed in the labs.
To earn 70 or more, the system is expected to be a working search engine with reasonable
accuracy and speed. This means the system must contain fully working crawler and query
processor components. In addition, it must have at least one, and preferably both, of the other
two components, i.e. the inverted index and the text classification components, in fully working
order.
Please note that, to show that your system meets each of the above-mentioned requirements,
your report must provide sufficient evidence, including a clear description, complete source code,
and complete screenshots where applicable.
Notes:
1. You are expected to use the Coventry University APA style for referencing. For support and
advice on this, students can contact the Centre for Academic Writing (CAW).
2. Please notify your registry course support team and module leader for disability support.
3. Any student requiring an extension or deferral should follow the university process as outlined
here.
4. The University cannot take responsibility for any coursework lost or corrupted on disks, laptops
or personal computers. Students should therefore regularly back up any work and are advised to
save it on the University system.
5. If there are technical or performance issues that prevent students submitting coursework
through the online coursework submission system on the day of a coursework deadline, an
appropriate extension to the coursework submission deadline will be agreed. This extension will
normally be 24 hours or the next working day if the deadline falls on a Friday or over the
weekend period. This will be communicated via your Module Leader.
6. You are encouraged to check the originality of your work by using the draft Turnitin links on
Aula.
7. Collusion between students (where sections of your work are similar to the work submitted by
other students in this or previous module cohorts) is taken extremely seriously and will be
reported to the academic conduct panel. This applies to both coursework and exam answers.
8. A marked difference between your writing style, knowledge and skill level demonstrated in class
discussion, any test conditions and that demonstrated in a coursework assignment may result in
you having to undertake a Viva Voce in order to prove the coursework assignment is entirely
your own work.
9. If you make use of the services of a proofreader in your work, you must keep your original
version and make it available as a demonstration of your written efforts.
10. You must not submit work for assessment that you have already submitted (partially or in full),
either for your current course or for another qualification of this university, with the exception
of resits, where, for the coursework, you may be asked to rework and improve a previous
attempt. This requirement will be specifically detailed in your assignment brief or specific
course or module information. Where earlier work by you is citable, i.e. it has already been
published/submitted, you must reference it clearly. Identical pieces of work submitted
concurrently may also be considered to be self-plagiarism.