Educational guide
IDENTIFYING DATA 2023_24
Subject MACHINE LEARNING APPLIED TO CYBERSECURITY AND CYBERCRIME Code 01747021
Study programme
1747 - Máster Universitario de Investigación en Ciberseguridad
Descriptors Credit. Type Year Period
3 Optional First Second
Language
English
Prerequisites
Department ING.ELECTR.DE SIST. Y AUTOMATI
Coordinator
FIDALGO FERNANDEZ , EDUARDO
E-mail efidf@unileon.es
vgonc@unileon.es
Lecturers
GONZÁLEZ CASTRO , VICTOR
FIDALGO FERNANDEZ , EDUARDO
Web http://agora.unileon.es/
General description This course covers applications of Machine Learning, Computer Vision and Natural Language Processing to cybersecurity and the fight against cybercrime. We explain the main methods and concepts behind several classifiers and image and text descriptors, and show how they can be applied against cybercrime (for example, recognising people, classifying the text of an email as spam or not spam, or detecting traffic generated by a botnet). These applications will be implemented in the lab using Python.
Review Boards
Regular board
Position Department Lecturer
Chair ING.ELECTR.DE SIST. Y AUTOMATI BLAZQUEZ QUINTANA , LUIS FELIPE
Secretary ING.ELECTR.DE SIST. Y AUTOMATI ALAIZ MORETON , HECTOR
Member ING.ELECTR.DE SIST. Y AUTOMATI FUERTES MARTINEZ , JUAN JOSE
Substitute board
Position Department Lecturer
Chair ING.ELECTR.DE SIST. Y AUTOMATI PRADA MEDRANO , MIGUEL ANGEL
Secretary ING.ELECTR.DE SIST. Y AUTOMATI FOCES MORAN , JOSE MARIA
Member ING.ELECTR.DE SIST. Y AUTOMATI GARCIA RODRIGUEZ , ISAIAS

Competencies
Type A Code Competences Specific
  A18801
  A18804
  A18812
Type B Code Competences Transversal
  B5729
  B5730
  B5731
  B5732
  B5740
Type C Code Competences Core

Learning aims
Competences
Programming and analysing tasks in different programming languages in the area of computer and communications security. A18804
A18812
B5729
B5732
B5740
Applying biometric properties in the area of computer and communications security. A18812
B5729
B5730
B5731
B5732
Knowing the basic concepts of social engineering. A18812
B5731
Knowing the scientific method. Aptitude for gathering of information and relevant references and writing of scientific papers. Organization and presentation of contributions to scientific conferences. A18801
A18812
B5731

Contents
Topic Sub-topic
Block I: IMAGE CLASSIFICATION Lesson 1. IMAGE CLASSIFICATION FOR THE FIGHT AGAINST CYBERCRIME.
Concepts. Preprocessing. Image descriptors. Deep Learning for image classification. Applications of image classification for the fight against cybercrime

Lesson 2. PEOPLE RECOGNITION FOR THE FIGHT AGAINST CYBERCRIME.
Face detection and recognition. People detection. Perceptual hashing
Block II. TEXT CLASSIFICATION Lesson 3. TEXT CLASSIFICATION FOR THE FIGHT AGAINST CYBERCRIME.
Text descriptors. Word embeddings. Deep Learning models for Natural Language Processing. Applications

Lesson 4. DETECTION AND CLASSIFICATION OF SPAM.
Application of text classification models for Spam detection

Lesson 5. AUTOMATIC DETECTION OF PHISHING
Application of text classification models for phishing detection
Block III. OTHER APPLICATIONS Lesson 6. DETECTION OF BOTNET NETWORKS.
Basic concepts of TCP/IP model. Network traffic description. Application of classification models for botnet detection
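
As an illustration of the kind of exercise Lesson 4 describes, the following is a minimal sketch of text classification for spam detection. It is not course material. It assumes a bag-of-words descriptor and a multinomial naive Bayes classifier with add-one smoothing, neither of which is specified by this guide, and it uses only the Python standard library. The names `tokenize`, `train_naive_bayes`, `predict` and `TOY_EMAILS` are hypothetical, and the four emails are invented toy data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple text descriptor)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per class and word occurrences per class.

    samples: list of (text, label) pairs.
    Returns (class_docs, word_counts, vocab).
    """
    class_docs = Counter()   # label -> number of training documents
    word_counts = {}         # label -> Counter of token frequencies
    vocab = set()            # every token seen during training
    for text, label in samples:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under naive Bayes."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        counts = word_counts[label]
        total_tokens = sum(counts.values())
        # Log prior of the class.
        score = math.log(class_docs[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                # Add-one (Laplace) smoothed log likelihood of the token.
                score += math.log((counts[tok] + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, for illustration only.
TOY_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]

model = train_naive_bayes(TOY_EMAILS)
print(predict(model, "free prize money"))
print(predict(model, "project meeting monday"))
```

Swapping the descriptor (for example, word embeddings instead of word counts) or the classifier changes only `tokenize` or `train_naive_bayes`/`predict`, which is the kind of variation the contents above refer to.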

Planning
Methodologies  ::  Tests
  Class hours Hours outside the classroom Total hours
Practicals using information and communication technologies (ICTs) in computer rooms 18 19 37
 
Problem solving, classroom exercises 2 0 2
Case study 0 6 6
Presentations / expositions 4 16 20
 
Lecture 8 0 8
 
Mixed tests 2 0 2
 
(*) The information in the planning table is for guidance only and does not take into account the heterogeneity of the students.

Methodologies
Methodologies   ::  
  Description
Practicals using information and communication technologies (ICTs) in computer rooms The lab sessions follow the guides posted on Agora. They are guided, programming-based labs used to evaluate, and to study in more depth, the methods and techniques discussed in the lectures. The lecturer will answer questions in the classroom, by email, in scheduled face-to-face meetings or in synchronous remote sessions. Solutions will be submitted through a task set up for that purpose in Agora.
Problem solving, classroom exercises Where needed, exercises may be worked through during lectures to help students understand basic concepts from Machine Learning, Computer Vision or Natural Language Processing.
Case study The student will select one of the topics from the laboratory sessions and solve the same problem using a different strategy (encoding/classifier), subject to constraints set by the professor. The student will study the new strategy through a proposed research paper.
Presentations / expositions The students will prepare a presentation, to be given in front of the other students and the instructor, describing the new solution chosen for the problem solved.
Lecture Theoretical classroom sessions using slides, which will be recorded on video. The presentations or documents for each lesson will be posted on Agora. Some lessons may be accompanied by videos on the concepts presented, either recorded by the lecturers or taken from internet resources that the lecturers consider especially appropriate. Some lessons may also include a questionnaire with theoretical and practical questions, and the answers submitted will be graded. Courses on the DataCamp platform, or similar ones, may be used to reinforce some lessons; some of these courses will be optional and others mandatory.

Personalized attention
 
Presentations / expositions
Problem solving, classroom exercises
Case study
Practicals using information and communication technologies (ICTs) in computer rooms
Lecture
Description
Students can ask for personalised attention through email at any point in the course. Such attention will be provided via videoconference for remote students if necessary.

Assessment
  Description Qualification
Presentations / expositions Several parts of the course project, which will be presented to the students, will be evaluated, including the final presentation given to the instructor. 10
Practicals using information and communication technologies (ICTs) in computer rooms Each laboratory session will include deliverables that will be assessed, and a grade will be assigned to each session. 50
Lecture A short questionnaire will assess some of the concepts covered in the lectures. 15
Mixed tests After the questionnaire, the students will complete the second part of the evaluation, which consists of reproducing one laboratory session using the provided template but changing some parts (encoding, classifier, preprocessing, data, etc.). 25
Others Some voluntary activities may be offered to the students. These will be assessed and can earn additional points towards the course grade.
 
Other comments and second call
  • Late submissions will incur a grade penalty.
  • To pass the course in continuous assessment (i.e. the first call) it will be necessary to get at least 5 out of 10 points.
  • Students can compensate grades between parts, as long as the minimum grade on a part is 3 out of 10.
  • Students who do not pass the course in continuous assessment (i.e. the first call) will be able to submit labs which were not submitted or which were not passed during the first call.
  • The same rules as in the first call apply to passing the course in the second call.

    For students of the ONLINE modality of the master's degree:

    Regarding the proctoring software (SMOWL) used during exams in the official calls of the distance modality: browsing pages other than the exam itself, unless expressly permitted, may result in failing that activity, at the discretion of the teaching staff.

    If problems arise in identifying a student, the teachers may require additional assessment activities via videoconference. The conditions of these tests may depend on connectivity, lighting, etc. It is the student's responsibility to follow the instructions received in this regard and to protect their own privacy by taking the exam in an appropriate environment (isolated, with a good connection, good lighting, etc.). Recommendations for students on the use of SMOWL can be found at the following link: http://bit.ly/3ZrtxVs


Sources of information
Access to Recommended Bibliography in the Catalog ULE

Basic
  • Cunha. (2022). Deep learning with Python (2a ed) - François Chollet - Manning, outubro 2021, 504 pp. Interações: Sociedade e as novas modernidades, 42, 113-115. https://doi.org/10.31211/interacoes.n42.2022.r1
  • Geron. (2017). Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems  (1st ed.). O’Reilly.
  • Bird. (2009). Natural language processing with Python (Klein & E. Loper, Eds.; 1st edition). O’Reilly.
Complementary

  • Research papers whose references will be shared and updated on each course. 
  • Lewis Tunstall. (n.d.). Natural Language Processing with Transformers. https://www.oreilly.com/library/view/natural-language-processing/9781098136789/
  • Howard. (2020). Deep learning for Coders with fastai and PyTorch: AI applications without a PhD (Gugger, Ed.). O’Reilly Media Inc.
  • Huang, Hussain, A., Wang, Q.-F., & Zhang, R. (2019). Deep Learning for Natural Language Processing. In Deep Learning: Fundamentals, Theory and Applications (Vol. 2, pp. 111-138). Springer International Publishing AG. https://doi.org/10.1007/978-3-030-06073-2_5
  • Goodfellow. (2016). Deep learning  (Bengio & A. Courville, Eds.). The MIT Press.



Recommendations


Subjects recommended to have been taken beforehand
FOUNDATIONS OF MACHINE LEARNING AND APPLICATIONS IN CYBERSECURITY / 01747013
DEEP LEARNING FOUNDATIONS / 01747015
 
Other comments
Knowledge of the Python programming language.