In this task, modified from previous years, participating systems must include one or more components that classify Spanish tweets according to whether they mention professions and occupations, and that detect the text spans of those mentions. The task presents two main challenges. First, classification must account for class imbalance: only around 24% of the tweets contain profession or occupation mentions. Second, span detection will require advanced named entity recognition approaches.
Participants will be provided with a labeled training set containing tweet texts along with profession and occupation annotations. They can choose to participate in one or both subtasks.
For more information, please visit https://temu.bsc.es/smm4h-spanish/
- Training data: 8,000 tweets
- Test data: 2,000 tweets
Register your team here : https://forms.gle/1qs3rdNLDxAph88n6
Link to Codalab : Available Feb 1 2021
Subtask 7a : Tweet classification
Given a tweet, participants in this subtask must submit only the binary annotation Profession/noProfession (1/0).
Evaluation Metric : Submissions will be ranked by Precision, Recall and F1-score for the Profession class (1)
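To make the metric concrete, here is a minimal sketch of how Precision, Recall and F1-score for the Profession class (1) could be computed from binary labels. The function name `profession_prf` and the toy label lists are illustrative assumptions, not part of the official evaluation script, which may differ in its details.

```python
def profession_prf(gold, pred):
    """Precision, recall and F1 for the positive (Profession = 1) class.

    `gold` and `pred` are equal-length sequences of 0/1 labels.
    Hypothetical helper for illustration; not the official scorer.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count outcomes with respect to the Profession class (1) only.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    # Guard against division by zero when a count is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy example: 8 tweets, 3 of which truly mention a profession.
gold = [1, 0, 0, 1, 0, 0, 1, 0]
pred = [1, 0, 1, 0, 0, 0, 1, 0]
precision, recall, f1 = profession_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because only the Profession class is scored, a system that always predicts noProfession (0) would obtain an F1-score of 0 despite being correct on most tweets.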
Subtask 7b : Profession/occupation span detection
Participants in the Profession span detection subtask will be required to submit the spans of expressed professions and occupations.
Evaluation Metric : Submissions will be ranked by Precision, Recall and F1-score for the Profession class, where a predicted span counts as correct only if it overlaps entirely with the annotated span.
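As a rough illustration of span-level scoring, the sketch below treats a prediction as correct only when its tweet identifier, start offset and end offset all match an annotated span. This reading of "overlap entirely" as an exact match, the `(tweet_id, start, end)` tuple representation, and the function name `strict_span_prf` are assumptions made for the example; the official scorer and submission format may differ.

```python
def strict_span_prf(gold_spans, pred_spans):
    """Precision, recall and F1 under exact span matching.

    Each span is a (tweet_id, start, end) tuple of character offsets.
    Hypothetical representation for illustration; not the official format.
    """
    gold = set(gold_spans)
    pred = set(pred_spans)

    # A prediction is a true positive only if the identical span is in gold.
    tp = len(gold & pred)

    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy example: one exact match, one boundary error, one spurious span.
gold_spans = [("t1", 10, 17), ("t1", 30, 38), ("t2", 0, 9)]
pred_spans = [("t1", 10, 17), ("t1", 30, 36), ("t3", 5, 12)]
span_p, span_r, span_f1 = strict_span_prf(gold_spans, pred_spans)
print(f"P={span_p:.3f} R={span_r:.3f} F1={span_f1:.3f}")
```

Under this strict criterion the second prediction earns no credit even though it covers most of the annotated span, so boundary errors are penalized as heavily as missed mentions.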
Contact information: Antonio Miranda (firstname.lastname@example.org)
Frequently asked questions (FAQs):
Do I have to participate in all the subtasks?
No. You may choose to participate in one or both subtasks. During evaluation, you will be allowed to make two submissions for each subtask.