Implementation of Job Website Crawling #7

Open
wants to merge 6 commits into
base: main

Conversation

yunju2
Collaborator

@yunju2 yunju2 commented Feb 2, 2024

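Changes

  • Created a Python web crawling module
    • Added a Recruitment class
    • Added job posting crawling
    • Sends the job postings, processed into JSON, to the Java backend server
  • Implemented a REST API endpoint to receive and process job postings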

@yunju2 yunju2 self-assigned this Feb 2, 2024
Collaborator

@f-lab-lyan f-lab-lyan left a comment

To run the crawler from the Spring server with a Scheduler, the Python file has to be packaged inside the jar and invoked from there. How do you plan to handle this?


private final PyhonRunner pyhonRunner;

@Scheduled(cron = "0 0 12 * * ?")
Collaborator

Wouldn't it be better to run this at a time when the service is less busy? :-)
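One way to act on this: Spring cron expressions have six fields (second, minute, hour, day of month, month, day of week), so `0 0 12 * * ?` fires at noon every day, and something like `0 0 4 * * ?` would fire at 04:00 instead. The 04:00 hour is only an assumption here; the real quiet window should come from the service's traffic data. The plain-Java sketch below (no Spring, with the hypothetical name `OffPeakSchedule`) shows the same idea as a computation of the next off-peak run time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper, not part of this PR: computes when a daily job
// pinned to a fixed wall-clock time would fire next.
final class OffPeakSchedule {
    // Assumed quiet hour; choose the real one from traffic data.
    static final LocalTime OFF_PEAK = LocalTime.of(4, 0);

    // Returns the first occurrence of runAt strictly after now.
    static LocalDateTime nextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime candidate = now.toLocalDate().atTime(runAt);
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }

    // Delay that could be handed to a scheduler such as ScheduledExecutorService.
    static Duration delayUntilNextRun(LocalDateTime now, LocalTime runAt) {
        return Duration.between(now, nextRun(now, runAt));
    }
}
```

With this helper, a job checked at noon would wait 16 hours for its next 04:00 run, while one checked at 01:30 would run the same day.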

}

@Test
void runMyPythonRunner() throws IOException {
Collaborator

If you wanted to test this more thoroughly, how could you go about it?
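One possible answer is to make the command injectable, so a test can run a small, harmless stand-in process instead of the real crawler and then assert on the exit code and captured output. The sketch below assumes this design; `CommandRunner` and `Result` are hypothetical names, not the `PyhonRunner` from this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the PR's runner: the command is passed in,
// so a test can substitute a harmless process for the real crawler.
final class CommandRunner {
    // Exit code plus merged stdout/stderr of the finished process.
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Drain the output first so the child never blocks on a full pipe.
            // Note: this read itself has no timeout; it ends when the child
            // closes its output.
            String output = new String(
                    process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            // The timeout only covers the wait after the output has closed.
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IllegalStateException("Timed out: " + command);
            }
            return new Result(process.exitValue(), output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }
}
```

A unit test could then pass `python3` with a tiny fixture script and assert that `exitCode` is 0 and that `output` contains the expected JSON, and add a second case with a deliberately failing command to cover the error path.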


private Process runPythonCrawling() throws IOException, InterruptedException {
    ProcessBuilder pythonBuilder = new ProcessBuilder(
            "python3", "../crawling/src/main/python/crawl_job_website.py"
Collaborator

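Once this is packaged into a jar, the path "../crawling/src/main/python/crawl_job_website.py" will no longer exist, so this call will fail. It would be worth thinking about how to solve that.

Hint: if the Gradle build first copies the Python build output into the app's resources, the file ends up inside the jar, and you can then read the Python file with something like https://www.baeldung.com/spring-classpath-file-access#resource-utils.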
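One caveat when following this hint: an entry inside a jar is not a regular file on disk, so `python3` cannot be pointed at it by path. A possible workaround is to read the script as a classpath stream, copy it to a temporary file, and run that file. The sketch below uses only the JDK and assumes the Gradle build places the script at the classpath root as `/crawl_job_website.py`; that location and the class name `BundledScript` are assumptions, not something this PR defines.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: makes a script bundled in the jar runnable by python3.
final class BundledScript {
    // Assumed classpath location of the script after the Gradle copy step.
    static final String RESOURCE = "/crawl_job_website.py";

    // Copies the script bytes to a temp file that is deleted on JVM exit.
    static Path extract(InputStream script) {
        try {
            Path target = Files.createTempFile("crawl_job_website", ".py");
            target.toFile().deleteOnExit();
            // createTempFile already created the file, so overwrite it.
            Files.copy(script, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds the python3 invocation for an extracted script.
    static ProcessBuilder commandFor(Path script) {
        return new ProcessBuilder("python3", script.toString());
    }

    // Loads the script from the classpath, which also works inside a jar.
    static ProcessBuilder fromClasspath() {
        try (InputStream in = BundledScript.class.getResourceAsStream(RESOURCE)) {
            if (in == null) {
                throw new IllegalStateException("Missing resource: " + RESOURCE);
            }
            return commandFor(extract(in));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the crawler imports other Python modules, such as the Recruitment class, those files would need to be extracted next to the script as well.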

2 participants